# Deep learning framework example: Clickbate dataset

**Important**: Please see the [MNIST notebook example](https://github.com/plandes/deeplearn/blob/master/notebook/mnist.ipynb) in the [zensols.deeplearn](https://github.com/plandes/deeplearn) API first, as it contains more explanation of how the framework is used.

The [saved notebook](https://htmlpreview.github.io/?https://github.com/plandes/deepnlp/blob/master/example/clickbate/notebook/clickbate.html) shows the output of this example.

In [None]:
# environment configuration and setup: add this (deepnlp) library to the Python path and framework entry point
from mngfac import JupyterManagerFactory
fac = JupyterManagerFactory()
mng = fac()

## Print information about the data set

Use the factory to create the model executor.  The `write` method gives statistics on the data set that is configured on the executor.  Note that the first time this runs, the framework automatically downloads the corpus, vectorizes it and creates batches for quick experimentation.

In [None]:
# read the configuration from glove.conf using the same command line process to load the config and models
facade = mng.create_facade('glove_50')
mng.write()

## Baseline model

Clear the language attributes to get a baseline, which gives a feel for where the results stand before changing features.  Start with GloVe 50-dimensional word embeddings (set in the previous cell).

In [None]:
# remove language features for a baseline, then add back later
facade.language_attributes = set()
mng.run()
facade.persist_result()

## Tune parameters

Try a lower learning rate for more epochs to see if it improves performance.  Overestimating the epoch count is hedged because the model is saved only when the validation loss decreases.

In [None]:
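# keep the defaults so they can be restored later
default_lr = facade.learning_rate
default_epochs = facade.epochs
# lower the learning rate by 10% and train for more epochs
facade.learning_rate = facade.learning_rate - (facade.learning_rate/10)
facade.epochs = 50
mng.run()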
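
The save-on-improvement behavior described above can be illustrated with a minimal sketch.  This is not the framework's implementation: the function name `best_checkpoint` and the loss values are hypothetical, and the sketch only assumes that a "save" happens when the validation loss drops below the best value seen so far.

```python
from typing import List, Optional, Tuple


def best_checkpoint(val_losses: List[float]) -> Tuple[Optional[int], List[int]]:
    """Return the epoch of the last save and every epoch that triggered a save.

    A save happens only when the validation loss drops below the best value
    seen so far, so later (worse) epochs never overwrite the saved model.
    """
    best_loss = float('inf')
    best_epoch = None
    saved_at = []
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            saved_at.append(epoch)
    return best_epoch, saved_at


# hypothetical validation losses: improvement stops after epoch 2
losses = [0.9, 0.7, 0.6, 0.65, 0.8]
print(best_checkpoint(losses))  # (2, [0, 1, 2])
```

Under this assumption, the extra epochs after the best one cost training time but do not degrade the saved model.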

## Add language features

Adjusting the learning rate did not show a significant improvement, so restore the previous learning rate and epoch count.  Instead, try adding spaCy-generated language features by appending them to the embedding layer.  The enumerations are one-hot encoded vectors of POS tags, NER entities and dependency parent/child relationships.

In [None]:
facade.learning_rate = default_lr
facade.epochs = default_epochs
facade.language_attributes = {'enums'}
mng.run()
facade.persist_result()
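
To make the idea of appending one-hot encoded features to an embedding concrete, here is a small sketch in plain Python.  The helper names (`one_hot`, `append_features`), the three-tag vocabulary and the four-dimensional embedding are all made up for illustration; the framework's own vectorizers may work differently.

```python
from typing import List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Return a vector with 1.0 at the position of ``value`` and 0.0 elsewhere."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def append_features(embedding: List[float], pos_tag: str,
                    pos_vocab: Sequence[str]) -> List[float]:
    """Concatenate a token's word embedding with its one-hot POS vector."""
    return embedding + one_hot(pos_tag, pos_vocab)


# illustrative three-tag vocabulary and a made-up 4-dimensional embedding
POS_VOCAB = ('NOUN', 'VERB', 'ADJ')
token_vector = append_features([0.1, -0.2, 0.3, 0.0], 'VERB', POS_VOCAB)
print(token_vector)       # [0.1, -0.2, 0.3, 0.0, 0.0, 1.0, 0.0]
print(len(token_vector))  # 7: 4 embedding dimensions + 3 tag slots
```

Each added feature widens the per-token input to the next layer by the size of its vocabulary.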

## More linguistic features

Add the syntactic dependency head parent/child relationship as a feature.

In [None]:
facade.language_attributes = {'enums', 'dependencies'}
mng.run()
facade.persist_result()
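
One possible way to turn a head (parent/child) relationship into a numeric feature is the depth of each token in the dependency tree.  The sketch below is an assumption for illustration only and is not necessarily how the `dependencies` attribute is encoded; `head_depths` is a hypothetical helper and the head indices are hand-annotated rather than parser output.

```python
from typing import List


def head_depths(heads: List[int]) -> List[int]:
    """Return each token's depth in the dependency tree (the root is 0).

    ``heads[i]`` is the index of token ``i``'s head; the root points to
    itself, which is the convention spaCy uses.
    """
    depths = []
    for i in range(len(heads)):
        depth, node = 0, i
        # walk up the tree until the root (a token that is its own head)
        while heads[node] != node:
            node = heads[node]
            depth += 1
            if depth > len(heads):
                raise ValueError('cycle in head indices')
        depths.append(depth)
    return depths


# hand-annotated heads: The -> cat, cat -> chased, chased is the root, mice -> chased
tokens = ['The', 'cat', 'chased', 'mice']
heads = [1, 2, 2, 2]
print(list(zip(tokens, head_depths(heads))))
# [('The', 2), ('cat', 1), ('chased', 0), ('mice', 1)]
```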

## Try fastText news embeddings

Note that we can experiment by setting the embeddings directly in the old facade.  However, recreating the facade is usually better because starting fresh captures the proper setup.  All configuration is reset and reloaded so the language features are added back automatically.  Another advantage to using the `create_facade` method is that all random state is reset for consistency of each new test.

In [None]:
facade = mng.create_facade('fasttext_news_300')
# fasttext embeddings converge faster so lower the epoch count; set this on the
# new facade since `create_facade` resets all configuration
facade.epochs = 25
mng.run()
facade.persist_result()
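
The value of resetting random state between tests can be seen in a toy example that uses only the standard library.  Here `run_experiment` is a hypothetical stand-in for a training run; a real framework would presumably also need to reset the NumPy and PyTorch generators, which this sketch does not attempt.

```python
import random
from typing import List


def run_experiment(seed: int, n: int = 5) -> List[float]:
    """Stand-in for a training run whose outcome depends on random state."""
    random.seed(seed)  # reset the random state before every run
    return [random.random() for _ in range(n)]


first = run_experiment(seed=0)
second = run_experiment(seed=0)
print(first == second)  # True: same seed, same sequence
```

Without the reset, the second run would continue from wherever the generator was left, so two otherwise identical tests could report different results.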

## Compare results

Generate a dataframe with the performance metrics of the previous runs.

In [None]:
from zensols.deeplearn.result import ModelResultManager, ModelResultReporter
rm: ModelResultManager = facade.result_manager
reporter = ModelResultReporter(rm, include_validation=False)
reporter.dataframe.drop(columns=['file'])