### This is a user guide demonstrating the functionality in 0.3.1

In [None]:
from config import basic_conf as conf
from libs import ModelManager as mm
from config.constants import HyperParamKey
import logging

import matplotlib.pyplot as plt
%matplotlib inline

## To use the text example, please load your homework 1 data like this:

![require_data](img/required_data.png)

## 1. How to init the logger

The first parameter passed when initializing the logger determines the log level for the rest of the session.

In [None]:
# in this example we init logger with level = INFO and see that the info logs get outputted
conf.init_logger(logging.INFO, logfile=None)
logger = logging.getLogger('__main__')
mgr = mm.ModelManager(mode='notebook')

In [None]:
# in this example we init logger with level = WARNING, we see that all the info logs are suppressed
conf.init_logger(logging.WARNING, logfile=None)
logger = logging.getLogger('__main__')
mgr = mm.ModelManager(mode='notebook')

The second parameter of init_logger controls where the log file is written; by default this is mt.log in the project root.

If you pass None, as in the cells above, nothing is written to a log file.
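
The behavior described above can be approximated with the standard library alone. The sketch below is not the project's `conf.init_logger`; `init_logger_sketch` is a hypothetical stand-in that assumes the real function configures the root logger, always logs to the console, and adds a file handler unless `logfile` is None.

```python
import logging


def init_logger_sketch(level=logging.INFO, logfile="mt.log"):
    """Hypothetical stand-in for conf.init_logger (assumed behavior)."""
    root = logging.getLogger()
    root.setLevel(level)

    # remove handlers left by earlier calls so messages are not duplicated
    for handler in list(root.handlers):
        root.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    console = logging.StreamHandler()
    console.setFormatter(formatter)
    root.addHandler(console)

    # only write a log file when a path is given
    if logfile is not None:
        file_handler = logging.FileHandler(logfile)
        file_handler.setFormatter(formatter)
        root.addHandler(file_handler)
    return root


root = init_logger_sketch(logging.WARNING, logfile=None)
logging.getLogger("__main__").info("suppressed, because the level is WARNING")
logging.getLogger("__main__").warning("this one is printed")
```

Because child loggers such as `logging.getLogger('__main__')` inherit the root level, one call is enough to change what the rest of the session prints.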

In [None]:
# in this example we init the logger with the default logfile='mt.log' parameter and see that the log was written to the file
conf.init_logger(logging.INFO)
logger = logging.getLogger('__main__')
mgr = mm.ModelManager(mode='notebook')

In [None]:
!head mt.log

## 2. Initializing the Model Manager in notebook or console mode

Because the tqdm handler works differently in a notebook than in a console, execution branches on where we are working. Here I initialize in 'notebook' mode; by default the ModelManager operates in console mode.
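
One way such a branch could look is sketched below. This is an assumption about the mechanism, not the ModelManager's actual code: both helper names are hypothetical, and the only library facts relied on are that the `tqdm` package exposes a `tqdm` class in both the `tqdm` and `tqdm.notebook` modules.

```python
import importlib


def progress_module_for(mode="console"):
    """Return the tqdm module name for the given mode (hypothetical helper)."""
    return "tqdm.notebook" if mode == "notebook" else "tqdm"


def load_progress_bar(mode="console"):
    """Import and return the matching tqdm class; requires the tqdm package."""
    return importlib.import_module(progress_module_for(mode)).tqdm
```

A training loop would then wrap its batch iterator with the class returned by `load_progress_bar(mode)`.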

In [None]:
# let me first turn off the logging to mt.log
conf.init_logger(logging.INFO, logfile=None)

In [None]:
# this line inits the ModelManager and puts all of the default parameters into memory
# mode='notebook' initializes it in notebook mode; if nothing is passed, it runs in console mode
mgr = mm.ModelManager(mode='notebook')

### There are 3 types of parameters
All of these parameters are passed into the Model constructor when a new model is created with new_model.

- **Hyperparameters** (self.hparams): the standard hyperparameters used in your model, such as batch_size, embedding_dim, lr, etc. (The keys are in config.constants.HyperParamKey.)
- **Control Parameters** (self.cparams): control the model's input/output behavior. They include paths and flags for when the model runs the save routine: on each new best model, on each epoch, or both. (The keys are in config.constants.PathKey and ControlKey.)
- **Loader Parameters** (self.lparams): parameters the model constructor needs but that are only known after the data is loaded. So far only act_vocab_size has been needed: even with a specified vocab size, the vocab size actually loaded depends on what is in the data. These parameters can therefore only be set after running the load routine, so the load routines return a dict of key:param from which they are set. (The keys are in config.constants.LoaderParamKey.)
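
The toy example below illustrates why the loader parameters have to come from the load routine. Everything in it is hypothetical (`ToyModel`, `toy_load`, and the plain-string keys stand in for the project's classes and key constants); it only mirrors the idea that the actual vocabulary size is discovered from the data and then handed to the model constructor together with the other two groups.

```python
class ToyModel:
    """Hypothetical model that receives all three parameter groups."""

    def __init__(self, hparams, lparams, cparams):
        if lparams.get("act_vocab_size") is None:
            raise ValueError("run the data loader before creating a model")
        self.hparams, self.lparams, self.cparams = hparams, lparams, cparams
        # the embedding table is sized by what the loader actually found
        self.embedding_shape = (lparams["act_vocab_size"], hparams["embedding_dim"])


def toy_load(requested_vocab_size, tokens):
    """Return loader parameters: the vocabulary size actually present in the data."""
    actual = min(requested_vocab_size, len(set(tokens)))
    return {"act_vocab_size": actual}


hparams = {"lr": 0.01, "embedding_dim": 50, "voc_size": 10000}
cparams = {"save_best": True, "save_each_epoch": True}
lparams = toy_load(hparams["voc_size"], "a good movie a bad movie".split())
model = ToyModel(hparams, lparams, cparams)
print(model.embedding_shape)
```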

## 3. Calling the data loader

There is a register at mm.loaderRegister. It exposes a loader_list of the available loaders, plus one attribute per loader name that serves as its registry key.

In [None]:
print(mm.loaderRegister.loader_list)

To call the load function using a specific loader, pass the loader's registry key to the ModelManager.load_data function:

In [None]:
mgr.load_data(mm.loaderRegister.IMDB)

## 3b) Adding new dataloader handlers

- Add your implementation, a child class of libs.data_loaders.BaseLoader, to libs.data_loaders
- Add its registry key to libs.data_loaders.registry

An example dataloader used for homework 1 is provided in libs.data_loaders.ImdbLoader
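
The two steps above can be pictured with a self-contained sketch. None of these classes are the project's own: `BaseLoaderSketch`, `ToyLoader`, and `LoaderRegisterSketch` are hypothetical, and the real BaseLoader interface and registry layout may differ. The sketch only assumes what this guide states: a register with a `loader_list`, one attribute per loader name, and a load routine that returns a dict of loader parameters.

```python
class BaseLoaderSketch:
    """Hypothetical base class: subclasses implement load()."""

    def __init__(self, hparams):
        self.hparams = hparams

    def load(self):
        raise NotImplementedError


class ToyLoader(BaseLoaderSketch):
    """Step 1: a child class that implements the load routine."""

    def load(self):
        texts = ["good movie", "bad movie"]
        vocab = {word for text in texts for word in text.split()}
        # return the loader parameters so the manager can store them
        return {"act_vocab_size": len(vocab)}


class LoaderRegisterSketch:
    """Step 2: a register that maps keys to loader classes."""

    def __init__(self, **loaders):
        self._loaders = dict(loaders)
        self.loader_list = list(self._loaders)
        for name in self._loaders:
            setattr(self, name, name)  # register.TOY == "TOY"

    def create(self, key, hparams):
        return self._loaders[key](hparams)


loader_register = LoaderRegisterSketch(TOY=ToyLoader)
loader_params = loader_register.create(loader_register.TOY, hparams={}).load()
print(loader_register.loader_list, loader_params)
```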

## 4. Initializing Models

New models are initialized with the ModelManager.new_model function; its parameter is the key in the model registry. All of the parameters on the ModelManager are then passed to the Model constructor.

In [None]:
print(mm.modelRegister.model_list)

In [None]:
mgr.new_model(mm.modelRegister.BagOfWords)

## Important general concepts about models

Each epoch is subdivided into "iterations", where each iteration is a collection of n batches. n is determined by the self.hparams.train_loop_check_freq parameter.

The check_early_stop function is called at each iteration.

The BaseModel class implements the basic training loop as well as the save and load functions. The folder used for saving and loading is determined by the BaseModel.label property.

Each model has 2 training curves: self.iter_curves is saved at each training "iteration", and self.epoch_curves is saved at each epoch.

The self.model property on the BaseModel or child model class should point to the graph (nn.Module) that implements the forward and backward passes.

The child model class (in the example case: BagOfWords) should implement:
- eval_model() which is a forward pass but in eval mode (no dropout, no autograd)
- check_early_stop() which is called at each training iteration
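
The epoch/iteration structure can be sketched without any deep learning library. The function below is a simplified illustration, not BaseModel's training loop: `run_training`, `val_acc_fn`, and the patience-based stopping rule are assumptions made for the example, and a real model would run the forward and backward passes where the comment indicates.

```python
def run_training(num_epochs, batches_per_epoch, check_freq, val_acc_fn, patience=2):
    """Return (iter_curves, epoch_curves, stopped_early) for a toy loop."""
    iter_curves, epoch_curves = [], []
    best, stale = float("-inf"), 0

    for epoch in range(num_epochs):
        for batch_idx in range(1, batches_per_epoch + 1):
            # a real model would train on one batch here

            # one "iteration" ends every check_freq batches
            if batch_idx % check_freq == 0:
                acc = val_acc_fn(epoch, batch_idx)
                iter_curves.append(acc)

                # assumed early-stop rule: stop after `patience` checks
                # without a new best validation accuracy
                if acc > best:
                    best, stale = acc, 0
                else:
                    stale += 1
                if stale >= patience:
                    return iter_curves, epoch_curves, True

        epoch_curves.append(val_acc_fn(epoch, batches_per_epoch))
    return iter_curves, epoch_curves, False


# accuracy keeps improving, so training runs to completion
improving = run_training(2, 6, 3, lambda epoch, batch: epoch * 10 + batch)
# accuracy never improves, so the loop stops early
flat = run_training(1, 6, 1, lambda epoch, batch: 1.0)
print(improving, flat)
```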




## 5. Running the training loop

In [None]:
# here the hyperparameter for the number of epochs was set to 1, so the loop runs just 1 epoch
mgr.train()

Since we have enabled saving at each epoch and at each new best, we should find 2 saved files:
- model_saves/checkpoint.tar - saved after the last **completed** epoch
- model_saves/model_best.tar - saved after a new best iteration was achieved (best on val acc)

## 6. Saving and Loading

In [None]:
# we can also explicitly save the model with a comment in markdown format, 
# and can override the default filename = 'checkpoint.tar'
comment = """
## Trial 1

I ran the basic training model for 1 iteration and wanted to demonstrate the save functionality

**hope you all find this useful!**
"""

mgr.save_model(comment, fn='mysave.tar')

This saves the model to model_saves/<model_label>/ and adds a README.md with the comment in that folder. Once checked in, the README.md is rendered automatically on GitHub.

In this case, my label was the default "scratch" so the model was saved to model_saves/scratch/
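
The folder layout described above can be reproduced with a small standalone sketch. `save_with_comment` is a hypothetical helper, and `pickle` is used here only as a stand-in serializer so the example runs without extra dependencies; the project's save routine may serialize differently.

```python
import pickle
import tempfile
from pathlib import Path


def save_with_comment(state, comment, root, label="scratch", fn="checkpoint.tar"):
    """Write the state to <root>/<label>/<fn> and the comment to README.md."""
    folder = Path(root) / label
    folder.mkdir(parents=True, exist_ok=True)

    with open(folder / fn, "wb") as fh:
        pickle.dump(state, fh)

    # the markdown comment lives next to the checkpoint
    (folder / "README.md").write_text(comment, encoding="utf-8")
    return folder / fn


with tempfile.TemporaryDirectory() as tmp:
    saved = save_with_comment({"epoch": 1}, "## Trial 1", root=tmp, fn="mysave.tar")
    print(sorted(p.name for p in saved.parent.iterdir()))
```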

### 6b Loading

If I wanted to continue to train that model, I can load it like this (first we'll clear the model)

In [None]:
mgr.model = None  # gc-ing the existing model

# reinit
mgr.new_model(mm.modelRegister.BagOfWords, nolog=True)

# loading state
mgr.load_model(which_model='mysave.tar')

### 6c. Continuing Training

We can add more epochs to the model and continue to train it:

**Note that the model continues where it left off in epoch 2!**

In [None]:
mgr.model.add_epochs(2)

In [None]:
mgr.train()
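
Why training resumes at epoch 2 rather than restarting can be seen in a minimal sketch. `EpochCounter` is hypothetical and trains nothing; it assumes only that the model keeps a counter of completed epochs, which is restored on load, and that add_epochs raises the target number of epochs.

```python
class EpochCounter:
    """Hypothetical stand-in that tracks completed and target epochs."""

    def __init__(self, num_epochs):
        self.num_epochs = num_epochs  # target number of epochs
        self.cur_epoch = 0            # epochs completed so far

    def add_epochs(self, n):
        self.num_epochs += n

    def train(self):
        """Run the remaining epochs and return their (1-based) numbers."""
        ran = []
        while self.cur_epoch < self.num_epochs:
            self.cur_epoch += 1
            ran.append(self.cur_epoch)
        return ran


counter = EpochCounter(num_epochs=1)
first_run = counter.train()    # the original single epoch
counter.add_epochs(2)
second_run = counter.train()   # picks up after the completed epoch
print(first_run, second_run)
```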

## 7. Visualizing Training Curves

I implemented a graph_training_curves() method to plot the training/validation accuracy and loss.

If the ModelManager is in notebook mode, the graphs are displayed here; in console mode they are written to the model folder: model_saves/<model_label>/

It can graph either the iteration curves or the epoch curves, as shown below.

In [None]:
mgr.graph_training_curves()  # no mode param defaults to iteration mode

In [None]:
mgr.graph_training_curves(mode=mgr.GRAPH_MODE_EPOCH)

## 8. Reporting other training metrics back up to ModelManager

Models should collect their own result values in self.output_dict in {'field': value} format.

After **each** mgr.train() call, the ModelManager collects these outputs.

You can use these results for your hyperparameter optimization.

In [None]:
mgr.get_results()
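
The collection step can be illustrated with the standard library. `ResultCollector` is a hypothetical stand-in for the manager's bookkeeping: it assumes one row per train() call that merges the hyperparameters with the model's output_dict, and it uses the `csv` module in place of whatever table type get_results() actually returns.

```python
import csv
import io


class ResultCollector:
    """Hypothetical collector: one row of results per training run."""

    def __init__(self):
        self.rows = []

    def collect(self, hparams, output_dict):
        # merge the settings with what the model reported
        self.rows.append({**hparams, **output_dict})

    def to_csv_text(self):
        fieldnames = sorted({key for row in self.rows for key in row})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()


collector = ResultCollector()
collector.collect({"lr": 0.1}, {"best_val_acc": 0.8})
collector.collect({"lr": 0.01}, {"best_val_acc": 0.9})
best_row = max(collector.rows, key=lambda row: row["best_val_acc"])
print(collector.to_csv_text())
```

Picking the best configuration then reduces to a `max` over the collected rows, as `best_row` shows.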

## 9. Comprehensive example of training multiple models and collecting results:

In [None]:
# resetting the manager
mgr = mm.ModelManager(mode='notebook')

In [None]:
# simple grid search
lr_list = [0.1, 0.01]
epoch_list = [1, 2]

for lr in lr_list:
    for epoch in epoch_list:
        # update hparams
        hparam_overrides = {HyperParamKey.LR:lr, HyperParamKey.NUM_EPOCH:epoch}
        mgr.hparams.update(hparam_overrides)
        
        # loading data inside the loop is only needed if the loaded data depends on hparams; otherwise it can go outside
        mgr.load_data(mm.loaderRegister.IMDB)  
        
        # reinits a model with the updated hparams
        mgr.new_model(mm.modelRegister.BagOfWords)
        
        # trains and collects results
        mgr.train()
        
        # free up memory
        mgr.dump_model()  
        
mgr.get_results()

In [None]:
# can save the results
mgr.get_results().to_csv('model_saves/results.csv')
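
As the grid grows beyond two hyperparameters, the nested loops can be flattened with `itertools.product`. The `grid` helper below is hypothetical and uses plain-string keys for illustration; in the notebook the keys would be the HyperParamKey constants, and each yielded dict could be passed to mgr.hparams.update.

```python
from itertools import product


def grid(**axes):
    """Yield one dict of overrides per combination of the given value lists."""
    names = list(axes)
    for values in product(*axes.values()):
        yield dict(zip(names, values))


combos = list(grid(lr=[0.1, 0.01], num_epochs=[1, 2]))
for overrides in combos:
    print(overrides)
```

The combinations come out in the same order as the nested loops above, with the first argument varying slowest.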