# Keep track of your BigML models with storemagic and pickle
The storemagic extension for IPython and Jupyter lets you store Python objects in a persistent datastore:

https://ipython.readthedocs.io/en/stable/config/extensions/storemagic.html

This notebook presents a methodology that can help you
* keep track of your models between notebooks
* avoid needlessly rebuilding existing models

In [129]:
from pprint import pprint

## Here is an example of the data structure you could use

Save the file names of the full training set and the test set.

In [130]:
project_data={}
project_data['fulltrain file']='full.csv'
project_data['test file']='test.csv'

Save the dataset ids after creating the datasets in BigML.

In [131]:
project_data['bigml fulltrain ds']='dataset/156755'
project_data['bigml test ds']='dataset/156755'

After splitting in BigML, save the ids of the training and validation datasets.

In [132]:
project_data['models']=[]
project_data['models'].append({'model type':'ensemble'})
project_data['models'][0]['bigml trainind ds'] = 'dataset/123456'
project_data['models'][0]['bigml validation ds'] = 'dataset/234567'
project_data['models'][0]['name'] = 'my ensemble'

project_data['models'].append({'model type':'deepnet'})
project_data['models'][1]['bigml trainind ds'] = 'dataset/123456'
project_data['models'][1]['bigml validation ds'] = 'dataset/234567'
project_data['models'][1]['name'] = 'my deepnet'
pprint(project_data)

{'bigml fulltrain ds': 'dataset/156755',
 'bigml test ds': 'dataset/156755',
 'fulltrain file': 'full.csv',
 'models': [{'bigml trainind ds': 'dataset/123456',
             'bigml validation ds': 'dataset/234567',
             'model type': 'ensemble',
             'name': 'my ensemble'},
            {'bigml trainind ds': 'dataset/123456',
             'bigml validation ds': 'dataset/234567',
             'model type': 'deepnet',
             'name': 'my deepnet'}],
 'test file': 'test.csv'}
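
The entries above are built by hand, which makes it easy to mistype a key. A minimal sketch of a helper that builds them consistently is shown below. The name `add_model` and its parameters are hypothetical, not part of storemagic or BigML, and the example uses a separate `demo_data` dictionary so it does not touch `project_data`.

```python
def add_model(project_data, model_type, name, train_ds=None, valid_ds=None):
    """Append a model entry to project_data['models'] and return it."""
    entry = {'model type': model_type, 'name': name}
    # Only record the dataset ids that are already known.
    # The key names are spelled exactly as in the cells above.
    if train_ds is not None:
        entry['bigml trainind ds'] = train_ds
    if valid_ds is not None:
        entry['bigml validation ds'] = valid_ds
    # Create the 'models' list on first use
    project_data.setdefault('models', []).append(entry)
    return entry


demo_data = {}
add_model(demo_data, 'ensemble', 'my ensemble', 'dataset/123456', 'dataset/234567')
add_model(demo_data, 'deepnet', 'my deepnet', 'dataset/123456', 'dataset/234567')
```

Because the helper returns the new entry, you can still add extra keys such as a comment to it afterwards.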


Add a comment to your model.

In [133]:
project_data['models'][0]['comment'] = 'This is a ensemble model'

## Save your data

In [134]:
%store project_data

Stored 'project_data' (dict)


Let's say you do your training in another notebook.

In [135]:
project_data={}
project_data

{}

## Get your data back

In [136]:
%store -r project_data

You get back all the information you need for your project.

In [137]:
pprint(project_data)

{'bigml fulltrain ds': 'dataset/156755',
 'bigml test ds': 'dataset/156755',
 'fulltrain file': 'full.csv',
 'models': [{'bigml trainind ds': 'dataset/123456',
             'bigml validation ds': 'dataset/234567',
             'comment': 'This is a ensemble model',
             'model type': 'ensemble',
             'name': 'my ensemble'},
            {'bigml trainind ds': 'dataset/123456',
             'bigml validation ds': 'dataset/234567',
             'model type': 'deepnet',
             'name': 'my deepnet'}],
 'test file': 'test.csv'}


Save the id of your model.

In [138]:
project_data['models'][0]['bigml model'] = 'ensemble/213456'

Add a new model.

In [139]:
project_data['models'].append({'model type':'ensemble'})
project_data['models'][-1]['name'] = 'my ensemble 2'

Train all ensemble models not already trained.

In [140]:
%store -r project_data
ensembles = [model for model in project_data['models'] if model['model type'] == 'ensemble']
for model in ensembles:
    if 'bigml model' not in model:
        #train your ensemble
        model['bigml model'] = 'ensemble/456789'
%store project_data

Stored 'project_data' (dict)


In [141]:
pprint(project_data['models'])

[{'bigml model': 'ensemble/213456',
  'bigml trainind ds': 'dataset/123456',
  'bigml validation ds': 'dataset/234567',
  'comment': 'This is a ensemble model',
  'model type': 'ensemble',
  'name': 'my ensemble'},
 {'bigml trainind ds': 'dataset/123456',
  'bigml validation ds': 'dataset/234567',
  'model type': 'deepnet',
  'name': 'my deepnet'},
 {'bigml model': 'ensemble/456789',
  'model type': 'ensemble',
  'name': 'my ensemble 2'}]
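
The "skip what is already done" test used in the training loop can be factored into a small function and reused for later steps. This is only a sketch: `pending_models` is a hypothetical helper name, and `demo_data` is sample data shaped like `project_data`.

```python
def pending_models(project_data, key, model_type=None):
    """Return the model entries that do not have `key` yet.

    If model_type is given, only entries of that type are considered.
    """
    return [
        model for model in project_data.get('models', [])
        if key not in model
        and (model_type is None or model.get('model type') == model_type)
    ]


demo_data = {'models': [
    {'model type': 'ensemble', 'name': 'my ensemble', 'bigml model': 'ensemble/213456'},
    {'model type': 'deepnet', 'name': 'my deepnet'},
    {'model type': 'ensemble', 'name': 'my ensemble 2'},
]}

# Ensembles that still have to be trained
untrained = pending_models(demo_data, 'bigml model', model_type='ensemble')
print([model['name'] for model in untrained])
```

The function returns the original dictionaries, not copies, so setting `model['bigml model']` on a returned entry updates the project data directly.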


Run a batch prediction for every model that does not have one yet.

In [142]:
%store -r project_data
for model in project_data['models']:
    if 'valid batchpred' not in model:
        #do batch prediction on model['bigml model'] with model['bigml validation ds']
        model['valid batchpred'] = 'batchprediction/54521238'
        model['valid batchpred file'] = 'valid-prediction-' + model['name'] + '.csv'
%store project_data

Stored 'project_data' (dict)


In [143]:
pprint(project_data)

{'bigml fulltrain ds': 'dataset/156755',
 'bigml test ds': 'dataset/156755',
 'fulltrain file': 'full.csv',
 'models': [{'bigml model': 'ensemble/213456',
             'bigml trainind ds': 'dataset/123456',
             'bigml validation ds': 'dataset/234567',
             'comment': 'This is a ensemble model',
             'model type': 'ensemble',
             'name': 'my ensemble',
             'valid batchpred': 'batchprediction/54521238',
             'valid batchpred file': 'valid-prediction-my ensemble.csv'},
            {'bigml trainind ds': 'dataset/123456',
             'bigml validation ds': 'dataset/234567',
             'model type': 'deepnet',
             'name': 'my deepnet',
             'valid batchpred': 'batchprediction/54521238',
             'valid batchpred file': 'valid-prediction-my deepnet.csv'},
            {'bigml model': 'ensemble/456789',
             'model type': 'ensemble',
             'name': 'my ensemble 2',
             'valid batchpred': 'batchpred

## Detecting changes
If you change your features and upload new datasets, your models and predictions need to be refreshed.

Here is an example that refreshes a batch prediction:

```
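from bigml.api import BigML

api = BigML()  # reads credentials from BIGML_USERNAME and BIGML_API_KEY

%store -r project_data
for model in project_data['models']:
    batch_prediction = None
    if 'valid batchpred' not in model:
        # No batch prediction yet: create one
        do_prediction = True
    else:
        batch_prediction = api.get_batch_prediction(model['valid batchpred'])
        # Redo the prediction if it was made with a different model
        model_changed = batch_prediction['object']['ensemble'] != model['bigml model']
        do_prediction = model_changed

    if do_prediction:
        # Only delete a batch prediction that actually exists
        if batch_prediction is not None:
            api.delete_batch_prediction(batch_prediction)
        batch_prediction = api.create_batch_prediction(model['bigml model'], model['bigml validation ds'])
        model['valid batchpred'] = batch_prediction['resource']
%store project_data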
```

If the batch prediction doesn't exist yet, we create it. If it exists but was made with a different model, we delete it and create a new one.
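
The decision itself can be isolated from the API calls, which makes it easy to check without a BigML connection. The sketch below is hedged on one assumption: that the batch prediction's `object` dictionary stores the id of the model it used under a key named after the resource type (the example above only shows this for `'ensemble'`). The function name `needs_refresh` is hypothetical.

```python
def needs_refresh(model, batch_prediction=None):
    """Decide whether the validation batch prediction must be (re)created.

    `model` is one entry of project_data['models'];
    `batch_prediction` is the resource dictionary fetched from BigML,
    or None if nothing was fetched.
    """
    if 'valid batchpred' not in model or batch_prediction is None:
        return True
    # 'ensemble/213456' -> 'ensemble'
    resource_type = model['bigml model'].split('/')[0]
    # Assumption: the id of the model used is stored under that key
    used_model = batch_prediction.get('object', {}).get(resource_type)
    return used_model != model['bigml model']


entry = {'bigml model': 'ensemble/213456',
         'valid batchpred': 'batchprediction/54521238'}
up_to_date = {'object': {'ensemble': 'ensemble/213456'}}
stale = {'object': {'ensemble': 'ensemble/111111'}}

print(needs_refresh(entry, up_to_date))
print(needs_refresh(entry, stale))
```

With the sample data, the first call reports that nothing needs to be redone and the second reports a stale prediction.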

## Other storemagic commands

List all stored variables
```
%store
```
Load all variables
```
%store -r
```
Remove your variable from the datastore
```
%store -d project_data
```

## Store your data in a file with pickle

Storemagic keeps its data on the machine that runs your notebook server.

If you need to exchange data between several Jupyter servers, you can write it to a file with pickle.

In [144]:
from pickle import load, dump

Generate a file name for your project.
You will store the information of your models in that file.

In [145]:
project = 'gmsc'
version = '1.1'
jar_filename = project + '-' + version + '-picklejar'

In [146]:
jar_filename

'gmsc-1.1-picklejar'

In [147]:
with open(jar_filename, 'wb') as file:
    dump(project_data,file)

In [148]:
project_data={}
project_data

{}

In [149]:
with open(jar_filename, 'rb') as file:
    project_data = load(file)

In [150]:
project_data

{'fulltrain file': 'full.csv',
 'test file': 'test.csv',
 'bigml fulltrain ds': 'dataset/156755',
 'bigml test ds': 'dataset/156755',
 'models': [{'model type': 'ensemble',
   'bigml trainind ds': 'dataset/123456',
   'bigml validation ds': 'dataset/234567',
   'name': 'my ensemble',
   'comment': 'This is a ensemble model',
   'bigml model': 'ensemble/213456',
   'valid batchpred': 'batchprediction/54521238',
   'valid batchpred file': 'valid-prediction-my ensemble.csv'},
  {'model type': 'deepnet',
   'bigml trainind ds': 'dataset/123456',
   'bigml validation ds': 'dataset/234567',
   'name': 'my deepnet',
   'valid batchpred': 'batchprediction/54521238',
   'valid batchpred file': 'valid-prediction-my deepnet.csv'},
  {'model type': 'ensemble',
   'name': 'my ensemble 2',
   'bigml model': 'ensemble/456789',
   'valid batchpred': 'batchprediction/54521238',
   'valid batchpred file': 'valid-prediction-my ensemble 2.csv'}]}
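
The dump and load cells above can be wrapped in two small functions so that every notebook reads and writes the jar the same way. This is a minimal sketch: `save_project` and `load_project` are hypothetical names, and returning an empty project when the file is missing is a design choice, not something pickle does for you. Keep in mind that unpickling can execute arbitrary code, so only load jar files you trust.

```python
import os
import pickle
import tempfile


def save_project(project_data, filename):
    """Write the project data to a pickle file."""
    with open(filename, 'wb') as jar:
        pickle.dump(project_data, jar)


def load_project(filename):
    """Return the stored project data, or an empty project if no file exists."""
    if not os.path.exists(filename):
        return {'models': []}
    with open(filename, 'rb') as jar:
        return pickle.load(jar)


demo_data = {'fulltrain file': 'full.csv',
             'models': [{'model type': 'ensemble', 'name': 'my ensemble'}]}

# Round trip in a temporary directory so the example leaves no file behind
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_jar = os.path.join(tmp_dir, 'gmsc-1.1-picklejar')
    save_project(demo_data, demo_jar)
    restored = load_project(demo_jar)
    empty = load_project(os.path.join(tmp_dir, 'missing-picklejar'))
```

In a real project you would pass `jar_filename` instead of a temporary path.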