# I Know What You Did Last Summer
## Experiment Tracking Tools for Data Science

> Sarah Braden

> Phoenix Data Science Meetup

> 13 September 2016


<img style="float: center;" src="img/got_great_result.jpg">

### What

* Two Python libraries: Sacred and Sumatra
* Why use an experiment tracking system?
* How to get started with each library
* Discussion of differences


### Why

* Reproducibility of experiments is critical to all science, including data science.
* Reduce errors in recording inputs, parameters, and outcomes.
* Automating experiment tracking makes it easier: Work smarter, not harder
* “I really need an experiment tracker, but I want to roll my own experiment tracking system. It needs to be specific for my needs.” Why reinvent the wheel?

### What should an experiment manager be?

What are the needs of a data scientist?

* organize
* log all parameters
* reproduce experiments

<img style="float: center;" src="img/dostupid.jpg">

### What is the best thing about both automated experiment trackers?

<img style="float: center;" src="img/record_all_the_experiments.jpg">

# Sumatra


* Documentation: 
    * https://pythonhosted.org/Sumatra/
    * https://pypi.python.org/pypi/Sumatra
* Github: https://github.com/open-research/sumatra
* License: 2-clause BSD
* Started in 2009
* Supports Python 2.6, 2.7, 3.4, and 3.5

# Sumatra Features
* Sumatra is a command line tool
* Uses SQLite db by default, option to use PostgreSQL
* Has a web interface
* Creates directories and files in the repository
* Sumatra requires that you keep your own code in a version control system (currently Subversion, Mercurial, Git and Bazaar are supported).
* By default Sumatra will refuse to run until you have committed your changes.
* Has a Python API

# How to install Sumatra

The web interface requires Django (>= 1.6) and the django-tagging package. Both are installed automatically if you `pip install sumatra`.

Install directly from the Python Package Index: (version 0.7.4)
    
    pip install gitpython
    pip install sumatra

If you have downloaded the source package, Sumatra-0.7.0.tar.gz:

    tar xzf Sumatra-0.7.0.tar.gz
    cd Sumatra-0.7.0
    python setup.py install



# Getting started

* The command-line interface is called `smt`
* The web interface with a built-in web-server is called `smtweb`
* The CLI is configurable
* Type `smt help` to see a list of commands

Create a new Sumatra project in your repository's working directory using the `smt init` command:

    cd project_git_repo_directory
    smt init my_project

This creates a sub-directory named `.smt` and a directory called `Data`. If you place output files generated by your script into the `Data` directory, Sumatra will record them automatically.

Run a model/experiment/simulation by executing on the command line:

    smt run --executable=python --main=imdb_cnn_sumatra.py
    
After the run is complete, leave a comment and a tag:
    
    smt comment "final result: loss: 0.2893 - acc: 0.8799 - val_loss: 0.2843 - val_acc: 0.8848"
    smt tag example

Look at the recording of the run in the command-line:

    smt list --long

In [None]:
--------------------------------------------------------------------------------
Label            : 20160913-134734
Timestamp        : 2016-09-13 13:47:34.921770
Reason           : 
Outcome          : final result: loss: 0.2893 - acc: 0.8799 - val_loss: 0.2843 - val_acc: 0.8848
Duration         : 162.372911215
Repository       : GitRepository at /Users/sbraden/workspace/scarecrow (upstream:
                 : git@git.myserver.com:sarah/scarecrow.git)
Main_File        : imdb_cnn_sumatra.py
Version          : 48581a7e48ed148e8327b3f38a4c42e8e858640a
Script_Arguments : 
Executable       : Python (version: 2.7.10) at /Users/sbraden/.venvs/sumatra_v2/bin/python
Parameters       : 
Input_Data       : []
Launch_Mode      : serial
Output_Data      : [new_JSONData.json(da39a3ee5e6b4b0d3255bfef95601890afd80709 [2016-09-13
                 : 13:50:22]), new_weights.hdf5(fc30d30f96d7db7e744db3a462687d6bac0244be
                 : [2016-09-13 13:50:22])]
User             : Sarah Braden <sarah.braden@myemail.com>
Tags             : example
Repeats          : None

Add a reason and custom label at runtime:

    smt run --executable=python --main=imdb_cnn_sumatra.py --reason="showing what sumatra can do" --label=second_run

In [None]:
--------------------------------------------------------------------------------
Label            : second_run
Timestamp        : 2016-09-13 14:47:08.509532
Reason           : showing what sumatra can do
Outcome          : final result: loss: 0.2893 - acc: 0.8799 - val_loss: 0.2843 - val_acc: 0.8848
Duration         : 155.247848034
Repository       : GitRepository at /Users/sbraden/workspace/scarecrow (upstream:
                 : git@git.myserver.com:sarah/scarecrow.git)
Main_File        : imdb_cnn_sumatra.py
Version          : 48581a7e48ed148e8327b3f38a4c42e8e858640a
Script_Arguments : 
Executable       : Python (version: 2.7.10) at /Users/sbraden/.venvs/sumatra_v2/bin/python
Parameters       : 
Input_Data       : []
Launch_Mode      : serial
Output_Data      : [new_JSONData.json(da39a3ee5e6b4b0d3255bfef95601890afd80709 [2016-09-13
                 : 14:49:48]), new_weights.hdf5(1a161ab87d394fd613512a1149d3f3c05f2e15f5
                 : [2016-09-13 14:49:48])]
User             : Sarah Braden <sarah.braden@myemail.com>
Tags             : example
Repeats          : None

## Repeat
(It's neat!)

The repeat command re-runs a previous simulation, and checks that the output is identical to that of the original run:

    smt repeat labelname
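
To make "checks that the output is identical" concrete, here is a minimal sketch of comparing the output files of two runs by content digest. This is not Sumatra's implementation: the function names are hypothetical, and the use of SHA-1 is an assumption based on the 40-character hex digests shown next to each file in the `smt list --long` output.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def output_digests(directory):
    """Map each regular file directly inside `directory` to its SHA-1 digest."""
    digests = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            digests[name] = sha1_of_file(path)
    return digests


def outputs_match(original_dir, repeat_dir):
    """True if both runs produced the same file names with identical content."""
    return output_digests(original_dir) == output_digests(repeat_dir)


if __name__ == "__main__":
    # Hypothetical demo: two throwaway directories stand in for two runs.
    with tempfile.TemporaryDirectory() as run_a, \
            tempfile.TemporaryDirectory() as run_b:
        for run_dir in (run_a, run_b):
            with open(os.path.join(run_dir, "result.txt"), "w") as f:
                f.write("val_acc: 0.8848\n")
        print(outputs_match(run_a, run_b))
```

A digest check like this also catches silent failures: `da39a3ee5e6b4b0d3255bfef95601890afd80709`, the value recorded for `new_JSONData.json` in the listings above, is the SHA-1 of empty content.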

You can rerun your code at a previous version (rather than the currently checked-out version) by specifying it:

    smt run --version=3e6f02a

# Input Data

If you want Sumatra to record input data, you have two choices:

1. Your script or executable reads data from standard input
2. The names of the input data files are given in the command-line invocation of your program

Parameters can also be supplied in a parameter file. The supported formats (including YAML and JSON) are listed at:
https://pythonhosted.org/Sumatra/parameter_files.html

Run your script specifying the parameter file(s):

    smt run --executable=python --main=imdb_cnn_sumatra.py parameters.yaml
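
On the script side, the parameter file still has to be read. The following is a minimal sketch, assuming the script receives the parameter file path as its first command-line argument and that the file is JSON (chosen here because it needs only the standard library; a YAML file would need a third-party parser). The function names and default values are hypothetical.

```python
import json
import os
import sys
import tempfile


def load_parameters(path):
    """Read a JSON parameter file and return its contents as a dict."""
    with open(path) as f:
        parameters = json.load(f)
    if not isinstance(parameters, dict):
        raise ValueError("expected a JSON object of parameter names and values")
    return parameters


def main(argv):
    # Assumption: argv[1] is the path to the parameter file.
    parameters = load_parameters(argv[1])
    # Hypothetical hyperparameters, with fallback defaults.
    batch_size = parameters.get("batch_size", 32)
    nb_epoch = parameters.get("nb_epoch", 2)
    return batch_size, nb_epoch


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(main(sys.argv))
    else:
        # Demo without Sumatra: write a throwaway parameter file and read it back.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "parameters.json")
            with open(path, "w") as f:
                json.dump({"batch_size": 64, "nb_epoch": 3}, f)
            print(main(["script.py", path]))
```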

# Output Datafiles

1. Each run will be given a separate, unique label by Sumatra (by default, based on the current date and time)
2. After changing the config in Sumatra and editing your Python code, the Python script can read this label from the command line and use it to create a unique subdirectory into which it saves the output data
3. Sumatra knows to look only in this directory for files associated with the given run, so there is no chance of mixing up data from different runs.

The ugly truth: if you don't do this, Sumatra may crash.

Tell Sumatra to append the run label to the end of the command line when you run a script:

    smt configure --addlabel=cmdline

Note: by default, Sumatra looks for output data in the `Data` subdirectory of the working directory. You can change that with:

    smt configure --datapath /path/to/data

In [None]:
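import json
import os
import sys

config = model.get_config()  # `model` is the Keras model built earlier in the script

label = sys.argv[-1]  # Sumatra appends the label to the command line
subdir = os.path.join("Data", label)

# Create the per-run output directory if it does not exist yet
if not os.path.exists(subdir):
    os.makedirs(subdir)

with open(os.path.join(subdir, 'new_JSONData.json'), 'w') as f:
    json.dump(config, f)  # json.dump writes to the file; json.dumps only returns a string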

# Starting the smtweb interface

Before using the web interface, create a Sumatra project!

In your project directory run:

    smtweb

This will launch a simple web server that listens on port 8000 and open a new tab in your browser at http://127.0.0.1:8000/.

Specify a different port with the -p option to smtweb:

    smtweb -p 8001

If you are using a single record store for multiple projects, you can run smtweb from anywhere and specify the location of the record store on the command line:

    smtweb ~/sumatra.db

<img style="float: center;" src="img/record_view.png">

<img style="float: center;" src="img/specific_run.png">

<img style="float: center;" src="img/platform.png">

<img style="float: center;" src="img/standard_out.png">

# Sacred

<img style="float: center;" src="img/Monty_Python_Series.png">

> Every experiment is sacred

> Every experiment is great

> If an experiment is wasted

> God gets quite irate

# Sacred

* Documentation: https://pypi.python.org/pypi/sacred
* Github: https://github.com/IDSIA/sacred
* License: MIT
* Started in 2014

# Sacred Features
* Sacred has a command-line interface, but it is not a command line tool like Sumatra
* Uses MongoDB only
* Does Sacred have a cool web interface like Sumatra? No.
* Unlike Sumatra, it does not have an option to force a commit before running
* Automatic seeding helps control the randomness in your experiments, so that results remain reproducible.
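
The seeding point in the list above can be illustrated without Sacred. This is a minimal sketch of the underlying idea only: a recorded seed makes a "random" experiment repeatable. The function name is hypothetical, and the seed value is the one that appears in a recorded config later in this talk.

```python
import random


def noisy_experiment(seed, n_samples=5):
    """Return pseudo-random 'measurements' that depend only on the seed."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.random() for _ in range(n_samples)]


if __name__ == "__main__":
    first = noisy_experiment(seed=264108952)
    second = noisy_experiment(seed=264108952)
    # Same seed, same numbers: the run can be reproduced from its stored config.
    print(first == second)
```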

# How to Install Sacred

Install it from the Python Package Index (version 0.6.10):

    pip install sacred

Install manually:
    
    git clone https://github.com/IDSIA/sacred.git
    cd sacred
    python setup.py install

Recommended:

    pip install numpy pymongo pandas

# Getting Started

In [None]:
from sacred import Experiment  # central class of the Sacred framework
from sacred.observers import MongoObserver  # MongoObserver adds to db


ex = Experiment('example_experiment')

ex.observers.append(MongoObserver.create(
        url='127.0.0.1:27017',
        db_name='sacred'
    )
)

@ex.config
def my_config():
    recipient = "world"
    message = "Hello %s!" % recipient

@ex.automain
def my_main(message):
    print(message)

## Getting Started: Output
<img style="float: center;" src="img/level1.png">

# Observers

Experiments in Sacred collect lots of information about their runs:

* time it was started and time it stopped
* stdout/stderr
* the used configuration
* the result or any errors that occurred
* basic information about the machine it runs on
* packages the experiment depends on and their versions
* all imported local source-files
* files opened with `ex.open_resource`
* files added with `ex.add_artifact`
* custom info

# Looking at the data
Let's use a combination of jupyter notebook, pymongo, and pandas!

In [1]:
import pymongo
import pandas as pd

In [2]:
connection = pymongo.MongoClient('127.0.0.1', 27017)
connection.database_names()

[u'local', u'sacred']

In [3]:
db = connection['sacred']
db.collection_names()

[u'default.runs', u'default.files', u'default.chunks']

In [4]:
collection = db['default.runs']  # the runs are stored in the 'default.runs' collection
cursor = collection.find()
# Expand the cursor and construct the DataFrame
df = pd.DataFrame(list(cursor))

In [5]:
df.sort_values('heartbeat')

| | _id | artifacts | captured_out | comment | config | experiment | heartbeat | host | info | resources | result | start_time | status | stop_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 57d7473bc5ebfc2e373559fd | [] | `Hello world!\n` | | `{u'message': u'Hello world!', u'seed': 9448817...` | `{u'doc': None, u'dependencies': [[u'numpy', u'...` | 2016-09-12 17:24:27.588 | `{u'hostname': u'sake', u'python_version': u'2....` | {} | [] | | 2016-09-12 17:24:27.523 | COMPLETED | 2016-09-12 17:24:27.589 |
| 1 | 57d74873c5ebfc2f8af90772 | [] | `Hello world!\n` | | `{u'message': u'Hello world!', u'seed': 3129688...` | `{u'doc': None, u'dependencies': [[u'numpy', u'...` | 2016-09-12 17:29:39.543 | `{u'hostname': u'sake', u'python_version': u'2....` | {} | [] | | 2016-09-12 17:29:39.541 | COMPLETED | 2016-09-12 17:29:39.544 |
| 2 | 57d75045c5ebfc32c2c7c3f2 | [] | `Hello world!\n` | | `{u'message': u'Hello world!', u'seed': 1079323...` | `{u'doc': None, u'dependencies': [[u'numpy', u'...` | 2016-09-12 18:03:01.083 | `{u'hostname': u'sake', u'python_version': u'2....` | {} | [] | | 2016-09-12 18:03:01.077 | COMPLETED | 2016-09-12 18:03:01.084 |
| 3 | 57d750f7c5ebfc32e5a1da06 | [] | `Hello what's up?!\n` | | `{u'message': u'Hello what's up?!', u'seed': 59...` | `{u'doc': None, u'dependencies': [[u'numpy', u'...` | 2016-09-12 18:05:59.009 | `{u'hostname': u'sake', u'python_version': u'2....` | {} | [] | | 2016-09-12 18:05:59.006 | COMPLETED | 2016-09-12 18:05:59.009 |


In [None]:
df.iloc[15]['config']

{
    u'batch_size': 32,
    u'embedding_dims': 50,
    u'filter_length': 3,
    u'hidden_dims': 250,
    u'max_features': 5000,
    u'maxlen': 400,
    u'nb_epoch': 2,
    u'nb_filter': 250,
    u'seed': 264108952
}
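
Because each run stores its config as a nested dictionary like the one above, comparing hyperparameters across runs is easier once the config keys become columns. The following is a minimal standard-library sketch; the function name, the `config.` prefix, and the sample `_id` values are hypothetical. The resulting list of flat dictionaries can be passed to `pd.DataFrame`.

```python
def flatten_config(runs, prefix="config."):
    """Turn run documents into flat rows with one column per config key."""
    rows = []
    for run in runs:
        row = {"_id": run.get("_id"), "status": run.get("status")}
        for key, value in run.get("config", {}).items():
            row[prefix + key] = value
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical run documents, shaped like the records queried above.
    sample_runs = [
        {"_id": "run-a", "status": "COMPLETED",
         "config": {"batch_size": 32, "nb_epoch": 2, "seed": 264108952}},
        {"_id": "run-b", "status": "COMPLETED",
         "config": {"batch_size": 64, "nb_epoch": 2}},
    ]
    for row in flatten_config(sample_runs):
        print(row)
```

Runs that lack a given key simply have no entry for it, which pandas fills with `NaN` when building the DataFrame.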

In [None]:
df.iloc[15]['experiment']

{
    u'dependencies': [
        [u'keras', u'1.0.4'],
        [u'numpy', u'1.11.1'],
        [u'sacred', u'0.6.10']
    ],
    u'doc': None,
    u'name': u'hello_config',
    u'sources': [
        [
            u'/Users/sbraden/workspace/scarecrow/keras_examples/modified/custom_sacred_observer.py',
            u'e683e2a72e5a800aeb9b7d183f136b05'
        ],
        [
            u'/Users/sbraden/workspace/scarecrow/keras_examples/modified/imdb_cnn_sacred.py',
            u'5fa465229ae6d898ca661155ef58861a'
        ]
    ]
}

In [None]:
df.iloc[15]['host']

{
    u'cpu': u'Intel(R) Core(TM) i7-5650U CPU @ 2.20GHz',
    u'cpu_count': 4,
    u'hostname': u'sangria.local',
    u'os': u'Darwin',
    u'os_info': u'Darwin-15.6.0-x86_64-i386-64bit',
    u'python_compiler': u'GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)',
    u'python_version': u'2.7.10'
}


# Capturing stdout / stderr

By default Sacred captures everything that is written to sys.stdout and sys.stderr and transmits that information to the observers.

### But what about the progress bars?

<img style="float: center;" src="img/progress_bar.png">

```
u'Loading data...\n20000 train sequences\n5000 test sequences\nPad sequences (samples x time)\nX_train shape: (20000, 400)\nX_test shape: (5000, 400)\nBuild model...\nTrain on 20000 samples, validate on 5000 samples\nEpoch 1/2\n\r   32/20000 [..............................] - ETA: 82s - loss: 0.6894 - acc: 0.6250\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r   64/20000 [..............................] - ETA: 79s - loss: 0.6894 - acc: 0.5938\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r   96/20000 [..............................] - ETA: 77s - loss: 0.6995 - acc: 0.5208\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  128/20000 [..............................] - ETA: 77s - loss: 0.6912 - acc: 0.5625\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  160/20000 [..............................] 
- ETA: 76s - loss: 0.6929 - acc: 0.5500\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  192/20000 [..............................] - ETA: 75s - loss: 0.6927 - acc: 0.5469\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  224/20000 [..............................] - ETA: 75s - loss: 0.6984 - acc: 0.5089\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  256/20000 [..............................] - ETA: 74s - loss: 0.6998 - acc: 0.4961\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08
```

<img style="float: center;" src="img/bad.jpg">

## Fix for the Progress Bar Issue

In [None]:
from sacred import Experiment  # central class of the Sacred framework
from sacred.observers import MongoObserver  # MongoObserver adds to db
from sacred.utils import apply_backspaces_and_linefeeds

ex = Experiment('example_experiment')

ex.captured_out_filter = apply_backspaces_and_linefeeds

ex.observers.append(MongoObserver.create(
        url='127.0.0.1:27017',
        db_name='sacred'
    )
)

@ex.config
def my_config():
    recipient = "world"
    message = "Hello %s!" % recipient

@ex.automain
def my_main(message):
    print(message)

captured_out now looks like:

u'Loading data...\n20000 train sequences\n5000 test sequences\nPad sequences (samples x time)\nX_train shape: (20000, 400)\nX_test shape: (5000, 400)\nBuild model...\nTrain on 20000 samples, validate on 5000 samples\nEpoch 1/2\n20000/20000 [==============================] - 82s - loss: 0.4559 - acc: 0.7700 - val_loss: 0.3096 - val_acc: 0.8732\nEpoch 2/2\n20000/20000 [==============================] - 79s - loss: 0.2893 - acc: 0.8799 - val_loss: 0.2843 - val_acc: 0.8848\n'
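
To show what a filter like `apply_backspaces_and_linefeeds` has to do, here is a minimal pure-Python sketch that replays carriage returns and backspaces the way a terminal would, keeping only what is finally visible on each line. It is an illustration of the idea, not Sacred's implementation, and the function name is hypothetical.

```python
def collapse_terminal_output(text):
    """Simulate how a terminal renders backspaces and carriage returns."""
    finished_lines = []
    line = []    # characters of the line currently being written
    cursor = 0   # cursor position within that line
    for ch in text:
        if ch == "\n":
            finished_lines.append("".join(line))
            line, cursor = [], 0
        elif ch == "\r":
            cursor = 0                   # jump back to the start of the line
        elif ch == "\b":
            cursor = max(cursor - 1, 0)  # move left without erasing
        else:
            if cursor < len(line):
                line[cursor] = ch        # overwrite an existing character
            else:
                line.append(ch)
            cursor += 1
    finished_lines.append("".join(line))
    return "\n".join(finished_lines)


if __name__ == "__main__":
    raw = ("Epoch 1/2\n"
           " 32/100 - loss: 0.69" + "\b" * 19 + "\r"
           "100/100 - loss: 0.45\n")
    print(repr(collapse_terminal_output(raw)))
```

Any function that maps the captured string to a cleaned string could be assigned to `ex.captured_out_filter` in the same way as the built-in one.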

# Customization
Sometimes you want to add custom information about the run of an experiment, like the dataset, error curves during training, or the final trained model.

Three different mechanisms:
* Info Dictionary
* Resource File
* Artifact File

## Info Dictionary

Use: store small amounts of information about the experiment, like training loss for each epoch.
 
* updated on each heartbeat
* content is already accessible in the database while the experiment is running
* easy to add

In [None]:
ex.info['name_of_dictionary'] = results.history  # dictionary with history of neural network run.

In [None]:
# When looking at the database in jupyter notebook:
df.iloc[15]['info']

In [None]:
{
    u'history': {
        u'acc': [0.77, 0.8799],
        u'loss': [0.4558811149120331, 0.28932980523109436],
        u'val_acc': [0.8732, 0.8848],
        u'val_loss': [0.3095988413333893, 0.28427145659923553]
    }
}

## Resource Files

Use: A file that your experiment needs to read during a run. 

When you open a file using `ex.open_resource(filename)`, a `resource_event` is fired and the `MongoObserver` checks whether that file is already in the database. If not, it stores it there. The filename is logged along with its MD5 hash.

## Artifact Files

Use: A file created during the run. Used for storing big custom chunks of data like a trained model. 

With `ex.add_artifact(filename)` such a file can be added, which fires an `artifact_event`. The `MongoObserver` will store that file in the database and log it in the run entry.

# Caveats

By default, Sacred experiments will fail if run in an interactive environment like a REPL or a Jupyter Notebook.

Only variables that are JSON-serializable (i.e. numbers, strings, lists, tuples, dictionaries) become part of the configuration. Other variables are ignored.

For running from the command line to work, the automain function needs to be at the end of the file. Otherwise, everything below it is not yet defined when the experiment runs.

You can only store information in info that is JSON-serializable and contains only valid Python identifiers as keys in dictionaries. Otherwise the observer might not be able to store it in the database and crash. If the info dict contains numpy arrays or pandas Series/DataFrame/Panel, these will be converted to JSON automatically. The result is human-readable (nested lists for numpy and a dict for pandas). Note that this process loses information about the precise datatypes (e.g. uint8 will be just int afterwards).
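
One way to avoid such a crash is to check the dictionary before assigning it to `ex.info`. The following is a minimal sketch of such a check using only the standard library, written for Python 3 (it relies on `str.isidentifier`). The function name is hypothetical, and it does not model Sacred's automatic numpy/pandas conversion, so those objects would be reported as problems here.

```python
import json


def find_info_problems(info, path="info"):
    """Return a list of human-readable problems found in an info structure."""
    problems = []
    if isinstance(info, dict):
        for key, value in info.items():
            if not (isinstance(key, str) and key.isidentifier()):
                problems.append(
                    "%s: key %r is not a valid identifier" % (path, key))
            problems.extend(
                find_info_problems(value, "%s[%r]" % (path, key)))
    elif isinstance(info, (list, tuple)):
        for i, item in enumerate(info):
            problems.extend(find_info_problems(item, "%s[%d]" % (path, i)))
    else:
        try:
            json.dumps(info)  # raises TypeError for unsupported types
        except TypeError:
            problems.append(
                "%s: %s is not JSON-serializable" % (path, type(info).__name__))
    return problems


if __name__ == "__main__":
    good = {"history": {"acc": [0.77, 0.8799]}}
    bad = {"val-loss": [0.31], "labels": {"pos", "neg"}}
    print(find_info_problems(good))
    for problem in find_info_problems(bad):
        print(problem)
```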

## Which manager did I choose in the end?

After testing out both libraries I chose Sacred over Sumatra.

# Pros and Cons

Sumatra Pros:
* great if you like CLIs
* smtweb is pretty awesome
* forces you to commit changes to your repo before running
* the repeat command (does Sacred have an equivalent?)
* LaTeX package for academics: see https://pythonhosted.org/Sumatra/publishing.html

Sumatra Cons:
* if you don't specify different directories for output files from each run, Sumatra will eventually crash
* extra work to set up for teams
* makes files in the git repo (just .gitignore them)

# Pros and Cons

Sacred Pros:
* faster to get started
* lightweight compared to Sumatra
* easy to set up for teams: just point at the right MongoDB server

Sacred Cons:
* no web server, but there is a github project for one: https://github.com/Qwlouse/prophet
* doesn't care about your repository version

### Final Slide

* Help develop these open source projects!
* Questions? Twitter: @ifmoonwascookie

### Further Reading
Davison A.P. (2012) Automated capture of experiment context for easier reproducibility in computational research. Computing in Science and Engineering 14: 48-56.

I made this presentation using RISE:

https://github.com/damianavila/RISE