# Lab Notebook: MNIST with PyTorch

This notebook demonstrates how to use `labnotebook` to track an MNIST experiment in PyTorch.

In [1]:
import labnotebook

import numpy as np

import torch.utils.data
from torchvision import datasets, transforms
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import SGD
from torch.autograd import Variable

## Data Setup

First, let's make our train and test `DataLoader` objects using the built-in MNIST dataset. We're going to keep *all* our parameters in a dictionary, `model_desc`, so we can easily pass it to `labnotebook`.

In [2]:
model_desc = {'batch_size': 128,  #  train and test batch size
              'n_filters1': 32,   #  number of filters in the first conv layer
              'n_filters2': 32,   #  number of filters in the second conv layer
              'n_fc': 32,         #  size of the fully connected layer
              'dropout': False,   #  whether to use dropout or not
              'n_epochs': 5,      #  number of epochs to train for
              'lr': 0.001}        #  learning rate

In [3]:
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor()),
    batch_size=model_desc['batch_size'],
    shuffle=True)


test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=False, download=True, transform=transforms.ToTensor()),
    batch_size=model_desc['batch_size'],
    shuffle=True)

## Model Setup

We specify a traditional convnet: two conv layers, each followed by max pooling, then a fully connected layer with optional dropout. As above, we keep our architecture parameters in the `model_desc` dictionary:

In [4]:
class MnistModel(nn.Module):
    def __init__(self,
                n_filters1=32,
                n_filters2=32,
                n_fc=128,
                dropout=False):
    
        super(MnistModel, self).__init__()

        self.n_filters1 = n_filters1
        self.n_filters2 = n_filters2
        self.n_fc = n_fc
        self.dropout = dropout

        self.conv1 = nn.Conv2d(1, self.n_filters1, 5, padding=2)
        self.conv2 = nn.Conv2d(self.n_filters1, self.n_filters2, 5, padding=2)
        self.fc1 = nn.Linear(self.n_filters2*7*7, self.n_fc)
        self.fc2 = nn.Linear(self.n_fc, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.n_filters2*7*7)   # flatten to (batch, n_filters2*7*7)
        x = F.relu(self.fc1(x))
        if self.dropout:
            x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)   # log-probabilities over the 10 classes

In [5]:
model = MnistModel(
    n_filters1=model_desc['n_filters1'],
    n_filters2=model_desc['n_filters2'],
    n_fc=model_desc['n_fc'],
    dropout=model_desc['dropout'])
model = model.cuda()  # move the model to the GPU; the training loop below assumes CUDA
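
The `7*7` in `fc1` comes from the layer arithmetic: MNIST images are 28×28, each 5×5 convolution with `padding=2` preserves the spatial size, and each 2×2 max pool halves it. Here is a small sketch of that calculation using the standard output-size formula. The helper names `conv_out` and `pool_out` are illustrative only and are not part of PyTorch or labnotebook:

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial output size of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1


size = 28                                                  # MNIST images are 28x28
size = pool_out(conv_out(size, kernel=5, padding=2), 2)    # conv keeps 28, pool gives 14
size = pool_out(conv_out(size, kernel=5, padding=2), 2)    # conv keeps 14, pool gives 7

n_filters2 = 32                                            # value used in model_desc
flat_features = n_filters2 * size * size                   # input size of fc1

print(size, flat_features)  # 7 1568
```

If you change the kernel size, padding, or number of pooling layers, the `7*7` factor in `fc1` and in `x.view` has to change accordingly.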

## LabNotebook Setup

Just like in `basic_usage.ipynb`, the first step is to initialize the package by giving it the URL of the database you want to use.

It will create three tables: `experiments`, `steps`, and `model_params`.

- `experiments` is used to store a list of experiments, along with their hyperparameters and final results.

- `steps` is used to store the intermediary results for each step of each experiment. This is what you would want to plot if you're monitoring your experiments.

- Finally, `model_params` is used to store your model parameters, such as the weights of your neural network, so you can reload them later for inference. These can get pretty big, so it's recommended not to save all the parameters at every step.

In [6]:
db_url = 'postgresql://postgres:1418@localhost/experiments'  # SQLAlchemy 1.4+ rejects the 'postgres://' scheme

In [7]:
experiments, steps, model_params = labnotebook.initialize(db_url)
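
The database URL packs several settings into one string, in the form `dialect://user:password@host/database`. As a quick illustration, the standard library can split an example URL of the same shape into its parts. The scheme names the database dialect; recent SQLAlchemy versions expect `postgresql`, while older ones also accept `postgres`. The variable `example_url` is illustrative only:

```python
from urllib.parse import urlsplit

# Same form as db_url above: dialect://user:password@host/database
example_url = 'postgresql://postgres:1418@localhost/experiments'
parts = urlsplit(example_url)

print(parts.scheme)            # database dialect
print(parts.username)          # database user
print(parts.password)          # that user's password
print(parts.hostname)          # server host
print(parts.path.lstrip('/'))  # database name
```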

## Training our ConvNet

We'll use a normal training loop, but instead of printing results out, we'll just log them with `step_experiment`.

Only three extra calls are needed to permanently record this experiment in your database and plot it in the web app: `start_experiment`, `step_experiment` and `end_experiment`.

In [8]:
optimizer = SGD(model.parameters(), lr=model_desc['lr'])

In [10]:
# Start the experiment and keep the returned handle in `experiment`,
# which we then pass to step_experiment and end_experiment.
experiment = labnotebook.start_experiment(model_desc=model_desc)
timestep = 0

for epoch in range(model_desc['n_epochs']):
    print('epoch ', epoch)

    model.train()

    for data, target in train_loader:
        data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        prediction = output.max(1)[1]
        accuracy = prediction.eq(target).float().mean().item() * 100

        # every 100 steps, measure accuracy over the whole test set
        if timestep % 100 == 0:
            model.eval()
            n_correct, n_total = 0, 0
            with torch.no_grad():  # no gradients needed for evaluation
                for test_data, test_target in test_loader:
                    test_data, test_target = test_data.cuda(), test_target.cuda()
                    test_prediction = model(test_data).max(1)[1]
                    n_correct += test_prediction.eq(test_target).sum().item()
                    n_total += test_target.size(0)
            val_accuracy = n_correct / n_total * 100
            model.train()  # switch back to training mode

        # between evaluations, val_accuracy holds the most recent measurement
        labnotebook.step_experiment(experiment, timestep,
                                    trainloss=loss.item(),
                                    trainacc=accuracy,
                                    valacc=val_accuracy,
                                    epoch=epoch,
                                    custom_fields={'whatever': val_accuracy})

        timestep += 1

# record the last training-batch loss and accuracy, and the last validation accuracy
labnotebook.end_experiment(experiment,
                           final_trainloss=loss.item(),
                           final_trainacc=accuracy,
                           final_valacc=val_accuracy)
    

epoch  0
epoch  1
epoch  2
epoch  3
epoch  4
Run 64 on GPU 0 at 2018-03-21 16:19:25.042932
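
As a rough check on how much the loop above logs, the step and evaluation counts follow from the dataset size and `model_desc`. This sketch assumes the standard 60,000-image MNIST training split and the `DataLoader` default of keeping the last partial batch. The variable names are illustrative only:

```python
import math

n_train = 60000      # size of the standard MNIST training split
batch_size = 128     # model_desc['batch_size']
n_epochs = 5         # model_desc['n_epochs']
eval_every = 100     # the loop evaluates when timestep % 100 == 0

steps_per_epoch = math.ceil(n_train / batch_size)     # the last batch is partial
total_steps = steps_per_epoch * n_epochs              # one step_experiment call per step
n_evaluations = math.ceil(total_steps / eval_every)   # timesteps 0, 100, 200, ...

print(steps_per_epoch, total_steps, n_evaluations)  # 469 2345 24
```

So this run makes 2,345 `step_experiment` calls but evaluates on the test set only 24 times.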

## Accessing our experiments
### Through the web app

Two steps are needed:
- Launch the backend Flask API by running the following from the command line:
```
start_backend <database_url>
```
- Double-click `labnotebook/frontend/index.html` to open it in your browser.

You should see something like this after selecting experiments from the left menu:

![](./img/labnotebook.png)

You can change what you see, turn live updating on or off, and so on from the `options` menu.

### Through SQLAlchemy ORM commands

You essentially have access to *all* your data in a relational database, and can query it in sophisticated ways that are beyond the scope of this notebook. 

I recommend looking through [SQLAlchemy's documentation](http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#querying), but here are some simple example queries.


In [6]:
# list all experiments and print some of their properties:

experiment_list = labnotebook.session.query(experiments).all()

for experiment in experiment_list: 
    print("run id: ", experiment.run_id, end='\t')
    print("model_desc: ", experiment.model_desc)

run id:  1	model_desc:  {'sigma': 0.1}
run id:  2	model_desc:  {'sigma': 0.1}
run id:  3	model_desc:  {'sigma': 0.1}
run id:  4	model_desc:  {'sigma': 0.1}
run id:  5	model_desc:  {'sigma': 0.1}
run id:  6	model_desc:  {'sigma': 0.1}


In [7]:
# list all steps of experiment #4 and print train accuracies:

step_list = labnotebook.session.query(steps).filter(steps.run_id == 4).all()

for step in step_list:
    print("training accuracy: ", step.trainacc)

training accuracy:  0.0881643584718841
training accuracy:  0.114199389001045
training accuracy:  -0.20606842946817
training accuracy:  0.00735350586226073
training accuracy:  -0.149939680518542
training accuracy:  -0.146279798077668
training accuracy:  0.0126899091536999
training accuracy:  -0.0281140716874202
training accuracy:  0.0586363642532856
training accuracy:  0.00123095750017543


In [8]:
# list all experiments where sigma is greater than 0:
import sqlalchemy

experiment_list_highsigma = labnotebook.session.query(
    experiments).filter(experiments.model_desc['sigma'].astext.cast(sqlalchemy.types.Float) > 0).all()

for experiment in experiment_list_highsigma:
    print(experiment)

Run 1 on GPU 0 at 2018-03-18 14:11:50.702179
Run 2 on GPU 0 at 2018-03-18 14:11:50.702179
Run 3 on GPU 0 at 2018-03-18 14:11:50.702179
Run 4 on GPU 0 at 2018-03-18 14:11:50.702179
Run 5 on GPU 0 at 2018-03-18 14:52:56.322151
Run 6 on GPU 0 at 2018-03-18 14:53:17.418460


Note that this last example filters on an item inside the `model_desc` dictionary. Such items come back as text (that is what `.astext` does), so we have to cast them to `Float` before running numeric comparisons.
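
To see what that filter asks PostgreSQL to do, the expression can be compiled to SQL without connecting to a database. This is a sketch under assumptions: `experiments_table` is a hypothetical stand-in with a JSON `model_desc` column, and the table that labnotebook actually creates may be defined differently.

```python
import sqlalchemy
from sqlalchemy.dialects import postgresql

metadata = sqlalchemy.MetaData()

# Hypothetical stand-in for the experiments table (assumption, not labnotebook's schema)
experiments_table = sqlalchemy.Table(
    'experiments', metadata,
    sqlalchemy.Column('run_id', sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column('model_desc', postgresql.JSON),
)

# Same shape as the filter above: extract 'sigma' as text, cast to float, compare
condition = (
    experiments_table.c.model_desc['sigma']
    .astext.cast(sqlalchemy.types.Float) > 0
)

# Render the condition in the PostgreSQL dialect; no database connection is needed
sql = str(condition.compile(dialect=postgresql.dialect()))
print(sql)
```

The printed condition should contain the `->>` operator, which extracts the `sigma` value as text, wrapped in a `CAST` to a float type.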