# Classifying handwritten digits

**What are the steps to build a network in PyTorch?**

* Data pipeline - DataLoader, Transforms, Working with images
* "Babysitting the learning process" - Batch size, validation set, first-layer visualizations
* First approach to working with images - Fully-connected network for the [MNIST data set][mnist]

"Build a neural network in your browser" - [deeplearnjs.org/demos/model-builder][model-builder]

<a href="https://deeplearnjs.org/demos/model-builder/">
    <img src="images/tfjs-mnist.png" width="400px" />
</a>

[mnist]:http://yann.lecun.com/exdb/mnist/index.html
[model-builder]:https://deeplearnjs.org/demos/model-builder/

## Import libraries

In [None]:
# Load libraries
import torch
print("Torch version:", torch.__version__)

import torchvision
print("Torchvision version:", torchvision.__version__)

import numpy as np
print("Numpy version:", np.__version__)

import matplotlib
print("Matplotlib version:", matplotlib.__version__)

import PIL
print("PIL version:", PIL.__version__)

import IPython
print("IPython version:", IPython.__version__)

In [None]:
# Setup Matplotlib
%matplotlib inline
#%config InlineBackend.figure_format = 'retina' # If you have a retina screen
import matplotlib.pyplot as plt

## Data loading and preprocessing

Why use data loaders? Source: [Data Loading and Processing Tutorial][dataloader-tutorial] by Sasank Chilamkurthy

> A lot of effort in solving any machine learning problem goes in to preparing the data. PyTorch provides many tools to make data loading easy and hopefully, to make your code more readable.

Why is it important?

* **Data augmentation** - improve training and generalization
* **Handle large data sets** - load samples on demand to avoid memory issues, and speed up loading, e.g. by prefetching batches

[dataloader-tutorial]:https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
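
To make both points concrete, here is a minimal sketch of a custom `torch.utils.data.Dataset`. It is an illustration under assumptions, not part of the notebook: `LazyDigits` and `random_shift` are hypothetical names, and the random images stand in for files that a real dataset would read from disk inside `__getitem__`. Because a sample only exists once it is requested, the whole data set never has to fit in memory, and the augmentation runs freshly every time a sample is loaded.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class LazyDigits(Dataset):
    """Hypothetical dataset: each sample is produced only when it is requested."""

    def __init__(self, n_samples, transform=None):
        self.n_samples = n_samples
        self.transform = transform

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # Stand-in for reading image `idx` from disk: a reproducible fake 1x28x28 image
        g = torch.Generator().manual_seed(idx)
        image = torch.rand(1, 28, 28, generator=g)
        label = idx % 10
        if self.transform is not None:
            image = self.transform(image)  # Augmentation is applied per sample, at load time
        return image, label

def random_shift(image):
    """Toy augmentation: circular shift by up to one pixel vertically and horizontally."""
    dy, dx = torch.randint(-1, 2, (2,)).tolist()
    return torch.roll(image, shifts=(dy, dx), dims=(1, 2))

dataset = LazyDigits(1000, transform=random_shift)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
print(labels.shape)  # torch.Size([64])
print(len(loader))   # 16 batches: 15 full ones and a last one with 40 samples
```

The `DataLoader` takes care of batching and shuffling; setting `num_workers` would additionally load batches in background processes.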

In [None]:
from torchvision import transforms
from torch.utils.data import DataLoader

# Data set
preprocessing = transforms.Compose([
    transforms.ToTensor(), # To PyTorch tensors
    transforms.Normalize((0.1307,), (0.3081,))
])
train_set = torchvision.datasets.MNIST(root='data', train=True, download=True, transform=preprocessing)
valid_set = torchvision.datasets.MNIST(root='data', train=False, download=True, transform=preprocessing)

# Data loader for the "train" set
# - Set num_workers to load batches in parallel worker processes
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

# Data loader for the "validation" step
valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(
    # The MNIST test split (10,000 images) serves as validation set: take a random subset of 200 images for speed
    np.random.choice(np.arange(len(valid_set)), size=200, replace=False)
)
valid_loader = DataLoader(valid_set, batch_size=64, shuffle=False, sampler=valid_sampler)

# Get first batch of data from the "train" set
train_iter = iter(train_loader)
images, labels = next(train_iter)

print('Shape of image Tensor:', images.shape)
print('Labels:', labels)

# Plot it
grid = torchvision.utils.make_grid(images, normalize=True)
plt.imshow(grid.numpy().transpose((1, 2, 0)))
plt.show()
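
The `preprocessing` pipeline above first scales the 8-bit pixels to [0, 1] (`ToTensor`) and then standardizes them with `Normalize((0.1307,), (0.3081,))`, i.e. it subtracts 0.1307 and divides by 0.3081, the mean and standard deviation of the MNIST training pixels on the [0, 1] scale. The sketch below reproduces that arithmetic by hand on a tiny fake image, using plain tensor operations instead of the torchvision transforms.

```python
import torch

MEAN, STD = 0.1307, 0.3081  # Values passed to transforms.Normalize above

# A fake 1x2x2 grayscale image with 8-bit pixel values (channels first)
pixels = torch.tensor([[[0, 255], [128, 33]]], dtype=torch.uint8)

# Step 1 (what ToTensor does for uint8 images): convert to float and scale to [0, 1]
x = pixels.float() / 255

# Step 2 (what Normalize does, per channel): subtract the mean, divide by the std
x_norm = (x - MEAN) / STD
print(x_norm)
# black pixel 0   -> (0 - 0.1307) / 0.3081 ≈ -0.424
# white pixel 255 -> (1 - 0.1307) / 0.3081 ≈  2.821
```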

## "Babysitting the learning process"

How to monitor the learning process?

* **Monitor loss value** - Oscillations, how it decreases
* **Validation set** - Overfitting, model complexity

**First-layer visualizations** - Source [CS231n course][cs231-baby]

<img src="images/cs231n-layer1vis.png" width="600px" />

[cs231-baby]:http://cs231n.github.io/neural-networks-3/#baby
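
A quick check before any training, in the spirit of the CS231n notes linked above: with 10 classes, a model that predicts all classes with equal probability has a cross-entropy loss of ln(10) ≈ 2.303, so an untrained network should start close to that value. The sketch below verifies this with zero logits and with an untrained linear layer; the random inputs and labels are assumptions standing in for a real batch.

```python
import math
import torch

torch.manual_seed(0)
criterion = torch.nn.CrossEntropyLoss()
labels = torch.randint(0, 10, (64,))  # Random stand-in labels for one batch

# Uniform prediction: all 10 logits equal, so every class gets probability 1/10
uniform_logits = torch.zeros(64, 10)
loss_uniform = criterion(uniform_logits, labels).item()
print(loss_uniform, math.log(10))  # Both ≈ 2.3026

# An untrained linear model on random inputs should start near this value
model = torch.nn.Linear(28*28, 10)
inputs = torch.rand(64, 28*28)
with torch.no_grad():
    loss_untrained = criterion(model(inputs), labels).item()
print(loss_untrained)
```

A starting loss far above ln(10) usually points to badly scaled inputs or weights.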

PyTorch implementation

In [None]:
# Arbitrary two-layer model: 784 inputs -> 64 hidden units -> 10 outputs
model = torch.nn.Sequential(
    torch.nn.Linear(in_features=28*28, out_features=64),
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=64, out_features=10)
)

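# Get the weight matrix of the first layer, shape (out_features, in_features) = (64, 784)
weights = model[0].weight.data

# Visualize the incoming weights of (up to) 16 hidden units as 28x28 images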
def plot_layer1(weights, axis):
    # Shape of weights matrix
    n_out, n_in = weights.shape
    assert n_in == 784 # Should be the first layer

    # Create a grid
    n_cells = min(16, n_out)
    grid = torchvision.utils.make_grid(
        weights[:n_cells].view(n_cells, 1, 28, 28),
        nrow=4, normalize=True
    )
    
    # Plot it
    axis.imshow(grid.numpy().transpose((1, 2, 0)), aspect='auto')
    
fig = plt.figure(figsize=(3, 3))
plot_layer1(weights, fig.gca())
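
Why do the rows of the first weight matrix look like images? Each hidden unit computes a dot product between its 784 weights and the 784 pixels, plus a bias, so its weight row acts as a 28x28 template that is matched against the input. The sketch below checks this on a freshly created layer with the same shape as `model[0]`; the random input is an assumption standing in for a flattened digit.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(28*28, 64)  # Same shape as model[0] above
x = torch.rand(1, 28*28)            # Random stand-in for one flattened image

with torch.no_grad():
    h = layer(x)                                   # Pre-activations of all 64 units
    template = layer.weight[0].view(28, 28)        # Unit 0's weights viewed as an image
    score = (template * x.view(28, 28)).sum() + layer.bias[0]

print(h[0, 0].item(), score.item())  # Same value, up to rounding
```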

## Build the digit classifier

**Training with small batches of data**

```python
# Create the model
...

t = 0 # Number of samples seen
print_step = 200 # Refresh rate

for epoch in range(1, 10**5):
    # Train by small batches of data
    for batch, (batch_X, batch_y) in enumerate(train_loader, 1):
        # Forward pass
        ...

        # Backpropagation
        ...

        if t%print_step == 0:
            # Visualize what the network learned
            ...
            
        # Update t
        t += train_loader.batch_size
```
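
How many batches make up one epoch? With 60,000 training images and `batch_size=64`, the loader yields 938 batches, and the last one only holds 32 images. This also means that `t += train_loader.batch_size` slightly overcounts: after one epoch, `t` is 938 × 64 = 60,032 rather than 60,000. The sketch below checks these numbers with a stand-in data set of 60,000 integers instead of the real images.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the 60,000 MNIST training images: only the count matters here
dataset = TensorDataset(torch.arange(60000))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_sizes = [len(batch_X) for (batch_X,) in loader]
print(len(loader))        # 938 batches per epoch
print(batch_sizes[-1])    # 32: the last batch is smaller
print(sum(batch_sizes))   # 60000 samples per epoch
```

Counting with `len(batch_X)` would give the exact number of samples seen, if that matters for the plots.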

PyTorch implementation

In [None]:
from collections import defaultdict

# Create model
model = torch.nn.Sequential(
    torch.nn.Linear(in_features=28*28, out_features=10),
)

# Criterion and optimizer for "training"
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1) # weight_decay=1 is a strong L2 penalty on the weights

# Forward step
# (Variable is deprecated since PyTorch 0.4: tensors track gradients directly)
def forward(X):
    X_reshaped = X.view(-1, 28*28) # Flatten each 1x28x28 image into a 784-vector
    return model(X_reshaped)

# Loss step
def compute_loss(output, target):
    return criterion(output, target) # target: LongTensor of class indices

def backpropagation(output, target):
    optimizer.zero_grad() # Clear the gradients
    loss = compute_loss(output, target) # Compute loss
    loss.backward() # Backpropagation
    optimizer.step() # Let the optimizer adjust our model
    return loss.item() # Python float, detached from the graph

# Helper function
def get_accuracy(output, y):
    predictions = torch.argmax(output, dim=1) # Index of the max activation
    is_correct = (predictions == y).float()
    return is_correct.mean().item()
    
# Create a figure to visualize the results
fig, (ax1, ax2, ax3) = plt.subplots(nrows=1, ncols=3, figsize=(12, 3))
    
try:
    # Collect loss / accuracy values
    stats = defaultdict(list)
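    t = 0 # Number of samples seen, counted in steps of batch_size
    print_step = 200 # Refresh the plots whenever t is a multiple of print_step
    
    # "Infinite" loop: stop training with Kernel > Interrupt
    for epoch in range(1, 10**5):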
        # Train by small batches of data
        for batch, (batch_X, batch_y) in enumerate(train_loader, 1):
            # Forward pass & backpropagation
            output = forward(batch_X)
            loss = backpropagation(output, batch_y)
            
            # Log "train" stats
            stats['train_loss'].append(loss)
            stats['train_acc'].append(get_accuracy(output, batch_y))
            stats['train_t'].append(t)

            if t%print_step == 0:
                # Log "validation" stats (no gradients needed)
                loss_vals, acc_vals = [], []
                with torch.no_grad():
                    for X, y in valid_loader:
                        output = forward(X)
                        loss_vals.append(compute_loss(output, y).item())
                        acc_vals.append(get_accuracy(output, y))
                    
                stats['val_loss'].append(np.mean(loss_vals))
                stats['val_acc'].append(np.mean(acc_vals))
                stats['val_t'].append(t)
                
                # Plot what the network learned
                ax1.cla()
                ax1.set_title('Epoch {}, batch {:,}'.format(epoch, batch))
                plot_layer1(model[0].weight.data, ax1)
                ax2.cla()
                ax2.set_title('Loss, val: {:.3f}'.format(np.mean(stats['val_loss'][-10:])))
                ax2.plot(stats['train_t'], stats['train_loss'], label='train')
                ax2.plot(stats['val_t'], stats['val_loss'], label='valid')
                ax2.legend()
                ax3.cla()
                ax3.set_title('Accuracy, val: {:.3f}'.format(np.mean(stats['val_acc'][-10:])))
                ax3.plot(stats['train_t'], stats['train_acc'], label='train')
                ax3.plot(stats['val_t'], stats['val_acc'], label='valid')
                ax3.set_ylim(0, 1)
                ax3.legend()

                # Jupyter trick: replace the previous output with the updated figure
                IPython.display.clear_output(wait=True)
                IPython.display.display(fig)
                
            # Update t
            t += train_loader.batch_size

except KeyboardInterrupt:
    # Clear output
    IPython.display.clear_output()
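
The optimizer above uses `weight_decay=1`. In PyTorch's SGD, weight decay adds `weight_decay * w` to each gradient, so with `lr=0.01` every update also shrinks the weights by 1%. The toy setup below, which is not part of the notebook, isolates this effect with a loss whose gradient is zero: after 100 steps, each weight is scaled by 0.99^100 ≈ 0.366.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, -2.0, 0.5]))
optimizer = torch.optim.SGD([w], lr=0.01, weight_decay=1)

for step in range(100):
    optimizer.zero_grad()
    loss = (w * 0).sum()  # Zero gradient: any change comes from weight decay alone
    loss.backward()
    optimizer.step()      # w <- w - lr * (grad + weight_decay * w) = 0.99 * w

print(w.detach())  # Each entry ≈ 0.366 times its initial value
```

This constant pull towards zero is why the first-layer weights stay small and smooth during training.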

**Tasks**

* Hidden layers - Add a hidden layer with 64 units and ReLU activation
* Sanity check - Achieve a loss close to 0 on a small random subset of the data (the model should be able to memorize it)
* Learning rate - Test different values and observe their effects on the loss
* Model complexity - Play with the number of layers and units, observe overfitting
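
A possible starting point for the first two tasks, as a sketch under assumptions: 20 random vectors with random labels stand in for a small subset of normalized MNIST images, the model has one hidden layer with 64 units and ReLU, and Adam is swapped in for SGD only because it converges quickly on this toy problem. A network this size should drive the loss close to 0 on so few samples; if it cannot, something is wrong in the pipeline.

```python
import torch

torch.manual_seed(0)

# Hypothetical tiny subset: 20 random "images" with random labels
X_small = torch.randn(20, 28*28)
y_small = torch.randint(0, 10, (20,))

# Task 1: one hidden layer with 64 units and ReLU activation
model = torch.nn.Sequential(
    torch.nn.Linear(28*28, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Swapped in for speed

# Task 2: train on the same small batch until the model memorizes it
for step in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(X_small), y_small)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
with torch.no_grad():
    accuracy = (model(X_small).argmax(dim=1) == y_small).float().mean().item()
print(final_loss, accuracy)  # Loss close to 0, accuracy 1.0
```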

## Small challenge - Plot output activations for a few images

In [None]:
# TODO

## Additional resources

Nice visualizations

* Visualize MNIST data set with unsupervised learning - [projector.tensorflow.org][tf-projector]

To go deeper

* Weight Initialization - [cs231n course][cs231-init]

[tf-projector]:https://projector.tensorflow.org/
[cs231-init]:http://cs231n.github.io/neural-networks-2/#init
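
For reference, PyTorch's documentation states that `torch.nn.Linear` draws its initial weights from U(-1/sqrt(in_features), 1/sqrt(in_features)), i.e. within ±1/28 for the 784-input layers used here. The sketch below checks this bound and then re-initializes the layer with He (Kaiming) initialization, one of the schemes discussed in the CS231n notes, whose standard deviation for a ReLU layer is sqrt(2 / fan_in) ≈ 0.0505.

```python
import math
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(in_features=28*28, out_features=64)

# PyTorch's default initialization: uniform within ±1/sqrt(in_features)
bound = 1 / math.sqrt(28*28)  # = 1/28 ≈ 0.0357
w_default = layer.weight.detach().clone()
print(w_default.min().item(), w_default.max().item(), bound)

# Alternative: He (Kaiming) normal initialization, designed for layers followed by ReLU
torch.nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
torch.nn.init.zeros_(layer.bias)
print(layer.weight.std().item())  # Close to sqrt(2 / 784) ≈ 0.0505
```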