# Final Project Baseline Training

In this notebook, we train a baseline model for our final project on the Kaggle FER2013 data.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np
import pandas as pd
import os
import random
import string
import csv

import timeit

# Data processing
Turns our Kaggle CSV of (label, pixel array, usage type) rows into a directory of greyscale images saved as .npy files, along with one index CSV per split listing file paths and labels. This is the format PyTorch data loaders will expect later.

In [2]:
# Split sizes and image side length, hard-coded from our data
NUM_TRAIN = 28709
NUM_VAL = 3589
NUM_TEST = 3589
pixel_dim = 48

X_train = np.zeros((NUM_TRAIN, 1, pixel_dim, pixel_dim))
X_val = np.zeros((NUM_VAL, 1, pixel_dim, pixel_dim))
X_test = np.zeros((NUM_TEST, 1, pixel_dim, pixel_dim))
y_train = np.zeros(NUM_TRAIN)
y_val = np.zeros(NUM_VAL)
y_test = np.zeros(NUM_TEST)


with open('./data/fer2013.csv', 'r') as f:
    reader = csv.reader(f)
    rownum = 0
    trainrow = 0
    valrow = 0
    testrow = 0
    for row in reader:
        if rownum != 0:
            label = int(row[0])
            pixels = np.fromstring(row[1], dtype=int, sep=' ')
            usage = row[2]
            pixarray = np.reshape(pixels,(1,pixel_dim, pixel_dim))
            
            if usage == 'Training':
                X_train[trainrow, :, :, :] = pixarray
                y_train[trainrow] = label
                trainrow += 1
            elif usage == 'PrivateTest':
                X_val[valrow, :, :, :] = pixarray
                y_val[valrow] = label
                valrow += 1
            else:
                X_test[testrow, :, :, :] = pixarray
                y_test[testrow] = label
                testrow += 1         
        rownum += 1

# save to file
SAVE_DIR = os.path.expanduser('~/touchy-feely/data/pics/')
import shutil
if os.path.exists(SAVE_DIR):
    shutil.rmtree(SAVE_DIR)  # start from an empty directory on every run
os.makedirs(SAVE_DIR)        # also creates missing parent directories

loop = 0
names = ['train', 'val', 'test']
for (X,Y) in [(X_train, y_train), (X_val, y_val), (X_test, y_test)]: 
    print(names[loop])
    paths = []
    for x in X:

        # random 6-character file name (collisions are unlikely but not checked)
        file_name = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(6))
        file_path = os.path.join(SAVE_DIR, file_name)
        #print(file_path+'.npy')
        np.save('%s.npy' % file_path, x)
        paths.append(file_path+'.npy')

    # create data frame from file paths and labels
    df = pd.DataFrame(data={'files':paths, 'labels':Y})
    print(df.head())

    # save data frame as CSV file
    df.to_csv(os.path.join(SAVE_DIR, names[loop] + '_DATA.csv'), index=False)
    loop += 1


train
                                               files  labels
0  /home/dhruvamin/touchy-feely/data/pics/IMUVEQ.npy     0.0
1  /home/dhruvamin/touchy-feely/data/pics/QH56ZP.npy     0.0
2  /home/dhruvamin/touchy-feely/data/pics/W2ULEK.npy     2.0
3  /home/dhruvamin/touchy-feely/data/pics/67TJ2C.npy     4.0
4  /home/dhruvamin/touchy-feely/data/pics/DZGTQ9.npy     6.0
val
                                               files  labels
0  /home/dhruvamin/touchy-feely/data/pics/KDYUQC.npy     0.0
1  /home/dhruvamin/touchy-feely/data/pics/JP2GI0.npy     5.0
2  /home/dhruvamin/touchy-feely/data/pics/KQHTFX.npy     6.0
3  /home/dhruvamin/touchy-feely/data/pics/73K7WX.npy     4.0
4  /home/dhruvamin/touchy-feely/data/pics/CNUVE5.npy     2.0
test
                                               files  labels
0  /home/dhruvamin/touchy-feely/data/pics/1P8R82.npy     0.0
1  /home/dhruvamin/touchy-feely/data/pics/TEL6T9.npy     1.0
2  /home/dhruvamin/touchy-feely/data/pics/6H51F2.npy     4.0
3  /home/
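
The parsing cell above hard-codes the split sizes. As a rough sketch of an alternative, the rows can be grouped by their usage column so the sizes fall out of the data. The helper name parse_fer_rows, the header row, and the tiny 2x2 "images" are made up for illustration; only the (label, pixels, usage) column order comes from the cell above.

```python
import csv
import io

import numpy as np


def parse_fer_rows(lines, pixel_dim):
    """Group (label, pixels, usage) rows by usage; returns {usage: (X, y)}."""
    buckets = {}
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for label, pixels, usage in reader:
        # pixels is a space-separated string of pixel_dim * pixel_dim values
        img = np.array(pixels.split(), dtype=np.float64)
        img = img.reshape(1, pixel_dim, pixel_dim)
        xs, ys = buckets.setdefault(usage, ([], []))
        xs.append(img)
        ys.append(int(label))
    return {u: (np.stack(xs), np.array(ys)) for u, (xs, ys) in buckets.items()}


# Synthetic stand-in for the real CSV (header and values are illustrative).
demo_csv = io.StringIO(
    "emotion,pixels,Usage\n"
    "0,1 2 3 4,Training\n"
    "3,5 6 7 8,Training\n"
    "6,9 10 11 12,PublicTest\n"
)
splits = parse_fer_rows(demo_csv, pixel_dim=2)
for usage, (X, y) in splits.items():
    print(usage, X.shape, y)
```

With the real file, the same function would be called on the open file handle with pixel_dim=48, and the split sizes could be read off X.shape[0].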

## Load Datasets

We wrap the arrays built above in PyTorch tensors and create one data loader per split.

In [3]:
class ChunkSampler(sampler.Sampler):
    """Samples elements sequentially from some offset. 
    Arguments:
        num_samples: # of desired datapoints
        start: offset where we should start selecting from
    """
    def __init__(self, num_samples, start = 0):
        self.num_samples = num_samples
        self.start = start

    def __iter__(self):
        return iter(range(self.start, self.start + self.num_samples))

    def __len__(self):
        return self.num_samples

X_train_tensor = torch.from_numpy(X_train)
X_val_tensor = torch.from_numpy(X_val)
X_test_tensor = torch.from_numpy(X_test)
y_train_tensor = torch.from_numpy(y_train).int()
y_val_tensor = torch.from_numpy(y_val).int()
y_test_tensor = torch.from_numpy(y_test).int()

greyscale_train = TensorDataset(X_train_tensor, y_train_tensor)
loader_train = DataLoader(greyscale_train, batch_size=64, sampler=ChunkSampler(NUM_TRAIN,0))

greyscale_val = TensorDataset(X_val_tensor, y_val_tensor)
loader_val = DataLoader(greyscale_val, batch_size=64, sampler=ChunkSampler(NUM_VAL,0))

greyscale_test = TensorDataset(X_test_tensor, y_test_tensor)
loader_test = DataLoader(greyscale_test, batch_size=64, sampler=ChunkSampler(NUM_TEST,0))
    
#NUM_TRAIN = 49000
#NUM_VAL = 1000

#cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,transform=T.ToTensor())
#loader_train = DataLoader(cifar10_train, batch_size=64, sampler=ChunkSampler(NUM_TRAIN, 0))

#cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,transform=T.ToTensor())
#loader_val = DataLoader(cifar10_val, batch_size=64, sampler=ChunkSampler(NUM_VAL, NUM_TRAIN))

#cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True,transform=T.ToTensor())
#loader_test = DataLoader(cifar10_test, batch_size=64)
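
To make the sampler's behaviour concrete, here is a torch-free sketch of the index sequence it produces and how that sequence is cut into batches. The helpers chunk_indices and batches are hypothetical stand-ins written for this example (they are not PyTorch functions), and the toy sizes are arbitrary.

```python
def chunk_indices(num_samples, start=0):
    """Same index sequence that ChunkSampler.__iter__ yields."""
    return list(range(start, start + num_samples))


def batches(indices, batch_size):
    """Cut an index list into consecutive batches, with no shuffling."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


# Toy sizes: 10 training points followed by 4 validation points.
train_idx = chunk_indices(10, start=0)
val_idx = chunk_indices(4, start=10)

print(batches(train_idx, batch_size=4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(val_idx)                           # [10, 11, 12, 13]
```

Because the order is fixed, every epoch sees the same batches in the same order. If shuffled training batches are wanted, one option is to drop the sampler for the training loader and pass shuffle=True to DataLoader instead (the two arguments cannot be combined).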

In [4]:
#dtype = torch.FloatTensor # the CPU datatype
gpu_dtype = torch.cuda.FloatTensor

# Constant to control how frequently we print train loss
print_every = 100

# This is a little utility that we'll use to reset the model
# if we want to re-initialize all our parameters
def reset(m):
    if hasattr(m, 'reset_parameters'):
        m.reset_parameters()

## Helper Functions

### Flatten Function

Remember that our image data (and more relevantly, our intermediate feature maps) are initially N x C x H x W, where:
* N is the number of datapoints
* C is the number of channels
* H is the height of the intermediate feature map in pixels
* W is the width of the intermediate feature map in pixels

The Flatten function below first reads in the N, C, H, and W values from a given batch of data, and then returns a "view" of that data. "View" is analogous to numpy's "reshape" method: it reshapes x's dimensions to be N x ??, where ?? is allowed to be anything (in this case, it will be C x H x W, but we don't need to specify that explicitly). 

In [5]:
class Flatten(nn.Module):
    def forward(self, x):
        N, C, H, W = x.size() # read in N, C, H, W
        return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image
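
Since "view" behaves like numpy's "reshape" here, the same flattening can be checked on a plain numpy array. This is only an illustration with arbitrary toy sizes, not part of the model.

```python
import numpy as np

# A toy batch: N=2 images, C=3 channels, H=4, W=5 (sizes chosen for illustration).
x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

N = x.shape[0]
flat = x.reshape(N, -1)  # -1 lets numpy infer C * H * W = 60

print(flat.shape)  # (2, 60)
```

Each row of flat holds all C * H * W values of one image, which is the layout the first affine layer expects.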

### GPU Check

If this returns False, or fails with an error message, you may not have an NVIDIA GPU available on your machine. The cells below cast everything to torch.cuda.FloatTensor, so they need one.

In [6]:
# Verify that CUDA is properly configured and you have a GPU available

torch.cuda.is_available()

True

### Training and Accuracy Functions

In [7]:
def train(model, loss_fn, optimizer, num_epochs = 1):
    
    train_accuracies = []
    val_accuracies = []
    
    for epoch in range(num_epochs):
        print('Starting epoch %d / %d' % (epoch + 1, num_epochs))
        model.train()
        for t, (x, y) in enumerate(loader_train):
            x_var = Variable(x.type(gpu_dtype))
            y_var = Variable(y.type(gpu_dtype).long())

            scores = model(x_var)
            
            loss = loss_fn(scores, y_var)
            if (t + 1) % print_every == 0:
                print('t = %d, loss = %.4f' % (t + 1, loss.item()))  # on PyTorch < 0.4 use loss.data[0]

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    

def check_accuracy(model, loader):
    #if loader.dataset.train:
    #    print('Checking accuracy on validation set')
    #else:
    #    print('Checking accuracy on test set')   
    num_correct = 0
    num_samples = 0
    model.eval() # Put the model in test mode (the opposite of model.train(), essentially)
    for x, y in loader:
        x_var = Variable(x.type(gpu_dtype), volatile=True)

        scores = model(x_var)
        _, preds = scores.data.cpu().max(1)
        num_correct += (preds == y).sum()
        num_samples += preds.size(0)
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))
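
The functions above are written against the old Variable API. As a hedged sketch of how the accuracy check might look on a recent PyTorch release, the version below uses torch.no_grad() in place of volatile=True and falls back to the CPU when no GPU is present. The function name accuracy and the toy model and data are assumptions made for this example.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy(model, loader, device):
    """Return the fraction of correctly classified samples in loader."""
    model.eval()
    num_correct, num_samples = 0, 0
    with torch.no_grad():  # no gradients needed during evaluation
        for x, y in loader:
            scores = model(x.to(device).float())
            preds = scores.argmax(dim=1).cpu()
            num_correct += (preds == y.long()).sum().item()
            num_samples += y.size(0)
    return num_correct / num_samples


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Toy check: a linear "model" with zero weights whose bias favours class 1.
toy_model = nn.Linear(4, 3).to(device)
with torch.no_grad():
    toy_model.weight.zero_()
    toy_model.bias.copy_(torch.tensor([0.0, 1.0, 0.0]))

toy_data = TensorDataset(torch.zeros(6, 4), torch.tensor([1, 1, 0, 2, 1, 1]))
acc = accuracy(toy_model, DataLoader(toy_data, batch_size=4), device)
print('accuracy: %.4f' % acc)
```

Returning the value instead of printing it makes it easy to record train and validation accuracy per epoch.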

## Model Specification

In [8]:
# Baseline CNN: four conv layers followed by three affine layers,
# trained with Adam on a cross-entropy loss over the 7 labels.

model = None
loss_fn = None
optimizer = None

num_filters1 = 32
num_filters2 = 96
num_filters3 = 256
num_filters4 = 512
kernal_size1 = 7
kernal_size2 = 5
kernal_size3 = 3
affine_layer_size1 = 4048
affine_layer_size2 = 1024
num_epochs = 5

# Baseline architecture (kept under its own name in case we want to reuse it later)
model_base = nn.Sequential(
                nn.Conv2d(1, num_filters1, kernel_size=kernal_size1, stride=1), #Hout = (input_size - kernal_size)/stride + 1
                nn.ReLU(inplace=True),
                nn.BatchNorm2d(num_filters1),
                nn.Conv2d(num_filters1, num_filters2, kernel_size=kernal_size2, stride=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2, stride=2), #Hout = (input_size - pool_size)/stride + 1
                nn.Conv2d(num_filters2, num_filters3, kernel_size=kernal_size3, stride=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2, stride=2), #Hout = (input_size - pool_size)/stride + 1
                nn.Conv2d(num_filters3, num_filters4, kernel_size=kernal_size3, stride=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2, stride=2), #Hout = (input_size - pool_size)/stride + 1
                Flatten(),                   
                nn.Linear(3*3*num_filters4, affine_layer_size1),
                nn.ReLU(inplace=True),
                nn.Dropout(),
                nn.Linear(affine_layer_size1, affine_layer_size2),
                nn.ReLU(inplace=True),
                nn.Dropout(),
                nn.Linear(affine_layer_size2,7),
            )


model = model_base.type(gpu_dtype)

loss_fn = nn.CrossEntropyLoss().type(gpu_dtype)
optimizer = optim.Adam(model.parameters(), lr=1e-3) # lr sets the learning rate of the optimizer
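
The 3*3*num_filters4 input size of the first affine layer follows from the Hout formula in the comments above. The sketch below replays that arithmetic for a 48 x 48 input, assuming no padding and the floor rounding that PyTorch's pooling uses by default; out_size is a helper written for this example.

```python
def out_size(size, kernel, stride=1):
    """Spatial size after an unpadded conv or pool: floor((size - kernel) / stride) + 1."""
    return (size - kernel) // stride + 1


size = 48                    # input images are 48 x 48
size = out_size(size, 7)     # conv 7x7       -> 42
size = out_size(size, 5)     # conv 5x5       -> 38
size = out_size(size, 2, 2)  # max pool 2x2   -> 19
size = out_size(size, 3)     # conv 3x3       -> 17
size = out_size(size, 2, 2)  # max pool 2x2   -> 8 (17 is odd, the last row/column is dropped)
size = out_size(size, 3)     # conv 3x3       -> 6
size = out_size(size, 2, 2)  # max pool 2x2   -> 3

flat_features = size * size * 512  # 512 = num_filters4
print(size, flat_features)         # 3 4608
```

If the kernel sizes or the number of pooling layers change, rerunning this arithmetic gives the new input size for the first nn.Linear.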

### Train Model and Evaluate on Validation Set

In [9]:
model.apply(reset)
train(model, loss_fn, optimizer, num_epochs=num_epochs)
check_accuracy(model, loader_train)
check_accuracy(model, loader_val)

Starting epoch 1 / 5
t = 100, loss = 1.8142
t = 200, loss = 1.7888
t = 300, loss = 1.5804
t = 400, loss = 1.7315
Starting epoch 2 / 5
t = 100, loss = 1.5529
t = 200, loss = 1.6496
t = 300, loss = 1.4338
t = 400, loss = 1.6402
Starting epoch 3 / 5
t = 100, loss = 1.4893
t = 200, loss = 1.4649
t = 300, loss = 1.3010
t = 400, loss = 1.6161
Starting epoch 4 / 5
t = 100, loss = 1.3687
t = 200, loss = 1.4486
t = 300, loss = 1.2450
t = 400, loss = 1.5707
Starting epoch 5 / 5
t = 100, loss = 1.3184
t = 200, loss = 1.4567
t = 300, loss = 1.2543
t = 400, loss = 1.5743
Got 14086 / 28709 correct (49.06)
Got 1637 / 3589 correct (45.61)


## Test set -- run this only once

Now that we've gotten a result we're happy with, we evaluate our final model (stored in best_model) on the test set. This is the score we would achieve in a competition. Think about how this compares to the validation set accuracy.

In [None]:
best_model = model
check_accuracy(best_model, loader_test)

### Things you should try:
- **Filter size**: Above we used 7x7 in the first layer, then 5x5 and 3x3; large filters make pretty pictures but smaller filters may be more efficient
- **Number of filters**: Above we used 32, 96, 256, and 512 filters. Do more or fewer do better?
- **Pooling vs Strided Convolution**: Do you use max pooling or just stride convolutions?
- **Batch normalization**: Above we only normalize after the first convolution. Try adding spatial batch normalization after the other convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
- **Network architecture**: The network above has four convolutional and three affine layers. Can you do better with a different depth or layout? Good architectures to try include:
    - [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
- **Global Average Pooling**: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then perform an average pooling operation to get to a 1x1 image (1, 1, Filter#), which is then reshaped into a (Filter#) vector. This is used in [Google's Inception Network](https://arxiv.org/abs/1512.00567) (See Table 1 for their architecture).
- **Regularization**: Add l2 weight regularization, or tune the Dropout layers already used above.
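
To illustrate the global average pooling item from the list above, here is a small numpy sketch that averages each feature map down to a single number. The feature-map sizes are arbitrary toy values.

```python
import numpy as np

# Toy feature maps: N=2 images, 8 filters, 7x7 spatial size (illustrative sizes).
rng = np.random.default_rng(0)
features = rng.standard_normal((2, 8, 7, 7))

# Average over the two spatial axes: (N, C, H, W) -> (N, C).
pooled = features.mean(axis=(2, 3))

print(pooled.shape)  # (2, 8)
```

In a PyTorch model, nn.AdaptiveAvgPool2d(1) followed by the Flatten module above should give the same (N, Filter#) result.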

### Tips for training
For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple important things to keep in mind:

- If the parameters are working well, you should see improvement within a few hundred iterations
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- You should use the validation set for hyperparameter search, and save your test set for evaluating your architecture on the best parameters as selected by the validation set.
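
The coarse-to-fine idea above can be sketched as two rounds of random sampling on a log scale. The ranges, the sample counts, and the "winning" learning rate below are illustrative assumptions, not values tuned for this project.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_log_uniform(low, high, size):
    """Draw values whose log10 is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(np.log10(low), np.log10(high), size)


# Coarse stage: wide ranges, each candidate trained for only a few iterations.
coarse_lrs = sample_log_uniform(1e-5, 1e-1, size=8)
coarse_regs = sample_log_uniform(1e-6, 1e-2, size=8)

# Fine stage: narrow the range around a promising value,
# here pretending lr = 1e-3 did best in the coarse stage.
best_lr = 1e-3
fine_lrs = sample_log_uniform(best_lr / 3, best_lr * 3, size=8)

print(np.sort(coarse_lrs))
print(np.sort(fine_lrs))
```

Each sampled pair would then be passed to the optimizer (for Adam, lr and weight_decay) and scored with the validation set.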

### Going above and beyond
If you are feeling adventurous there are many other features you can implement to try and improve your performance. You are **not required** to implement any of these; however they would be good things to try for extra credit.

- Alternative update steps: For the assignment we implemented SGD+momentum, RMSprop, and Adam; you could try alternatives like AdaGrad or AdaDelta.
- Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or MaxOut.
- Model ensembles
- Data augmentation
- New Architectures
  - [ResNets](https://arxiv.org/abs/1512.03385) where the input from the previous layer is added to the output.
  - [DenseNets](https://arxiv.org/abs/1608.06993) where inputs into previous layers are concatenated together.
  - [This blog has an in-depth overview](https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32)
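
The difference between the two architectures in the list above comes down to adding versus concatenating. The numpy sketch below shows only that shape-level difference; the layer function is a made-up stand-in for a real shape-preserving conv layer.

```python
import numpy as np


def layer(x):
    """Stand-in for a shape-preserving conv layer: a ReLU of a shifted input."""
    return np.maximum(x - 0.5, 0.0)


x = np.ones((2, 4, 6, 6))  # (N, C, H, W)

# ResNet-style: add the input to the layer output, channel count stays at 4.
residual_out = x + layer(x)

# DenseNet-style: concatenate along the channel axis, channel count grows to 8.
dense_out = np.concatenate([x, layer(x)], axis=1)

print(residual_out.shape, dense_out.shape)  # (2, 4, 6, 6) (2, 8, 6, 6)
```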

PyTorch supports many other layer types, loss functions, and optimizers - you will experiment with these next. Here's the official API documentation for these (if any of the parameters used above were unclear, this resource will also be helpful). One note: what we call in the class "spatial batch norm" is called "BatchNorm2d" in PyTorch.

* Layers: http://pytorch.org/docs/nn.html
* Activations: http://pytorch.org/docs/nn.html#non-linear-activations
* Loss functions: http://pytorch.org/docs/nn.html#loss-functions
* Optimizers: http://pytorch.org/docs/optim.html#algorithms

## Going further with PyTorch

The next assignment will make heavy use of PyTorch. You might also find it useful for your projects. 

Here's a nice tutorial by Justin Johnson that shows off some of PyTorch's features, like dynamic graphs and custom NN modules: http://pytorch.org/tutorials/beginner/pytorch_with_examples.html

If you're interested in reinforcement learning for your final project, this is a good (more advanced) DQN tutorial in PyTorch: http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html