# Neural Networks: part 1, tuning a FashionMNIST classifier

The goals are to:

* Get familiar with various basic building blocks
* Understand why we need a train, valid and test split

When you are done, please compile a simple PDF report (e.g. copy and paste your figures into a Google Doc and save it as a PDF) and put it into the Dropbox folder.

Refs:

* Introduction to Convolutional networks: http://cs231n.github.io/convolutional-networks/

## List of things you can tune

* Add/remove blocks:
    - Batch Normalization
    - Dropout
    - Convolution
    - Pooling
    - Dense
    - Activation
* Alter parameters of blocks
    - Number of units
    - Nonlinearity type
* Change optimization hyperparameters
    - Learning rate
    - Batch size

# Setup

In [1]:
%load_ext autoreload
%autoreload 2

import numpy as np
import tqdm
import json

import torch
import torch.nn.functional as F

from torch import optim
from torch import nn
from torch.autograd import Variable

from keras.datasets import fashion_mnist
from keras.utils import np_utils

%matplotlib inline
import matplotlib.pylab as plt
import matplotlib as mpl

from torch.autograd import gradcheck

mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['figure.figsize'] = (7, 7)
mpl.rcParams['axes.titlesize'] = 12
mpl.rcParams['axes.labelsize'] = 12

# Get FashionMNIST (see 1b_FMNIST.ipynb for data exploration)
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Dense layers expect 2D input (n_examples, n_features), so flatten each 28x28 image
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)

# 0-1 normalization
x_train = x_train / 255.
x_test = x_test / 255.

# Convert to Torch Tensor. Just to avoid boilerplate code later
x_train = torch.from_numpy(x_train).type(torch.FloatTensor)
x_test = torch.from_numpy(x_test).type(torch.FloatTensor)
y_train = torch.from_numpy(y_train).type(torch.LongTensor)
y_test = torch.from_numpy(y_test).type(torch.LongTensor)

# Use 1k examples per split so the notebook runs faster:
# train = first 1k of the training set, valid = next 1k, test = first 1k of the test set
x_valid, y_valid = x_train[1000:2000], y_train[1000:2000]
x_train, y_train = x_train[0:1000], y_train[0:1000]
x_test, y_test = x_test[0:1000], y_test[0:1000]



# Starting point

This section gives a basic model. Please adapt the training loop from the previous notebook yourself.

In [1]:
def build_simple_mlp(input_dim, output_dim):
    model = torch.nn.Sequential()
    model.add_module("linear_1", torch.nn.Linear(input_dim, 512, bias=False))
    model.add_module("nonlinearity_1", torch.nn.Sigmoid())
    model.add_module("linear_2", torch.nn.Linear(512, output_dim, bias=False))
    return model

## Training

Our goal is to go through the different types of blocks without going into much depth.

In [None]:
# Simple tuning routine
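One possible shape for such a routine is sketched below: train one model per hyperparameter value, score each on the validation set, and keep the best. This is a sketch under stated assumptions, not the training loop from the previous notebook. The function name `train_and_evaluate`, the small hidden size, the learning-rate grid, and the random stand-in data are all made up for illustration; in the notebook you would pass `x_train`, `y_train`, `x_valid`, `y_valid` instead.

```python
import torch
from torch import nn, optim


def train_and_evaluate(x_train, y_train, x_valid, y_valid,
                       lr=0.1, batch_size=100, n_epochs=5, seed=0):
    """Hypothetical helper: train a small MLP with SGD, return (model, valid accuracy)."""
    torch.manual_seed(seed)
    model = nn.Sequential(
        nn.Linear(x_train.shape[1], 64),
        nn.Sigmoid(),
        nn.Linear(64, 10),
    )
    optimizer = optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(n_epochs):
        model.train()
        # Shuffle the training set and iterate over mini-batches
        perm = torch.randperm(x_train.shape[0])
        for start in range(0, x_train.shape[0], batch_size):
            idx = perm[start:start + batch_size]
            optimizer.zero_grad()
            loss = loss_fn(model(x_train[idx]), y_train[idx])
            loss.backward()
            optimizer.step()

    # Accuracy on the validation set, used only for model selection
    model.eval()
    with torch.no_grad():
        predictions = model(x_valid).argmax(dim=1)
        valid_acc = (predictions == y_valid).float().mean().item()
    return model, valid_acc


# Random stand-in data with FashionMNIST-like shapes (assumption for the example)
torch.manual_seed(0)
x_tr, y_tr = torch.rand(200, 784), torch.randint(0, 10, (200,))
x_va, y_va = torch.rand(100, 784), torch.randint(0, 10, (100,))

# Try a few learning rates and pick the one with the best validation accuracy
results = {}
for lr in [0.01, 0.1, 1.0]:
    _, results[lr] = train_and_evaluate(x_tr, y_tr, x_va, y_va, lr=lr, n_epochs=2)

best_lr = max(results, key=results.get)
```

The test set does not appear in the loop on purpose. Hyperparameters are chosen on the validation set, and the test set is evaluated only once, for the final chosen model, so that the reported number is not biased by the tuning itself.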

# Convolutional networks

We will have a separate lab on convolutions. Here is a crash course on CNNs:

<img width=300 src=http://cs231n.github.io/assets/nn1/neural_net2.jpeg>

<img width=400 src=http://cs231n.github.io/assets/cnn/stride.jpeg>

CNN hyperparameters:

* Number of filters
* Filter size
* Stride (usually less important)

Refs:
* Images from http://cs231n.github.io/convolutional-networks/
* How to create CNNs in PyTorch: https://github.com/vinhkhuc/PyTorch-Mini-Tutorials/blob/master/5_convolutional_net.py
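To make the hyperparameters above concrete, here is a minimal sketch of a one-convolution CNN for 28x28 images. The class name `SimpleCNN` and the defaults (16 filters of size 3, stride 1, one 2x2 max pooling) are assumptions for illustration. Since the setup cell flattens images to 784 values, the sketch reshapes them back to `(batch, 1, 28, 28)` before the convolution.

```python
import torch
from torch import nn


class SimpleCNN(nn.Module):
    """Hypothetical minimal CNN: Conv -> ReLU -> MaxPool -> Dense."""

    def __init__(self, n_filters=16, filter_size=3, output_dim=10):
        super().__init__()
        # padding = filter_size // 2 keeps the 28x28 size for odd filter sizes (stride 1)
        self.conv = nn.Conv2d(1, n_filters, kernel_size=filter_size,
                              padding=filter_size // 2)
        self.pool = nn.MaxPool2d(2)  # 28x28 -> 14x14
        self.fc = nn.Linear(n_filters * 14 * 14, output_dim)

    def forward(self, x):
        # Restore the image layout from flattened 784-dimensional vectors
        x = x.reshape(-1, 1, 28, 28)
        x = self.pool(torch.relu(self.conv(x)))
        return self.fc(x.flatten(start_dim=1))


cnn = SimpleCNN(n_filters=16, filter_size=3)
batch = torch.rand(8, 784)  # random stand-in for 8 flattened images
cnn_logits = cnn(batch)
```

The number of filters and the filter size are constructor arguments here, so the same class can be reused when tuning them on the validation set.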

# MLP vs CNN

Compare a good CNN to a good MLP, tuning the hyperparameters of each on the validation set.

# CNN: effect of filter size

Starting from the good CNN from the previous section, examine the effect of filter size.
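When changing the filter size, keep in mind that it also changes the spatial size of the output. For an input of width $W$, filter size $F$, padding $P$ and stride $S$, the output width is $\lfloor (W - F + 2P) / S \rfloor + 1$. The sketch below checks this formula against `nn.Conv2d`; the helper name `conv_output_size` and the choice of 8 filters are assumptions for the example.

```python
import torch
from torch import nn


def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size - filter_size + 2 * padding) // stride + 1


x = torch.zeros(1, 1, 28, 28)  # one blank 28x28 single-channel image
sizes = {}
for filter_size in [1, 3, 5, 7]:
    conv = nn.Conv2d(1, 8, kernel_size=filter_size)  # stride 1, no padding
    sizes[filter_size] = conv(x).shape[-1]
    # The formula should agree with what PyTorch actually produces
    assert sizes[filter_size] == conv_output_size(28, filter_size)

print(sizes)  # {1: 28, 3: 26, 5: 24, 7: 22}
```

If a dense layer follows the convolution, its input dimension has to be updated whenever the filter size changes, unless padding is chosen to keep the output size fixed.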

# Depth 

Starting from the basic MLP, examine the effect of depth, going from a depth of 1 to the deepest model you can train on your machine.
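One way to run this experiment is to make the number of hidden layers a parameter of the model builder. The sketch below generalizes the basic MLP; the helper name `build_deep_mlp` and the convention that `depth` counts hidden layers are assumptions for this example. It keeps `bias=False` and the sigmoid nonlinearity to stay close to the starting model.

```python
import torch
from torch import nn


def build_deep_mlp(input_dim, output_dim, depth=1, hidden_dim=512):
    """Hypothetical helper: MLP with `depth` hidden layers of equal width."""
    model = nn.Sequential()
    in_dim = input_dim
    for i in range(1, depth + 1):
        model.add_module(f"linear_{i}", nn.Linear(in_dim, hidden_dim, bias=False))
        model.add_module(f"nonlinearity_{i}", nn.Sigmoid())
        in_dim = hidden_dim
    model.add_module("output", nn.Linear(in_dim, output_dim, bias=False))
    return model


x = torch.rand(4, 784)  # random stand-in for 4 flattened images
depth_shapes = {}
for depth in [1, 2, 4, 8]:
    model = build_deep_mlp(784, 10, depth=depth)
    depth_shapes[depth] = tuple(model(x).shape)
```

With `depth=1` this has the same layer structure as `build_simple_mlp`, so it can serve as the baseline of the sweep.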

# Width

Starting from the basic MLP, examine the effect of width, going up to the widest model you can train on your machine.
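To get a feel for how far the width can be pushed, it may help to look at how the parameter count grows. For the basic MLP without biases, a hidden layer of width $w$ gives $784w + 10w = 794w$ weights. The sketch below verifies this; the helper name `count_parameters` and the list of widths are assumptions for the example.

```python
import torch
from torch import nn


def count_parameters(model):
    """Number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


param_counts = {}
for width in [32, 128, 512, 2048]:
    # Same structure as the basic MLP, with the hidden width as a variable
    model = nn.Sequential(
        nn.Linear(784, width, bias=False),
        nn.Sigmoid(),
        nn.Linear(width, 10, bias=False),
    )
    param_counts[width] = count_parameters(model)

print(param_counts)  # {32: 25408, 128: 101632, 512: 406528, 2048: 1626112}
```

For this one-hidden-layer model the parameter count grows linearly with the width, so memory and compute per step grow roughly in proportion.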