# Homework: Not So Basic Artificial Neural Networks

Your task is to implement a simple framework for training convolutional neural networks. Although convolutional neural networks are the subject of lecture 3, we expect that many students are already familiar with the topic.

In order to successfully pass this homework, you will have to:

- Implement all the blocks in `homework_modules.ipynb` (especially the `Conv2d` and `MaxPool2d` layers). A good implementation should pass all the tests in `homework_test_modules.ipynb`.
- Work through a bit of math in `homework_differentiation.ipynb`.
- Train a CNN that has at least one `Conv2d` layer, one `MaxPool2d` layer and one `BatchNormalization` layer, and achieves at least 97% accuracy on the MNIST test set.

Feel free to use `homework_main-basic.ipynb` for debugging or as a source of code snippets.

Note that this homework requires sending **multiple** files; please do not forget to include all of them when sending to the TA. The list of files:
- This notebook with the trained CNN
- `homework_modules.ipynb`
- `homework_differentiation.ipynb`

In [1]:
%matplotlib inline
from time import time, sleep
import numpy as np
import matplotlib.pyplot as plt
from IPython import display

np.random.seed(42)

In [2]:
# (re-)load layers
%run homework_modules.ipynb

In [3]:
# batch generator
def get_batches(dataset, batch_size):
    X, Y = dataset
    n_samples = X.shape[0]
        
    # Shuffle at the start of epoch
    indices = np.arange(n_samples)
    np.random.shuffle(indices)
    
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        
        batch_idx = indices[start:end]
    
        yield X[batch_idx], Y[batch_idx]
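
The generator above shuffles through NumPy's global random state and yields a shorter final batch when the sample count is not a multiple of `batch_size`. The sketch below illustrates the same behaviour on toy data; `get_batches_seeded` and its `rng` argument are hypothetical names introduced only for this example, assuming an explicit `numpy.random.Generator` is preferred over the global seed.

```python
import numpy as np

def get_batches_seeded(dataset, batch_size, rng):
    """Hypothetical variant of get_batches that takes an explicit NumPy Generator."""
    X, Y = dataset
    indices = rng.permutation(X.shape[0])  # fresh shuffle on every call (one call per epoch)
    for start in range(0, X.shape[0], batch_size):
        batch_idx = indices[start:start + batch_size]  # slicing clips at the array end
        yield X[batch_idx], Y[batch_idx]

# Toy data: 10 samples with batch size 4 gives batches of 4, 4 and 2 samples
X_toy = np.arange(10).reshape(10, 1)
Y_toy = np.arange(10)

batch_sizes = [len(xb) for xb, _ in
               get_batches_seeded((X_toy, Y_toy), 4, np.random.default_rng(0))]
seen = sorted(int(y) for _, yb in
              get_batches_seeded((X_toy, Y_toy), 4, np.random.default_rng(0))
              for y in yb)

print(batch_sizes)  # [4, 4, 2]
print(seen == list(range(10)))  # True: every sample appears exactly once per epoch
```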

In [4]:
import mnist
X_train, y_train, X_val, y_val, X_test, y_test = mnist.load_dataset()  # your dataset

In [5]:
# Let's add one channel to the [HxW] images -> [1xHxW]
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1], X_train.shape[2])
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1], X_test.shape[2])
X_val = X_val.reshape(X_val.shape[0], 1, X_val.shape[1], X_val.shape[2])
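
As a small illustration of the reshape above, the channel axis can also be added with `np.newaxis`. The array `images` is a toy stand-in for the MNIST arrays, not the real data.

```python
import numpy as np

# Toy stand-in for a batch of grayscale images with shape [N, H, W]
images = np.arange(5 * 28 * 28, dtype=np.float32).reshape(5, 28, 28)

# Insert a channel axis of length 1: [N, H, W] -> [N, 1, H, W]
with_channel = images[:, np.newaxis, :, :]

# The explicit reshape used in the notebook gives the same array
same_via_reshape = images.reshape(images.shape[0], 1, images.shape[1], images.shape[2])

print(with_channel.shape)  # (5, 1, 28, 28)
```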

In [6]:
assert X_train.shape[1:] == (1, 28, 28), 'wrong X_train shape'
assert X_test.shape[1:] == (1, 28, 28), 'wrong X_test shape'
assert X_val.shape[1:] == (1, 28, 28), 'wrong X_val shape'

In [7]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score

# Note: in scikit-learn >= 1.2 the `sparse` argument is named `sparse_output`
encoder = OneHotEncoder(sparse=False)
y_train_enc = encoder.fit_transform(y_train.reshape(-1, 1))
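
For integer labels in the range 0..9, the encoding produced by the `OneHotEncoder` call above can be sketched in plain NumPy. The helper `one_hot` is a hypothetical name used only here, and the sketch assumes every class occurs in the data, so that the encoder's columns correspond to the digits 0 to 9 in order.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Hypothetical helper: map integer labels to one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(n_classes)[labels]

y_toy = np.array([3, 0, 9])
y_toy_enc = one_hot(y_toy, 10)

print(y_toy_enc.shape)  # (3, 10)
```

Taking `np.argmax(y_toy_enc, axis=1)` recovers the original integer labels.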


In [8]:
# Let's start with defining our model's architecture

# Conv2d -> BatchNorm -> ReLU -> MaxPool -> Conv2d -> ReLU -> MaxPool
# -> Flatten -> Dropout -> Dense 128 -> Dense 10 -> LogSoftMax
net = Sequential()
net.add(Conv2d(1, 32, 3))
net.add(BatchNormalization(alpha=.5))
net.add(ReLU())
net.add(MaxPool2d(2))
net.add(Conv2d(32, 48, 3))
net.add(ReLU())
net.add(MaxPool2d(2))
net.add(Flatten())
net.add(Dropout(0.25))
net.add(Linear(2352, 128))
net.add(Linear(128, 10))
net.add(LogSoftMax())

# Cross-Entropy Loss
criterion = ClassNLLCriterion()
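
The input size 2352 of the first `Linear` layer follows from the feature-map shapes. The sketch below reproduces that arithmetic under two assumptions that are consistent with the number 2352 but are not shown in this notebook: `Conv2d` uses stride 1 with zero padding of `kernel_size // 2`, and `MaxPool2d` uses a stride equal to its kernel size. The helpers `conv_out` and `pool_out` are hypothetical names.

```python
def conv_out(size, kernel, padding, stride=1):
    # Standard output-size formula for a convolution along one spatial axis
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    # Non-overlapping pooling (assumed): stride equals the kernel size
    return size // kernel

side = 28
side = pool_out(conv_out(side, kernel=3, padding=1), 2)  # 28 -> 28 -> 14
side = pool_out(conv_out(side, kernel=3, padding=1), 2)  # 14 -> 14 -> 7

out_channels = 48  # output channels of the second Conv2d
flat_features = out_channels * side * side

print(flat_features)  # 2352
```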

In [9]:
# Define HyperParameters
n_epoch = 3
batch_size = 128

# Optimizer
optimizer_config = {'learning_rate' : 1e-3, 'beta1': 0.9, 'beta2': 0.999, 'epsilon': 1e-8}
optimizer_state = {}
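
The keys in `optimizer_config` are the usual Adam hyperparameters. As a rough illustration of how they enter the update, here is a sketch of one Adam step for a single NumPy array. It is not the `adam_optimizer` from `homework_modules.ipynb`, whose internals are not shown here; `adam_step` and the state keys `t`, `m` and `v` are hypothetical.

```python
import numpy as np

def adam_step(param, grad, state, config):
    """Hypothetical single-array Adam update; modifies param in place."""
    state.setdefault('t', 0)
    state.setdefault('m', np.zeros_like(param))  # running mean of gradients
    state.setdefault('v', np.zeros_like(param))  # running mean of squared gradients
    state['t'] += 1

    b1, b2 = config['beta1'], config['beta2']
    state['m'] = b1 * state['m'] + (1 - b1) * grad
    state['v'] = b2 * state['v'] + (1 - b2) * grad ** 2

    # Bias correction compensates for the zero initialisation of m and v
    m_hat = state['m'] / (1 - b1 ** state['t'])
    v_hat = state['v'] / (1 - b2 ** state['t'])

    param -= config['learning_rate'] * m_hat / (np.sqrt(v_hat) + config['epsilon'])
    return param

config = {'learning_rate': 1e-3, 'beta1': 0.9, 'beta2': 0.999, 'epsilon': 1e-8}
state = {}
w = np.array([1.0, -2.0])

# Minimise sum(w ** 2), whose gradient is 2 * w
adam_step(w, 2 * w, state, config)
w_first = w.copy()  # after the first step each entry has moved by about learning_rate
for _ in range(4):
    adam_step(w, 2 * w, state, config)
```

On the first step the bias-corrected ratio reduces to roughly `sign(grad)`, so each weight moves by about `learning_rate` regardless of the gradient's magnitude.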

In [10]:
for i in range(n_epoch):
    print("Epoch {} started...".format(i))
    
    net.train()
    epoch_losses = []
    for x_batch, y_batch in (get_batches((X_train, y_train_enc), batch_size)):
        
        net.zeroGradParameters()
        
        # Forward
        predictions = net.forward(x_batch)
        loss = criterion.forward(predictions, y_batch)
        
        # Backward
        dL = criterion.backward(predictions, y_batch)
        net.backward(x_batch, dL)
        
        # Update weights
        adam_optimizer(net.getParameters(), 
                       net.getGradParameters(), 
                       optimizer_config,
                       optimizer_state)      
        
        epoch_losses.append(loss)
    
    print("Epoch {} mean loss: {:.3f}".format(i, np.mean(epoch_losses)))
    
    net.evaluate()
    y_true, y_pred = [], []
    for x_batch, y_batch in (get_batches((X_val, y_val), batch_size)):
        batch_output = net.forward(x_batch)
        y_pred += list(np.argmax(batch_output, axis=1))
        y_true += list(y_batch)
        
    print("Validation {} epoch accuracy: {:.3f}".format(i, accuracy_score(y_true, y_pred)))

Epoch 0 started...
Epoch 0 mean loss: 0.197
Validation 0 epoch accuracy: 0.979
Epoch 1 started...
Epoch 1 mean loss: 0.070
Validation 1 epoch accuracy: 0.985
Epoch 2 started...
Epoch 2 mean loss: 0.054
Validation 2 epoch accuracy: 0.984
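
The mean loss reported above comes from `ClassNLLCriterion` applied to the `LogSoftMax` outputs, which together amount to a cross-entropy loss. A minimal NumPy sketch of that combination follows, assuming one-hot targets and averaging over the batch; `log_softmax` and `nll_loss` are hypothetical names, not the homework classes.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row maximum first for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_loss(log_probs, targets_one_hot):
    # Pick the log-probability of the true class and average over the batch
    return -np.sum(targets_one_hot * log_probs) / log_probs.shape[0]

logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

log_probs = log_softmax(logits)
loss = nll_loss(log_probs, targets)
print(loss)
```

The second row has equal logits, so each class gets probability 1/3 and contributes `log(3)` to the summed loss.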


Print your accuracy on the test set here. It should be above 97%. Don't forget to switch the network to 'evaluate' mode.

In [13]:
net.evaluate()

y_true, y_pred = [], []
for x_batch, y_batch in (get_batches((X_test, y_test), batch_size)):
    batch_output = net.forward(x_batch)
    
    y_pred += list(np.argmax(batch_output, axis=1))
    y_true += list(y_batch)

print("Test accuracy: {:.3f}".format(accuracy_score(y_true, y_pred)))

Test accuracy: 0.984
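
Shuffling is not needed at evaluation time. As an alternative sketch, predictions can be collected in dataset order so that they line up with the label array directly. `predict_labels` and `toy_forward` are hypothetical; `toy_forward` merely stands in for `net.forward` so that the example runs on its own.

```python
import numpy as np

def predict_labels(forward_fn, X, batch_size):
    """Hypothetical helper: run forward_fn over X in order and return argmax labels."""
    preds = []
    for start in range(0, X.shape[0], batch_size):
        scores = forward_fn(X[start:start + batch_size])
        preds.append(np.argmax(scores, axis=1))
    return np.concatenate(preds)

def toy_forward(x_batch):
    # Stand-in for net.forward: the score of class k is -|x - k| for k in {0, 1, 2}
    return -np.abs(x_batch - np.arange(3))

X_toy = np.array([[0.1], [1.9], [1.2], [0.4], [2.6]])
y_toy = np.array([0, 2, 1, 1, 2])  # the fourth label deliberately disagrees

y_hat = predict_labels(toy_forward, X_toy, batch_size=2)
accuracy = float(np.mean(y_hat == y_toy))
print(accuracy)  # 0.8
```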
