# Homework: Not So Basic Artificial Neural Networks

Your task is to implement a simple framework for training convolutional neural networks. Although convolutional neural networks are the subject of lecture 3, we expect that many students are already familiar with the topic.

In order to successfully pass this homework, you will have to:

- Implement all the blocks in `homework_modules.ipynb` (especially the `Conv2d` and `MaxPool2d` layers). A good implementation should pass all the tests in `homework_test_modules.ipynb`.
- Work through a bit of math in `homework_differentiation.ipynb`.
- Train a CNN that has at least one `Conv2d` layer, one `MaxPool2d` layer and one `BatchNormalization` layer and achieves at least 97% accuracy on the MNIST test set.

Feel free to use `homework_main-basic.ipynb` for debugging or as a source of code snippets.

Note that this homework requires sending **multiple** files; please do not forget to include all of them when sending to the TA. The list of files:
- This notebook with the trained CNN
- `homework_modules.ipynb`
- `homework_differentiation.ipynb`

In [78]:
%matplotlib inline
from time import time, sleep
import numpy as np
import matplotlib.pyplot as plt
from IPython import display

from sklearn.metrics import accuracy_score

from tqdm import tqdm

In [79]:
%pip install scikit-image



In [80]:
# (re-)load layers
%run homework_modules.ipynb

In [81]:
# batch generator
def get_batches(dataset, batch_size):
    X, Y = dataset
    n_samples = X.shape[0]
        
    # Shuffle at the start of epoch
    indices = np.arange(n_samples)
    np.random.shuffle(indices)
    
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        
        batch_idx = indices[start:end]
    
        yield X[batch_idx], Y[batch_idx]
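
As a quick sanity check, the generator above can be exercised on toy arrays. This is a minimal sketch that assumes only NumPy; `X_toy` and `Y_toy` are made-up illustration data, and `get_batches` is repeated so that the snippet runs on its own.

```python
import numpy as np

def get_batches(dataset, batch_size):
    """Yield shuffled mini-batches (same logic as the cell above)."""
    X, Y = dataset
    n_samples = X.shape[0]
    indices = np.arange(n_samples)
    np.random.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        batch_idx = indices[start:end]
        yield X[batch_idx], Y[batch_idx]

# Hypothetical toy data: 10 samples with 3 features, labels equal to the row index
X_toy = np.arange(30).reshape(10, 3)
Y_toy = np.arange(10)

batch_sizes = []
seen_labels = []
for x_batch, y_batch in get_batches((X_toy, Y_toy), batch_size=4):
    batch_sizes.append(len(x_batch))
    seen_labels.extend(y_batch.tolist())

# Each sample appears exactly once per epoch; the last batch may be smaller
print(batch_sizes)
print(sorted(seen_labels) == list(range(10)))
```

Because the generator reshuffles on every call, calling it once per epoch would give a different batch order each time.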

In [82]:
import mnist
X_train, y_train, X_val, y_val, X_test, y_test = mnist.load_dataset()  # your dataset

In [12]:
# Your turn - train and evaluate conv neural network

In [83]:
n_classes = 10

In [84]:
nonlinearity = ReLU  # ELU

model = Sequential()

model.add(Conv2d(in_channels=1, out_channels=8, kernel_size=3))
model.add(Dropout(p=0.5))
model.add(nonlinearity())

model.add(MaxPool2d(kernel_size=2))  # size: [batch, 8, 14, 14]

model.add(Conv2d(in_channels=8, out_channels=16, kernel_size=3))
model.add(Dropout(p=0.5))
model.add(nonlinearity())

model.add(MaxPool2d(kernel_size=2))  # size: [batch, 16, 7, 7]

model.add(Flatten())  # size: [batch, 16 * 7 * 7]

model.add(Linear(n_in=16*7*7, n_out=450))
model.add(nonlinearity())
model.add(BatchNormalization(alpha=0.9))
model.add(ChannelwiseScaling(n_out=450))

model.add(Linear(n_in=450, n_out=300))
model.add(nonlinearity())
model.add(BatchNormalization(alpha=0.9))
model.add(ChannelwiseScaling(n_out=300))

model.add(Linear(n_in=300, n_out=200))
model.add(nonlinearity())
model.add(BatchNormalization(alpha=0.9))
model.add(ChannelwiseScaling(n_out=200))

model.add(Linear(n_in=200, n_out=100))
model.add(nonlinearity())
model.add(BatchNormalization(alpha=0.9))
model.add(ChannelwiseScaling(n_out=100))

model.add(Linear(n_in=100, n_out=n_classes))
model.add(LogSoftMax())
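
The `n_in=16*7*7` of the first `Linear` layer can be checked with a little shape arithmetic. The sketch below assumes that `Conv2d` uses stride 1 with padding `kernel_size // 2` (so it preserves the spatial size) and that `MaxPool2d` uses a stride equal to its kernel size. Both are assumptions about `homework_modules.ipynb`, inferred from the 7x7 feature maps, and the helper names are hypothetical.

```python
def conv2d_same_out(size, kernel_size):
    """Spatial size after a stride-1 convolution padded by kernel_size // 2 (assumed)."""
    pad = kernel_size // 2
    return size + 2 * pad - kernel_size + 1

def maxpool2d_out(size, kernel_size):
    """Spatial size after non-overlapping max pooling (stride == kernel_size, assumed)."""
    return size // kernel_size

size = 28      # MNIST images are 28x28
channels = 1
for out_channels in (8, 16):
    size = conv2d_same_out(size, kernel_size=3)
    channels = out_channels
    size = maxpool2d_out(size, kernel_size=2)
    print(f"after conv + pool: [batch, {channels}, {size}, {size}]")

flat_features = channels * size * size
print(flat_features)  # 16 * 7 * 7 = 784
```

Without padding, the same stack would shrink 28 to 26, 13, 11 and finally 5, which would not match `16*7*7`.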

In [87]:
criterion = ClassNLLCriterion()

optimizer = adam_optimizer
state = {}
config = {'learning_rate': 1e-2, 'beta1': 0.9, 'beta2': 0.999, 'epsilon': 1e-8}

n_epochs = 4

batch_size = 32
print_every = 100
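
To make the role of the `config` keys concrete, here is a minimal sketch of a textbook Adam update for a single parameter array. The real `adam_optimizer` lives in `homework_modules.ipynb` and may differ; `adam_step_sketch` and its state layout are hypothetical names used only for illustration.

```python
import numpy as np

def adam_step_sketch(param, grad, config, state):
    """One in-place Adam update for a single array (illustrative, not the homework API)."""
    state.setdefault('t', 0)
    state.setdefault('m', np.zeros_like(param))
    state.setdefault('v', np.zeros_like(param))
    state['t'] += 1
    b1, b2 = config['beta1'], config['beta2']
    # Exponential moving averages of the gradient and of its square
    state['m'] = b1 * state['m'] + (1 - b1) * grad
    state['v'] = b2 * state['v'] + (1 - b2) * grad ** 2
    # Bias correction compensates for the zero initialisation of m and v
    m_hat = state['m'] / (1 - b1 ** state['t'])
    v_hat = state['v'] / (1 - b2 ** state['t'])
    param -= config['learning_rate'] * m_hat / (np.sqrt(v_hat) + config['epsilon'])

demo_config = {'learning_rate': 1e-2, 'beta1': 0.9, 'beta2': 0.999, 'epsilon': 1e-8}
demo_state = {}
w = np.array([5.0])
grad = 2 * w  # gradient of f(w) = w**2 at w = 5
adam_step_sketch(w, grad, demo_config, demo_state)
print(w)  # the first step has magnitude close to learning_rate
```

On the very first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to the sign of the gradient, so the parameter moves by roughly `learning_rate` regardless of the gradient scale.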


In [88]:
model.train()

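# Note: range(1, n_epochs) runs n_epochs - 1 = 3 epochs
for epoch in range(1, n_epochs):
    print(f"_____\nepoch = {epoch}")
    # Step decay: the learning rate is divided by 10 at the start of every epoch
    # (1e-3, 1e-4, 1e-5 for the config above)
    config['learning_rate'] /= 10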
    running_loss = 0
    for i in tqdm(range(len(X_train) // batch_size - 2)):
        x_batch = X_train[i * batch_size: (i + 1) * batch_size][:, np.newaxis]
        y_class_labels = y_train[i * batch_size: (i + 1) * batch_size]

        y_batch = np.zeros((batch_size, n_classes))
        y_batch[range(batch_size), y_class_labels] = 1
        
        log_preds = model.updateOutput(x_batch)
        loss = criterion.updateOutput(log_preds, y_batch)
        
        running_loss += loss
        grad = criterion.updateGradInput(log_preds, y_batch)
        model.backward(x_batch, grad)
        adam_optimizer(model.getParameters(), model.getGradParameters(), config, state)

        if not i % print_every and i:
            print(f"batch number {i}, loss = {running_loss / (print_every)}")
            running_loss = 0

_____
epoch = 1
batch number 100, loss = 1.9484223803424414
batch number 200, loss = 0.8223658088519105
batch number 300, loss = 0.47750294081800754
batch number 400, loss = 0.352127264268867
batch number 500, loss = 0.3373278604064134
batch number 600, loss = 0.2599499510528589
batch number 700, loss = 0.23564697361274342
batch number 800, loss = 0.23636203712581305
batch number 900, loss = 0.23369812952953073
batch number 1000, loss = 0.22921760344959072
batch number 1100, loss = 0.21313579125205323
batch number 1200, loss = 0.17858843911432293
batch number 1300, loss = 0.20180475748684365
batch number 1400, loss = 0.18832178183012235
batch number 1500, loss = 0.1999692544742342
100%|██████████| 1560/1560 [14:01<00:00,  1.85it/s]

_____
epoch = 2
batch number 100, loss = 0.16248538039989074
batch number 200, loss = 0.14207808449228404
batch number 300, loss = 0.16546558996294106
batch number 400, loss = 0.1473407511873035
batch number 500, loss = 0.14685575023721964
batch number 600, loss = 0.1261696167113259
batch number 700, loss = 0.13656034091645455
batch number 800, loss = 0.11641888053786151
batch number 900, loss = 0.14983260030123732
batch number 1000, loss = 0.13704254814031497
batch number 1100, loss = 0.1358627050254037
batch number 1200, loss = 0.12082580733181465
batch number 1300, loss = 0.11358033737962096
batch number 1400, loss = 0.11501009773908434
batch number 1500, loss = 0.13379260636978155
100%|██████████| 1560/1560 [13:58<00:00,  1.86it/s]

_____
epoch = 3
batch number 100, loss = 0.1259638114427787
batch number 200, loss = 0.1280626847016082
batch number 300, loss = 0.14976576035816194
batch number 400, loss = 0.1377332238360325
batch number 500, loss = 0.1232185779164525
batch number 600, loss = 0.10962602559112666
batch number 700, loss = 0.11305170553866187
batch number 800, loss = 0.11296659595460673
batch number 900, loss = 0.13306157210352432
batch number 1000, loss = 0.1290403365089107
batch number 1100, loss = 0.1178917698794898
batch number 1200, loss = 0.11582364708779845
batch number 1300, loss = 0.11378826777969954
batch number 1400, loss = 0.10742776784429077
batch number 1500, loss = 0.12192008096630047
100%|██████████| 1560/1560 [13:21<00:00,  1.95it/s]
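
For context on the logged values, the sketch below computes a negative log-likelihood from log-probabilities and one-hot targets, mirroring how `y_batch` is built in the training loop. It assumes the loss is averaged over the batch, which is an assumption about `ClassNLLCriterion`; the helper names are hypothetical.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Turn integer class labels into a [batch, n_classes] one-hot matrix."""
    targets = np.zeros((len(labels), n_classes))
    targets[np.arange(len(labels)), labels] = 1
    return targets

def nll_from_log_probs(log_probs, targets):
    """Batch-averaged negative log-likelihood for one-hot targets (assumed convention)."""
    return -np.sum(targets * log_probs) / len(targets)

n_classes = 10
labels = np.array([3, 7, 0, 9])
# A model that knows nothing predicts a uniform distribution over the classes
uniform_log_probs = np.log(np.full((len(labels), n_classes), 1.0 / n_classes))
chance_loss = nll_from_log_probs(uniform_log_probs, one_hot(labels, n_classes))
print(chance_loss)  # ln(10), about 2.303
```

Under that convention, a loss of ln(10) ≈ 2.303 corresponds to guessing uniformly among ten classes, so the first logged average of about 1.95 is already below chance level.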


Print your accuracy on the test set here. It should be above 97%. Don't forget to switch the network to 'evaluate' mode.
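
One possible way to structure the evaluation is to run the model over consecutive slices of the test set and compare the arg-max of the outputs with the labels. The sketch below is self-contained: `dummy_predict` is a hypothetical stand-in for `model.updateOutput`, and the demo arrays are made up.

```python
import numpy as np

def predict_in_batches(predict_fn, X, batch_size):
    """Apply predict_fn to consecutive slices of X and stack the outputs."""
    outputs = []
    for start in range(0, len(X), batch_size):
        # Slicing past the end is safe, so the last partial batch is included
        outputs.append(predict_fn(X[start:start + batch_size]))
    return np.concatenate(outputs, axis=0)

def dummy_predict(x_batch):
    """Hypothetical stand-in for model.updateOutput: the scores are the inputs."""
    return x_batch

X_demo = np.eye(3)[[0, 2, 1, 1, 0]]   # 5 samples, 3 "classes"
y_demo = np.array([0, 2, 1, 0, 0])    # one label deliberately disagrees
scores = predict_in_batches(dummy_predict, X_demo, batch_size=2)
accuracy = np.mean(np.argmax(scores, axis=1) == y_demo)
print(accuracy)  # 4 of 5 predictions match
```

Iterating up to `len(X)` rather than `len(X) - batch_size` means no separate step is needed for the leftover samples.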

In [89]:
model.evaluate()
y_pred = []
batch_size = 32
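# Full batches only; the remaining samples are handled in the next cell
for i in tqdm(range(0, len(X_test) - batch_size, batch_size)):
    # Add the channel axis expected by Conv2d: [batch, 1, height, width]
    x_batch = X_test[i : i + batch_size][:, np.newaxis]
    y_pred += model.updateOutput(x_batch).tolist()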

100%|██████████| 312/312 [01:07<00:00,  4.65it/s]


In [70]:
y_pred += model.updateOutput(X_test[len(y_pred):][:, np.newaxis]).tolist()



In [73]:
accuracy_score(y_test, np.argmax(y_pred, axis=1))



0.975