# Homework: Not So Basic Artificial Neural Networks

Your task is to implement a simple framework for convolutional neural networks training. While convolutional neural networks is a subject of lecture 3, we expect that there are a lot of students who are familiar with the topic.

In order to successfully pass this homework, you will have to:

- Implement all the blocks in `homework_modules.ipynb` (esp `Conv2d` and `MaxPool2d` layers). Good implementation should pass all the tests in `homework_test_modules.ipynb`.
- Settle with a bit of math in `homework_differentiation.ipynb`
- Train a CNN that has at least one `Conv2d` layer, `MaxPool2d` layer and `BatchNormalization` layer and achieves at least 97% accuracy on MNIST test set.

Feel free to use `homework_main-basic.ipynb` for debugging or as source of code snippets. 

Note, that this homework requires sending **multiple** files, please do not forget to include all the files when sending to TA. The list of files:
- This notebook with cnn trained
- `homework_modules.ipynb`
- `homework_differentiation.ipynb`

In [11]:
%matplotlib inline
from time import time, sleep
import numpy as np
import matplotlib.pyplot as plt
from IPython import display

from tqdm import tqdm

In [12]:
# (re-)load layers
%run homework_modules.ipynb

In [13]:
# batch generator
def get_batches(dataset, batch_size):
    X, Y = dataset
    n_samples = X.shape[0]
        
    # Shuffle at the start of epoch
    indices = np.arange(n_samples)
    np.random.shuffle(indices)
    
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        
        batch_idx = indices[start:end]
    
        yield X[batch_idx], Y[batch_idx]

In [14]:
import mnist
X_train, y_train, X_val, y_val, X_test, y_test = mnist.load_dataset()  # your dataset

In [15]:
# Your turn - train and evaluate conv neural network

In [16]:
n_classes = 10

In [17]:
35*7*7

1715

# Добавить слой к батчнорму

In [111]:
nonlinearity = ReLU  # ELU

model = Sequential()

model.add(Conv2d(in_channels=1, out_channels=32, kernel_size=3))
model.add(nonlinearity())

model.add(Conv2d(in_channels=32, out_channels=64, kernel_size=3))
model.add(nonlinearity())

model.add(MaxPool2d(kernel_size=2))  # size: [batch, 60, 7, 7]
model.add(Dropout(p=0.25))

model.add(Flatten())  # size: [batch, 80 * 7 * 7]


model.add(Linear(n_in=64*14*14, n_out=128))
model.add(nonlinearity())
model.add(BatchNormalization(alpha=0.9))
model.add(ChannelwiseScaling(n_out=128))

model.add(Linear(n_in=128, n_out=n_classes))
model.add(LogSoftMax())

In [112]:
criterion = ClassNLLCriterion()

optimizer = adam_optimizer
state = {}  
config = {'learning_rate': 1e-2, 'beta1': 0.9, 'beta2':0.999, 'epsilon':1e-8}
variables = [[np.arange(10).astype(np.float64)]]
gradients = [[np.arange(10).astype(np.float64)]]


In [119]:
n_epochs = 15

batch_size = 2
print_every = 15

In [120]:
model.train()

for epoch in range(1, n_epochs):
    print(f"_____\nepoch = {epoch}")
    if not epoch % 2:
        config['learning_rate'] /= 10
    running_loss = 0
    for i in tqdm(range(len(X_train) // batch_size - 2)):
        x_orig = X_train[i * batch_size: (i + 1) * batch_size][:, np.newaxis]
        y_class_labels = y_train[i * batch_size: (i + 1) * batch_size]

        
        x_mean = np.mean(x_orig.reshape(batch_size, 1, -1), axis=-1, keepdims=True)[:, np.newaxis]
        x_std = np.std(x_orig.reshape(batch_size, 1, -1), axis=-1, keepdims=True)[:, np.newaxis]
        x_batch = (x_orig - x_mean) / x_std
#         x_batch = x_orig
        y_batch = np.zeros((batch_size, n_classes))
        y_batch[range(batch_size), y_class_labels] = 1
        
        log_preds = model.updateOutput(x_batch)
        loss = criterion.updateOutput(log_preds, y_batch)
        
        running_loss += loss
        grad = criterion.updateGradInput(log_preds, y_batch)
        model.backward(x_batch, grad)
        adam_optimizer(model.getParameters(), model.getGradParameters(), config, state)

        if not i % print_every and i:
            print(f"batch number {i}, loss = {running_loss / (print_every)}")
            running_loss = 0

  0%|          | 0/24998 [00:00<?, ?it/s]

_____
epoch = 1


  0%|          | 9/24998 [00:10<8:22:07,  1.21s/it]


KeyboardInterrupt: 

Print here your accuracy on test set. It should be >97%. Don't forget to switch the network in 'evaluate' mode

In [101]:
model.evaluate()
y_pred = []
batch_size = 32
for i in tqdm(range(0, len(X_test) - batch_size, batch_size)):
    x_orig = X_test[i : i + batch_size][:, np.newaxis]   
    x_mean = np.mean(x_orig.reshape(batch_size, 1, -1), axis=-1, keepdims=True)[:, np.newaxis]
    x_std = np.std(x_orig.reshape(batch_size, 1, -1), axis=-1, keepdims=True)[:, np.newaxis]
    x_batch = (x_orig - x_mean) / x_std
    y_pred += model.updateOutput(x_batch).tolist()

100%|██████████| 312/312 [00:32<00:00,  9.58it/s]


In [102]:
y_pred += model.updateOutput(X_test[len(y_pred):][:, np.newaxis]).tolist()

In [103]:
len(y_pred)

10000

In [104]:
from sklearn.metrics import accuracy_score

In [105]:
accuracy_score(y_test, np.argmax(y_pred, axis=1))

0.8398