# Module 1 - Implementing and training a neural network

## Environment verification
Start by confirming you have PyTorch, TorchVision and TensorBoard installed.


In [35]:
import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter

## Question 1 - MLP evaluation

Import the example MLP:

In [36]:
from bobnet import BobNet

Create an instance of this model:

In [37]:
model1 = BobNet()

Get the training set and create a training data loader:

In [38]:
# get the training set
training_set = torchvision.datasets.MNIST("./data", train=True, download=True)

# define the batch size
MLP_BATCH_SIZE=4

# create the training loader
training_loader = torch.utils.data.DataLoader(training_set, batch_size=MLP_BATCH_SIZE, shuffle=True)

Get the validation set and create a validation data loader:

In [39]:
# get the training set
validation_set = torchvision.datasets.MNIST("./data", train=False, download=True)

# define the batch size
MLP_BATCH_SIZE=4

# create the training loader
validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=MLP_BATCH_SIZE, shuffle=True)

Define the loss function and the optimizer:

In [40]:
loss_fn = torch.nn.CrossEntropyLoss()

# initialize the optimizer with the learning rate and momentum
MLP_LR=0.001
MLP_MOMENTUM=0.9

mlp_optimizer = torch.optim.SGD(model1.parameters(), lr=MLP_LR, momentum=MLP_MOMENTUM)

TensorBoard is a useful tool to analyze the model's performance during training. A good analysis of the model leads to informed and conscious hyper-parameter adjustements.

In [41]:
from datetime import datetime

# how many batches between logs
LOGGING_INTERVAL=500

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
writer = SummaryWriter("runs/mnist_mlp_{}".format(timestamp))

Run the training and validation:

In [42]:
MLP_EPOCHS=10

# for each epoch
for epoch in range(MLP_EPOCHS):
    
    print("EPOCH {}:".format(epoch+1))
    
    # enable gradient tracking
    model1.train(True)
    
    # the accumulated training loss of the epoch
    running_loss = 0.
    
    # average training loss
    avg_loss = 0.
    
    # average validation loss
    avg_vloss = 0

    # for each batch (epoch training)
    for i, data in enumerate(training_loader):
        inputs, labels = data
        
        # reset the optimizer gradients
        mlp_optimizer.zero_grad()
        
        # do the forward pass
        outputs = model1(inputs)
        
        # compute the loss and gradients
        loss = loss_fn(outputs, labels)
        loss.backward()
        
        # adjust the weights
        mlp_optimizer.step()
        
        # gather data
        running_loss += loss.item()
        if i % LOGGING_INTERVAL == LOGGING_INTERVAL - 1:
            last_loss = running_loss / LOGGING_INTERVAL
            print("batch {} loss: {}".format(i+1, last_loss))
            # tensorboard logger "x" axis value
            tb_x = epoch * len(training_loader) + i + 1
            writer.add_scalar("Loss/train", last_loss, tb_x)
            running_loss = 0
            
    # evaluate
    model1.eval()
    
    running_vloss = 0.
    
    # disable gradient computation during evaluation
    with torch.no_grad():
        for i, vdata in enumerate(validation_loader):
            vinputs, vlabels = vdata
            voutputs = model1(vinputs)
            vloss = loss_fn(voutputs, vlabels)
            
# TODO: review and finish loss calculations
    

EPOCH 1:


TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'PIL.Image.Image'>

### Questions
Explore the architecture on the script `mod1/bobnet.py`.
1. Why does the input layer have 784 input features?
2. Why does the output layer have 10 output features?

## Question 2 - CNN implementation

Head over to the `cnn.py` file and implement the following architecture:
- Convolutional layer with 6 output channels, 5x5 kernel size, stride 1 and tanh activation
- Average pooling layer with 6 output channels, 2x2 kernel size, stride 2 and tanh activation
- Convolutional layer with 16 output channels, 5x5 kernel size, stride 1 and tanh activation
- Average pooling layer with 16 output channels, 2x2 kernel size, stride 2 and tanh activation
- Convolutional layer with 120 output channels, 5x5 kernel size, stride 1 and tanh activation
- Fully-connected layer with 84 neurons and tanh activation
- Fully-connected layer with 10 neurons and softmax activation

Now, import the model:

In [None]:
from cnn import CNN

Create an instance of this model:

In [None]:
model2 = CNN()

Train the model:

In [None]:
# TODO: run training

Test the model:

In [None]:
# TODO: run testing