# Your name: 
your-name

# Assigned reading:
Chapters 1-3 of the textbook

# Part 1: MNIST: PyTorch

To get you started, we have provided a "deep" network in this notebook. Run the IPython notebook as you go along and answer the questions. Most questions below have a *small* coding component (you only need to edit code when specifically asked to).

## Import libraries and dataset

In [None]:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import time, os
import matplotlib.pyplot as plt
import numpy as np
from torch.utils.data.sampler import SubsetRandomSampler
%matplotlib inline

num_epochs = 10

Import the MNIST dataset

In [None]:
transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize((0.5,), (0.5,))])

trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)

testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)

num_train = len(trainset)
indices = list(range(num_train))
split = 10000

# shuffle data
np.random.seed(6825)
np.random.shuffle(indices)

train_idx, valid_idx = indices[split:], indices[:split]
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=50, sampler=train_sampler, shuffle=False)

validloader = torch.utils.data.DataLoader(trainset, batch_size=50, sampler=valid_sampler, shuffle=False)

testloader = torch.utils.data.DataLoader(testset, batch_size=50, shuffle=False)

## Set up the network

Network description

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 4, 5, padding = 2)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(4, 8, 5, padding = 2)
        self.fc1 = nn.Linear(8 * 7 * 7, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 8 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)

## Network Analysis
### Question 1.

Read the code above and determine the type of each of the layers (convolutional, fully-connected, ReLU, or pooling) and then answer the following questions by updating the cell below:

1. How many layers deep is the network and what is the type of each layer (count nonlinear activations as their own layer, but do not count flattening)?
 
2. What is the size of the network input? Your answer should be a 4-D tuple `(batch_size, width, height, channels)`
3. Describe the parameters used in each layer:
    1. Convolutional layers: Specify a 4-tuple `(weight_width, weight_height, channels, filter_count)`
    2. Fully connected layers: Specify a 2-tuple `(num_output_nodes, num_input_nodes)`
    3. Pool layer: Specify `(x_window, y_window, x_stride, y_stride)`
    4. ReLU: Specify `None`

In [None]:
# update
num_layers = 4
layer_type = ['fc', 'pool', 'conv', 'relu']
network_input_sz = (0, 0, 0, 0)

layer_param = [
    (0, 0),
    (0, 0, 0, 0),
    (0, 0, 0, 0),
    None
]

### Question 2.

One way of finding the layer input sizes is simply by inspection. Since each layer's input is the previous layer's output, we can also compute the size of these outputs from the input sizes and weight parameters. Complete the `get_output_size` function to do this.
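
As a rough illustration of the size arithmetic involved, here is a minimal sketch of the standard output-size formula, `out = floor((in + 2 * padding - kernel) / stride) + 1`, applied to a toy input rather than to this network. The helper names and the explicit `padding` and `stride` arguments are assumptions made for the example; they are not part of the `layer_param` format used in this notebook.

```python
def conv_output_size(input_sz, kernel, num_filters, padding=0, stride=1):
    """Output size of a 2-D convolution.

    input_sz: (batch_size, width, height, channels)
    kernel:   (kernel_width, kernel_height)
    padding and stride are hypothetical arguments (assumed symmetric).
    """
    batch, width, height, _ = input_sz
    k_w, k_h = kernel
    out_w = (width + 2 * padding - k_w) // stride + 1
    out_h = (height + 2 * padding - k_h) // stride + 1
    # The number of output channels equals the number of filters.
    return (batch, out_w, out_h, num_filters)


def pool_output_size(input_sz, window, stride):
    """Output size of an unpadded pooling layer; channels are unchanged."""
    batch, width, height, channels = input_sz
    win_w, win_h = window
    s_w, s_h = stride
    out_w = (width - win_w) // s_w + 1
    out_h = (height - win_h) // s_h + 1
    return (batch, out_w, out_h, channels)


# Toy example (not this notebook's network): 8 RGB images of 32x32 pixels.
example_input = (8, 32, 32, 3)
conv_out = conv_output_size(example_input, kernel=(3, 3), num_filters=16, padding=1)
pool_out = pool_output_size(conv_out, window=(2, 2), stride=(2, 2))
print(conv_out)  # (8, 32, 32, 16)
print(pool_out)  # (8, 16, 16, 16)
```

Note that a 3x3 kernel with padding 1 preserves the spatial size, while a 2x2 pooling window with stride 2 halves it.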

In [None]:
def get_output_size(input_sz, layer_type, layer_param):
    if layer_type == 'conv':
        # Your code here   
        # Return format: (batch_size, width, height, channels)
        return (0, 0, 0, 0)
    elif layer_type == 'pool':
        # Your code here
        # Return format: (batch_size, width, height, channels)
        return (0, 0, 0, 0)
    elif layer_type == 'fc':
        # Your code here
        # Return format: (batch_size, num_outputs)
        return (0, 0)
    elif layer_type == 'relu':
        # Your code here
        # Return format: Input-dependent tuple 
        return None

layer_sz = []
print("Input  : ", network_input_sz)
for n in range(num_layers):
    if n == 0:
        layer_sz.append(get_output_size(network_input_sz, layer_type[n], layer_param[n]))
    else:
        layer_sz.append(get_output_size(layer_sz[n-1], layer_type[n], layer_param[n]))
    print("Layer %d: " % (n+1), layer_sz[n])

### Question 3.

Next, complete the `num_params` and `param_memory_size` functions to calculate, respectively, the number of weights (i.e. parameters, including biases) required in each layer and the memory required to store them. Assume each weight is stored in single-precision floating-point format.
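
The counting rule can be sketched on toy layer shapes as follows. This assumes one bias per convolutional filter and one bias per fully-connected output node, and 4 bytes per single-precision value; the helper names are hypothetical and the shapes are not taken from this network.

```python
BYTES_PER_FLOAT32 = 4  # single precision


def conv_param_count(kernel_w, kernel_h, in_channels, num_filters):
    """Weights plus one bias per filter."""
    weights = kernel_w * kernel_h * in_channels * num_filters
    biases = num_filters
    return weights + biases


def fc_param_count(num_outputs, num_inputs):
    """Weights plus one bias per output node."""
    return num_outputs * num_inputs + num_outputs


def memory_bytes(param_count):
    """Storage needed for `param_count` float32 values."""
    return param_count * BYTES_PER_FLOAT32


# Toy shapes: a 3x3 convolution from 16 to 32 channels, and a 128 -> 10 linear layer.
conv_count = conv_param_count(3, 3, 16, 32)
fc_count = fc_param_count(10, 128)
print(conv_count, memory_bytes(conv_count))  # 4640 18560
print(fc_count, memory_bytes(fc_count))      # 1290 5160
```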

In [None]:
def num_params(layer_type, layer_param):
    if layer_type == 'conv':
        # Your code here
        # Return format: (number_of_params)
        return 0
    elif layer_type == 'pool':
        # Your code here
        # Return format: (number_of_params)
        return 0
    elif layer_type == 'fc':
        # Your code here
        # Return format: (number_of_params)
        return 0
    elif layer_type == 'relu':
        # Your code here
        # Return format: (number_of_params)
        return 0

# Required memory in bytes
def param_memory_size(layer_type, layer_param):
    # Your code here
    # Return format: (mem_size_for_params)
    return 0

layer_params_mem = []
for n in range(num_layers):
    layer_params_mem.append(param_memory_size(layer_type[n], layer_param[n]))
    print("Layer %d: " % (n+1), layer_params_mem[n])

### Question 4.

Determine the number of multiplications required per _batch_. Multiplications by zero should still be counted.
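
One common way to count multiplications is sketched below for toy shapes. It assumes a direct (non-FFT, non-Winograd) convolution in which every output element needs one multiplication per kernel weight, so products with zero-padded inputs are included. The helper names are hypothetical.

```python
def conv_mult_count(output_sz, kernel_w, kernel_h, in_channels):
    """Multiplications for a direct convolution.

    output_sz: (batch_size, out_width, out_height, num_filters)
    Each output element is a dot product over kernel_w * kernel_h * in_channels values.
    """
    batch, out_w, out_h, num_filters = output_sz
    return batch * out_w * out_h * num_filters * kernel_w * kernel_h * in_channels


def fc_mult_count(batch_size, num_outputs, num_inputs):
    """Multiplications for a fully-connected layer: one per weight, per sample."""
    return batch_size * num_outputs * num_inputs


# Toy shapes: batch of 8, 3x3 convolution over 3 input channels producing 32x32x16 outputs.
conv_mults = conv_mult_count((8, 32, 32, 16), kernel_w=3, kernel_h=3, in_channels=3)
fc_mults = fc_mult_count(batch_size=8, num_outputs=10, num_inputs=128)
print(conv_mults)  # 3538944
print(fc_mults)    # 10240
```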

In [None]:
def num_mult(input_sz, layer_type, layer_param):
    if layer_type == 'conv':
        # Your code here
        # Return format: (number_of_mult)
        return 0
    elif layer_type == 'pool':
        # Your code here
        # Return format: (number_of_mult)
        return 0
    elif layer_type == 'fc':
        # Your code here
        # Return format: (number_of_mult)
        return 0
    elif layer_type == 'relu':
        # Your code here
        # Return format: (number_of_mult)
        return 0

layer_mult_count = []
for n in range(num_layers):
    if n == 0:
        layer_mult_count.append(num_mult(network_input_sz, layer_type[n], layer_param[n]))
    else:
        layer_mult_count.append(num_mult(layer_sz[n-1], layer_type[n], layer_param[n]))
    print("Layer %d: " % (n+1), layer_mult_count[n])

### Network Summary

Run this cell to summarize your results.

In [None]:
print("Network Summary:")
print("Layer\tType\tInput Size\tWeight Param\tOutput Size\tWeight Memory\t#mult")
for layer_idx in range(num_layers):
    print("%d\t%s\t%s\t%s\t%s\t%s\t%s" % (
            (layer_idx+1),
            layer_type[layer_idx], 
            str(network_input_sz if layer_idx == 0 else layer_sz[layer_idx-1]).ljust(12), 
            str(layer_param[layer_idx]).ljust(12), 
            str(layer_sz[layer_idx]).ljust(12), 
            str(layer_params_mem[layer_idx]).ljust(12),
            str(layer_mult_count[layer_idx]).ljust(12)
        ))

## Training the network

The code below trains for `num_epochs` epochs and plots the training error. If you want (not graded), you can update this code to calculate the validation accuracy after each epoch so that you can plot it later. You can then experiment with the number of epochs to see how it affects the validation accuracy (also not graded).

Note: The initialization below is not strictly necessary, as PyTorch will automatically initialize the weights (including biases) for you. We've included initialization here so that if you run the cell more than once, you will start fresh.
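
For the optional per-epoch validation accuracy, one possible approach is a small helper that evaluates a model on a loader without tracking gradients. The sketch below is an illustration only: `accuracy_percent` is a hypothetical name, and the toy model and random data merely stand in for `net` and `validloader`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy_percent(model, loader):
    """Return the percentage of samples in `loader` that `model` classifies correctly."""
    was_training = model.training
    model.eval()  # switch off training-only behaviour (e.g. dropout), if any
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, labels in loader:
            predicted = model(inputs).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    model.train(was_training)  # restore the previous mode
    return 100.0 * correct / total


# Toy stand-ins for the network and validation loader.
torch.manual_seed(0)
toy_model = nn.Linear(4, 3)
toy_data = TensorDataset(torch.randn(20, 4), torch.randint(0, 3, (20,)))
toy_loader = DataLoader(toy_data, batch_size=5)
toy_acc = accuracy_percent(toy_model, toy_loader)
print("Toy accuracy: %.2f %%" % toy_acc)
```

Inside the training loop, a call such as `valid_acc_vect[epoch] = accuracy_percent(net, validloader)` at the end of each epoch would fill the vector that is already allocated for this purpose.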

In [None]:
training_acc_vect = np.zeros(num_epochs)
valid_acc_vect = np.zeros(num_epochs)

start_time = time.time()

# initialize weights and biases
nn.init.kaiming_uniform_(net.conv1.weight, nonlinearity = 'relu')
stdv = 1./np.sqrt(net.conv1.bias.size(0))
nn.init.uniform_(net.conv1.bias, -stdv, stdv)
nn.init.kaiming_uniform_(net.conv2.weight, nonlinearity = 'relu')
stdv = 1./np.sqrt(net.conv2.bias.size(0))
nn.init.uniform_(net.conv2.bias, -stdv, stdv)
nn.init.kaiming_uniform_(net.fc1.weight, nonlinearity = 'relu')
stdv = 1./np.sqrt(net.fc1.bias.size(0))
nn.init.uniform_(net.fc1.bias, -stdv, stdv)
nn.init.kaiming_uniform_(net.fc2.weight, nonlinearity = 'relu')
stdv = 1./np.sqrt(net.fc2.bias.size(0))
nn.init.uniform_(net.fc2.bias, -stdv, stdv)

# train network
for epoch in range(num_epochs):  # loop over the dataset multiple times while training
    correct_train = 0
    total_train = 0
    
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data # list of [inputs, labels]
        #print(labels.shape)
        optimizer.zero_grad() # clear gradients
        outputs = net(inputs) # forward step
        loss = criterion(outputs, labels)
        loss.backward() # backprop
        optimizer.step() # optimize weights

        # update running statistics
        duration = time.time() - start_time
        _, predicted = torch.max(outputs.data, 1)
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()
    
    training_acc = correct_train / total_train * 100
    training_acc_vect[epoch] = training_acc
    
    print('Accuracy of the network on the 50000 training images after epoch %d: %.2f %% (%.1f sec)' % (
        epoch + 1, training_acc, duration))
    
    # your code here to calculate the validation error after each epoch
    
print('Finished Training')

In [None]:
epoch_vect = np.linspace(1, num_epochs, num_epochs)

plt.figure(1)
plt.plot(epoch_vect, 100-training_acc_vect)

print("Final Training Accuracy: %g" % (training_acc))

# Your code here to plot validation error


### Question 5.

(Theory question, no coding component.) You have trained a classifier to recognize handwritten digits using a training set of black digits on a white background. You then give it test images with white digits on a black background, but it does not classify them correctly. Please list the principles that could explain why it doesn't work.


-your explanation here-



## Exporting and loading the model, performing inference

The following two lines save the model to the location specified by `PATH`.

In [None]:
PATH = './my_mnist_net.pth'
torch.save(net.state_dict(), PATH)

### Question 6.

Run inference *using the saved model* and print the training, validation, and test accuracies.
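
The general pattern for restoring a saved `state_dict` into a fresh model instance can be sketched as below. This is an illustration on a toy network with a temporary file, not the assignment's solution; `make_toy_net` and the file name are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_toy_net():
    """Build a small network; the architecture must match the one that was saved."""
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))


torch.manual_seed(0)
trained_net = make_toy_net()  # stands in for a trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_net.pth")
    torch.save(trained_net.state_dict(), toy_path)

    # A new instance starts with different random weights until the state is loaded.
    restored_net = make_toy_net()
    restored_net.load_state_dict(torch.load(toy_path))
    restored_net.eval()

# Both networks should now produce identical outputs for the same input.
sample = torch.randn(2, 4)
with torch.no_grad():
    outputs_match = torch.equal(trained_net(sample), restored_net(sample))
print(outputs_match)
```

Once the weights are restored, accuracy on each loader can be measured with an ordinary evaluation loop under `torch.no_grad()`.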

In [None]:
loaded_net = Net()

def eval_model(PATH, trainloader, validloader, testloader):

    # your code here
    training_accuracy = 0
    validation_accuracy = 0
    test_accuracy = 0
        
    return training_accuracy, validation_accuracy, test_accuracy

training_accuracy, validation_accuracy, test_accuracy = eval_model(PATH, trainloader, validloader, testloader)
print('Training Accuracy: %g' % training_accuracy)        
print('Validation Accuracy: %g' % validation_accuracy)
print('Test Accuracy: %g' % test_accuracy)

### Question 7.

Edit the code below to save the weights and biases for the convolutional and fully-connected layers only. They should be saved as numpy arrays into a dictionary.
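
As a hedged illustration of converting parameters to NumPy arrays and round-tripping them through `np.save`, the sketch below uses a toy network. It keeps PyTorch's own parameter names as dictionary keys, which differs from the `l<i>_w` / `l<i>_b` naming suggested in the cell below, and the file name is hypothetical.

```python
import os
import tempfile

import numpy as np
import torch.nn as nn

toy_net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

# state_dict() maps parameter names to tensors; ReLU has no parameters, so it is absent.
toy_parameters = {
    name: tensor.detach().cpu().numpy()
    for name, tensor in toy_net.state_dict().items()
}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_file = os.path.join(tmp_dir, "toy_parameters.npy")
    # Saving a dict requires pickling; NumPy wraps it in a 0-d object array.
    np.save(toy_file, toy_parameters, allow_pickle=True)
    # .item() unwraps the 0-d object array back into the original dict.
    reloaded = np.load(toy_file, allow_pickle=True).item()

for name, array in reloaded.items():
    print(name, array.shape)
```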

In [None]:
model_parameters = {}

# Your code here
# format (not necessarily as a for loop):
# for i in range(num_layers):
#     model_parameters['l' + str(i) + '_w'] = np.array([]) # _w for weights or _b for biases

np.save('my_model_parameters.npy', model_parameters, allow_pickle=True)

### Question 8.
We will evaluate the inference-time energy and latency of the neural network we trained above on a model of a custom accelerator design. The profiler runs the [timeloop/accelergy](http://accelergy.mit.edu/tutorial.html) commands, which we will cover in more depth later in this course, on the convolutional and fully-connected layers of the network to obtain the energy estimates. This part may take several minutes to run.
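
To relate the reported totals to more familiar units, the sketch below aggregates a made-up result dictionary with the same keys the profiler cell reads (`energy`, `cycle`, `num`) and converts it. The numbers are invented for illustration, energy is assumed to be in picojoules, and the 1 GHz clock is purely an assumption; the actual clock frequency depends on the accelerator design.

```python
# Hypothetical profiler output; real values come from running the profiler.
example_results = {
    0: {"energy": 1200.0, "cycle": 400, "num": 1},
    1: {"energy": 800.0, "cycle": 250, "num": 2},
}
ASSUMED_CLOCK_HZ = 1e9  # assumption: 1 GHz clock

# Layers that share an architecture are profiled once and weighted by `num`.
total_energy_pj = sum(info["energy"] * info["num"] for info in example_results.values())
total_cycles = sum(info["cycle"] * info["num"] for info in example_results.values())

energy_uj = total_energy_pj * 1e-6                    # picojoules -> microjoules
latency_us = total_cycles / ASSUMED_CLOCK_HZ * 1e6    # cycles -> microseconds

print("Total energy: %g pJ (%g uJ)" % (total_energy_pj, energy_uj))
print("Total cycles: %d (%g us at the assumed clock)" % (total_cycles, latency_us))
```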

In [None]:
!accelergyTables

In [None]:
from profiler import Profiler
profiler = Profiler(
    top_dir='workloads',
    sub_dir='mnist',
    timeloop_dir='simple_weight_stationary',
    model=net,
    input_size=(1, 28, 28),
    batch_size=1,
    convert_fc=True,
    exception_module_names=[]
)

results = profiler.profile()

total_energy = 0
total_cycle = 0

for layer_id, info in results.items():
    print(f"ID: {layer_id} \t Energy: {info['energy']} \t Cycle: {info['cycle']} \t Number of same architecture layers: {info['num']}")
    total_energy += info['energy'] * info['num']
    total_cycle += info['cycle'] * info['num']
    
print(f'\nTotal Energy: {total_energy} pJ \nTotal Cycles: {total_cycle}')

Report the total energy and cycles below:

Energy:


-your answers here-



Cycles:


-your answers here-

