# Mini Mobilenet

To test the DPU model's separate convolution engines, this notebook trains a simple network featuring a depthwise separable convolution.
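
As a rough illustration of why depthwise separable convolution is treated separately, the sketch below counts learnable parameters for a standard convolution and for a depthwise-plus-pointwise pair. The helper names `conv_params` and `dw_separable_params` are hypothetical; the sizes (6 input channels, 16 output channels, 5x5 kernel) are the ones used for `conv2` in this notebook.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Parameters of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def dw_separable_params(c_in, c_out, k, bias=True):
    """Parameters of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 conv mixes the channels
    return depthwise + pointwise


standard = conv_params(6, 16, 5)
separable = dw_separable_params(6, 16, 5)
print(standard, separable)
```

Under these assumptions the separable version needs roughly a ninth of the parameters of the standard layer, which is the usual motivation for the technique.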

Based on Assignment 3 code.

## Part 1
Fill in the class named `ConvNet` in `cnn_model.py`. Define the network layers in the `__init__` method and the forward pass in the `forward` method. `train.py` creates an instance of `ConvNet`, which takes a tensor of dimension $(B, 3, 32, 32)$ as input, where $B$ is the batch size. The output should be a matrix of dimension $(B, 10)$ holding the confidence scores (before the softmax function is applied) for the 10 categories. Your CNN should have 4 convolutional layers, each with ReLU, Dropout, Batch Normalization, and Max pooling operations as required, followed by a linear layer and then a classifier layer at the end. You can use the function `_initialize_weight` in `ConvNet` to initialize the weights of the convolutional layers. You may choose the stride, weight decay, dropout rate, filter kernel size, number of output feature maps for each layer, and other hyperparameters. FYI: a properly written 4-layer network should reach an accuracy of at least 65% on the test set.

In [None]:
########################################################################
# 2. DEFINE YOUR CONVOLUTIONAL NEURAL NETWORK
########################################################################

import torch.nn as nn
import torch.nn.functional as F

class DwConv(nn.Module):
	def __init__(self, in_channels, out_channels, kernel_size):
		super(DwConv, self).__init__()
		self.depthwise = nn.Conv2d(
			in_channels, in_channels,
			kernel_size=kernel_size, padding=kernel_size//2,
			groups=in_channels
		)
		self.pointwise = nn.Conv2d(
			in_channels, out_channels,
			kernel_size=1
		)
	def forward(self, x):
		return self.pointwise(
			self.depthwise(x)
		)

class ConvNet(nn.Module):
	def __init__(self, init_weights=False):
		super(ConvNet, self).__init__()
		self.conv1 = nn.Conv2d(3, 6, 5)
		self.pool = nn.MaxPool2d(2, 2)
		self.conv2 = DwConv(6, 16, 5)
		self.fc1 = nn.Linear(16 * 7 * 7, 120)
		self.fc2 = nn.Linear(120, 84)
		self.fc3 = nn.Linear(84, 10)

	def forward(self, x):
		x = self.pool(F.relu(self.conv1(x)))
		x = self.pool(F.relu(self.conv2(x)))
		x = x.view(-1, 16 * 7 * 7)
		x = F.relu(self.fc1(x))
		x = F.relu(self.fc2(x))
		out = self.fc3(x)
		return out
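
The `16 * 7 * 7` input size of `fc1` follows from how each layer changes the spatial size of a 32x32 image. The sketch below traces that arithmetic; `conv_out` and `pool_out` are hypothetical helpers that apply the usual output-size formula, assuming stride 1 for the convolutions and no dilation.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max-pooling layer without padding."""
    return (size - kernel) // stride + 1


size = 32                                  # CIFAR-10 images are 32 x 32
size = conv_out(size, 5)                   # conv1: 5 x 5 kernel, no padding
size = pool_out(size)                      # 2 x 2 max pool
size = conv_out(size, 5, padding=5 // 2)   # conv2 (DwConv): padding keeps the size
size = pool_out(size)                      # 2 x 2 max pool
flat_features = 16 * size * size           # 16 channels flattened for fc1
print(size, flat_features)
```

If the kernel sizes or padding change, rerunning this trace gives the value to use in both `nn.Linear` and `x.view`.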

## Part 2
Fill in the loss function with cross-entropy loss and write the code to obtain the accuracy on the current batch of data.
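
To make the two quantities concrete, here is a minimal pure-Python sketch of cross-entropy computed directly from raw logits (the pre-softmax scores the network outputs) and of batch accuracy. The function names and the toy logits are illustrative assumptions, not part of the assignment code; in the notebook itself `nn.CrossEntropyLoss` performs the log-softmax and averaging.

```python
import math


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class, from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def batch_accuracy(batch_logits, labels):
    """Percentage of rows whose largest logit is at the true class index."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in batch_logits]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return 100.0 * correct / len(labels)


# Toy batch: 3 samples, 3 classes (illustrative values only)
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.5, 0.0]]
labels = [0, 2, 0]

mean_loss = sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)
acc = batch_accuracy(batch_logits, labels)
print(mean_loss, acc)
```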

In [None]:
"""
---------------------------------------------------------------------
Training an image classifier
---------------------------------------------------------------------
For this assignment you'll do the following steps in order:
1. Load and normalize the CIFAR10 training and test datasets using
   ``torchvision``
2. Define a Convolutional Neural Network (at least 4 conv layers)
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data
---------------------------------------------------------------------
"""

# IMPORTING REQUIRED PACKAGES
import os
import numpy as np
import scipy.io as sio
import torch
import torchvision
import torchvision.transforms as transforms


# DEFINE VARIABLE
BATCH_SIZE = 32
EPOCH_NUM = 100
LR = 0.001
MODEL_SAVE_PATH = './Models'

if not os.path.exists(MODEL_SAVE_PATH):
	os.mkdir(MODEL_SAVE_PATH)

# DEFINING TRANSFORM TO APPLY TO THE IMAGES
# YOU MAY ADD OTHER TRANSFORMS FOR DATA AUGMENTATION
transform = transforms.Compose([
	transforms.Resize(32),
	transforms.ToTensor(),
	transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])


########################################################################
# 1. LOAD AND NORMALIZE CIFAR10 DATASET
########################################################################

#FILL IN: Get train and test dataset from torchvision and create respective dataloader
trainset = torchvision.datasets.CIFAR10(
	root='./data', train=True,
	download=True, transform=transform
)
trainloader = torch.utils.data.DataLoader(
	trainset, batch_size=BATCH_SIZE
)

testset = torchvision.datasets.CIFAR10(
	root='./data', train=False,
	download=True, transform=transform
)
# evaluate on the held-out test split, not on the training set
testloader = torch.utils.data.DataLoader(
	testset, batch_size=BATCH_SIZE
)

########################################################################
# 2. DEFINE YOUR CONVOLUTIONAL NEURAL NETWORK AND IMPORT IT
########################################################################


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = ConvNet().to(device)  # ConvNet MUST BE DEFINED IN A CELL ABOVE; ITS STARTER CODE IS IN cnn_model.py
# You can pass arguments to ConvNet instead of hard-coding them.


########################################################################
# 3. DEFINE A LOSS FUNCTION AND OPTIMIZER
########################################################################

import torch.optim as optim

#FILL IN : the criteria for ce loss
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=LR, momentum=0.9)

## Part 3
Fill in the code to obtain the test accuracy over the entire test set and append it to the test accuracy variable.
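
Accuracy over the whole test set is best accumulated as running `correct` and `total` counts rather than as an average of per-batch accuracies, because the last batch may be smaller than the others. The sketch below shows that pattern together with best-epoch tracking; `epoch_accuracy`, `best_acc`, `best_epoch`, and the toy predictions are hypothetical and stand in for the model outputs.

```python
def epoch_accuracy(batches):
    """batches: iterable of (predicted_labels, true_labels) pairs for one epoch."""
    correct = 0
    total = 0
    for predicted, labels in batches:
        correct += sum(p == y for p, y in zip(predicted, labels))
        total += len(labels)
    return 100.0 * correct / total


# Hypothetical predictions for two epochs on a 5-image test set in two batches
epochs = [
    [([0, 1, 1], [0, 1, 2]), ([2, 2], [2, 0])],
    [([0, 1, 2], [0, 1, 2]), ([2, 1], [2, 0])],
]

test_accuracy = []
best_acc = 0.0       # initialized once, before the epoch loop
best_epoch = None
for epoch, batches in enumerate(epochs):
    acc = epoch_accuracy(batches)
    test_accuracy.append(acc)  # one entry per epoch
    if acc > best_acc:         # this is where the model would be saved
        best_acc, best_epoch = acc, epoch

print(test_accuracy, best_epoch)
```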

In [None]:
########################################################################
# 4. TRAIN THE NETWORK
########################################################################

test_accuracy = []
test_loss = []
train_accuracy = []
train_loss = []
net.train()

# best test accuracy seen so far; set once so it persists across epochs
test_min_acc = 0

for epoch in range(EPOCH_NUM):  # loop over the dataset multiple times

	running_loss = 0.0
	total = 0
	correct = 0

	for i, data in enumerate(trainloader, 0):
		# get the inputs
		inputs, labels = data

		# zero the parameter gradients
		optimizer.zero_grad()

		# forward + backward + optimize
		outputs = net(inputs.to(device))
		loss = criterion(outputs, labels.to(device))
		loss.backward()
		optimizer.step()

		# print statistics
		running_loss += loss.item()
		_, predicted = torch.max(outputs.data, 1)

		# FILL IN: Obtain accuracy for the given batch of TRAINING data using
		# the formula acc = 100.0 * correct / total where
		# total is the total number of images processed so far
		# correct is the number of correctly classified images so far

		total += predicted.shape[0]
		correct += np.sum((predicted.cpu() == labels).numpy())

		train_loss.append(loss.item())  # record the loss of this batch
		train_accuracy.append(100.0*correct/total)

		if i % 20 == 19:    # print every 20 mini-batches
			print(
				'Train: [%d, %5d] loss: %.3f acc: %.3f' %
				(epoch + 1, i + 1, running_loss / 20,100.0*correct/total)
			)
			running_loss = 0.0

	# TEST LEARNT MODEL ON ENTIRE TESTSET
	# FILL IN: to get test accuracy on the entire testset and append
	# it to the list test_accuracy

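	running_loss = 0.0
	correct = 0
	total = 0
	net.eval()
	with torch.no_grad():
		for i, data in enumerate(testloader):
			inputs, labels = data
			outputs = net(inputs.to(device))
			# compute the loss on this test batch instead of reusing the last training loss
			loss = criterion(outputs, labels.to(device))
			running_loss += loss.item()
			_, predicted = torch.max(outputs.data, 1)

			total += predicted.shape[0]
			correct += np.sum((predicted.cpu() == labels).numpy())

			# running mean loss and cumulative accuracy over the test batches seen so far
			test_loss.append(running_loss / (i + 1))
			test_accuracy.append(100.0*correct/total)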

	net.train()

	test_ep_acc = test_accuracy[-1]
	print('Test Accuracy: %.3f %%' % (test_ep_acc))

	# SAVE BEST MODEL
	if test_min_acc < test_ep_acc:
		test_min_acc = test_ep_acc
		torch.save(net,MODEL_SAVE_PATH + '/my_best_model.pth')


# PLOT THE TRAINING LOSS VS EPOCH GRAPH
# PLOT THE TESTING ACCURACY VS EPOCH GRAPH
# PRINT THE FINAL TESTING ACCURACY OF YOUR BEST MODEL

print('Finished Training')

## Part 4
Plot the training loss, training accuracy, and test accuracy from the saved variables. Can you infer from the plots whether the model is overfitted, underfitted, or well fitted?

In [None]:
import matplotlib.pyplot as plt
fig, axs = plt.subplots(2, 3, figsize=(15,8))
axs[0,0].set_title("Training Loss")
axs[0,0].plot(train_loss)
axs[0,1].set_title("Testing Loss")
axs[0,1].plot(test_loss)
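# Compare end-of-epoch values so both curves have one point per epoch,
# even when the train and test loaders have different numbers of batches
train_steps, test_steps = len(trainloader), len(testloader)
axs[0,2].set_title("Testing/Training Loss")
axs[0,2].plot(
	np.array(test_loss[test_steps-1::test_steps]) /
	np.array(train_loss[train_steps-1::train_steps])
)
axs[1,0].set_title("Training Accuracy")
axs[1,0].plot(train_accuracy)
axs[1,1].set_title("Testing Accuracy")
axs[1,1].plot(test_accuracy)
axs[1,2].set_title("Testing/Training Accuracy")
axs[1,2].plot(
	np.array(test_accuracy[test_steps-1::test_steps]) /
	np.array(train_accuracy[train_steps-1::train_steps])
)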

The model is well fitted: training and testing accuracy stay very similar throughout training. Testing accuracy generally remains around 90% of training accuracy for most of the run, which indicates slight overfitting, but not enough to significantly affect the model's performance.
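
One way to make this kind of judgment less subjective is to look at the test/train accuracy ratio and the train-minus-test gap per epoch. The sketch below is a minimal illustration; `fit_report` is a hypothetical helper and the accuracy values are made-up examples, not results from this notebook.

```python
def fit_report(train_acc, test_acc):
    """Return per-epoch test/train ratios and train-minus-test gaps."""
    ratios = [te / tr for tr, te in zip(train_acc, test_acc)]
    gaps = [tr - te for tr, te in zip(train_acc, test_acc)]
    return ratios, gaps


# Illustrative per-epoch accuracies in percent (not measured results)
train_acc = [40.0, 55.0, 70.0, 80.0]
test_acc = [38.0, 50.0, 63.0, 68.0]

ratios, gaps = fit_report(train_acc, test_acc)
# A gap that keeps growing from epoch to epoch is a typical sign of overfitting
widening = all(later >= earlier for earlier, later in zip(gaps, gaps[1:]))
print(ratios, gaps, widening)
```

A ratio that stays close to 1 with low accuracy on both sets would instead point toward underfitting.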

In [None]:
from weight_extractor import extract_weights

extract_weights(net, "mininet_weights.json")