**Assignment Details** - 

Write a neural network that can:
take 2 inputs:
an image from the MNIST dataset (say 5), and
a random number between 0 and 9, (say 7)
and gives two outputs:
the "number" that was represented by the MNIST image (predict 5), and
the "sum" of this number with the random number and the input image to the network (predict 5 + 7 = 12)
     
you can mix fully connected layers and convolution layers
you can use one-hot encoding to represent the random number input and the "summed" output.
Random number (7) can be represented as 0 0 0 0 0 0 0 1 0 0
Sum (13) can be represented as:
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0b1101 (remember that 4 digits in binary can at max represent 15, so we may need to go for 5 digits. i.e. 10010
Your code MUST be:
well documented (via readme file on GitHub and comments in the code)
must mention the data representation
must mention your data generation strategy (basically the class/method you are using for random number generation)
must mention how you have combined the two inputs (basically which layer you are combining)
must mention how you are evaluating your results 
must mention "what" results you finally got and how did you evaluate your results
must mention what loss function you picked and why!
training MUST happen on the GPU
Accuracy is not really important for the SUM
Once done, upload the code with short training logs in the readme file from colab to GitHub, and share the GitHub link (public repository)

In [1]:
# Adding Necessary Imports
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import matplotlib.pyplot as plt
from torch.utils.data import Dataset
import random # added for handling random number generation related stuffs

import warnings
warnings.filterwarnings("ignore") 

In [2]:
# Trying to use cuda as device if its available
def get_default_device():
  if torch.cuda.is_available():
      return torch.device('cuda')
  else:
      return torch.device('cpu')
device = get_default_device()
device

device(type='cuda')

In [3]:
# Let's download MNIST data for train and test
# with some default image transformation applied

image_transform = torchvision.transforms.Compose([
                               torchvision.transforms.ToTensor(),
                               torchvision.transforms.Normalize(
                                 (0.13,), (0.31,))
                             ])

mnist_train_dataset = torchvision.datasets.MNIST('dataset/', 
                                           train=True, 
                                           download=True,
                                           transform=image_transform)

mnist_test_dataset = torchvision.datasets.MNIST('dataset/', 
                                          train=False, 
                                          download=True,
                                          transform=image_transform)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to dataset/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]

Extracting dataset/MNIST/raw/train-images-idx3-ubyte.gz to dataset/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to dataset/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s]

Extracting dataset/MNIST/raw/train-labels-idx1-ubyte.gz to dataset/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to dataset/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]

Extracting dataset/MNIST/raw/t10k-images-idx3-ubyte.gz to dataset/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to dataset/MNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/4542 [00:00<?, ?it/s]

Extracting dataset/MNIST/raw/t10k-labels-idx1-ubyte.gz to dataset/MNIST/raw



In [5]:
# We would be building custom mnist dataset where we would put the random number and calculated final sum
# so that errors can be calculated either at individual record level or at batch.
class CustomMNISTDataset(Dataset):
  def __init__(self, mnist_train_data, mnist_train_label, train_random_numbers):
    self.mnist_train_data = mnist_train_data
    self.mnist_train_label = mnist_train_label
    self.train_random_numbers = train_random_numbers
    self.train_final_label = self.mnist_train_label + train_random_numbers

  def __getitem__(self, index):
    return self.mnist_train_data[index].unsqueeze(0), self.train_random_numbers[index], self.mnist_train_label[index], self.train_final_label[index]

  def __len__(self):
    return len(self.mnist_train_data)

# Instantiating custom dataset and defining data loader
customMNISTDataset = CustomMNISTDataset(mnist_train_dataset.train_data,
                                        mnist_train_dataset.train_labels,
                                        torch.randint(0, 9, (len(mnist_train_dataset.train_data),)))
train_loader = torch.utils.data.DataLoader(customMNISTDataset, batch_size = 32, shuffle=True, num_workers=4, pin_memory=True)
# we specified sub processes for data loading for option for parallelism, also specified pin_memory for cuda case

# Custom reusable method to transfer data to the specified device
def to_device(data, device):
  if isinstance(data, (list,tuple)):
      return [to_device(x, device) for x in data]
  return data.to(device, non_blocking=True)

# Transferring data to the available device
for images, randomNumbers, mnist_labels, labels in train_loader:
  images = to_device(images, device)
  randomNumbers = to_device(randomNumbers, device)
  mnist_labels = to_device(mnist_labels, device)
  labels = to_device(labels, device)

In [12]:
# Defining the network - 
class Network(nn.Module):
  def __init__(self):
    super().__init__()

    # Layers working on input 1, that's the MNIST image
    #-----------------------------------------------------
    self.firstNN_conv1 = nn.Conv2d(1, 10, kernel_size=5, stride=1, bias=False) # First convolution with one input (gray) and 10 output channels
    self.firstNN_conv1_drop = nn.Dropout2d() # Drop out layer, I have used here but sometimes we refrain doing it at the first layer
    self.firstNN_conv2 = nn.Conv2d(10, 20, kernel_size=5, stride=1, bias=False) # Second convolution with ten input and 20 output channels (kernel=> 5*5)
    self.firstNN_conv2_drop = nn.Dropout2d() # Drop out layer
    self.firstNN_fc1 = nn.Linear(320, 100) # Fully Connected Layer 1, 320 as input and 100 as output
    self.firstNN_fc2 = nn.Linear(100, 50) # Fully Connected Layer 2, 100 as input and 50 as output
    self.firstNN_fc3 = nn.Linear(50, 10) # Fully Connected Layer 3, 50 as input and, 10 as output
    #----------------------------------------------------

    # Layers working on input 2, that's random number, between 0 and 9
    #-----------------------------------------------------
    self.secondNN_linear1 = nn.Linear(10, 30) # Input channel = 10, output channel = 30
    self.secondNN_linear2 = nn.Linear(30, 10) # Input channel = 30, output channel = 10
    self.secondNN_linear3 = nn.Linear(20, 19) # Input channel = 20, output channel = 19
    #----------------------------------------------------

    #----------------------------------------------------
    # Later we would be doing concatenation on the above outputs (while working at forward method)
    #----------------------------------------------------

  def forward(self, input1, input2):

    
    # Forward steps for first input (input1 (individual) image dimension expected => 28*28 gray scale images)
    #-----------------------------------------------------
    input1 = self.firstNN_conv1(input1) # (28 - 5 + 1) * (28 - 5 + 1) * 10 = 24 * 24 * 10
    input1 = self.firstNN_conv1_drop(input1) # 24 * 24 * 10
    input1 = F.max_pool2d(input1, 2) # (24 // 2) * (24 // 2) * 10 = 12 * 12 * 10
    input1 = F.relu(input1) # 12 * 12 * 10
    input1 = self.firstNN_conv2(input1) # (12 - 5 + 1) * (12 - 5 + 1) * 20 = 8 * 8 * 20
    input1 = self.firstNN_conv2_drop(input1) # 8 * 8 * 20
    input1 = F.max_pool2d(input1, 2) # 4 * 4 * 20
    input1 = F.relu(input1) # 4 * 4 * 20
    input1 = input1.view(-1, 4 * 4 * 20)
    input1 = self.firstNN_fc1(input1) # (4 * 4 * 20) => 320 as input dimension, 100 as output
    input1 = F.relu(input1) # 100
    input1 = F.dropout(input1) # 100
    input1 = self.firstNN_fc2(input1) # 100 as input dimension, 50 as output
    input1 = F.relu(input1) # 50
    input1 = F.dropout(input1) # 50
    input1 = self.firstNN_fc3(input1) # 50 as input dimension, 10 as output
    #-----------------------------------------------------

    
    # Steps for getting second output 
    #-----------------------------------------------------

    # gotten the following error - module 'torch.cuda' has no attribute 'int64'
    input2 = input2.to('cpu')
    input2 = F.one_hot(input2.to(torch.int64), num_classes = 10)
    input2.to(device)
    input2 = self.secondNN_linear1(input2.type(torch.cuda.FloatTensor))
    # 10 as input dimension, 30 as output

    
    input2 = F.relu(input2) # 30
    input2 = self.secondNN_linear2(input2) # 30 as input dimension, 10 as output
    #-----------------------------------------------------

    
    x = torch.cat((input1, input2), dim=1) # so next input dimension would be 10 + 10 = 20
    x = F.relu(x).to(device)
    x = self.secondNN_linear3(x) # 20 as input dimension, 19 as output
    # Why 19 ? because maximum sum result could be 9 + 9, and minimum could be 0 + 0 as 0. So the nodes to represent 
    # the result would be 0 to 18, thats 19 in numbers
    return F.log_softmax(input1), F.log_softmax(x)
    

In [13]:
# Function taken from 24th december class notes, to get correct number of predictions
def get_num_correct(preds, labels):
  return preds.argmax(dim=1).eq(labels).sum().item()

In [14]:
# referred class notes of 24th december, for calculating batchwise errors

network = Network().to(device) # Instantiating our neural network class, copied to the possible cuda device
optimizer = optim.Adam(network.parameters(), lr=0.01) # Using Adam as optimizer, with learning rate as 0.01

epochs = 200
print_after_n = 20

for epoch in range(epochs): # defining the epoch, how many times we would see through the entire training set

    total_loss = 0
    total_correct = 0

    for batch in train_loader: # Get Batch
        images, randomNumbers, mnist_labels, labels = batch 

        if device.type == 'cuda':
          preds_image, preds_final = network(images.type(torch.cuda.FloatTensor), randomNumbers.type(torch.cuda.FloatTensor)) # Pass Batch
        else:
          preds_image, preds_final = network(images.type(torch.FloatTensor), randomNumbers.type(torch.FloatTensor)) # Pass Batch

        loss = F.cross_entropy(preds_image, mnist_labels.to(device)) + F.cross_entropy(preds_final, labels.to(device)) # Calculate Loss

        optimizer.zero_grad()
        loss.backward() # Calculate Gradients
        optimizer.step() # Update Weights

        total_loss += loss.item()

    if epoch % print_after_n == 0:
      print("epoch", epoch, "loss:", total_loss)

epoch 0 loss: 8819.445842266083
epoch 20 loss: 8652.41888666153
epoch 40 loss: 8663.875659942627
epoch 60 loss: 8676.743039608002
epoch 80 loss: 8708.55082321167
epoch 100 loss: 8702.191983222961
epoch 120 loss: 8703.95646905899
epoch 140 loss: 8699.741447925568
epoch 160 loss: 8699.91216993332
epoch 180 loss: 8701.500436782837


Looking at how the loss has reduced, its obvious that any early stopping usage like restricting continuation if loss hasn't decreased below a certain specified level might help reduce overfitting and the amount of compute usage.