# Homework Exercises
**Due: 23rd Feb, 11:59pm**

**Name: Sharryl Seto (1005523)**

**Class: Cl03**
<br>
<br>
Based on the same FashionMNIST dataset, work on the following tasks below. Submit your homework as either: (i) an ipynb file with your results inside; or (ii) a python file and separate pdf discussing your results.

(a) Develop a new feed-forward neural network that contains 3 hidden layers, with hidden layers 1, 2, 3 being of dimensions 512, 256, 128, respectively. Hidden layer 1 is the layer immediately after the input layer, while hidden layer 3 is the one just before the output layer.

(b) Experiment with three different activation functions and two different optimizers. Report your results and discuss your findings.

(c) Building upon Task b above, describe and implement two approaches to improve upon the best variation from Task b. Report your results and discuss your findings.


# Setting up the notebook on Colab

Let's check whether a GPU is available and CUDA is installed:

In [1]:
# Import PyTorch and other libraries
import torch
import numpy as np
from tqdm import tqdm

print("PyTorch version:")
print(torch.__version__)
print("GPU Detected:")
print(torch.cuda.is_available())

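# Flag used later to decide whether to move models and tensors to the GPU.
# (torch.cuda.is_available() works outside Colab too, unlike checking for
# Colab's /opt/bin/nvidia-smi.)
using_GPU = torch.cuda.is_available()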

PyTorch version:
2.2.0+cu121
GPU Detected:
True


In [2]:
# torch nn module
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

In [3]:
# optimisers
import torch.optim as optim

## Loading Data

We'll start by loading the data with `torchvision`. Knowing how to use torchvision isn't the point of this tutorial, so this part is relatively unannotated.

In [4]:
!pip install torchvision==0.17 #note: you can find compatible torch/torchvision versions here: https://github.com/pytorch/vision#installation
import torchvision
from torchvision.datasets import FashionMNIST

train_dataset = FashionMNIST(root='./torchvision-data',
                             train=True,
                             transform=torchvision.transforms.ToTensor(),
                             download=True)

test_dataset = FashionMNIST(root='./torchvision-data', train=False,
                            transform=torchvision.transforms.ToTensor())





Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./torchvision-data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:00<00:00, 109958006.70it/s]


Extracting ./torchvision-data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./torchvision-data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./torchvision-data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 12464245.12it/s]


Extracting ./torchvision-data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./torchvision-data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./torchvision-data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:00<00:00, 67238624.56it/s]


Extracting ./torchvision-data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./torchvision-data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./torchvision-data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 2212550.16it/s]


Extracting ./torchvision-data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./torchvision-data/FashionMNIST/raw



`train_dataset` and `test_dataset` are both subclasses of PyTorch's `torch.utils.data.Dataset`. The main benefit of subclassing this abstract class is that we can use `torch.utils.data.DataLoader`s to handle batching our examples and iterating over them. We'll create `DataLoader`s for our datasets now.

In [5]:
from torch.utils.data import DataLoader

# Data-related hyperparameters
batch_size = 64

# Set up a DataLoader for the training dataset.
train_dataloader = DataLoader(
    dataset=train_dataset, batch_size=batch_size, shuffle=True)

# Set up a DataLoader for the test dataset.
test_dataloader = DataLoader(
    dataset=test_dataset, batch_size=batch_size)

Let's take a look at what's inside our datasets. `torch.utils.data.Dataset`s are indexable, so we can easily peek inside.

In [None]:
# Print the shape of the first training image and its label
image, label = train_dataset[0]
print(image.shape, label)

From this output, we can see the dataset elements are tuples of the form `(data_tensor, label)`. `data_tensor` is a `FloatTensor` of shape `(1, 28, 28)` (one grayscale channel of 28x28 pixels), and `label` is an integer from 0 to 9 (since there are 10 classes in the data).

Let's similarly look at what the `DataLoader` produces.

In [None]:
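# Peek at the first batch (list(train_dataloader) would load every batch)
images, labels = next(iter(train_dataloader))
print(images.shape, labels.shape)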

As we can see, the `DataLoader` groups examples into batches of size `batch_size` (set to 64 above). Thus, the shape of the returned image tensor is `(64, 1, 28, 28)`, since `batch_size` examples are stacked together. Similarly, `labels` is now a `LongTensor` of size `batch_size`.

Note that the label for a single example was a Python `int`; the `DataLoader` automatically collates the labels of a batch into a `LongTensor` of the appropriate size.
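The batching behaviour described above can be checked without downloading anything. The sketch below is a minimal example that assumes a synthetic `TensorDataset` of random 1x28x28 "images" as a stand-in for FashionMNIST; it confirms the batch shape and label dtype, and shows how many batches (and hence gradient updates) one epoch contains.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for FashionMNIST: 1000 random 1x28x28 images, labels 0-9.
images = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)   # torch.Size([64, 1, 28, 28])
print(batch_labels.dtype)   # torch.int64 (a LongTensor)

# The last batch is smaller when the dataset size is not a multiple of 64,
# so the number of batches is ceil(1000 / 64) = 16.
print(len(loader))          # 16

# For the real 60,000-image training set this gives 938 updates per epoch.
print(math.ceil(60000 / 64))  # 938
```

This also explains why "Starting epoch 2" appears between iterations 500 and 1000 in the training logs below: with 938 updates per epoch, epoch 2 begins after update 938.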

# Part (a)
(a) Develop a new feed-forward neural network that contains 3 hidden layers, with hidden layers 1, 2, 3 being of dimensions 512, 256, 128, respectively. Hidden layer 1 is the layer immediately after the input layer, while hidden layer 3 is the one just before the output layer.

## Building the model
Now we can construct the `FeedForwardNN_a` model that we'll train. Each FashionMNIST example is a `28x28` grayscale image, which arrives as a Tensor of shape `(1, 28, 28)`.

We'll flatten out each example to a vector of size `(784,)` for compatibility with our model.
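Flattening is a pure reshape. As a small sketch (using a random batch as an assumed input), `view(-1, 784)`, which the training loop uses, and `nn.Flatten()` produce identical `(batch_size, 784)` tensors:

```python
import torch
import torch.nn as nn

batch = torch.rand(64, 1, 28, 28)  # random stand-in for one DataLoader batch

flat_view = batch.view(-1, 784)     # -1 lets PyTorch infer the batch dimension
flat_module = nn.Flatten()(batch)   # flattens every dim after dim 0 by default

print(flat_view.shape)                      # torch.Size([64, 784])
print(torch.equal(flat_view, flat_module))  # True
```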

In [None]:
class FeedForwardNN_a(nn.Module):
  # input_size: Dimensionality of input feature vector (784 for flattened images).
  # num_classes: The number of classes in the classification problem.
  # num_hidden: The number of hidden layers; only checked to be positive here,
  #   since the three hidden layer sizes below are fixed by the task.
  # dropout: The proportion of units dropped out in each hidden layer.
  def __init__(self, input_size, num_classes, num_hidden, dropout):
    # Always call the superclass (nn.Module) constructor first!
    super(FeedForwardNN_a, self).__init__()

    # Set up the hidden layers.
    assert num_hidden > 0
    # A special ModuleList to store our hidden layers.
    self.hidden_layers = nn.ModuleList([])
    # Hidden layer 1 maps input_size -> 512.
    self.hidden_layers.append(nn.Linear(input_size, 512))
    # Hidden layers 2 and 3 map 512 -> 256 and 256 -> 128, as the task requires.
    # Any sizes work, as long as the output layer produces one score per class.
    self.hidden_layers.append(nn.Linear(512, 256))
    self.hidden_layers.append(nn.Linear(256, 128))

    # Set up the dropout layer.
    self.dropout = nn.Dropout(dropout)

    # Set up the final transform to a distribution over classes.
    self.output_projection = nn.Linear(128, num_classes)

    # Set up the nonlinearity to use between layers.
    self.nonlinearity = nn.ReLU()

  # Forward's sole argument is the input.
  # input is of shape (batch_size, input_size)
  def forward(self, x):
    # Apply the hidden layers, nonlinearity, and dropout.
    for hidden_layer in self.hidden_layers:
      x = hidden_layer(x)
      x = self.dropout(x)
      x = self.nonlinearity(x)

    # Output layer: project x to a distribution over classes.
    out = self.output_projection(x)

    # Softmax the out tensor to get a log-probability distribution
    # over classes for each example.
    out_distribution = F.log_softmax(out, dim=-1)
    return out_distribution

In [None]:
ffnn_clf_a = FeedForwardNN_a(input_size=784, num_classes=10, num_hidden=3,
                         dropout=0.2)
print(ffnn_clf_a)

parameters = ffnn_clf_a.parameters()

print("Shapes of model parameters:")
print([x.size() for x in list(parameters)])

FeedForwardNN_a(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=128, bias=True)
  )
  (dropout): Dropout(p=0.2, inplace=False)
  (output_projection): Linear(in_features=128, out_features=10, bias=True)
  (nonlinearity): ReLU()
)
Shapes of model parameters:
[torch.Size([512, 784]), torch.Size([512]), torch.Size([256, 512]), torch.Size([256]), torch.Size([128, 256]), torch.Size([128]), torch.Size([10, 128]), torch.Size([10])]
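The shapes above determine the model size. A quick sketch that counts trainable parameters with `numel()`; the `nn.Sequential` here is an assumed stand-in with the same Linear layers as `FeedForwardNN_a` (activations and dropout have no parameters, so they do not affect the count):

```python
import torch.nn as nn

# Same Linear layers as FeedForwardNN_a: 784 -> 512 -> 256 -> 128 -> 10.
model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Each Linear(in, out) holds an (out, in) weight matrix plus an (out,) bias.
per_layer = [p.numel() for p in model.parameters()]
total = sum(per_layer)
print(per_layer)  # [401408, 512, 131072, 256, 32768, 128, 1280, 10]
print(total)      # 567434
```

Roughly 70% of the parameters sit in the first layer (784 x 512 weights plus 512 biases), which is why the input layer dominates the model size.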


If we're using a GPU, we'll move the model to the GPU, which should speed up training. `.cuda()` works on a Module just as on a Tensor: it moves all of the module's parameters to the GPU.

In [None]:
if using_GPU:
  ffnn_clf_a = ffnn_clf_a.cuda()

# Check if the Module is on GPU by checking if a parameter is on GPU
print("Model on GPU?:")
print(next(ffnn_clf_a.parameters()).is_cuda)

Model on GPU?:
True
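A common alternative to calling `.cuda()` behind a flag is `torch.device` plus `.to(device)`, which runs unchanged on machines with or without a GPU. A minimal sketch (the single `nn.Linear` model and random input are placeholders):

```python
import torch
import torch.nn as nn

# Pick the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(784, 10).to(device)  # Module.to moves parameters in place
x = torch.rand(8, 784).to(device)       # Tensor.to returns a moved copy

out = model(x)
print(next(model.parameters()).device.type == device.type)  # True
print(out.shape)                        # torch.Size([8, 10])
```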


## Construct other classes we need for training: loss and optimizer

Now, we'll set up a criterion for calculating the loss and an Optimizer for updating our parameters.

In [None]:
# Set up criterion for calculating loss. NLLLoss expects log-probabilities,
# which is what the model's log_softmax output provides.
nll_criterion = nn.NLLLoss()

lr = 0.1
# Set up an SGD optimizer for updating the parameters of ffnn_clf_a
ffnn_optimizer = optim.SGD(ffnn_clf_a.parameters(),
                           lr=lr)
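`nn.NLLLoss` expects log-probabilities, which is why the model ends in `F.log_softmax`. A short check, on assumed random logits, that `log_softmax` followed by `NLLLoss` gives the same value as `nn.CrossEntropyLoss` applied to the raw logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(64, 10)           # raw scores for a batch of 64, 10 classes
targets = torch.randint(0, 10, (64,))  # integer class labels

nll = nn.NLLLoss()(F.log_softmax(logits, dim=-1), targets)
ce = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(nll, ce))  # True
```

So an equivalent design would drop `log_softmax` from `forward` and train with `nn.CrossEntropyLoss`; the predicted class (the argmax) is the same either way, since `log_softmax` is monotonic.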

## Train function!

Now, we'll implement the procedure to train the model. This is typically called the "train loop", since we loop over our batches, performing the forward pass, calculating a loss, backpropagating, and then updating our parameters. This is the bulk of the code necessary to train the model. Note that `train_model` uses the global `nll_criterion` and `ffnn_optimizer`, so those must be set up before each call.

In [10]:
# train function
def train_model(num_epochs, num_iter, fashionmnist_ffnn_clf, optimizer=None):
  # Uses the global `nll_criterion`; the optimizer defaults to the global
  # `ffnn_optimizer` set up in the preceding cell.
  if optimizer is None:
    optimizer = ffnn_optimizer
  # Make sure dropout is active during training.
  fashionmnist_ffnn_clf.train()

  # Iterate `num_epochs` times.
  for epoch in range(num_epochs):
    print("Starting epoch {}".format(epoch + 1))
    # Iterate over the train_dataloader, unpacking the images and labels
    for (images, labels) in train_dataloader:
      # Reshape images from (batch_size, 1, 28, 28) to (batch_size, 784), since
      # that's what our model expects. Remember that -1 does shape inference!
      reshaped_images = images.view(-1, 784)

      # If we're using the GPU, move reshaped_images and labels to the GPU.
      # Plain tensors work with autograd, so the deprecated Variable wrapper
      # is not needed.
      if using_GPU:
        reshaped_images = reshaped_images.cuda()
        labels = labels.cuda()

      # Forward pass: predicted log distribution of shape (batch_size, 10).
      predicted = fashionmnist_ffnn_clf(reshaped_images)

      # Calculate the loss
      batch_loss = nll_criterion(predicted, labels)

      # Clear old gradients, backprop to compute new ones, then take a step.
      optimizer.zero_grad()
      batch_loss.backward()
      optimizer.step()

      # Increment gradient update counter.
      num_iter += 1

      # Calculate test set loss and accuracy every 500 gradient updates.
      # It's standard to have this as a separate evaluate function, but
      # we'll place it inline for didactic purposes.
      if num_iter % 500 == 0:
        # Set model to eval mode, which turns off dropout.
        fashionmnist_ffnn_clf.eval()
        # Counters for the num of examples we get right / total num of examples.
        num_correct = 0
        total_examples = 0
        total_test_loss = 0.0

        # torch.no_grad() replaces the removed `volatile=True` flag: no
        # autograd history is recorded, which speeds up inference.
        with torch.no_grad():
          for (test_images, test_labels) in test_dataloader:
            reshaped_test_images = test_images.view(-1, 784)
            if using_GPU:
              reshaped_test_images = reshaped_test_images.cuda()
              test_labels = test_labels.cuda()

            predicted = fashionmnist_ffnn_clf(reshaped_test_images)

            # The loss is averaged over the batch, so multiply by the batch
            # size to get a total.
            total_test_loss += nll_criterion(
                predicted, test_labels).item() * test_labels.size(0)

            # Predicted labels are the argmax over the class dimension.
            _, predicted_labels = torch.max(predicted, 1)

            total_examples += test_labels.size(0)
            # (predicted_labels == test_labels) is a BoolTensor; summing it
            # counts the correct predictions.
            num_correct += (predicted_labels == test_labels).sum().item()

        accuracy = 100 * num_correct / total_examples
        average_test_loss = total_test_loss / total_examples
        print("Iteration {}. Test Loss {}. Test Accuracy {}.".format(
            num_iter, average_test_loss, accuracy))
        # Set the model back to train mode, which activates dropout again.
        fashionmnist_ffnn_clf.train()
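As the comment in the loop notes, evaluation is usually factored out into its own function. A minimal sketch of such a helper, `evaluate` (a hypothetical name), that mirrors the inline logic: eval mode, `torch.no_grad()`, summed loss and accuracy, then a return to train mode. The tiny model and random data in the usage lines are placeholders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Return (average loss, accuracy in %) of `model` over `loader`."""
    device = next(model.parameters()).device
    model.eval()                          # disable dropout
    total_loss, num_correct, total = 0.0, 0, 0
    with torch.no_grad():                 # no autograd history needed
        for images, labels in loader:
            images = images.view(images.size(0), -1).to(device)
            labels = labels.to(device)
            log_probs = model(images)
            # criterion averages over the batch, so re-weight by batch size
            total_loss += criterion(log_probs, labels).item() * labels.size(0)
            num_correct += (log_probs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    model.train()                         # re-enable dropout for training
    return total_loss / total, 100.0 * num_correct / total

# Usage with placeholder data and a placeholder model.
data = TensorDataset(torch.rand(200, 1, 28, 28), torch.randint(0, 10, (200,)))
model = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=-1))
loss, acc = evaluate(model, DataLoader(data, batch_size=64), nn.NLLLoss())
print("Test Loss {:.4f}. Test Accuracy {:.2f}.".format(loss, acc))
```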

In [None]:
train_model(10,0,ffnn_clf_a)

Starting epoch 1


Iteration 500. Test Loss 0.6076332926750183. Test Accuracy 76.5199966430664.
Starting epoch 2
Iteration 1000. Test Loss 0.48759403824806213. Test Accuracy 82.18999481201172.
Iteration 1500. Test Loss 0.45207878947257996. Test Accuracy 83.3499984741211.
Starting epoch 3
Iteration 2000. Test Loss 0.43110692501068115. Test Accuracy 84.18000030517578.
Iteration 2500. Test Loss 0.4130193889141083. Test Accuracy 85.05999755859375.
Starting epoch 4
Iteration 3000. Test Loss 0.40974515676498413. Test Accuracy 85.47999572753906.
Iteration 3500. Test Loss 0.39064595103263855. Test Accuracy 85.55999755859375.
Starting epoch 5
Iteration 4000. Test Loss 0.408939391374588. Test Accuracy 85.31999969482422.
Iteration 4500. Test Loss 0.3904130160808563. Test Accuracy 85.25.
Starting epoch 6
Iteration 5000. Test Loss 0.37169697880744934. Test Accuracy 86.54000091552734.
Iteration 5500. Test Loss 0.3568299114704132. Test Accuracy 86.66999816894531.
Starting epoch 7
Iteration 6000. Test Loss 0.35452386736

# Part (b)

(b) Experiment with three different activation functions and two different optimizers. Report your results and discuss your findings.

Three activation functions: `nn.ReLU()`, `nn.Tanh()`, `nn.Sigmoid()`

Two optimizers: Adam and RMSprop (`optim.Adam`, `optim.RMSprop`), both with lr = 0.005.
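One property often used to explain differences between these activations is the size of their gradients. The sketch below uses autograd to compare the derivatives of the three activations at a few arbitrarily chosen points: the sigmoid derivative never exceeds 0.25, tanh's never exceeds 1, and both shrink quickly for large |x|, whereas ReLU passes a gradient of exactly 1 for any positive input. This is a general property, not by itself an explanation of the results below.

```python
import torch
import torch.nn as nn

activations = {"ReLU": nn.ReLU(), "Tanh": nn.Tanh(), "Sigmoid": nn.Sigmoid()}
points = [-3.0, 0.0, 3.0]

grads = {}
for name, act in activations.items():
    x = torch.tensor(points, requires_grad=True)
    # Summing makes backward() return the element-wise derivative at each point.
    act(x).sum().backward()
    grads[name] = x.grad.tolist()

for name, g in grads.items():
    print(name, [round(v, 4) for v in g])
# ReLU [0.0, 0.0, 1.0]   (PyTorch uses 0 as the ReLU gradient at x = 0)
# Tanh [0.0099, 1.0, 0.0099]
# Sigmoid [0.0452, 0.25, 0.0452]
```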

In [23]:
class FeedForwardNN_b(nn.Module):
  # input_size: Dimensionality of input feature vector (784 for flattened images).
  # num_classes: The number of classes in the classification problem.
  # num_hidden: The number of hidden layers; only checked to be positive here,
  #   since the three hidden layer sizes below are fixed.
  # dropout: The proportion of units dropped out in each hidden layer.
  # act_func: Name of the nonlinearity: "ReLU", "Tanh" or "Sigmoid".
  def __init__(self, input_size, num_classes, num_hidden, dropout, act_func):
    # Always call the superclass (nn.Module) constructor first!
    super(FeedForwardNN_b, self).__init__()

    # Set up the hidden layers.
    assert num_hidden > 0
    # A special ModuleList to store our hidden layers.
    self.hidden_layers = nn.ModuleList([])
    # Hidden layers: input_size -> 512 -> 256 -> 128, as in Part (a).
    self.hidden_layers.append(nn.Linear(input_size, 512))
    self.hidden_layers.append(nn.Linear(512, 256))
    self.hidden_layers.append(nn.Linear(256, 128))

    # Set up the dropout layer.
    self.dropout = nn.Dropout(dropout)

    # Set up the final transform to a distribution over classes.
    self.output_projection = nn.Linear(128, num_classes)

    # Set up the nonlinearity to use between layers.
    if act_func == "ReLU":
      self.nonlinearity = nn.ReLU()
    elif act_func == "Tanh":
      self.nonlinearity = nn.Tanh()
    elif act_func == "Sigmoid":
      self.nonlinearity = nn.Sigmoid()
    else:
      raise ValueError("Unknown act_func: {}".format(act_func))

  # Forward's sole argument is the input.
  # input is of shape (batch_size, input_size)
  def forward(self, x):
    # Apply each hidden layer, then dropout, then the nonlinearity. Because
    # dropout comes before the nonlinearity, a dropped unit outputs
    # nonlinearity(0): 0 for ReLU and Tanh, but 0.5 for Sigmoid.
    for hidden_layer in self.hidden_layers:
      x = hidden_layer(x)
      x = self.dropout(x)
      x = self.nonlinearity(x)

    # Output layer: project x to a distribution over classes.
    out = self.output_projection(x)

    # Softmax the out tensor to get a log-probability distribution
    # over classes for each example.
    out_distribution = F.log_softmax(out, dim=-1)
    return out_distribution
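The order in `forward` matters for the sigmoid variant: dropout is applied to the pre-activation, before the nonlinearity. The sketch below uses a hand-written mask as an assumed stand-in for one dropout draw (and omits the 1/(1-p) rescaling that `nn.Dropout` applies to kept units). A dropped unit becomes 0 after ReLU, but 0.5 after Sigmoid, so with Sigmoid the "dropped" units still pass a constant signal forward.

```python
import torch

pre_activation = torch.tensor([2.0, 2.0, 2.0])
mask = torch.tensor([1.0, 0.0, 1.0])  # hand-made dropout mask: unit 1 is dropped

# Dropout before the activation (the order used in FeedForwardNN_b.forward).
relu_after_drop = torch.relu(pre_activation * mask)
sigmoid_after_drop = torch.sigmoid(pre_activation * mask)

# Dropout after the activation (the more common Linear -> act -> Dropout order).
drop_after_sigmoid = torch.sigmoid(pre_activation) * mask

print(relu_after_drop[1].item())     # 0.0
print(sigmoid_after_drop[1].item())  # 0.5
print(drop_after_sigmoid[1].item())  # 0.0
```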

## Variation 1A: ReLU + Adam

In [None]:
# Variation 1: ReLU
ffnn_clf_b1 = FeedForwardNN_b(input_size=784, num_classes=10, num_hidden=3,
                         dropout=0.2, act_func="ReLU")
print(ffnn_clf_b1)

FeedForwardNN_b(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=128, bias=True)
  )
  (dropout): Dropout(p=0.2, inplace=False)
  (output_projection): Linear(in_features=128, out_features=10, bias=True)
  (nonlinearity): ReLU()
)


In [None]:
if using_GPU:
  ffnn_clf_b1 = ffnn_clf_b1.cuda()

# Check if the Module is on GPU by checking if a parameter is on GPU
print("Model on GPU?:")
print(next(ffnn_clf_b1.parameters()).is_cuda)

Model on GPU?:
True


In [None]:
# Set up criterion for calculating loss
nll_criterion = nn.NLLLoss()

lr = 0.005
# Set up an Adam optimizer for updating the parameters of ffnn_clf_b1
ffnn_optimizer = optim.Adam(ffnn_clf_b1.parameters(),
                           lr=lr)

In [None]:
# Train model
train_model(10, 0, ffnn_clf_b1)

Starting epoch 1


Iteration 500. Test Loss 0.5165243744850159. Test Accuracy 82.47000122070312.
Starting epoch 2
Iteration 1000. Test Loss 0.47114139795303345. Test Accuracy 83.54000091552734.
Iteration 1500. Test Loss 0.4532720148563385. Test Accuracy 84.1500015258789.
Starting epoch 3
Iteration 2000. Test Loss 0.44893041253089905. Test Accuracy 84.38999938964844.
Iteration 2500. Test Loss 0.45002636313438416. Test Accuracy 84.06999969482422.
Starting epoch 4
Iteration 3000. Test Loss 0.4376470148563385. Test Accuracy 84.57999420166016.
Iteration 3500. Test Loss 0.4363155663013458. Test Accuracy 84.65999603271484.
Starting epoch 5
Iteration 4000. Test Loss 0.4360687732696533. Test Accuracy 85.05999755859375.
Iteration 4500. Test Loss 0.4236913323402405. Test Accuracy 84.97999572753906.
Starting epoch 6
Iteration 5000. Test Loss 0.43823176622390747. Test Accuracy 84.75.
Iteration 5500. Test Loss 0.42339402437210083. Test Accuracy 85.30999755859375.
Starting epoch 7
Iteration 6000. Test Loss 0.4159811437

## Variation 1B: ReLU + RMSProp

In [None]:
# Set up criterion for calculating loss
nll_criterion = nn.NLLLoss()

lr = 0.005
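# Set up an RMSprop optimizer for updating the parameters of ffnn_clf_b1.
# Note: ffnn_clf_b1 was already trained with Adam above, so this run continues
# from those weights rather than from a fresh initialisation.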
ffnn_optimizer = optim.RMSprop(ffnn_clf_b1.parameters(),
                           lr=lr)

In [None]:
# Train model
train_model(10, 0, ffnn_clf_b1)

Starting epoch 1


Iteration 500. Test Loss 0.482811838388443. Test Accuracy 85.0.
Starting epoch 2
Iteration 1000. Test Loss 0.45666414499282837. Test Accuracy 84.97999572753906.
Iteration 1500. Test Loss 0.42218005657196045. Test Accuracy 85.70999908447266.
Starting epoch 3
Iteration 2000. Test Loss 0.43939346075057983. Test Accuracy 85.7699966430664.
Iteration 2500. Test Loss 0.4716459810733795. Test Accuracy 83.95999908447266.
Starting epoch 4
Iteration 3000. Test Loss 0.4900510013103485. Test Accuracy 85.0.
Iteration 3500. Test Loss 0.46308377385139465. Test Accuracy 84.72999572753906.
Starting epoch 5
Iteration 4000. Test Loss 0.4534662067890167. Test Accuracy 85.11000061035156.
Iteration 4500. Test Loss 0.44308093190193176. Test Accuracy 85.36000061035156.
Starting epoch 6
Iteration 5000. Test Loss 0.49480628967285156. Test Accuracy 84.06999969482422.
Iteration 5500. Test Loss 0.4244786500930786. Test Accuracy 86.12999725341797.
Starting epoch 7
Iteration 6000. Test Loss 0.42973342537879944. Test 

## Variation 2A: Tanh + Adam




In [None]:
# Variation 2: Tanh
ffnn_clf_b2 = FeedForwardNN_b(input_size=784, num_classes=10, num_hidden=3,
                         dropout=0.2, act_func="Tanh")
print(ffnn_clf_b2)

FeedForwardNN_b(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=128, bias=True)
  )
  (dropout): Dropout(p=0.2, inplace=False)
  (output_projection): Linear(in_features=128, out_features=10, bias=True)
  (nonlinearity): Tanh()
)


In [None]:
if using_GPU:
  ffnn_clf_b2 = ffnn_clf_b2.cuda()

# Check if the Module is on GPU by checking if a parameter is on GPU
print("Model on GPU?:")
print(next(ffnn_clf_b2.parameters()).is_cuda)

Model on GPU?:
True


In [None]:
# Set up criterion for calculating loss
nll_criterion = nn.NLLLoss()

lr = 0.005
# Set up an Adam optimizer for updating the parameters of ffnn_clf_b2
ffnn_optimizer = optim.Adam(ffnn_clf_b2.parameters(),
                           lr=lr)

In [None]:
# Train model
train_model(10, 0, ffnn_clf_b2)

Starting epoch 1


Iteration 500. Test Loss 0.6670293807983398. Test Accuracy 77.58999633789062.
Starting epoch 2
Iteration 1000. Test Loss 0.5968989133834839. Test Accuracy 78.16999816894531.
Iteration 1500. Test Loss 0.5965826511383057. Test Accuracy 80.41999816894531.
Starting epoch 3
Iteration 2000. Test Loss 0.6133263111114502. Test Accuracy 80.02999877929688.
Iteration 2500. Test Loss 0.5700254440307617. Test Accuracy 80.04000091552734.
Starting epoch 4
Iteration 3000. Test Loss 0.5751953721046448. Test Accuracy 80.79000091552734.
Iteration 3500. Test Loss 0.5977470874786377. Test Accuracy 80.0199966430664.
Starting epoch 5
Iteration 4000. Test Loss 0.6000685691833496. Test Accuracy 77.54999542236328.
Iteration 4500. Test Loss 0.5947855114936829. Test Accuracy 80.68000030517578.
Starting epoch 6
Iteration 5000. Test Loss 0.6131851077079773. Test Accuracy 79.45999908447266.
Iteration 5500. Test Loss 0.5812848210334778. Test Accuracy 78.29999542236328.
Starting epoch 7
Iteration 6000. Test Loss 0.562

## Variation 2B: Tanh + RMSprop


In [None]:
# Set up criterion for calculating loss
nll_criterion = nn.NLLLoss()

lr = 0.005
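# Set up an RMSprop optimizer for updating the parameters of ffnn_clf_b2.
# Note: ffnn_clf_b2 was already trained with Adam above, so this run continues
# from those weights rather than from a fresh initialisation.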
ffnn_optimizer = optim.RMSprop(ffnn_clf_b2.parameters(),
                           lr=lr)

In [None]:
# Train model
train_model(10, 0, ffnn_clf_b2)

Starting epoch 1


Iteration 500. Test Loss 0.5585297346115112. Test Accuracy 81.04000091552734.
Starting epoch 2
Iteration 1000. Test Loss 0.5351490378379822. Test Accuracy 82.43999481201172.
Iteration 1500. Test Loss 0.5927719473838806. Test Accuracy 80.3699951171875.
Starting epoch 3
Iteration 2000. Test Loss 0.5723657011985779. Test Accuracy 81.3499984741211.
Iteration 2500. Test Loss 0.5930774807929993. Test Accuracy 80.0.
Starting epoch 4
Iteration 3000. Test Loss 0.553825855255127. Test Accuracy 81.40999603271484.
Iteration 3500. Test Loss 0.5374708771705627. Test Accuracy 81.54000091552734.
Starting epoch 5
Iteration 4000. Test Loss 0.5385926961898804. Test Accuracy 82.3699951171875.
Iteration 4500. Test Loss 0.5849688649177551. Test Accuracy 80.61000061035156.
Starting epoch 6
Iteration 5000. Test Loss 0.5565933585166931. Test Accuracy 81.47999572753906.
Iteration 5500. Test Loss 0.5417063236236572. Test Accuracy 81.50999450683594.
Starting epoch 7
Iteration 6000. Test Loss 0.5608379244804382. T

## Variation 3A: Sigmoid + Adam

In [None]:
# Variation 3: Sigmoid
ffnn_clf_b3 = FeedForwardNN_b(input_size=784, num_classes=10, num_hidden=3,
                         dropout=0.2, act_func="Sigmoid")
print(ffnn_clf_b3)

FeedForwardNN_b(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=128, bias=True)
  )
  (dropout): Dropout(p=0.2, inplace=False)
  (output_projection): Linear(in_features=128, out_features=10, bias=True)
  (nonlinearity): Sigmoid()
)


In [None]:
if using_GPU:
  ffnn_clf_b3 = ffnn_clf_b3.cuda()

# Check if the Module is on GPU by checking if a parameter is on GPU
print("Model on GPU?:")
print(next(ffnn_clf_b3.parameters()).is_cuda)

Model on GPU?:
True


In [None]:
# Set up criterion for calculating loss
nll_criterion = nn.NLLLoss()

lr = 0.005
# Set up an Adam optimizer for updating the parameters of ffnn_clf_b3
ffnn_optimizer = optim.Adam(ffnn_clf_b3.parameters(),
                           lr=lr)

In [None]:
# Train model
train_model(10, 0, ffnn_clf_b3)

Starting epoch 1


Iteration 500. Test Loss 0.63265460729599. Test Accuracy 77.30999755859375.
Starting epoch 2
Iteration 1000. Test Loss 0.5842671990394592. Test Accuracy 80.88999938964844.
Iteration 1500. Test Loss 0.596457302570343. Test Accuracy 81.7699966430664.
Starting epoch 3
Iteration 2000. Test Loss 0.5717397332191467. Test Accuracy 82.32999420166016.
Iteration 2500. Test Loss 0.5036637783050537. Test Accuracy 83.88999938964844.
Starting epoch 4
Iteration 3000. Test Loss 0.47630736231803894. Test Accuracy 84.56999969482422.
Iteration 3500. Test Loss 0.5322922468185425. Test Accuracy 84.12999725341797.
Starting epoch 5
Iteration 4000. Test Loss 0.5059822201728821. Test Accuracy 84.33999633789062.
Iteration 4500. Test Loss 0.49571073055267334. Test Accuracy 85.00999450683594.
Starting epoch 6
Iteration 5000. Test Loss 0.5630818009376526. Test Accuracy 83.86000061035156.
Iteration 5500. Test Loss 0.49617528915405273. Test Accuracy 85.04000091552734.
Starting epoch 7
Iteration 6000. Test Loss 0.515

## Variation 3B: Sigmoid + RMSprop

In [None]:
# Set up criterion for calculating loss
nll_criterion = nn.NLLLoss()

lr = 0.005
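# Set up an RMSprop optimizer for updating the parameters of ffnn_clf_b3.
# Note: ffnn_clf_b3 was already trained with Adam above, so this run continues
# from those weights rather than from a fresh initialisation.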
ffnn_optimizer = optim.RMSprop(ffnn_clf_b3.parameters(),
                           lr=lr)

In [None]:
# Train model
train_model(10, 0, ffnn_clf_b3)

Starting epoch 1


Iteration 500. Test Loss 0.43482404947280884. Test Accuracy 86.18000030517578.
Starting epoch 2
Iteration 1000. Test Loss 0.45174023509025574. Test Accuracy 85.93999481201172.
Iteration 1500. Test Loss 0.4200459420681. Test Accuracy 86.41999816894531.
Starting epoch 3
Iteration 2000. Test Loss 0.43779149651527405. Test Accuracy 86.37999725341797.
Iteration 2500. Test Loss 0.4491100013256073. Test Accuracy 85.72000122070312.
Starting epoch 4
Iteration 3000. Test Loss 0.43674716353416443. Test Accuracy 86.25999450683594.
Iteration 3500. Test Loss 0.4236128330230713. Test Accuracy 86.55999755859375.
Starting epoch 5
Iteration 4000. Test Loss 0.43589621782302856. Test Accuracy 86.22999572753906.
Iteration 4500. Test Loss 0.4324311316013336. Test Accuracy 86.3499984741211.
Starting epoch 6
Iteration 5000. Test Loss 0.4161583483219147. Test Accuracy 86.6500015258789.
Iteration 5500. Test Loss 0.45435765385627747. Test Accuracy 86.11000061035156.
Starting epoch 7
Iteration 6000. Test Loss 0.4

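**Summary**

lr = 0.005 and 10 epochs for every run; final reported test loss and test accuracy.

Variation 1: ReLU + Adam / RMSprop
- Test Loss 0.4514 / 0.4655
- Test Accuracy 85.11% / 85.02%

Variation 2: Tanh + Adam / RMSprop
- Test Loss 0.6140 / 0.5963
- Test Accuracy 79.83% / 80.32%

Variation 3: Sigmoid + Adam / RMSprop
- Test Loss 0.4700 / 0.4451
- Test Accuracy 86.05% / 86.12%

Note that each RMSprop run reuses the model instance that had just been trained with Adam for 10 epochs, so the RMSprop numbers reflect 20 epochs of training in total rather than 10 from a fresh initialisation. For example, Variation 3B already reaches 86.18% at iteration 500, whereas a freshly initialised Sigmoid + RMSprop model in Part (c) reaches 58.10% at iteration 500.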
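Each RMSprop run in Part (b) continues training the model instance created for its activation. A cleaner comparison rebuilds the model for every (activation, optimizer) pair from the same seed. A minimal sketch, assuming a dropout-free `nn.Sequential` with the Part (a) layer sizes and a single random batch in place of `train_dataloader`; `build_model` and `run_grid` are hypothetical helper names, and the losses it produces are not comparable to the results above.

```python
import itertools
import torch
import torch.nn as nn
import torch.optim as optim

ACTIVATIONS = {"ReLU": nn.ReLU, "Tanh": nn.Tanh, "Sigmoid": nn.Sigmoid}
OPTIMIZERS = {"Adam": optim.Adam, "RMSprop": optim.RMSprop}

def build_model(act_name):
    # Hypothetical helper: same layer sizes as Part (a), no dropout.
    act = ACTIVATIONS[act_name]
    return nn.Sequential(
        nn.Linear(784, 512), act(), nn.Linear(512, 256), act(),
        nn.Linear(256, 128), act(), nn.Linear(128, 10), nn.LogSoftmax(dim=-1))

def run_grid(x, y, steps=5, lr=0.005, seed=0):
    results = {}
    for act_name, opt_name in itertools.product(ACTIVATIONS, OPTIMIZERS):
        torch.manual_seed(seed)  # every combination starts from identical weights
        model = build_model(act_name)  # fresh model: no reuse across optimizers
        optimizer = OPTIMIZERS[opt_name](model.parameters(), lr=lr)
        criterion = nn.NLLLoss()
        for _ in range(steps):
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        results[(act_name, opt_name)] = loss.item()  # training loss, last step
    return results

x = torch.randn(32, 784)           # placeholder batch of flattened images
y = torch.randint(0, 10, (32,))    # placeholder labels
results = run_grid(x, y)
for key, value in results.items():
    print(key, round(value, 4))
```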

# Part (c)

(c) Building upon Task b above, describe and implement two approaches to improve upon the best variation from Task b. Report your results and discuss your findings.


Best variation from Task (b): **Sigmoid + RMSprop** (highest test accuracy and lowest test loss in the summary above)


### Method 1: Increase number of epochs and number of hidden layers (to a limit)


In [24]:
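# Note: FeedForwardNN_b only checks num_hidden > 0, so num_hidden=5 still
# builds 3 hidden layers (see the printout); this run only changes the epochs.
ffnn_clf_c = FeedForwardNN_b(input_size=784, num_classes=10, num_hidden=5,
                         dropout=0.2, act_func="Sigmoid")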
print(ffnn_clf_c)

FeedForwardNN_b(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=128, bias=True)
  )
  (dropout): Dropout(p=0.2, inplace=False)
  (output_projection): Linear(in_features=128, out_features=10, bias=True)
  (nonlinearity): Sigmoid()
)


In [25]:
if using_GPU:
  ffnn_clf_c = ffnn_clf_c.cuda()

# Check if the Module is on GPU by checking if a parameter is on GPU
print("Model on GPU?:")
print(next(ffnn_clf_c.parameters()).is_cuda)

Model on GPU?:
True


In [26]:
# Set up criterion for calculating loss
nll_criterion = nn.NLLLoss()

lr = 0.005
# Set up an RMSprop optimizer for updating the parameters of ffnn_clf_c
ffnn_optimizer = optim.RMSprop(ffnn_clf_c.parameters(),
                           lr=lr)

In [None]:
# Train with 15 epochs (Part (b) used 10)
train_model(15, 0, ffnn_clf_c)

Starting epoch 1


Iteration 500. Test Loss 1.080093502998352. Test Accuracy 58.099998474121094.
Starting epoch 2
Iteration 1000. Test Loss 0.6740158200263977. Test Accuracy 77.0199966430664.
Iteration 1500. Test Loss 0.5758374333381653. Test Accuracy 80.73999786376953.
Starting epoch 3
Iteration 2000. Test Loss 0.6483395099639893. Test Accuracy 79.30999755859375.
Iteration 2500. Test Loss 0.5413018465042114. Test Accuracy 83.13999938964844.
Starting epoch 4
Iteration 3000. Test Loss 0.5594813823699951. Test Accuracy 82.98999786376953.
Iteration 3500. Test Loss 0.4925706386566162. Test Accuracy 84.50999450683594.
Starting epoch 5
Iteration 4000. Test Loss 0.4928855299949646. Test Accuracy 84.87999725341797.
Iteration 4500. Test Loss 0.49439021944999695. Test Accuracy 85.00999450683594.
Starting epoch 6
Iteration 5000. Test Loss 0.4545965790748596. Test Accuracy 85.72000122070312.
Iteration 5500. Test Loss 0.46522656083106995. Test Accuracy 84.75.
Starting epoch 7
Iteration 6000. Test Loss 0.4562705457210

#### Finding 1:
- Increasing the number of epochs from 10 to 15 improves the accuracy only slightly. The trend shows that the test accuracy oscillates around a maximum of about 86.5% instead of continuing to rise.

Result (10 vs. 15 epochs)
- Test Loss decreased by about 0.04, from 0.4451 to 0.4038.
- Test Accuracy increased by about 0.6 percentage points, from 86.12% to 86.74%.
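
Reading a plateau off the log by eye is error-prone. The sketch below shows one way to summarise a training log. It assumes the `Iteration N. Test Loss X. Test Accuracy Y.` format that `train_model` prints. `summarise_log` is a hypothetical helper, not part of the notebook, and it uses only the standard library.

```python
import re
from statistics import mean

# Matches the evaluation lines printed by train_model, e.g.
# "Iteration 500. Test Loss 1.08. Test Accuracy 58.1."
EVAL_LINE = re.compile(
    r"Iteration (\d+)\. Test Loss ([\d.]+)\. Test Accuracy ([\d.]+)\."
)


def summarise_log(log_text, last_k=4):
    """Hypothetical helper: (best accuracy, its iteration, mean of the last k accuracies)."""
    records = [(int(it), float(acc)) for it, _loss, acc in EVAL_LINE.findall(log_text)]
    if not records:
        raise ValueError("no evaluation lines found")
    best_it, best_acc = max(records, key=lambda r: r[1])
    tail_mean = mean(acc for _, acc in records[-last_k:])
    return best_acc, best_it, tail_mean


# Epochs 5-7 of the 15-epoch run above; the truncated last line does not match and is skipped.
log = """Starting epoch 5
Iteration 4000. Test Loss 0.4928855299949646. Test Accuracy 84.87999725341797.
Iteration 4500. Test Loss 0.49439021944999695. Test Accuracy 85.00999450683594.
Starting epoch 6
Iteration 5000. Test Loss 0.4545965790748596. Test Accuracy 85.72000122070312.
Iteration 5500. Test Loss 0.46522656083106995. Test Accuracy 84.75.
Starting epoch 7
Iteration 6000. Test Loss 0.4562705457210"""

best_acc, best_it, tail_mean = summarise_log(log)
print(round(best_acc, 2), best_it, round(tail_mean, 2))  # → 85.72 5000 85.09
```

You can apply the same summary to the full log. If the mean of the last few evaluations stays close to the best value, the accuracy has plateaued rather than still improving.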

In [41]:
# Variant of FeedForwardNN_b with a fourth hidden layer (128 -> 64)
class FeedForwardNN_c(nn.Module):
  # input_size: Dimensionality of input feature vector.
  # num_classes: The number of classes in the classification problem.
  # num_hidden: Only checked to be positive; the four hidden layer sizes
  #   (512, 256, 128, 64) are fixed below.
  # dropout: The proportion of units to drop out after each layer.
  def __init__(self, input_size, num_classes, num_hidden, dropout):
    # Always call the superclass (nn.Module) constructor first!
    super(FeedForwardNN_c, self).__init__()

    # Set up the hidden layers.
    assert num_hidden > 0
    # A special ModuleList to store our hidden layers.
    self.hidden_layers = nn.ModuleList([])
    # Hidden layer 1 maps input_size -> 512.
    self.hidden_layers.append(nn.Linear(input_size, 512))
    # Hidden layers 2-4 halve the width each time: 512 -> 256 -> 128 -> 64.
    # They can map to any dimensionality, as long as the final output
    # is a distribution over the classes.
    self.hidden_layers.append(nn.Linear(512, 256))
    self.hidden_layers.append(nn.Linear(256, 128))
    self.hidden_layers.append(nn.Linear(128, 64))

    # Set up the dropout layer.
    self.dropout = nn.Dropout(dropout)

    # Set up the final transform to a distribution over classes.
    self.output_projection = nn.Linear(64, num_classes)

    # Set up the nonlinearity to use between layers.
    self.nonlinearity = nn.Sigmoid()

  # Forward's sole argument is the input.
  # input is of shape (batch_size, input_size)
  def forward(self, x):
    # Apply each hidden layer, then dropout, then the nonlinearity.
    for hidden_layer in self.hidden_layers:
      x = hidden_layer(x)
      x = self.dropout(x)
      x = self.nonlinearity(x)

    # Output layer: project x to a distribution over classes.
    out = self.output_projection(x)

    # Softmax the out tensor to get a log-probability distribution
    # over classes for each example.
    out_distribution = F.log_softmax(out, dim=-1)
    return out_distribution
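
In `forward`, each hidden layer's output goes through dropout *before* the sigmoid. In training mode, `nn.Dropout` zeroes each element with probability `p` and rescales the survivors by `1/(1-p)`. With this ordering, a dropped unit therefore still passes sigmoid(0) = 0.5 to the next layer instead of 0. The plain-Python sketch below contrasts this with the more common Linear → Sigmoid → Dropout ordering. It does not use PyTorch, and the pre-activation values and the fixed dropout mask are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def inverted_dropout(values, mask, p):
    # Training-mode dropout: zero the dropped units, rescale the survivors by 1/(1-p).
    return [v / (1.0 - p) if keep else 0.0 for v, keep in zip(values, mask)]


pre_activation = [2.0, -1.0, 0.5]  # made-up output of a Linear layer
mask = [True, False, True]         # fixed mask: the second unit is dropped
p = 0.2                            # same dropout rate as ffnn_clf_c1

# Ordering used in FeedForwardNN_c.forward: Linear -> Dropout -> Sigmoid
dropout_then_sigmoid = [sigmoid(v) for v in inverted_dropout(pre_activation, mask, p)]
# Common alternative: Linear -> Sigmoid -> Dropout
sigmoid_then_dropout = inverted_dropout([sigmoid(v) for v in pre_activation], mask, p)

print(dropout_then_sigmoid[1])  # → 0.5 (dropped unit still contributes)
print(sigmoid_then_dropout[1])  # → 0.0 (dropped unit is silenced)
```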

In [42]:
ffnn_clf_c1 = FeedForwardNN_c(input_size=784, num_classes=10, num_hidden=4,
                              dropout=0.2)
print(ffnn_clf_c1)

FeedForwardNN_c(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=128, bias=True)
    (3): Linear(in_features=128, out_features=64, bias=True)
  )
  (dropout): Dropout(p=0.2, inplace=False)
  (output_projection): Linear(in_features=64, out_features=10, bias=True)
  (nonlinearity): Sigmoid()
)


In [43]:
if using_GPU:
  ffnn_clf_c1 = ffnn_clf_c1.cuda()

# Check if the Module is on GPU by checking if a parameter is on GPU
print("Model on GPU?:")
print(next(ffnn_clf_c1.parameters()).is_cuda)

Model on GPU?:
True


In [52]:
# Set up criterion for calculating loss
nll_criterion = nn.NLLLoss()

lr = 0.005
# Set up an optimizer for updating the parameters of ffnn_clf_c1
ffnn_optimizer = optim.RMSprop(ffnn_clf_c1.parameters(), lr=lr)

In [53]:
# Train model
train_model(10, 0, ffnn_clf_c1)

Starting epoch 1
Iteration 500. Test Loss 0.4462265372276306. Test Accuracy 85.47999572753906.
Starting epoch 2
Iteration 1000. Test Loss 0.42653006315231323. Test Accuracy 86.13999938964844.
Iteration 1500. Test Loss 0.4461647868156433. Test Accuracy 84.88999938964844.
Starting epoch 3
Iteration 2000. Test Loss 0.44366443157196045. Test Accuracy 85.97000122070312.
Iteration 2500. Test Loss 0.4399830996990204. Test Accuracy 85.62999725341797.
Starting epoch 4
Iteration 3000. Test Loss 0.42525726556777954. Test Accuracy 86.02999877929688.
Iteration 3500. Test Loss 0.45552483201026917. Test Accuracy 84.65999603271484.
Starting epoch 5
Iteration 4000. Test Loss 0.41470015048980713. Test Accuracy 86.18999481201172.
Iteration 4500. Test Loss 0.41976916790008545. Test Accuracy 85.91999816894531.
Starting epoch 6
Iteration 5000. Test Loss 0.4269491136074066. Test Accuracy 85.77999877929688.
Iteration 5500. Test Loss 0.4645144045352936. Test Accuracy 84.72000122070312.
Starting epoch 7
Iteration 6000. Test Los

#### Finding 2:
- Even after increasing the number of hidden layers from 3 to 4, the test accuracy stays capped at a maximum of about 86.5%.
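
Counting trainable parameters is one way to see how small the architectural change is. Each `nn.Linear(n_in, n_out)` holds `n_in * n_out` weights plus `n_out` biases. The plain-Python sketch below applies this to the two printed architectures. `count_parameters` is a hypothetical helper, not part of the notebook.

```python
def count_parameters(layer_sizes):
    """Weights + biases of a chain of fully connected (nn.Linear-style) layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


three_hidden = [784, 512, 256, 128, 10]     # FeedForwardNN_b as printed above
four_hidden = [784, 512, 256, 128, 64, 10]  # FeedForwardNN_c as printed above

print(count_parameters(three_hidden))  # → 567434
print(count_parameters(four_hidden))   # → 575050
print(count_parameters(four_hidden) - count_parameters(three_hidden))  # → 7616
```

The extra 128 → 64 layer adds only about 1.3% more parameters. Roughly 70% of the parameters sit in the first 784 → 512 layer. This is consistent with, though it does not by itself explain, the near-identical accuracy of the two models.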

### Method 2: Decrease the learning rate and change the loss function


In [54]:
# Set up criterion for calculating loss
nll_criterion = nn.NLLLoss()

# Lower the learning rate from 0.005 to 0.001 (ffnn_clf_c1 keeps its trained weights)
lr = 0.001
# Set up an optimizer for updating the parameters of ffnn_clf_c1
ffnn_optimizer = optim.RMSprop(ffnn_clf_c1.parameters(), lr=lr)

In [55]:
# Train model
train_model(10, 0, ffnn_clf_c1)

Starting epoch 1
Iteration 500. Test Loss 0.3976600170135498. Test Accuracy 86.62999725341797.
Starting epoch 2
Iteration 1000. Test Loss 0.39026811718940735. Test Accuracy 86.6199951171875.
Iteration 1500. Test Loss 0.38605162501335144. Test Accuracy 86.8699951171875.
Starting epoch 3
Iteration 2000. Test Loss 0.39221036434173584. Test Accuracy 87.1199951171875.
Iteration 2500. Test Loss 0.3846086859703064. Test Accuracy 87.15999603271484.
Starting epoch 4
Iteration 3000. Test Loss 0.375472754240036. Test Accuracy 87.6500015258789.
Iteration 3500. Test Loss 0.37506869435310364. Test Accuracy 87.43999481201172.
Starting epoch 5
Iteration 4000. Test Loss 0.3770142197608948. Test Accuracy 87.32999420166016.
Iteration 4500. Test Loss 0.3668029308319092. Test Accuracy 87.56999969482422.
Starting epoch 6
Iteration 5000. Test Loss 0.3716893196105957. Test Accuracy 87.4000015258789.
Iteration 5500. Test Loss 0.3712044954299927. Test Accuracy 87.32999420166016.
Starting epoch 7
Iteration 6000. Test Loss 0.3744

#### Finding 1:
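- Lowering the learning rate from 0.005 to 0.001 raised the test accuracy by about 1 percentage point (to roughly 87.5%) and made it fluctuate less. This suggests that the learning rate was the limiting factor. Note, however, that `ffnn_clf_c1` was not re-initialised before this run. Training therefore resumed from the weights learned in In [53], so the gain also reflects 10 extra epochs of training.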
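
A plausible explanation, not verified in this notebook, is that RMSprop divides each gradient by a running root-mean-square of past gradients. The size of each update is therefore governed mainly by `lr` rather than by the scale of the gradient. The plain-Python sketch below re-implements one scalar RMSprop step using PyTorch's default `alpha=0.99` and `eps=1e-8` (no momentum, not centered). It is an illustration of the update rule, not `torch.optim.RMSprop` itself.

```python
import math


def rmsprop_step(grad, square_avg, lr, alpha=0.99, eps=1e-8):
    """One scalar RMSprop update; the parameter moves by -update.

    Returns (update, new_square_avg).
    """
    square_avg = alpha * square_avg + (1 - alpha) * grad * grad
    update = lr * grad / (math.sqrt(square_avg) + eps)
    return update, square_avg


# First step from a zero running average, for a large and a small gradient
for lr in (0.005, 0.001):
    for grad in (3.0, 0.03):
        update, _ = rmsprop_step(grad=grad, square_avg=0.0, lr=lr)
        print(f"lr={lr} grad={grad}: first update {update:.4f}")
# For both gradient sizes the first update is about 10 * lr: 0.0500 vs 0.0100
```

Because the step size barely depends on the gradient's magnitude, a five-fold smaller `lr` gives roughly five-fold smaller steps. This fits the steadier test accuracy seen at lr = 0.001.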


In [61]:
# Set up criterion for calculating loss
# Switch to CrossEntropyLoss, keeping the variable name nll_criterion
nll_criterion = nn.CrossEntropyLoss()

lr = 0.001
# Set up an optimizer for updating the parameters of ffnn_clf_c1
ffnn_optimizer = optim.RMSprop(ffnn_clf_c1.parameters(), lr=lr)

In [62]:
# Train model
train_model(10, 0, ffnn_clf_c1)

Starting epoch 1
Iteration 500. Test Loss 0.3659091591835022. Test Accuracy 87.13999938964844.
Starting epoch 2
Iteration 1000. Test Loss 0.3606683909893036. Test Accuracy 87.29999542236328.
Iteration 1500. Test Loss 0.36331987380981445. Test Accuracy 87.73999786376953.
Starting epoch 3
Iteration 2000. Test Loss 0.36198291182518005. Test Accuracy 87.79999542236328.
Iteration 2500. Test Loss 0.3564203977584839. Test Accuracy 87.95999908447266.
Starting epoch 4
Iteration 3000. Test Loss 0.35864007472991943. Test Accuracy 87.7699966430664.
Iteration 3500. Test Loss 0.3625774681568146. Test Accuracy 87.7699966430664.
Starting epoch 5
Iteration 4000. Test Loss 0.36144399642944336. Test Accuracy 87.70999908447266.
Iteration 4500. Test Loss 0.3569597601890564. Test Accuracy 87.79000091552734.
Starting epoch 6
Iteration 5000. Test Loss 0.36297717690467834. Test Accuracy 87.65999603271484.
Iteration 5500. Test Loss 0.361416757106781. Test Accuracy 87.8699951171875.
Starting epoch 7
Iteration 6000. Test Loss 0.3

#### Finding 2:
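- After this run, the test accuracy rose by a further ~0.5 percentage points (to about 87.9%). The gain should not be attributed to `CrossEntropyLoss`. The model already outputs `log_softmax` log-probabilities, and on these `CrossEntropyLoss` computes exactly the same loss, and hence the same gradients, as `NLLLoss`. Since `ffnn_clf_c1` was again not re-initialised, the improvement most likely comes from 10 additional epochs at lr = 0.001.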
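
`FeedForwardNN_c.forward` already returns `F.log_softmax(...)`, and `nn.CrossEntropyLoss` itself applies `log_softmax` before taking the negative log-likelihood. Applying `log_softmax` to a vector of log-probabilities leaves it unchanged, because the probabilities already sum to 1 and their log-sum-exp is 0. The plain-Python sketch below checks this for one made-up example. The helper functions mirror the single-example maths of the two PyTorch losses; they are not PyTorch's implementations.

```python
import math


def log_softmax(xs):
    # Numerically stable log-softmax: x - logsumexp(x)
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def nll_loss(log_probs, target):
    # Single-example equivalent of nn.NLLLoss
    return -log_probs[target]


def cross_entropy_loss(scores, target):
    # Single-example equivalent of nn.CrossEntropyLoss: log_softmax, then NLL
    return nll_loss(log_softmax(scores), target)


raw_scores = [2.0, -1.0, 0.5, 3.2]      # made-up output_projection values
model_output = log_softmax(raw_scores)  # what forward() already returns
target = 3

print(nll_loss(model_output, target))
print(cross_entropy_loss(model_output, target))  # same value as the line above
```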
