# [Deep Learning](https://www.cc.gatech.edu/~hays/compvision/proj6/)

## Setup

In [129]:
%matplotlib notebook
%load_ext autoreload
%autoreload 2
import cv2
import numpy as np
import random
import torch.nn as nn
import torch.optim as optim
import os.path as osp
import matplotlib.pyplot as plt
from utils import *
import student_code as sc
from torchvision.models import alexnet

data_path = osp.join('../data', '15SceneData')
num_classes = 15

# If you have a good Nvidia GPU with an appropriate environment, 
# try setting the use_GPU flag to True (the environment provided does
# not support GPUs and we will not provide any support for GPU
# computation in this project). Please note that 
# we will evaluate your implementations only using CPU mode so even if
# you use a GPU, make sure your code runs in the CPU mode with the
# environment we provided. 
use_GPU = False
if use_GPU:
    from utils_gpu import *



To train a network in PyTorch, we need 4 components:
1. **Dataset** - an object which can load the data and labels given an index.
2. **Model** - an object that contains the network architecture definition.
3. **Loss function** - a function that measures how far the network output is from the ground truth label.
4. **Optimizer** - an object that optimizes the network parameters to reduce the loss value.
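
The sketch below shows the four roles together on a toy two-class problem. It uses NumPy only and does not depend on PyTorch. The class and function names (`ToyDataset`, `LinearModel`, `cross_entropy`, `sgd_step`) are hypothetical stand-ins, not PyTorch APIs. In this project the same roles are filled by the datasets from `create_datasets`, `SimpleNet` or AlexNet, `nn.CrossEntropyLoss`, and `optim.SGD`. The sketch also trains full-batch for brevity, whereas the provided `Trainer` uses mini-batches.

```python
import numpy as np

# 1. Dataset: returns the (input, label) pair stored at a given index.
class ToyDataset:
    def __init__(self, n=200, seed=0):
        rng = np.random.default_rng(seed)
        self.x = rng.normal(size=(n, 2))
        # Label is 1 when the two features sum to a positive number, else 0.
        self.y = (self.x.sum(axis=1) > 0).astype(int)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

# 2. Model: a single linear layer producing one score (logit) per class.
class LinearModel:
    def __init__(self, in_dim, num_classes):
        self.W = np.zeros((in_dim, num_classes))
        self.b = np.zeros(num_classes)

    def forward(self, x):
        return x @ self.W + self.b

# 3. Loss: mean cross-entropy between softmax(logits) and integer labels.
def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def cross_entropy_grad(logits, labels):
    # d(loss)/d(logits) = (softmax - one_hot) / batch_size
    grad = softmax(logits)
    grad[np.arange(len(labels)), labels] -= 1.0
    return grad / len(labels)

# 4. Optimizer: plain gradient descent on the model parameters.
def sgd_step(model, x, grad_logits, lr):
    model.W -= lr * (x.T @ grad_logits)
    model.b -= lr * grad_logits.sum(axis=0)

dataset = ToyDataset()
model = LinearModel(in_dim=2, num_classes=2)
x = np.stack([dataset[i][0] for i in range(len(dataset))])
y = np.array([dataset[i][1] for i in range(len(dataset))])

initial_loss = cross_entropy(model.forward(x), y)
for _ in range(100):
    sgd_step(model, x, cross_entropy_grad(model.forward(x), y), lr=0.5)
final_loss = cross_entropy(model.forward(x), y)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```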

This project has two main parts. In Part 1, you will improve and train a deep network from scratch. In Part 2, you will "fine-tune" a pre-trained network.

## Part 0. Warm up! Training a Deep Network from Scratch

In [130]:
# Fix random seeds so that results will be reproducible
set_seed(0, use_GPU)

You do not need to write any code for this part. Simply run the provided code and report the result you get. This section will also familiarize you with the steps of training a deep network from scratch.

In [131]:
# Training parameters.
input_size = (64, 64)
RGB = False  
base_lr = 1e-2  # may try a smaller lr if not using batch norm
weight_decay = 5e-4
momentum = 0.9

We will first create our datasets by calling the create_datasets function from student_code. This function returns a separate dataset for each split (training and testing/validation), and each dataset applies some pre-processing transforms to the images it loads. In Part 1, you will be asked to add a few more pre-processing transforms by modifying this function.

In [132]:
# Create the training and testing datasets.
train_dataset, test_dataset = sc.create_datasets(data_path=data_path, input_size=input_size, rgb=RGB)
assert test_dataset.classes == train_dataset.classes

Computing pixel mean and stdev...
Batch 0 / 30
Batch 20 / 30
Done, mean = 
[0.45579668]
std = 
[0.23624939]
Computing pixel mean and stdev...
Batch 0 / 60
Batch 20 / 60
Batch 40 / 60
Done, mean = 
[0.45517009]
std = 
[0.2350788]
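
Each create_datasets call above prints a per-channel pixel mean and standard deviation, and the progress lines show that they are accumulated batch by batch. One way to gather such statistics in a single pass is to sum the pixel values and their squares. The NumPy sketch below does this on random data. It is an assumption about how the provided code works, not a copy of it. The (N, H, W, C) layout and the `compute_mean_and_std` name are illustrative choices.

```python
import numpy as np

def compute_mean_and_std(images, batch_size=50):
    """Per-channel pixel mean and (population) std, accumulated batch by batch.

    `images` is assumed to have shape (N, H, W, C) with values in [0, 1].
    """
    n_pixels = 0
    channel_sum = 0.0
    channel_sq_sum = 0.0
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        flat = batch.reshape(-1, batch.shape[-1])  # (pixels, channels)
        n_pixels += flat.shape[0]
        channel_sum = channel_sum + flat.sum(axis=0)
        channel_sq_sum = channel_sq_sum + (flat ** 2).sum(axis=0)
    mean = channel_sum / n_pixels
    # Var[x] = E[x^2] - E[x]^2
    std = np.sqrt(channel_sq_sum / n_pixels - mean ** 2)
    return mean, std

rng = np.random.default_rng(0)
fake_images = rng.uniform(size=(120, 8, 8, 1))  # 120 grayscale 8x8 "images"
mean, std = compute_mean_and_std(fake_images)
print(mean, std)
```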


Now we will create our network model using the SimpleNet class from student_code. The implementation provided in the SimpleNet class gives you a basic network. In Part 1, you will be asked to add a few more layers to this network. 

In [133]:
# Create the network model.
model = sc.SimpleNet(num_classes=num_classes, rgb=False, verbose=False)
if use_GPU:
    model = model.cuda()
print(model)

SimpleNet(
  (features): Sequential(
    (0): Conv2d(1, 10, kernel_size=(9, 9), stride=(1, 1), bias=False)
    (1): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): MaxPool2d(kernel_size=5, stride=3, padding=0, dilation=1, ceil_mode=False)
    (3): ReLU()
    (4): Dropout(p=0.5)
    (5): Conv2d(10, 15, kernel_size=(5, 5), stride=(1, 1), bias=False)
    (6): BatchNorm2d(15, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (8): ReLU()
    (9): Dropout(p=0.5)
  )
  (classifier): Conv2d(15, 15, kernel_size=(6, 6), stride=(1, 1))
)
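
The SimpleNet printed above already contains BatchNorm2d and Dropout layers. The NumPy sketch below illustrates what these layers compute in training mode. It is not PyTorch's implementation. In evaluation mode, PyTorch's BatchNorm2d normalizes with running statistics instead of batch statistics, and Dropout passes its input through unchanged. The function names are hypothetical.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for an (N, C, H, W) batch: normalize each channel
    with the batch mean and (biased) variance, then apply scale gamma and shift beta."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

def dropout_train(x, p, rng):
    """Inverted dropout: zero each activation with probability p and rescale the
    survivors by 1 / (1 - p) so the expected activation is unchanged."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(50, 10, 8, 8))  # e.g. a 10-channel conv output
y = batch_norm_train(x, gamma=np.ones(10), beta=np.zeros(10))
print(y.mean(axis=(0, 2, 3)).round(6), y.std(axis=(0, 2, 3)).round(3))

z = dropout_train(np.ones((1000, 1000)), p=0.5, rng=rng)
print(z.mean())  # close to 1.0
```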


Next we will create the loss function and the optimizer. 

In [134]:
# Create the loss function.
# see http://pytorch.org/docs/0.3.0/nn.html#loss-functions for a list of available loss functions
loss_function = nn.CrossEntropyLoss()
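
`nn.CrossEntropyLoss` takes raw class scores (logits) and integer labels. It applies a log-softmax and, with its default reduction, averages the negative log-likelihood of the correct class over the batch. The NumPy sketch below shows that computation. It is written for illustration and is not PyTorch's code.

```python
import numpy as np

def cross_entropy_loss(logits, targets):
    """Mean cross-entropy of raw class scores: log-softmax, then negative log-likelihood."""
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for each sample.
    nll = -log_softmax[np.arange(len(targets)), targets]
    return nll.mean()

num_classes = 15
batch = 50
uniform_logits = np.zeros((batch, num_classes))
targets = np.arange(batch) % num_classes
print(cross_entropy_loss(uniform_logits, targets))  # log(15), about 2.708
```

When the scores are uniform over 15 classes, the loss is log 15, about 2.708. This matches the loss of roughly 2.65 that an untrained network logs in its first epoch below.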

In [135]:
# Create the optimizer and a learning rate scheduler
optimizer = optim.SGD(params=model.parameters(), lr=base_lr, weight_decay=weight_decay, momentum=momentum)
# Currently a simple step scheduler.
# See http://pytorch.org/docs/0.3.0/optim.html#how-to-adjust-learning-rate for various LR schedulers
# and how to use them
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)
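
The sketch below summarizes the optimizer and scheduler in a few lines. It assumes the update rule documented for `torch.optim.SGD` with `dampening=0` and `nesterov=False`: weight decay is added to the gradient, which then feeds a momentum buffer. It also assumes the `StepLR` rule, which multiplies the learning rate by `gamma` every `step_size` epochs. The helper names are hypothetical, and the real optimizer works on tensors in place.

```python
import numpy as np

def sgd_momentum_step(param, grad, buf, lr, momentum=0.9, weight_decay=5e-4):
    """One SGD update (no dampening, no Nesterov). Returns new param and momentum buffer."""
    d_p = grad + weight_decay * param               # L2 weight decay folded into the gradient
    buf = d_p if buf is None else momentum * buf + d_p
    return param - lr * buf, buf

def step_lr(base_lr, epoch, step_size=60, gamma=0.1):
    """Learning rate at a given epoch: decayed by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rates for the Part 0 settings (base_lr=1e-2, step_size=60, gamma=0.1).
print([step_lr(1e-2, e) for e in (0, 59, 60, 99)])

param, buf = np.array([1.0]), None
for _ in range(3):
    param, buf = sgd_momentum_step(param, grad=np.array([0.5]), buf=buf, lr=1e-2)
print(param)
```

Assuming the scheduler is stepped once per epoch, these settings train at 1e-2 for epochs 0 to 59 and at 1e-3 for epochs 60 to 99.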

Finally, we are ready to train our network! We will start a local server to see the training progress of our network. Open a new terminal and activate the environment for this project. Then run the following command: **python -m visdom.server**. This will start a local server, and the terminal output should print a link like "http://localhost:8097". Open this link in your browser. After you run the following block, visit this link again and you will see graphs showing the progress of your training! If you do not see any graphs, select Part 1 on the top-left bar where it says Environment (select only Part 1; do not check main or Part 2).

In [79]:
# train the network!
params = {'n_epochs': 100, 'batch_size': 50, 'experiment': 'part1'}
trainer = Trainer(train_dataset, test_dataset, model, loss_function, optimizer, lr_scheduler, params)
best_prec1 = trainer.train_val()
print('Best top-1 Accuracy = {:4.3f}'.format(best_prec1))

---------------------------------------
Experiment: part1
resume_optim: True
experiment: part1
val_freq: 1
checkpoint_file: None
num_workers: 4
print_freq: 100
shuffle: True
batch_size: 50
do_val: True
n_epochs: 100
---------------------------------------
part1 Epoch 0 / 100
train part1: batch 0/29, loss 2.648, top-1 accuracy 18.000, top-5 accuracy 52.000
train part1: loss 2.664706
val part1: batch 0/59, loss 3.037, top-1 accuracy 0.000, top-5 accuracy 0.000
val part1: loss 2.634302
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 1 / 100
train part1: batch 0/29, loss 2.653, top-1 accuracy 16.000, top-5 accuracy 42.000
train part1: loss 2.570846
val part1: batch 0/59, loss 3.143, top-1 accuracy 0.000, top-5 accuracy 0.000
val part1: loss 2.552107
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 2 / 100
train part1: batch 0/29, loss 2.500, top-1 accuracy 14.000, top-5 accuracy 50.000
train part1: loss 2.485453
val part1: batch 0/59, loss 2.198, top-1 accuracy 18.000, top-5 a

train part1: loss 1.046037
val part1: batch 0/59, loss 2.814, top-1 accuracy 28.000, top-5 accuracy 66.000
val part1: loss 2.416530
Checkpoint saved
part1 Epoch 31 / 100
train part1: batch 0/29, loss 0.865, top-1 accuracy 72.000, top-5 accuracy 100.000
train part1: loss 1.064576
val part1: batch 0/59, loss 4.055, top-1 accuracy 16.000, top-5 accuracy 46.000
val part1: loss 2.305213
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 32 / 100
train part1: batch 0/29, loss 1.014, top-1 accuracy 72.000, top-5 accuracy 92.000
train part1: loss 1.035600
val part1: batch 0/59, loss 3.275, top-1 accuracy 16.000, top-5 accuracy 54.000
val part1: loss 2.465159
Checkpoint saved
part1 Epoch 33 / 100
train part1: batch 0/29, loss 1.150, top-1 accuracy 62.000, top-5 accuracy 92.000
train part1: loss 1.034405
val part1: batch 0/59, loss 3.125, top-1 accuracy 22.000, top-5 accuracy 58.000
val part1: loss 2.378150
Checkpoint saved
part1 Epoch 34 / 100
train part1: batch 0/29, loss 0.995, top-1 accu

val part1: loss 2.988017
Checkpoint saved
part1 Epoch 63 / 100
train part1: batch 0/29, loss 0.283, top-1 accuracy 96.000, top-5 accuracy 100.000
train part1: loss 0.299309
val part1: batch 0/59, loss 5.077, top-1 accuracy 20.000, top-5 accuracy 58.000
val part1: loss 2.997636
Checkpoint saved
part1 Epoch 64 / 100
train part1: batch 0/29, loss 0.152, top-1 accuracy 100.000, top-5 accuracy 100.000
train part1: loss 0.298455
val part1: batch 0/59, loss 4.927, top-1 accuracy 22.000, top-5 accuracy 58.000
val part1: loss 3.011393
Checkpoint saved
part1 Epoch 65 / 100
train part1: batch 0/29, loss 0.266, top-1 accuracy 96.000, top-5 accuracy 100.000
train part1: loss 0.294577
val part1: batch 0/59, loss 5.272, top-1 accuracy 20.000, top-5 accuracy 56.000
val part1: loss 2.988395
Checkpoint saved
part1 Epoch 66 / 100
train part1: batch 0/29, loss 0.234, top-1 accuracy 100.000, top-5 accuracy 100.000
train part1: loss 0.294848
val part1: batch 0/59, loss 5.191, top-1 accuracy 20.000, top-5 ac

train part1: loss 0.253474
val part1: batch 0/59, loss 5.473, top-1 accuracy 22.000, top-5 accuracy 54.000
val part1: loss 3.115583
Checkpoint saved
part1 Epoch 96 / 100
train part1: batch 0/29, loss 0.229, top-1 accuracy 96.000, top-5 accuracy 100.000
train part1: loss 0.252413
val part1: batch 0/59, loss 5.441, top-1 accuracy 22.000, top-5 accuracy 56.000
val part1: loss 3.115074
Checkpoint saved
part1 Epoch 97 / 100
train part1: batch 0/29, loss 0.196, top-1 accuracy 98.000, top-5 accuracy 100.000
train part1: loss 0.251616
val part1: batch 0/59, loss 5.483, top-1 accuracy 22.000, top-5 accuracy 56.000
val part1: loss 3.119599
Checkpoint saved
part1 Epoch 98 / 100
train part1: batch 0/29, loss 0.182, top-1 accuracy 100.000, top-5 accuracy 100.000
train part1: loss 0.251598
val part1: batch 0/59, loss 5.353, top-1 accuracy 22.000, top-5 accuracy 56.000
val part1: loss 3.133118
Checkpoint saved
part1 Epoch 99 / 100
train part1: batch 0/29, loss 0.239, top-1 accuracy 98.000, top-5 accu

Expect this code to take around 5 minutes on CPU or 3 minutes on GPU. Now you are ready to modify the functions we used to train our model. Before you move on, record the accuracy of your network from Part 0 and report it in your write-up.
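
The training log reports top-1 and top-5 accuracy. The sketch below shows what these numbers mean, assuming `Trainer` uses the standard definition (its code is not shown here). Under top-k, a prediction counts as correct if the true label is among the k highest-scoring classes. `topk_accuracy` is a hypothetical helper.

```python
import numpy as np

def topk_accuracy(scores, targets, k=1):
    """Percentage of samples whose true class is among the k highest scores."""
    # Indices of the k largest scores per row (order within the top k does not matter).
    topk = np.argsort(scores, axis=1)[:, -k:]
    hits = (topk == targets[:, None]).any(axis=1)
    return 100.0 * hits.mean()

scores = np.array([
    [0.1, 0.7, 0.2],   # predicted class 1, true class 1
    [0.5, 0.3, 0.2],   # predicted class 0, true class 2 (ranked last)
    [0.2, 0.5, 0.3],   # predicted class 1, true class 2 (ranked second)
    [0.6, 0.1, 0.3],   # predicted class 0, true class 0
])
targets = np.array([1, 2, 2, 0])
print(topk_accuracy(scores, targets, k=1), topk_accuracy(scores, targets, k=2))  # 50.0 75.0
```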

## Part 1. Modifying the Dataloaders and the Simple Network

### create_datasets

In [149]:
# Fix random seeds so that results will be reproducible
set_seed(0, use_GPU)

Now you will modify the create_datasets function from student_code. You will add random left-right mirroring and normalization to the transformations applied to the training dataset. You will also add normalization to the transformations applied to the testing dataset. 
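
The NumPy sketch below illustrates what the two added operations do. In create_datasets you would normally use torchvision's `transforms.RandomHorizontalFlip` and `transforms.Normalize` inside `transforms.Compose`, with the mean and standard deviation computed from the data. `Normalize` expects a tensor, so it goes after `transforms.ToTensor()`. Mirroring applies only to the training set, while normalization applies to both sets. The helper names here are hypothetical.

```python
import numpy as np

def random_horizontal_flip(image, rng, p=0.5):
    """Mirror an (H, W, C) image left-to-right with probability p."""
    if rng.random() < p:
        return image[:, ::-1, :]
    return image

def normalize(image, mean, std):
    """Subtract the per-channel mean and divide by the per-channel std."""
    return (image - mean) / std

rng = np.random.default_rng(0)
image = rng.uniform(size=(64, 64, 1))
mean, std = image.mean(axis=(0, 1)), image.std(axis=(0, 1))

train_view = normalize(random_horizontal_flip(image, rng), mean, std)  # training: flip + normalize
test_view = normalize(image, mean, std)                                # testing: normalize only
print(train_view.mean(), train_view.std())
```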

In [150]:
# Create the training and testing datasets.
train_dataset, test_dataset = sc.create_datasets(data_path=data_path, input_size=input_size, rgb=RGB)
assert test_dataset.classes == train_dataset.classes

Computing pixel mean and stdev...
Batch 0 / 30
Batch 20 / 30
Done, mean = 
[0.45579668]
std = 
[0.23624939]
Computing pixel mean and stdev...
Batch 0 / 60
Batch 20 / 60
Batch 40 / 60
Done, mean = 
[0.45517009]
std = 
[0.2350788]


Now you will modify SimpleNet by adding dropout, batch normalization, and additional convolution/maxpool/ReLU layers. You should achieve an accuracy of at least **50%**. Make sure your network passes this threshold; it is required for full credit on this section!

You can also use the following two blocks to determine the structure of your network.

In [151]:
# create the network model
model = sc.SimpleNet(num_classes=num_classes, rgb=False, verbose=False)
if use_GPU:
    model = model.cuda()
print(model)

SimpleNet(
  (features): Sequential(
    (0): Conv2d(1, 10, kernel_size=(9, 9), stride=(1, 1), bias=False)
    (1): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): MaxPool2d(kernel_size=5, stride=3, padding=0, dilation=1, ceil_mode=False)
    (3): ReLU()
    (4): Dropout(p=0.5)
    (5): Conv2d(10, 15, kernel_size=(5, 5), stride=(1, 1), bias=False)
    (6): BatchNorm2d(15, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (8): ReLU()
    (9): Dropout(p=0.5)
  )
  (classifier): Conv2d(15, 15, kernel_size=(6, 6), stride=(1, 1))
)


In [152]:
# Use this block to determine the kernel size of the Conv2d layer in the classifier.
# First, set the kernel size of that Conv2d layer to 1 and run this block.
# With a 1x1 kernel the classifier keeps the spatial size of its input, so the
# output size printed by this block tells you the input size of the classifier.
# Go back and set the kernel size of the classifier's Conv2d layer to that size.
# Finally, run this block again and verify that the network output holds one
# score per class, i.e. torch.Size([num_classes]).
# Don't forget to re-run the block above every time you update the SimpleNet class!
from torch.autograd import Variable
data, _ = train_dataset[0]
s = data.size()
data = Variable(data.view(1, *s))
if use_GPU:
    data = data.cuda()
out = model(data)
print('Network output size is ', out.size())

Network output size is  torch.Size([15])
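
You can also check the kernel size by hand. The sketch below applies the output-size formula from the PyTorch `Conv2d`/`MaxPool2d` documentation (with `ceil_mode=False`) to the layers of the SimpleNet printed above, using a 64x64 grayscale input. BatchNorm2d, ReLU, and Dropout do not change the spatial size, so they are skipped. `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension for Conv2d / MaxPool2d (ceil_mode=False)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

size = 64
size = conv_out(size, kernel=9)            # Conv2d(1, 10, 9)        -> 56
size = conv_out(size, kernel=5, stride=3)  # MaxPool2d(5, stride=3)  -> 18
size = conv_out(size, kernel=5)            # Conv2d(10, 15, 5)       -> 14
size = conv_out(size, kernel=3, stride=2)  # MaxPool2d(3, stride=2)  -> 6
print(size)  # 6
```

The last feature maps are 6x6. A `Conv2d(15, 15, kernel_size=(6, 6))` classifier therefore reduces each one to 1x1, which is consistent with the 15-element output above. If you add layers to SimpleNet, extend this chain to find the new classifier kernel size.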


Next we will create the loss function and the optimizer. If you reach the required 50% accuracy using the loss_function, optimizer, scheduler, and parameters (n_epochs, batch_size, etc.) provided in this notebook, you do not have to modify custom_part1_trainer in student_code. If you change any of these values, you must modify this function in student_code, since we will not use the notebook you submit for evaluation.

In [153]:
# Set up the trainer. You can modify custom_part1_trainer in
# student_code.py if you want to try different learning settings.
custom_part1_trainer = sc.custom_part1_trainer(model)

if custom_part1_trainer is None:
    # Create the loss function.
    # see http://pytorch.org/docs/0.3.0/nn.html#loss-functions for a list of available loss functions
    loss_function = nn.CrossEntropyLoss()

    # Create the optimizer and a learning rate scheduler.
    optimizer = optim.SGD(params=model.parameters(), lr=base_lr, weight_decay=weight_decay, momentum=momentum)
    # Currently a simple step scheduler, but you can get creative.
    # See http://pytorch.org/docs/0.3.0/optim.html#how-to-adjust-learning-rate for various LR schedulers
    # and how to use them
    lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)

    params = {'n_epochs': 100, 'batch_size': 50, 'experiment': 'part1'}
    
else:
    if 'loss_function' in custom_part1_trainer:
        loss_function = custom_part1_trainer['loss_function']
    if 'optimizer' in custom_part1_trainer:
        optimizer = custom_part1_trainer['optimizer']
    if 'lr_scheduler' in custom_part1_trainer:
        lr_scheduler = custom_part1_trainer['lr_scheduler']
    if 'params' in custom_part1_trainer:
        params = custom_part1_trainer['params']

We are ready to train our network! As before, we will use a local server to see the training progress of our network (if your server is already running, do not start another one). To start it, open a new terminal and activate the environment for this project. Then run the following command: **python -m visdom.server**. This will start a local server, and the terminal output should print a link like "http://localhost:8097". Open this link in your browser. After you run the following block, visit this link again and you will see graphs showing the progress of your training! If you do not see any graphs, select Part 1 on the top-left bar where it says Environment (select only Part 1; do not check main or Part 2).

In [154]:
# Train the network!
trainer = Trainer(train_dataset, test_dataset, model, loss_function, optimizer, lr_scheduler, params)
best_prec1 = trainer.train_val()
print('Best top-1 Accuracy = {:4.3f}'.format(best_prec1))

---------------------------------------
Experiment: part1
resume_optim: True
experiment: part1
val_freq: 1
checkpoint_file: None
num_workers: 4
print_freq: 100
shuffle: True
batch_size: 50
do_val: True
n_epochs: 100
---------------------------------------
part1 Epoch 0 / 100
train part1: batch 0/29, loss 2.678, top-1 accuracy 12.000, top-5 accuracy 50.000
train part1: loss 2.673910
val part1: batch 0/59, loss 1.826, top-1 accuracy 36.000, top-5 accuracy 92.000
val part1: loss 2.735530
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 1 / 100
train part1: batch 0/29, loss 2.606, top-1 accuracy 20.000, top-5 accuracy 54.000
train part1: loss 2.466444
val part1: batch 0/59, loss 2.009, top-1 accuracy 42.000, top-5 accuracy 76.000
val part1: loss 2.363068
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 2 / 100
train part1: batch 0/29, loss 2.644, top-1 accuracy 16.000, top-5 accuracy 52.000
train part1: loss 2.218547
val part1: batch 0/59, loss 2.471, top-1 accuracy 6.000, top-

val part1: loss 1.632295
Checkpoint saved
part1 Epoch 30 / 100
train part1: batch 0/29, loss 1.637, top-1 accuracy 44.000, top-5 accuracy 84.000
train part1: loss 1.510275
val part1: batch 0/59, loss 2.250, top-1 accuracy 20.000, top-5 accuracy 66.000
val part1: loss 1.640027
Checkpoint saved
part1 Epoch 31 / 100
train part1: batch 0/29, loss 1.552, top-1 accuracy 46.000, top-5 accuracy 84.000
train part1: loss 1.512509
val part1: batch 0/59, loss 1.711, top-1 accuracy 48.000, top-5 accuracy 84.000
val part1: loss 1.637625
Checkpoint saved
part1 Epoch 32 / 100
train part1: batch 0/29, loss 1.569, top-1 accuracy 48.000, top-5 accuracy 84.000
train part1: loss 1.507364
val part1: batch 0/59, loss 1.869, top-1 accuracy 30.000, top-5 accuracy 84.000
val part1: loss 1.661534
Checkpoint saved
part1 Epoch 33 / 100
train part1: batch 0/29, loss 1.645, top-1 accuracy 46.000, top-5 accuracy 88.000
train part1: loss 1.492655
val part1: batch 0/59, loss 1.794, top-1 accuracy 36.000, top-5 accuracy

val part1: loss 1.507522
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 62 / 100
train part1: batch 0/29, loss 1.197, top-1 accuracy 60.000, top-5 accuracy 94.000
train part1: loss 1.339588
val part1: batch 0/59, loss 2.122, top-1 accuracy 24.000, top-5 accuracy 74.000
val part1: loss 1.507003
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 63 / 100
train part1: batch 0/29, loss 1.242, top-1 accuracy 62.000, top-5 accuracy 92.000
train part1: loss 1.337956
val part1: batch 0/59, loss 2.077, top-1 accuracy 24.000, top-5 accuracy 74.000
val part1: loss 1.513378
Checkpoint saved
part1 Epoch 64 / 100
train part1: batch 0/29, loss 1.429, top-1 accuracy 46.000, top-5 accuracy 86.000
train part1: loss 1.284295
val part1: batch 0/59, loss 2.119, top-1 accuracy 22.000, top-5 accuracy 74.000
val part1: loss 1.509077
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part1 Epoch 65 / 100
train part1: batch 0/29, loss 1.321, top-1 accuracy 58.000, top-5 accuracy 88.000
train part1: loss 1.3

val part1: loss 1.500436
Checkpoint saved
part1 Epoch 94 / 100
train part1: batch 0/29, loss 1.150, top-1 accuracy 64.000, top-5 accuracy 92.000
train part1: loss 1.292240
val part1: batch 0/59, loss 2.099, top-1 accuracy 26.000, top-5 accuracy 76.000
val part1: loss 1.498553
Checkpoint saved
part1 Epoch 95 / 100
train part1: batch 0/29, loss 1.157, top-1 accuracy 62.000, top-5 accuracy 96.000
train part1: loss 1.279566
val part1: batch 0/59, loss 2.019, top-1 accuracy 28.000, top-5 accuracy 74.000
val part1: loss 1.501158
Checkpoint saved
part1 Epoch 96 / 100
train part1: batch 0/29, loss 1.226, top-1 accuracy 56.000, top-5 accuracy 90.000
train part1: loss 1.279682
val part1: batch 0/59, loss 2.100, top-1 accuracy 24.000, top-5 accuracy 76.000
val part1: loss 1.488922
Checkpoint saved
part1 Epoch 97 / 100
train part1: batch 0/29, loss 1.244, top-1 accuracy 64.000, top-5 accuracy 84.000
train part1: loss 1.299871
val part1: batch 0/59, loss 2.022, top-1 accuracy 30.000, top-5 accuracy

Make sure you get at least 50% accuracy in this section! If you used settings different from the ones provided to reach 50%, you should modify custom_part1_trainer in student_code to return a dictionary with your changed settings.

## Part 2. Fine-Tuning a Pre-Trained Network

In [162]:
# Fix random seeds so that results will be reproducible
set_seed(0, use_GPU)

Training a network from scratch takes a lot of time. Instead, we can take a pre-trained model and fine-tune it for our purposes. This is the goal of Part 2: you will fine-tune a pre-trained network and achieve at least 80% accuracy.

In [163]:
# training parameters
input_size = (224, 224)
RGB = True
base_lr = 1e-3
weight_decay = 5e-4
momentum = 0.9
backprop_depth = 3

In [164]:
# Create the training and testing datasets.
train_dataset, test_dataset = sc.create_datasets(data_path=data_path, input_size=input_size, rgb=RGB)
assert test_dataset.classes == train_dataset.classes

Computing pixel mean and stdev...
Batch 0 / 30
Batch 20 / 30
Done, mean = 
[0.45611589 0.45611589 0.45611589]
std = 
[0.24786406 0.24786406 0.24786406]
Computing pixel mean and stdev...
Batch 0 / 60
Batch 20 / 60
Batch 40 / 60
Done, mean = 
[0.45549639 0.45549639 0.45549639]
std = 
[0.24698076 0.24698076 0.24698076]


The following block loads a pre-trained AlexNet.

In [165]:
# Create the network model.
model = alexnet(pretrained=True)
print(model)

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_feature

Now you will modify create_part2_model in student_code in order to fine-tune AlexNet. As you can see in the source code (https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py) and in the model printout above, AlexNet has 2 parts: 'features', which consists of conv layers that extract feature maps from the image, and 'classifier', which consists of FC layers that classify the features. We want to replace the last Linear layer in model.classifier.
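
The first Linear layer in the classifier has in_features=9216, and this value follows from the shape of the last feature map. The sketch below applies the standard `Conv2d`/`MaxPool2d` output-size formula (dilation 1, `ceil_mode=False`) to the `features` layers printed above, assuming a 224x224 input as set by `input_size`. ReLU layers do not change the shape and are skipped. `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension for Conv2d / MaxPool2d (ceil_mode=False)."""
    return (size + 2 * padding - (kernel - 1) - 1) // stride + 1

size = 224
size = conv_out(size, kernel=11, stride=4, padding=2)  # Conv2d(3, 64, 11, stride 4, pad 2) -> 55
size = conv_out(size, kernel=3, stride=2)              # MaxPool2d(3, stride 2)             -> 27
size = conv_out(size, kernel=5, padding=2)             # Conv2d(64, 192, 5, pad 2)          -> 27
size = conv_out(size, kernel=3, stride=2)              # MaxPool2d(3, stride 2)             -> 13
for _ in range(3):
    size = conv_out(size, kernel=3, padding=1)         # three 3x3 convs with pad 1         -> 13
size = conv_out(size, kernel=3, stride=2)              # MaxPool2d(3, stride 2)             -> 6
channels = 256
print(size, channels * size * size)  # 6 9216
```

Replacing the last Linear layer leaves this arithmetic unchanged. The replacement only needs to accept the same number of inputs as the layer it replaces and output num_classes scores.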

In [166]:
model = sc.create_part2_model(model, num_classes)
if use_GPU:
    model = model.cuda()
print(model)

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_feature

Next we will create the loss function and the optimizer. Just as in Part 1, if you modify any of the settings to hit the required accuracy, you must modify the custom_part2_trainer function to return a dictionary containing your changes.

In [167]:
# Set up the trainer. You can modify custom_part2_trainer in
# student_code.py if you want to try different learning settings.
custom_part2_trainer = sc.custom_part2_trainer(model)

if custom_part2_trainer is None:
    # Create the loss function
    # see http://pytorch.org/docs/0.3.0/nn.html#loss-functions for a list of available loss functions
    loss_function = nn.CrossEntropyLoss()

    # Since we do not want to optimize the whole network, we must extract a list of parameters of interest that will be
    # optimized by the optimizer.
    params_to_optimize = []

    # List of modules in the network
    mods = list(model.features.children()) + list(model.classifier.children())

    # Extract parameters from the last `backprop_depth` modules in the network and collect them in
    # the params_to_optimize list.
    for m in mods[::-1][:backprop_depth]:
        params_to_optimize.extend(list(m.parameters()))

    # Construct the optimizer    
    optimizer = optim.SGD(params=params_to_optimize, lr=base_lr, weight_decay=weight_decay, momentum=momentum)

    # Create a scheduler, currently a simple step scheduler, but you can get creative.
    # See http://pytorch.org/docs/0.3.0/optim.html#how-to-adjust-learning-rate for various LR schedulers
    # and how to use them
    lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
    
    params = {'n_epochs': 4, 'batch_size': 10, 'experiment': 'part2'} 
    
else:
    if 'loss_function' in custom_part2_trainer:
        loss_function = custom_part2_trainer['loss_function']
    if 'optimizer' in custom_part2_trainer:
        optimizer = custom_part2_trainer['optimizer']
    if 'lr_scheduler' in custom_part2_trainer:
        lr_scheduler = custom_part2_trainer['lr_scheduler']
    if 'params' in custom_part2_trainer:
        params = custom_part2_trainer['params']
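
The default branch above optimizes only the parameters of the last `backprop_depth` modules, counting backwards through `features` followed by `classifier`. The pure-Python sketch below replays that slicing on hypothetical (name, parameter count) stand-ins for the modules. It assumes torchvision's AlexNet classifier layout with only the final Linear layer replaced by a 15-way one. The printout above is truncated, so this layout is an assumption.

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out   # weights + bias

def linear_params(in_f, out_f):
    return in_f * out_f + out_f           # weights + bias

num_classes = 15
backprop_depth = 3

# Hypothetical (module name, trainable parameter count) pairs, in module order.
features = [
    ("Conv2d", conv_params(3, 64, 11)), ("ReLU", 0), ("MaxPool2d", 0),
    ("Conv2d", conv_params(64, 192, 5)), ("ReLU", 0), ("MaxPool2d", 0),
    ("Conv2d", conv_params(192, 384, 3)), ("ReLU", 0),
    ("Conv2d", conv_params(384, 256, 3)), ("ReLU", 0),
    ("Conv2d", conv_params(256, 256, 3)), ("ReLU", 0), ("MaxPool2d", 0),
]
classifier = [
    ("Dropout", 0), ("Linear", linear_params(9216, 4096)), ("ReLU", 0),
    ("Dropout", 0), ("Linear", linear_params(4096, 4096)), ("ReLU", 0),
    ("Linear", linear_params(4096, num_classes)),   # the replaced layer
]

mods = features + classifier
selected = mods[::-1][:backprop_depth]   # same slicing as the notebook cell
trainable = sum(n for _, n in selected)
total = sum(n for _, n in mods)
print([name for name, _ in selected])   # ['Linear', 'ReLU', 'Linear']
print(f"{trainable:,} of {total:,} parameters are optimized")
```

With backprop_depth = 3, the selection is Linear, ReLU, Linear, so only the last two fully connected layers are updated. ReLU and Dropout have no parameters, so backprop_depth counts modules rather than trainable layers. `optimizer.step()` never updates parameters left out of params_to_optimize. Autograd may still compute their gradients unless their requires_grad flag is turned off.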

We are ready to fine-tune our network! Just like before, we will use a local server to see the training progress of our network. Open a new terminal and activate the environment for this project. Then run the following command: **python -m visdom.server**. This will start a local server, and the terminal output should print a link like "http://localhost:8097". Open this link in your browser. After you run the following block, visit this link again and you will see graphs showing the progress of your training! If you do not see any graphs, select Part 2 on the top-left bar where it says Environment (select only Part 2; do not check main or Part 1).

In [168]:
trainer = Trainer(train_dataset, test_dataset, model, loss_function, optimizer, lr_scheduler, params) 
best_prec1 = trainer.train_val() 
print('Best top-1 Accuracy = {:4.3f}'.format(best_prec1))

---------------------------------------
Experiment: part2
resume_optim: True
experiment: part2
val_freq: 1
checkpoint_file: None
num_workers: 4
print_freq: 100
shuffle: True
batch_size: 10
do_val: True
n_epochs: 4
---------------------------------------
part2 Epoch 0 / 4
train part2: batch 0/149, loss 2.786, top-1 accuracy 10.000, top-5 accuracy 40.000
train part2: batch 100/149, loss 1.153, top-1 accuracy 60.000, top-5 accuracy 100.000
train part2: loss 0.904719
val part2: batch 0/298, loss 0.492, top-1 accuracy 80.000, top-5 accuracy 100.000
val part2: batch 100/298, loss 0.426, top-1 accuracy 90.000, top-5 accuracy 100.000
val part2: batch 200/298, loss 0.331, top-1 accuracy 80.000, top-5 accuracy 100.000
val part2: loss 0.456064
Checkpoint saved
BEST TOP1 ACCURACY SO FAR
part2 Epoch 1 / 4
train part2: batch 0/149, loss 0.055, top-1 accuracy 100.000, top-5 accuracy 100.000
train part2: batch 100/149, loss 0.227, top-1 accuracy 90.000, top-5 accuracy 100.000
train part2: loss 0.39699

Expect this code to take around 10 minutes on CPU or 30 seconds on GPU. You should reach at least 80% accuracy.