# Welcome to the PyTorch Seedlings Exercise

This exercise will cover some key concepts in visual machine learning, including:

* The preparation of image datasets
* The construction of neural net models for image tasks
* The use of *transfer learning,* a method for re-using previously trained models for new tasks

## Notes on Using This Notebook

* Code will be provided for boilerplate tasks; in other places, you will need to fill in code to complete the exercise. Cells you need to fill in will be flagged with the **Exercise** heading.
* The code cells are, in general, meant to be run in order. If you think a code cell should be working, but it isn't, verify that all previous cells were run - the cell you're having trouble with may depend on a variable or file that is created in a previous cell.
* Class names and other text normally meant for consumption by a computer will be rendered in a `monospace font`. This will hopefully reduce confusion between, e.g., the word "dataset" referring to the concept of a cohesive body of data, and the class name `Dataset` referring to the related PyTorch class.

### Do This Now:

The cell below downloads and unzips the dataset we'll be using for this exercise. The dataset is 1.8GB, so **please uncomment and execute the following code cell now** to get the process started. (The commented lines are there to prevent the download triggering accidentally, so you may wish to replace them afterward.)

In [None]:
!curl -0 https://s3-us-west-1.amazonaws.com/pytorch-course-datasets/plant-seedlings-classification.zip > seedlings.zip
!unzip seedlings.zip
!unzip train.zip
!unzip test.zip

## Introduction

This exercise is based on the Kaggle competition, [Plant Seedlings Classification](https://www.kaggle.com/c/plant-seedlings-classification/overview). The goal is to create a neural net that can accurately classify newly sprouted plants as belonging to a particular species. Twelve species are represented in the training data, six crop plants and six undesirable weed plants.

### The Training Dataset

The training dataset is a set of almost 5000 image files, each depicting a seedling, sorted into folders labeled with the correct species name of each plant of interest:

```
train
  \--Black-grass
  |    \--0050f38b3.png
  |    \--0183fdf68.png
  |    ...
  \--Charlock
       \--022179d65.png
       \--02c95e601.png
       ...
```

We will train and validate our model with this data.

### Multiple Iterations

We'll show two approaches - one simpler, one more advanced. The first employs a small model that we will train from scratch. The second uses *transfer learning:* we take an existing, pre-trained model and do some domain-specific learning on it.

Don't forget that even if you want to jump ahead to the advanced exercise, it may depend on code executed in earlier cells.

### The Final Step

The *test* dataset is a separate, unlabeled set of images. The final step in today's exercise will be to use your model to classify the unlabeled images. You will export your predictions as a CSV file and upload them to the Kaggle site to receive a final accuracy score.

## The First Iteration: Building from Scratch

Let's get started! The code cell below contains imports we'll need; please execute it.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision
from torchvision import transforms

import os
import time
import random

random.seed(23)

### Setting Up Your Training Dataset

In order for our images to be consumed by a model, they need a consistent size and format. The function below resizes and crops the images to squares of a specified size. The random flip step is there to manage potential bias in the data. In this case, imagine how your model might be skewed if many photos in the dataset were all taken from the same angle in similar lighting, but new data presented for inference was created under different conditions.

In [2]:
def get_transforms(target_size=100, normalize=False):
    t = transforms.Compose([
        transforms.Resize(target_size),
        transforms.RandomCrop(target_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor()
        ])
    if normalize: # for imagenet-trained models specifically
        t = transforms.Compose([
            t,
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
    return t

As mentioned in the introduction above, the training data is a set of images, divided into folders, with each folder named for the image's class. There are twelve classes.

This is a common enough arrangement that PyTorch (through the torchvision library) has an `ImageFolder` class that will build a PyTorch `Dataset` object for you from this structure.

In [3]:
full_dataset = torchvision.datasets.ImageFolder('train', transform=get_transforms())
print('This dataset has:')
print('  {} elements'.format(len(full_dataset)))
print('  {} classes'.format(len(full_dataset.classes)))


This dataset has:
  4750 elements
  12 classes


It's a best practice to set aside part of your labeled data for validation. This guards against *overfitting.* The main symptom of overfitting is that a model seems to perform well in training, but does poorly when presented with new data. This happens when the model learns the dataset a little too well, and doesn't develop general rules for dealing with similar inputs. (Qualitatively, this can be compared with a child who has learned multiplication tables by rote up to 10x10, but hasn't learned a rule to multiply 13 x 16.) It often means that the model is overspecified with respect to the data - that is, that the parameter space of the model is large enough to form a map of the individual inputs to specific outputs.

On the other hand, if your model performs just as accurately on the validation dataset as on the training dataset, that's a positive sign that it's learning as intended.

Here, we use `torch.utils.data.random_split()` to extract training and validation sets with an 80/20 split:

In [4]:
train_len = int(0.8 * len(full_dataset))
validate_len = len(full_dataset) - train_len
train_dataset, validate_dataset = torch.utils.data.random_split(full_dataset, (train_len, validate_len))
print('Training dataset contains {} elements'.format(len(train_dataset)))
print('Validation dataset contains {} elements'.format(len(validate_dataset)))

Training dataset contains 3800 elements
Validation dataset contains 950 elements
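One detail worth knowing: `random.seed(23)` in the imports cell seeds Python's `random` module, while `random_split()` draws from PyTorch's own random number generator. If you want the same split on every run, one option is to pass a seeded `torch.Generator`. The sketch below uses a dummy `TensorDataset` of the same length as a stand-in for the seedling data; the `split()` helper is hypothetical.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset with 4750 dummy elements (not the real images).
dummy = TensorDataset(torch.zeros(4750, 1), torch.arange(4750))

train_len = int(0.8 * len(dummy))
validate_len = len(dummy) - train_len


def split(seed):
    """Hypothetical helper: split the dummy dataset with a seeded generator."""
    generator = torch.Generator().manual_seed(seed)
    return random_split(dummy, (train_len, validate_len), generator=generator)


train_a, validate_a = split(23)
train_b, validate_b = split(23)

print(len(train_a), len(validate_a))
# The same seed yields the same assignment of elements to each subset.
print(list(train_a.indices) == list(train_b.indices))
```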


It is usually convenient to package a `Dataset` in a `DataLoader`. When you're writing your own `Dataset` object, all you have to do is report the number of elements in the set, and return elements (with their labels, if needed) by index. The `DataLoader` handles everything else: batching, shuffling, parallel loading with worker processes, sampling, and more. The `DataLoader` is the most common interface for offering data to a training loop.

In [5]:
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=4, shuffle=True)
validate_loader = torch.utils.data.DataLoader(validate_dataset)
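To make the division of labor concrete, here is a minimal sketch of a hand-written `Dataset` wrapped in a `DataLoader`. The `SquaresDataset` class is a hypothetical toy invented for this example: it only reports its length and returns elements by index, and the `DataLoader` takes care of batching.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: element i is the pair (i, i squared)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # report the number of elements in the set
        return self.n

    def __getitem__(self, index):
        # return one element and its label by index
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index ** 2)])
        return x, y


loader = DataLoader(SquaresDataset(10), batch_size=4, shuffle=False)
batch_shapes = [tuple(x.shape) for x, y in loader]
print(batch_shapes)  # ten elements in batches of four: 4, 4, then 2
```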

### A Simple Model That Might Work

An earlier tutorial in this series, which made a classifier for CIFAR-10 images, used a variant of the LeNet-5 architecture, adapted for 3-channel color and larger images:

```
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.fc1.in_features)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```

The `__init__()` method defines two convolutional layers and three linear layers. Here's a quick review of what the parameters mean:

* `conv1` is meant to take input with `3` channels (corresponding to the three color channels), and produce an output of `6` feature activation maps, with a detection window of `5` pixels square (its kernel size). You can think of this layer as scanning the input image and looking for features it recognizes.
* `conv2` takes input with `6` channels (corresponding to the 6 features detected by `conv1`), produces output for `16` features, and also employs a `5`-pixel window. You can think of this layer as composing the features detected by `conv1` into larger features.
* `fc1` and `fc2` perform further processing on the output of the convnet layers.
* `fc3` gives our final output, a vector of `10` elements. These are floating point numbers that relate to the model's confidence that the input belongs to a particular class.

The `forward()` method composes these layers and some important functions into a computation graph that takes in a 3x32x32 tensor representing a 3-color image, and returns a vector of 10 class scores. Here's how the data flows through the graph:

| Stage | Tensor Shape | Notes |
| --- | --- | --- |
| input | 3 x 32 x 32 | 32x32 image with 3 color channels |
| conv1 | 6 x 28 x 28 | 6 features; spatial map reduced from 32 to 28 due to kernel size |
| pooling | 6 x 14 x 14 | every 2 x 2 group of the map elements is reduced to a single element, which takes on the max value of its parent elements |
| conv2 | 16 x 10 x 10 | 16 features; spatial map reduced from 14 to 10 due to kernel size |
| pooling | 16 x 5 x 5 | as above, reducing resolution of the spatial map |
| reshape | 1 x 400 | same data as the 3D tensor in the previous step, but flattened to a vector (400 = 16 x 5 x 5) |
| fc1 | 1 x 120 | |
| fc2 | 1 x 84 | |
| output | 1 x 10 | 10 classes of data |
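The spatial sizes in the table follow from a simple rule: with no padding, a window of size `k` moved with stride `s` over an input of size `n` yields `(n - k) // s + 1` positions. The sketch below applies that rule to the 32x32 case; `conv_out()` and `trace()` are hypothetical helper names, not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, conv_kernel=5, pool=2):
    """Sizes after each of the two conv + max-pool stages of the model."""
    sizes = [size]
    for _ in range(2):
        size = conv_out(size, conv_kernel)        # convolution, stride 1
        sizes.append(size)
        size = conv_out(size, pool, stride=pool)  # 2x2 max pooling
        sizes.append(size)
    return sizes


sizes = trace(32)
print(sizes)                 # [32, 28, 14, 10, 5]
print(16 * sizes[-1] ** 2)   # 400, the input width of fc1
```

Calling `trace()` with a different input size or kernel size is a quick way to check your arithmetic before building a model.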

### Exercise

Below is a skeleton version of the image classifier above, with most of the parameters removed. (The 3-color input stays the same, and the `12` for the number of output classes has also been filled in.) **How would you fill in the values to make this work for our 100x100 seedling images?** Don't forget that some values are related, such as the output features of `conv1` and the input features of `conv2`. Some values are related to your input size as well, such as the input width of `fc1`.

Things to think about and experiment with:

**For the convolutional layers:** Does this model work using the same number of features (6 and 16) as before? Is there any advantage to altering the kernel size?

Convolutional layers can also specify a *stride length:* A stride length of 1 means the kernel scans every possible position, a stride of 2 means it scans every other position, 3 means it scans every 3rd, and so on. If you enlarge the kernel, is there an advantage in setting a stride length?

**For the linear layers:** Do the original input widths of the linear layers still work? (Hint: How does `fc1` respond to the new 3x100x100 input size?) Can the intermediate values be left as-is, or is there benefit to changing them?
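If you want to see the effect of stride before committing to a design, a quick experiment like the one below may help. It is only a sketch: the kernel size of 9 and the all-zeros input are arbitrary choices for illustration, not recommended values.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 3, 100, 100)  # a batch of one 3-channel 100x100 image

shapes = {}
for stride in (1, 2, 3):
    conv = nn.Conv2d(3, 6, kernel_size=9, stride=stride)
    shapes[stride] = tuple(conv(x).shape)
    print(stride, shapes[stride])
```

A larger stride shrinks the output map quickly, which in turn shrinks the input width that `fc1` needs.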

In [None]:
class SeedlingModelV1(nn.Module):
    def __init__(self):
        super(SeedlingModelV1, self).__init__()
        self.conv1 = nn.Conv2d(3, ?, ?)
        self.conv2 = nn.Conv2d(?, ?, ?)
        self.fc1 = nn.Linear(?, ?)
        self.fc2 = nn.Linear(?, ?)
        self.fc3 = nn.Linear(?, 12)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), ?)
        x = F.max_pool2d(F.relu(self.conv2(x)), ?)
        x = x.view(-1, ?)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In [6]:
class SeedlingModelV1(nn.Module):
    def __init__(self):
        super(SeedlingModelV1, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 22 * 22, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 12)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.fc1.in_features)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

It might be valuable to check your updated model architecture with the code in the next cell. If any of your layers are mismatched, you should get an error. This code:

* instantiates the model
* extracts an instance from the dataset
* feeds the instance to the model for processing

*(NB: The `torch.unsqueeze()` call is there because `forward()` actually expects a batch of tensors. Here, we have added a dimension at the beginning of our lone tensor to create a batch of 1.)*

In [7]:
model = SeedlingModelV1()
image, label = train_dataset[0] # indexing calls __getitem__() for us
output = model(torch.unsqueeze(image, 0)) # calling the model invokes forward()
print(output)

tensor([[-0.0800, -0.1151, -0.1177, -0.0328,  0.0209,  0.0870,  0.0102,  0.0394,
         -0.0764,  0.0479,  0.0628,  0.0336]], grad_fn=<AddmmBackward>)
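The raw outputs above are unnormalized scores (logits), not probabilities. One common way to read them, sketched below, is to pass them through a softmax so that they sum to 1. The values are copied from the sample output above; your own numbers will differ because the model starts with random weights.

```python
import torch

# Logits copied from the sample output of the untrained model.
logits = torch.tensor([[-0.0800, -0.1151, -0.1177, -0.0328,  0.0209,  0.0870,
                         0.0102,  0.0394, -0.0764,  0.0479,  0.0628,  0.0336]])

probs = torch.softmax(logits, dim=1)  # normalize across the 12 classes
print(probs)
print(probs.sum().item())             # approximately 1.0
print(probs.argmax(dim=1).item())     # index of the highest-scoring class
```

For an untrained model, every class lands close to 1/12, which is what you would expect from a model that has not learned anything yet.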


### Training the Model

First, we'll define a few constants, including the learning hyperparameters. It can be convenient to have these parameters defined in one place, or specified on the command line, to make it easy to tune them as you're shaking out your training loop.

In [8]:
N_EPOCHS = 20 # number of passes over the training dataset
LR = 0.01 # learning rate
MOMENTUM = 0.5 # for SGD

BATCH_SIZE = 4 # number of instances per batch served by dataloader
NUM_WORKERS = 2 # number of worker subprocesses the dataloader uses to load data

MODEL_DIR = 'models' # save models here
MODEL_SAVEFILE = 'seedling'

And we'll need to create that folder for our models:

In [None]:
!mkdir models
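For intuition about what the `LR` and `MOMENTUM` constants above control, the sketch below reproduces a few steps of `optim.SGD` by hand on a toy two-element parameter. The toy loss (a sum of squares) is an assumption chosen so the gradient is easy to write down; the update rule shown is the default one, with no dampening, weight decay, or Nesterov momentum.

```python
import torch
import torch.optim as optim

LR, MOMENTUM = 0.01, 0.5

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=LR, momentum=MOMENTUM)

# Hand-rolled copy of the same update.
w_manual = w.detach().clone()
velocity = torch.zeros_like(w_manual)

for _ in range(3):
    optimizer.zero_grad()
    loss = (w ** 2).sum()          # toy loss; its gradient is 2 * w
    loss.backward()

    grad_manual = 2 * w_manual
    velocity = MOMENTUM * velocity + grad_manual   # accumulate past gradients
    w_manual = w_manual - LR * velocity            # step against the velocity

    optimizer.step()

print(w.detach(), w_manual)
```

A higher momentum lets earlier gradients carry more weight in each step, which can smooth out noisy updates from small batches.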

As we train and validate the model, we'll want informative logging so that we know what's going on, and roughly how long it should take. Also, it's a good practice to save the model when it reaches a new accuracy peak, so we'll create a helper for that.

In [9]:
def tlog(msg):
    print('{}   {}'.format(time.asctime(), msg))

    
def save_model(model, epoch):
    tlog('Saving model')
    savefile = "{}-e{}-{}.pt".format(MODEL_SAVEFILE, epoch, int(time.time()))
    path = os.path.join(MODEL_DIR, savefile)
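    # recommended way from https://pytorch.org/docs/stable/notes/serialization.html
    # copy the tensors to the CPU for saving; calling model.cpu() here would move
    # the model itself off the GPU in the middle of training
    state = {name: tensor.cpu() for name, tensor in model.state_dict().items()}
    torch.save(state, path)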
    return savefile
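Saving is only half of the story. A minimal sketch of the round trip is shown below, using a tiny `nn.Linear` as a stand-in for a trained model and a temporary directory instead of `MODEL_DIR`. The key point is that a saved `state_dict` is loaded into a fresh instance of the same architecture.

```python
import os
import tempfile

import torch
import torch.nn as nn

source = nn.Linear(4, 2)     # stand-in for a trained model
restored = nn.Linear(4, 2)   # fresh instance with the same architecture

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.pt')
    torch.save(source.state_dict(), path)
    # map_location='cpu' lets the file load on a machine without a GPU
    state = torch.load(path, map_location='cpu')
    restored.load_state_dict(state)

restored.eval()  # put the restored model in inference mode before using it
same = all(torch.equal(a, b)
           for a, b in zip(source.parameters(), restored.parameters()))
print(same)
```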

If we can, we'd like to run this on GPU. Below, we'll check for the presence of a CUDA-compatible device and get a handle to it:

In [10]:
if not torch.cuda.is_available():
    device = torch.device('cpu')
    print('GPU not available - running on CPU.\nIf you are running this notebook in Colab, go to the Runtime menu and select "Change runtime type" to switch to GPU.')
else:
    device = torch.device('cuda')
    print('GPU ready to go!')

GPU not available - running on CPU.
If you are running this notebook in Colab, go to the Runtime menu and select "Change runtime type" to switch to GPU.


Finally, just to make sure we're starting from *tabula rasa* (and for review), let's recreate the key components of our process:

In [11]:
full_dataset = torchvision.datasets.ImageFolder('train', transform=get_transforms())
train_len = int(0.8 * len(full_dataset))
validate_len = len(full_dataset) - train_len
train_dataset, validate_dataset = torch.utils.data.random_split(full_dataset, (train_len, validate_len))

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, num_workers=NUM_WORKERS, shuffle=True)
validate_loader = torch.utils.data.DataLoader(validate_dataset, batch_size=1)

model = SeedlingModelV1()

Now, we have an untrained model in `model`, our data ready to consume from `train_loader` and `validate_loader`, and a `device` selected. It's time to train!

The structure of this training loop should be familiar from previous exercises.

In [18]:
def train(model, epochs=N_EPOCHS):
    tlog('Training the model...')
    
    best_accuracy = 0. # determines whether we save a copy of the model
    saved_model_filename = None
    
    model = model.to(device) # move to GPU if available
    loss_fn = nn.CrossEntropyLoss() # combines nn.LogSoftmax() and nn.NLLLoss() for classification tasks
    optimizer = optim.SGD(model.parameters(), lr=LR, momentum=MOMENTUM)
    
    for epoch in range(epochs):
        tlog('BEGIN EPOCH {} of {}'.format(epoch + 1, epochs))
        running_loss = 0. # bookkeeping
        
        model.train() # training mode: matters for layers such as dropout and batch norm
        tlog('Train:')
        for i, data in enumerate(train_loader):
            instances, labels = data[0], data[1]
            instances, labels = instances.to(device), labels.to(device) # move to GPU if available
            
            optimizer.zero_grad()
            guesses = model(instances)
            loss = loss_fn(guesses, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            if (i + 1) % 200 == 0: # log every 200 batches
                tlog('  batch {}   avg loss: {}'.format(i + 1, running_loss / (200)))
                running_loss = 0.
        
        model.eval() # inference mode for validation; train() is called again next epoch
        tlog('Validate:')
        with torch.no_grad(): # no need to do expensive gradient computation for validation
            total_loss = 0.
            correct = 0
            
            for i, data in enumerate(validate_loader):
                instance, label = data[0], data[1]
                instance, label = instance.to(device), label.to(device) # move to GPU if available
                
                guess = model(instance)
                loss = loss_fn(guess, label)
                total_loss += loss.item()
                
                prediction = torch.argmax(guess, 1)
                if prediction.item() == label.item(): # assuming batch size of 1
                    correct += 1

            avg_loss = total_loss / len(validate_loader)
            accuracy = correct / len(validate_loader)
            tlog('  Avg loss for epoch: {}   accuracy: {}'.format(avg_loss, accuracy))
            
            if accuracy >= best_accuracy:
                tlog( '  New accuracy peak, saving model')
                best_accuracy = accuracy
                saved_model_filename = save_model(model, epoch + 1)
                
    return (saved_model_filename, best_accuracy)
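The comment on `loss_fn` in the training loop says that `nn.CrossEntropyLoss` combines `nn.LogSoftmax` and `nn.NLLLoss`. A quick way to convince yourself is sketched below, using random logits as a stand-in for model output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 12)             # a batch of 4 instances, 12 classes
labels = torch.tensor([0, 3, 7, 11])    # arbitrary target classes

combined = nn.CrossEntropyLoss()(logits, labels)
two_step = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)

print(combined.item(), two_step.item())
print(torch.allclose(combined, two_step))
```

This is also why the model's `forward()` returns raw scores without a softmax: the loss function applies it internally.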
                


When you run the training loop, you should see the loss decreasing and accuracy increasing more or less monotonically, both for training and for validation. You should also see roughly similar average per-instance loss values for training and validation.

In [17]:
best_model_filename, accuracy  = train(model)
print('The best model is saved at {} with accuracy {}'.format(best_model_filename, accuracy))

Wed May  8 16:26:20 2019   Training the model...
Wed May  8 16:26:20 2019   BEGIN EPOCH 1 of 20
Wed May  8 16:26:20 2019   Train:
Wed May  8 16:26:28 2019     batch 200   avg loss: 2.4408892291784285
Wed May  8 16:26:35 2019     batch 400   avg loss: 2.424518073797226
Wed May  8 16:26:43 2019     batch 600   avg loss: 2.408646847009659
Wed May  8 16:26:51 2019     batch 800   avg loss: 2.374582479596138
Wed May  8 16:26:56 2019   Validate:
Wed May  8 16:27:07 2019     Avg loss for epoch: 2.347108101970271   accuracy: 0.15368421052631578
Wed May  8 16:27:07 2019     New accuracy peak, saving model
Wed May  8 16:27:07 2019   Saving model
Wed May  8 16:27:07 2019   BEGIN EPOCH 2 of 20
Wed May  8 16:27:07 2019   Train:
Wed May  8 16:27:15 2019     batch 200   avg loss: 2.2742838722467424
Wed May  8 16:27:22 2019     batch 400   avg loss: 2.344126656651497
Wed May  8 16:27:29 2019     batch 600   avg loss: 2.2710443186759948
Wed May  8 16:27:37 2019     batch 800   avg loss: 2.0749044781923

Wed May  8 16:37:01 2019     batch 800   avg loss: 0.6167449614405632
Wed May  8 16:37:06 2019   Validate:
Wed May  8 16:37:19 2019     Avg loss for epoch: 1.064957264850014   accuracy: 0.6789473684210526
Wed May  8 16:37:19 2019     New accuracy peak, saving model
Wed May  8 16:37:19 2019   Saving model
Wed May  8 16:37:19 2019   BEGIN EPOCH 16 of 20
Wed May  8 16:37:19 2019   Train:
Wed May  8 16:37:27 2019     batch 200   avg loss: 0.43457256615161893
Wed May  8 16:37:34 2019     batch 400   avg loss: 0.4400522553920746
Wed May  8 16:37:42 2019     batch 600   avg loss: 0.47951856106519697
Wed May  8 16:37:48 2019     batch 800   avg loss: 0.4267819634079933
Wed May  8 16:37:53 2019   Validate:
Wed May  8 16:38:05 2019     Avg loss for epoch: 1.0964412139591417   accuracy: 0.6631578947368421
Wed May  8 16:38:05 2019   BEGIN EPOCH 17 of 20
Wed May  8 16:38:05 2019   Train:
Wed May  8 16:38:13 2019     batch 200   avg loss: 0.33576897144317625
Wed May  8 16:38:20 2019     batch 400   

### Exercise

**What accuracy did you achieve?** Did the model converge (i.e., did the per-instance loss flatten out) in the number of epochs you ran? Was the loss during validation similar to the loss during training?

**Was the learning stable?** Did loss continue to decrease and accuracy increase monotonically?

**How could it improve?** Consider the many choices we've made up to this point, and their effect on the model architecture, the state of the data, and the execution of the training loop:

* **Data:** We standardized the *shape* of the training data, but performed no other normalization. (For more information, see the discussion below on normalization of the color space.) Could the data be altered in some way that improves accuracy?
* **Convnet Layers:** Convolutional layers make use of multiple important parameters. Would performance be improved with a change to the number of input or output features, or the kernel size, or the stride length?

If you feel your run could have been better, hypothesize about which of the above factors might affect it, and pick one or two to experiment with.

In [None]:
best_model_filename, accuracy  = train(model, epochs=5)
print('The best model is saved at {} with accuracy {}'.format(best_model_filename, accuracy))

In [None]:
class SeedlingModelV1_1(nn.Module):
    def __init__(self):
        super(SeedlingModelV1_1, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 9)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 21 * 21, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 12)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.fc1.in_features)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model_1_1 = SeedlingModelV1_1()
best_model_1_1, acc_1_1 = train(model_1_1)
print('The best updated model is saved at {} with accuracy {}'.format(best_model_1_1, acc_1_1))

In [None]:
def count_model_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(count_model_params(model))
print(count_model_params(model_1_1))
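It can be instructive to work out the parameter count by hand, layer by layer. The sketch below does this for `SeedlingModelV1`; `conv_params()` and `linear_params()` are hypothetical helpers that count weights plus biases.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights (one square kernel per input/output channel pair) plus biases."""
    return in_channels * out_channels * kernel * kernel + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases for a fully connected layer."""
    return in_features * out_features + out_features


layers = {
    'conv1': conv_params(3, 6, 5),
    'conv2': conv_params(6, 16, 5),
    'fc1': linear_params(16 * 22 * 22, 120),
    'fc2': linear_params(120, 84),
    'fc3': linear_params(84, 12),
}

for name, count in layers.items():
    print('{:>6}: {:>9,}'.format(name, count))

total = sum(layers.values())
print(' total: {:>9,}'.format(total))
```

Nearly all of the parameters sit in `fc1`, so anything that shrinks the feature map feeding it (a larger kernel, a stride, more pooling) has a large effect on model size.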

## The Second Iteration: Tuning an Existing Model

Now, we'll look at a second technique: Making adjustments to a pre-trained model.

The very best computer vision models can be large indeed. Here's a sampling of the parameter counts of some of the pre-trained models available with the `torchvision` library:

| Model | Number of Parameters |
| --- | --- |
| SeedlingModelV1 (above) | 943,456 |
| SqueezeNet 1.1 | 1,235,496 |
| Resnet 50 | 25,557,032 |
| Densenet-161 | 28,681,000 |
| Alexnet | 61,100,840 |
| VGG-16 | 138,357,544 |

Training a model with tens of millions of parameters - or more! - can take a huge amount of time, even if you have access to hardware acceleration. The good news, as we covered in the earlier unit on transfer learning, is that you can leverage pre-trained models for your problem domain, and greatly reduce your training time.

The pre-trained models available in `torchvision` are all trained against [ImageNet](http://www.image-net.org/about-overview) - a general-purpose set of over a million images drawn from the World Wide Web, categorized by their content into 1000 different categories. We'll be adapting one of these models to see if we can achieve better results while keeping training time short.

### The Model

### The Data

* image size - resize or discard?
* normalization

### The Process

In [25]:
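# `pretrained=True` matches the torchvision release this notebook was written against;
# torchvision 0.13 and later deprecate it in favor of a `weights=` argument.
dense = torchvision.models.densenet161(pretrained=True)
r50 = torchvision.models.resnet50(pretrained=True)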

In [26]:
dense.classifier = nn.Linear(in_features=2208, out_features=12, bias=True)
r50.fc = nn.Linear(in_features=2048, out_features=12, bias=True)
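The cell above swaps in a new 12-class output layer, and the training loop then updates every parameter in the network. A common variant of transfer learning, sketched below, is to freeze the pre-trained layers and train only the new head. This is an alternative to what the notebook does, not a description of it, and the tiny `backbone` here is a hypothetical stand-in so the example runs without downloading weights.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network with a 1000-class head.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
model = nn.Sequential(backbone, nn.Linear(8, 1000))

# Freeze everything that was "pre-trained"...
for param in model.parameters():
    param.requires_grad = False

# ...then replace the head; a new layer's parameters require gradients by default.
model[1] = nn.Linear(8, 12)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)

output = model(torch.zeros(2, 3, 32, 32))
print(tuple(output.shape))
```

With this approach you would typically pass only the trainable parameters to the optimizer, which makes each training step cheaper at the cost of less flexibility.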

In [21]:
# NOTE: We are altering variable values from above
full_dataset = torchvision.datasets.ImageFolder('train', transform=get_transforms(target_size=224, normalize=True))
train_len = int(0.8 * len(full_dataset))
validate_len = len(full_dataset) - train_len
train_dataset, validate_dataset = torch.utils.data.random_split(full_dataset, (train_len, validate_len))
print('Training dataset contains {} elements'.format(len(train_dataset)))
print('Validation dataset contains {} elements'.format(len(validate_dataset)))
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=4, shuffle=True)
validate_loader = torch.utils.data.DataLoader(validate_dataset)
images, labels = next(iter(train_loader))
print(images[0].shape) # should be a 3-color image @ 224x224

Training dataset contains 3800 elements
Validation dataset contains 950 elements
torch.Size([3, 224, 224])


In [28]:
def train(model, epochs=N_EPOCHS):
    tlog('Training the model...')
    
    best_accuracy = 0. # determines whether we save a copy of the model
    saved_model_filename = None
    
    model = model.to(device) # move to GPU if available
    loss_fn = nn.CrossEntropyLoss() # combines nn.LogSoftmax() and nn.NLLLoss() for classification tasks
    optimizer = optim.SGD(model.parameters(), lr=LR, momentum=MOMENTUM)
    
    for epoch in range(epochs):
        tlog('BEGIN EPOCH {} of {}'.format(epoch + 1, epochs))
        running_loss = 0. # bookkeeping
        
        model.train() # training mode: batch norm layers in these models update their statistics
        tlog('Train:')
        for i, data in enumerate(train_loader):
            instances, labels = data[0], data[1]
            instances, labels = instances.to(device), labels.to(device) # move to GPU if available
            
            optimizer.zero_grad()
            guesses = model(instances)
            loss = loss_fn(guesses, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            if (i + 1) % 200 == 0: # log every 200 batches
                tlog('  batch {}   avg loss: {}'.format(i + 1, running_loss / (200)))
                running_loss = 0.
        
        model.eval() # inference mode: batch norm uses its stored statistics during validation
        tlog('Validate:')
        with torch.no_grad(): # no need to do expensive gradient computation for validation
            total_loss = 0.
            correct = 0
            
            for i, data in enumerate(validate_loader):
                instance, label = data[0], data[1]
                instance, label = instance.to(device), label.to(device) # move to GPU if available
                
                guess = model(instance)
                loss = loss_fn(guess, label)
                total_loss += loss.item()
                
                prediction = torch.argmax(guess, 1)
                if prediction.item() == label.item(): # assuming batch size of 1
                    correct += 1

            avg_loss = total_loss / len(validate_loader)
            accuracy = correct / len(validate_loader)
            tlog('  Avg loss for epoch: {}   accuracy: {}'.format(avg_loss, accuracy))
            
            if accuracy >= best_accuracy:
                tlog( '  New accuracy peak, saving model')
                best_accuracy = accuracy
                saved_model_filename = save_model(model, epoch + 1)
                
    return (saved_model_filename, best_accuracy)
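The validation loop above assumes a batch size of 1, which is simple but slow for large models. If you decide to validate in larger batches, the correct predictions can be counted for a whole batch at once. The sketch below uses made-up scores for a 3-class problem purely for illustration.

```python
import torch

# Made-up scores for a batch of 4 instances over 3 classes.
guesses = torch.tensor([[2.0, 0.1, 0.3],
                        [0.2, 0.1, 1.5],
                        [0.9, 1.1, 0.4],
                        [0.3, 2.2, 0.1]])
labels = torch.tensor([0, 2, 0, 1])

predictions = guesses.argmax(dim=1)              # best class per instance
correct = (predictions == labels).sum().item()   # number of matches in the batch
accuracy = correct / labels.size(0)

print(predictions.tolist(), correct, accuracy)   # [0, 2, 1, 1] 3 0.75
```

In a full loop you would accumulate `correct` across batches and divide by the number of instances in the validation dataset, rather than by `len(validate_loader)`, which counts batches.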
                


In [27]:
# train(model, epochs=1)
train(r50, epochs=10)

Wed May  8 17:51:30 2019   Training the model...
Wed May  8 17:51:30 2019   BEGIN EPOCH 1 of 10
Wed May  8 17:51:30 2019   Train:
Wed May  8 17:55:32 2019     batch 200   avg loss: 1.960296801328659
Wed May  8 17:59:25 2019     batch 400   avg loss: 1.3483356368541717


KeyboardInterrupt: 