---
<a id='step3'></a>
## Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

Now that we have functions for detecting humans and dogs in images, we need a way to predict breed from images.  In this step, you will create a CNN that classifies dog breeds.  You must create your CNN _from scratch_ (so, you can't use transfer learning _yet_!), and you must attain a test accuracy of at least 10%.  In Step 4 of this notebook, you will have the opportunity to use transfer learning to create a CNN that attains greatly improved accuracy.

Assigning a breed to a dog from an image is considered exceptionally challenging.  To see why, consider that *even a human* would have trouble distinguishing between a Brittany and a Welsh Springer Spaniel.  

Brittany | Welsh Springer Spaniel
- | - 
<img src="images/Brittany_02625.jpg" width="100"> | <img src="images/Welsh_springer_spaniel_08203.jpg" width="200">

It is not difficult to find other dog breed pairs with minimal inter-class variation (for instance, Curly-Coated Retrievers and American Water Spaniels).  

Curly-Coated Retriever | American Water Spaniel
- | -
<img src="images/Curly-coated_retriever_03896.jpg" width="200"> | <img src="images/American_water_spaniel_00648.jpg" width="200">


Likewise, recall that Labradors come in yellow, chocolate, and black.  Your vision-based algorithm will have to overcome this high intra-class variation and classify all of these different shades as the same breed.  

Yellow Labrador | Chocolate Labrador | Black Labrador
- | - | -
<img src="images/Labrador_retriever_06457.jpg" width="150"> | <img src="images/Labrador_retriever_06455.jpg" width="240"> | <img src="images/Labrador_retriever_06449.jpg" width="220">

Random chance presents an exceptionally low bar: setting aside the fact that the classes are slightly imbalanced, a random guess is correct roughly 1 time in 133, which corresponds to an accuracy of less than 1%.  
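To put that baseline in numbers, the short calculation below is a sketch that assumes perfectly balanced classes. `num_classes` is the 133 breeds mentioned above and `target_accuracy` is the 10% requirement for this step.

```python
# Baseline accuracy for uniform random guessing over the 133 breeds.
num_classes = 133
chance_accuracy = 1 / num_classes   # probability that a random guess is correct
target_accuracy = 0.10              # minimum test accuracy required in this step

print(f"chance accuracy: {chance_accuracy:.2%}")
print(f"target is {target_accuracy / chance_accuracy:.1f}x better than chance")
```

Chance comes out at roughly 0.75%, so the 10% target is a bit more than 13 times better than guessing.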

Remember that the practice is far ahead of the theory in deep learning.  Experiment with many different architectures, and trust your intuition.  And, of course, have fun!

### (IMPLEMENTATION) Specify Data Loaders for the Dog Dataset

Use the code cell below to write three separate [data loaders](http://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) for the training, validation, and test datasets of dog images (located at `dog_images/train`, `dog_images/valid`, and `dog_images/test`, respectively).  You may find [this documentation on custom datasets](http://pytorch.org/docs/stable/torchvision/datasets.html) to be a useful resource.  If you are interested in augmenting your training and/or validation data, check out the wide variety of [transforms](http://pytorch.org/docs/stable/torchvision/transforms.html?highlight=transform)!

In [1]:
import os
from torchvision import datasets
import torch
import torchvision.models as models
import torchvision.transforms as transforms
import numpy as np



training_data_transform = transforms.Compose([transforms.Resize(255),
                transforms.CenterCrop(224), 
                transforms.RandomHorizontalFlip(),
                transforms.RandomRotation(20),
                transforms.ToTensor(),
                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
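# validation/test images get the same deterministic resize and crop as training, without augmentation
other_data_transform = transforms.Compose([transforms.Resize(255),
                transforms.CenterCrop(224), 
                transforms.ToTensor(),
                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])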


train_data = datasets.ImageFolder("/data/dog_images/train", transform = training_data_transform)
test_data = datasets.ImageFolder("/data/dog_images/test", transform = other_data_transform)
valid_data = datasets.ImageFolder("/data/dog_images/valid", transform = other_data_transform)

# define dataloader parameters
batch_size = 20
num_workers = 0

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, 
                                           num_workers=num_workers, shuffle=True)
# shuffling is only useful for training; evaluation order does not affect the metrics
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, 
                                          num_workers=num_workers, shuffle=False)
valid_loader = torch.utils.data.DataLoader(valid_data, batch_size=batch_size, 
                                          num_workers=num_workers, shuffle=False)

## Specify appropriate transforms, and batch_sizes
loaders_scratch = {"train" : train_loader, "test" : test_loader, "valid" : valid_loader}

use_cuda = torch.cuda.is_available()
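
`ToTensor()` scales pixel values to the range [0, 1], and `Normalize(mean, std)` then computes `(x - mean) / std` for each channel. With mean and std both set to 0.5, as in the transforms above, the values end up in [-1, 1]. A small sketch of that arithmetic in plain Python (the helper `normalize` is hypothetical, written only for illustration):

```python
def normalize(x, mean=0.5, std=0.5):
    """Mimic transforms.Normalize for a single channel value."""
    return (x - mean) / std

# darkest, mid-grey, and brightest pixel values after ToTensor()
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```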



**Question 3:** Describe your chosen procedure for preprocessing the data. 
- How does your code resize the images (by cropping, stretching, etc)?  What size did you pick for the input tensor, and why?
- Did you decide to augment the dataset?  If so, how (through translations, flips, rotations, etc)?  If not, why not?


**Answer**:

### (IMPLEMENTATION) Model Architecture

Create a CNN to classify dog breed.  Use the template in the code cell below.

In [2]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    ### TODO: choose an architecture, and complete the class
    def __init__(self):
        super(Net, self).__init__()
        ## Define layers of a CNN
        # input 224 x 224 x 3 
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # input 112 x 112 x 16 
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # input 56 x 56 x 32 
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # input 28 x 28 x 64 
        self.conv4 = nn.Conv2d(64, 128, 3, padding=1)
        # input 14 x 14 x 128
        self.conv5 = nn.Conv2d(128, 256, 3, padding=1)
        # input 7 x 7 x 256 
        self.conv6 = nn.Conv2d(256, 512, 3, padding=1)        
        # max pooling layer
        self.maxpool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(512*7*7, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.fc2 = nn.Linear(256, 133)
        
        # dropout layer
        self.dropout = nn.Dropout(0.3)
    
    def forward(self, x):
        # each block: 3x3 convolution -> ReLU -> 2x2 max pooling (halves height and width)
        x = F.relu(self.conv1(x))
        x = self.maxpool(x)
        x = F.relu(self.conv2(x))
        x = self.maxpool(x)
        x = F.relu(self.conv3(x))
        x = self.maxpool(x)
        x = F.relu(self.conv4(x))
        x = self.maxpool(x)
        x = F.relu(self.conv5(x))
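        x = self.maxpool(x)
        # conv6 is not followed by pooling, so the feature map stays 7 x 7
        x = F.relu(self.conv6(x))
        # flatten to (batch, 512*7*7) for the fully connected layers
        x = x.view(-1,512*7*7)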
        x = self.fc1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        
        return x
    
#-#-# You do NOT have to modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()
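
The size comments in `Net` can be checked with the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The sketch below is plain Python; `conv_out` is a hypothetical helper, not part of PyTorch.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

size = 224
for block in range(1, 6):                        # conv1..conv5, each followed by max pooling
    size = conv_out(size, kernel=3, padding=1)   # 3x3 conv, padding 1: size unchanged
    size = conv_out(size, kernel=2, stride=2)    # 2x2 max pool, stride 2: size halved
    print(f"after block {block}: {size} x {size}")

size = conv_out(size, kernel=3, padding=1)       # conv6 has no pooling after it
flattened = 512 * size * size                    # number of inputs to fc1
print("flattened features:", flattened)
```

The final 7 x 7 x 512 feature map matches the `512*7*7` input size declared for `fc1`.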

__Question 4:__ Outline the steps you took to get to your final CNN architecture and your reasoning at each step.  

__Answer:__ 

### (IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a [loss function](http://pytorch.org/docs/stable/nn.html#loss-functions) and [optimizer](http://pytorch.org/docs/stable/optim.html).  Save the chosen loss function as `criterion_scratch`, and the optimizer as `optimizer_scratch` below.

In [3]:
import torch.optim as optim
criterion_scratch = nn.CrossEntropyLoss()
optimizer_scratch = optim.Adam(model_scratch.parameters(), lr=0.001 )
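
`nn.CrossEntropyLoss` expects raw logits and applies log-softmax followed by negative log-likelihood, which is why `Net.forward` returns the output of `fc2` without a softmax. One consequence is that a model producing equal logits for all 133 classes has a loss of ln(133), about 4.89, which is a useful reference point when reading the first epochs of training. The plain-Python sketch below reproduces the per-example computation; `cross_entropy` is a hypothetical helper, not the PyTorch implementation.

```python
import math

def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    m = max(logits)                                   # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

uniform_logits = [0.0] * 133                          # no preference for any breed
uniform_loss = cross_entropy(uniform_logits, target=7)
print(f"loss with uniform logits: {uniform_loss:.4f}")
print(f"ln(133):                  {math.log(133):.4f}")
```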

### (IMPLEMENTATION) Train and Validate the Model

Train and validate your model in the code cell below.  [Save the final model parameters](http://pytorch.org/docs/master/notes/serialization.html) at filepath `'model_scratch.pt'`.

In [4]:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
train_losses, valid_losses = [], []


def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """Train `model` and save its parameters whenever validation loss reaches a new minimum."""
    valid_loss_min = np.inf  # lowest validation loss seen so far
    for epoch in range(1, n_epochs+1):
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################
        model.train()
        
        for batch_idx, (data, target) in enumerate(loaders['train']):
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            optimizer.zero_grad()
            # output already has shape (batch, 133), so no squeeze is needed
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            # running mean of the per-batch losses, kept as a Python float
            train_loss += ((1 / (batch_idx + 1)) * (loss.item() - train_loss))
            
        ######################    
        # validate the model #
        ######################
        model.eval()
        
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                output = model(data)
                loss = criterion(output, target)
                valid_loss += ((1 / (batch_idx + 1)) * (loss.item() - valid_loss))

        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            train_loss,
            valid_loss
            ))
        # save the model parameters if validation loss is the lowest seen so far
        if valid_loss < valid_loss_min: 
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
                
        train_losses.append(train_loss)
        valid_losses.append(valid_loss)
   
    return model
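
The loss bookkeeping in `train` uses an update of the form `train_loss += (1 / (batch_idx + 1)) * (loss - train_loss)`. This is an incremental mean: after each batch, `train_loss` equals the average of all batch losses seen so far, without storing them. A quick check in plain Python, using made-up loss values:

```python
batch_losses = [4.9, 4.7, 4.8, 4.5]   # made-up values for illustration

running = 0.0
for batch_idx, loss in enumerate(batch_losses):
    # move the running value toward the new loss by 1/(number of batches seen)
    running += (1 / (batch_idx + 1)) * (loss - running)

mean = sum(batch_losses) / len(batch_losses)
print(running, mean)   # the two values agree up to floating-point rounding
```

Note that this weights every batch equally, so a smaller final batch counts as much as a full one.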




### (IMPLEMENTATION) Test the Model

Try out your model on the test dataset of dog images.  Use the code cell below to calculate and print the test loss and accuracy.  Ensure that your test accuracy is greater than 10%.

In [5]:
def test(loaders, model, criterion, use_cuda):

    test_loss = 0.
    correct = 0.
    total = 0.

    model.eval()
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(loaders['test']):
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            output = model(data)
            loss = criterion(output, target)
            test_loss += ((1 / (batch_idx + 1)) * (loss.item() - test_loss))
            # predicted class = index of the largest logit in each row
            pred = output.argmax(dim=1, keepdim=True)
            # count predictions that match the true labels
            correct += pred.eq(target.view_as(pred)).sum().item()
            total += data.size(0)
    print('Test Loss: {:.6f}\n'.format(test_loss))
    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (100. * correct / total, correct, total))
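
The accuracy bookkeeping in `test` boils down to taking the argmax over the class dimension and counting matches with the labels. A minimal sketch with hand-written logits for three classes (the tensors are illustrative, not model output):

```python
import torch

logits = torch.tensor([[2.0, 0.1, 0.3],    # largest logit at index 0
                       [0.2, 0.1, 1.5],    # largest logit at index 2
                       [0.9, 1.1, 0.4],    # largest logit at index 1
                       [1.2, 0.3, 0.8]])   # largest logit at index 0
target = torch.tensor([0, 2, 0, 0])

pred = logits.argmax(dim=1)                 # predicted class per row
correct = (pred == target).sum().item()     # number of matching predictions
total = target.size(0)
print('Accuracy: %2d%% (%2d/%2d)' % (100. * correct / total, correct, total))
```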



In [6]:
from workspace_utils import active_session
n = 25
with active_session():
    model_scratch = train(n, loaders_scratch, model_scratch, optimizer_scratch, 
                      criterion_scratch, use_cuda, 'model_scratch7.pt')
    test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)


Epoch: 1 	Training Loss: 4.747371 	Validation Loss: 4.684639
Epoch: 2 	Training Loss: 4.537896 	Validation Loss: 4.515203
Epoch: 3 	Training Loss: 4.338673 	Validation Loss: 4.633378
Epoch: 4 	Training Loss: 4.180035 	Validation Loss: 4.744027
Epoch: 5 	Training Loss: 4.067447 	Validation Loss: 4.348669
Epoch: 6 	Training Loss: 3.950940 	Validation Loss: 4.298106
Epoch: 7 	Training Loss: 3.771987 	Validation Loss: 4.300085
Epoch: 8 	Training Loss: 3.635617 	Validation Loss: 4.268655
Epoch: 9 	Training Loss: 3.491191 	Validation Loss: 4.127584
Epoch: 10 	Training Loss: 3.314965 	Validation Loss: 4.274055
Epoch: 11 	Training Loss: 3.136083 	Validation Loss: 3.950857
Epoch: 12 	Training Loss: 2.978744 	Validation Loss: 3.995743
Epoch: 13 	Training Loss: 2.800841 	Validation Loss: 3.915588
Epoch: 14 	Training Loss: 2.642503 	Validation Loss: 4.190447
Epoch: 15 	Training Loss: 2.503570 	Validation Loss: 3.881375
Epoch: 16 	Training Loss: 2.350058 	Validation Loss: 3.917857
Epoch: 17 	Training Loss: 2.199865 	Validation Loss: 4.001546
Epoch: 18 	Training Loss: 2.059152 	Validation Loss: 3.861259
Epoch: 19 	Trai

KeyboardInterrupt: 

In [7]:
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)

Test Loss: 4.174478


Test Accuracy: 22% (185/836)
