# Train and Deploy Custom Model in AWS

## Project: Train, Evaluate and Deploy Dog Identification App in SageMaker
---
### Why We're Here 
In this notebook, we train and deploy a **custom model** in SageMaker. Specifically, the pretrained PyTorch model from the [Dog Breed Classifier](https://github.com/reedemus/dog_breed_classifier) project is used as the example for this exercise.
### The Road Ahead

We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.

* [Step 1](#step1): Upload the dataset into an S3 bucket
* [Step 2](#step2): Create the custom model
* [Step 3](#step3): Create the training script
* [Step 4](#step4): Training and deploying the custom model
* [Step 5](#step5): Evaluating the performance
---
<a id='step1'></a>
## Step 1: Upload the dataset to S3

We will import the AWS SageMaker libraries and define helper functions for handling the dataset. We will download the dog dataset from [this URL](https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip) and extract it before uploading it to the bucket.


In [None]:
# import the required libraries
import os
import requests
import boto3
import sagemaker
from zipfile import ZipFile
from tqdm import tqdm

Define `downloadFile` and `extractFile` helper functions to download the dataset.

In [None]:
def downloadFile(file_url, file_name, dir=None, chunk_size=1024):
    '''
    Helper function to download file to specified directory

    :param file_url: file download URL
    :param file_name: file name to be saved.
    :param dir: path where file is saved other than current directory (Default = current working directory)
    :param chunk_size: size of file chunk to download (Default = 1024 bytes)
    :returns: None
    '''
    saved_file_path = file_name
    if dir is not None:
        # create the target directory if needed, then save the file inside it
        os.makedirs(dir, exist_ok=True)
        saved_file_path = os.path.join(dir, file_name)

    r = requests.get(file_url, stream=True)
    r.raise_for_status()
    # read the expected size from the response header;
    # accessing r.content here would load the whole file into memory
    total_size_in_bytes = int(r.headers.get('content-length', 0))
    progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True, desc=file_name)

    with open(saved_file_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size):
            # write one chunk at a time to the file
            if chunk:
                f.write(chunk)
                progress_bar.update(len(chunk))
    progress_bar.close()
    if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
        print("ERROR, something went wrong")

def extractFile(file_name):
    '''
    Extracts compressed file in zip format into current directory
    
    :param file_name: file name
    :returns: None
    '''
    # create a zipfile object and extract it to current directory
    print("Extracting file...")
    with ZipFile(file_name, 'r') as z:
        z.extractall()


Download the dataset into the current directory. The default folder after extraction is `dogImages/`.

In [None]:
from glob import glob
import numpy as np

dog_url = 'https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip'

downloadFile(dog_url, 'dogImages.zip')
extractFile('dogImages.zip')

# load filenames for the dog images
dog_files = np.array(glob("dogImages/*/*/*"))

# print the number of images in the dataset
print('There are %d total dog images.' % len(dog_files))

Let's start by creating a SageMaker session and specifying:

- The S3 bucket and prefix that you want to use for training and model data. The bucket should be in the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note that if more than one role is required for notebook instances, training, and/or hosting, replace `sagemaker.get_execution_role()` with the appropriate full IAM role ARN string(s).

In [None]:
# session and role
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# get the session's default S3 bucket (created if it does not exist yet)
bucket = sagemaker_session.default_bucket()

# Name of the dataset directory
data_dir = 'dogImages'

# set prefix, a descriptive name for a directory  
prefix = 'dog-breed-classifier'

# upload all data to S3
input_dataset = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)
print(input_dataset)
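To make the upload step more concrete, the sketch below lists the S3 keys that a recursive upload of a local folder would be expected to produce under a key prefix. It assumes the upload mirrors the local directory layout beneath the prefix; `planned_s3_keys` is a hypothetical helper written for illustration and is not part of the SageMaker SDK. A temporary folder stands in for `dogImages/` so the sketch runs without the real dataset.

```python
import os
import tempfile


def planned_s3_keys(local_dir, key_prefix):
    """Return the S3 keys a recursive upload of local_dir would produce (hypothetical helper)."""
    keys = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            rel_path = os.path.relpath(os.path.join(root, name), local_dir)
            # S3 keys use forward slashes regardless of the local OS
            keys.append(key_prefix + "/" + rel_path.replace(os.sep, "/"))
    return sorted(keys)


# Build a tiny stand-in for dogImages/ (the folder and file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    breed_dir = os.path.join(tmp, "train", "001.Affenpinscher")
    os.makedirs(breed_dir)
    open(os.path.join(breed_dir, "sample.jpg"), "wb").close()
    keys = planned_s3_keys(tmp, "dog-breed-classifier")

print(keys)
```

Listing the planned keys before uploading is a cheap way to check that the `train`, `valid`, and `test` folders will end up where the training script expects them.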

## Test cell
Test that our data has been successfully uploaded. The cell below prints the items in the S3 bucket and raises an error if the bucket is empty. We should see the contents of ```data_dir``` and perhaps some checkpoints. If any other files are listed, they may be old model files, which can be deleted via the S3 console (though additional files shouldn't affect the performance of the model developed in this notebook).


In [None]:
# confirm that data is in S3 bucket
empty_check = []
for obj in boto3.resource('s3').Bucket(bucket).objects.all():
    empty_check.append(obj.key)
    print(obj.key)

assert len(empty_check) !=0, 'S3 bucket is empty.'
print('Test passed!')

---
<a id='step2'></a>
## Step 2: Create the custom model
Create a CNN model to classify dog breed using transfer learning. The model is defined in `model.py`.

In [None]:
# Print the implementation using a Python syntax highlighter package
!pygmentize model.py

---
<a id='step3'></a>
## Step 3: Create the training script
Once the model is developed, we implement the training script ```train.py```. The script does the following steps:

1. Loads training data from a specified directory
2. Parses any training and model hyperparameters (e.g. nodes in a neural network, training epochs, etc.)
3. Instantiates a model of your design, with any specified hyperparameters
4. Trains that model
5. Finally, saves the model so that it can be hosted/deployed later

From the code below, notice a few things:

- Model loading (`model_fn`) and saving code
- Getting SageMaker's default hyperparameters
- Loading the training data

If you'd like to read more about model saving with __[torch.save](https://pytorch.org/tutorials/beginner/saving_loading_models.html)__, follow the provided link.

In [None]:
# Print the implementation using a Python syntax highlighter package
!pygmentize train.py
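As a rough illustration of step 2 above (parsing hyperparameters), the sketch below shows one common way a SageMaker training script reads its arguments. It is not the project's actual `train.py`: the default values and the local fallback paths are assumptions. `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` are environment variables that SageMaker sets inside the training container; outside SageMaker the fallbacks are used instead.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths (illustrative sketch)."""
    parser = argparse.ArgumentParser()
    # Hyperparameters; the names mirror the keys of the estimator's hyperparameters dict
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--lr", type=float, default=0.001)
    # Locations provided by SageMaker, with assumed local fallbacks
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--data-dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "dogImages"))
    # parse_known_args ignores any extra arguments instead of exiting
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--epochs", "50", "--batch-size", "64", "--lr", "0.001"])
print(args.epochs, args.batch_size, args.lr)
```

Note that `argparse` exposes `--batch-size` as `args.batch_size`, so hyphenated hyperparameter names work without extra handling.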

---
<a id='step4'></a>
## Step 4: Create an Estimator
When a custom model is constructed in SageMaker, an entry point must be specified. This is the Python file that is executed when the model is trained: the `train.py` script described above. To run a custom training script in SageMaker, construct an estimator and fill in the appropriate constructor arguments:

- *entry_point*: The path to the Python script SageMaker runs for training and prediction.
- *source_dir*: The path to the directory that contains the training script (`code` in this notebook).
- *role*: The role ARN, which was specified above.
- *instance_count*: The number of training instances (should be left at 1).
- *instance_type*: The type of SageMaker instance for training.
- *sagemaker_session*: The session used to train on SageMaker.
- *hyperparameters (optional)*: A dictionary `{'name': value, ...}` passed to the training script as hyperparameters.
>Note: For a PyTorch model, there is another argument, *framework_version*, which selects the PyTorch version to use (`1.10.0` in this notebook).

### Define the estimator

In [None]:
# Define a PyTorch estimator
from sagemaker.pytorch import PyTorch

# specify an output path
# prefix is specified above
output_path = 's3://{}/{}'.format(bucket, prefix)

# instantiate  the estimator
estimator = PyTorch(entry_point='train.py',
                    source_dir='code', # train.py at code directory
                    role=role,
                    py_version='py3',
                    framework_version='1.10.0', # PyTorch version
                    instance_count=1,
                    instance_type='ml.c4.xlarge',
                    output_path=output_path,
                    sagemaker_session=sagemaker_session,
                    hyperparameters={
                        'epochs': 50,
                        'batch-size': 64,
                        'lr': 0.001
                    })
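SageMaker hands the `hyperparameters` dictionary to the entry-point script as command-line arguments. The sketch below imitates that conversion to show what `train.py` would receive; `hyperparameters_to_argv` is a hypothetical helper, and the real conversion happens inside the training container, so treat this only as an approximation of its behaviour.

```python
def hyperparameters_to_argv(hyperparameters):
    """Flatten a hyperparameters dict into a '--name value' argument list (hypothetical helper)."""
    argv = []
    for name, value in hyperparameters.items():
        # every value reaches the script as a string and must be parsed back to its type
        argv.extend(["--" + name, str(value)])
    return argv


argv = hyperparameters_to_argv({"epochs": 50, "batch-size": 64, "lr": 0.001})
print(argv)
```

Because every value arrives as a string, the training script needs to declare the expected type of each argument when parsing.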

### Train the estimator
Train your estimator on the training data stored in S3. This should create a training job that you can monitor in your SageMaker console.

In [None]:
%%time

# Train your estimator on S3 training data
estimator.fit({ 'train': input_dataset })

### Deploy the estimator

After training, deploy your model to create a predictor. The cell below calls `estimator.deploy` directly. Alternatively, for a PyTorch model you can create a `PyTorchModel` from the trained estimator's `model_data` and point it at a separate inference script, such as a `predict.py` file, as the entry point.

To deploy a trained model, you'll use the `deploy` method, which takes two arguments:

- initial_instance_count: The number of deployed instances (1).
- instance_type: The type of SageMaker instance for deployment.
>Note: If you run into an instance error, it may be because you chose the wrong training or deployment instance_type. It may help to refer to your previous exercise code to see which types of instances we used.

In [None]:
%%time

# from sagemaker.pytorch import PyTorchModel


# deploy your model to create a predictor
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.t2.medium')

### (IMPLEMENTATION) Test the Model

Try out your model on the test dataset of dog images.  Use the code cell below to calculate and print the test loss and accuracy.  Ensure that your test accuracy is greater than 10%.

In [None]:
def test(loaders, model, criterion, use_cuda):

    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    model.eval()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss 
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
        # convert output probabilities to predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to true label
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (100. * correct / total, correct, total))

In [None]:
# call test function    
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
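The `test_loss` update inside `test` is an incremental mean: after each batch, the running value moves toward the new batch loss by a fraction `1 / (batch_idx + 1)`. The sketch below isolates that update in plain Python and compares it with the ordinary average. The batch losses are made-up numbers, and `running_mean` is an illustrative name.

```python
def running_mean(values):
    """Incrementally average values using the same update rule as test_loss."""
    mean = 0.0
    for idx, value in enumerate(values):
        mean = mean + (1 / (idx + 1)) * (value - mean)
    return mean


batch_losses = [2.0, 1.0, 3.0]  # made-up per-batch losses
incremental = running_mean(batch_losses)
direct = sum(batch_losses) / len(batch_losses)
print(incremental, direct)
```

The incremental form avoids storing every batch loss. It weights each batch equally, so a smaller final batch counts as much as a full one.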

---
<a id='step4'></a>
## Step 4: Create a CNN to Classify Dog Breeds (using Transfer Learning)

You will now use transfer learning to create a CNN that can identify dog breed from images.  Your CNN must attain at least 60% accuracy on the test set.

### (IMPLEMENTATION) Specify Data Loaders for the Dog Dataset

Use the code cell below to write three separate [data loaders](http://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader) for the training, validation, and test datasets of dog images (located at `dogImages/train`, `dogImages/valid`, and `dogImages/test`, respectively). 

If you like, **you are welcome to use the same data loaders from the previous step**, when you created a CNN from scratch.

>__Note:__ The previous loaders cannot be re-used because they resized the input images to **256 x 256**, whereas models pretrained on ImageNet expect **224 x 224** inputs. So we write a new set of loaders, *loaders_transfer*.


In [None]:
## TODO: Specify data loaders
# Note: models pretrained on ImageNet require input images of shape (3 x h x w) with h,w >= 224,
#       normalized with the ImageNet mean and std dev (mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225])
#
# Reference: https://pytorch.org/vision/stable/models.html
import os
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

## Specify appropriate transforms, and batch_sizes
root_dir = 'dogImages/'
train_dir = os.path.join(root_dir, 'train')
valid_dir = os.path.join(root_dir, 'valid')
test_dir = os.path.join(root_dir, 'test')
# ResNet input image size
IMG_SIZE = 224
# mean and std deviation of models trained on Imagenet dataset
mean = [0.485, 0.456, 0.406]
std_dev = [0.229, 0.224, 0.225]

# Data augmentation creates a variety of training images so the model learns to generalize better.
# Output is a tensor.
preprocess_train = transforms.Compose([
                                    transforms.RandomResizedCrop(IMG_SIZE),
                                    transforms.RandomRotation(20),
                                    transforms.RandomHorizontalFlip(),
                                    transforms.ToTensor(),
                                    transforms.Normalize( mean, std_dev)
                                    ])

# Data augmentation is not performed on validation and test datasets because the goal is not to create more data,
# but to resize and crop the images to the same size as the input image.
# Output is a tensor.
preprocess_valid_test = transforms.Compose([
                                    transforms.Resize(256),
                                    transforms.CenterCrop(IMG_SIZE),
                                    transforms.ToTensor(),
                                    transforms.Normalize( mean, std_dev)
                                    ])

train_dataset = datasets.ImageFolder(train_dir, transform=preprocess_train)
valid_dataset = datasets.ImageFolder(valid_dir, transform=preprocess_valid_test)
test_dataset = datasets.ImageFolder(test_dir, transform=preprocess_valid_test)

BATCH_SIZE = 64
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

loaders_transfer = { 'train':train_loader, 'valid':valid_loader, 'test':test_loader }
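The normalization step in both transform pipelines subtracts the per-channel ImageNet mean and divides by the per-channel standard deviation, after `ToTensor` has scaled pixel values to the range [0, 1]. The sketch below applies the same arithmetic to a single RGB pixel in plain Python. `normalize_pixel` is an illustrative helper, not a torchvision function.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]


mean_pixel = normalize_pixel(IMAGENET_MEAN)     # a pixel equal to the mean maps to zero
white_pixel = normalize_pixel([1.0, 1.0, 1.0])  # a white pixel maps to positive values
print(mean_pixel, white_pixel)
```

After normalization, values are centred around zero and fall roughly between -2 and 3, which is the input range the pretrained weights were trained on.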

### (IMPLEMENTATION) Model Architecture

Use transfer learning to create a CNN to classify dog breed.  Use the code cell below, and save your initialized model as the variable `model_transfer`.

In [None]:
import torch
import torchvision.models as models
import torch.nn as nn

# Using feature extraction approach
# =================================
# Freeze the weights for all of the network except the final fully connected(FC) layer.
# This last FC layer is replaced with a new one with random weights and only this layer is trained.
# https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html#initialize-and-reshape-the-networks

# ResNet 152-layer model
model_transfer = models.resnet152(pretrained=True)

# Freeze the pre-trained weights and biases of all layers first so they are not updated during re-training
for param in model_transfer.parameters():
    param.requires_grad = False

# Get the number of input features in the last FC layer
# Reinitialize output features to number of dog breed classes
input_features = model_transfer.fc.in_features
DOG_BREEDS_NUM = 133
model_transfer.fc = nn.Linear(input_features, DOG_BREEDS_NUM)

print("ResNet-152 last fc layer:", models.resnet152().fc)
print("Our fc layer:", model_transfer.fc)

use_cuda = torch.cuda.is_available()
if use_cuda:
    model_transfer = model_transfer.cuda()

__Question 5:__ Outline the steps you took to get to your final CNN architecture and your reasoning at each step.  Describe why you think the architecture is suitable for the current problem.

__Answer:__
According to the [PyTorch website](https://pytorch.org/vision/stable/models.html), ResNet152 has the lowest Top-1 and Top-5 errors among all the ResNet architectures, achieving very good accuracy with its deep 152-layer design. Since it is pre-trained on ImageNet, which contains dog breed classes, the model is a good candidate for our use case.


### (IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a [loss function](http://pytorch.org/docs/master/nn.html#loss-functions) and [optimizer](http://pytorch.org/docs/master/optim.html).  Save the chosen loss function as `criterion_transfer`, and the optimizer as `optimizer_transfer` below.

In [None]:
# use same as before
import torch.optim as optim

criterion_transfer = nn.CrossEntropyLoss()
optimizer_transfer = optim.Adam( model_transfer.parameters(), lr=0.001 )

### (IMPLEMENTATION) Train and Validate the Model

Train and validate your model in the code cell below.  [Save the final model parameters](http://pytorch.org/docs/master/notes/serialization.html) at filepath `'model_transfer.pt'`.

In [None]:
# train the model
n_epochs = 40
model_transfer = train(n_epochs, loaders_transfer, model_transfer, optimizer_transfer, criterion_transfer, use_cuda, 'model_transfer.pt')

# save model to Google Drive
if isColabRunning:
    saveToDrive('model_transfer.pt')

# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))

### (IMPLEMENTATION) Test the Model

Try out your model on the test dataset of dog images. Use the code cell below to calculate and print the test loss and accuracy.  Ensure that your test accuracy is greater than 60%.

In [None]:
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)

### (IMPLEMENTATION) Predict Dog Breed with the Model

Write a function that takes an image path as input and returns the dog breed (`Affenpinscher`, `Afghan hound`, etc) that is predicted by your model.  

In [None]:
### TODO: Write a function that takes a path to an image as input
### and returns the dog breed that is predicted by the model.
from PIL import Image
import torchvision.transforms as transforms
import torch
  
# list of class names by index, i.e. a name can be accessed like class_names[0]
class_names = [item[4:].replace("_", " ") for item in loaders_transfer['train'].dataset.classes]

def predict_breed_transfer(img_path):
    # load the image and return the predicted breed
    image = Image.open(img_path).convert(mode='RGB')
    IMAGE_SIZE = 224
    # preprocess the image using transform
    prediction_transform = transforms.Compose([
                                        transforms.Resize(256),
                                        transforms.CenterCrop(IMAGE_SIZE),
                                        transforms.ToTensor(),
                                        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                             std=[0.229, 0.224, 0.225] )
                                        ])
    image_tensor = prediction_transform(image).unsqueeze(0)
    # move to GPU
    if use_cuda:
        image_tensor = image_tensor.cuda()
    
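    # set to evaluation mode for inferencing
    model_transfer.eval()
    # disable gradient tracking during inference and convert the index tensor to a Python int
    with torch.no_grad():
        idx = torch.argmax(model_transfer(image_tensor)).item()
    return class_names[idx]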
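The last two steps of `predict_breed_transfer`, picking the highest-scoring class and mapping its index to a readable name, can be illustrated without a model. The sketch below assumes dataset folder names of the form `001.Affenpinscher`, which is what the `item[4:]` slice implies. The logits are made-up values, and `clean_class_name` and `softmax` are illustrative helpers.

```python
import math


def clean_class_name(folder_name):
    """Drop the 4-character numeric prefix (e.g. '001.') and replace underscores with spaces."""
    return folder_name[4:].replace("_", " ")


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


folders = ["001.Affenpinscher", "002.Afghan_hound", "003.Airedale_terrier"]  # assumed layout
names = [clean_class_name(f) for f in folders]

logits = [0.2, 2.5, -1.0]  # made-up model outputs for one image
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(names[best])
```

Softmax does not change which class scores highest, so taking the argmax of the raw model outputs gives the same index as taking it over the probabilities.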

---
<a id='step5'></a>
## Step 5: Write your Algorithm

Write an algorithm that accepts a file path to an image and first determines whether the image contains a human, dog, or neither.  Then,
- if a __dog__ is detected in the image, return the predicted breed.
- if a __human__ is detected in the image, return the resembling dog breed.
- if __neither__ is detected in the image, provide output that indicates an error.

You are welcome to write your own functions for detecting humans and dogs in images, but feel free to use the `face_detector` and `dog_detector` functions developed above.  You are __required__ to use your CNN from Step 4 to predict dog breed.  

Some sample output for our algorithm is provided below, but feel free to design your own user experience!

![Sample Human Output](https://github.com/reedemus/dog_breed_classifier/blob/transfer-learning/images/sample_human_output.png?raw=1)


### (IMPLEMENTATION) Write your Algorithm

In [None]:
### TODO: Write your algorithm.
### Feel free to use as many code cells as needed.
import matplotlib.pyplot as plt

def run_app(img_path):
    '''handle cases for a human face, dog, and neither'''
    # open image
    img = Image.open(img_path)

    if face_detector(img_path):
        # a human face: report the most closely resembling breed
        breed = predict_breed_transfer(img_path)
        msg = "You are not a dog...but you sure look like a " + str(breed)
    elif dog_detector(img_path):
        # a dog: report the predicted breed
        breed = predict_breed_transfer(img_path)
        msg = "I'm guessing your dog is a " + str(breed) + "!"
    else:
        msg = "Not a dog or human...what are you?"

    plt.imshow(img)
    plt.axis('off')
    plt.title(msg)
    plt.show()
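The branching in `run_app` can be checked separately from the models by passing the detectors and the breed predictor in as arguments. The sketch below does this with stub functions. `describe_image` and the three `fake_*` helpers are hypothetical stand-ins for illustration only; the real `face_detector`, `dog_detector`, and `predict_breed_transfer` are not called here.

```python
def describe_image(img_path, is_human, is_dog, predict_breed):
    """Return the message for one image; the three callables are injected stand-ins."""
    if is_human(img_path):
        return "You are not a dog...but you sure look like a " + predict_breed(img_path)
    if is_dog(img_path):
        return "I'm guessing your dog is a " + predict_breed(img_path) + "!"
    return "Not a dog or human...what are you?"


# Stub detectors keyed on the file name, purely for illustration
def fake_face_detector(path):
    return "human" in path


def fake_dog_detector(path):
    return "dog" in path


def fake_breed_predictor(path):
    return "Afghan hound"


messages = [
    describe_image(p, fake_face_detector, fake_dog_detector, fake_breed_predictor)
    for p in ("human_1.jpg", "dog_1.jpg", "cat_1.jpg")
]
for message in messages:
    print(message)
```

Keeping the decision logic free of plotting and model calls makes it possible to test all three branches quickly, including the "neither" case.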

---
<a id='step6'></a>
## Step 6: Test Your Algorithm

In this section, you will take your new algorithm for a spin!  What kind of dog does the algorithm think that _you_ look like?  If you have a dog, does it predict your dog's breed accurately?  If you have a cat, does it mistakenly think that your cat is a dog?

### (IMPLEMENTATION) Test Your Algorithm on Sample Images!

Test your algorithm on at least six images on your computer.  Feel free to use any images you like.  Use at least two human and two dog images.  

__Question 6:__ Is the output better than you expected :) ?  Or worse :( ?  Provide at least three possible points of improvement for your algorithm.

__Answer:__ (Three possible points for improvement)
<br> Yes, the outputs are more accurate than those of the CNN designed from scratch.
However, the model is unable to classify the last dog image, which contains a yellow ball. Some improvements that can be tried are:
1. Add a greater variety of images to the dog training dataset, including dogs with a ball.
> A quick glance at the training dataset shows there are no dog images with a ball.
2. Increase the training sample size for this dog breed class by finding more photos.
3. Resize and crop the test image before feeding it into the model.

In [None]:
## TODO: Execute your algorithm from Step 5 on
## at least 6 images on your computer.
## Feel free to use as many code cells as needed.

## suggested code, below
for file in np.hstack((human_files[:3], dog_files[:3])):
    run_app(file)