# Dog Classification Project - Neil Simon

This notebook goes through the process of:
1. Retrieving a dataset.
1. Uncompressing that dataset.
1. Uploading that dataset to an S3 bucket.
1. Setting up hyperparameter tuning of the learning rate, weight decay, eps and batch size for a model trained with the AdamW optimizer.
1. Starting a hyperparameter tuning job of 4 training jobs (2 running at a time).
1. Recording the best hyperparameters discovered by the tuning job.


In [2]:
!pip install smdebug

Collecting smdebug
  Using cached smdebug-1.0.12-py2.py3-none-any.whl (270 kB)
Collecting pyinstrument==3.4.2
  Using cached pyinstrument-3.4.2-py2.py3-none-any.whl (83 kB)
Collecting pyinstrument-cext>=0.2.2
  Using cached pyinstrument_cext-0.2.4-cp37-cp37m-manylinux2010_x86_64.whl (20 kB)
Installing collected packages: pyinstrument-cext, pyinstrument, smdebug
Successfully installed pyinstrument-3.4.2 pyinstrument-cext-0.2.4 smdebug-1.0.12


In [3]:
import sagemaker
import boto3

## Dataset
TODO: Explain what dataset you are using for this project. Maybe even give a small overview of the classes, class distributions, etc., that can help anyone not familiar with the dataset get a better understanding of it.
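One way to produce the class overview the TODO asks for is to count the images in each class folder. The sketch below assumes that the extracted archive contains one directory per split (for example `train`, `valid` and `test`), and that each split directory holds one folder per breed. It also assumes that images use common image file extensions. The function name `count_images` is hypothetical.

```python
from collections import Counter
from pathlib import Path

# File extensions treated as images (assumption: the archive contains JPEG/PNG files).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {split_name: Counter({class_name: n_images})} for every split directory under root."""
    root = Path(root)
    counts = {}
    for split_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        per_class = Counter()
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # Count only regular files whose extension looks like an image.
            per_class[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
        counts[split_dir.name] = per_class
    return counts
```

After extraction, `count_images("dogImages/dogImages")` gives per-split, per-breed counts. From these, `len(...)`, `sum(...)`, `min(...)` and `max(...)` over each `Counter` summarize the number of classes and the class balance.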

In [4]:
# Create a directory into which we download and extract our data (to avoid clobbering local data)
!mkdir -p dogImages
# wget -c resumes a partial download and does nothing if the file is already complete,
# so the data is only retrieved when necessary.
!wget -c -P dogImages https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
# Unzip the data (commented out because this step has already been performed)
#!unzip dogImages/dogImages.zip -d dogImages >/dev/null

--2021-11-21 17:28:42--  https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 52.219.121.16
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|52.219.121.16|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

    The file is already fully retrieved; nothing to do.
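The same "only do the work if necessary" behaviour can be written in plain Python. This is a minimal sketch using only the standard library, not the notebook's method. It makes the following assumptions:

* `fetch_and_extract` is a hypothetical helper.
* The `.extracted` marker file is a hypothetical convention for recording that extraction finished.
* Unlike `wget -c`, the helper does not resume a partially downloaded file.

```python
import urllib.request
import zipfile
from pathlib import Path

DATA_URL = "https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip"


def fetch_and_extract(url, zip_path, extract_dir):
    """Download url to zip_path unless it already exists, then extract it once into extract_dir."""
    zip_path, extract_dir = Path(zip_path), Path(extract_dir)
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    if not zip_path.exists():
        # No resume support: delete a truncated zip by hand before re-running.
        urllib.request.urlretrieve(url, zip_path)
    # Hypothetical marker file: its presence means a previous extraction completed.
    marker = extract_dir / ".extracted"
    if not marker.exists():
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(extract_dir)
        marker.touch()
    return extract_dir
```

For example, `fetch_and_extract(DATA_URL, "dogImages/dogImages.zip", "dogImages")` mirrors the `wget` and `unzip` commands above. On a directory that was already unzipped by hand, the first call extracts once more, because no marker exists yet.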



In [5]:
# Retrieve the SageMaker session
sagemaker_session = sagemaker.Session()
# Retrieve the session's default S3 bucket (sagemaker-<region>-<account-id>)
bucket = sagemaker_session.default_bucket()
# Key prefix under which the data is stored in the above bucket
prefix = "nd009t-c3-project/dogImages"
print("Uploading dogImages")
# Upload to S3 (commented out because this step has already been performed)
#inputs = sagemaker_session.upload_data(path="dogImages/dogImages", bucket=bucket, key_prefix=prefix)
inputs = f"s3://{bucket}/{prefix}"
print("input spec (in this case, just an S3 path): {}".format(inputs))

role = sagemaker.get_execution_role()

Uploading dogImages
input spec (in this case, just an S3 path): s3://sagemaker-us-east-1-574118147827/nd009t-c3-project/dogImages
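The upload call is commented out, so it is worth confirming that the prefix actually contains the data before launching training jobs. This is a sketch that pages through S3's ListObjectsV2 results with boto3. The helper name `summarize_s3_prefix` is hypothetical, and the optional `client` argument exists only so the function can be exercised without AWS credentials.

```python
def summarize_s3_prefix(bucket, prefix, client=None):
    """Count objects and total bytes under s3://bucket/prefix using ListObjectsV2 pagination."""
    if client is None:
        import boto3  # only needed when no client is injected

        client = boto3.client("s3")
    n_objects, n_bytes = 0, 0
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching keys have no "Contents" entry.
        for obj in page.get("Contents", []):
            n_objects += 1
            n_bytes += obj["Size"]
    return n_objects, n_bytes
```

Calling `summarize_s3_prefix(bucket, prefix)` in this notebook should return a non-zero object count if the earlier upload succeeded.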


## Hyperparameter Tuning

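Set up the hyperparameter ranges:

* The `weight-decay` and `eps` ranges span 0.1x to 10x the PyTorch AdamW defaults (`weight_decay=1e-2`, `eps=1e-8`).
* The `lr` range spans 0.1x to 100x the default `lr=1e-3`.
* Batch size is chosen from 32 or 64.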
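This short check expresses each search bound as a multiple of the optimizer default. The default values are those documented for `torch.optim.AdamW`. The bounds are copied by hand from `hyperparameter_ranges` in the following cell. The helper name is hypothetical.

```python
# Defaults documented for torch.optim.AdamW, keyed by this notebook's hyperparameter names.
ADAMW_DEFAULTS = {"lr": 1e-3, "weight-decay": 1e-2, "eps": 1e-8}

# (low, high) bounds copied from hyperparameter_ranges.
SEARCH_BOUNDS = {"lr": (1e-4, 1e-1), "weight-decay": (1e-3, 1e-1), "eps": (1e-9, 1e-7)}


def range_multipliers(bounds, defaults):
    """Express each (low, high) bound as a multiple of the corresponding default value."""
    return {name: (low / defaults[name], high / defaults[name]) for name, (low, high) in bounds.items()}


for name, (low_x, high_x) in range_multipliers(SEARCH_BOUNDS, ADAMW_DEFAULTS).items():
    print(f"{name}: {low_x:g}x to {high_x:g}x the default")
```

Because these ranges cover two or three orders of magnitude, `ContinuousParameter` also accepts a `scaling_type` argument. Setting it to `"Logarithmic"` may be worth considering.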

In [12]:
#HP ranges, metrics etc.
hyperparameter_ranges = {
    "lr": sagemaker.tuner.ContinuousParameter(1e-4, 1e-1),
    "weight-decay": sagemaker.tuner.ContinuousParameter(1e-3, 1e-1),
    "eps": sagemaker.tuner.ContinuousParameter(1e-9, 1e-7),
    "batch-size": sagemaker.tuner.CategoricalParameter([32, 64]),
    #"test-batch-size": sagemaker.tuner.CategoricalParameter([64]),
}
objective_metric_name = "average test loss"
objective_type = "Minimize"
metric_definitions = [{"Name": "average test loss", "Regex": "Test set: Average loss: ([0-9\\.]+)"}]
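The tuner only sees the objective metric if the regex in `metric_definitions` matches a line in the training job's logs. This sketch applies the same pattern locally with Python's `re` module. The sample log text is hypothetical. It assumes that `hpo.py` prints a line of the form `Test set: Average loss: <number>`.

```python
import re

# Same pattern as in metric_definitions ("\\." in a normal string equals "\." in a raw string).
METRIC_REGEX = r"Test set: Average loss: ([0-9\.]+)"


def extract_metric(log_text, pattern=METRIC_REGEX):
    """Return every value captured by the metric regex, as floats, in log order."""
    return [float(m) for m in re.findall(pattern, log_text)]


# Hypothetical log output from the training script.
sample_log = "Epoch 1 done\nTest set: Average loss: 0.0421\n"
print(extract_metric(sample_log))
```

The character class `[0-9\.]` also matches a period. If the script prints the number immediately followed by a full stop, the captured text will not parse as a number.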


In [15]:
# Estimator used by every training job the tuner launches
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="hpo.py",
    role=role,
    py_version="py36",
    framework_version="1.8",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Tuner: 4 training jobs in total, at most 2 running in parallel
tuner = sagemaker.tuner.HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions=metric_definitions,
    max_jobs=4,
    max_parallel_jobs=2,
    objective_type=objective_type,
)


In [16]:
tuner.fit({"training": inputs}, wait=True)



In [17]:
# The best estimators and the best HPs
best_estimator = tuner.best_estimator()

#Get the hyperparameters of the best trained model
best_estimator.hyperparameters()


2021-11-21 22:07:43 Starting - Preparing the instances for training
2021-11-21 22:07:43 Downloading - Downloading input data
2021-11-21 22:07:43 Training - Training image download completed. Training in progress.
2021-11-21 22:07:43 Uploading - Uploading generated training model
2021-11-21 22:07:43 Completed - Training job completed


{'_tuning_objective_metric': '"average test loss"',
 'batch-size': '"64"',
 'eps': '3.1900006493881446e-09',
 'lr': '0.000778309355328367',
 'sagemaker_container_log_level': '20',
 'sagemaker_estimator_class_name': '"PyTorch"',
 'sagemaker_estimator_module': '"sagemaker.pytorch.estimator"',
 'sagemaker_job_name': '"pytorch-training-2021-11-21-21-37-22-596"',
 'sagemaker_program': '"hpo.py"',
 'sagemaker_region': '"us-east-1"',
 'sagemaker_submit_directory': '"s3://sagemaker-us-east-1-574118147827/pytorch-training-2021-11-21-21-37-22-596/source/sourcedir.tar.gz"',
 'weight-decay': '0.003764938898607458'}
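`best_estimator()` only returns the winning job. To compare all four jobs, the SDK's `tuner.analytics().dataframe()` returns one row per training job, including a `FinalObjectiveValue` column. This plain-Python sketch ranks such rows from best to worst. The rows could come from, for example, `.to_dict("records")` on that dataframe. The helper name `rank_jobs` is hypothetical. Jobs without a reported objective (NaN or missing) are dropped.

```python
import math


def rank_jobs(rows, objective_type="Minimize", key="FinalObjectiveValue"):
    """Sort tuning-job rows best-first, dropping jobs that reported no objective value."""
    scored = [r for r in rows if r.get(key) is not None and not math.isnan(r[key])]
    # Lower is better when minimizing, higher is better when maximizing.
    return sorted(scored, key=lambda r: r[key], reverse=(objective_type == "Maximize"))
```

For example, `rank_jobs(tuner.analytics().dataframe().to_dict("records"), objective_type)` lists the jobs in order of their average test loss.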

In [20]:
#best_hyperparameters={'batch-size': best_estimator.hyperparameters()['batch-size'],
#                      'eps': best_estimator.hyperparameters()['eps'],
#                      'lr': best_estimator.hyperparameters()['lr'],
#                      'weight-decay': best_estimator.hyperparameters()['weight-decay'],}
#best_hyperparameters
best_hyperparameters={'batch-size': '"64"',
 'eps': '3.1900006493881446e-09',
 'lr': '0.000778309355328367',
 'weight-decay': '0.003764938898607458'}
best_hyperparameters

{'batch-size': '"64"',
 'eps': '3.1900006493881446e-09',
 'lr': '0.000778309355328367',
 'weight-decay': '0.003764938898607458'}
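The values returned by `hyperparameters()` are JSON-encoded strings. Note `'"64"'` for the batch size. If these strings are passed unchanged to a new estimator, they would most likely be encoded again, so the training script would receive literal quote characters. This sketch undoes the encoding before reuse. The helper name `decode_hyperparameters` is hypothetical.

```python
import json


def decode_hyperparameters(raw, keys=None):
    """Undo SageMaker's JSON encoding of hyperparameter values, optionally keeping only `keys`."""
    decoded = {}
    for name, value in raw.items():
        if keys is not None and name not in keys:
            continue
        try:
            value = json.loads(value)  # '"64"' -> '64', '0.0007' -> 0.0007
        except (TypeError, json.JSONDecodeError):
            pass  # already a plain Python value, or a bare non-JSON string
        if isinstance(value, str):
            # Quoted numbers (like batch-size) come back as strings; convert them if possible.
            for cast in (int, float):
                try:
                    value = cast(value)
                    break
                except ValueError:
                    continue
        decoded[name] = value
    return decoded
```

The output of `decode_hyperparameters(best_estimator.hyperparameters(), keys=["batch-size", "eps", "lr", "weight-decay"])` could then be passed as `hyperparameters=` to the estimator used for profiling and debugging.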

## Model Profiling and Debugging
TODO: Using the best hyperparameters, create and finetune a new model

**Note:** You will need to use the `train_model.py` script to perform model profiling and debugging.

In [None]:
# TODO: Set up debugging and profiling rules and hooks

In [None]:
# TODO: Create and fit an estimator

estimator = PyTorch(
    entry_point="train_model.py",  # script named in the note above
    role=role,
    py_version='py36',
    framework_version="1.8",
    instance_count=1,
    instance_type="ml.m5.xlarge"
)

In [None]:
# TODO: Plot a debugging output.

**TODO**: Is there some anomalous behaviour in your debugging output? If so, what is the error and how will you fix it?  
**TODO**: If not, suppose there was an error. What would that error look like and how would you have fixed it?
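One anomaly that SageMaker Debugger's built-in rules check for is a loss that stops decreasing (the `LossNotDecreasing` rule). The sketch below is a simplified, hypothetical plain-Python version of that idea, applied to a list of recorded loss values. It is not the rule's actual implementation. The window size and the 1% threshold are arbitrary assumptions.

```python
def loss_not_decreasing(losses, window=5, min_improvement=0.01):
    """Return the step index where the loss stalls, or None.

    Compares the mean loss of each block of `window` steps with the mean of the
    previous block, and flags the first block whose relative improvement is
    below `min_improvement`.
    """
    for start in range(window, len(losses) - window + 1, window):
        prev = sum(losses[start - window:start]) / window
        curr = sum(losses[start:start + window]) / window
        if prev > 0 and (prev - curr) / prev < min_improvement:
            return start
    return None
```

The loss values could be read from the Debugger output with smdebug's trial API (`smdebug.trials.create_trial`). If a stall is flagged, typical fixes are lowering the learning rate or checking the data pipeline for mislabelled or unshuffled batches.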

In [None]:
# TODO: Display the profiler output

## Model Deploying

In [None]:
# TODO: Deploy your model to an endpoint

predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")  # TODO: adjust instance type/count as needed

In [None]:
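# TODO: Run a prediction on the endpoint
from sagemaker.serializers import IdentitySerializer

# Send raw JPEG bytes (assumes the endpoint's inference code decodes and preprocesses image/jpeg input)
predictor.serializer = IdentitySerializer(content_type="image/jpeg")
with open("path/to/dog.jpg", "rb") as f:  # hypothetical path: use any test image
    image = f.read()
response = predictor.predict(image)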

In [None]:
# TODO: Remember to shutdown/delete your endpoint once your work is done
predictor.delete_endpoint()