# Automatic model tuning of PyTorch models with Amazon SageMaker
This lab demonstrates **Amazon SageMaker's automatic model tuning capability**, also known as hyperparameter optimization (HPO). Instead of a labor-intensive process of trial and error that could take days or weeks, [automatic model tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html) lets a data scientist ask SageMaker to search for the best set of hyperparameters, typically in minutes or hours.

The notebook shows how to provide a set of parameters to tune, ranges to consider, a metric to optimize, limits on the number of jobs to run, and the compute capacity to use. A SageMaker tuning job then explores the options using Bayesian optimization. SageMaker trains a set of models and highlights which one scored best on the objective metric within those limits. The resulting model is ready for deployment behind an endpoint or for batch predictions.

## Setup
For this notebook, we simply get our security role and establish some parameters for use of S3.

In [1]:
import sagemaker
from sagemaker import get_execution_role
import boto3

client = boto3.client(service_name='sagemaker')

role = get_execution_role()
print(role)
sess = sagemaker.Session()

bucket = sess.default_bucket() # or custom bucket name
prefix = 'DEMO-PYT-image-classification-birds'
JOB_PREFIX = 'pyt-hpo-ic'
FRAMEWORK_VERSION = '1.3.1'

arn:aws:iam::355151823911:role/service-role/AmazonSageMaker-ExecutionRole-20180515T132694


This notebook relies on execution of previous notebooks in this workshop. Specifically, it assumes the image data has been prepared and uploaded to S3. Here we just define exactly where the training jobs will pull their image data from.

In [2]:
train_inputs = 's3://{}/{}/train'.format(bucket, prefix)
val_inputs   = 's3://{}/{}/validation'.format(bucket, prefix)
test_inputs  = 's3://{}/{}/test'.format(bucket, prefix)
print('Training data:   {}\nValidation data: {}\nTest data:       {}'.format(train_inputs, val_inputs, test_inputs))

Training data:   s3://sagemaker-us-east-2-355151823911/DEMO-PYT-image-classification-birds/train
Validation data: s3://sagemaker-us-east-2-355151823911/DEMO-PYT-image-classification-birds/validation
Test data:       s3://sagemaker-us-east-2-355151823911/DEMO-PYT-image-classification-birds/test


Here are the classes that have been uploaded to S3 for training.

In [3]:
!aws s3 ls $train_inputs/

                           PRE 013.Bobolink/
                           PRE 017.Cardinal/
                           PRE 035.Purple_Finch/
                           PRE 036.Northern_Flicker/


## Create hyperparameter tuning job
To use Amazon SageMaker's automatic model tuning capability, you create a tuning job, which in turn launches a set of SageMaker training jobs. As when creating a training job directly, you first establish a set of hyperparameters and some metric definitions, and then create a PyTorch estimator that is fed a Python training script.

In [4]:
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner

In [6]:
hyperparameters = {'initial_epochs': 5, 
                   'data_dir': '/opt/ml/input/data',
                   'dropout': 0.5}

metric_definitions=[{'Name' : 'validation:acc', 
                     'Regex': '.*Test accuracy: (.*$)'},
                    {'Name' : 'validation:loss', 
                     'Regex': '.*Test loss: (.*).. Test ac.*'},
                    {'Name' : 'train:loss', 
                     'Regex': '.*Train loss: (.*).. Test lo.*'}]

estimator = PyTorch(entry_point='train-resnet.py',
                       source_dir='code',
                       train_instance_type='ml.c5.4xlarge',
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       metric_definitions=metric_definitions,
                       role=sagemaker.get_execution_role(),
                       framework_version=FRAMEWORK_VERSION, 
                       debugger_hook_config=False,  # working around existing bug (TT 0305452782, Answer 93236)
                       py_version='py3',
                       base_job_name=JOB_PREFIX)
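
Each metric definition pairs a metric name with a regular expression whose capture group holds the metric value in the training log output. The following is a minimal local sketch of that extraction using Python's `re` module. The sample log line is hypothetical (the real format is whatever `train-resnet.py` prints), and SageMaker's own matching may differ in details, so treat this only as a way to sanity-check the patterns.

```python
import re

# Regexes copied from the metric_definitions cell above.
metric_definitions = [
    {'Name': 'validation:acc', 'Regex': '.*Test accuracy: (.*$)'},
    {'Name': 'validation:loss', 'Regex': '.*Test loss: (.*).. Test ac.*'},
    {'Name': 'train:loss', 'Regex': '.*Train loss: (.*).. Test lo.*'},
]

# Hypothetical log line, assumed for illustration only.
sample_log_line = 'Epoch 1/5.. Train loss: 0.123.. Test loss: 0.456.. Test accuracy: 0.958'


def extract_metrics(line, definitions):
    """Return {metric name: float value} for every regex that matches the line."""
    found = {}
    for definition in definitions:
        match = re.search(definition['Regex'], line)
        if match:
            found[definition['Name']] = float(match.group(1))
    return found


extracted = extract_metrics(sample_log_line, metric_definitions)
print(extracted)
```

If the training script's log format differs from the assumed line, the regexes have to change with it; otherwise the tuning job has no objective value to optimize.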

More interestingly, here is the part that is unique to creating the tuning job. You define a set of hyperparameter ranges that you want SageMaker to explore via training jobs. For our example, we tune two settings of the [fine tuning](https://www.pyimagesearch.com/2019/06/03/fine-tuning-with-keras-and-deep-learning/) script: the number of initial epochs (`initial_epochs`) and the dropout ratio (`dropout`). If we were to search for the best settings manually, it would take significant time and a lot of trial and error. With SageMaker, we can hand off that search to a tuning job.

In [7]:
hyperparameter_ranges = {'initial_epochs': IntegerParameter(5, 20),
                         'dropout': ContinuousParameter(0.2, 0.7)}
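
To get a feel for the search space these two ranges describe, the sketch below draws a few candidate configurations with plain random sampling. This is not SageMaker's Bayesian strategy, only an illustration of what a single candidate looks like. The dictionaries, the helper name, and the number of draws are assumptions made for this example.

```python
import random

# Bounds mirror hyperparameter_ranges above; both ends are treated as inclusive here.
INT_RANGES = {'initial_epochs': (5, 20)}
FLOAT_RANGES = {'dropout': (0.2, 0.7)}


def sample_configuration(rng):
    """Draw one candidate configuration uniformly at random from the ranges."""
    config = {name: rng.randint(low, high) for name, (low, high) in INT_RANGES.items()}
    config.update({name: rng.uniform(low, high) for name, (low, high) in FLOAT_RANGES.items()})
    return config


rng = random.Random(0)  # fixed seed so the draws are repeatable
candidates = [sample_configuration(rng) for _ in range(6)]  # arbitrary count for illustration
for candidate in candidates:
    print(candidate)
```

A Bayesian strategy differs in that later candidates are chosen using the objective values of earlier jobs, rather than independently as in this sketch.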

In [8]:
objective_metric_name = 'validation:acc'
objective_type = 'Maximize'

In [9]:
tuner = HyperparameterTuner(estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=6,
                            max_parallel_jobs=2,
                            objective_type=objective_type,
                            base_tuning_job_name=JOB_PREFIX)

In [10]:
inputs = {'train':train_inputs, 'test': test_inputs, 'validation': val_inputs}

With the tuning job established, we can now launch the job and then check back to see what parameters are best suited to our image classifier.

In [11]:
tuner.fit(inputs)

In [18]:
status = boto3.client('sagemaker').describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner.latest_tuning_job.job_name)['HyperParameterTuningJobStatus']
print('Tuning job: {}, Status: {}'.format(tuner.latest_tuning_job.job_name, status))

Tuning job: pyt-hpo-ic-200302-1913, Status: InProgress
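
Rather than re-running the status cell by hand, the check can be wrapped in a polling loop. The sketch below is one possible way to do that. The helper name, the polling interval, and the poll limit are assumptions, and a stand-in function replaces the boto3 call so the example runs without AWS access.

```python
import time

# Statuses after which the tuning job will not change state again.
TERMINAL_STATUSES = {'Completed', 'Failed', 'Stopped'}


def wait_for_tuning_job(describe_fn, job_name, poll_seconds=60, max_polls=120):
    """Poll describe_fn until the job reaches a terminal status or polls run out.

    describe_fn is any callable that accepts HyperParameterTuningJobName and
    returns a dict containing 'HyperParameterTuningJobStatus'.
    """
    status = None
    for _ in range(max_polls):
        response = describe_fn(HyperParameterTuningJobName=job_name)
        status = response['HyperParameterTuningJobStatus']
        if status in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    return status


# Stand-in for the boto3 call (hypothetical sequence of statuses).
_fake_statuses = iter(['InProgress', 'InProgress', 'Completed'])


def fake_describe(HyperParameterTuningJobName):
    return {'HyperParameterTuningJobStatus': next(_fake_statuses)}


final_status = wait_for_tuning_job(fake_describe, 'example-job', poll_seconds=0)
print(final_status)
```

In this notebook the real call would be `wait_for_tuning_job(client.describe_hyper_parameter_tuning_job, tuner.latest_tuning_job.job_name)`.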


## Analyze tuning job results
In the remainder of this notebook, we perform some analysis on the results of the tuning job. This helps us gain insight into which parameters were most influential. It can also help generate ideas for other tuning jobs that would help get even closer to our objective. The SageMaker console also provides a good way to track the job and review results.

In [19]:
tuning_job_name = tuner.latest_tuning_job.job_name

Here we can monitor the progress of the overall tuning job, finding out how many jobs are completed.

In [23]:
tuning_job_result = client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)

status = tuning_job_result['HyperParameterTuningJobStatus']
if status != 'Completed':
    print('Reminder: the tuning job has not been completed.')
    
job_count = tuning_job_result['TrainingJobStatusCounters']['Completed']
print("%d training jobs have completed" % job_count)
    
is_minimize = (tuning_job_result['HyperParameterTuningJobConfig']['HyperParameterTuningJobObjective']['Type'] != 'Maximize')
objective_name = tuning_job_result['HyperParameterTuningJobConfig']['HyperParameterTuningJobObjective']['MetricName']

6 training jobs have completed


Here we take a look at the parameters that were used for the best model produced thus far.

In [24]:
from pprint import pprint
if tuning_job_result.get('BestTrainingJob',None):
    print("Best model found so far:")
    pprint(tuning_job_result['BestTrainingJob'])
else:
    print("No training jobs have reported results yet.")

Best model found so far:
{'CreationTime': datetime.datetime(2020, 3, 2, 19, 13, 53, tzinfo=tzlocal()),
 'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:acc',
                                                 'Value': 1.0},
 'ObjectiveStatus': 'Succeeded',
 'TrainingEndTime': datetime.datetime(2020, 3, 2, 19, 17, 57, tzinfo=tzlocal()),
 'TrainingJobArn': 'arn:aws:sagemaker:us-east-2:355151823911:training-job/pyt-hpo-ic-200302-1913-001-df6295de',
 'TrainingJobName': 'pyt-hpo-ic-200302-1913-001-df6295de',
 'TrainingJobStatus': 'Completed',
 'TrainingStartTime': datetime.datetime(2020, 3, 2, 19, 15, 42, tzinfo=tzlocal()),
 'TunedHyperParameters': {'dropout': '0.6326482487858599',
                          'initial_epochs': '5'}}
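
Note that the values under `TunedHyperParameters` come back as strings. If you want to reuse them, for example as the `hyperparameters` of a follow-up training job, a small conversion step helps. The sketch below shows one way to do it; the helper name is hypothetical, and the input dictionary is trimmed from the output above.

```python
def parse_tuned_hyperparameters(best_training_job):
    """Convert the string values in 'TunedHyperParameters' to int or float where possible."""
    parsed = {}
    for name, raw in best_training_job['TunedHyperParameters'].items():
        try:
            parsed[name] = int(raw)
        except ValueError:
            try:
                parsed[name] = float(raw)
            except ValueError:
                parsed[name] = raw  # leave non-numeric (categorical) values as strings
    return parsed


# Trimmed copy of the BestTrainingJob entry printed above.
best_training_job = {
    'TrainingJobName': 'pyt-hpo-ic-200302-1913-001-df6295de',
    'TunedHyperParameters': {'dropout': '0.6326482487858599', 'initial_epochs': '5'},
}

best_hyperparameters = parse_tuned_hyperparameters(best_training_job)
print(best_hyperparameters)
```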


Here we produce a grid view of all the jobs, their parameters, and their results. They are sorted in descending order of their final objective value (best metric at the top, worst at the bottom).

In [25]:
import pandas as pd

tuner_analytics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)

full_df = tuner_analytics.dataframe()

if len(full_df) > 0:
    df = full_df[full_df['FinalObjectiveValue'] > -float('inf')]
    if len(df) > 0:
        df = df.sort_values('FinalObjectiveValue', ascending=is_minimize)
        print("Number of training jobs with valid objective: %d" % len(df))
        print({"lowest":min(df['FinalObjectiveValue']),"highest": max(df['FinalObjectiveValue'])})
        pd.set_option('display.max_colwidth', None)  # Don't truncate TrainingJobName (-1 is deprecated since pandas 1.0)
    else:
        print("No training jobs have reported valid results yet.")
        
df

Number of training jobs with valid objective: 6
{'lowest': 0.9580000042915344, 'highest': 1.0}


| | dropout | initial_epochs | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.617379 | 6.0 | pyt-hpo-ic-200302-1913-006-7dcdb541 | Completed | 1.0 | 2020-03-02 19:28:38+00:00 | 2020-03-02 19:31:00+00:00 | 142.0 |
| 1 | 0.428124 | 9.0 | pyt-hpo-ic-200302-1913-005-1c916e18 | Completed | 1.0 | 2020-03-02 19:27:06+00:00 | 2020-03-02 19:29:52+00:00 | 166.0 |
| 3 | 0.534982 | 19.0 | pyt-hpo-ic-200302-1913-003-5d5e9e99 | Completed | 1.0 | 2020-03-02 19:19:52+00:00 | 2020-03-02 19:24:41+00:00 | 289.0 |
| 4 | 0.649921 | 18.0 | pyt-hpo-ic-200302-1913-002-04f3d7cd | Completed | 1.0 | 2020-03-02 19:16:56+00:00 | 2020-03-02 19:21:06+00:00 | 250.0 |
| 5 | 0.632648 | 5.0 | pyt-hpo-ic-200302-1913-001-df6295de | Completed | 1.0 | 2020-03-02 19:15:42+00:00 | 2020-03-02 19:17:57+00:00 | 135.0 |
| 2 | 0.370858 | 9.0 | pyt-hpo-ic-200302-1913-004-49de1f4b | Completed | 0.958 | 2020-03-02 19:23:22+00:00 | 2020-03-02 19:26:20+00:00 | 178.0 |
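
Five of the six jobs tie on the objective value, so the objective alone does not separate them. One possible tie-break, sketched below, is to prefer the job with the shortest training time. This ranking rule is an assumption for illustration, not something the tuning job applies; the records are copied from the results above, and the sort assumes a `Maximize` objective.

```python
# Job name, final objective value, and elapsed seconds, copied from the results above.
results = [
    {'TrainingJobName': 'pyt-hpo-ic-200302-1913-006-7dcdb541', 'FinalObjectiveValue': 1.0, 'TrainingElapsedTimeSeconds': 142.0},
    {'TrainingJobName': 'pyt-hpo-ic-200302-1913-005-1c916e18', 'FinalObjectiveValue': 1.0, 'TrainingElapsedTimeSeconds': 166.0},
    {'TrainingJobName': 'pyt-hpo-ic-200302-1913-003-5d5e9e99', 'FinalObjectiveValue': 1.0, 'TrainingElapsedTimeSeconds': 289.0},
    {'TrainingJobName': 'pyt-hpo-ic-200302-1913-002-04f3d7cd', 'FinalObjectiveValue': 1.0, 'TrainingElapsedTimeSeconds': 250.0},
    {'TrainingJobName': 'pyt-hpo-ic-200302-1913-001-df6295de', 'FinalObjectiveValue': 1.0, 'TrainingElapsedTimeSeconds': 135.0},
    {'TrainingJobName': 'pyt-hpo-ic-200302-1913-004-49de1f4b', 'FinalObjectiveValue': 0.958, 'TrainingElapsedTimeSeconds': 178.0},
]

# Highest objective first; among ties, shortest training time first.
ranked = sorted(
    results,
    key=lambda r: (-r['FinalObjectiveValue'], r['TrainingElapsedTimeSeconds']),
)

for r in ranked:
    print(r['TrainingJobName'], r['FinalObjectiveValue'], r['TrainingElapsedTimeSeconds'])
```

With the pandas dataframe from the cell above, the equivalent ordering would be `df.sort_values(['FinalObjectiveValue', 'TrainingElapsedTimeSeconds'], ascending=[False, True])`.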


With the following chart, we can see how well SageMaker's Bayesian optimization was able to explore the search space of possible hyperparameters over time. In our case, we only ran a few jobs with a few hyperparameter ranges. For a production tuning job with many more jobs and parameter ranges, this chart is more compelling.

In [27]:
!pip install bokeh # studio doesn't have this by default yet

Collecting bokeh
  Downloading https://files.pythonhosted.org/packages/de/70/fdd4b186d8570a737372487cc5547aac885a1270626e3ebf03db1808e4ed/bokeh-1.4.0.tar.gz (32.4MB)
Building wheels for collected packages: bokeh
  Building wheel for bokeh (setup.py) ... done
  Created wheel for bokeh: filename=bokeh-1.4.0-cp36-none-any.whl size=23689200 sha256=e0df8d3764e548fd22aa91f1c68197d241b905c3fe9be72bfb33ec0fb86e25b3
  Stored in directory: /root/.cache/pip/wheels/fb/f8/47/09700d9a19cbcbf0b7a3130690b75c0d6ff80fbda0b1774c7c
Successfully built bokeh
Installing collected packages: bokeh
Successfully installed bokeh-1.4.0
You should consider upgrading via the 'pip install --upgrade pip' command.


In [28]:
import bokeh
import bokeh.io
bokeh.io.output_notebook()
from bokeh.plotting import figure, show
from bokeh.models import HoverTool

class HoverHelper():

    def __init__(self, tuning_analytics):
        self.tuner = tuning_analytics

    def hovertool(self):
        tooltips = [
            ("FinalObjectiveValue", "@FinalObjectiveValue"),
            ("TrainingJobName", "@TrainingJobName"),
        ]
        for k in self.tuner.tuning_ranges.keys():
            tooltips.append( (k, "@{%s}" % k) )

        ht = HoverTool(tooltips=tooltips)
        return ht

    def tools(self, standard_tools='pan,crosshair,wheel_zoom,zoom_in,zoom_out,undo,reset'):
        return [self.hovertool(), standard_tools]

hover = HoverHelper(tuner_analytics)

p = figure(plot_width=900, plot_height=400, tools=hover.tools(), x_axis_type='datetime')
p.circle(source=df, x='TrainingStartTime', y='FinalObjectiveValue')
show(p)

Lastly, we take a look at how significantly each of our hyperparameters impacted the final objective value.

In [29]:
ranges = tuner_analytics.tuning_ranges
figures = []
for hp_name, hp_range in ranges.items():
    categorical_args = {}
    if hp_range.get('Values'):
        # This is marked as categorical.  Check if all options are actually numbers.
        def is_num(x):
            try:
                float(x)
                return 1
            except (TypeError, ValueError):
                return 0           
        vals = hp_range['Values']
        if sum([is_num(x) for x in vals]) == len(vals):
            # Bokeh has issues plotting a "categorical" range that's actually numeric, so plot as numeric
            print("Hyperparameter %s is tuned as categorical, but all values are numeric" % hp_name)
        else:
            # Set up extra options for plotting categoricals.  A bit tricky when they're actually numbers.
            categorical_args['x_range'] = vals

    # Now plot it
    p = figure(plot_width=500, plot_height=500, 
               title="Objective vs %s" % hp_name,
               tools=hover.tools(),
               x_axis_label=hp_name, y_axis_label=objective_name,
               **categorical_args)
    p.circle(source=df, x=hp_name, y='FinalObjectiveValue')
    figures.append(p)
show(bokeh.layouts.Column(*figures))
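
As a rough numeric companion to the scatter plots, one could compute a Pearson correlation between each hyperparameter and the objective value. The sketch below does this in plain Python on the six results reported earlier in this notebook. With so few jobs, five of which tie at 1.0, the coefficients are only an illustration of the calculation and should not be read as evidence of a real effect.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns NaN if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float('nan')
    return cov / math.sqrt(var_x * var_y)


# Values copied from the tuning results earlier in this notebook (same job order in each list).
dropout = [0.617379, 0.428124, 0.534982, 0.649921, 0.632648, 0.370858]
initial_epochs = [6, 9, 19, 18, 5, 9]
objective = [1.0, 1.0, 1.0, 1.0, 1.0, 0.958]

correlations = {
    'dropout': pearson(dropout, objective),
    'initial_epochs': pearson(initial_epochs, objective),
}
for name, value in correlations.items():
    print('%s: %.3f' % (name, value))
```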