# Automatic model tuning of TensorFlow models with Amazon SageMaker
This lab demonstrates the power of **Amazon SageMaker's automatic model tuning capability**, also known as hyperparameter optimization (HPO). Instead of a labor-intensive process of trial and error that could take days or weeks, [automatic model tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html) lets a data scientist ask SageMaker to search for the best set of hyperparameters.

The notebook shows how to specify the hyperparameters to tune, the ranges to consider, the metric to optimize, limits on the number of training jobs to run, and the compute capacity to use. A SageMaker tuning job then explores these options efficiently using Bayesian optimization. SageMaker trains a set of models and identifies the one with the best objective metric within your constraints. The resulting model is ready for deployment behind an endpoint or for batch predictions.

## Setup
For this notebook, we simply get our IAM execution role and set up some parameters for using S3.

In [None]:
import sagemaker
from sagemaker import get_execution_role
import boto3

client = boto3.client(service_name='sagemaker')

role = get_execution_role()
print(role)
sess = sagemaker.Session()

bucket = sess.default_bucket() # or custom bucket name
prefix = 'DEMO-TF-image-classification-birds'
JOB_PREFIX = 'tf-hpo-ic'
TF_FRAMEWORK_VERSION = '2.0.0'

This notebook relies on the previous notebooks in this workshop having been run. Specifically, it assumes the image data has been prepared and uploaded to S3. Here we define exactly where the training jobs will pull their image data from.

In [None]:
train_inputs = 's3://{}/{}/train'.format(bucket, prefix)
val_inputs   = 's3://{}/{}/validation'.format(bucket, prefix)
test_inputs  = 's3://{}/{}/test'.format(bucket, prefix)
print('Training data:   {}\nValidation data: {}\nTest data:       {}'.format(train_inputs, val_inputs, test_inputs))

Here are the classes that have been uploaded to S3 for training.

In [None]:
!aws s3 ls $train_inputs/

## Create hyperparameter tuning job
To use Amazon SageMaker's automatic model tuning capability, you create a tuning job, which in turn launches a set of SageMaker training jobs. As when creating a training job directly, you first define a set of hyperparameters and some metric definitions, and then a TensorFlow estimator that runs a Python training script.

In [None]:
from sagemaker.tensorflow import TensorFlow
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner

In [None]:
hyperparameters = {'initial_epochs': 5, 'tuning_epochs': 35, 
                   'data_dir': '/opt/ml/input/data',
                   'dropout': 0.5, 'num_fully_connected_layers': 1}

metric_definitions=[{'Name' : 'validation:acc', 
                     'Regex': '.*step.* - val_accuracy: (.*$)'},
                    {'Name' : 'validation:loss', 
                     'Regex': '- val_loss: (.*?) '},
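                    {'Name' : 'acc',  # assumes the script logs training accuracy as "accuracy", consistent with "val_accuracy" above
                     'Regex': '.*step.* - accuracy: (.*?) '},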
                    {'Name' : 'loss', 
                     'Regex': '.*step.* - loss: (.*?) '}]

estimator = TensorFlow(entry_point='train-mobilenet.py',
                    source_dir='code',
                    role=role,
                    framework_version=TF_FRAMEWORK_VERSION,
                    train_instance_count=1,
                    train_instance_type='ml.p3.2xlarge',
                    hyperparameters=hyperparameters,
                    metric_definitions=metric_definitions,
                    py_version='py3',
                    base_job_name=JOB_PREFIX)
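SageMaker extracts each metric by applying its `Regex` to the training job's log output and reading the first capture group. The sketch below uses Python's `re` module as an approximation to check three of the patterns above against a hypothetical Keras progress line. The sample line and its numbers are invented for illustration. The exact log format depends on the TensorFlow version and on how `train-mobilenet.py` calls `fit`.

```python
import re

# Copies of the validation and loss patterns from `metric_definitions` above.
PATTERNS = {
    'validation:acc': '.*step.* - val_accuracy: (.*$)',
    'validation:loss': '- val_loss: (.*?) ',
    'loss': '.*step.* - loss: (.*?) ',
}

# Hypothetical end-of-epoch line in the Keras progress-bar format (values invented).
SAMPLE_LINE = ('20/20 [==============================] - 12s 600ms/step'
               ' - loss: 0.4521 - accuracy: 0.8750 - val_loss: 0.5012 - val_accuracy: 0.8400')


def extract_metrics(line, patterns=PATTERNS):
    """Return {metric_name: float} for every pattern that matches the line."""
    found = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, line)
        if match:
            # SageMaker reads the first capture group as the metric value
            found[name] = float(match.group(1))
    return found


print(extract_metrics(SAMPLE_LINE))
```

Running a check like this locally against a real line from your training logs catches regex mistakes before you spend compute on a tuning job.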

More interestingly, here is the part that is unique to creating the tuning job. You define a set of hyperparameter ranges that you want SageMaker to explore via training jobs. For our example, we focus on the number of [fine tuning](https://www.pyimagesearch.com/2019/06/03/fine-tuning-with-keras-and-deep-learning/) epochs, the dropout ratio, and the fine-tuning learning rate. Finding the best settings manually would take significant time and lots of trial and error. With SageMaker, we can hand off that search and let the tuning job find good settings for us.

In [None]:
hyperparameter_ranges = {'fine_tuning_epochs': IntegerParameter(35, 55),
                         'dropout': ContinuousParameter(0.2, 0.7),
                         'fine_tuning_lr': ContinuousParameter(0.00001, 0.001)}
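The learning-rate range `ContinuousParameter(0.00001, 0.001)` spans two orders of magnitude. If values are drawn uniformly on a linear scale, only about 9% of the range lies below 1e-4. On a log scale, half of it does. The SageMaker Python SDK's range classes accept a `scaling_type` argument (for example `'Logarithmic'`) for exactly this situation, though whether it helps here is a judgment call. The pure-Python sketch below only compares the two scalings. It is not how SageMaker's Bayesian strategy picks points, and the helper names are hypothetical.

```python
import math
import random


def sample(lo, hi, scaling, rng):
    """Draw one value from [lo, hi] uniformly on a linear or log scale."""
    if scaling == 'Linear':
        return rng.uniform(lo, hi)
    if scaling == 'Logarithmic':
        # Uniform in log-space, then map back
        return math.exp(rng.uniform(math.log(lo), math.log(hi)))
    raise ValueError('Unknown scaling: %s' % scaling)


def fraction_below(threshold, lo, hi, scaling):
    """Exact probability that a single draw falls below `threshold`."""
    if scaling == 'Linear':
        return (threshold - lo) / (hi - lo)
    if scaling == 'Logarithmic':
        return (math.log(threshold) - math.log(lo)) / (math.log(hi) - math.log(lo))
    raise ValueError('Unknown scaling: %s' % scaling)


lo, hi = 0.00001, 0.001   # same bounds as 'fine_tuning_lr' above
rng = random.Random(0)    # fixed seed so the comparison is repeatable
for scaling in ('Linear', 'Logarithmic'):
    draws = [sample(lo, hi, scaling, rng) for _ in range(10000)]
    empirical = sum(d < 1e-4 for d in draws) / len(draws)
    print('%-11s exact=%.3f empirical=%.3f' % (
        scaling, fraction_below(1e-4, lo, hi, scaling), empirical))
```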

In [None]:
objective_metric_name = 'validation:acc'
objective_type = 'Maximize'

In [None]:
tuner = HyperparameterTuner(estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=6,
                            max_parallel_jobs=2,
                            objective_type=objective_type,
                            base_tuning_job_name=JOB_PREFIX)

In [None]:
inputs = {'train':train_inputs, 'test': test_inputs, 'validation': val_inputs}
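Each key in `inputs` becomes a SageMaker input channel. In File mode, SageMaker downloads each channel's S3 data into `/opt/ml/input/data/<channel_name>` inside the training container. It also exposes that path through an `SM_CHANNEL_<NAME>` environment variable, which is why `data_dir` is set to `/opt/ml/input/data` in the hyperparameters above. The sketch below only computes those expected locations from an inputs dict. The helper `expected_channel_paths` is hypothetical, and how `train-mobilenet.py` actually reads its data is an assumption this notebook does not show.

```python
import posixpath

# Root under which SageMaker places File-mode channels inside the container
CONTAINER_DATA_ROOT = '/opt/ml/input/data'


def expected_channel_paths(channel_inputs, root=CONTAINER_DATA_ROOT):
    """Map each channel name to its S3 source, container path, and env var name."""
    layout = {}
    for channel, s3_uri in channel_inputs.items():
        if not s3_uri.startswith('s3://'):
            raise ValueError('Expected an S3 URI for channel %r: %s' % (channel, s3_uri))
        layout[channel] = {
            's3_uri': s3_uri,
            'container_path': posixpath.join(root, channel),
            # e.g. the 'train' channel is exposed as SM_CHANNEL_TRAIN
            'env_var': 'SM_CHANNEL_' + channel.upper(),
        }
    return layout


# Placeholder URIs; in the notebook you would pass the `inputs` dict instead.
example_inputs = {'train': 's3://my-bucket/DEMO-TF-image-classification-birds/train',
                  'validation': 's3://my-bucket/DEMO-TF-image-classification-birds/validation'}
for name, info in expected_channel_paths(example_inputs).items():
    print(name, '->', info['container_path'], '(' + info['env_var'] + ')')
```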

With the tuning job established, we can now launch it and then check back to see which parameters are best suited to our image classifier. The tuning job duration depends on several factors:

- the number of bird species you used to train your classifier;
- the maximum number of training jobs (`max_jobs`) and the maximum number of jobs run in parallel (`max_parallel_jobs`);
- the ML instance type used for each job.

If you use the default values in these notebooks, the entire tuning job takes about 45 minutes.

Note that you do not need to wait for the whole tuning job to complete, as you can examine results of the individual training jobs that it launches. You can also use the SageMaker console to interactively watch the results of each training job launched as part of this tuning job. The console will let you easily identify the best model created thus far.
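At most `max_parallel_jobs` training jobs run at once. If every job takes about the same time, the tuning job therefore needs at least `ceil(max_jobs / max_parallel_jobs)` back-to-back waves: with `max_jobs=6` and `max_parallel_jobs=2`, that is 3. The sketch below turns this into a rough wall-clock estimate. The per-job duration is an input you estimate yourself, and the estimate ignores instance start-up time. Raising parallelism shortens the run, but the Bayesian strategy then has fewer completed results to learn from when choosing each new set of hyperparameters. The helper names are hypothetical.

```python
import math


def min_sequential_rounds(max_jobs, max_parallel_jobs):
    """Smallest number of back-to-back waves needed to run max_jobs training jobs."""
    if max_jobs < 1 or max_parallel_jobs < 1:
        raise ValueError('max_jobs and max_parallel_jobs must be positive')
    return math.ceil(max_jobs / max_parallel_jobs)


def rough_tuning_minutes(max_jobs, max_parallel_jobs, minutes_per_job):
    """Rough wall-clock estimate assuming every training job takes the same time."""
    return min_sequential_rounds(max_jobs, max_parallel_jobs) * minutes_per_job


# Settings used by the tuner above
print(min_sequential_rounds(max_jobs=6, max_parallel_jobs=2))
```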

In [None]:
tuner.fit(inputs)

In [None]:
status = client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner.latest_tuning_job.job_name)['HyperParameterTuningJobStatus']
print('Tuning job: {}, Status: {}'.format(tuner.latest_tuning_job.job_name, status))
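If `tuner.fit` returns before the tuning job finishes, you can block until it is done by polling `describe_hyper_parameter_tuning_job`. `HyperparameterTuner` also provides a `wait()` method for this. The sketch below is a hand-rolled alternative that also prints the training-job counters. It takes the describe function as a parameter so it can be exercised without AWS. `wait_for_tuning_job` is a hypothetical helper, not part of the SageMaker SDK.

```python
import time

# Statuses after which a tuning job will not change further ('Stopping' is not terminal)
TERMINAL_STATUSES = {'Completed', 'Failed', 'Stopped'}


def wait_for_tuning_job(describe, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll until the tuning job reaches a terminal status; return the last description.

    `describe` is expected to behave like
    boto3.client('sagemaker').describe_hyper_parameter_tuning_job.
    """
    while True:
        desc = describe(HyperParameterTuningJobName=job_name)
        status = desc['HyperParameterTuningJobStatus']
        counters = desc.get('TrainingJobStatusCounters', {})
        print('%s: %s (completed=%s, in progress=%s)' % (
            job_name, status, counters.get('Completed', 0), counters.get('InProgress', 0)))
        if status in TERMINAL_STATUSES:
            return desc
        sleep(poll_seconds)


# In the notebook (uncomment to block until the tuning job finishes):
# final = wait_for_tuning_job(client.describe_hyper_parameter_tuning_job,
#                             tuner.latest_tuning_job.job_name)
```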

## Analyze tuning job results
In the remainder of this notebook, we perform some analysis on the results of the tuning job. This helps us gain insight into which parameters were most influential. It can also help generate ideas for other tuning jobs that would help get even closer to our objective. The SageMaker console also provides a good way to track the job and review results.

In [None]:
tuning_job_name = tuner.latest_tuning_job.job_name

Here we monitor the progress of the overall tuning job and find out how many training jobs have completed.

In [None]:
tuning_job_result = client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)

status = tuning_job_result['HyperParameterTuningJobStatus']
if status != 'Completed':
    print('Reminder: the tuning job has not been completed.')

# Number of training jobs launched by the tuner that have finished successfully
job_count = tuning_job_result['TrainingJobStatusCounters']['Completed']
print("%d training jobs have completed" % job_count)

# Record the optimization direction and metric name for the analysis cells below
objective_config = tuning_job_result['HyperParameterTuningJobConfig']['HyperParameterTuningJobObjective']
is_minimize = (objective_config['Type'] != 'Maximize')
objective_name = objective_config['MetricName']

Here we take a look at the parameters that were used for the best model produced thus far.

In [None]:
from pprint import pprint
if tuning_job_result.get('BestTrainingJob',None):
    print("Best model found so far:")
    pprint(tuning_job_result['BestTrainingJob'])
else:
    print("No training jobs have reported results yet.")
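The `BestTrainingJob` entry of the describe response includes the winning job's name, its tuned hyperparameter values (returned as strings), and its final objective metric. A small helper can pull those out into a compact summary. `summarize_best_job` is hypothetical. The sample response below is a hand-written stand-in with invented values, not output from this tuning job.

```python
def summarize_best_job(tuning_job_description):
    """Return (job name, tuned hyperparameters, objective value), or None if no job has reported yet."""
    best = tuning_job_description.get('BestTrainingJob')
    if not best:
        return None
    objective_metric = best['FinalHyperParameterTuningJobObjectiveMetric']
    return (best['TrainingJobName'],
            dict(best['TunedHyperParameters']),
            objective_metric['Value'])


# Hand-written stand-in shaped like a describe_hyper_parameter_tuning_job response (values invented).
sample_description = {
    'BestTrainingJob': {
        'TrainingJobName': 'tf-hpo-ic-example-003',
        'TunedHyperParameters': {'dropout': '0.35', 'fine_tuning_epochs': '42',
                                 'fine_tuning_lr': '0.0001'},
        'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:acc',
                                                        'Value': 0.9},
    }
}
print(summarize_best_job(sample_description))
# In the notebook: summarize_best_job(tuning_job_result)
```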

Here we produce a grid view of all the jobs, their parameters, and their results, sorted by final objective value with the best job at the top and the worst at the bottom.

In [None]:
import pandas as pd

tuner_analytics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)

full_df = tuner_analytics.dataframe()
df = full_df  # fall back to the raw frame so `df` is always defined

if len(full_df) > 0:
    # Drop jobs that have not reported a usable objective value (NaN or -inf)
    df = full_df[full_df['FinalObjectiveValue'] > -float('inf')]
    if len(df) > 0:
        # Best job first: ascending when minimizing, descending when maximizing
        df = df.sort_values('FinalObjectiveValue', ascending=is_minimize)
        print("Number of training jobs with valid objective: %d" % len(df))
        print({"lowest": min(df['FinalObjectiveValue']), "highest": max(df['FinalObjectiveValue'])})
        pd.set_option('display.max_colwidth', None)  # Don't truncate TrainingJobName (pandas >= 1.0)
    else:
        print("No training jobs have reported valid results yet.")

df
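The cell above keeps only jobs whose objective is greater than `-inf`, which also drops missing (NaN) values. It then passes `ascending=is_minimize` to `sort_values`, so the best job lands on top whichever direction the objective uses. The same rule in plain Python, applied to hand-written rows with invented values, looks like this (`rank_jobs` is a hypothetical helper):

```python
def rank_jobs(rows, minimize):
    """Drop rows without a usable objective and sort best-first.

    Mirrors the DataFrame logic above: keep values greater than -inf
    (NaN comparisons are False, so NaN is dropped too), then sort
    ascending only when minimizing.
    """
    valid = [r for r in rows
             if r['FinalObjectiveValue'] is not None
             and r['FinalObjectiveValue'] > -float('inf')]
    return sorted(valid, key=lambda r: r['FinalObjectiveValue'], reverse=not minimize)


# Hand-written rows with invented values, just to show the ordering.
rows = [
    {'TrainingJobName': 'job-a', 'FinalObjectiveValue': 0.71},
    {'TrainingJobName': 'job-b', 'FinalObjectiveValue': float('nan')},  # no metric reported
    {'TrainingJobName': 'job-c', 'FinalObjectiveValue': 0.84},
]
print([r['TrainingJobName'] for r in rank_jobs(rows, minimize=False)])
```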

With the following chart, we can see how well SageMaker's Bayesian optimization was able to explore the search space of possible hyperparameters over time. In our case, we only ran a few jobs with a few hyperparameter ranges. For a production tuning job with many more jobs and parameter ranges, this chart is more compelling.

In [None]:
import bokeh
import bokeh.io
bokeh.io.output_notebook()
from bokeh.plotting import figure, show
from bokeh.models import HoverTool

class HoverHelper():

    def __init__(self, tuning_analytics):
        self.tuner = tuning_analytics

    def hovertool(self):
        tooltips = [
            ("FinalObjectiveValue", "@FinalObjectiveValue"),
            ("TrainingJobName", "@TrainingJobName"),
        ]
        for k in self.tuner.tuning_ranges.keys():
            tooltips.append( (k, "@{%s}" % k) )

        ht = HoverTool(tooltips=tooltips)
        return ht

    def tools(self, standard_tools='pan,crosshair,wheel_zoom,zoom_in,zoom_out,undo,reset'):
        return [self.hovertool(), standard_tools]

hover = HoverHelper(tuner_analytics)

p = figure(width=900, height=400, tools=hover.tools(), x_axis_type='datetime')
p.circle(source=df, x='TrainingStartTime', y='FinalObjectiveValue')
show(p)
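A common companion to this scatter plot is the running best objective value in job start order. It shows whether later jobs, chosen with more information, actually improved on earlier ones. The sketch below computes that curve from `(start_time, objective)` pairs in plain Python. In the notebook you could build the pairs from `df['TrainingStartTime']` and `df['FinalObjectiveValue']`. The helper name is hypothetical and the sample values are invented.

```python
from datetime import datetime


def running_best(points, minimize=False):
    """Return [(start_time, best_value_so_far)] ordered by start time."""
    better = min if minimize else max
    curve, best = [], None
    for start, value in sorted(points, key=lambda p: p[0]):
        best = value if best is None else better(best, value)
        curve.append((start, best))
    return curve


# Invented start times and objective values, for illustration only.
sample_points = [
    (datetime(2020, 1, 1, 10, 30), 0.80),
    (datetime(2020, 1, 1, 10, 0), 0.72),
    (datetime(2020, 1, 1, 11, 0), 0.78),
]
for start, best in running_best(sample_points):
    print(start.strftime('%H:%M'), best)
```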

Lastly, we plot the final objective value against each hyperparameter to see how each one relates to the objective.

In [None]:
from bokeh.layouts import column

def is_num(x):
    """Return True if x can be interpreted as a number."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False

ranges = tuner_analytics.tuning_ranges
figures = []
for hp_name, hp_range in ranges.items():
    categorical_args = {}
    if hp_range.get('Values'):
        # This is marked as categorical. Check whether all options are actually numbers.
        vals = hp_range['Values']
        if all(is_num(x) for x in vals):
            # Bokeh has issues plotting a "categorical" range that's actually numeric, so plot as numeric
            print("Hyperparameter %s is tuned as categorical, but all values are numeric" % hp_name)
        else:
            # Use the category values as the x-axis range for a true categorical
            categorical_args['x_range'] = vals

    # Plot the objective value against this hyperparameter
    p = figure(width=500, height=500,
               title="Objective vs %s" % hp_name,
               tools=hover.tools(),
               x_axis_label=hp_name, y_axis_label=objective_name,
               **categorical_args)
    p.circle(source=df, x=hp_name, y='FinalObjectiveValue')
    figures.append(p)
show(column(*figures))
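The scatter plots are easier to read alongside a single number per hyperparameter. One rough option is the Pearson correlation between each numeric hyperparameter and the objective. This is a heuristic swapped in here, not something SageMaker reports. With only six jobs it is very noisy and it misses non-monotonic effects, so treat it as a hint at best. The helper and sample values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two sequences of equal length >= 2')
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float('nan')  # undefined when either input is constant
    return cov / math.sqrt(var_x * var_y)


# Invented values for illustration; in the notebook you could pass
# df[hp_name].astype(float).tolist() and df['FinalObjectiveValue'].tolist().
hp_values = [0.2, 0.3, 0.5, 0.6, 0.7]
objective_values = [0.70, 0.74, 0.80, 0.82, 0.85]
print('r = %.2f' % pearson(hp_values, objective_values))
```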