# SageMaker Experiments LAB - Handwritten Digits Classification

This demo was prepared with the `Python 3 (PyTorch 1.6 ...)` kernel.


Amazon SageMaker Experiments is a capability of Amazon SageMaker that lets you organize, track, compare, and evaluate your machine learning experiments.

Machine learning is an iterative process. You need to experiment with multiple combinations of data, algorithm and parameters, all the while observing the impact of incremental changes on model accuracy. Over time this iterative experimentation can result in thousands of model training runs and model versions. This makes it hard to track the best performing models and their input configurations. It’s also difficult to compare active experiments with past experiments to identify opportunities for further incremental improvements.

SageMaker Experiments automatically tracks the inputs, parameters, configurations, and results of your iterations as trials. You can assign, group, and organize these trials into experiments. SageMaker Experiments is integrated with Amazon SageMaker Studio, which provides a visual interface to browse your active and past experiments, compare trials on key performance metrics, and identify the best performing models.

You can track artifacts for experiments, including datasets, algorithms, hyperparameters, and metrics. Jobs executed on SageMaker, such as SageMaker Autopilot jobs and training jobs, are tracked automatically. You can also track artifacts for additional steps of an ML workflow that come before or after model training, e.g. data preprocessing or post-training model evaluation.
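
The hierarchy described above (an experiment groups trials, and a trial groups trial components such as a preprocessing step and a training job) can be pictured with plain Python data classes. This is a minimal sketch for intuition only: `ExperimentRecord`, `TrialRecord`, and `TrialComponentRecord` are hypothetical names, not classes from the `smexperiments` SDK, which stores these entities in the SageMaker service. The example shares one preprocessing component across two trials, similar to what this lab does later.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrialComponentRecord:
    """One tracked step, e.g. preprocessing or a training job (hypothetical)."""
    name: str
    parameters: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


@dataclass
class TrialRecord:
    """One end-to-end attempt, made of one or more steps (hypothetical)."""
    name: str
    components: List[TrialComponentRecord] = field(default_factory=list)


@dataclass
class ExperimentRecord:
    """A group of related trials, e.g. a hyperparameter sweep (hypothetical)."""
    name: str
    trials: List[TrialRecord] = field(default_factory=list)

    def component_names(self) -> List[str]:
        # flatten every trial's components; a shared component appears once per trial
        return [c.name for t in self.trials for c in t.components]


# one preprocessing component is shared by every trial
prep = TrialComponentRecord("Preprocessing", parameters={"normalization_mean": 0.1307})
exp = ExperimentRecord("mnist-classification")
for channels in (3, 10):
    train = TrialComponentRecord(f"Training-{channels}", parameters={"hidden_channels": channels})
    exp.trials.append(TrialRecord(f"trial-{channels}", components=[prep, train]))

print(exp.component_names())
```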

The APIs also let you search and browse your current and past experiments, compare experiments, and identify best performing models.

## Step 0 - Environment Preparation

### Install/upgrade packages

In [None]:
%%time

import sys

!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install awscli==1.19.14
!{sys.executable} -m pip install sagemaker-experiments==0.1.28
!{sys.executable} -m pip install sagemaker==2.25.1

# remove tqdm, which has a compatibility problem with PyTorch
!{sys.executable} -m pip uninstall -y tqdm

### Import packages/classes

In [None]:
import time

import boto3
import numpy as np
import pandas as pd
%config InlineBackend.figure_format = 'retina'
from matplotlib import pyplot as plt
from torchvision import datasets, transforms

import sagemaker as sm
from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.analytics import ExperimentAnalytics

# SageMaker Experiments SDK (smexperiments package)
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

### Role and S3 bucket, etc.

In [None]:
boto_sess = boto3.Session()
boto_sm = boto_sess.client('sagemaker')
role = get_execution_role()

# S3 bucket and prefix
sess = sm.session.Session(boto_sess, boto_sm)
bucket = sess.default_bucket()
prefix = 'sagemaker-experiments/DEMO-mnist-classification'

Now we will demonstrate SageMaker Experiments capabilities through an MNIST handwritten digits classification example. The experiment is organized as follows:

1. Download and prepare the MNIST dataset.
2. Train a Convolutional Neural Network (CNN) model. Tune the hyperparameter that sets the number of hidden channels in the model, and track the parameter configurations and resulting model accuracy using the SageMaker Experiments Python SDK.
3. Use the search and analytics capabilities of the Python SDK to search, compare, and evaluate the performance of all model versions generated by the tuning in Step 2.
4. Trace the complete lineage of a model version, i.e. all the data preprocessing and training configurations and inputs that went into creating that model version.

## Step 1 - Download and prepare dataset

### Dataset
MNIST is a widely used dataset for handwritten digit classification. It consists of 70,000 labeled 28x28 pixel grayscale images of hand-written digits. The dataset is split into 60,000 training images and 10,000 test images. There are 10 classes (one for each of the 10 digits).

We download the MNIST handwritten digits dataset and define a transformation (conversion to a tensor followed by normalization) that is applied to each image when it is loaded.
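
The transformation first scales the `uint8` pixels to `[0.0, 1.0]` (`transforms.ToTensor`) and then standardizes them with the MNIST mean 0.1307 and standard deviation 0.3081 (`transforms.Normalize`), i.e. x' = (x / 255 - 0.1307) / 0.3081. A minimal NumPy sketch of that arithmetic, assuming a single grayscale image and ignoring the extra channel dimension that `ToTensor` adds; `normalize_pixels` is a hypothetical helper, not a torchvision function:

```python
import numpy as np

MNIST_MEAN, MNIST_STD = 0.1307, 0.3081  # same values as the transforms in this lab


def normalize_pixels(image_uint8: np.ndarray) -> np.ndarray:
    """Mimic ToTensor() followed by Normalize((mean,), (std,)) for one grayscale image."""
    scaled = image_uint8.astype(np.float32) / 255.0  # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - MNIST_MEAN) / MNIST_STD         # Normalize: (x - mean) / std


pixels = np.array([[0, 255]], dtype=np.uint8)  # one black and one white pixel
print(normalize_pixels(pixels))
```

A black pixel (0) maps to about -0.42 and a white pixel (255) to about 2.82.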

In [None]:
# work around HTTP 403 errors when torchvision downloads MNIST:
# https://github.com/pytorch/vision/issues/1938
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

In [None]:
%%time

# download MNIST into the ./mnist folder; the transform below is applied lazily,
# each time a sample is accessed, and does not modify the files on disk
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # MNIST mean and standard deviation
])

train_set = datasets.MNIST('mnist', train=True, transform=transform, download=True)
test_set = datasets.MNIST('mnist', train=False, transform=transform, download=False)

In [None]:
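# show the first training image (train_set.data holds the raw, untransformed uint8 pixels)
plt.imshow(train_set.data[0].numpy(), cmap='gray')
plt.title('Label: {}'.format(train_set.targets[0].item()))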

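Next, we upload the downloaded dataset files to S3. Note that these are the raw MNIST files: the normalization transform is applied only when samples are loaded, so the training script is expected to apply the same normalization.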

In [None]:
inputs = sess.upload_data(path='mnist', bucket=bucket, key_prefix=prefix)
print('input spec: {}'.format(inputs))
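
`upload_data` returns the S3 URI of the uploaded data; for a directory such as `mnist`, that is the bucket followed by the key prefix. If you need the bucket and prefix back as separate values, a small standard-library sketch (the helper name `split_s3_uri` and the example bucket name are hypothetical):

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/key/prefix' into ('bucket', 'key/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


# hypothetical bucket name; in the lab it comes from sess.default_bucket()
print(split_s3_uri("s3://example-bucket/sagemaker-experiments/DEMO-mnist-classification"))
```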

Now let's track the parameters of the data preprocessing step.
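
Used as a context manager, the tracker records what you log together with, among other things, when the step started and ended, and the resulting trial component remains available after the `with` block (as `tracker.trial_component` in the next cell). The sketch below is a hypothetical plain-Python model of that pattern, not the `smexperiments` implementation; `toy_tracker`, its dict layout, and the placeholder URI are made up for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def toy_tracker(display_name: str):
    """Hypothetical stand-in for a tracker: collects parameters/inputs plus timestamps."""
    record = {"display_name": display_name, "parameters": {}, "inputs": {},
              "start_time": time.time(), "end_time": None}
    try:
        yield record
    finally:
        # the record is finalized even if the body of the with-block raises
        record["end_time"] = time.time()


with toy_tracker("Preprocessing") as rec:
    rec["parameters"].update({"normalization_mean": 0.1307, "normalization_std": 0.3081})
    rec["inputs"]["mnist-dataset"] = "s3://example-bucket/example-prefix"  # placeholder URI

# like tracker.trial_component, the record is still usable after the block
print(rec["display_name"], rec["parameters"])
```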

In [None]:
with Tracker.create(display_name="Preprocessing", sagemaker_boto_client=boto_sm) as tracker:
    tracker.log_parameters({
        "normalization_mean": 0.1307,
        "normalization_std": 0.3081,
    })
    # we can log the s3 uri to the dataset we just uploaded
    tracker.log_input(name="mnist-dataset", media_type="s3/uri", value=inputs)
    
preprocessing_trial_component = tracker.trial_component

## Step 2 - Train model

Create an experiment to track all the model training iterations. Experiments are a great way to organize your data science work. You can create experiments to organize all your model development work for: [1] a business use case you are addressing (e.g. an experiment named “customer churn prediction”), [2] the data science team that owns the experiment (e.g. an experiment named “marketing analytics experiment”), or [3] a specific data science and ML project. Think of an experiment as a “folder” for organizing your “files”.

### Create an Experiment

In [None]:
%%time

experiment_name = "sagemaker-workshop-experiments-mnist-classification"
experiment_description = "SageMaker Workshop - Experiments - Classification of mnist hand-written digits"

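try:
    # delete any experiment left over from a previous run of this notebook
    mnist_experiment = Experiment.load(
        experiment_name=experiment_name, sagemaker_boto_client=boto_sm)
    print('Experiment already exists, deleting it...')
    # recursively deletes the experiment together with its trials and trial components
    mnist_experiment.delete_all('--force')
except Exception as e:
    # typically the experiment does not exist yet
    print('No existing experiment deleted: {}'.format(e))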

In [None]:
mnist_experiment = Experiment.create(
    experiment_name=experiment_name, 
    description=experiment_description,
    sagemaker_boto_client=boto_sm)
print(mnist_experiment)

### Track Experiment
Now create a Trial for each training run to track its inputs, parameters, and metrics.

While training the CNN model on SageMaker, we experiment with several values for the number of hidden channels in the model and create one Trial to track each training job. We also add the preprocessing TrialComponent created by the tracker earlier to each Trial, which enriches the Trial with the parameters captured during data preprocessing.
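
The sweep changes a single hyperparameter (`hidden_channels`) while every other hyperparameter stays fixed, and each trial name encodes the value being tested. A minimal sketch that generates those trial names and hyperparameter sets without calling SageMaker; `sweep_configs` and `BASE_HYPERPARAMETERS` are hypothetical names, and unlike the training loop (which calls `time.time()` on every iteration) the sketch uses one timestamp for all trials:

```python
import time

# fixed hyperparameters, copied from the estimator definition in this lab
BASE_HYPERPARAMETERS = {"epochs": 2, "backend": "gloo", "dropout": 0.2,
                        "kernel_size": 5, "optimizer": "sgd"}


def sweep_configs(values, timestamp=None):
    """Yield (trial_name, hyperparameters) pairs that differ only in hidden_channels."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    for num_hidden_channel in values:
        trial_name = f"cnn-training-job-{num_hidden_channel}-hidden-channels-{timestamp}"
        yield trial_name, {**BASE_HYPERPARAMETERS, "hidden_channels": num_hidden_channel}


for name, params in sweep_configs([3, 10, 32], timestamp=1700000000):
    print(name, params["hidden_channels"])
```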

The loop below runs the training jobs sequentially (`wait=True`). If you want to run them asynchronously instead, you may need to increase your account's resource limits for the training instance type.

Note that the following cell takes a while to run.

In [None]:
%%time

from sagemaker.pytorch import PyTorch

instance_type='ml.c5.xlarge'

hidden_channel_trial_name_map = {}


for num_hidden_channel in [3, 10, 32]:
    # create trial
    trial_name = f"cnn-training-job-{num_hidden_channel}-hidden-channels-{int(time.time())}"
    cnn_trial = Trial.create(
        trial_name=trial_name, 
        experiment_name=mnist_experiment.experiment_name,
        sagemaker_boto_client=boto_sm,
    )
    hidden_channel_trial_name_map[num_hidden_channel] = trial_name
    
    # associate the preprocessing trial component with the current trial
    cnn_trial.add_trial_component(preprocessing_trial_component)
    
    # all input configurations, parameters, and metrics specified in estimator 
    # definition are automatically tracked
    estimator = PyTorch(
        entry_point='mnist.py',
        source_dir='code', # directory of your training script
        role=role,
        framework_version='1.6.0',
        py_version='py3',
        instance_type=instance_type,
        instance_count=1,
        use_spot_instances=True,
        max_run=300,
        max_wait=300,
        hyperparameters={
            'epochs': 2,
            'backend': 'gloo',
            'hidden_channels': num_hidden_channel,
            'dropout': 0.2,
            'kernel_size': 5,
            'optimizer': 'sgd'
        },
        metric_definitions=[
            {'Name':'train:loss', 'Regex':'Train Loss: (.*?);'},
            {'Name':'test:loss', 'Regex':'Test Average loss: (.*?),'},
            {'Name':'test:accuracy', 'Regex':'Test Accuracy: (.*?)%;'}
        ],
        enable_sagemaker_metrics=True,
    )
    
    cnn_training_job_name = "cnn-training-job-{}".format(int(time.time()))
    
    # Now associate the estimator with the Experiment and Trial
    estimator.fit(
        inputs={'training': inputs}, 
        job_name=cnn_training_job_name,
        experiment_config={
            "TrialName": cnn_trial.trial_name,
            "TrialComponentDisplayName": "Training",
        },
        wait=True,
    )
    
    # give it a while before dispatching the next training job
    time.sleep(2)
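
With `metric_definitions`, SageMaker scans the training job's log output with each `Regex` and records the captured value under the metric's `Name`; these are the metrics queried later as `metrics.test:accuracy.max`. A local sketch of that extraction with Python's `re` module, assuming the first capture group holds the numeric value. The log lines are hypothetical examples written to match the regexes, since the actual output format is defined by `code/mnist.py`, which is not shown here; `extract_metrics` is likewise a hypothetical helper.

```python
import re

# the same metric_definitions passed to the estimator above
METRIC_DEFINITIONS = [
    {"Name": "train:loss", "Regex": "Train Loss: (.*?);"},
    {"Name": "test:loss", "Regex": "Test Average loss: (.*?),"},
    {"Name": "test:accuracy", "Regex": "Test Accuracy: (.*?)%;"},
]


def extract_metrics(log_line: str):
    """Return {metric_name: float} for every definition whose regex matches the line."""
    found = {}
    for definition in METRIC_DEFINITIONS:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# hypothetical log lines, not real training output
print(extract_metrics("Train Epoch: 1 [0/60000 (0%)] Train Loss: 2.301;"))
print(extract_metrics("Test Average loss: 0.1023, Test Accuracy: 96.85%;"))
```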

## Step 3 - Compare the model training runs for an experiment

Now we use the analytics capabilities of the SageMaker Python SDK to query and compare the training runs and identify the best model produced by our experiment. You can retrieve trial components by using a search expression; the one in the next cell selects only the trial components whose display name is `Training`.
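
Conceptually, the search expression keeps only the trial components whose `DisplayName` equals `Training` (the `TrialComponentDisplayName` set in `experiment_config`), and `sort_by`/`sort_order` then rank them by their maximum `test:accuracy`. The sketch below emulates that filtering and ranking locally; it is not how the SageMaker Search service is implemented, `search` is a hypothetical helper, and the records are made-up inputs, not results from this lab.

```python
# hypothetical trial-component summaries, NOT real results
records = [
    {"DisplayName": "Preprocessing", "TrialComponentName": "tc-prep", "test:accuracy": None},
    {"DisplayName": "Training", "TrialComponentName": "tc-a", "test:accuracy": 91.0},
    {"DisplayName": "Training", "TrialComponentName": "tc-b", "test:accuracy": 97.5},
]


def search(records, name, value):
    """Keep records whose `name` field equals `value` (like an 'Equals' filter)."""
    return [r for r in records if r.get(name) == value]


training = search(records, "DisplayName", "Training")
# like sort_by="metrics.test:accuracy.max", sort_order="Descending"
ranked = sorted(training, key=lambda r: r["test:accuracy"], reverse=True)
print([r["TrialComponentName"] for r in ranked])
```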

### Some Simple Analyses

In [None]:
search_expression = {
    "Filters":[
        {
            "Name": "DisplayName",
            "Operator": "Equals",
            "Value": "Training",
        }
    ],
}

In [None]:
trial_component_analytics = ExperimentAnalytics(
    sagemaker_session=Session(boto_sess, boto_sm), 
    experiment_name=mnist_experiment.experiment_name,
    search_expression=search_expression,
    sort_by="metrics.test:accuracy.max",
    sort_order="Descending",
    metric_names=['test:accuracy'],
    parameter_names=['hidden_channels', 'epochs', 'dropout', 'optimizer']
)

In [None]:
trial_component_analytics.dataframe()

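We take the best trial component, i.e. the first row of the dataframe, since the rows are sorted by maximum test accuracy in descending order.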

In [None]:
# the dataframe is sorted by max test accuracy (descending), so the first row is the best
analytics_df = trial_component_analytics.dataframe()
best_trial_component_name = analytics_df.iloc[0]['TrialComponentName']
best_trial_component = TrialComponent.load(
    trial_component_name=best_trial_component_name, sagemaker_boto_client=boto_sm)
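
Relying on `iloc[0]` assumes the dataframe keeps the requested sort order. An order-independent alternative is to select the row with the largest metric value using pandas `idxmax`. A sketch on a small hypothetical dataframe (made-up values, not results from this lab), assuming the analytics dataframe names the metric column `test:accuracy - Max`; check the column names of `trial_component_analytics.dataframe()` before relying on it.

```python
import pandas as pd

# hypothetical dataframe shaped like the analytics output; NOT real results
df = pd.DataFrame({
    "TrialComponentName": ["tc-a", "tc-b", "tc-c"],
    "hidden_channels": [3.0, 10.0, 32.0],
    "test:accuracy - Max": [91.0, 97.5, 96.0],
})

# select the row with the highest max test accuracy, regardless of row order
best_row = df.loc[df["test:accuracy - Max"].idxmax()]
print(best_row["TrialComponentName"], best_row["hidden_channels"])
```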

To isolate and measure the impact of the number of hidden channels on model accuracy, we varied only that hyperparameter and kept all other hyperparameters fixed.

## Step 4 - Optional sections

Next, let's look at an example of tracing the lineage of a model by accessing the data tracked by SageMaker Experiments for the trial that used 3 hidden channels (`cnn-training-job-3-hidden-channels-<timestamp>`).
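
The lineage query filters trial components by the name of their parent trial. Wrapping that search expression in a small function makes it easy to repeat the query for any of the three trials; `lineage_search_expression` is a hypothetical helper, and the trial name in the example is illustrative only.

```python
def lineage_search_expression(trial_name: str) -> dict:
    """Build a search expression matching every trial component whose parent trial is `trial_name`."""
    return {
        "Filters": [{
            "Name": "Parents.TrialName",
            "Operator": "Equals",
            "Value": trial_name,
        }]
    }


# in the lab this could be passed as
# ExperimentAnalytics(..., search_expression=lineage_search_expression(hidden_channel_trial_name_map[10]))
expr = lineage_search_expression("cnn-training-job-10-hidden-channels-1700000000")
print(expr)
```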

### Lineage

In [None]:
lineage_table = ExperimentAnalytics(
    sagemaker_session=Session(boto_sess, boto_sm), 
    search_expression={
        "Filters":[{
            "Name": "Parents.TrialName",
            "Operator": "Equals",
            "Value": hidden_channel_trial_name_map[3]
        }]
    },
    sort_by="CreationTime",
    sort_order="Ascending",
)

In [None]:
lineage_table.dataframe()

### Deploy endpoint for the best training-job / trial component

Now we create a PyTorch model from the best trial component's model artifact and deploy it to a real-time endpoint.
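
The `env` dictionary passed to `PyTorchModel` in the next cell is set as environment variables in the serving container. Environment variables are always strings, which is why the values are wrapped in `str()`, and the inference code in `code/mnist.py` (not shown here) presumably parses them back when it rebuilds the network. A hypothetical sketch of such parsing; `read_model_config` and its default values are assumptions, not taken from the actual script.

```python
import os


def read_model_config(environ=os.environ):
    """Hypothetical helper: parse the architecture hyperparameters passed through `env`."""
    return {
        "hidden_channels": int(environ.get("hidden_channels", "10")),  # assumed default
        "dropout": float(environ.get("dropout", "0.2")),               # assumed default
        "kernel_size": int(environ.get("kernel_size", "5")),           # assumed default
    }


# values arrive as strings, just like the env dict built in the next cell
print(read_model_config({"hidden_channels": "32", "dropout": "0.2", "kernel_size": "5"}))
```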

In [None]:
%%time

from sagemaker.pytorch import PyTorchModel

# S3 location of the model.tar.gz produced by the best training job
model_data = best_trial_component.output_artifacts['SageMaker.ModelArtifact'].value
# pass the architecture hyperparameters to the inference code as (string) environment variables
env = {'hidden_channels': str(int(best_trial_component.parameters['hidden_channels'])),
       'dropout': str(best_trial_component.parameters['dropout']),
       'kernel_size': str(int(best_trial_component.parameters['kernel_size']))}
model = PyTorchModel(
    model_data=model_data,
    role=role,
    entry_point='./code/mnist.py',
    env=env,
    sagemaker_session=sm.Session(sagemaker_client=boto_sm),
    framework_version='1.6.0',
    py_version='py3',
    name=best_trial_component.trial_component_name,
)

predictor = model.deploy(
    instance_type='ml.m5.xlarge',
    initial_instance_count=1)

## Cleanup

Once we're done, don't forget to delete the endpoint to avoid unnecessary charges.

> Trial components can exist independently of trials and experiments. You might want to keep them if you plan on further exploration. If so, comment out `tc.delete()`.
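
The `cleanup` function removes each trial component from its trial before trying to delete it; as its comment notes, deleting a trial component that is still associated with another trial fails, which is the case for the preprocessing component shared by all three trials until the last trial is processed. The following pure-Python sketch models that ordering with an in-memory mapping; `delete_experiment` and the dict-of-sets structure are hypothetical and make no SageMaker calls.

```python
def delete_experiment(trials):
    """Toy model of the cleanup order: detach, then delete components no trial still references.

    `trials` maps trial name -> set of trial component names (hypothetical structure).
    Returns the deleted component names, in deletion order.
    """
    deleted = []
    for trial_name in list(trials):
        for component in sorted(trials[trial_name]):
            trials[trial_name].discard(component)  # like trial.remove_trial_component(tc)
            if any(component in others for others in trials.values()):
                continue  # still shared: in the notebook, tc.delete() fails and is skipped
            deleted.append(component)  # like tc.delete()
        del trials[trial_name]  # like trial.delete()
    return deleted


trials = {"t3": {"prep", "train-3"}, "t10": {"prep", "train-10"}}
print(delete_experiment(trials))
```

The shared `prep` component is skipped while `t10` still references it and is deleted only when the last trial is cleaned up.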

In [None]:
def cleanup(experiment):
    for trial_summary in experiment.list_trials():
        trial = Trial.load(sagemaker_boto_client=boto_sm, trial_name=trial_summary.trial_name)
        for trial_component_summary in trial.list_trial_components():
            tc = TrialComponent.load(
                sagemaker_boto_client=boto_sm,
                trial_component_name=trial_component_summary.trial_component_name)
            trial.remove_trial_component(tc)
            try:
                # comment out to keep trial components
                tc.delete()
            except Exception:
                # tc is still associated with another trial, so it cannot be deleted yet
                continue
            # to prevent throttling
            time.sleep(.5)
        trial.delete()
    experiment.delete()

In [None]:
cleanup(mnist_experiment)
predictor.delete_endpoint()