# Run a SageMaker Experiment

**Before starting the lab, please ensure the notebook is deployed in <span style="color:red">us-west-2</span> and the kernel is set to <span style="color:red">MXNet 1.6 Python 3.6 CPU Optimized</span>.**


This notebook demonstrates how to organize, track, compare, and evaluate your machine learning (ML) model training using SageMaker Experiments. It shows these capabilities with an MNIST handwritten digit classification example.

## Install the Prerequisites

Before we begin, we need to make sure that SageMaker Experiments and PyTorch are installed at versions that support this demo.

In [None]:
!pip install --upgrade pip botocore sagemaker --no-cache-dir --quiet
!pip install sagemaker-experiments torch torchvision pillow --no-cache-dir --quiet

## Initialize the Libraries and Download the Data

Next, we initialize all the required libraries, download the MNIST dataset, and prepare it for the experiment.

In [None]:
import time
import torchvision
import boto3
import numpy as np
import pandas as pd
from IPython.display import set_matplotlib_formats
from matplotlib import pyplot as plt
from torchvision import datasets, transforms

import sagemaker
from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.analytics import ExperimentAnalytics

from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

set_matplotlib_formats("retina")

# Set up the SageMaker session, execution role, and S3 bucket

sm_sess = sagemaker.Session()
sess = sm_sess.boto_session
sm = sm_sess.sagemaker_client
role = get_execution_role()
bucket = sm_sess.default_bucket()
prefix = "DEMO-mnist"

# Download the dataset to the ./mnist folder, then load and prepare the training and validation datasets

datasets.MNIST.urls = [
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/train-images-idx3-ubyte.gz",
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/train-labels-idx1-ubyte.gz",
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz",
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz",
]
train_set = datasets.MNIST(
    "mnist",
    train=True,
    transform=transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
    ),
    download=True,
)
test_set = datasets.MNIST(
    "mnist",
    train=False,
    transform=transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
    ),
    download=False,
)

inputs = sagemaker.Session().upload_data(path="mnist", bucket=bucket, key_prefix=prefix)
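
The `transforms.Normalize((0.1307,), (0.3081,))` step above shifts and scales every pixel by the dataset mean and standard deviation, after `ToTensor` has mapped the raw 8-bit values to the range 0 to 1. As a minimal sketch of that arithmetic in plain Python (the helper names `to_unit_range`, `normalize`, and `denormalize` are illustrative and not part of torchvision):

```python
# Illustrative helpers only; torchvision applies the same arithmetic to whole tensors.
MNIST_MEAN = 0.1307  # mean passed to transforms.Normalize above
MNIST_STD = 0.3081   # standard deviation passed to transforms.Normalize above


def to_unit_range(pixel_uint8):
    """Scale an 8-bit pixel value (0-255) to [0.0, 1.0], as ToTensor does."""
    return pixel_uint8 / 255.0


def normalize(pixel, mean=MNIST_MEAN, std=MNIST_STD):
    """Apply (pixel - mean) / std, the per-channel formula used by Normalize."""
    return (pixel - mean) / std


def denormalize(value, mean=MNIST_MEAN, std=MNIST_STD):
    """Invert normalize(), for example before plotting an image."""
    return value * std + mean


if __name__ == "__main__":
    # Show how black, mid-gray, and white pixels are mapped
    for raw in (0, 128, 255):
        print(raw, round(normalize(to_unit_range(raw)), 4))
```

These are the same two values that the preprocessing tracker later logs as `normalization_mean` and `normalization_std`, so the trial records how the inputs were scaled.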

## Run the Experiment

Finally, we create an experiment that tracks the preprocessing and runs the training. Experiments are a great way to organize your data science work and to quickly determine the performance of the models created in each training run.

The following code logs the preprocessing parameters, sets the S3 location used by the jobs, and defines the experiment. In this scenario we create five trials, one per hidden-channel setting, and launch the training job for each trial without waiting, so the jobs run in parallel. The experiment takes about 5-7 minutes to complete.

In [None]:
# Track the preprocessing parameters and the input dataset
from sagemaker.pytorch import PyTorch, PyTorchModel

with Tracker.create(display_name="Preprocessing", sagemaker_boto_client=sm) as tracker:
    tracker.log_parameters(
        {
            "normalization_mean": 0.1307,
            "normalization_std": 0.3081,
        }
    )
    # We can log the S3 uri to the dataset we just uploaded
    tracker.log_input(name="mnist-dataset", media_type="s3/uri", value=inputs)

hidden_channel_trial_name_map = {}
preprocessing_trial_component = tracker.trial_component

# Define the experiment
    
mnist_experiment = Experiment.create(
    experiment_name=f"demo-mnist-experiment-{int(time.time())}",
    description="Classification of mnist hand-written digits",
    sagemaker_boto_client=sm,
)

# Create a training job for each number of hidden channels (2, 5, 10, 20, and 32)

for i, num_hidden_channel in enumerate([2, 5, 10, 20, 32]):

    # Create a trial for each hidden-channel setting
    trial_name = f"cnn-training-job-{num_hidden_channel}-hidden-channels-{int(time.time())}"
    cnn_trial = Trial.create(
        trial_name=trial_name,
        experiment_name=mnist_experiment.experiment_name,
        sagemaker_boto_client=sm,
    )
    hidden_channel_trial_name_map[num_hidden_channel] = trial_name

    # Associate the preprocessing trial component with the current trial
    cnn_trial.add_trial_component(preprocessing_trial_component)

    # Define the estimator for the training job
    estimator = PyTorch(
        py_version="py3",
        entry_point="./mnist.py",
        role=role,
        sagemaker_session=sagemaker.Session(sagemaker_client=sm),
        framework_version="1.1.0",
        instance_count=1,
        instance_type="ml.c5.xlarge",
        hyperparameters={
            "epochs": 2,
            "backend": "gloo",
            "hidden_channels": num_hidden_channel,
            "dropout": 0.2,
            "kernel_size": 5,
            "optimizer": "sgd",
        },
        metric_definitions=[
            {"Name": "train:loss", "Regex": "Train Loss: (.*?);"},
            {"Name": "test:loss", "Regex": "Test Average loss: (.*?),"},
            {"Name": "test:accuracy", "Regex": "Test Accuracy: (.*?)%;"},
        ],
        enable_sagemaker_metrics=True,
    )

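    # Include the hidden-channel count so job names stay unique even if two jobs start within the same second
    cnn_training_job_name = f"cnn-training-job-{num_hidden_channel}-{int(time.time())}"

    # Launch the training job and associate it with the trial; wait=False returns immediately so the jobs run in parallel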
    estimator.fit(
        inputs={"training": inputs},
        job_name=cnn_training_job_name,
        experiment_config={
            "TrialName": cnn_trial.trial_name,
            "TrialComponentDisplayName": "Training",
        },
        wait=False,
    )
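
Each entry in `metric_definitions` pairs a metric name with a regular expression whose first capture group is read from the training job's log output. The sketch below applies the same three patterns with Python's `re` module to see what they would capture. The sample log lines are hypothetical, since the output format of `mnist.py` is not shown here, and `extract_metrics` is an illustrative helper rather than a SageMaker API.

```python
import re

# Same patterns as the metric_definitions passed to the estimator above
METRIC_DEFINITIONS = [
    {"Name": "train:loss", "Regex": "Train Loss: (.*?);"},
    {"Name": "test:loss", "Regex": "Test Average loss: (.*?),"},
    {"Name": "test:accuracy", "Regex": "Test Accuracy: (.*?)%;"},
]


def extract_metrics(log_line, definitions=METRIC_DEFINITIONS):
    """Return {metric name: value} for every pattern that matches the log line."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            # The first capture group holds the metric value
            found[definition["Name"]] = float(match.group(1))
    return found


# Hypothetical log lines, assumed to resemble what the training script prints
SAMPLE_TRAIN_LINE = "Train Loss: 0.345678;"
SAMPLE_TEST_LINE = "Test Average loss: 0.0512, Test Accuracy: 98%;"

if __name__ == "__main__":
    print(extract_metrics(SAMPLE_TRAIN_LINE))
    print(extract_metrics(SAMPLE_TEST_LINE))
```

Checking the patterns locally like this can catch a missing delimiter (such as the trailing `;` or `,`) before a job is launched, because a pattern that never matches simply produces no metric.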

## Explore Results in the Experiments and Trials Section

1. Select the SageMaker resources
2. Choose Experiments and trials
3. Right-click the *demo-mnist-experiment-date-time* experiment we just ran and select "Open in trial components list"
4. Explore the metrics and other results of the five training jobs that are part of this experiment

This way, we can easily select the best-performing trial and deploy its model to a production endpoint.
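
The same comparison can also be done in code with the `ExperimentAnalytics` class imported earlier. The following is a sketch under assumptions: the helper names `build_analytics_kwargs` and `load_results` are hypothetical, and ranking by the maximum `test:accuracy` of the components named "Training" is one reasonable choice rather than the only one. The SageMaker import is deferred so the block can be read and checked without AWS access.

```python
def build_analytics_kwargs(experiment_name, metric_name="test:accuracy"):
    """Build keyword arguments for sagemaker.analytics.ExperimentAnalytics."""
    return {
        "experiment_name": experiment_name,
        # Keep only trial components whose display name is "Training"
        "search_expression": {
            "Filters": [
                {"Name": "DisplayName", "Operator": "Equals", "Value": "Training"}
            ]
        },
        # Rank by the highest value the metric reached during the job
        "sort_by": f"metrics.{metric_name}.max",
        "sort_order": "Descending",
        "metric_names": [metric_name],
        "parameter_names": ["hidden_channels", "epochs", "dropout", "optimizer"],
    }


def load_results(experiment_name, sagemaker_session):
    """Return a pandas DataFrame of training trial components, best first.

    Requires AWS credentials and the sagemaker package, so the import is deferred.
    """
    from sagemaker.analytics import ExperimentAnalytics

    analytics = ExperimentAnalytics(
        sagemaker_session=sagemaker_session,
        **build_analytics_kwargs(experiment_name),
    )
    return analytics.dataframe()
```

In this notebook the call would look like `load_results(mnist_experiment.experiment_name, sm_sess)`. Metrics only appear once the training jobs have finished, so run it after the jobs complete.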