## 1. Introduction 

[Amazon SageMaker](https://aws.amazon.com/sagemaker/) helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.

## 2. MATLAB on Amazon SageMaker

With Amazon SageMaker, users can package their own algorithms that can then be trained and deployed in the SageMaker environment. This notebook will guide you through an example that shows you how to build a MATLAB Docker container for SageMaker and use it for launching a [Hyperparameter Tuning Job](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html).

This notebook shows how you can:

1. Build a Docker container with MATLAB by using the [official Docker Hub MATLAB image](https://hub.docker.com/r/mathworks/matlab-deep-learning).
2. Publish the Docker container to [Amazon ECR](https://aws.amazon.com/ecr/), from where SageMaker can use it to run tuning jobs.
3. Create an [Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase) and [HyperparameterTuner](https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html) to launch a tuning job.
4. Get the best training job from the hyperparameter tuning job.

## 3. Prerequisites

### 3.1 Roles, Permissions and Docker Service

To get started, we'll import the Python libraries we need, and set up the environment with a few prerequisites for permissions and configurations.

In [None]:
from sagemaker import get_execution_role
import pandas as pd
import sagemaker
import boto3
import os

role = get_execution_role()
print(role)

Because we will be pulling the [matlab-deep-learning Docker image](https://hub.docker.com/r/mathworks/matlab-deep-learning) (which has a compressed size of 7.89 GB), we need to change the default Docker location to the EBS volume that we mounted.

We stop the Docker service, move the default Docker directory from `/var/lib/docker` to `/home/ec2-user/SageMaker/docker`, and then start the Docker service again.

In [None]:
!mkdir -p /home/ec2-user/SageMaker/docker/
!sudo service docker stop
!sudo mv /var/lib/docker/ /home/ec2-user/SageMaker/docker/
!sudo ln -s /home/ec2-user/SageMaker/docker/ /var/lib/docker
!sudo service docker start

### 3.2 License Manager for MATLAB

Follow the steps in the GitHub repo to launch the License Manager for MATLAB on AWS: https://github.com/mathworks-ref-arch/license-manager-for-matlab-on-aws.

Once the License Manager is up and running, note down its private IP address. The Docker container running MATLAB has to reach the License Manager for licensing, which it does through the `MLM_LICENSE_FILE` environment variable.
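
As a minimal sketch of how the value of `MLM_LICENSE_FILE` is put together (assuming the `port@host` form with port 27000 that the Dockerfile in this notebook uses), the hypothetical helper `mlm_license_file` below validates the IP address before formatting it. The helper name and the example address are illustrative only and are not part of any MathWorks or AWS API.

```python
import ipaddress


def mlm_license_file(private_ip: str, port: int = 27000) -> str:
    """Return a port@host string for MLM_LICENSE_FILE (hypothetical helper)."""
    # ip_address raises ValueError for anything that is not a valid IP address
    ip = ipaddress.ip_address(private_ip)
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"{port}@{ip}"


# Example with a made-up private address; use your License Manager's IP instead
print(mlm_license_file("10.0.0.12"))
```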

### 3.3 Dockerfile & dependencies

In [None]:
!mkdir -p matlab-docker

Create a Dockerfile that pulls MATLAB's image from https://hub.docker.com/r/mathworks/matlab-deep-learning and adds a new `ENTRYPOINT`. The `ENTRYPOINT` instruction specifies the command that is executed when a Docker container starts.

When the container starts, it runs the `tuning` script located in `/opt/ml/input/data/code`, and `MLM_LICENSE_FILE` is set to use the License Manager created above. Replace the placeholder address `123.12.12.123` in the Dockerfile with the private IP address of your License Manager.
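
SageMaker mounts each input channel passed to `fit` under `/opt/ml/input/data/<channel name>` inside the training container, which is why a channel named `code` ends up in the directory that the entry point changes into. A small sketch of that mapping, where `channel_dir` is a hypothetical helper used for illustration only:

```python
from pathlib import PurePosixPath

# Root directory under which SageMaker places input channels in the container
INPUT_DATA_ROOT = PurePosixPath("/opt/ml/input/data")


def channel_dir(channel_name: str) -> str:
    """Return the in-container directory for an input channel (hypothetical helper)."""
    if not channel_name or "/" in channel_name:
        raise ValueError(f"invalid channel name: {channel_name!r}")
    return str(INPUT_DATA_ROOT / channel_name)


# The notebook later passes a channel named "code" to fit()
print(channel_dir("code"))
```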

In [None]:
%%writefile matlab-docker/Dockerfile
FROM mathworks/matlab-deep-learning:r2021b
USER root
ENV MLM_LICENSE_FILE="27000@123.12.12.123"
ENTRYPOINT ["matlab", "-batch", "cd /opt/ml/input/data/code; tuning; exit"]

In [None]:
!echo ==== Generated Dockerfile ====
!cat matlab-docker/Dockerfile

## 4. MATLAB docker image on ECR  

A Docker image with MATLAB needs to be available for SageMaker to use. 

The following steps:
* Build a MATLAB Deep Learning container from Docker Hub.
* Create an ECR repository and push the container to it.

These steps can be skipped if you already have a Docker container with MATLAB installed in an [Amazon ECR repository](https://console.aws.amazon.com/ecr/home?region=us-east-1#).


In [None]:
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.Session().region_name

# Your ECR repo name
ecr_repository = 'sagemaker-demo-ecr'
tag = ':tuning'
processing_repository_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)

print("ECR Repository Name: ", ecr_repository)
print("ECR Repository URI:", processing_repository_uri)
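
The URI built above follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a sketch, the hypothetical helper below assembles the same string from its parts while keeping the colon out of the tag variable, which makes a doubled or missing separator less likely. The account ID in the example is a placeholder.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI from its parts (hypothetical helper)."""
    # Accept both "tuning" and ":tuning" so the result never contains "::"
    tag = tag.lstrip(":")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID; in the notebook this comes from STS
print(ecr_image_uri("123456789012", "us-east-1", "sagemaker-demo-ecr", ":tuning"))
```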

### 4.1 Create docker image from Dockerfile

In [None]:
# Build the Docker image from the Dockerfile
!docker build -t $ecr_repository$tag matlab-docker/

In [None]:
!docker tag {ecr_repository + tag} $processing_repository_uri
!docker image ls

### 4.2 Push MATLAB image to ECR

In [None]:
# Creates the ECR Repository if it doesn't exist
!aws ecr describe-repositories --repository-names {ecr_repository} || aws ecr create-repository --repository-name {ecr_repository}

# Authorize Docker to publish to ECR
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com

In [None]:
# push MATLAB image to ECR
!docker push $processing_repository_uri

## 5. MATLAB Code

Overview of the script:

- Read hyperparameters from the `hyperparameters.json` file.
- Load the digit sample dataset as an [image datastore](https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.imagedatastore.html).
- Split the dataset into training and validation sets.
- Define the convolutional neural network architecture.
- Specify training options.
- Train the network.
- Save the trained model to the model directory (`/opt/ml/model/`), which SageMaker uploads to the S3 output path.
- Classify validation images and compute accuracy.
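
The first step of the script can be sketched in Python to show the logic in isolation. SageMaker writes hyperparameter values to `hyperparameters.json` as strings, so the sketch converts them to numbers and falls back to the same defaults as the MATLAB script when a key or the file is missing. The function name `load_hyperparameters` is hypothetical, and the demo writes a temporary file rather than reading `/opt/ml/input/config/hyperparameters.json`.

```python
import json
import os
import tempfile

# Same fallback values as in the MATLAB script
DEFAULTS = {"miniBatchSize": 64, "L2Regularization": 1e-4}


def load_hyperparameters(path: str) -> dict:
    """Read hyperparameters, converting string values to numbers (hypothetical helper)."""
    params = dict(DEFAULTS)
    if os.path.isfile(path):
        with open(path) as f:
            raw = json.load(f)
        if "miniBatchSize" in raw:
            params["miniBatchSize"] = int(float(raw["miniBatchSize"]))
        if "L2Regularization" in raw:
            params["L2Regularization"] = float(raw["L2Regularization"])
    return params


# Demo with made-up values written the way SageMaker writes them (as strings)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "hyperparameters.json")
    with open(demo_path, "w") as f:
        json.dump({"miniBatchSize": "48", "L2Regularization": "0.0005"}, f)
    demo_params = load_hyperparameters(demo_path)

print(demo_params)
```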


In [None]:
%%writefile tuning.m

rng(10); % For reproducibility

tic

disp('Starting the hyperparameter optimisation example (CNN)')
disp(pwd)

prefixPath = '/opt/ml/'
outputPath = append(prefixPath, 'output/')
modelPath = append(prefixPath, 'model/')
hyperparamPath = append(prefixPath, 'input/config/hyperparameters.json')

if isfile('/opt/ml/input/config/hyperparameters.json')
    type '/opt/ml/input/config/hyperparameters.json'
else
    disp('no hyperparameters file')
end

% Read the hyperparameters for this training job
fid = fopen(hyperparamPath);
raw = fread(fid,inf);
str = char(raw');
fclose(fid);
val = jsondecode(str);

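% SageMaker writes hyperparameter values as JSON strings, so convert them
% to numbers; string() makes the conversion safe for numeric values too.
if isfield(val, 'miniBatchSize')
    miniBatchSize = str2double(string(val.miniBatchSize))
else
    miniBatchSize = 64
end
if isfield(val, 'L2Regularization')
    regularization = str2double(string(val.L2Regularization))
else
    regularization = 1.0000e-04
end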


digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
disp('dataset loaded in memory')

[imdsTrain,imdsValidation] = splitEachLabel(imds,750,'randomize');

layers = [
    imageInputLayer([28 28 1])

    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer

    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MaxEpochs',4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsValidation, ...
    'ValidationFrequency',30, ...
    'Verbose',false, ...
    'MiniBatchSize',miniBatchSize, ...
    'L2Regularization',regularization, ...
    'Plots','training-progress');

disp('Training started')

net = trainNetwork(imdsTrain,layers,options);
save(append(modelPath, 'model.mat'),'net')
disp('Training finished')

YPred = classify(net,imdsValidation);
YValidation = imdsValidation.Labels;
accuracy = 100*(sum(YPred == YValidation)/numel(YValidation));

toc

disp("accuracy: " + string(accuracy))

try
    fileID = fopen(append(modelPath, 'results.txt'),'w');
    disp(fileID)
    if fileID==-1
        disp('cannot open file properly')
    else
        fprintf(fileID,'accuracy - %g\n', accuracy);
        fclose(fileID);
    end
catch
    disp('error saving file to output')
end


Upload the `tuning.m` script to the S3 bucket so that the MATLAB deep learning container can access it.

In [None]:
# You can change sagemaker-demo-bucket to a different S3 bucket as well
bucket = 'sagemaker-demo-bucket'
prefix = 'digit-dataset'

In [None]:
s3 = boto3.resource('s3')
s3.meta.client.upload_file('tuning.m', bucket, f'{prefix}/code/tuning.m')

Create a [SageMaker Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase) with the following arguments:

- Set `subnets` and `security_group_ids` to the same values as the network license manager in section [License Manager for MATLAB](#3.2-License-Manager-for-MATLAB), so that the training container can reach it.
- Set `use_spot_instances` to use [Spot Training in Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html), which can reduce the cost of training models by up to 90% compared with on-demand instances.
- Set `max_run` to the maximum training time in seconds (default: 24 * 60 * 60). After this amount of time, Amazon SageMaker terminates the job regardless of its current status.
- Set `max_wait` to the maximum time in seconds to wait for the spot training job, including the training time itself. After this amount of time, Amazon SageMaker stops waiting for the managed spot training job to complete. `max_wait` must be greater than or equal to `max_run`.
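
For managed spot training, `max_wait` has to be at least as large as `max_run`. A minimal sketch of a check that can be run before creating the Estimator follows; `spot_timeouts` is a hypothetical helper, and the 30 and 50 minute values mirror the ones used for the Estimator in this notebook.

```python
def spot_timeouts(max_run_seconds: int, max_wait_seconds: int) -> dict:
    """Validate spot timeouts and return them as Estimator keyword arguments
    (hypothetical helper)."""
    if max_run_seconds <= 0:
        raise ValueError("max_run must be positive")
    if max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait must be greater than or equal to max_run")
    return {
        "use_spot_instances": True,
        "max_run": max_run_seconds,
        "max_wait": max_wait_seconds,
    }


timeouts = spot_timeouts(max_run_seconds=60 * 30, max_wait_seconds=60 * 50)
print(timeouts)
```

The returned dictionary could be unpacked into the Estimator call with `**timeouts` instead of passing the three arguments by hand.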

## 6. SageMaker Estimator & HyperparameterTuner 

### 6.1 SageMaker Estimator

In [None]:
estimator = sagemaker.estimator.Estimator(
    image_uri=processing_repository_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    volume_size=30,
    subnets=["subnet-abcd123"], 
    security_group_ids=["sg-abcdefg123"],
    output_path="s3://{}/{}/output".format(bucket, prefix),
    use_spot_instances=True,
    max_wait=60*50,
    max_run=60*30,
    hyperparameters={"miniBatchSize": 64, "L2Regularization":0.0001},
)  # Setting constant hyperparameter


In [None]:
# Train the model using the default hyperparameters defined above
estimator.fit({"code": "s3://{}/{}/code".format(bucket, prefix)}, wait=True)

### 6.2 Hyperparameter tuning

In [None]:
# Import libraries for sagemaker tuning 
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)

In [None]:
# Define hyperparameter ranges for your tuning job
hyperparameter_ranges = {
    "miniBatchSize": IntegerParameter(48, 64),
    "L2Regularization": ContinuousParameter(0.0001, 0.001),
}

In [None]:
# Create custom objective metric for your tuning job
objective_metric_name = "accuracy"
metric_definitions = [{"Name": "accuracy", "Regex": "accuracy: ([0-9\\.]+)"}]

Create [Hyperparameter Tuner](https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html) which consumes the Estimator, hyperparameter_ranges, metrics, etc. created above.
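
Before launching the tuner, it can help to check the metric regex locally against the kind of line the MATLAB script prints with `disp("accuracy: " + string(accuracy))`. The sample log lines below are made up for illustration; the regex is the one from `metric_definitions`, and `extract_accuracy` is a hypothetical helper.

```python
import re

# Same pattern as the "Regex" entry in metric_definitions
ACCURACY_REGEX = "accuracy: ([0-9\\.]+)"


def extract_accuracy(log_line: str):
    """Return the accuracy captured by the metric regex, or None if absent."""
    match = re.search(ACCURACY_REGEX, log_line)
    return float(match.group(1)) if match else None


# Made-up log lines: one that carries the metric and one that does not
print(extract_accuracy("accuracy: 98.76"))
print(extract_accuracy("Training started"))
```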

In [None]:
# Create Hyperparameter Tuner
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    objective_type="Maximize",
    max_jobs=4,
    max_parallel_jobs=2,
)

In [None]:
tuner.fit({"code": "s3://{}/{}/code".format(bucket, prefix)}, wait=False)

In [None]:
# Poll the tuning job until it reaches a terminal state
import time

sm_client = boto3.client("sagemaker")
job_name = tuner.latest_tuning_job.job_name

while True:
    # One describe call per iteration returns both the status and the counters
    desc = sm_client.describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=job_name
    )
    status = desc["HyperParameterTuningJobStatus"]
    completed = desc["TrainingJobStatusCounters"]["Completed"]
    prog = desc["TrainingJobStatusCounters"]["InProgress"]

    print(f"{status}, Completed Jobs: {completed}, In Progress Jobs: {prog}")

    # Also stop on failure or a manual stop, so the loop cannot run forever
    if status in ("Completed", "Failed", "Stopped"):
        break

    time.sleep(30)


### 6.3 Get the best training job

In [None]:
# Show the best training job of the latest tuning job
boto3.client("sagemaker").describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner.latest_tuning_job.job_name
)["BestTrainingJob"]


In [None]:
# Store the best training job among all training jobs of the tuning job
best_training = boto3.client("sagemaker").describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner.latest_tuning_job.job_name
)["BestTrainingJob"]

In [None]:
# Get S3 location for the model file
best_model_s3 = boto3.client("sagemaker").describe_training_job(
    TrainingJobName=best_training["TrainingJobName"]
)["ModelArtifacts"]["S3ModelArtifacts"]

In [None]:
# Extract the best model for further analysis
!mkdir -p best-model/
!aws s3 cp $best_model_s3 best_model.tar.gz
!tar -xzvf best_model.tar.gz -C ./best-model

In [None]:
ls best-model/
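
As an alternative to the AWS CLI, the model archive could be fetched with boto3, which takes the bucket and key separately. A minimal sketch of splitting an `s3://` URI into those two parts follows; `split_s3_uri` is a hypothetical helper, the example URI is made up, and the download call is left commented out because it needs AWS credentials and a real object.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split an s3://bucket/key URI into (bucket, key) (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Made-up URI in the shape of a SageMaker model artifact location
bucket_name, key = split_s3_uri(
    "s3://sagemaker-demo-bucket/digit-dataset/output/example-job/output/model.tar.gz"
)
print(bucket_name, key)

# With credentials configured, the archive could then be downloaded like this:
# import boto3
# boto3.client("s3").download_file(bucket_name, key, "best_model.tar.gz")
```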