# MATLAB with Amazon SageMaker Example

1. [Introduction](#1.-Introduction)

2. [MATLAB on Amazon SageMaker](#2.-MATLAB-on-Amazon-SageMaker)

3. [Prerequisites](#3.-Prerequisites)
    1. [Roles, Permissions and Docker Service](#3.1-Roles,-Permissions-and-Docker-Service)
    2. [License Manager for MATLAB](#3.2-License-Manager-for-MATLAB)
    3. [Dockerfile & dependencies](#3.3-Dockerfile-&-dependencies)
    
4. [MATLAB docker image on ECR](#4.-MATLAB-docker-image-on-ECR)
    1. [Create docker image from Dockerfile](#4.1-Create-docker-image-from-Dockerfile)
    2. [Push MATLAB image to ECR](#4.2-Push-MATLAB-image-to-ECR)
    
5. [SageMaker processor](#5.-SageMaker-processor)
    1. [Defining network configuration for Processing jobs](#5.1-Defining-network-configuration-for-Processing-jobs)
    2. [Write the MATLAB script `main.m`](#5.2-Write-the-MATLAB-script)
    3. [Running the docker processing container](#5.3-Running-the-docker-processing-container)
    4. [Getting results back and printing accuracy](#5.4-Getting-results-back-from-docker-processing-container-to-SageMaker-instance)
6. [Clean up](#6.-Cleaning-up-resources)

## 1. Introduction 

[Amazon SageMaker](https://aws.amazon.com/sagemaker/) helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.

## 2. MATLAB on Amazon SageMaker

With Amazon SageMaker, you can package your own algorithms and then train and deploy them in the SageMaker environment. This notebook walks through an example that builds a Docker container for SageMaker with MATLAB and uses it for processing, training, and inference.

This notebook shows how you can:

1. Build a docker container with MATLAB by using the [official docker hub MATLAB image](https://hub.docker.com/r/mathworks/matlab-deep-learning).  
2. Publish the docker container to [Amazon ECR](https://aws.amazon.com/ecr/), from where SageMaker can use it to run processing jobs.
3. Run a processing job on a dataset for preprocessing, training and inference.
4. Get results back from the processing job inside the SageMaker environment.



## 3. Prerequisites

### 3.1 Roles, Permissions and Docker Service

To get started, import the Python libraries you'll need, and set up the environment with a few prerequisites for permissions and configurations.

In [None]:
from sagemaker import get_execution_role
import pandas as pd
import sagemaker
import boto3
import os

role = get_execution_role()
print(role)

Because you will pull the [matlab-deep-learning docker image](https://hub.docker.com/r/mathworks/matlab-deep-learning) (which has a compressed size of 7.89 GB), change the default Docker location to the EBS volume that was mounted when the SageMaker notebook instance was created.

The commands below stop the Docker service, move the default Docker directory `/var/lib/docker` under `/home/ec2-user/SageMaker/docker`, symlink the old path to the new location, and then start the Docker service again.

In [None]:
!mkdir -p /home/ec2-user/SageMaker/docker/
!sudo service docker stop
!sudo mv /var/lib/docker/ /home/ec2-user/SageMaker/docker/
!sudo ln -s /home/ec2-user/SageMaker/docker/ /var/lib/docker
!sudo service docker start
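
Before pulling the image, it can help to confirm that the target volume has enough room. The following is a minimal sketch and not part of the original notebook: the helper name `has_free_space` is hypothetical, and the threshold you pass is your own estimate (the uncompressed image is larger than its 7.89 GB compressed size).

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3
```

For example, `has_free_space("/home/ec2-user/SageMaker", 30)` reports whether the EBS volume has 30 GB available; the 30 GB figure is only an illustrative assumption.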

### 3.2 License Manager for MATLAB

Follow the steps in the GitHub repo to launch the License Manager for MATLAB on AWS: https://github.com/mathworks-ref-arch/license-manager-for-matlab-on-aws.

The Docker container communicates with the License Manager for licensing through the `MLM_LICENSE_FILE` environment variable, so note down the private IP address of the License Manager.
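
The value of `MLM_LICENSE_FILE` follows the `port@host` form, and this notebook later uses port `27000`. As a hedged sketch, the hypothetical helper below builds that value and rejects strings that are not valid private IP addresses; requiring a private address is an assumption of this sketch, based on the instruction to note the private IP.

```python
import ipaddress


def license_server_address(private_ip: str, port: int = 27000) -> str:
    """Build a `port@host` string for MLM_LICENSE_FILE from a private IP address."""
    ip = ipaddress.ip_address(private_ip)  # raises ValueError for malformed input
    if not ip.is_private:
        raise ValueError(f"{private_ip} is not a private IP address")
    return f"{port}@{ip}"
```

Calling `license_server_address("10.0.1.25")` gives `"27000@10.0.1.25"`, which can be passed as the `MLM_LICENSE_FILE` value. The address `10.0.1.25` is a made-up example.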

### 3.3 Dockerfile & dependencies

In [None]:
!mkdir -p matlab-docker

Create a Dockerfile that pulls MATLAB's image from https://hub.docker.com/r/mathworks/matlab-deep-learning and adds a new `CMD` instruction, which specifies the command to execute when the Docker container starts.

With this Dockerfile, the Docker container runs the script `main` located in `/opt/ml/processing/src_files` when it starts. You create this script in section [Write the MATLAB script `main.m`](#5.2-Write-the-MATLAB-script), and it is then uploaded to the Docker container in section [Running the docker processing container](#5.3-Running-the-docker-processing-container).

In [None]:
%%writefile matlab-docker/Dockerfile
FROM mathworks/matlab-deep-learning
USER root
CMD ["matlab", "-batch", "cd /opt/ml/processing/src_files; main; exit"]

In [None]:
!echo ==== Generated Dockerfile ====
!cat matlab-docker/Dockerfile

## 4. MATLAB docker image on ECR  

A Docker image with MATLAB needs to be available for SageMaker to use. 

The following steps:
* Build a MATLAB deep learning container from the Dockerfile in section [Dockerfile & dependencies](#3.3-Dockerfile-&-dependencies).
* Create an ECR repository and [push the container image](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html) to it.

You can skip these steps if you already have a Docker container image with MATLAB installed in an Amazon ECR repository.


In [None]:
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.Session().region_name

ecr_repository = 'sagemaker-demo-ecr'  # ECR repository that holds the MATLAB deep learning container
tag = ':latest'
processing_repository_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)

print("ECR Repository Name: ", ecr_repository)
print("ECR Repository URI:", processing_repository_uri)

### 4.1 Create docker image from Dockerfile

In [None]:
# Build docker image locally in SageMaker environment
!docker build -t $ecr_repository matlab-docker/

In [None]:
!docker tag {ecr_repository + tag} $processing_repository_uri
!docker image ls

### 4.2 Push MATLAB image to ECR

In [None]:
# Creates the ECR Repository
!aws ecr create-repository --repository-name $ecr_repository

# Authorize Docker to publish to ECR
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com

In [None]:
# push MATLAB image to ECR
!docker push $processing_repository_uri
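
`aws ecr create-repository` reports an error when the repository already exists, for example when the notebook is re-run. A possible alternative, sketched here under the assumption that you pass in a boto3 ECR client, creates the repository only when needed. The helper name `ensure_ecr_repository` is hypothetical and not part of the original notebook.

```python
def ensure_ecr_repository(ecr_client, repository_name: str) -> str:
    """Return the repository URI, creating the repository if it does not exist yet.

    `ecr_client` is expected to be a boto3 ECR client, e.g. boto3.client("ecr").
    """
    try:
        response = ecr_client.create_repository(repositoryName=repository_name)
        return response["repository"]["repositoryUri"]
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # The repository is already there: look up its URI instead of failing.
        response = ecr_client.describe_repositories(repositoryNames=[repository_name])
        return response["repositories"][0]["repositoryUri"]
```

In this notebook it could be called as `ensure_ecr_repository(boto3.client("ecr"), ecr_repository)`.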

## 5. SageMaker processor 

### 5.1 Defining network configuration for Processing jobs

Follow the Amazon documentation to give SageMaker Processing jobs access to resources in your Amazon VPC: https://docs.aws.amazon.com/sagemaker/latest/dg/process-vpc.html.

Make sure to use the same subnet that was used to create the network license manager in section [License Manager for MATLAB](#3.2-License-Manager-for-MATLAB).


In [3]:
from sagemaker.network import NetworkConfig

# Network configuration documentation - https://sagemaker.readthedocs.io/en/stable/api/utility/network.html
processing_network_config = NetworkConfig(security_group_ids=["sg-abcdefg1234"], 
                                          subnets=["subnet-abcdefg"])
print(processing_network_config._to_request_dict())


Initialize the `sagemaker.processing.Processor` with the following arguments:

- `image_uri`: the `processing_repository_uri` defined above.
- `role`: the role defined above.
- `instance_count`: the number of instances to spawn in the job.
- `instance_type`: the type of instance to spawn (for more information, see https://aws.amazon.com/sagemaker/pricing/).
- `network_config`: the network configuration for processing jobs.
- `env`: environment variables to pass to the Docker container. This includes the `MLM_LICENSE_FILE` variable, whose value comes from the License Manager for MATLAB.

In [1]:
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput

processor = Processor(
    image_uri=processing_repository_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge", #replace with required instance
    network_config = processing_network_config,
    env = {"MLM_LICENSE_FILE":"27000@111.222.333.444"} #replace with your network license manager private IP address
)


### 5.2 Write the MATLAB script

The training code is written in the file `main.m`. It is inspired by the MathWorks example [Create Simple Deep Learning Network for Classification](https://www.mathworks.com/help/deeplearning/ug/create-simple-deep-learning-network-for-classification.html).

The script:

- loads the digit sample dataset as an [image datastore](https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.imagedatastore.html).
- splits the dataset into training and validation sets.
- defines the convolutional neural network architecture.
- specifies the training options.
- trains the network.
- classifies the validation images and computes the accuracy.


In [None]:
%%writefile main.m
tic
disp("starting the Deep Learning Example")
disp(pwd)

digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');

disp("dataset loaded in memory")

labelCount = countEachLabel(imds)

numTrainFiles = 750;
[imdsTrain,imdsValidation] = splitEachLabel(imds,numTrainFiles,'randomize');

layers = [
    imageInputLayer([28 28 1])

    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer

    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MaxEpochs',4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsValidation, ...
    'ValidationFrequency',30, ...
    'Verbose',false, ...
    'Plots','training-progress');

disp("Training started")

net = trainNetwork(imdsTrain,layers,options);

disp("Training finished")

YPred = classify(net,imdsValidation);
YValidation = imdsValidation.Labels;

accuracy = 100*(sum(YPred == YValidation)/numel(YValidation));
toc
disp("Accuracy - " + string(accuracy))

try
    fileID = fopen('/opt/ml/processing/output_data/results.txt','w');
    disp(fileID)
    if fileID==-1
        disp("cannot open file properly")
    else
        fprintf(fileID,'Accuracy - %g\n', accuracy);
        fclose(fileID);
    end
catch
    disp("error saving file to output")
end

### 5.3 Running the docker processing container

See the "Running MATLAB via Processing Job" section of the README file to understand how to run MATLAB via a processing job inside the new processing container.

In summary:

- The `main.m` script file is uploaded from the SageMaker instance to the `/opt/ml/processing/src_files/` directory of the Docker processing container.

- The `CMD` instruction added in section [Dockerfile & dependencies](#3.3-Dockerfile-&-dependencies) runs the script `main` located in the `/opt/ml/processing/src_files/` directory when the Docker container starts.

- The processor gets the `output_data` directory back from the Docker processing container as a [ProcessingOutput](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingOutput) object.

In [None]:
%%time
processor.run(
    inputs=
    [ProcessingInput(
        source='/home/ec2-user/SageMaker/main.m', #location to your main.m script
        destination='/opt/ml/processing/src_files/'),
    ],
    outputs = [
        ProcessingOutput(
            output_name="results",
            source="/opt/ml/processing/output_data",
        ),
    ]
)

### 5.4 Getting results back from docker processing container to SageMaker instance

You can get the `ProcessingOutputConfig` field from the latest processing job.

The `main.m` script writes the accuracy to a `results.txt` file, which is stored in an S3 location. Extract the S3 path of `results.txt` from the latest processing job, read the contents of the file with the [`pd.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function, and print the accuracy.

In [None]:
# get the output of the latest processing job
preprocessing_job_description = processor.jobs[-1].describe()
output_config = preprocessing_job_description["ProcessingOutputConfig"]
print(output_config)
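
The same job description also reports how the job ended, which is worth checking before reading the output. The sketch below assumes the dictionary returned by `describe()` carries the `ProcessingJobStatus` and, for failed jobs, `FailureReason` fields of the DescribeProcessingJob response. The helper name `job_outcome` is hypothetical.

```python
def job_outcome(description: dict) -> str:
    """Summarize a processing job description as a short status string."""
    status = description["ProcessingJobStatus"]
    if status == "Failed":
        reason = description.get("FailureReason", "no reason reported")
        return f"Failed: {reason}"
    return status
```

In this notebook, `print(job_outcome(preprocessing_job_description))` would show the status of the job that was just run.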

In [None]:
# read the results.txt file from output
s3_output_dir = output_config['Outputs'][0]['S3Output']['S3Uri']
s3_result = s3_output_dir.rstrip("/") + "/results.txt"  # build the S3 URI explicitly; os.path.join is meant for local paths

In [None]:
print(pd.read_csv(s3_result, header=None)[0][0])
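
The value printed above is the full text line that `main.m` wrote, in the form `Accuracy - <number>`. If you need the number itself, a small parser along these lines could extract it; `parse_accuracy` is a hypothetical helper, and it assumes the `Accuracy - ` prefix used by the `fprintf` call in `main.m`.

```python
def parse_accuracy(line: str) -> float:
    """Extract the numeric accuracy from a line such as 'Accuracy - 98.76'."""
    prefix = "Accuracy - "
    if not line.startswith(prefix):
        raise ValueError(f"unexpected results line: {line!r}")
    return float(line[len(prefix):].strip())
```

For example, `parse_accuracy(pd.read_csv(s3_result, header=None)[0][0])` would return the accuracy as a float.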

## 6. Cleaning up resources

To avoid incurring unnecessary charges, use the AWS Management Console to delete the resources that you created while running the example.

- Open the Amazon S3 console at https://console.aws.amazon.com/s3/, and then delete the bucket that you created for storing model artifacts and the training dataset.
- Open the Amazon ECR console at https://console.aws.amazon.com/ecr/, and then delete the repository that you created for storing the MATLAB Docker image.
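
As an alternative to the console, the ECR repository could also be removed programmatically. This is a minimal sketch that assumes you pass in a boto3 ECR client; the helper name `delete_ecr_repository` is hypothetical. Note that `force=True` deletes the repository together with any images it still contains.

```python
def delete_ecr_repository(ecr_client, repository_name: str) -> None:
    """Delete an ECR repository, including any images stored in it.

    `ecr_client` is expected to be a boto3 ECR client, e.g. boto3.client("ecr").
    """
    # force=True removes the repository even if it is not empty.
    ecr_client.delete_repository(repositoryName=repository_name, force=True)
```

In this notebook it could be called as `delete_ecr_repository(boto3.client("ecr"), ecr_repository)`.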