# MATLAB with Amazon SageMaker Training Job

1. [Getting ECR Image from setup notebook](#1.-Getting-ECR-Image-from-setup-notebook)
2. [SageMaker training](#2.-SageMaker-training)
    1. [Write the MATLAB script `train.m`](#2.1-Write-the-MATLAB-script)
    2. [SageMaker training job](#2.2-SageMaker-training-job)
    3. [Getting results back and printing accuracy](#2.3-Getting-results-back-from-training-container-to-SageMaker-instance)

## 1. Getting ECR Image from setup notebook

In [None]:
from sagemaker import get_execution_role
import pandas as pd
import sagemaker
import boto3
import os

role = get_execution_role()
print(role)

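Refer to the `setup.ipynb` notebook for how to push your MATLAB Docker image to Amazon ECR. The cell below assembles the URI of that image from your account ID, region, repository name, and tag.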

In [None]:
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.Session().region_name

# Your ECR repo name
ecr_repository = 'sagemaker-demo-ecr'
tag = ':matlab'
processing_repository_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)

print("ECR Repository Name: ", ecr_repository)
print("ECR Repository URI:", processing_repository_uri)
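
The URI built above follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. A minimal sketch of the same construction as a reusable helper is shown here. The function name `ecr_image_uri` and the sample account ID and region are illustrative and not part of the original notebook, and the pattern assumes a standard commercial AWS region.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI of the form <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    # Accept the tag with or without a leading colon, since the cell above stores ':matlab'.
    tag = tag.lstrip(":")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID and region, for illustration only.
example_uri = ecr_image_uri("123456789012", "us-east-1", "sagemaker-demo-ecr", ":matlab")
print(example_uri)
```

Passing the repository and tag separately avoids having to remember whether the colon belongs to the tag string.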

## 2. SageMaker training  

### 2.1 Write the MATLAB script

The training code is written in the file `train.m`. It is based on the MathWorks example [Create Simple Deep Learning Network for Classification](https://www.mathworks.com/help/deeplearning/ug/create-simple-deep-learning-network-for-classification.html).

Overview of the script:

- Load the digit sample dataset as an [image datastore](https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.imagedatastore.html).
- Split the dataset into training and validation sets (750 images per label go to training).
- Define the convolutional neural network architecture.
- Specify the training options.
- Train the network and save it to `/opt/ml/model/`.
- Classify the validation images, compute the accuracy, and write it to `results.txt`.


In [None]:
%%writefile train.m

rng(10); % For reproducibility

tic
disp('starting the Deep Learning Example')

digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');

disp('dataset loaded in memory')
prefixPath = '/opt/ml/';
modelPath = append(prefixPath, 'model/'); % SageMaker uploads this directory to S3 as model.tar.gz

[imdsTrain,imdsValidation] = splitEachLabel(imds,750,'randomize');

layers = [
    imageInputLayer([28 28 1])

    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer

    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];


options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MaxEpochs',4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsValidation, ...
    'ValidationFrequency',30, ...
    'Verbose',false, ...
    'Plots','training-progress');


disp('Training started')

net = trainNetwork(imdsTrain,layers,options);

save(append(modelPath, 'model.mat'),'net')
disp('Training finished')

YPred = classify(net,imdsValidation);
YValidation = imdsValidation.Labels;

accuracy = 100*(sum(YPred == YValidation)/numel(YValidation));
toc
disp('Accuracy - ' + string(accuracy))

try
    fileID = fopen(append(modelPath, 'results.txt'),'w');
    disp(fileID)
    if fileID==-1
        disp('cannot open file properly')
    else
        fprintf(fileID,'Accuracy - %g\n', accuracy);
        fclose(fileID);
    end
catch
    disp('error saving file to output')
end
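
To make the architecture in `train.m` easier to follow, the sketch below traces the activation size through the convolution and pooling layers. It assumes what the layer definitions state: 3x3 convolutions with `'same'` padding keep height and width, and 2x2 max pooling with stride 2 halves them. The helper `trace_shapes` is illustrative and not part of the original notebook.

```python
def trace_shapes(height=28, width=28, channels=1):
    """Follow the activation size through the conv/pool stack defined in train.m."""
    # (kind, value): for 'conv' the value is the number of filters, for 'pool' the stride.
    stack = [("conv", 8), ("pool", 2), ("conv", 16), ("pool", 2), ("conv", 32)]
    shapes = [(height, width, channels)]
    for kind, value in stack:
        if kind == "conv":
            # 'same' padding keeps height and width; only the channel count changes.
            channels = value
        else:
            # 2x2 max pooling with stride 2 halves height and width.
            height, width = height // value, width // value
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes()
h, w, c = shapes[-1]
fc_inputs = h * w * c
# Weights plus biases of fullyConnectedLayer(10).
fc_params = fc_inputs * 10 + 10
print(shapes[-1], fc_inputs, fc_params)
```

Batch normalization and ReLU layers are omitted because they do not change the activation size.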

Upload the `train.m` file to your S3 bucket so that the MATLAB training container can access it. Replace the bucket name below with your own.

In [None]:
s3 = boto3.resource('s3')
s3.meta.client.upload_file('train.m', 'mthakker-example-dataset', 'digit-dataset/code/train.m')

### 2.2 SageMaker training job

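Create the training job configuration. For details on each field, see the [CreateTrainingJob API reference](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html). The bucket name, the `VpcConfig` IDs, and the `MLM_LICENSE_FILE` address below are specific to the original environment; replace them with your own values.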

In [None]:
import time
matlab_job = 'TRAIN-matlab-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime()) # Training job name

print("Training job", matlab_job)

bucket = 'mthakker-example-dataset' # Name of the bucket where code is stored
prefix = 'digit-dataset'

matlab_training_params = {
    "RoleArn": role,
    "TrainingJobName": matlab_job,
    "EnableManagedSpotTraining": True,
    "AlgorithmSpecification": {
        "TrainingImage": processing_repository_uri,
        "TrainingInputMode": "File"
    },
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.m4.xlarge",
        "VolumeSizeInGB": 25
    },
    "OutputDataConfig": {
        "S3OutputPath": "s3://{}/{}/output".format(bucket, prefix)
    },
    "InputDataConfig": [
        {
            "ChannelName": "code",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://{}/{}/code".format(bucket, prefix),
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "CompressionType": "None",
            "RecordWrapperType": "None"
        }
    ],
    "VpcConfig" : {
        "SecurityGroupIds": [
            "sg-0a309c5a52ca2d061"
        ],
        "Subnets": [
            "subnet-81733da0"
        ]
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 60 * 20,
        "MaxWaitTimeInSeconds": 60 * 30,
    },
    "Environment": {
        "MLM_LICENSE_FILE": "27000@172.31.90.211",
        "MATLAB_USE_USERWORK": "1",
        "MATLAB_USERWORKDIR": "/opt/ml/input/data/code"
    }
}
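
In `File` input mode, SageMaker copies each input channel into the container under `/opt/ml/input/data/<ChannelName>`, which is why `MATLAB_USERWORKDIR` above points at `/opt/ml/input/data/code`. The sketch below checks that the two stay in sync if the channel is renamed. The helper names and the trimmed `sample_params` dictionary are illustrative stand-ins, not part of the original notebook.

```python
import posixpath


def channel_dir(channel_name: str) -> str:
    """Directory where SageMaker places a File-mode input channel inside the container."""
    return posixpath.join("/opt/ml/input/data", channel_name)


def workdir_matches_channel(params: dict, channel_name: str = "code") -> bool:
    """Check that MATLAB_USERWORKDIR points at the directory of the given channel."""
    channels = {c["ChannelName"] for c in params.get("InputDataConfig", [])}
    workdir = params.get("Environment", {}).get("MATLAB_USERWORKDIR")
    return channel_name in channels and workdir == channel_dir(channel_name)


# Trimmed-down stand-in for matlab_training_params, for illustration only.
sample_params = {
    "InputDataConfig": [{"ChannelName": "code"}],
    "Environment": {"MATLAB_USERWORKDIR": "/opt/ml/input/data/code"},
}
print(workdir_matches_channel(sample_params))
```

In the notebook itself, the same check could be run as `workdir_matches_channel(matlab_training_params)` before submitting the job.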

Run the training job with the configuration created above, wait for it to finish, and raise an error if it failed.

In [None]:
%%time
sm = boto3.client('sagemaker')
sm.create_training_job(**matlab_training_params)

status = sm.describe_training_job(TrainingJobName=matlab_job)['TrainingJobStatus']
print(status)
sm.get_waiter('training_job_completed_or_stopped').wait(TrainingJobName=matlab_job)
status = sm.describe_training_job(TrainingJobName=matlab_job)['TrainingJobStatus']
print("Training job ended with status: " + status)
if status == 'Failed':
    message = sm.describe_training_job(TrainingJobName=matlab_job)['FailureReason']
    print('Training failed with the following error: {}'.format(message))
    raise Exception('Training job failed')
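
The waiter used above blocks silently until the job ends. If you would rather see progress while waiting, a manual polling loop is one alternative. The sketch below is hedged: `wait_for_training_job` and `FakeClient` are hypothetical names introduced here, and `FakeClient` only stands in for `boto3.client('sagemaker')` so the example runs without AWS access.

```python
import time

# Statuses after which a training job will not change any more.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(client, job_name, poll_seconds=30, max_polls=120):
    """Poll describe_training_job until the job reaches a terminal status."""
    for _ in range(max_polls):
        description = client.describe_training_job(TrainingJobName=job_name)
        status = description["TrainingJobStatus"]
        # SecondaryStatus, when present, gives finer detail about the current phase.
        print(status, description.get("SecondaryStatus", ""))
        if status in TERMINAL_STATUSES:
            return description
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")


class FakeClient:
    """Stand-in for boto3.client('sagemaker') that replays a fixed list of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_training_job(self, TrainingJobName):
        return {"TrainingJobStatus": next(self._statuses)}


final = wait_for_training_job(
    FakeClient(["InProgress", "InProgress", "Completed"]), "demo-job", poll_seconds=0
)
print(final["TrainingJobStatus"])
```

With a real client, the call would be `wait_for_training_job(sm, matlab_job)`, and the returned description carries `FailureReason` when the status is `Failed`.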

### 2.3 Getting results back from training container to SageMaker instance

We look up the S3 path of the model artifacts (`model.tar.gz`) from the training job, download and unpack the archive, then read `results.txt` with the [`pd.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function and print the accuracy.

In [None]:
s3_output_dir = sm.describe_training_job(TrainingJobName=matlab_job)['ModelArtifacts']['S3ModelArtifacts']
!aws s3 cp $s3_output_dir output.tar.gz
!tar -xf output.tar.gz
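
If the `tar` command is not available, the archive can also be read with Python's standard `tarfile` module. The sketch below is a minimal illustration: `read_member_text` is a hypothetical helper, and the small archive it builds (including the accuracy value in it) is placeholder data so the example runs without a finished training job.

```python
import io
import os
import tarfile
import tempfile


def read_member_text(archive_path, suffix="results.txt"):
    """Return the text of the first archive member whose name ends with `suffix`."""
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                with archive.extractfile(member) as handle:
                    return handle.read().decode("utf-8")
    raise FileNotFoundError(f"no member ending in {suffix!r} in {archive_path}")


# Build a small stand-in archive; the value inside is a placeholder, not a real result.
payload = b"Accuracy - 98.76\n"
demo_archive = os.path.join(tempfile.mkdtemp(), "output.tar.gz")
with tarfile.open(demo_archive, "w:gz") as archive:
    info = tarfile.TarInfo(name="results.txt")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

print(read_member_text(demo_archive))
```

For the real job output, the call would be `read_member_text('output.tar.gz')` after the `aws s3 cp` step.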

In [None]:
df = pd.read_csv('results.txt', header=None)

In [None]:
print(df[0][0])
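
The value printed above is the whole line written by `train.m`, in the form `Accuracy - <value>`. To use the accuracy as a number, the line can be parsed. This is a minimal sketch: `parse_accuracy` is a hypothetical helper, and the sample strings are placeholders rather than results from an actual run.

```python
import re


def parse_accuracy(line: str) -> float:
    """Extract the number from a line written as 'Accuracy - <value>' by train.m."""
    # The pattern accepts integers, decimals, and exponent notation as produced by %g.
    match = re.search(r"Accuracy\s*-\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)", line)
    if match is None:
        raise ValueError(f"unexpected results line: {line!r}")
    return float(match.group(1))


# Placeholder input, not an actual training result.
value = parse_accuracy("Accuracy - 98.76")
print(value)
```

In the notebook, the call would be `parse_accuracy(df[0][0])`.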