<div style="text-align: right"> &uarr;   Ensure Kernel is set to  &uarr;  </div><br><div style="text-align: right"> 
conda_python3  </div>

# PyTorch Estimator: Bring Your Own Script

In this notebook we train and deploy a PyTorch model that classifies junction images as priority, signal, or roundabout, the classes seen in the data-prep notebook.

The outline of this notebook is:

1. Prepare a training script (provided).

2. Use the AWS-provided PyTorch container and pass our script to it.

3. Run training.

4. Deploy the model to an endpoint.

5. Test the endpoint with an image in a couple of possible ways.

Upgrade the SageMaker Python SDK so we can access the latest containers.

In [2]:
!pip install -U sagemaker



Next, we import the libraries and set up the initial variables used in this lab.

In [3]:
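import os
import boto3
import numpy as np
import sagemaker
from sagemaker.pytorch import PyTorch

# Set to False when running outside a SageMaker notebook instance
ON_SAGEMAKER_NOTEBOOK = True

sagemaker_session = sagemaker.Session()
if ON_SAGEMAKER_NOTEBOOK:
    role = sagemaker.get_execution_role()
else:
    role = "[YOUR ROLE]"  # ARN of an IAM role with SageMaker permissions

# Runtime client used later to invoke the deployed endpoint
client = boto3.client('sagemaker-runtime')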



In the cell below, replace the value of `bucket` (**your-unique-bucket-name**) with the name of the bucket you created in the data-prep notebook.

In [4]:
bucket = "mherrfurth-bucket"
# key = "data-folder"   (in case you structure your data as your-bucket/data-folder) 
training_data_uri="s3://{}".format(bucket)
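
The commented-out `key` line above hints at data stored under a prefix rather than at the bucket root. A minimal sketch of building the URI for both cases, assuming a hypothetical helper named `build_s3_uri` (it is not part of the SageMaker SDK) and placeholder bucket and prefix names:

```python
def build_s3_uri(bucket, key=None):
    """Return s3://bucket or s3://bucket/key, tolerating stray slashes."""
    uri = "s3://{}".format(bucket.strip("/"))
    if key:
        uri = "{}/{}".format(uri, key.strip("/"))
    return uri


# Placeholder names; substitute your own bucket and prefix.
print(build_s3_uri("your-unique-bucket-name"))
print(build_s3_uri("your-unique-bucket-name", "data-folder"))
```

The result can be assigned to `training_data_uri` in place of the plain bucket URI.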

### PyTorch Estimator

We use the open-source containers provided by AWS. These containers can be extended either by starting from the AWS-provided image and adding further installs in a Dockerfile, or by placing a `requirements.txt` in `source_dir` to install additional libraries.
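
As an illustration of the `requirements.txt` route, here is a minimal sketch that lays out a source directory on disk. The helper name `make_source_dir`, the `src` folder name, and the `pillow` package are assumptions chosen for the example; the script written here is only a placeholder, not the provided training script.

```python
from pathlib import Path
import tempfile


def make_source_dir(root, script_name="ptModelCode.py", packages=("pillow",)):
    """Create root/src holding a placeholder script and a requirements.txt."""
    src = Path(root) / "src"
    src.mkdir(parents=True, exist_ok=True)
    script = src / script_name
    if not script.exists():
        script.write_text("# training code goes here\n")
    # One package per line, as pip expects
    (src / "requirements.txt").write_text("\n".join(packages) + "\n")
    return src


src = make_source_dir(tempfile.mkdtemp())
print(sorted(p.name for p in src.iterdir()))

# The directory would then be handed to the estimator, for example:
# PyTorch(entry_point="ptModelCode.py", source_dir=str(src), ...)
```

With this layout the container installs the listed packages before it invokes the entry point.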

The code below creates the PyTorch estimator.


In [5]:
estimator = PyTorch(entry_point='ptModelCode.py',
                    role=role,
                    framework_version='1.8',
                    instance_count=1,
                    instance_type='ml.m5.12xlarge',
                    py_version='py3',
                    # available hyperparameters: emsize, nhid, nlayers, lr, clip, epochs, batch_size,
                    #                            bptt, dropout, tied, seed, log_interval
                    )
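
The provided `ptModelCode.py` is not shown in this notebook. As a rough sketch of how an entry-point script typically receives its settings: values passed through the estimator's `hyperparameters` argument arrive as command-line flags, and SageMaker exposes paths through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING`. The hyperparameter names and defaults below are assumptions for illustration, not necessarily those of the provided script.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths."""
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameters; the provided script may use others.
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=0.001)
    # Inside the training container SageMaker sets these variables;
    # the fallbacks let the script run locally as well.
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAINING", "./data"))
    return parser.parse_args(argv)


# Simulate the flags SageMaker would pass for
# hyperparameters={"epochs": 10, "lr": 0.01}
args = parse_args(["--epochs", "10", "--lr", "0.01"])
print(args.epochs, args.batch_size, args.lr)
```

The `training` channel name matches what `fit` uses when it is given a single S3 URI.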

Now we call the estimator's `fit` method with the S3 URI of the training data to start training. <br>
**Note:** This cell takes approximately **20 mins** to run.

In [6]:
%%time
estimator.fit(training_data_uri)

2022-05-23 17:28:35 Starting - Starting the training job...
2022-05-23 17:29:03 Starting - Preparing the instances for trainingProfilerReport-1653326915: InProgress
.........
2022-05-23 17:30:31 Downloading - Downloading input data......
2022-05-23 17:31:34 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-05-23 17:31:38,545 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2022-05-23 17:31:38,547 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-05-23 17:31:38,555 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-05-23 17:31:38,563 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-05-23 17:31:39,333 sagemaker-training-toolkit INFO     No GPUs

## **NOTE:** <br>
If your kernel disconnects from the server at this point (the kernel indicator in the top right-hand corner will say **No Kernel**),<br>you can reattach to the training job, so you don't need to start it again.<br>Follow the steps below:
1. Scroll to the top of the notebook and set the kernel to the recommended kernel specified in the top right-hand corner of the notebook.
2. Go to your SageMaker console, open Training Jobs, and copy the name of the training job you were disconnected from.
3. Scroll to the bottom of this notebook and paste your training job name in place of **your-training-job-name** in the cell.
4. Replace **your-unique-bucket-name** with the name of the bucket you created in the data-prep notebook.
5. Run the edited cell.
6. Return to this cell and continue executing the rest of this notebook.

The estimator's `model_data` attribute gives the S3 location of the trained model artifacts.

In [7]:
estimator.model_data

's3://sagemaker-us-east-1-779416346969/pytorch-training-2022-05-23-17-28-34-228/output/model.tar.gz'
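
If you want to inspect the archive yourself, the URI has to be split into a bucket and a key first. A minimal sketch using only the standard library; `split_s3_uri` is a hypothetical helper name, and the download call in the trailing comment assumes AWS credentials are available.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split s3://bucket/key into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


model_bucket, model_key = split_s3_uri(
    "s3://sagemaker-us-east-1-779416346969/"
    "pytorch-training-2022-05-23-17-28-34-228/output/model.tar.gz"
)
print(model_bucket)
print(model_key)

# With credentials in place, the archive could then be fetched with:
# boto3.client("s3").download_file(model_bucket, model_key, "model.tar.gz")
```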

#### Deploying a model
Once trained, deploying a model is a simple call.

**Note:** Replace **'your_model_uri'** (the `model_data` value in the cell below) with the URI from the cell above.

In [8]:
from sagemaker.pytorch import PyTorchModel
pytorch_model = PyTorchModel(model_data='s3://sagemaker-us-east-1-779416346969/pytorch-training-2022-05-23-17-28-34-228/output/model.tar.gz', 
                             role=role, 
                             entry_point='ptInfCode.py', 
                             framework_version='1.7',
                             py_version='py3')
predictor = pytorch_model.deploy(instance_type='ml.m5.4xlarge', initial_instance_count=1)

------!
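
The inference script `ptInfCode.py` is not shown here. SageMaker's PyTorch serving container looks for handler functions named `model_fn`, `input_fn`, `predict_fn`, and `output_fn` in the entry point. The sketch below illustrates only the request and response ends of that chain with the standard library; the label order, the response layout, and the `softmax` helper are assumptions, not the contents of the provided script.

```python
import json
import math

# Assumed label order; it must match the class indices used in training.
CLASS_NAMES = ["Priority", "Roundabout", "Signal"]


def softmax(scores):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def input_fn(request_body, content_type):
    """Accept only raw image bytes sent as 'application/x-image'."""
    if content_type != "application/x-image":
        raise ValueError("unsupported content type: {}".format(content_type))
    return request_body  # a real script would decode this into a tensor


def output_fn(prediction, accept="application/json"):
    """Serialize raw class scores as JSON with a label and probabilities."""
    probs = softmax(list(prediction))
    best = max(range(len(probs)), key=probs.__getitem__)
    return json.dumps({
        "label": CLASS_NAMES[best],
        "probabilities": dict(zip(CLASS_NAMES, probs)),
    })


# Made-up scores, purely to exercise the function
print(output_fn([0.2, 0.1, 2.5]))
```

A full script would also define `model_fn` to load the weights from the model directory and, if needed, `predict_fn` to run the forward pass.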

Now let's get the endpoint name from the predictor.

In [9]:
print(predictor.endpoint_name)

pytorch-inference-2022-05-23-17-44-03-444


Now that our endpoint is up and running, let's test it with an image and see how well it does.
In the cell below, replace **your_endpoint_name** with the endpoint name you printed out.

In [10]:
%%time

im_name = "../data/test/Signal/S2.png"

# Read the test image as raw bytes and close the file promptly
with open(im_name, 'rb') as f:
    payload = f.read()

response = client.invoke_endpoint(
    EndpointName='pytorch-inference-2022-05-23-17-44-03-444',
    ContentType='application/x-image',
    Body=payload)

CPU times: user 15.4 ms, sys: 0 ns, total: 15.4 ms
Wall time: 210 ms


Now let us view the JSON response

In [None]:
import json
json.loads(response['Body'].read().decode("utf-8"))
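
The outline mentions testing in a couple of possible ways. Besides the low-level `invoke_endpoint` call, the `predictor` object returned by `deploy` can send the request itself. The sketch below assumes the endpoint accepts `application/x-image` and answers with JSON, as in the cells above; `predict_image` is a hypothetical helper name.

```python
def predict_image(predictor, image_path):
    """Send raw image bytes through a SageMaker Predictor and parse the JSON reply."""
    # Imported lazily so the helper can be defined without the SDK installed.
    from sagemaker.serializers import IdentitySerializer
    from sagemaker.deserializers import JSONDeserializer

    # Pass the bytes through untouched and label them as an image
    predictor.serializer = IdentitySerializer(content_type="application/x-image")
    predictor.deserializer = JSONDeserializer()
    with open(image_path, "rb") as f:
        return predictor.predict(f.read())


# Usage (requires the live endpoint created above):
# predict_image(predictor, "../data/test/Signal/S2.png")
```

The serializer is overridden because the PyTorch predictor's default expects NumPy arrays rather than raw image bytes.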

### Clean up

When we're done with the endpoint, we can just delete it and the backing instances will be released.  Run the following cell to delete the endpoint.

In [None]:
predictor.delete_endpoint()

### Attach to a training job that has been left to run 

If your kernel becomes disconnected after training has already started, you can reattach to the training job.<br>
In the cell below, replace **your-unique-bucket-name** with the name of the bucket you created in the data-prep notebook.<br>
Then look up the training job name, replace **your-training-job-name** with it, and run the cell. <br>
Once the training job has finished, you can continue with the cells after the training cell.

In [None]:
import sagemaker
import boto3
from sagemaker.pytorch import PyTorch

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
client = boto3.client('sagemaker-runtime')

bucket = "your-unique-bucket-name"

training_job_name = 'your-training-job-name'

if 'your-training' not in training_job_name:
    estimator = sagemaker.estimator.Estimator.attach(training_job_name=training_job_name, sagemaker_session=sess)
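
To check whether the job has finished without opening the console, the job status can be read through the SageMaker API. A minimal sketch, assuming a hypothetical helper `training_job_status`; note that it needs a `sagemaker` client, not the `sagemaker-runtime` client created above.

```python
# States after which the job will not change any further
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def training_job_status(sm_client, job_name):
    """Return (status, finished) using a boto3 'sagemaker' client."""
    description = sm_client.describe_training_job(TrainingJobName=job_name)
    status = description["TrainingJobStatus"]
    return status, status in TERMINAL_STATES


# Usage (requires AWS credentials):
# import boto3
# print(training_job_status(boto3.client("sagemaker"), training_job_name))
```

Taking the client as a parameter keeps the helper easy to exercise with a stub when no AWS account is at hand.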