<div style="text-align: right"> &uarr;   Ensure Kernel is set to  &uarr;  </div><br><div style="text-align: right"> 
conda_amazonei_pytorch_latest_p36  </div>

In [None]:
!pip install -U sagemaker
# Restart your kernel after the install completes

# FastAI Training using SageMaker Bring Your Own Container (BYOC)

In this notebook, we cover how to bring our own container, packaged with a framework or algorithm of our choice, to train a model on SageMaker.

We use fastai in this case and build a container image with the custom training code integrated into it. The alternative is script mode, which can be enabled by changing the container's entrypoint.

This notebook covers the following steps:

1. **Build a Docker image** containing FastAI and the provided serving and training code.

2. Log in to ECR, tag the image, and **push the Docker image to ECR**.

3. Use the FastAI container image in SageMaker to **train a model**.

4. **Deploy the model** to an endpoint using the container image.

5. **Test inference** on an image in a couple of different ways.

#### Container Image
Let's start by building the container image locally and then pushing it to ECR (Elastic Container Registry).

In [1]:
%cd ~/SageMaker/sageMakerWorkshop/byoc/docker

/home/ec2-user/SageMaker/sageMakerWorkshop/byoc/docker


In [2]:
!docker build -t fastai .

Sending build context to Docker daemon   12.8kB
Step 1/8 : FROM fastdotai/fastai:2021-02-11
 ---> c15a6ed2e7f0
Step 2/8 : LABEL maintainer="Raj Kadiyala"
 ---> Running in 78929be9a70c
Removing intermediate container 78929be9a70c
 ---> c8519e4295ec
Step 3/8 : WORKDIR /
 ---> Running in ea5d9472f850
Removing intermediate container ea5d9472f850
 ---> 253118e580b5
Step 4/8 : RUN pip3 install --no-cache --upgrade requests
 ---> Running in bee595f2f4a7
Collecting requests
  Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
Collecting charset-normalizer~=2.0.0
  Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Installing collected packages: charset-normalizer, requests
  Attempting uninstall: requests
    Found existing installation: requests 2.24.0
    Uninstalling requests-2.24.0:
      Successfully uninstalled requests-2.24.0
Successfully installed charset-normalizer-2.0.12 requests-2.27.1
Removing intermediate container bee595f2f4a7
 ---> 08eb32013bc6
Step 5/8 : ENV P

In [3]:
!docker images

REPOSITORY         TAG          IMAGE ID       CREATED         SIZE
fastai             latest       15bb4e9aaa2a   4 seconds ago   7.53GB
fastdotai/fastai   2021-02-11   c15a6ed2e7f0   15 months ago   7.43GB


## Set the ECR details and tags
Let's set a few parameters here, such as the ECR namespace and the tag name.

In [4]:
from sagemaker import get_execution_role
import boto3
ecr_namespace = "sagemaker-training-containers/"
prefix = "script-mode-container-fastai"

ecr_repository_name = ecr_namespace + prefix
role = get_execution_role()
account_id = role.split(":")[4]
region = boto3.Session().region_name
tag_name=account_id+'.dkr.ecr.'+region+'.amazonaws.com/'+ecr_repository_name+':latest'

In [5]:
tag_name

'601877770506.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai:latest'
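
The string assembly in the cell above can be wrapped in a small helper, which also makes the ARN parsing explicit. This is only a minimal sketch: the function name `ecr_image_uri` and the sample role ARN are hypothetical and not part of the workshop code, and it assumes the registry hostname format shown in the output above.

```python
def ecr_image_uri(role_arn, region, repository, tag="latest"):
    """Build an ECR image URI from an IAM role ARN (hypothetical helper)."""
    # An IAM role ARN looks like arn:aws:iam::<account-id>:role/<name>,
    # so the account ID is the fifth colon-separated field.
    account_id = role_arn.split(":")[4]
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Illustrative values only; the account ID and role name are placeholders.
example_uri = ecr_image_uri(
    "arn:aws:iam::123456789012:role/service-role/ExampleRole",
    "us-east-1",
    "sagemaker-training-containers/script-mode-container-fastai",
)
print(example_uri)
```

In the notebook, the first two arguments would come from `get_execution_role()` and `boto3.Session().region_name`.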

Now we tag our image with the tag name we generated above

In [6]:
!docker tag fastai $tag_name

### ECR Repository and push steps

All of these steps can be scripted, but they are laid out individually here so that each one is visible and easy to follow.

First we obtain login credentials for ECR. This allows us to perform ECR operations such as pushing an image.

In [7]:
!$(aws ecr get-login --no-include-email)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
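
Note that `aws ecr get-login` exists only in AWS CLI v1; AWS CLI v2 replaced it with `aws ecr get-login-password`, whose output is piped to `docker login`. The same credentials are also available from the ECR `GetAuthorizationToken` API. The sketch below shows how one entry of that response could be unpacked; `docker_credentials` is a hypothetical helper, and the sample entry is made up for illustration rather than taken from a real account.

```python
import base64


def docker_credentials(authorization_data):
    """Split one ECR authorizationData entry into docker login arguments."""
    # The token is base64("<username>:<password>"); ECR uses the username "AWS".
    token = authorization_data["authorizationToken"]
    decoded = base64.b64decode(token).decode("utf-8")
    username, password = decoded.split(":", 1)
    return username, password, authorization_data["proxyEndpoint"]


# Stand-in for boto3.client("ecr").get_authorization_token()["authorizationData"][0]
sample_entry = {
    "authorizationToken": base64.b64encode(b"AWS:not-a-real-password").decode("ascii"),
    "proxyEndpoint": "https://123456789012.dkr.ecr.us-east-1.amazonaws.com",
}
username, password, registry = docker_credentials(sample_entry)
print(username, registry)
```

The three values correspond to the `--username` option of `docker login`, the password (best supplied through `--password-stdin`), and the registry argument.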


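Here we create an ECR repository. If the repository already exists from an earlier run, the command reports a `RepositoryAlreadyExistsException`, which is safe to ignore.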

In [8]:
!aws ecr create-repository --repository-name $ecr_repository_name


An error occurred (RepositoryAlreadyExistsException) when calling the CreateRepository operation: The repository with name 'sagemaker-training-containers/script-mode-container-fastai' already exists in the registry with id '601877770506'
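
If you would rather not see this error on repeated runs, the repository can be created idempotently with boto3. The following is a sketch under assumptions: `ensure_ecr_repository` is a hypothetical helper, and it expects an ECR client such as the one returned by `boto3.client("ecr")`.

```python
def ensure_ecr_repository(ecr_client, repository_name):
    """Return the repository URI, creating the repository only if it is missing."""
    try:
        response = ecr_client.create_repository(repositoryName=repository_name)
        return response["repository"]["repositoryUri"]
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # Created by an earlier run: look it up instead of failing.
        response = ecr_client.describe_repositories(
            repositoryNames=[repository_name]
        )
        return response["repositories"][0]["repositoryUri"]


# In the notebook this might be called as follows (requires AWS credentials):
#   import boto3
#   ensure_ecr_repository(boto3.client("ecr"), ecr_repository_name)
```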


Now that our ECR repository exists, we can push our Docker image to it under the tag name we assigned.

In [9]:
!docker push $tag_name

The push refers to repository [601877770506.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai]

8d942104: Preparing
bb450374: Preparing
ee6e32ab: Preparing
03119f42: Preparing
9d9efed5: Preparing
5dee3f41: Preparing
e46047de: Preparing
ea1e71e9: Preparing
bf18a086: Preparing
fc49132e: Preparing
5e116b6d: Preparing
5da50cc0: Preparing
722bdc07: Preparing
b673a1d6: Preparing
150d2459: Preparing
6268583e: Preparing
cc6eae8b: Preparing
8881187d: Preparing
5df75b44: Preparing
b450374: Pushed   97.79MB/95.62MB

The URI of the uploaded Docker image in ECR is built the same way as `tag_name` above:

In [10]:
container_image_uri = "{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest".format(
    account_id, region, ecr_repository_name
)
print(container_image_uri)

601877770506.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-container-fastai:latest


#### Call your custom container to train the model

In the cell below, replace the value of `bucket` with the name of the bucket you created in the data-prep notebook.<br>
**Note:** This cell takes around **20 mins** to run.

In [11]:
%%time
import sagemaker
import json

bucket = "amagroup-workshop-2022"


# JSON encode hyperparameters
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}


hyperparameters = json_encode_hyperparameters({"lr":1e-03})

est = sagemaker.estimator.Estimator(
    container_image_uri,
    role,
    instance_count=1,
    instance_type='ml.m5.12xlarge',
    base_job_name=prefix,
    hyperparameters=hyperparameters,
)

# Point the "train" channel at the training data prefix in S3
train_config = sagemaker.inputs.TrainingInput(f's3://{bucket}/train')

est.fit({"train": train_config})

2022-05-25 00:21:27 Starting - Starting the training job...
2022-05-25 00:21:45 Starting - Preparing the instances for trainingProfilerReport-1653438087: InProgress
......
2022-05-25 00:22:53 Downloading - Downloading input data......
2022-05-25 00:23:39 Training - Downloading the training image..............
2022-05-25 00:26:12,070 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-05-25 00:26:12,102 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-05-25 00:26:12,112 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-05-25 00:26:12,121 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train"
    },
    "current_host": "algo-1",
    "framework_module": null,
    "hosts": [
        "algo-1"
    ],
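
The training cell above JSON-encodes each hyperparameter because SageMaker passes hyperparameters to the container as strings. The sketch below shows the round trip under that assumption. `decode_hyperparameters` is a hypothetical helper, not the decoding code in the provided container, and only `lr` comes from this notebook; `epochs` and `arch` are illustrative.

```python
import json


def json_encode_hyperparameters(hyperparameters):
    """Same encoding as the training cell: every value becomes a JSON string."""
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}


def decode_hyperparameters(encoded):
    """Hypothetical inverse: parse each JSON string back into a Python value."""
    return {k: json.loads(v) for (k, v) in encoded.items()}


# "lr" matches the notebook; "epochs" and "arch" are made-up examples.
encoded = json_encode_hyperparameters({"lr": 1e-03, "epochs": 5, "arch": "resnet34"})
decoded = decode_hyperparameters(encoded)
print(encoded)
print(decoded)
```

Encoding with `json.dumps` rather than `str` keeps the original types recoverable: a string value stays quoted, so it can be told apart from a number after decoding.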
 

Finally, let us print the location of the trained FastAI model. You will need this information for the inference step.

In [12]:
# model_data holds the S3 URI of the model artifact once training has finished
print(f'FastAI Model located at \n{est.model_data}')

FastAI Model located at 
s3://sagemaker-us-east-1-601877770506/script-mode-container-fastai-2022-05-25-00-21-26-874/output/model.tar.gz
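
For the inference step it can be handy to split this location into a bucket and an object key. The snippet below is a small sketch of one way to do that; `split_s3_uri` is a hypothetical helper, and the URI is the one printed above.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri}")
    # netloc is the bucket; the path carries a leading slash we do not want.
    return parsed.netloc, parsed.path.lstrip("/")


model_bucket, model_key = split_s3_uri(
    "s3://sagemaker-us-east-1-601877770506/"
    "script-mode-container-fastai-2022-05-25-00-21-26-874/output/model.tar.gz"
)
print(model_bucket)
print(model_key)
```

The two values could then be passed to `boto3.client("s3").download_file(model_bucket, model_key, "model.tar.gz")` to fetch the artifact locally.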


### Attach to a training job that has been left to run 

If your kernel becomes disconnected after training has started, you can reattach to the training job.<br>
Look up the training job name, substitute it for **your-training-job-name**, and run the cell below.<br>
Once the training job has finished, you can continue with the cells after the training cell.

In [None]:
import sagemaker
import boto3

sess = sagemaker.Session()
role = sagemaker.get_execution_role()

training_job_name = 'your-training-job-name'

if training_job_name != 'your-training-job-name':
    est = sagemaker.estimator.Estimator.attach(training_job_name=training_job_name, sagemaker_session=sess)
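
If you do not have the job name to hand, it can be looked up through the SageMaker API instead of the console. This is a sketch only: `latest_training_job_name` is a hypothetical helper, and it assumes a client such as `boto3.client("sagemaker")` and that the job name contains the `base_job_name` used at training time.

```python
def latest_training_job_name(sm_client, name_contains):
    """Return the newest training job whose name contains the text, or None."""
    response = sm_client.list_training_jobs(
        NameContains=name_contains,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["TrainingJobSummaries"]
    return summaries[0]["TrainingJobName"] if summaries else None


# In the notebook this might be called as follows (requires AWS credentials):
#   import boto3
#   training_job_name = latest_training_job_name(
#       boto3.client("sagemaker"), "script-mode-container-fastai"
#   )
```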