### Bring your own Container

In this notebook, we cover how to bring our own container, built around either a framework or an algorithm, to train a model on SageMaker.

We use scikit-learn in this case and build a container with the custom training code copied into the image. The other option is script mode, which only requires changing the entry point.


#### Container Image
Let's start by building a container image locally and then pushing it to ECR (Elastic Container Registry).

In [1]:
%cd docker

/home/ec2-user/SageMaker/explore-digits/byoc-workshop/docker


In [2]:
!docker build -t am-scikit .

Sending build context to Docker daemon  10.75kB
Step 1/11 : FROM debian
 ---> fe3c5de03486
Step 2/11 : LABEL maintainer="Ashley Miller"
 ---> Using cache
 ---> 9a55b42f5382
Step 3/11 : WORKDIR /
 ---> Using cache
 ---> a58f6daeef5c
Step 4/11 : RUN apt-get update
 ---> Using cache
 ---> 2d8e58cc7cdf
Step 5/11 : RUN apt-get install python3 -y
 ---> Using cache
 ---> d7a30b34805a
Step 6/11 : RUN apt-get install pip -y
 ---> Using cache
 ---> 98121f1fa42e
Step 7/11 : RUN pip3 install --no-cache scikit-learn
 ---> Using cache
 ---> e057055b5a57
Step 8/11 : RUN pip3 install --no-cache pandas
 ---> Using cache
 ---> 802a561398f1
Step 9/11 : RUN pip3 install --no-cache --upgrade sagemaker-training
 ---> Using cache
 ---> 73a7b1715100
Step 10/11 : COPY code/* /opt/ml/code/
 ---> Using cache
 ---> 797417c15b82
Step 11/11 : ENV SAGEMAKER_PROGRAM train.py
 ---> Using cache
 ---> 8ebfbfcd977d
Successfully built 8ebfbfcd977d
Successfully tagged am-scikit:latest


In [3]:
!docker images

REPOSITORY                                                                                          TAG                 IMAGE ID            CREATED             SIZE
662559257807.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/scikit-training          latest              8ebfbfcd977d        29 minutes ago      936MB
am-scikit                                                                                           latest              8ebfbfcd977d        29 minutes ago      936MB
662559257807.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/script-mode-containers   latest              5e8de973ad88        6 hours ago         937MB
debian                                                                                              latest              fe3c5de03486        6 days ago          124MB


## Set the ECR details and tags
Let's set a few parameters here, such as the ECR namespace and the tag name.

In [4]:
from sagemaker import get_execution_role
import boto3
ecr_namespace = "sagemaker-training-containers/"
prefix = "scikit-training"

ecr_repository_name = ecr_namespace + prefix

role = get_execution_role()
account_id = role.split(":")[4]
region = boto3.Session().region_name
tag_name = account_id+'.dkr.ecr.'+region+'.amazonaws.com/'+ecr_repository_name+':latest'

In [5]:
tag_name

'662559257807.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/scikit-training:latest'
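
The tag above follows the standard ECR image URI layout, `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a minimal sketch, the same string can be built with two small helpers. The function names `account_id_from_role_arn` and `ecr_image_uri` and the example role ARN are hypothetical, introduced here for illustration only.

```python
def account_id_from_role_arn(role_arn: str) -> str:
    """Return the account ID, the fifth colon-separated field of an IAM ARN."""
    parts = role_arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {role_arn!r}")
    return parts[4]


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI from its components."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Hypothetical role ARN; the account ID matches the one shown in this notebook.
example_role = "arn:aws:iam::662559257807:role/service-role/ExampleSageMakerRole"
uri = ecr_image_uri(
    account_id_from_role_arn(example_role),
    "us-east-1",
    "sagemaker-training-containers/scikit-training",
)
print(uri)
```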

In [6]:
!docker tag am-scikit $tag_name

### ECR Repository and push steps

All of these steps can be scripted, but they are laid out one at a time here so that each step is visible.

In [7]:
!$(aws ecr get-login --no-include-email)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
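
Note that `aws ecr get-login` exists only in AWS CLI v1; AWS CLI v2 replaces it with `aws ecr get-login-password`. The same credentials can also be obtained from Python: the ECR `GetAuthorizationToken` API returns a base64-encoded `user:password` string. The sketch below shows the decoding step only. `decode_ecr_token` is a hypothetical helper name, and the sample token is made up for illustration.

```python
import base64
from typing import Tuple


def decode_ecr_token(authorization_token: str) -> Tuple[str, str]:
    """Split a base64-encoded 'user:password' ECR token into its two parts."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# With real credentials, the token would come from boto3:
#   auth = boto3.client("ecr").get_authorization_token()["authorizationData"][0]
#   username, password = decode_ecr_token(auth["authorizationToken"])
#   registry = auth["proxyEndpoint"]
# Here a made-up token stands in for the API response.
sample_token = base64.b64encode(b"AWS:example-password").decode("ascii")
print(decode_ecr_token(sample_token))
```

The resulting username and password can then be passed to `docker login` for the registry endpoint.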


In [8]:
!aws ecr create-repository --repository-name $ecr_repository_name


An error occurred (RepositoryAlreadyExistsException) when calling the CreateRepository operation: The repository with name 'sagemaker-training-containers/scikit-training' already exists in the registry with id '662559257807'
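
The `RepositoryAlreadyExistsException` above is harmless on a re-run. One way to make this step idempotent from Python is to catch that exception and look up the existing repository instead. This is a sketch, not part of the original notebook: `ensure_repository` is a hypothetical helper, and it takes the boto3 ECR client as an argument so that it can be exercised without AWS access.

```python
def ensure_repository(ecr_client, repository_name: str) -> str:
    """Create the ECR repository if needed and return its URI."""
    try:
        response = ecr_client.create_repository(repositoryName=repository_name)
        return response["repository"]["repositoryUri"]
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # The repository exists already; fetch its description instead.
        response = ecr_client.describe_repositories(repositoryNames=[repository_name])
        return response["repositories"][0]["repositoryUri"]


# Typical use (requires AWS credentials):
#   import boto3
#   uri = ensure_repository(boto3.client("ecr"), "sagemaker-training-containers/scikit-training")
```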


In [9]:
!docker push $tag_name

The push refers to repository [662559257807.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/scikit-training]

693515b6: Preparing
c9d042e5: Preparing
ae548d4f: Preparing
cb97c0e2: Preparing
358dd881: Preparing
6d42ec6a: Preparing
b4d0e7a4: Preparing
b4d0e7a4: Layer already exists
latest: digest: sha256:9cab507c5ae90709009bdffe53fab282b7cae1b7a0a30e89dbbdca3f3c1aba05 size: 2009


In [10]:
container_image_uri = "{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest".format(
    account_id, region, ecr_repository_name
)
print(container_image_uri)

662559257807.dkr.ecr.us-east-1.amazonaws.com/sagemaker-training-containers/scikit-training:latest


#### Call your custom container to train the model
Our custom Docker image is now complete and uploaded to ECR (Elastic Container Registry).  
Our code can now reference the custom Docker container to run our 'train.py' script.  

In [11]:
import sagemaker
import json

# JSON encode hyperparameters
def json_encode_hyperparameters(hyperparameters):
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}

hyperparameters = json_encode_hyperparameters({'min-samples-leaf':2, 'n-estimators':500})

# now we will call the generic SageMaker Estimator
est = sagemaker.estimator.Estimator(
    container_image_uri,
    role,
    instance_count=1,
    # instance_type="local",  # alternative: run in local mode on this notebook instance
    instance_type='ml.m5.4xlarge',
    base_job_name=prefix,
    hyperparameters=hyperparameters,
)

# s3 URI of the preprocessed training data that we created in the BYOM lab
preprocessed_training_data = 's3://sagemaker-us-east-1-662559257807/sagemaker-scikit-learn-2021-08-20-22-37-42-314/output/train/'
train_config = sagemaker.inputs.TrainingInput(preprocessed_training_data)
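
For context, the SageMaker training toolkit installed in the image passes each hyperparameter to the script as a `--name value` command-line argument and exposes paths through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN`. The notebook does not show `train.py`, so the following is only a sketch of how such a script might read its inputs. The `parse_args` helper and the default values are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths (hypothetical train.py front end)."""
    parser = argparse.ArgumentParser()
    # Hyperparameter names match the keys passed to the Estimator; defaults are assumed.
    parser.add_argument("--min-samples-leaf", type=int, default=1)
    parser.add_argument("--n-estimators", type=int, default=100)
    # The toolkit sets these variables inside the container; the fallbacks
    # are the conventional SageMaker paths.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    # Ignore any extra arguments the script does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--min-samples-leaf", "2", "--n-estimators", "500"])
print(args.min_samples_leaf, args.n_estimators)
```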

In [12]:
%%time
est.fit({"train": train_config})

2021-08-24 00:52:22 Starting - Starting the training job...
2021-08-24 00:52:45 Starting - Launching requested ML instancesProfilerReport-1629766342: InProgress
......
2021-08-24 00:53:46 Starting - Preparing the instances for training......
2021-08-24 00:54:50 Downloading - Downloading input data...
2021-08-24 00:55:06 Training - Downloading the training image...
2021-08-24 00:55:46 Training - Training image download completed. Training in progress.
2021-08-24 00:55:34,719 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-08-24 00:55:40,951 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-08-24 00:55:40,961 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-08-24 00:55:40,968 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
     

In [13]:
training_job_description = est.jobs[-1].describe()
model_data_s3_uri = "{}{}/{}".format(
    training_job_description["OutputDataConfig"]["S3OutputPath"],
    training_job_description["TrainingJobName"],
    "output/model.tar.gz",
)
print(training_job_description["TrainingJobName"])
print(model_data_s3_uri)

scikit-training-2021-08-24-00-52-21-942
s3://sagemaker-us-east-1-662559257807/scikit-training-2021-08-24-00-52-21-942/output/model.tar.gz
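
The format string above relies on `S3OutputPath` ending in a slash, which holds for the default bucket path shown here. A slightly more defensive variant, sketched below with the hypothetical helper name `model_artifact_uri`, normalizes the slash first. It assumes the default layout in which SageMaker writes `output/model.tar.gz` under a prefix named after the training job.

```python
def model_artifact_uri(s3_output_path: str, training_job_name: str) -> str:
    """Return the S3 URI of model.tar.gz for a training job."""
    return "{}/{}/output/model.tar.gz".format(s3_output_path.rstrip("/"), training_job_name)


print(model_artifact_uri(
    "s3://sagemaker-us-east-1-662559257807/",
    "scikit-training-2021-08-24-00-52-21-942",
))
```

After `fit` completes, the estimator's `model_data` attribute should report the same location.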


#### Evaluate the trained model
Now that we have used our custom Docker container to train a Scikit-learn 0.24 model, let's see how well it performs.  

In [None]:
# `est` is the Estimator fitted above
training_job_description = est.jobs[-1].describe()

model_data_s3_uri = "{}{}/{}".format(
    training_job_description["OutputDataConfig"]["S3OutputPath"],
    training_job_description["TrainingJobName"],
    "output/model.tar.gz",
)
print(training_job_description["TrainingJobName"])
print(model_data_s3_uri)

In [None]:
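from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

sklearn_processor = SKLearnProcessor(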
    framework_version='0.23-1',
    role=role,
    instance_type='ml.m5.xlarge',
    instance_count=1
)

sklearn_processor.run(
    code="code/evaluation.py",
    inputs=[
        ProcessingInput(source=model_data_s3_uri, destination="/opt/ml/processing/model"),
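        # ProcessingInput(source=preprocessed_training_data, destination="/opt/ml/processing/train"),
        # NOTE: preprocessed_test_data is not defined in this notebook; set it to the
        # S3 URI of the preprocessed test data before running this cell.
        ProcessingInput(source=preprocessed_test_data, destination="/opt/ml/processing/test"),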
    ],
    outputs=[ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation")],
)
evaluation_job_description = sklearn_processor.jobs[-1].describe()
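
The processing job above places `model.tar.gz` under `/opt/ml/processing/model` and uploads whatever the script writes to `/opt/ml/processing/evaluation`. The notebook does not show `code/evaluation.py`, so the following is only a sketch of the file handling such a script might need. The helper names `extract_model` and `write_report` and the output file name `evaluation.json` are assumptions, and loading and scoring the model are left out.

```python
import json
import os
import tarfile


def extract_model(model_dir: str) -> list:
    """Unpack model.tar.gz found in model_dir and return the extracted file names."""
    archive = os.path.join(model_dir, "model.tar.gz")
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=model_dir)
    return names


def write_report(metrics: dict, output_dir: str) -> str:
    """Write metrics as evaluation.json in output_dir and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    report_path = os.path.join(output_dir, "evaluation.json")
    with open(report_path, "w") as f:
        json.dump(metrics, f)
    return report_path


# Inside the processing container the calls would look like:
#   extract_model("/opt/ml/processing/model")
#   ... load the model and score the data in /opt/ml/processing/test ...
#   write_report({"accuracy": accuracy}, "/opt/ml/processing/evaluation")
```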