# Part I. Preparing a Docker Image

Before diving into the details of SageMaker training and deployment, it is crucial to make sure the training and deployment "container" is set up. This container provides GluonCV, MXNet and the other essential dependencies (pinned in the Dockerfile below to `mxnet-mkl==1.6.0` and `gluoncv==0.7.0`), which enable us to train and deploy state-of-the-art (SOTA) models.
Let's take a look at the process of setting up a container.

### 1. Building a Docker Image

If this is your first time using SageMaker training and deployment, you will need to prepare a Docker image by running the following commands:

In [2]:
%%writefile DockerfileCV
FROM ubuntu:18.04

MAINTAINER Amazon AI <sage-learner@amazon.com>

ARG APP=image_classification

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         python3-dev \
         nginx \
         ca-certificates \
         libgomp1 \
    && rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/bin/python3 /usr/bin/python && \
    ln -s /usr/bin/pip3 /usr/bin/pip

# Here we get all Python packages.
# pip leaves the install caches populated, which uses a significant amount of
# space, so we remove them after installing. This keeps the image smaller,
# which reduces start-up time.
RUN wget https://bootstrap.pypa.io/get-pip.py && python get-pip.py && \
    pip install numpy scipy flask gevent gunicorn mxnet-mkl==1.6.0 gluoncv==0.7.0 && \
        rm -rf /root/.cache

# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

# Set up the program in the image
COPY $APP /opt/program
WORKDIR /opt/program

Writing DockerfileCV


In [4]:
!cat build.sh


#!/usr/bin/env bash

# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.

# The arguments to this script are the image name and application name
image=$1
app=$2

chmod +x $app/train
chmod +x $app/serve

# Get the account number associated with the current IAM credentials
account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration
region=$(aws configure get region)
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${image}:latest"


# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${image}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${image}" > /dev/null
fi


# Edit ECR policy permission rights
aws ecr set-repository-policy --repository-name "${image}" --policy-text ecr_policy.json

# Get the login command from ECR and execute it directly

In [10]:
%%writefile buildcv.sh
#!/usr/bin/env bash

# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.

# The arguments to this script are the image name and application name
image=$1
app=$2
dockerfile=$3

chmod +x $app/train
chmod +x $app/serve

# Get the account number associated with the current IAM credentials
account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration
region=$(aws configure get region)
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${image}:latest"


# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${image}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${image}" > /dev/null
fi

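# Edit ECR policy permission rights (the file:// prefix makes the AWS CLI read
# the policy document from the file instead of treating the file name as the policy)
aws ecr set-repository-policy --repository-name "${image}" --policy-text file://ecr_policy.json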

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the Docker image locally with the image name, then tag it with the
# full ECR name. The push itself is done by a separate script.
docker build  -t ${image} --build-arg APP=$app . -f ${dockerfile}
docker tag ${image} ${fullname}

Overwriting buildcv.sh
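
The "create the repository if it doesn't exist" step in `buildcv.sh` can also be written in Python. The following is a minimal sketch, assuming a boto3 ECR client is passed in (for example `boto3.client("ecr")`); the helper name `ensure_repository` is hypothetical and not part of the original scripts.

```python
def ensure_repository(ecr_client, name):
    """Create the ECR repository `name` if it does not exist yet.

    `ecr_client` is assumed to be a boto3 ECR client, e.g. boto3.client("ecr").
    Returns True if the repository was created, False if it already existed.
    """
    try:
        # Raises RepositoryNotFoundException when the repository is missing
        ecr_client.describe_repositories(repositoryNames=[name])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=name)
        return True

# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# ensure_repository(boto3.client("ecr"), "gluoncv")
```

Unlike the shell version, which treats any non-zero exit code as "repository missing", this sketch only creates the repository when the lookup fails with the specific not-found error.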


In [11]:
import os
# os.chdir("./container/") ## Change the working directory to `container`
os.getcwd()

!bash buildcv.sh gluoncv image_classification DockerfileCV


An error occurred (InvalidParameterException) when calling the SetRepositoryPolicy operation: Invalid parameter at 'PolicyText' failed to satisfy constraint: 'Invalid repository policy provided'
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Sending build context to Docker daemon  70.14kB
Step 1/11 : FROM ubuntu:18.04
 ---> 2eb2d388e1a2
Step 2/11 : MAINTAINER Amazon AI <sage-learner@amazon.com>
 ---> Running in c32d0e45596a
Removing intermediate container c32d0e45596a
 ---> d9814b9b207d
Step 3/11 : ARG APP=image_classification
 ---> Running in 6de31644436f
Removing intermediate container 6de31644436f
 ---> 8fdcfc5f909e
Step 4/11 : RUN apt-get -y update && apt-get install -y --no-install-recommends          wget          python3-dev          nginx          ca-certificates          libgomp1     && rm -rf /var/lib/apt/lists/*
 ---> Running in 708effa1fe36
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:2 http

Get:33 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libx11-6 amd64 2:1.6.4-3ubuntu0.2 [569 kB]
Get:34 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 wget amd64 1.19.4-1ubuntu2.2 [316 kB]
Get:35 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 python3-lib2to3 all 3.6.9-1~18.04 [77.4 kB]
Get:36 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 python3-distutils all 3.6.9-1~18.04 [144 kB]
Get:37 http://archive.ubuntu.com/ubuntu bionic/main amd64 dh-python all 3.20180325ubuntu2 [89.2 kB]
Get:38 http://archive.ubuntu.com/ubuntu bionic/main amd64 fonts-dejavu-core all 2.37-1 [1041 kB]
Get:39 http://archive.ubuntu.com/ubuntu bionic/main amd64 fontconfig-config all 2.12.6-0ubuntu2 [55.8 kB]
Get:40 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libc-dev-bin amd64 2.27-3ubuntu1.2 [71.8 kB]
Get:41 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 linux-libc-dev amd64 4.15.0-112.113 [982 kB]
Get:42 http://archive.ubuntu.com/ubuntu b

Selecting previously unselected package libelf1:amd64.
Preparing to unpack .../03-libelf1_0.170-0.4ubuntu0.1_amd64.deb ...
Unpacking libelf1:amd64 (0.170-0.4ubuntu0.1) ...
Selecting previously unselected package libmnl0:amd64.
Preparing to unpack .../04-libmnl0_1.0.4-2_amd64.deb ...
Unpacking libmnl0:amd64 (1.0.4-2) ...
Selecting previously unselected package iproute2.
Preparing to unpack .../05-iproute2_4.15.0-2ubuntu1.2_amd64.deb ...
Unpacking iproute2 (4.15.0-2ubuntu1.2) ...
Selecting previously unselected package libbsd0:amd64.
Preparing to unpack .../06-libbsd0_0.8.7-1ubuntu0.1_amd64.deb ...
Unpacking libbsd0:amd64 (0.8.7-1ubuntu0.1) ...
Selecting previously unselected package libicu60:amd64.
Preparing to unpack .../07-libicu60_60.2-3ubuntu3.1_amd64.deb ...
Unpacking libicu60:amd64 (60.2-3ubuntu3.1) ...
Selecting previously unselected package libxml2:amd64.
Preparing to unpack .../08-libxml2_2.9.4+dfsg1-6.1ubuntu1.3_amd64.deb ...
Unpacking libxml2:amd64 (2.9.4+dfsg1-6.1ubuntu1.3) 

Setting up libgomp1:amd64 (8.4.0-1ubuntu1~18.04) ...
Setting up readline-common (7.0-3) ...
Setting up libicu60:amd64 (60.2-3ubuntu3.1) ...
Setting up mime-support (3.60ubuntu1) ...
Setting up libpng16-16:amd64 (1.6.34-1ubuntu0.18.04.2) ...
Setting up libjbig0:amd64 (2.1-3.1build1) ...
Setting up fonts-dejavu-core (2.37-1) ...
Setting up libreadline7:amd64 (7.0-3) ...
Setting up libpsl5:amd64 (0.19.1-5build1) ...
Setting up libelf1:amd64 (0.170-0.4ubuntu0.1) ...
Setting up nginx-common (1.14.0-0ubuntu1.7) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/

Collecting pip
  Downloading pip-20.2.2-py2.py3-none-any.whl (1.5 MB)
Collecting setuptools
  Downloading setuptools-49.6.0-py3-none-any.whl (803 kB)
Collecting wheel
  Downloading wheel-0.35.1-py2.py3-none-any.whl (33 kB)
Installing collected packages: pip, setuptools, wheel
Successfully installed pip-20.2.2 setuptools-49.6.0 wheel-0.35.1
Collecting numpy
  Downloading numpy-1.19.1-cp36-cp36m-manylinux2010_x86_64.whl (14.5 MB)
Collecting scipy
  Downloading scipy-1.5.2-cp36-cp36m-manylinux1_x86_64.whl (25.9 MB)
Collecting flask
  Downloading Flask-1.1.2-py2.py3-none-any.whl (94 kB)
Collecting gevent
  Downloading gevent-20.6.2-cp36-cp36m-manylinux2010_x86_64.whl (5.3 MB)
Collecting gunicorn
  Downloading gunicorn-20.0.4-py2.py3-none-any.whl (77 kB)
Collecting mxnet-mkl==1.6.0
  Downloading mxnet_mkl-1.6.0-py2.py3-none-manylinux1_x86_64.whl (76.7 MB)
Collecting gluoncv==0.7.0
  Downloading gluoncv-0.7.0-py2.py3-none-any.whl (752 kB)
Collecting click>=5.1
  Downloading click-7.1.2-p

### 2. Granting the ECR Repo Access

Amazon ECR repository policies are a subset of IAM policies that are scoped to, and specifically used for, controlling access to individual Amazon ECR repositories. IAM policies are generally used to apply permissions for the entire Amazon ECR service, but they can also control access to specific resources. Amazon ECR requires that users have permission to call the `ecr:GetAuthorizationToken` API through an IAM policy before they can authenticate to a registry and push or pull images from any Amazon ECR repository. More details: https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-policies.html
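
The `ecr:GetAuthorizationToken` call returns a base64-encoded `user:password` pair, which is what a Docker login to the registry consumes. As a minimal sketch of that decoding step (the helper name `decode_ecr_token` is hypothetical, and the token below is a made-up value, not a real credential):

```python
import base64

def decode_ecr_token(authorization_token):
    """Split a base64-encoded ECR authorization token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    # Split on the first colon only, in case the password itself contains colons
    username, password = decoded.split(":", 1)
    return username, password

# Made-up token for illustration only; real tokens come from the
# GetAuthorizationToken API and are valid for 12 hours.
example_token = base64.b64encode(b"AWS:example-password").decode("ascii")
print(decode_ecr_token(example_token))
```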

We can go to the [ECR Repo](https://console.aws.amazon.com/ecr/repositories/my-repo/permissions?region=us-east-1) page in the console and grant access to the repository with a policy like the following.

```json
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "All-Allow",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::383827541835:user/rlhu",
          "arn:aws:sts::383827541835:assumed-role/AmazonSageMaker-ExecutionRole-20200409T103675/SageMaker"
        ]
      },
      "Action": "*"
    }
  ]
}
```
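
The policy above allows every action (`"Action": "*"`). A narrower policy is usually enough for pushing and pulling images. The sketch below builds such a document in Python and writes it to `ecr_policy.json`, the file name the build script refers to. The function name `build_ecr_policy`, the `Sid`, and the choice of actions are assumptions for illustration, so adjust them to what your account needs.

```python
import json

# ECR actions commonly needed to pull and push images. This list is an
# assumption about what this workflow needs; the original policy allows "*".
PUSH_PULL_ACTIONS = [
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
    "ecr:BatchCheckLayerAvailability",
    "ecr:PutImage",
    "ecr:InitiateLayerUpload",
    "ecr:UploadLayerPart",
    "ecr:CompleteLayerUpload",
]

def build_ecr_policy(principal_arns, actions=PUSH_PULL_ACTIONS):
    """Return a repository policy document granting `actions` to `principal_arns`."""
    return {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Sid": "AllowPushPull",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": list(principal_arns)},
                "Action": list(actions),
            }
        ],
    }

policy = build_ecr_policy([
    "arn:aws:iam::383827541835:role/service-role/AmazonSageMaker-ExecutionRole-20200409T103675",
])
with open("ecr_policy.json", "w") as f:
    json.dump(policy, f, indent=2)
```

The role ARN used here is the one returned by `sagemaker.get_execution_role()` in this notebook; replace it with your own.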

In [13]:
import sagemaker
sagemaker.get_execution_role()

'arn:aws:iam::383827541835:role/service-role/AmazonSageMaker-ExecutionRole-20200409T103675'

In [None]:
%%writefile push.sh
#!/usr/bin/env bash
image=$1
account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)
region=${region:-us-west-2}
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${image}:latest"
docker push ${fullname}
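
The `${region:-us-west-2}` expansion in `push.sh` falls back to `us-west-2` when no region is configured. The same naming logic can be sketched in Python as follows; the function name `full_image_name` and its parameters are hypothetical.

```python
def full_image_name(account, region, image, default_region="us-west-2", tag="latest"):
    """Mirror push.sh: use `default_region` when `region` is empty or None."""
    region = region or default_region
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, image, tag)

print(full_image_name("383827541835", "us-east-1", "gluoncv"))
print(full_image_name("383827541835", "", "gluoncv"))
```

Note that the build script has no such fallback, so the two scripts can disagree on the registry address when no region is configured.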

In [14]:
import os
# os.chdir("./container/") ## Change the working directory to `container`
os.getcwd()
!bash push.sh gluoncv image_classification

The push refers to repository [383827541835.dkr.ecr.us-east-1.amazonaws.com/gluoncv]

c26e47d4: Preparing
3bfc06a9: Preparing
1ba89785: Preparing
b9384921: Preparing
f9a74649: Preparing
da143c91: Preparing
287e1f04: Preparing
3bfc06a9: Pushed   528.1MB/522.1MB

### 3. Pushing the Docker image

Now, with access permission to the new ECR repo, let's push the Docker image by calling the `push.sh` script.

In [2]:
!bash push.sh my-repo image_classification

The push refers to repository [383827541835.dkr.ecr.us-east-1.amazonaws.com/my-repo]

57463602230f: Preparing
95d9df2ee067: Preparing
e06bb0dfff24: Preparing
90a688d4a924: Preparing
8682f9a74649: Preparing
d3a6da143c91: Preparing
83f4287e1f04: Preparing
7ef368776582: Preparing
83f4287e1f04: Waiting
7ef368776582: Waiting
d3a6da143c91: Waiting
denied: User: arn:aws:sts::383827541835:assumed-role/AmazonSageMaker-ExecutionRole-20200409T103675/SageMaker is not authorized to perform: ecr:InitiateLayerUpload on resource: arn:aws:ecr:us-east-1:383827541835:repository/my-repo


# Part II. Training, Inference and Deployment


## 1. Training

Once the container is packaged, we can use it to train and serve models. Let's do that with the container image we built above.

### Setting up the Environment

Here we specify a bucket to use and the role that will be used for working with Amazon SageMaker.

In [15]:
os.chdir("../") ## Change the working directory back to main

from sagemaker import get_execution_role
role = get_execution_role()

### Creating the Session

The session remembers our connection parameters to Amazon SageMaker. We'll use it to perform all of our SageMaker operations.

In [16]:
import sagemaker as sage

sess = sage.Session()

### Defining the account, region and ECR address


In [26]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
ecr_name = "gluoncv"
ecr_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, ecr_name)

### Uploading Training Data

We can upload the training data to the corresponding S3 bucket: https://s3.console.aws.amazon.com/s3/buckets/sagemaker-us-east-1-383827541835/sagemaker-deploy-gluoncv/data/?region=us-east-1

In [27]:
! ls ./data/minc-2500/train

brick  carpet  food  mirror  sky  water


In [19]:
# Note: despite its name, `s3_bucket` is used as a key prefix inside the
# session's default bucket, not as a bucket name.
s3_bucket = "sagemaker-deploy-gluoncv"
train_data_local = "./data/minc-2500/train"
train_data_dir_prefix = s3_bucket + "/data/train"

# Upload the local training folder. With no `bucket` argument,
# `upload_data` uses the session's default bucket.
train_data_upload = sess.upload_data(path=train_data_local,
                                     key_prefix=train_data_dir_prefix)


Check the training data at:

In [20]:
print("https://s3.console.aws.amazon.com/s3/buckets/{}".format(train_data_upload.split("//")[1]))

https://s3.console.aws.amazon.com/s3/buckets/sagemaker-us-east-1-383827541835/sagemaker-deploy-gluoncv/data/train
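
The cell above derives the console link by splitting the S3 URI on `//`. A slightly more defensive version, as a minimal sketch, parses the URI and rejects anything that is not an `s3://` address; the helper name `s3_uri_to_console_url` is hypothetical.

```python
from urllib.parse import urlparse

def s3_uri_to_console_url(s3_uri):
    """Turn s3://bucket/key into the matching S3 console URL."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3://bucket/key URI, got {!r}".format(s3_uri))
    # netloc is the bucket name, path is the key prefix (with a leading slash)
    return "https://s3.console.aws.amazon.com/s3/buckets/{}{}".format(
        parsed.netloc, parsed.path)

print(s3_uri_to_console_url(
    "s3://sagemaker-us-east-1-383827541835/sagemaker-deploy-gluoncv/data/train"))
```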


### Creating an Estimator

In order to use Amazon SageMaker to fit our algorithm, we'll create an `Estimator` that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:

* The __role__ is defined as above.
* The __session__ is the SageMaker session object that we defined above.
* The __image name__ is the name of the ECR image we created above.
* The __training instance type__ is the type of machine to use for training.
* The __training instance count__ is the number of machines to use for training.
* The __output path__ determines where the model artifact will be written.
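
The bullets above correspond to keyword arguments of `Estimator`. This notebook uses the SageMaker Python SDK v1 argument names (`image_name`, `train_instance_count`, `train_instance_type`); SDK v2 renames them. As a minimal sketch for readers on v2, the mapping can be written as a small helper. The helper name `to_v2_kwargs` is hypothetical, and only the three renames relevant to this notebook are covered.

```python
# Renames between SageMaker Python SDK v1 and v2 for the arguments used here.
V1_TO_V2 = {
    "image_name": "image_uri",
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
}

def to_v2_kwargs(v1_kwargs):
    """Return a copy of `v1_kwargs` with v1 argument names replaced by v2 names."""
    return {V1_TO_V2.get(key, key): value for key, value in v1_kwargs.items()}

v1_kwargs = {
    "image_name": "383827541835.dkr.ecr.us-east-1.amazonaws.com/gluoncv:latest",
    "train_instance_count": 1,
    "train_instance_type": "ml.p2.xlarge",
    "hyperparameters": {"epochs": 1, "model_name": "resnet18_v1b"},
}
print(to_v2_kwargs(v1_kwargs))
```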

In [21]:
from sagemaker.estimator import Estimator

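train_dir = "data/minc-2500/train"
# Train for a single epoch with the ResNet-18 (v1b) architecture
hyperparameters = {'epochs': 1,
                   'model_name': 'resnet18_v1b'}
train_instance_type = 'ml.p2.xlarge'  # GPU instance; 'ml.c4.2xlarge' is a CPU alternative
# Model artifacts are written under this S3 prefix
s3_path = "s3://{}/{}/model".format(sess.default_bucket(), s3_bucket)
model_path = os.path.join(s3_path, "model.tar.gz")
print(model_path)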

s3://sagemaker-us-east-1-383827541835/sagemaker-deploy-gluoncv/model/model.tar.gz


### Fitting the Estimator

Then we call the `fit()` function on the estimator to train against the data that we uploaded above.

In [28]:
classifier = Estimator(role=role, 
                       sagemaker_session=sess,
                       image_name=ecr_image, 
                       train_instance_count=1,
                       train_instance_type=train_instance_type,
                       hyperparameters=hyperparameters,
#                        checkpoint_local_path="model_output/", 
                       output_path=s3_path
                       )
# train_data_upload = model_upload
classifier.fit(train_data_upload)

Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.


2020-08-18 05:45:13 Starting - Starting the training job...
2020-08-18 05:45:17 Starting - Launching requested ML instances.........
2020-08-18 05:47:01 Starting - Preparing the instances for training......
2020-08-18 05:48:12 Downloading - Downloading input data.........
2020-08-18 05:49:24 Training - Downloading the training image...
2020-08-18 05:50:03 Training - Training image download completed. Training in progress.
Starting the training.
Filling weights from resnet18_v1b
Downloading /root/.mxnet/models/resnet18_v1b-2d9d980c.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1b-2d9d980c.zip...
  0%|          | 0/42432 [00:00<?, ?KB/s]
  5%|4         | 2105/42432 [00:00<00:02, 17638.09KB/s]
 16%|#6        | 6952/42432 [00:00<00:01, 21797.36KB/s]
 30%|##9       | 12572/42432 [00:00<00:01, 26700.46KB/s]
 43%|####2     | 18099/42432 [00:00<00:00, 31600.68KB/s]
 56%|#####6    | 23846/42432 [00:00

## 2. Batch Inferencing

After our model has been trained, we use a demo image to test it. We first upload this image to the S3 bucket; we can then test the model after deployment.

In [29]:
demo_dir = "data/demo"
test_image = "cat1.jpg"
sample_inference_input_prefix = s3_bucket + "/data/test"

demo_input = sess.upload_data(os.path.join(demo_dir, test_image), 
                                   key_prefix=sample_inference_input_prefix) 
print("Demo input uploaded to " + demo_input)

Demo input uploaded to s3://sagemaker-us-east-1-383827541835/sagemaker-deploy-gluoncv/data/test/cat1.jpg


## 3. Deploying the Model

Deploying the model to Amazon SageMaker hosting just requires a `deploy` call on the fitted model. This call takes an instance count, instance type, and optionally serializer and deserializer functions. These are used when the resulting predictor is created on the endpoint.

Note that deployment takes a little bit longer than all the previous steps.

In [30]:
# from sagemaker.predictor import csv_serializer

model = classifier.create_model()
predictor = classifier.deploy(1, 'ml.m4.xlarge')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


-------------!

### Choose some data and use it for a prediction

To make some predictions, we'll use a demo JPEG image to test the model.

In [31]:
with open(os.path.join(demo_dir, test_image), 'rb') as f:
    x = f.read()
    print(predictor.predict(x, initial_args={'ContentType':'image/jpeg'}).decode('utf-8'))

[lynx], with probability 0.253.
[Egyptian cat], with probability 0.252.
[tiger cat], with probability 0.106.
[tabby], with probability 0.063.
[soft-coated wheaten terrier], with probability 0.041.
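
The endpoint returns its predictions as plain text, one `[label], with probability p.` line per class. To work with the results programmatically, the response can be parsed into `(label, probability)` pairs. This is a minimal sketch that assumes the response format is exactly the one printed above; the helper name `parse_predictions` is hypothetical.

```python
import re

# Matches lines such as "[Egyptian cat], with probability 0.252."
_LINE = re.compile(r"^\[(?P<label>.+)\], with probability (?P<prob>\d*\.?\d+)\.?$")

def parse_predictions(text):
    """Parse the endpoint's text response into a list of (label, probability)."""
    results = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:  # lines that do not match (e.g. blank lines) are skipped
            results.append((match.group("label"), float(match.group("prob"))))
    return results

sample = "[lynx], with probability 0.253.\n[Egyptian cat], with probability 0.252.\n"
print(parse_predictions(sample))
```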



### Cleanup Endpoint

When you're done with the endpoint, you'll want to clean it up.

In [None]:
sess.delete_endpoint(predictor.endpoint)
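
Since a running endpoint is billed until it is deleted, it can help to make the cleanup automatic. The following is a minimal sketch of a context manager that deletes the endpoint even if a prediction raises an error. It assumes the predictor object exposes a `delete_endpoint()` method, as SageMaker predictor objects do; the name `temporary_endpoint` is hypothetical.

```python
from contextlib import contextmanager

@contextmanager
def temporary_endpoint(predictor):
    """Yield `predictor` and delete its endpoint on exit, even on errors.

    Assumes `predictor` has a `delete_endpoint()` method.
    """
    try:
        yield predictor
    finally:
        predictor.delete_endpoint()

# Usage (requires a trained estimator, so it is left commented out):
# with temporary_endpoint(classifier.deploy(1, 'ml.m4.xlarge')) as predictor:
#     print(predictor.predict(x, initial_args={'ContentType': 'image/jpeg'}))
```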