### The Dockerfile

The Dockerfile describes the image that we want to build. 

In [1]:
!cat container/Dockerfile

# Build an image that can do training and inference in SageMaker
# This is a Python 2 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM ubuntu:16.04

MAINTAINER Shiyu Fu <fushiyu9@gmail.com>


RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         python \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Here we get all python packages.
# There's substantial overlap between scipy and numpy that we eliminate by
# linking them together. Likewise, pip leaves the install caches populated which uses
# a significant amount of space. These optimizations save a fair amount of space in the
# image, which reduces start up time.
RUN wget https://bootstrap.pypa.io/get-pip.py && python get-pip.py && \
    pip install numpy==1.14.5 scipy scikit-learn pandas flask gevent gunicorn tensorflow keras boto3 gensim && \
        (cd /usr/local/lib/python2.7/dist-pack

### Building and registering the container

Build the container image with `docker build`, then push it to ECR with `docker push`.
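The script in the next cell pushes the image under a full name of the form `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a minimal sketch of how that name is composed, the helper below rebuilds it in Python. The function name `ecr_image_uri` and the account ID are placeholders introduced here for illustration and are not part of the original notebook.

```python
def ecr_image_uri(account, region, repository, tag="latest"):
    """Compose <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    # Mirror the shell fallback ${region:-us-east-1}: use us-east-1
    # when no region is configured (None or empty string).
    region = region or "us-east-1"
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, repository, tag)


# 123456789012 is a placeholder account ID, not a real one.
print(ecr_image_uri("123456789012", None, "text-classfication-test"))
```

The same string is what the estimator later expects as its image argument, so the repository name has to match the one used when pushing.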

In [56]:
%%sh

# The name of our algorithm
algorithm_name=text-classfication-test

cd container

chmod +x text_classification/train
chmod +x text_classification/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  271.1MB
Step 1/9 : FROM ubuntu:16.04
 ---> 4a689991aa24
Step 2/9 : MAINTAINER Shiyu Fu <fushiyu9@gmail.com>
 ---> Using cache
 ---> 7be7cd8893b5
Step 3/9 : RUN apt-get -y update && apt-get install -y --no-install-recommends          wget          python          nginx          ca-certificates     && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> f190eadd87cc
Step 4/9 : RUN wget https://bootstrap.pypa.io/get-pip.py && python get-pip.py &&     pip install numpy==1.14.5 scipy scikit-learn pandas flask gevent gunicorn tensorflow keras boto3 gensim &&         (cd /usr/local/lib/python2.7/dist-packages/scipy/.libs; rm *; ln ../../numpy/.libs/* .) &&         rm -rf /root/.cache
 ---> Using cache
 ---> ba59f9efa089
Step 5/9 : ENV PYTHONUNBUFFERED=TRUE
 ---> Using cache
 ---> 1c1b85432add
Step 6/9 : ENV PYTHONDONTWRITEBYTECODE=TRUE
 ---> Using cache
 ---> 4adfcc7733f0
Step 7/9 : ENV PATH="/opt/program:${PATH}"
 ---> Using cache
 ---

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



## Set up the environment

In [3]:
# S3 prefix
prefix = 'text-classification'

# Define IAM role
import boto3
import re

import os
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()

## Create the session

In [4]:
import sagemaker as sage
from time import gmtime, strftime

sess = sage.Session()

## Upload the data for training

In [5]:
WORK_DIRECTORY = 'data'

data_location = sess.upload_data(WORK_DIRECTORY, key_prefix=prefix)

INFO:sagemaker:Created S3 bucket: sagemaker-us-east-1-265363646340


## Create an estimator and fit the model

In [None]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/text-classfication-test:latest'.format(account, region)

classifier = sage.estimator.Estimator(image,
                       role, 1, 'ml.c4.2xlarge',
                       output_path="s3://{}/output".format(sess.default_bucket()),
                       sagemaker_session=sess)

classifier.fit(data_location)

INFO:sagemaker:Creating training-job with name: text-classfication-test-2018-10-26-21-02-19-206


2018-10-26 21:02:19 Starting - Starting the training job...
2018-10-26 21:02:22 Starting - Launching requested ML instances......
2018-10-26 21:03:25 Starting - Preparing the instances for training...
2018-10-26 21:04:17 Downloading - Downloading input data...
2018-10-26 21:04:25 Training - Downloading the training image.
Using TensorFlow backend.
Starting the training.
/opt/ml/output/dict.json
  num_elements)
Train on 49763 samples, validate on 12441 samples
Epoch 1/3
2018-10-26 21:05:00.885738: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

2018-10-26 21:04:51 Training - Training image download completed. Training in progress.
  256/49763 [..............................] - ETA: 24:15 - loss: 2.6068 - acc: 0.1055
  512/49763 [..............................] - ETA: 15:18 - loss: 2.6080 - acc: 0.1016
  768/497

Epoch 00001: val_acc improved from -inf to 0.30367, saving model to /opt/ml/model/text_clas.hdf5
Epoch 2/3
  256/49763 [..............................] - ETA: 6:27 - loss: 1.7264 - acc: 0.3750
  512/49763 [..............................] - ETA: 6:27 - loss: 1.7387 - acc: 0.3594
  768/49763 [..............................] - ETA: 6:33 - loss: 1.7228 - acc: 0.3711
 1024/49763 [..............................] - ETA: 6:31 - loss: 1.7166 - acc: 0.3711
 1280/49763 [..............................] - ETA: 6:29 - loss: 1.7352 - acc: 0.3594
 1536/49763 [..............................] - ETA: 6:24 - loss: 1.7437 - acc: 0.3568
 1792/49763 [>.............................] - ETA: 6:21 - loss: 1.7338 - acc: 0.3616
 2048/49763 [>.............................] - ETA: 6:19 - loss: 1.7298 - acc: 0.3579
 2304/49763 [>.............................] - ETA: 6:16 - loss: 1.7421 - acc: 0.3589
 2560/4976

Epoch 00002: val_acc improved from 0.30367 to 0.63982, saving model to /opt/ml/model/text_clas.hdf5
Epoch 3/3
  256/49763 [..............................] - ETA: 6:48 - loss: 1.2620 - acc: 0.6484
  512/49763 [..............................] - ETA: 6:44 - loss: 1.2488 - acc: 0.6465
  768/49763 [..............................] - ETA: 6:37 - loss: 1.2498 - acc: 0.6536
 1024/49763 [..............................] - ETA: 6:31 - loss: 1.2937 - acc: 0.6338
 1280/49763 [..............................] - ETA: 6:28 - loss: 1.2881 - acc: 0.6359
 1536/49763 [..............................] - ETA: 6:24 - loss: 1.2922 - acc: 0.6335
 1792/49763 [>.............................] - ETA: 6:21 - loss: 1.2871 - acc: 0.6345
 2048/49763 [>.............................] - ETA: 6:18 - loss: 1.3001 - acc: 0.6294
 2304/49763 [>.............................] - ETA: 6:15 - loss: 1.2912 - acc: 0.6302
 2560/49763 [>...........

 9984/49763 [=====>........................] - ETA: 5:13 - loss: 1.2792 - acc: 0.6425
10240/49763 [=====>........................] - ETA: 5:11 - loss: 1.2783 - acc: 0.6423
10496/49763 [=====>........................] - ETA: 5:09 - loss: 1.2771 - acc: 0.6433
10752/49763 [=====>........................] - ETA: 5:07 - loss: 1.2739 - acc: 0.6436
11008/49763 [=====>........................] - ETA: 5:05 - loss: 1.2718 - acc: 0.6442
11264/49763 [=====>........................] - ETA: 5:03 - loss: 1.2692 - acc: 0.6444
11520/49763 [=====>........................] - ETA: 5:01 - loss: 1.2678 - acc: 0.6443

Epoch 00003: val_acc improved from 0.63982 to 0.68274, saving model to /opt/ml/model/text_clas.hdf5
Truth:  ['BILL', 'POLICY CHANGE', 'BILL', 'POLICY CHANGE', 'NON-RENEWAL NOTICE']
Prediction:  ['BILL', 'POLICY CHANGE', 'BILL', 'POLICY CHANGE', 'CANCELLATION NOTICE']
Training complete.

2018-10-26 21:26:09 Uploading - Uploading generated training model
2018-10-26 21:27:55 Completed - Training job completed
Billable seconds: 1418
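
The checkpoint lines in the training log record the validation accuracy reached after each epoch. As a rough sketch, assuming the log has been captured as a plain string, the values can be extracted with a regular expression. The helper name `validation_accuracy_by_epoch` is hypothetical, and the pattern only matches the "val_acc improved" wording seen in this run.

```python
import re

# Matches lines such as:
#   Epoch 00001: val_acc improved from -inf to 0.30367, saving model to ...
CHECKPOINT_RE = re.compile(r"Epoch (\d+): val_acc improved from \S+ to ([0-9.]+)")


def validation_accuracy_by_epoch(log_text):
    """Return (epoch, val_acc) pairs for every checkpoint line found in log_text."""
    return [(int(epoch), float(acc)) for epoch, acc in CHECKPOINT_RE.findall(log_text)]


sample_log = """
Epoch 00001: val_acc improved from -inf to 0.30367, saving model to /opt/ml/model/text_clas.hdf5
Epoch 00002: val_acc improved from 0.30367 to 0.63982, saving model to /opt/ml/model/text_clas.hdf5
Epoch 00003: val_acc improved from 0.63982 to 0.68274, saving model to /opt/ml/model/text_clas.hdf5
"""

for epoch, acc in validation_accuracy_by_epoch(sample_log):
    print(epoch, acc)
```

Epochs in which the validation accuracy did not improve would not produce a matching line, so they would be absent from the result.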


## Deploy the model

Deploying the model to SageMaker hosting requires only a `deploy` call on the fitted estimator. The call takes an instance count, an instance type, and optionally serializer and deserializer functions. The serializer and deserializer are attached to the predictor that `deploy` returns, which uses them to encode requests sent to the endpoint and decode the responses.
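
To make the serializer's role concrete: a CSV serializer turns the Python input into a comma-separated request body before it is sent to the endpoint. The sketch below illustrates that idea with the standard `csv` module. It is a simplified stand-in, not the SDK's `csv_serializer` implementation, and `tokens_to_csv_row` and the sample tokens are hypothetical.

```python
import csv
import io


def tokens_to_csv_row(tokens):
    """Join a sequence of tokens into a single CSV row (illustrative helper only)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(tokens)
    # csv.writer appends a line terminator; strip it so the row stands alone.
    return buffer.getvalue().rstrip("\r\n")


payload = tokens_to_csv_row(["a1b2", "c3d4", "e5f6"])
print(payload)  # a1b2,c3d4,e5f6
```

Fields that themselves contain a comma are quoted by the `csv` module, which keeps the row unambiguous for whatever parses it inside the container.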

In [None]:
from sagemaker.predictor import csv_serializer
predictor = classifier.deploy(1, 'ml.m4.xlarge', serializer=csv_serializer)

INFO:sagemaker:Creating model with name: text-classfication-test-2018-10-26-21-28-19-282
INFO:sagemaker:Creating endpoint with name text-classfication-test-2018-10-26-21-02-19-206


---------------------------------------------------------------!

## Choose some data for prediction

In [50]:
# shape=pd.read_csv("container/local_test/payload.csv", header=None)

# # shape=pd.read_csv("data/full-set.csv", header=None)
# # import itertools
# # import random

# # r = random.randint(0, 62000)
# # a = [r]
# # b = [40+i for i in range(6)]
# # indices = [i+j for i,j in itertools.product(a,b)]

# # test_data=shape.iloc[indices[:-1]]
# # test_X=test_data.iloc[:,1:]
# # test_y=test_data.iloc[:,0]

# test_X = shape.iloc[0]
# print(type(test_X))
# print(test_X.shape)

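import csv

# Load every row of the sample payload file.
with open("container/local_test/payload.csv") as csvfile:
    content = list(csv.reader(csvfile, delimiter=','))

# Take the first field of the fifth row and split it on whitespace into tokens.
document = content[4][0]
document = document.split()
test_X = np.array(document)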

In [53]:
# print(predictor.predict(test_X.values).decode('utf-8'))
print(predictor.predict(test_X).decode('utf-8'))

['CANCELLATION NOTICE']CANCELLATION NOTICE



## Optional cleanup

In [None]:
sess.delete_endpoint(predictor.endpoint)