# Text Classification Using AWS Deep Learning Docker Containers on Singularity

A modified version of this AWS SageMaker lab guide: https://github.com/lbnl-science-it/aws-sagemaker-keras-text-classification

## Building the Singularity Container from the Available AWS Deep Learning Containers Images
https://aws.amazon.com/releasenotes/available-deep-learning-containers-images/

The shell cells below download the dataset, build the container image with `docker`, and convert that image to a Singularity image (`.sif`).
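
Before running the cells, it can help to confirm that the command-line tools they call are installed. The following is a minimal sketch, not part of the original lab: `missing_tools` is a hypothetical helper name, and the tool list is simply the set of commands used in this notebook.

```shell
# Hypothetical helper (not part of the original lab): print every
# command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools invoked by the cells in this notebook
missing=$(missing_tools wget unzip aws docker singularity)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

Running the check first avoids a half-finished build, for example a Docker image that builds but cannot be converted because `singularity` is absent.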

### Download and unzip the dataset

In [None]:
%%sh
cd container

####################################################
########## Download and unzip the dataset ##########
####################################################
cd ../data/
wget https://danilop.s3-eu-west-1.amazonaws.com/reInvent-Workshop-Data-Backup.zip && unzip reInvent-Workshop-Data-Backup.zip
mv reInvent-Workshop-Data-Backup/* ./
rm -rf reInvent-Workshop-Data-Backup reInvent-Workshop-Data-Backup.zip
cd ../container/

### Build the SageMaker Container and Convert It to a Singularity Image

In [10]:
%%sh
cd container

###################################################################################
######### Build the SageMaker Container & Convert it to Singularity image #########
###################################################################################
algorithm_name=sagemaker-keras-text-classification

chmod +x sagemaker_keras_text_classification/train
chmod +x sagemaker_keras_text_classification/serve

# Get the region defined in the current configuration
region=$(aws configure get region)
fullname="local_${algorithm_name}:latest"

# Get the login command from ECR and execute it directly
# (this uses the AWS CLI v1 `get-login` command, which AWS CLI v2 does not provide)
$(aws ecr get-login --no-include-email --region ${region} --registry-ids 763104351884)

# Build the Docker image locally under the algorithm name.
# To use a different base image, edit the FROM line in the Dockerfile and pick one of the
# available Deep Learning Containers images:
# https://aws.amazon.com/releasenotes/available-deep-learning-containers-images
# The region in the FROM image URI should match ${region}, because the ECR login above is per region.
docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

# Build Singularity image from local docker image
sifname="local_sagemaker-keras-text-classification.sif"
sudo singularity build ${sifname} docker-daemon:${fullname}

Login Succeeded
Sending build context to Docker daemon  456.3MB
Step 1/9 : FROM 763104351884.dkr.ecr.us-east-2.amazonaws.com/tensorflow-training:1.14.0-cpu-py36-ubuntu16.04
 ---> e6a210ff54e4
Step 2/9 : RUN apt-get update &&     apt-get install -y nginx imagemagick graphviz
 ---> Using cache
 ---> 32ff2dce1af3
Step 3/9 : RUN pip install --upgrade pip
 ---> Using cache
 ---> 4e1b65ea3a65
Step 4/9 : RUN pip install gevent gunicorn flask tensorflow_hub seqeval graphviz nltk spacy tqdm
 ---> Using cache
 ---> d97c22f6de86
Step 5/9 : RUN python -m spacy download en_core_web_sm
 ---> Using cache
 ---> 14c8854a1901
Step 6/9 : RUN python -m spacy download en
 ---> Using cache
 ---> 185661d9e15d
Step 7/9 : ENV PATH="/opt/program:${PATH}"
 ---> Using cache
 ---> b5d5c6867074
Step 8/9 : COPY sagemaker_keras_text_classification /opt/program
 ---> Using cache
 ---> ac73b50bd646
Step 9/9 : WORKDIR /opt/program
 ---> Using cache
 ---> c5fe52a83024
Successfully built c5fe52a83024
Successfully tagged s

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

INFO:    Starting build...
Getting image source signatures
Copying blob sha256:f749b9b0fb213e9897417a985aaa9753d41bff474e1d0c0d1d266c4512eaf031
Copying blob sha256:2558e637fbff95178cb4b43e0ca5f20a04ddeaf9673053bfa4dc10c72833d15a
Copying blob sha256:aeda103e78c90b573700d64f6660efda378b59fe3e636ebfa28a0a105e2e2168
Copying blob sha256:e79142719515e5304607fdd9adeb31db96b7acf00cabadac2678b056ed83bca6
Copying blob sha256:9d2fda619715fb1f04019aab97191889de8648aaffde53347801b48bfbc8619e
Copying blob sha256:7083756ef61fef3e835676de491bd26271ffd4812c2cc54f83336176e2c9745e
Copying blob sha256:8722c9641a57d3bd9e09e3a3bd09d44354775543a363bc2b8d9c80ea23d583f5
Copying blob sha256:d456742927ee9aab70e2c1ed4a27c56cc58dde0104de93a514e81dbfbd256908
Copying blob sha256:0cf88c3675cd386365aea9dd2f9a70d7e481ff67534f3d510cc00d5e32bb615b
Copying blob sha256:11fc4467b8a36b9752c60038f3dba5e25035c1642c7d63a1406fab6d8c860936
Cop
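
The `aws ecr get-login` command used in the cell above exists only in AWS CLI version 1. With AWS CLI version 2, the login is done by piping `aws ecr get-login-password` into `docker login`. A minimal sketch under that assumption follows. The function names `ecr_registry_host` and `ecr_login` are hypothetical; the registry ID is the one used in the cell above.

```shell
# Registry account that hosts the AWS Deep Learning Containers
# (same ID as in the build cell)
registry_id=763104351884

# Hypothetical helper: build the ECR registry hostname for a region
ecr_registry_host() {
    echo "${registry_id}.dkr.ecr.${1}.amazonaws.com"
}

# Hypothetical helper: log Docker in to that registry with AWS CLI v2.
# Defined only; nothing is contacted until the function is called.
ecr_login() {
    aws ecr get-login-password --region "$1" \
        | docker login --username AWS --password-stdin "$(ecr_registry_host "$1")"
}
```

Calling `ecr_login "$(aws configure get region)"` would then take the place of the `$(aws ecr get-login ...)` line.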

### Training Text Classifier
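
The training cell below copies the dataset into `local_test/test_dir/input/data/training/`, and its output mentions `test_dir/output/`, so both directories are expected under `local_test`. If they are missing, a sketch like the following could create them first. `prepare_test_dir` is a hypothetical helper, not part of the original lab, and whether `train_local.sh` already creates these directories is not shown here.

```shell
# Hypothetical helper (not part of the original lab): create the
# directories the local training test reads from and writes to.
prepare_test_dir() {
    base="${1:-local_test/test_dir}"
    mkdir -p "${base}/input/data/training" "${base}/output"
}
```

From the `container` directory, `prepare_test_dir local_test/test_dir` would run before the `cp -a` step.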

In [12]:
%%sh
cd container

################################
########## Local Test ########## 
################################
cd ../data
cp -a . ../container/local_test/test_dir/input/data/training/
cd ../container
cd local_test

### Train
sifname="local_sagemaker-keras-text-classification.sif"
./train_local.sh ../${sifname}

Starting the training.
                                               TITLE  ...      TIMESTAMP
1  Fed official says weak data caused by weather,...  ...  1394470370698
2  Fed's Charles Plosser sees high bar for change...  ...  1394470371207
3  US open: Stocks fall after Fed official hints ...  ...  1394470371550
4  Fed risks falling 'behind the curve', Charles ...  ...  1394470371793
5  Fed's Plosser: Nasty Weather Has Curbed Job Gr...  ...  1394470372027

[5 rows x 7 columns]
Found 65990 unique tokens.
Shape of data tensor: (422417, 100)
Shape of label tensor: (422417, 4)
x_train shape:  (337933, 100)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 100, 100)          1000000   
_________________________________________________________________
flatten (Flatten)            (None, 10000)             0         
_______________________________________

rm: cannot remove 'test_dir/output/*': No such file or directory
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2020-06-08 22:24:19.850096: I tensorflow/core/platform/cpu_feature_guard.cc:142] 