# Sect 43: Operationalizing Code & AWS

- online-ds-ft-100719
- 04/07/20

## Resources:

- [Udemy Course: Deployment of Machine Learning Models](https://www.udemy.com/share/101Y5KAEYbdVdWRXQ=/)
- [Amazon Web Services](https://aws.amazon.com/)
    - [Getting Started Resource Center](https://aws.amazon.com/getting-started/)


## Productionizing Models as a Career Skill


1. Many data scientists don't know how to put machine learning models into production.  
2. Putting a model into production is a mandatory skill for data scientists at most small to medium-sized companies.
3. Being able to productionize models will make you a much more attractive candidate to employers, and give you a competitive advantage!

<img src="https://raw.githubusercontent.com/learn-co-students/dsc-data-science-and-machine-learning-engineering-online-ds-ft-100719/master/images/new-venn-diagram.png">

- A decade ago, productionizing a machine learning model meant building your own web server with something like [Flask](http://flask.pocoo.org/) or [Django](https://www.djangoproject.com/) and hosting it somewhere, just as you would with any web app.
- Now, we don't even need to worry about server code. Instead, we can use existing AWS services that were created specifically to simplify the process of productionizing machine learning solutions!

# AWS Ecosystem


<img src="https://raw.githubusercontent.com/jirvingphd/fsds_100719_cohort_notes/master/images/awscloud.png">

> * AWS is a **_Cloud-Computing Platform_** which we can use for a variety of use cases in data science.


- **To manage your account, use the AWS Console**
    - http://console.aws.amazon.com/
    - Sign in to the console as the root user
    
- **Resources**
    - https://aws.amazon.com/getting-started/

- **AWS Components:**
    - S3: storage

## Sign up for AWS
- [Follow Learn lesson steps to set up account](https://learn.co/tracks/data-science-career-v2/module-6-natural-language-processing-and-deep-learning/section-50-operationalizing-code-and-aws/the-aws-ecosystem)

- [Amazon Web Services](https://aws.amazon.com/)

AWS has data centers all over the world, and they are **not** interchangeable when it comes to your projects. Click on the "Region" tab in the top right corner of the navigation bar, and you should see a dropdown of all the different data centers you can choose from. It is **_very important_** that you always choose the same region to connect to with your projects.
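Regional resources such as notebook instances, ECR repositories, and endpoints only show up in the region where you created them. That makes it useful to check that a project's resources agree. Here is a minimal sketch that assumes the standard ARN layout `arn:partition:service:region:account-id:resource`. The helper names `arn_region` and `regions_used` and the example ARNs are hypothetical. Global resources such as S3 buckets have an empty region field, so the sketch skips them.

```python
"""Check that a project's AWS resources all live in one region (sketch)."""
from typing import Iterable, Set


def arn_region(arn: str) -> str:
    """Return the region field of an ARN ('' for global resources like S3 buckets)."""
    # Standard layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def regions_used(arns: Iterable[str]) -> Set[str]:
    """Return the distinct regions referenced, skipping region-less (global) ARNs."""
    return {region for region in map(arn_region, arns) if region}


# Hypothetical example resources for one project
resources = [
    "arn:aws:sagemaker:us-west-2:123456789012:notebook-instance/my-notebook",
    "arn:aws:ecr:us-west-2:123456789012:repository/sagemaker-keras-text-classification",
    "arn:aws:s3:::my-example-bucket",
]

used = regions_used(resources)
print("Regions in use:", sorted(used))  # ['us-west-2'] for the list above
if len(used) > 1:
    print("Warning: resources span multiple regions!")
```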

- Create an AWS account
- Sign in to the console as the root user

# Amazon SageMaker

> ***SageMaker is a platform created by Amazon to centralize all the various services related to Data Science and Machine Learning. If you're a data scientist working on AWS, chances are that you'll be spending most (if not all) of your time in SageMaker getting things done.***


> * Amazon has centralized all of the major data science services inside **_Amazon SageMaker_**. SageMaker provides numerous services for things such as:
>     * Data Labeling
>     * Cloud-based Notebooks
>     * Training and Model Tuning
>     * Inference
    
#### SageMaker Components
<img src="https://raw.githubusercontent.com/learn-co-students/dsc-introduction-to-aws-sagemaker-online-ds-ft-100719/master/images/use_cases.png">


    


# Productionizing Models with SageMaker

- **SAVE THESE 2 RESOURCES:**
    - [Learn Lesson: Productionizing Models with SageMaker](https://learn.co/tracks/data-science-career-v2/module-6-natural-language-processing-and-deep-learning/section-50-operationalizing-code-and-aws/productionizing-models-with-sagemaker)
        - [Lesson Repo](https://github.com/learn-co-students/dsc-productionizing-models-with-sagemaker-online-ds-ft-100719)

    - [Official SageMaker Tutorial](https://github.com/aws-samples/amazon-sagemaker-keras-text-classification)


## Overview of Process

When productionizing a machine learning model using AWS, you'll typically use the following workflow:

1. Explore and preprocess data
2. Build SageMaker container (Docker)
3. Test training and inference code on your local machine 
4. Train and deploy model with SageMaker

### Steps for Deploying A Model (from lesson)


#### 0. Complete AWS training notebooks

- https://github.com/aws-samples/amazon-sagemaker-keras-text-classification
- Complete Labs 1, 2, and 3
    - These cover all of the setup required
    
    

#### 1. Build and Register the container

- The Learn lesson repo contains a `container` folder with the files (including a Dockerfile) needed to build the Docker image.

- The code below uses the contents of the `container` folder to build the Docker image and register it with Amazon ECR (Elastic Container Registry).
- It is best to upload this notebook to your AWS Jupyter environment (your SageMaker notebook instance) and run it there.

After you have successfully uploaded this notebook, if you are asked to choose a kernel, use the same kernel that runs the `sagemaker_keras_text_classification.ipynb` notebook.

> NOTE: If you deactivated this process (stopped your Jupyter instance, as in _Step 8_ below) and then came back to continue, you'll need to go back and start from [Lab 2](https://github.com/aws-samples/amazon-sagemaker-keras-text-classification#lab-2-building-the-sagemaker-tensorflow-container), because you need the Docker instance running in order to run the following cells.

##### Code for Building/Registering the Container

```bash
%%sh

# The name of our algorithm
algorithm_name=sagemaker-keras-text-classification

cd container

chmod +x sagemaker_keras_text_classification/train
chmod +x sagemaker_keras_text_classification/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

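# Log Docker in to ECR. `get-login-password` works with recent AWS CLI v1 releases
# and with v2; the older `aws ecr get-login` command was removed in AWS CLI v2.
aws ecr get-login-password --region "${region}" | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"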

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

# On a SageMaker Notebook Instance, the docker daemon may need to be restarted in order
# to detect your network configuration correctly.  (This is a known issue.)
if [ -d "/home/ec2-user/SageMaker" ]; then
  sudo service docker restart
fi

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}
```
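The `fullname` that the script pushes to follows a fixed pattern. As a sketch, here is the same URI construction in Python, including the `${region:-us-west-2}` fallback. The `ecr_image_uri` helper and the account ID are hypothetical and not part of the lesson code.

```python
"""Build an ECR image URI the same way the shell cell above does (sketch)."""
from typing import Optional

DEFAULT_REGION = "us-west-2"  # same fallback as ${region:-us-west-2}


def ecr_image_uri(account: str, region: Optional[str], name: str, tag: str = "latest") -> str:
    # A missing or empty region falls back to the default, like ${region:-...} in bash
    region = region or DEFAULT_REGION
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{name}:{tag}"


uri = ecr_image_uri("123456789012", None, "sagemaker-keras-text-classification")
print(uri)
# 123456789012.dkr.ecr.us-west-2.amazonaws.com/sagemaker-keras-text-classification:latest
```

Step 5 builds this URI again without the `:latest` suffix. Docker treats an untagged image reference as `latest`, so both should point at the same image.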

#### 2. Setting up the environment


Once we've created the container, we'll need to set up the environment. The cell below contains more boilerplate code that handles a couple of sticking points: it defines the S3 prefix for this project and retrieves the IAM execution role that SageMaker will use on our behalf.


```python 
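import os
import re

import boto3
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

# S3 key prefix under which the training data is uploaded (see step 4)
prefix = 'sagemaker-keras-text-classification'

# Get the IAM role attached to this notebook instance; SageMaker uses it to access S3 and ECR
role = get_execution_role()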
```

#### 3. Creating the Session

Now that we've created the container and set up our environment, the next step is to create a SageMaker session. 
```python
import sagemaker as sage
from time import gmtime, strftime

sess = sage.Session()
```

#### 4. Upload the Data for Training

Steps 4 and 5 are where you'll add the code unique to your project. In this step, make sure you have a folder called `'data'` that contains the data you'll be working with. The actual structure of the data is up to you, since you'll be the one consuming it to train your model in step 5.

```python 
WORK_DIRECTORY = 'data'

data_location = sess.upload_data(WORK_DIRECTORY, key_prefix=prefix)

```
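`upload_data` uploads everything in the local `data` folder. For a directory, it returns an S3 URI of the form `s3://<bucket>/<key_prefix>`, which becomes `data_location`. If you're starting from data in memory, a minimal sketch for populating that folder with the standard library might look like this. The `write_training_csv` helper, the `train.csv` file name, and the `text,label` columns are all hypothetical.

```python
"""Populate the local 'data' folder that sess.upload_data() reads (sketch)."""
import csv
import os

WORK_DIRECTORY = "data"


def write_training_csv(rows, work_dir=WORK_DIRECTORY, filename="train.csv"):
    """Write (text, label) rows to <work_dir>/<filename> and return the file path."""
    os.makedirs(work_dir, exist_ok=True)  # create the folder if it doesn't exist yet
    path = os.path.join(work_dir, filename)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])  # header row (hypothetical column names)
        writer.writerows(rows)
    return path
```

For example, `write_training_csv([("great movie", 1), ("terrible plot", 0)])` would create `data/train.csv` before you run the upload cell.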

#### 5. Fitting the Model


- This is the part where you'll do the brunt of the work. 
- You'll train your own model on the data you uploaded in the previous step. 

- Note that in the sample code below, the first 3 lines are boilerplate code. 
    
- The actual creation and training of the model happen in the last two statements, where `tree` is instantiated and then fit on the uploaded data.

> **_NOTE_**: You may have noticed that the code in the cell below uses an `Estimator` from `sage` (the alias we set for `sagemaker` above), the SageMaker Python SDK, rather than a model from scikit-learn. The `sagemaker` library provides many useful models that we can use directly. Under the hood, it wraps the same open-source frameworks you're used to, such as scikit-learn, Keras, and TensorFlow. The generic `Estimator` used here runs the custom Docker image we pushed to ECR in step 1. The code below is an example from AWS of how to use one of their `Estimator` objects for training. When you run everything, you'll notice that much of the cell's output is warning messages or other printouts from sklearn and Keras!

For more information on the models and other tools included in the SageMaker Python SDK, check out the [Amazon SageMaker Python SDK Documentation](https://sagemaker.readthedocs.io/en/stable/)!

```python 
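# Boilerplate: look up the account ID and region to rebuild the ECR image URI from step 1
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/sagemaker-keras-text-classification'.format(account, region)

# Positional arguments: image, IAM role, instance count (1), instance type
tree = sage.estimator.Estimator(image,
                       role, 1, 'ml.c5.2xlarge',
                       # Trained model artifacts are written under this S3 path
                       output_path="s3://{}/output".format(sess.default_bucket()),
                       sagemaker_session=sess)

# Start the training job on the data uploaded in step 4
tree.fit(data_location)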
```
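When `fit` runs, SageMaker starts a training job that runs the container's `train` program. Custom containers follow a file-layout contract:

- Input channels appear under `/opt/ml/input/data/<channel>`. A single S3 URI passed to `fit` becomes the `training` channel.
- Whatever the script writes to `/opt/ml/model` is packaged and uploaded to `output_path`.
- A failure message can be written to `/opt/ml/output/failure`.

The sketch below illustrates that contract with a hypothetical label-counting "model" in place of the lesson's Keras model. The `prefix` argument is an addition that lets you test it locally against a temporary directory, in the spirit of step 3 of the workflow. In a real container, this function would be called from the `train` executable.

```python
"""Sketch of a custom SageMaker `train` program and the /opt/ml layout it relies on."""
import csv
import json
import os
import traceback


def train(prefix="/opt/ml"):
    # Paths defined by SageMaker's training-container contract
    input_dir = os.path.join(prefix, "input", "data", "training")  # 'training' channel
    model_dir = os.path.join(prefix, "model")    # uploaded to output_path after the job
    output_dir = os.path.join(prefix, "output")  # 'failure' file goes here on error
    try:
        # Hypothetical stand-in for real training: count each label in the CSV files
        counts = {}
        for name in sorted(os.listdir(input_dir)):
            if not name.endswith(".csv"):
                continue
            with open(os.path.join(input_dir, name), newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                next(reader, None)  # skip the header row
                for row in reader:
                    if row:
                        counts[row[-1]] = counts.get(row[-1], 0) + 1
        # Save the "model" where SageMaker will pick it up
        os.makedirs(model_dir, exist_ok=True)
        with open(os.path.join(model_dir, "model.json"), "w", encoding="utf-8") as f:
            json.dump({"label_counts": counts}, f)
    except Exception as exc:
        # Record why training failed, then re-raise so the program exits with an error
        os.makedirs(output_dir, exist_ok=True)
        with open(os.path.join(output_dir, "failure"), "w", encoding="utf-8") as f:
            f.write(f"{exc}\n{traceback.format_exc()}")
        raise
```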

#### 6. Deploying the Model


This is where the magic happens: we have a trained model, and now we need to actually **_deploy_** it to the AWS cloud! Notice that in this step we pass a `json_serializer`, so the predictor serializes input data to JSON before sending it to the endpoint.

Running the cell below will create an endpoint for your trained model. 

```python
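# SageMaker Python SDK v1 API. In SDK v2, use
# `from sagemaker.serializers import JSONSerializer` and `serializer=JSONSerializer()`.
from sagemaker.predictor import json_serializer

# Deploy the trained model to a real-time endpoint on one ml.t2.medium instance
predictor = tree.deploy(1, 'ml.t2.medium', serializer=json_serializer)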
```
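Behind the endpoint, SageMaker runs the container's `serve` program. That program is expected to listen on port 8080 and answer `GET /ping` (health check, 200 when ready) and `POST /invocations` (one prediction request). Because the predictor above uses a JSON serializer, requests arrive as JSON.

The sketch below shows that contract with Python's standard-library HTTP server as a stand-in for whatever web stack the lesson's container uses. The `predict` function is a hypothetical placeholder that returns string lengths; the real container loads the trained Keras model instead.

```python
"""Sketch of the HTTP contract a custom SageMaker serving container implements."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload):
    """Hypothetical placeholder model: return the length of each input string."""
    return [len(text) for text in payload]


def handle_invocation(body: bytes) -> bytes:
    """Decode a JSON request body, run the model, and encode the result as JSON."""
    payload = json.loads(body.decode("utf-8"))
    return json.dumps({"predictions": predict(payload)}).encode("utf-8")


class Handler(BaseHTTPRequestHandler):
    def _send(self, status, body=b"", content_type="application/json"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: SageMaker pings this to decide whether the container is ready
        self._send(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._send(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            self._send(200, handle_invocation(self.rfile.read(length)))
        except (ValueError, TypeError) as exc:
            # Malformed JSON or unexpected input types -> client error
            self._send(400, str(exc).encode("utf-8"), "text/plain")


def serve(port=8080):
    """Run the server on the port SageMaker expects (blocks forever)."""
    HTTPServer(("", port), Handler).serve_forever()
```

Keeping the request handling in `handle_invocation` means the model logic can be tested locally without starting a server at all.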

#### 7. Cleanup (IMPORTANT!!)


As a final step for this exercise, **be sure to run the following line of code to delete your endpoint!** Although you are running this lab on the free tier, you don't want to leave the endpoint running, because that is how costs accrue.

```python 
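# Delete the endpoint so it stops accruing charges
# (in SDK v2 the attribute is `predictor.endpoint_name`; `predictor.delete_endpoint()` also works)
sess.delete_endpoint(predictor.endpoint)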
```

#### 8. Deactivate Everything in AWS


In AWS, you pay for usage, and **anything left running counts as usage. The AWS Free Tier we've signed up for lets us do small things for free for prototyping or learning, but leaving resources running may take us past its usage limits.** To avoid being charged, do the following:


##### 8.1: Deactivate the Notebook in SageMaker

First, you'll need to deactivate your notebook in SageMaker. When you enter the SageMaker platform, you'll always see the number of open notebooks you have up and running highlighted in green under the 'Recent Activity' section. 

<img src='https://raw.githubusercontent.com/jirvingphd/dsc-productionizing-models-with-sagemaker-online-ds-ft-100719/master/images/create-notebook-7.png'>


To deactivate a running notebook, select it, open the 'Actions' menu, and choose 'Stop'. Stopping the notebook instance will take a minute or two. You'll know it's done when the 'Status' column for the highlighted notebook changes from 'InService' to 'Stopped'.

<img src='https://raw.githubusercontent.com/jirvingphd/dsc-productionizing-models-with-sagemaker-online-ds-ft-100719/master/images/create-notebook-8.png'>
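To double-check from code that nothing is left running, a sketch like the one below could scan the responses from boto3's SageMaker client (`list_endpoints()` and `list_notebook_instances()`). It makes several assumptions:

- The responses use the keys `Endpoints` / `EndpointName` / `EndpointStatus` and `NotebookInstances` / `NotebookInstanceName` / `NotebookInstanceStatus`.
- Pagination (`NextToken`) can be ignored.
- A hypothetical heuristic applies: every existing endpoint is flagged, since step 7 deletes endpoints rather than stopping them, and so is any notebook instance not in the `Stopped` state.

The sample data is made up, so no AWS call happens here.

```python
"""Flag SageMaker resources that may still be running (sketch; no AWS calls)."""
from typing import Dict, List


def still_running(endpoints_resp: Dict, notebooks_resp: Dict) -> List[str]:
    """Describe resources worth shutting down, given boto3-shaped list responses."""
    found = []
    # Every endpoint that still exists is flagged; step 7 deletes endpoints outright
    for ep in endpoints_resp.get("Endpoints", []):
        found.append(f"endpoint {ep['EndpointName']} ({ep['EndpointStatus']})")
    # Notebook instances are flagged unless they have been stopped
    for nb in notebooks_resp.get("NotebookInstances", []):
        status = nb["NotebookInstanceStatus"]
        if status != "Stopped":
            found.append(f"notebook {nb['NotebookInstanceName']} ({status})")
    return found


# Hypothetical responses in the shape of list_endpoints() / list_notebook_instances()
endpoints = {"Endpoints": [{"EndpointName": "sagemaker-keras-ep", "EndpointStatus": "InService"}]}
notebooks = {"NotebookInstances": [
    {"NotebookInstanceName": "nb-a", "NotebookInstanceStatus": "InService"},
    {"NotebookInstanceName": "nb-b", "NotebookInstanceStatus": "Stopped"},
]}

for item in still_running(endpoints, notebooks):
    print("Still running:", item)
```

With credentials configured, you could create a client with `sm = boto3.client('sagemaker')` and call `still_running(sm.list_endpoints(), sm.list_notebook_instances())`. The client only sees the region it is configured for, which is one more reason to stick to a single region.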

##### 8.2: Keep an Eye on Cost Explorer

As you've seen from this lab, getting a handle on all the different services in AWS and how they interact with one another can be daunting until you have some experience. It's very important not to leave services running when you aren't using them, because you will be charged for them. The easiest way to make sure you haven't left anything running is to check the **AWS Cost Explorer** page, which you can find by searching for 'AWS Cost Explorer' in the search bar on the main page of the AWS Console. Cost Explorer shows your usage for everything you can be charged for, and it's intuitive enough that you can quickly spot charges from something you didn't realize was still running. If you find one, just navigate to the service in question and deactivate it.