## Heart failure is a common event caused by cardiovascular disease. The 12 feature variables in this dataset can be used to predict mortality.
### An example of bringing your own algorithm
This notebook shows how to deploy your own container, using a scikit-learn model as the example. However, do not deploy a scikit-learn model this way in practice; use the already containerized scikit-learn framework that SageMaker provides instead.

In [12]:
import pandas as pd
import sagemaker as sage
import boto3
import os

sess = sage.Session()
role = sage.get_execution_role()

In [5]:
df = pd.read_csv("heart_failure.csv")
df.head()

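| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |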


In [6]:
df.isnull().sum()

age                         0
anaemia                     0
creatinine_phosphokinase    0
diabetes                    0
ejection_fraction           0
high_blood_pressure         0
platelets                   0
serum_creatinine            0
serum_sodium                0
sex                         0
smoking                     0
time                        0
DEATH_EVENT                 0
dtype: int64

In [8]:
df.groupby("DEATH_EVENT")["age"].count()

DEATH_EVENT
0    203
1     96
Name: age, dtype: int64

##### Since I made no changes to the data, there is no need to export the DataFrame to a new CSV file; the original heart_failure.csv can be uploaded as-is

In [7]:
# Use the session's default S3 bucket
bucket = sess.default_bucket()

In [13]:
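# Upload the raw CSV to s3://<bucket>/tryouts/input/data/training/train.csv
# S3 keys always use "/", so the key is written as a plain string rather than built with os.path.join
key = 'tryouts/input/data/training/train.csv'
boto3.Session().resource('s3').Bucket(bucket).Object(key).upload_file('heart_failure.csv')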

# Containers
"It is like a virtual machine but lighter!"<br>
A container is software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. Available for both Linux- and Windows-based applications, containerized software always runs the same, regardless of the infrastructure. In short, containerization packages software into standardized units for development, shipment, and deployment.<br>
Containers are an abstraction at the app layer that packages code and dependencies together. Multiple containers can run on the same machine and share the OS kernel with other containers, each running as an isolated process in user space. Containers take up less space than VMs (container images are typically tens of MBs in size), can handle more applications, and require fewer VMs and operating systems (source: https://www.docker.com/resources/what-container).
<br>
A container is more powerful than a virtual environment because it is language-independent. A virtual environment (such as Python's venv) isolates only one language's packages, while a container isolates the whole runtime, including system libraries.<br>
Common **Docker commands** are:
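* docker pull: download an image from a registry
* docker build: build an image from a Dockerfile
* docker images: list the images stored locally
* docker ps: list running containers
* docker rmi / docker rm: remove images / remove containers
* docker run: start a container from an image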

If you open the Dockerfile, you will see that it installs Flask, a web framework that handles requests; Gunicorn, a WSGI HTTP server for Unix; and Nginx, which sits in front of Gunicorn and efficiently manages the HTTP input/output of the container.
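The Flask + Gunicorn + Nginx stack ultimately exposes two HTTP routes. SageMaker calls `GET /ping` as a health check and `POST /invocations` for predictions, on port 8080; the batch-transform logs at the end of this notebook show those `/ping` requests.

Below is a minimal sketch of that contract. It is written as a bare WSGI callable (the interface Gunicorn serves and Flask builds on) instead of a Flask app, so it runs on the standard library alone. This is not the notebook's actual predictor. `predict_rows` is a hypothetical placeholder rule, and its assumption that `ejection_fraction` is the fifth column of each payload row is also an assumption.

```python
import io
from wsgiref.util import setup_testing_defaults


def predict_rows(rows):
    # HYPOTHETICAL stand-in for the trained model: flag a patient as high risk
    # when ejection_fraction (assumed to be the 5th column, index 4) is below 30.
    return [1 if float(row[4]) < 30 else 0 for row in rows]


def app(environ, start_response):
    """Bare WSGI app exposing the two routes SageMaker calls."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")

    # Health check: the container is considered ready once this returns 200.
    if method == "GET" and path == "/ping":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b""]

    # Inference: one CSV record per line in, one prediction per line out.
    if method == "POST" and path == "/invocations":
        if environ.get("CONTENT_TYPE") != "text/csv":
            start_response("415 Unsupported Media Type", [("Content-Type", "text/plain")])
            return [b"This predictor only supports text/csv"]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        rows = [line.split(",") for line in body.splitlines() if line.strip()]
        result = "\n".join(str(p) for p in predict_rows(rows))
        start_response("200 OK", [("Content-Type", "text/csv")])
        return [result.encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def call_app(method, path, body=b"", content_type="text/csv"):
    """Drive the app in-process (no Gunicorn or Nginx) and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    payload = b"".join(app(environ, start_response))
    return captured["status"], payload
```

For example, `call_app("POST", "/invocations", b"75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4\n")` exercises the inference route with the first row of the table above (target column removed). In the real container, Gunicorn would import a callable like `app`, and Nginx would forward port 8080 to Gunicorn's Unix socket.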

### To test locally:
#### I named the image pure-genius; feel free to name it whatever you want, just make sure to remember the name.
From the folder containing the Dockerfile, run the following command:
**docker build -t pure-genius .**<br>
Once the build finishes, go to the local_test folder, then inside test_dir go to **input/data/training** and store the training dataset there. Next, run **train_local.sh** and then **serve_local.sh**. For convenience, place your test file in the same folder as predict.sh and run bash predict.sh name_of_file.csv text/csv. In my case the file is called payload.csv, so the command is **./predict.sh payload.csv text/csv**
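The contents of predict.sh are not shown in this notebook. It presumably sends the file to the locally running container. Below is a Python equivalent, sketched under two assumptions: the container listens on localhost:8080 (SageMaker's serving port), and the script POSTs the raw file bytes to `/invocations` with the given Content-Type. The function names are hypothetical.

```python
import urllib.request


def build_invocation_request(payload_path, content_type="text/csv",
                             host="localhost", port=8080):
    """Build a POST /invocations request that carries the file's raw bytes."""
    with open(payload_path, "rb") as f:
        data = f.read()
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )


def invoke(payload_path, content_type="text/csv"):
    """Send the request to the locally served container and return its reply."""
    request = build_invocation_request(payload_path, content_type)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

While serve_local.sh is running, `print(invoke("payload.csv", "text/csv"))` would play the same role as `./predict.sh payload.csv text/csv`. Separating request building from sending makes it possible to inspect the request without a running server.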

### Use build_push.sh to build and push image to ECR
ECR (Amazon Elastic Container Registry) is a registry for storing Docker images. It makes it easier for developers to store, manage, and deploy Docker images, and it is fully integrated with **ECS** (Elastic Container Service). Run either ./build_push.sh or bash build_push.sh, followed by the name you want to give the image; in this case I call it **pure-genius**.
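This notebook does not show the contents of build_push.sh. Scripts of this kind typically need the target ECR repository to exist before `docker push`. Below is a sketch of that one step with boto3, assuming `ecr_client` is `boto3.client("ecr")`. The helper name is hypothetical, and the Docker login, build, tag, and push steps remain in the shell script.

```python
def ensure_ecr_repository(ecr_client, repo_name):
    """Return the URI of ECR repository `repo_name`, creating it if missing.

    `ecr_client` is expected to be a boto3 ECR client, e.g. boto3.client("ecr").
    """
    try:
        # Succeeds only if the repository already exists.
        response = ecr_client.describe_repositories(repositoryNames=[repo_name])
        return response["repositories"][0]["repositoryUri"]
    except ecr_client.exceptions.RepositoryNotFoundException:
        # First push: create the repository, then return its URI.
        response = ecr_client.create_repository(repositoryName=repo_name)
        return response["repository"]["repositoryUri"]
```

With a real client, `ensure_ecr_repository(boto3.client("ecr"), "pure-genius")` returns a URI of the form `<account>.dkr.ecr.<region>.amazonaws.com/pure-genius`. That is the same format the notebook builds for the estimator's image, minus the `:latest` tag.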

Inside JupyterLab you will find the **Terminal**. Open it and go to the folder where you are storing all the files: type **cd SageMaker** (remember that you can list all files in the current directory with ls), then go into the folder containing the Dockerfile.

In [16]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/pure-genius:latest'.format(account, region)

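# Instance pricing: https://aws.amazon.com/sagemaker/pricing/
# The default bucket stores the input data, the model artifact, and the transform output
svm = sage.estimator.Estimator(image,            # ECR image URI built above
                               role,             # IAM execution role
                               1,                # number of training instances
                               'ml.m5.large',    # training instance type
                               output_path="s3://{}/tryouts/model".format(bucket),
                               sagemaker_session=sess)

# A single S3 URI is fed to the default "training" channel
svm.fit("s3://{}/tryouts/input/data/training/train.csv".format(bucket))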

Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.


2020-09-09 01:57:55 Starting - Starting the training job...
2020-09-09 01:57:57 Starting - Launching requested ML instances.........
2020-09-09 01:59:33 Starting - Preparing the instances for training...
2020-09-09 02:00:22 Downloading - Downloading input data
2020-09-09 02:00:22 Training - Downloading the training image.....
2020-09-09 02:01:16 Uploading - Uploading generated training model
2020-09-09 02:01:16 Completed - Training job completed
Training seconds: 68
Billable seconds: 68
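`fit()` never calls the training code directly. SageMaker starts the image with the argument `train` and copies the S3 input into `/opt/ml/input/data/training` inside the container (`training` is the default channel name when `fit()` receives a single S3 URI). When the job ends, it uploads whatever the script wrote to `/opt/ml/model` to `output_path` as `model.tar.gz`. The local_test/test_dir folder used earlier appears to mirror this `/opt/ml` layout.

Below is a sketch of the data-loading and model-saving side of such a script. It uses a hypothetical majority-class placeholder instead of the notebook's actual scikit-learn model. The `prefix` parameter is added only so the sketch can run outside a container.

```python
import csv
import os
import pickle
from collections import Counter


def train(prefix="/opt/ml", channel="training", target="DEATH_EVENT"):
    """Read every CSV in the channel directory and save a placeholder model."""
    data_dir = os.path.join(prefix, "input", "data", channel)
    model_dir = os.path.join(prefix, "model")

    # Collect the target column from every CSV file SageMaker copied in.
    labels = []
    for name in sorted(os.listdir(data_dir)):
        if not name.endswith(".csv"):
            continue
        with open(os.path.join(data_dir, name), newline="") as f:
            for row in csv.DictReader(f):
                labels.append(int(row[target]))
    if not labels:
        raise ValueError(f"No CSV rows found in {data_dir}")

    # HYPOTHETICAL placeholder model: always predict the majority class.
    majority_class = Counter(labels).most_common(1)[0][0]

    # Files written here are what SageMaker packages and uploads to output_path.
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.pkl"), "wb") as f:
        pickle.dump({"majority_class": majority_class}, f)
    return majority_class
```

With the class counts shown earlier (203 survivors vs. 96 deaths), this placeholder would always predict 0. That makes it a useful baseline to compare a real model against.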


### Run a batch transform

In [19]:
output_folder = "tryouts/output"
output_path="s3://{}/{}".format(sess.default_bucket(), output_folder)

transformer = svm.transformer(instance_count=1,
                               instance_type='ml.m5.large',
                               output_path=output_path,
                               assemble_with='Line',
                               accept='text/csv')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
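The transform logs at the end of this notebook report `MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD`. Together with `split_type='Line'` and `assemble_with='Line'`, this roughly describes the following flow:

1. The input file is split on newlines into records.
2. Whole records are packed into mini-batches of at most MaxPayloadInMB.
3. Each mini-batch is sent to `/invocations`.
4. The outputs are joined back with newlines into a single output file.

The sketch below is a simplified local model of that flow, not SageMaker's actual implementation. The function names are hypothetical, and `predict_batch` stands for any callable that posts one mini-batch and returns the response bytes.

```python
def split_into_minibatches(payload, max_bytes=6 * 1024 * 1024):
    """Split on newlines (split_type='Line') and pack whole records into
    mini-batches of at most max_bytes (MULTI_RECORD strategy).
    A single record larger than max_bytes still ends up in its own batch."""
    batches, current, size = [], [], 0
    for record in payload.splitlines(keepends=True):
        if current and size + len(record) > max_bytes:
            batches.append(b"".join(current))
            current, size = [], 0
        current.append(record)
        size += len(record)
    if current:
        batches.append(b"".join(current))
    return batches


def assemble_with_line(responses):
    """Join per-batch outputs into one newline-separated result (assemble_with='Line')."""
    lines = []
    for response in responses:
        lines.extend(response.splitlines())
    return b"\n".join(lines) + b"\n"


def run_local_transform(payload, predict_batch, max_bytes=6 * 1024 * 1024):
    """Simulate one transform job: split, predict each mini-batch, reassemble."""
    batches = split_into_minibatches(payload, max_bytes)
    return assemble_with_line(predict_batch(batch) for batch in batches)
```

Because records are never split across mini-batches, each output line still lines up with one input line. That is why `split_type` and `assemble_with` are both set to `'Line'` for CSV data.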


#### Manually upload the test data (the file called payload.csv in the local test folder) to tryouts/input/data/ in the default bucket

In [20]:
data_test = "s3://{}/{}".format(sess.default_bucket(), "tryouts/input/data/payload.csv")
data_test

's3://sagemaker-us-east-1-563718358426/tryouts/input/data/payload.csv'

In [21]:
transformer.transform(data_test, content_type='text/csv', split_type='Line')
transformer.wait()

........................
2020-09-09T02:19:59.975:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
Starting the inference server with 2 workers.
2020/09/09 02:19:58 [crit] 10#10: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
169.254.255.130 - - [09/Sep/2020:02:19:58 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
[2020-09-09 02:19:59 +0000] [9] [INFO] Starting gunicorn 19.10.0
2020/09/09 02:19:59 [error] 10#10: *3 connect() to unix:/tmp/gunicorn.sock failed (111: Connection refused) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
[34m169.254.255.130 - 