### Setup

First, we need to install the relevant libaries required for this local training usecase. You will need to pip install boto3, pandas and Sagemaker into your Python environment. For this lab, running Python 3.9 as your kernel is recommended. Please see the requirements.txt for what's needed.


In [1]:
import tarfile
import boto3
import pandas as pd
import os
from sagemaker.estimator import Estimator
from sagemaker.local import LocalSession
from sagemaker.predictor import csv_serializer

In [2]:
from __future__ import print_function

import json
import os
import pickle
import sys
import traceback

import pandas as pd
from causalnex.discretiser import Discretiser
import warnings
from causalnex.structure import StructureModel
from sklearn.model_selection import train_test_split
from causalnex.network import BayesianNetwork
from causalnex.evaluation import classification_report
from causalnex.evaluation import roc_auc

  from .autonotebook import tqdm as notebook_tqdm


The next step is to create a SageMaker Local session. Please insert the ARN for Sagemeker execution role below. You can create an IAM role in your AWS account and ensure it has permissions to SageMaker, S3 and ECR.


In [3]:
sagemaker_session = LocalSession()
sagemaker_session.config = {'local': {'local_code': True}}

role = 'arn:aws:iam::403775705461:role/SageMaker-IAM-Role-AB3'

Define the data location, which is apart of this repository.


In [4]:
data_location = "./data/heart_failure_clinical_records_dataset.csv"

### Data Pre-processing

In the cell, we are doing data pre-processing to make our dataset ML friendly. This code is using a Discretiser class from the causalnex library to transform a continuous feature into a discrete one. 

Taking age as an example, we are using an numeric_split_points=[60], which means it will split the data into two bins: below 60 and above 60. Similar approaches are used on other columns.

In [5]:
from causalnex.discretiser import Discretiser
import pandas as pd

initial_df = pd.read_csv(data_location)

initial_df["age"] = Discretiser(method="fixed", numeric_split_points=[60]).transform(
    initial_df["age"].values
)
initial_df["serum_sodium"] = Discretiser(method="fixed", numeric_split_points=[136]).transform(
    initial_df["serum_sodium"].values
)
initial_df["serum_creatinine"] = Discretiser(
    method="fixed", numeric_split_points=[1.1, 1.4]
).transform(initial_df["serum_sodium"].values)

initial_df["ejection_fraction"] = Discretiser(
    method="fixed", numeric_split_points=[30, 38, 42]
).transform(initial_df["ejection_fraction"].values)

initial_df["creatinine_phosphokinase"] = Discretiser(
    method="fixed", numeric_split_points=[120, 540, 670]
).transform(initial_df["creatinine_phosphokinase"].values)

initial_df["platelets"] = Discretiser(method="fixed", numeric_split_points=[263358]).transform(
    initial_df["platelets"].values
)

print ("Dataset after pre-processing")
initial_df.head()

Dataset after pre-processing


Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,1,0,2,0,0,1,1,0,0,1,0,4,1
1,0,0,3,0,2,0,1,0,1,1,0,6,1
2,1,0,1,0,0,0,0,0,0,1,1,7,1
3,0,1,0,0,0,0,0,0,1,1,0,7,1
4,1,1,1,1,0,0,1,0,0,0,0,8,1


### Training

In the next cell, we will be training the model on our dataset from above. You should see training information when the training is complete.

In [6]:
sm = StructureModel()
sm.add_edges_from([
    ('ejection_fraction', 'DEATH_EVENT'),
    ('creatinine_phosphokinase', 'DEATH_EVENT'),
    ('age','DEATH_EVENT'),
    ('smoking','high_blood_pressure'),
    ('age','high_blood_pressure'),            
    ('serum_sodium','DEATH_EVENT'),
    ('high_blood_pressure','DEATH_EVENT'),
    ('anaemia','DEATH_EVENT'),
    ('creatinine_phosphokinase','DEATH_EVENT'),
    ('smoking','DEATH_EVENT')
])

train, test = train_test_split(initial_df, train_size=0.8, test_size=0.2, random_state=42)
        
bn = BayesianNetwork(sm)
bn = bn.fit_node_states(initial_df)
bn = bn.fit_cpds(train, method="BayesianEstimator", bayes_prior="K2")

roc, auc = roc_auc(bn, test, "DEATH_EVENT")
print("Model AUC: " + str(auc))

print(classification_report(bn, test, "DEATH_EVENT"))

# save the model
model_path = "models"
isExist = os.path.exists(model_path)
if not isExist:
   os.makedirs(model_path)
with open(os.path.join(model_path, 'causal_model.pkl'), 'wb') as out:
    pickle.dump(bn, out)

Model AUC: 0.7368055555555555
{'DEATH_EVENT_0': {'precision': 0.6122448979591837, 'recall': 0.8571428571428571, 'f1-score': 0.7142857142857143, 'support': 35.0}, 'DEATH_EVENT_1': {'precision': 0.5454545454545454, 'recall': 0.24, 'f1-score': 0.3333333333333333, 'support': 25.0}, 'accuracy': 0.6, 'macro avg': {'precision': 0.5788497217068646, 'recall': 0.5485714285714285, 'f1-score': 0.5238095238095238, 'support': 60.0}, 'weighted avg': {'precision': 0.5844155844155844, 'recall': 0.6, 'f1-score': 0.5555555555555555, 'support': 60.0}}


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col] = df[col].map(self._node_states[col])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col] = df[col].map(self._node_states[col])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col] = df[col].map(self._node_states[col])
A value is trying to be set on a copy of a slice from a DataFrame.


### Deploy the Docker Container Locally

For this step, we need to create a docker image in our local environment. Please ensure you have 'sagemaker-causalnex-local' in your Docker Images. 

1. Please ensure Docker is running locally
2. Please run 'docker build -t sagemaker-causal-nex:latest .' in the 'sagemaker-local-to-cloud/local/container' path
3. From the previous steps, please insert the IAM role from your AWS account with permissions to access SageMaker, S3 and ECR and insert the ARN in the 'role_value' section

In [7]:
from sagemaker.local import LocalSession

image = 'sagemaker-causal-nex'

env={
    "MODEL_SERVER_WORKERS":"2"
    }

local_regressor = Estimator(
    image,
    role = 'arn:aws:iam::403775705461:role/SageMaker-IAM-Role-AB3',
    train_instance_count=1,
    train_instance_type="local")

train_location = 'file://'+data_location

local_regressor.fit(train_location, logs=True)



 Network sagemaker-local  Creating
 Network sagemaker-local  Created
 Container tmp7mz9wu3x-algo-1-eoff0-1  Creating
 Container tmp7mz9wu3x-algo-1-eoff0-1  Created
Attaching to tmp7mz9wu3x-algo-1-eoff0-1
tmp7mz9wu3x-algo-1-eoff0-1  | Starting the training.
tmp7mz9wu3x-algo-1-eoff0-1  | Model AUC: 0.7368055555555555
tmp7mz9wu3x-algo-1-eoff0-1  | {'DEATH_EVENT_0': {'precision': 0.6122448979591837, 'recall': 0.8571428571428571, 'f1-score': 0.7142857142857143, 'support': 35.0}, 'DEATH_EVENT_1': {'precision': 0.5454545454545454, 'recall': 0.24, 'f1-score': 0.3333333333333333, 'support': 25.0}, 'accuracy': 0.6, 'macro avg': {'precision': 0.5788497217068646, 'recall': 0.5485714285714285, 'f1-score': 0.5238095238095238, 'support': 60.0}, 'weighted avg': {'precision': 0.5844155844155844, 'recall': 0.6, 'f1-score': 0.5555555555555555, 'support': 60.0}}
tmp7mz9wu3x-algo-1-eoff0-1  | Training complete.
tmp7mz9wu3x-algo-1-eoff0-1 exited with code 0
Aborting on container exit...
 Container tmp7mz9wu

Now we can launch this container and run it!

In [8]:
predictor = local_regressor.deploy(1, 'local', env=env)



Attaching to tmpgn8jwpla-algo-1-slbba-1
tmpgn8jwpla-algo-1-slbba-1  | Starting the inference server with 2 workers.
tmpgn8jwpla-algo-1-slbba-1  | [2023-08-04 14:50:51 +0000] [10] [INFO] Starting gunicorn 21.2.0
tmpgn8jwpla-algo-1-slbba-1  | [2023-08-04 14:50:51 +0000] [10] [INFO] Listening at: unix:/tmp/gunicorn.sock (10)
tmpgn8jwpla-algo-1-slbba-1  | [2023-08-04 14:50:51 +0000] [10] [INFO] Using worker: sync
tmpgn8jwpla-algo-1-slbba-1  | [2023-08-04 14:50:51 +0000] [12] [INFO] Booting worker with pid: 12
tmpgn8jwpla-algo-1-slbba-1  | [2023-08-04 14:50:51 +0000] [13] [INFO] Booting worker with pid: 13
!tmpgn8jwpla-algo-1-slbba-1  | 172.18.0.1 - - [04/Aug/2023:14:50:56 +0000] "GET /ping HTTP/1.1" 200 1 "-" "python-urllib3/1.26.10"


### Inference

We can now send a sample JSON payload for inference with our model. 

In [None]:
test_data = open('payload.json')
test_data1 = '{"age": 1, "anaemia": 0, "creatinine_phosphokinase": 2, "diabetes": 0, "ejection_fraction": 0, "high_blood_pressure": 1, "platelets": 1, "serum_creatinine": 0, "serum_sodium": 0, "sex": 1, "smoking": 0, "time": 4}'

#with open('payload.json') as f:
#    d = json.load(f)
#    s = json.dumps(d)
#    print(d)

with open('payload.json') as f:
    test = json.load(f)


# s = json.dumps(test_data1)
print (test)
print(type(test)) 
# print (s)

In [None]:
predicted = predictor.predict(test["data"]).decode('utf-8')
# predicted = predictor.predict(test["data"]).decode('utf-8')

# predicted = predictor.predict(s)

In [None]:
print(predicted)

### Push the Container to ECR on the AWS Cloud

At this Point, we have successfully launched the container on our local machine and we are able to send inference commands. We would now like to push this container to the ECR repository on the AWS Cloud.  

We start by defining some variables like the current execution role, the ECR repository that we are going to use for pushing the custom Docker container and a default Amazon S3 bucket to be used by Amazon SageMaker.

In [10]:
import sagemaker

ecr_namespace = "sagemaker-local-training-containers/"
prefix = "local-training"

ecr_repository_name = ecr_namespace + prefix
account_id = role.split(":")[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
bucket = sagemaker_session.default_bucket()

print(account_id)
print(region)
print(role)
print(bucket)


403775705461
eu-west-1
arn:aws:iam::403775705461:role/SageMaker-IAM-Role-AB3
sagemaker-eu-west-1-403775705461


Let's take a look at the Dockerfile which defines the statements for building our custom SageMaker training container:


In [9]:
! pygmentize ../local/container/Dockerfile

[37m# Build an image that can do training and inference in SageMaker[39;49;00m[37m[39;49;00m
[37m# This is a Python 3 image that uses the nginx, gunicorn, flask stack[39;49;00m[37m[39;49;00m
[37m# for serving inferences in a stable way.[39;49;00m[37m[39;49;00m
[37m[39;49;00m
[34mFROM[39;49;00m[37m [39;49;00m[33mpython:3.9-slim[39;49;00m[37m[39;49;00m
[37m[39;49;00m
[37m# Jonah insertion 1[39;49;00m[37m[39;49;00m
[34mRUN[39;49;00m[37m [39;49;00mpip[37m [39;49;00minstall[37m [39;49;00msagemaker-training[37m[39;49;00m
[37m[39;49;00m
[34mRUN[39;49;00m[37m [39;49;00mapt-get[37m [39;49;00m-y[37m [39;49;00mupdate[37m [39;49;00m&&[37m [39;49;00mapt-get[37m [39;49;00minstall[37m [39;49;00m-y[37m [39;49;00m--no-install-recommends[37m [39;49;00m[33m\[39;49;00m
[37m         [39;49;00mwget[37m [39;49;00m[33m\[39;49;00m
[37m         [39;49;00mnginx[37m [39;49;00m[33m\[39;49;00m
[37m         [39;49;00mca-certificates[37m[39

At high-level the Dockerfile specifies the following operations for building this container:

TODO

### Build and push the container
We are now ready to build this container and push it to Amazon ECR. It will create a new repo in ECR for you. Please ensure you have the correct IAM permissions to push to the ECR.

In the below cell we are building, tagging, authenticating and pushing the container to ECR with the above variables.

In [11]:
! docker build -f ../local/container/Dockerfile -t sagemaker-local-training-containers/tutorial ../local/container
! docker tag sagemaker-local-training-containers/tutorial {account_id}.dkr.ecr.{region}.amazonaws.com/sagemaker-local-training-containers/local-training:latest
! aws ecr get-login --no-include-email --registry-ids {account_id}
! aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com
! aws ecr describe-repositories --repository-names sagemaker-local-training-containers/local-training || aws ecr create-repository --repository-name sagemaker-local-training-containers/local-training
! docker push {account_id}.dkr.ecr.{region}.amazonaws.com/sagemaker-local-training-containers/local-training:latest



[1A[1B[0G[?25l[+] Building 0.0s (0/0)                                                         
[?25h[1A[0G[?25l[+] Building 0.1s (2/3)                                                         
[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 1.17kB                                     0.0s
[0m => [internal] load metadata for docker.io/library/python:3.9-slim         0.1s
[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (2/3)                                                         
[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load build definition from Dockerfile        

Alternaitvely you can use the below cell and leverage the python script

In [None]:
# %%capture
# ! ../scripts/build_and_push.sh $account_id $region $ecr_repository_name

Shut down the endpoint we created for this tutorial.

In [None]:
predictor.delete_endpoint()