## Introduction

This notebook uses a binary classification algorithm to predict whether a patient has heart disease. In this example, we load sample data from the Cleveland Heart Disease dataset, taken from the UCI repository (via Kaggle). The dataset contains records for 1025 individuals. See the data repository for column descriptions and sample data.

In [12]:

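import boto3
from sagemaker import get_execution_role

# S3 bucket and key prefix used for the training data and model artifacts
bucket = 'sagemaker-heartify'
prefix = 'sagemaker/heart'

# Location of the raw CSV file in the bucket
data_key = 'heart.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

# IAM role that SageMaker assumes for training and hosting
role = get_execution_role()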

### Data ingestion

In [9]:
import pandas as pd
import json

heart_data = pd.read_csv(data_location)
heart_data.head()

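|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 | 0 |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 | 0 |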


In [11]:
heart_data.count()

age         1025
sex         1025
cp          1025
trestbps    1025
chol        1025
fbs         1025
restecg     1025
thalach     1025
exang       1025
oldpeak     1025
slope       1025
ca          1025
thal        1025
target      1025
dtype: int64

### Data conversion

The Linear Learner algorithm expects a feature matrix and a label vector.
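To make the feature matrix and label vector concrete, here is a minimal sketch that splits the first three rows shown by `heart_data.head()` above. Plain Python lists stand in for the NumPy arrays the notebook uses, so it runs without dependencies. `split_features_labels` is a hypothetical helper introduced for illustration, not part of pandas, NumPy, or SageMaker, and it assumes the label is the last column.

```python
# First three rows of heart.csv as shown by heart_data.head():
# 13 feature columns followed by the target column.
rows = [
    [52, 1, 0, 125, 212, 0, 1, 168, 0, 1.0, 2, 2, 3, 0],
    [53, 1, 0, 140, 203, 1, 0, 155, 1, 3.1, 0, 0, 3, 0],
    [70, 1, 0, 145, 174, 0, 1, 125, 1, 2.6, 0, 0, 3, 0],
]


def split_features_labels(rows):
    """Split rows into a feature matrix and a label vector.

    Assumption: the label is the last column of every row.
    """
    features = [[float(value) for value in row[:-1]] for row in rows]
    labels = [float(row[-1]) for row in rows]
    return features, labels


features, labels = split_features_labels(rows)

# The number of feature columns is the value later passed as feature_dim.
feature_dim = len(features[0])
print(feature_dim, labels)
```

Deriving `feature_dim` from the data instead of hardcoding it keeps the hyperparameter in step with the dataset if columns are added or dropped.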


In [13]:
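import numpy as np

# Convert the DataFrame to a float32 NumPy array
vectors = np.array(heart_data).astype('float32')

# Column 13 (target) is the label; columns 0-12 are the features
labels = vectors[:, 13]
print("label data is")
print(labels)
training_data = vectors[:, :13]
print("Training data is")
print(training_data)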



label data is
[0. 0. 0. ... 0. 1. 0.]
Training data is
[[52.  1.  0. ...  2.  2.  3.]
 [53.  1.  0. ...  0.  0.  3.]
 [70.  1.  0. ...  0.  0.  3.]
 ...
 [47.  1.  0. ...  1.  1.  2.]
 [50.  0.  0. ...  2.  0.  2.]
 [54.  1.  0. ...  1.  1.  3.]]


In [14]:
import io
import sagemaker.amazon.common as smac

# Serialize the features and labels into RecordIO-protobuf format in memory
buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, training_data, labels)
buf.seek(0)

# Upload the buffer to S3; S3 keys always use '/' regardless of the local OS
key = 'recordio-pb-data'
train_key = '{}/train/{}'.format(prefix, key)
boto3.resource('s3').Bucket(bucket).Object(train_key).upload_fileobj(buf)
s3_train_data = 's3://{}/{}'.format(bucket, train_key)
print('uploaded training data location: {}'.format(s3_train_data))



uploaded training data location: s3://sagemaker-heartify/sagemaker/heart/train/recordio-pb-data


In [15]:
output_location = 's3://{}/{}/output'.format(bucket, prefix)
print('training artifacts will be uploaded to: {}'.format(output_location))

training artifacts will be uploaded to: s3://sagemaker-heartify/sagemaker/heart/output


## Training the linear model

We will perform binary classification (a patient either has heart disease or does not), train the model on the specified compute instance (e.g. ml.m4.xlarge), and specify the number of features (dimensions) in our training set.
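The training cell below was written against v1 of the SageMaker Python SDK, and the warnings in its output name several arguments that were renamed in v2. As a rough sketch, that mapping can be written down explicitly. `V1_TO_V2_KWARGS` and `to_v2_kwargs` are hypothetical helpers introduced here for illustration, and the commented-out lines at the end assume SDK v2 is installed.

```python
# Keyword renames reported by the deprecation warnings in this notebook
# (SageMaker Python SDK v1 name -> v2 name).
V1_TO_V2_KWARGS = {
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
}


def to_v2_kwargs(v1_kwargs):
    """Return a copy of v1_kwargs with renamed keys replaced by their v2 names."""
    return {V1_TO_V2_KWARGS.get(name, name): value
            for name, value in v1_kwargs.items()}


v1_kwargs = {
    "train_instance_count": 1,
    "train_instance_type": "ml.m4.xlarge",
    "output_path": "s3://sagemaker-heartify/sagemaker/heart/output",
}
v2_kwargs = to_v2_kwargs(v1_kwargs)
print(v2_kwargs)

# With SDK v2 installed, the estimator could then be built roughly like this
# (not executed here; region, role and sess come from the notebook):
#   from sagemaker import image_uris
#   container = image_uris.retrieve("linear-learner", region)
#   linear = sagemaker.estimator.Estimator(image_uri=container, role=role,
#                                          sagemaker_session=sess, **v2_kwargs)
```

Keys that were not renamed, such as `output_path`, pass through unchanged.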

In [16]:
from sagemaker.amazon.amazon_estimator import get_image_uri
import sagemaker

container = get_image_uri(boto3.Session().region_name, 'linear-learner', "latest")

sess = sagemaker.Session()
linear = sagemaker.estimator.Estimator(container,
                                       role, 
                                       train_instance_count=1, 
                                       train_instance_type='ml.m4.xlarge',
                                       output_path=output_location,
                                       sagemaker_session=sess)
linear.set_hyperparameters(feature_dim=13,
                           predictor_type='binary_classifier',
                           mini_batch_size=100)

linear.fit({'train': s3_train_data})

The method get_image_uri has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
Defaulting to the only supported framework/algorithm version: 1. Ignoring framework/algorithm version: latest.
train_instance_count has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
train_instance_type has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


2022-05-15 19:19:20 Starting - Starting the training job...
2022-05-15 19:19:46 Starting - Preparing the instances for trainingProfilerReport-1652642360: InProgress
.........
2022-05-15 19:21:08 Downloading - Downloading input data...
2022-05-15 19:21:45 Training - Downloading the training image.....
Docker entrypoint called with argument(s): train
Running default environment configuration script
[05/15/2022 19:22:36 INFO 139914211612480] Reading default configuration from /opt/amazon/lib/python3.7/site-packages/algorithm/resources/default-input.json: {'mini_batch_size': '1000', 'epochs': '15', 'feature_dim': 'auto', 'use_bias': 'true', 'binary_classifier_model_selection_criteria': 'accuracy', 'f_beta': '1.0', 'target_recall': '0.8', 'target_precision': '0.8', 'num_models': 'auto', 'num_calibration_samples': '10000000', 'init_method': 'uniform', 'init_scale': '0.07', 'init_sigma': '0.01', 'init_bias': '0.0', 'optimizer': 'auto', 'loss': 'auto', 'margin': '1.0', 'q

## Hosting for the model

In [21]:
heartdisease_predictor = linear.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')

---------!

## Validate the model for use
Finally, we can validate the model for use. Predictions are obtained by sending HTTP POST requests to the endpoint. To make this easier, we again use the Amazon SageMaker Python SDK and specify how to serialize requests and deserialize responses for the algorithm.
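To picture what serializing a request and deserializing a response involves, here is a minimal sketch of a CSV request body and a JSON response parser. It only imitates the wire format; the SDK's own serializer may format numbers differently. `to_csv_payload` and `parse_prediction` are hypothetical names, and the feature vector and response body are the ones printed later in this notebook.

```python
import json


def to_csv_payload(features):
    """Serialize one feature vector as a single CSV line."""
    return ",".join(str(value) for value in features)


def parse_prediction(body):
    """Extract (score, predicted_label) from a JSON response body.

    Assumes the response shape printed in this notebook:
    {"predictions": [{"score": ..., "predicted_label": ...}]}
    """
    first = json.loads(body)["predictions"][0]
    return first["score"], first["predicted_label"]


# Feature vector of row 5, as printed in the testing section.
features = [58.0, 0.0, 0.0, 100.0, 248.0, 0.0, 0.0, 122.0, 0.0, 1.0, 1.0, 0.0, 2.0]
payload = to_csv_payload(features)
print(payload)

# Response body in the shape returned for that row.
response_body = '{"predictions": [{"score": 0.7061784267425537, "predicted_label": 1}]}'
score, label = parse_prediction(response_body)
print(score, label)
```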

In [22]:
from sagemaker.predictor import csv_serializer, json_deserializer
heartdisease_predictor.serializer = csv_serializer
heartdisease_predictor.deserializer = json_deserializer

In [23]:
print('Endpoint name: {}'.format(heartdisease_predictor.endpoint))

The endpoint attribute has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


Endpoint name: linear-learner-2022-05-15-19-36-25-040


## Testing the model

In [40]:
vectors[5][0:13]

array([ 58.,   0.,   0., 100., 248.,   0.,   0., 122.,   0.,   1.,   1.,
         0.,   2.], dtype=float32)

In [42]:
result = heartdisease_predictor.predict(vectors[5][0:13])
print(result)

The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
The json_deserializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


{'predictions': [{'score': 0.7061784267425537, 'predicted_label': 1}]}
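A single prediction says little about model quality. As a rough sketch, responses of the shape printed above could be compared against known labels to get an accuracy figure. `accuracy` is a hypothetical helper, and the responses and labels below are made-up placeholders for illustration, not output from the endpoint.

```python
def accuracy(responses, true_labels):
    """Fraction of responses whose predicted_label matches the true label."""
    predicted = [r["predictions"][0]["predicted_label"] for r in responses]
    matches = sum(int(p == int(t)) for p, t in zip(predicted, true_labels))
    return matches / len(true_labels)


# Placeholder responses in the shape returned by the endpoint (illustrative values).
responses = [
    {"predictions": [{"score": 0.71, "predicted_label": 1}]},
    {"predictions": [{"score": 0.12, "predicted_label": 0}]},
    {"predictions": [{"score": 0.64, "predicted_label": 1}]},
    {"predictions": [{"score": 0.33, "predicted_label": 0}]},
]
# Placeholder ground-truth labels, stored as floats like the notebook's label vector.
true_labels = [1.0, 0.0, 0.0, 0.0]

print(accuracy(responses, true_labels))  # 3 of 4 labels match
```

In the notebook, the responses would come from calling `heartdisease_predictor.predict` on each row and the labels from column 13 of `vectors`.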


In [None]:
import sagemaker

# Delete the endpoint when finished so it stops incurring charges
sagemaker.Session().delete_endpoint(heartdisease_predictor.endpoint)