# Arrhythmia Classification with the LSTM-FCN SageMaker Algorithm

**Blog post:** https://fg-research.com/blog/product/posts/lstm-fcn-ecg-classification.html

### 1. Environment set-up

1. This notebook contains elements that render correctly only in the Jupyter interface. Open it from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has the **AmazonSageMakerFullAccess** policy attached.
1. Some hands-on experience with [Amazon SageMaker](https://aws.amazon.com/sagemaker/) is assumed.
1. To use this algorithm successfully, ensure that one of the following holds:
    1. Your IAM role has these three permissions and you have the authority to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**
    2. Your AWS account already has a subscription to the [Time Series Classification (LSTM-FCN) Algorithm from AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-vzxmyw25oqtx6).

To subscribe to the algorithm:
1. Open the algorithm listing page.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you agree with the EULA, pricing, and support terms.
1. Click the **Continue to configuration** button and choose a **region**. A **Product ARN** is then displayed: this is the algorithm ARN that you need to specify when training a custom ML model. **Copy the ARN corresponding to your region and paste it into the following cell.**

In [1]:
# SageMaker algorithm ARN, replace the placeholder below with your AWS Marketplace ARN
algo_arn = "arn:aws:sagemaker:<...>"
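
Before training, it can help to confirm that the placeholder was actually replaced and that the ARN belongs to the region the notebook runs in. The sketch below is only an illustration: `parse_arn` and `check_algorithm_arn` are hypothetical helpers, not part of the SageMaker SDK, and the account ID and algorithm name in the example ARN are made up. It assumes the standard six-field ARN layout `arn:partition:service:region:account-id:resource`.

```python
def parse_arn(arn):
    """Split an ARN into its six colon-separated fields."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a complete ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region", "account", "resource")
    return dict(zip(keys, parts))


def check_algorithm_arn(arn, expected_region):
    """Return the parsed fields if this looks like a SageMaker algorithm ARN for the region."""
    fields = parse_arn(arn)
    if fields["service"] != "sagemaker" or not fields["resource"].startswith("algorithm/"):
        raise ValueError(f"not a SageMaker algorithm ARN: {arn!r}")
    if fields["region"] != expected_region:
        raise ValueError(f"ARN region {fields['region']!r} differs from {expected_region!r}")
    return fields


# Made-up ARN for illustration only; use the one shown on your Marketplace page.
example_arn = "arn:aws:sagemaker:eu-west-1:123456789012:algorithm/example-algorithm"
fields = check_algorithm_arn(example_arn, expected_region="eu-west-1")
print(fields["region"], fields["resource"])
```

The unreplaced placeholder `"arn:aws:sagemaker:<...>"` has fewer than six fields, so `parse_arn` rejects it. In the notebook, the expected region could be read from `sagemaker_session.boto_region_name` once the session has been created.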

In [2]:
!pip install imbalanced-learn



In [3]:
import io
import sagemaker
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from imblearn.under_sampling import RandomUnderSampler
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score, roc_auc_score

# SageMaker session
sagemaker_session = sagemaker.Session()

# SageMaker role
role = sagemaker.get_execution_role()

# S3 bucket
bucket = sagemaker_session.default_bucket()

# EC2 instance
instance_type = "ml.m5.2xlarge"

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


### 2. Data preparation

Load the training data.

In [4]:
training_dataset = pd.read_csv("mitbih_train.csv", header=None)

In [5]:
training_dataset.shape

(87554, 188)

In [6]:
training_dataset.head()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.977941 | 0.926471 | 0.681373 | 0.245098 | 0.154412 | 0.191176 | 0.151961 | 0.085784 | 0.058824 | 0.04902 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.960114 | 0.863248 | 0.461538 | 0.196581 | 0.094017 | 0.125356 | 0.099715 | 0.088319 | 0.074074 | 0.082621 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1.0 | 0.659459 | 0.186486 | 0.07027 | 0.07027 | 0.059459 | 0.056757 | 0.043243 | 0.054054 | 0.045946 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.925414 | 0.665746 | 0.541436 | 0.276243 | 0.196133 | 0.077348 | 0.071823 | 0.060773 | 0.066298 | 0.058011 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.967136 | 1.0 | 0.830986 | 0.586854 | 0.356808 | 0.248826 | 0.14554 | 0.089202 | 0.117371 | 0.150235 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [7]:
training_dataset.iloc[:, -1].sort_values().unique()

array([0., 1., 2., 3., 4.])

In [8]:
training_dataset.iloc[:, -1].rename(None).value_counts().sort_index()

0.0    72471
1.0     2223
2.0     5788
3.0      641
4.0     6431
Name: count, dtype: int64

Resample the training data.

In [9]:
sampler = RandomUnderSampler(random_state=42)
training_dataset = pd.concat(sampler.fit_resample(X=training_dataset.iloc[:, :-1], y=training_dataset.iloc[:, -1:]), axis=1)

In [10]:
training_dataset.shape

(3205, 188)

In [11]:
training_dataset.head()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10153 | 0.162791 | 0.540698 | 0.755814 | 0.186047 | 0.168605 | 0.546512 | 0.616279 | 0.697674 | 0.651163 | 0.703488 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 33886 | 0.990066 | 0.938742 | 0.344371 | 0.034768 | 0.273179 | 0.331126 | 0.326159 | 0.34106 | 0.347682 | 0.347682 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 32005 | 0.974239 | 0.932084 | 0.590164 | 0.131148 | 0.014052 | 0.168618 | 0.238876 | 0.210773 | 0.196721 | 0.208431 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 56159 | 0.978495 | 0.723118 | 0.526882 | 0.298387 | 0.22043 | 0.158602 | 0.091398 | 0.091398 | 0.080645 | 0.083333 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 61783 | 0.963351 | 0.709424 | 0.060209 | 0.013089 | 0.057592 | 0.041885 | 0.04712 | 0.034031 | 0.039267 | 0.044503 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [12]:
training_dataset.iloc[:, -1].sort_values().unique()

array([0., 1., 2., 3., 4.])

In [13]:
training_dataset.iloc[:, -1].rename(None).value_counts().sort_index()

0.0    641
1.0    641
2.0    641
3.0    641
4.0    641
Name: count, dtype: int64
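
The balanced counts above (641 rows per class, 3205 in total) follow from random undersampling: every class is sampled without replacement down to the size of the smallest class. The following is a minimal pure-Python sketch of that idea, using the class counts reported for the original training set. `random_undersample` is a hypothetical helper, not the `imbalanced-learn` implementation, and because it uses a different random number generator the selected rows will differ from those chosen by `RandomUnderSampler`.

```python
import random
from collections import Counter, defaultdict


def random_undersample(labels, seed=42):
    """Return the sorted row indices kept after cutting every class to the minority size."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)
    target = min(len(rows) for rows in rows_by_class.values())
    kept = []
    for label in sorted(rows_by_class):
        # Sample without replacement; the minority class keeps all of its rows.
        kept.extend(rng.sample(rows_by_class[label], target))
    return sorted(kept)


# Class counts reported for the original training set.
counts = {0: 72471, 1: 2223, 2: 5788, 3: 641, 4: 6431}
labels = [label for label, n in counts.items() for _ in range(n)]

kept = random_undersample(labels)
resampled_counts = Counter(labels[row] for row in kept)
print(len(kept), dict(resampled_counts))  # 3205 rows, 641 per class
```

Undersampling discards about 96% of the 87554 original rows, which trades training data for class balance.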

Fit a one-hot encoder on the class labels.

In [14]:
encoder = OneHotEncoder(sparse_output=False)
encoder.fit(training_dataset.iloc[:, -1:])

Apply the encoder to the class labels and rename the columns.

In [15]:
# Put the one-hot labels (y_1, ..., y_5) first, followed by the features (x_1, ..., x_187);
# the number of label columns is taken from the fitted encoder
training_dataset = pd.concat([
    pd.DataFrame(data=encoder.transform(training_dataset.iloc[:, -1:]), columns=[f"y_{i + 1}" for i in range(len(encoder.categories_[0]))]),
    pd.DataFrame(data=training_dataset.iloc[:, :-1].values, columns=[f"x_{i + 1}" for i in range(training_dataset.shape[1] - 1)])
], axis=1)

In [16]:
training_dataset.shape

(3205, 192)

In [17]:
training_dataset.head()

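| | y_1 | y_2 | y_3 | y_4 | y_5 | x_1 | x_2 | x_3 | x_4 | x_5 | ... | x_178 | x_179 | x_180 | x_181 | x_182 | x_183 | x_184 | x_185 | x_186 | x_187 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.162791 | 0.540698 | 0.755814 | 0.186047 | 0.168605 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.990066 | 0.938742 | 0.344371 | 0.034768 | 0.273179 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.974239 | 0.932084 | 0.590164 | 0.131148 | 0.014052 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.978495 | 0.723118 | 0.526882 | 0.298387 | 0.22043 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.963351 | 0.709424 | 0.060209 | 0.013089 | 0.057592 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |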
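
The mapping from the original label column to `y_1`, ..., `y_5` can be sketched without scikit-learn: the sorted unique labels define the column order, so label `0.0` becomes `y_1` and label `4.0` becomes `y_5`. The helper `one_hot` below is hypothetical and only mirrors what `OneHotEncoder` produces for a single label column.

```python
def one_hot(labels, categories):
    """Encode each label as a row of 0.0/1.0 values, one column per category."""
    position = {category: j for j, category in enumerate(categories)}
    columns = [f"y_{j + 1}" for j in range(len(categories))]
    rows = [
        [1.0 if position[label] == j else 0.0 for j in range(len(categories))]
        for label in labels
    ]
    return columns, rows


# Sorted unique labels found in the training set.
categories = [0.0, 1.0, 2.0, 3.0, 4.0]

columns, rows = one_hot([0.0, 3.0, 4.0], categories)
print(columns)
for row in rows:
    print(row)
```

Because the categories are fixed by the training labels, the same column order can be reused for any later dataset, which is why the notebook fits the encoder once and then only calls `transform`.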


Upload the training data to S3.

In [18]:
training_data = sagemaker_session.upload_string_as_file_body(
    body=training_dataset.to_csv(index=False),
    bucket=bucket,
    key="MITBIH_train.csv"
)

In [19]:
training_data

's3://sagemaker-eu-west-1-661670223746/MITBIH_train.csv'

### 3. Training

Fit the model to the training data.

In [20]:
estimator = sagemaker.algorithm.AlgorithmEstimator(
    algorithm_arn=algo_arn,
    role=role,
    instance_count=1,
    instance_type=instance_type,
    input_mode="File",
    sagemaker_session=sagemaker_session,
    hyperparameters={
        "num-layers": 1,
        "hidden-size": 128,
        "dropout": 0.8,
        "filters-1": 128,
        "filters-2": 256,
        "filters-3": 128,
        "kernel-size-1": 8,
        "kernel-size-2": 5,
        "kernel-size-3": 3,
        "batch-size": 256,
        "lr": 0.001,
        "epochs": 100,
        "task": "multiclass"
    },
)

estimator.fit({"training": training_data})

INFO:sagemaker:Creating training-job with name: lstm-fcn-v1-15-2024-09-06-18-03-49-599


2024-09-06 18:03:49 Starting - Starting the training job......
2024-09-06 18:04:32 Starting - Preparing the instances for training...
2024-09-06 18:05:12 Downloading - Downloading the training image..................
2024-09-06 18:08:09 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
  "cipher": algorithms.TripleDES,
  "class": algorithms.TripleDES,
2024-09-06 18:08:26,532 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-09-06 18:08:26,533 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-09-06 18:08:26,533 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-09-06 18:08:26,533 sagemaker-training-toolkit INFO     Failed to parse hyperparameter task value multiclass to Json.
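
To get a feel for the size of this training job, the `batch-size` and `epochs` hyperparameters can be combined with the 3205 rows of the resampled training set. The arithmetic below is a rough sketch under two assumptions that the notebook does not confirm: the algorithm trains on every uploaded row (no internal validation split) and keeps the final, smaller batch of each epoch.

```python
import math

n_samples = 3205   # rows in the resampled training set
batch_size = 256   # "batch-size" hyperparameter
epochs = 100       # "epochs" hyperparameter

# Mini-batches per epoch, counting the final partial batch.
steps_per_epoch = math.ceil(n_samples / batch_size)

# Rows left over for the final batch of each epoch.
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size

# Total number of gradient updates over the whole run.
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_steps)  # 13 133 1300
```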

### 4. Inference

Deploy the model to a real-time endpoint.

In [21]:
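# CSV serializer (not used below, because the endpoint is invoked through the runtime client)
serializer = sagemaker.serializers.CSVSerializer(content_type="text/csv")

# Deserializer that parses the CSV responses of the endpoint into pandas data frames
deserializer = sagemaker.base_deserializers.PandasDeserializer(accept="text/csv")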

In [22]:
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
)

INFO:sagemaker:Creating model package with name: lstm-fcn-v1-15-2024-09-06-18-18-16-806


.........

INFO:sagemaker:Creating model with name: lstm-fcn-v1-15-2024-09-06-18-18-16-806





INFO:sagemaker:Creating endpoint-config with name lstm-fcn-v1-15-2024-09-06-18-18-16-806
INFO:sagemaker:Creating endpoint with name lstm-fcn-v1-15-2024-09-06-18-18-16-806


----------!

Load the test data.

In [23]:
test_dataset = pd.read_csv("mitbih_test.csv", header=None)

In [24]:
test_dataset.shape

(21892, 188)

In [25]:
test_dataset.head()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.758264 | 0.11157 | 0.0 | 0.080579 | 0.078512 | 0.066116 | 0.049587 | 0.047521 | 0.035124 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.908425 | 0.783883 | 0.531136 | 0.362637 | 0.3663 | 0.344322 | 0.333333 | 0.307692 | 0.296703 | 0.300366 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.730088 | 0.212389 | 0.0 | 0.119469 | 0.10177 | 0.10177 | 0.110619 | 0.123894 | 0.115044 | 0.132743 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1.0 | 0.910417 | 0.68125 | 0.472917 | 0.229167 | 0.06875 | 0.0 | 0.004167 | 0.014583 | 0.054167 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.57047 | 0.399329 | 0.238255 | 0.147651 | 0.0 | 0.003356 | 0.040268 | 0.080537 | 0.07047 | 0.090604 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [26]:
test_dataset.iloc[:, -1].sort_values().unique()

array([0., 1., 2., 3., 4.])

In [27]:
test_dataset.iloc[:, -1].rename(None).value_counts().sort_index()

0.0    18118
1.0      556
2.0     1448
3.0      162
4.0     1608
Name: count, dtype: int64

One-hot encode the class labels and rename the columns.

In [28]:
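# Reuse the encoder fitted on the training labels, so that the y_1, ..., y_5 columns keep the
# same meaning even if a class were missing from the test set
test_dataset = pd.concat([
    pd.DataFrame(data=encoder.transform(test_dataset.iloc[:, -1:]), columns=[f"y_{i + 1}" for i in range(len(encoder.categories_[0]))]),
    pd.DataFrame(data=test_dataset.iloc[:, :-1].values, columns=[f"x_{i + 1}" for i in range(test_dataset.shape[1] - 1)])
], axis=1)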

In [29]:
test_dataset.shape

(21892, 192)

In [30]:
test_dataset.head()

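| | y_1 | y_2 | y_3 | y_4 | y_5 | x_1 | x_2 | x_3 | x_4 | x_5 | ... | x_178 | x_179 | x_180 | x_181 | x_182 | x_183 | x_184 | x_185 | x_186 | x_187 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.758264 | 0.11157 | 0.0 | 0.080579 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.908425 | 0.783883 | 0.531136 | 0.362637 | 0.3663 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.730088 | 0.212389 | 0.0 | 0.119469 | 0.10177 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.910417 | 0.68125 | 0.472917 | 0.229167 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.57047 | 0.399329 | 0.238255 | 0.147651 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |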


Invoke the endpoint.

In [31]:
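# Number of test rows sent to the endpoint in each request
batch_size = 100

# Collect the predictions of each batch and concatenate them once at the end
batches = []

for i in range(0, len(test_dataset), batch_size):

    # Send only the feature columns; the first 5 columns hold the one-hot labels
    response = sagemaker_session.sagemaker_runtime_client.invoke_endpoint(
        EndpointName=predictor.endpoint_name,
        ContentType="text/csv",
        Body=test_dataset.iloc[i:i + batch_size, 5:].to_csv(index=False)
    )

    # Parse the CSV response into a data frame
    batches.append(deserializer.deserialize(response["Body"], content_type="text/csv"))

predictions = pd.concat(batches, axis=0, ignore_index=True)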

In [32]:
predictions.shape

(21892, 10)
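
The invocation loop slices the test set into consecutive blocks of `batch_size` rows, which keeps each request payload small. The slicing arithmetic can be checked in isolation; `batch_slices` is a hypothetical helper that only reproduces the `range(0, len(test_dataset), batch_size)` pattern and sends nothing to an endpoint.

```python
def batch_slices(n_rows, batch_size):
    """Return the (start, stop) row ranges visited by a fixed-size batching loop."""
    return [
        (start, min(start + batch_size, n_rows))
        for start in range(0, n_rows, batch_size)
    ]


# 21892 test rows sent in batches of 100 rows.
slices = batch_slices(n_rows=21892, batch_size=100)

print(len(slices))   # 219 requests
print(slices[0])     # (0, 100)
print(slices[-1])    # (21800, 21892), i.e. a final batch of 92 rows
```

Every row is covered exactly once, so the concatenated predictions have the same number of rows, and the same row order, as the test set.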

In [33]:
predictions.head()

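| | y_1 | y_2 | y_3 | y_4 | y_5 | p_1 | p_2 | p_3 | p_4 | p_5 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0.994498 | 0.00444 | 0.00031 | 0.00072 | 3.3e-05 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0.619804 | 0.23262 | 0.019893 | 0.124084 | 0.003599 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0.675945 | 0.32058 | 0.000104 | 0.001381 | 0.001989 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0.991197 | 2.5e-05 | 0.001224 | 0.007459 | 9.5e-05 |
| 4 | 1 | 0 | 0 | 0 | 0 | 0.998229 | 0.000143 | 0.00071 | 0.000499 | 0.00042 |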
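
The response contains the predicted one-hot labels (`y_1`, ..., `y_5`) next to the predicted class probabilities (`p_1`, ..., `p_5`). In the five rows shown, the predicted label is the class with the largest probability. The sketch below reproduces that relationship for these rows; that the algorithm always derives its labels by taking the arg max is an assumption, not something the notebook states, and `to_one_hot` is a hypothetical helper.

```python
# Probability columns p_1, ..., p_5 of the first five prediction rows.
probabilities = [
    [0.994498, 0.00444, 0.00031, 0.00072, 3.3e-05],
    [0.619804, 0.23262, 0.019893, 0.124084, 0.003599],
    [0.675945, 0.32058, 0.000104, 0.001381, 0.001989],
    [0.991197, 2.5e-05, 0.001224, 0.007459, 9.5e-05],
    [0.998229, 0.000143, 0.00071, 0.000499, 0.00042],
]


def to_one_hot(row):
    """Set a 1 at the position of the largest probability and 0 elsewhere."""
    best = max(range(len(row)), key=row.__getitem__)
    return [1 if j == best else 0 for j in range(len(row))]


predicted = [to_one_hot(row) for row in probabilities]
for row in predicted:
    print(row)
```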


Calculate the classification metrics.

In [34]:
metrics = pd.DataFrame(columns=[c.replace("y_", "") for c in test_dataset.columns if c.startswith("y")])
for c in metrics.columns:
    metrics[c] = {
        "Accuracy": accuracy_score(y_true=test_dataset[f"y_{c}"], y_pred=predictions[f"y_{c}"]),
        "ROC-AUC": roc_auc_score(y_true=test_dataset[f"y_{c}"], y_score=predictions[f"p_{c}"]),
        "Precision": precision_score(y_true=test_dataset[f"y_{c}"], y_pred=predictions[f"y_{c}"]),
        "Recall": recall_score(y_true=test_dataset[f"y_{c}"], y_pred=predictions[f"y_{c}"]),
        "F1": f1_score(y_true=test_dataset[f"y_{c}"], y_pred=predictions[f"y_{c}"]),
    }
metrics.columns = test_dataset.columns[:5]

In [35]:
metrics

| | y_1 | y_2 | y_3 | y_4 | y_5 |
|---|---|---|---|---|---|
| Accuracy | 0.93404 | 0.976064 | 0.981774 | 0.969487 | 0.994747 |
| ROC-AUC | 0.983822 | 0.941757 | 0.992107 | 0.994527 | 0.999399 |
| Precision | 0.987772 | 0.521164 | 0.831752 | 0.190709 | 0.953799 |
| Recall | 0.931836 | 0.708633 | 0.908149 | 0.962963 | 0.975746 |
| F1 | 0.958989 | 0.60061 | 0.868273 | 0.318367 | 0.964648 |
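
Each column treats one class against all the others, which explains why `y_4` combines high accuracy and recall with low precision: the class has only 162 test samples out of 21892, so even a modest number of false positives outweighs the true positives. The worked example below uses one-vs-rest counts for `y_4` that were back-solved from the reported precision, recall and class support. They reproduce the reported values to six decimals, but they are an inference and are not printed anywhere in the notebook; `one_vs_rest_metrics` is a hypothetical helper.

```python
def one_vs_rest_metrics(tp, fp, fn, tn):
    """Compute binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"Accuracy": accuracy, "Precision": precision, "Recall": recall, "F1": f1}


n_test = 21892  # rows in the test set
support = 162   # test samples of class 3 (column y_4)

# Counts inferred from the reported metrics (an assumption, see above).
tp = 156                      # class-3 beats correctly detected
fn = support - tp             # class-3 beats missed
fp = 662                      # other beats flagged as class 3
tn = n_test - tp - fn - fp    # other beats correctly rejected

y4 = one_vs_rest_metrics(tp, fp, fn, tn)
for name, value in y4.items():
    print(f"{name}: {value:.6f}")
```

With these counts, 818 beats are flagged as class 3 and only 156 of them are correct, while the 21068 true negatives keep the accuracy close to 0.97.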


Delete the model.

In [36]:
predictor.delete_model()

INFO:sagemaker:Deleting model with name: lstm-fcn-v1-15-2024-09-06-18-18-16-806


Delete the endpoint.

In [37]:
predictor.delete_endpoint()

INFO:sagemaker:Deleting endpoint configuration with name: lstm-fcn-v1-15-2024-09-06-18-18-16-806
INFO:sagemaker:Deleting endpoint with name: lstm-fcn-v1-15-2024-09-06-18-18-16-806
