In [None]:
# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import mlflow
from tensorflow.keras import Model, layers, Sequential, regularizers
from tensorflow.keras.layers import (Dense, Activation, Flatten, Convolution1D, Dropout,
                                     MaxPooling1D, GlobalAveragePooling1D)
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from utils import get_train_data, get_train_split, get_test_split, preprocess

In [None]:
# Get the features and target variables from the train and test datasets respectively
X_train, y_train = get_train_split()
X_test, y_test = get_test_split()

In [None]:
#Preprocess before model training
X_train_processed, y_train_processed, X_test_ = preprocess(X_train,y_train,X_test)

In [None]:
#Convert labels to one-hot encoded format
y_train_processed = to_categorical(y_train_processed, num_classes=5)
y_test = to_categorical(y_test, num_classes=5)
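
For reference, one-hot encoding turns each integer class label into a vector with a single 1 at the label's index. The sketch below reproduces that idea in plain Python. `one_hot` and `argmax` are illustrative helpers, not part of Keras, and the sample labels are made up.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a 1.0 at the label's index and 0.0 elsewhere."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)] for label in labels]

def argmax(row):
    """Return the index of the largest value in a row."""
    return max(range(len(row)), key=lambda i: row[i])

labels = [0, 3, 4]                       # made-up integer class labels
encoded = one_hot(labels, num_classes=5)
decoded = [argmax(row) for row in encoded]

print(encoded[1])   # [0.0, 0.0, 0.0, 1.0, 0.0]
print(decoded)      # [0, 3, 4]
```

Taking the argmax of each row recovers the original class index, which is the same conversion applied to the predictions later in this notebook.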

### Convolutional Neural Network (CNN)
A CNN can classify raw ECG signals directly, without a separate hand-crafted feature-extraction step, and often outperforms conventional approaches on this kind of task. Its layers capture local patterns and features at different levels of abstraction. The 1D-CNN architecture used here is built from convolutional, max-pooling, and dense layers: the convolutional layers learn distinct nonlinear features from the ECG data, and the dense layers categorise each heartbeat into one of five classes. CNNs also use parameter sharing, where the same set of weights is applied to many regions of the input. Sharing parameters makes CNNs more computationally efficient and less prone to overfitting, which matters most when training data is scarce.

Some potential challenges associated with CNNs are as follows:
* CNN models are often described as "black-box" models because it is difficult to inspect and understand how they reach a decision. Interpretability is especially important in healthcare, since it builds confidence in the model and gives insight into the clinical consequences of its predictions.
* CNN models can be computationally demanding, particularly with large, complex architectures or large datasets. Achieving reasonable training and inference times may require high-performance hardware or distributed computing resources.

These challenges can be addressed by exploring interpretability methods specific to CNN models in healthcare applications. Additionally, optimizing and parallelizing computations on hardware accelerators (e.g., GPUs) can help overcome computational resource constraints.
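
To make the parameter-sharing argument concrete, the following back-of-the-envelope sketch compares the weight count of a 1D convolution with that of a dense layer producing an output of the same size. The sizes (29 input samples, kernel size 5, 32 filters) match the first layer of the model defined in this notebook. The helper names are illustrative.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter; independent of the input length."""
    return (kernel_size * in_channels + 1) * filters

def dense_params(n_inputs, n_outputs):
    """One weight per input-output pair plus one bias per output."""
    return (n_inputs + 1) * n_outputs

input_length, kernel_size, filters = 29, 5, 32
output_length = input_length - kernel_size + 1   # 'valid' padding, stride 1

conv = conv1d_params(kernel_size, in_channels=1, filters=filters)
dense = dense_params(input_length, output_length * filters)

print(conv)    # 192
print(dense)   # 24000
```

The convolution reuses the same 5-sample kernels at every position, so its parameter count does not grow with the length of the signal.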

In [None]:
# Define model architecture
model= Sequential()
model.add(Convolution1D(32,5,activation='relu',input_shape=(29,1)))
model.add(Convolution1D(64,5,activation='relu'))         
model.add(MaxPooling1D(3))
model.add(Convolution1D(128, 3, activation='relu'))
model.add(Convolution1D(256, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.3))
model.add(Dense(1024,activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(5,activation='softmax'))

# Compile the model
learning_rate = 0.001  # Setting the desired learning rate

optimizer = Adam(learning_rate=learning_rate)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])


# Define early stopping callback
early_stopping = EarlyStopping(patience=3, monitor='val_loss')

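# Train with early stopping on the preprocessed features and one-hot encoded labels,
# which is the label format the categorical cross-entropy loss expects
trained_model = model.fit(X_train_processed, y_train_processed, epochs=5, batch_size=64,
                          validation_split=0.2, callbacks=[early_stopping])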
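
As a sanity check on the architecture, the sketch below traces how the sequence length changes through the convolution and pooling layers. It assumes the Keras defaults that the model code relies on ('valid' padding, stride 1 for convolutions, and a pooling stride equal to the pool size). The helper names are illustrative.

```python
def conv_out(length, kernel_size):
    """Output length of a 'valid' convolution with stride 1."""
    return length - kernel_size + 1

def pool_out(length, pool_size):
    """Output length of 'valid' max pooling with stride equal to pool_size."""
    return (length - pool_size) // pool_size + 1

length = 29
trace = [length]
for kind, size in [("conv", 5), ("conv", 5), ("pool", 3), ("conv", 3), ("conv", 3)]:
    length = conv_out(length, size) if kind == "conv" else pool_out(length, size)
    trace.append(length)

print(trace)   # [29, 25, 21, 7, 5, 3]
```

Global average pooling then averages the remaining 3 time steps of each of the 256 channels, giving a 256-value vector for the dense layers.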

Early stopping and training callbacks are techniques used during model training to limit overfitting, improve efficiency, and choose a stopping point based on the model's validation performance.

* Training Callbacks: Keras callbacks provide a way to customise the training procedure by running specific actions at defined points in training. Callbacks are objects passed to a Keras model's fit() method. They are invoked at fixed points, such as the start or end of an epoch and just before or just after each batch, and a callback can act on the metrics it sees there, for example when a metric crosses a threshold. Typical uses are early stopping, learning-rate scheduling, model checkpointing, and metric logging. Because they adjust the training process dynamically without a hand-written training loop, they make experimentation more efficient. I implemented early stopping as a callback that monitors a specific metric during training.

* Early Stopping: Early stopping ends training when the model's performance on a validation set stops improving or starts to worsen. By cutting out training iterations that would only lead to memorisation of the training data, it helps limit overfitting. It relies on the observation that validation performance tends to approach an optimum as training advances and may degrade if training continues past that point. The callback monitors a chosen metric (such as validation loss or accuracy) and terminates training when that metric has not improved for a specified number of epochs (the patience). This helps strike a balance between fitting the training data and generalising to unseen data.

The EarlyStopping callback used in this project monitors the validation loss and stops training after 3 consecutive epochs without improvement in that metric.
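
The patience rule can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the Keras implementation: it ignores options such as a minimum improvement threshold, the function name is hypothetical, and the validation losses are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

val_losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.60]   # made-up values
print(early_stopping_epoch(val_losses, patience=3))        # 5
```

In this example the best loss (0.70) occurs at epoch 2, and the three following epochs fail to beat it, so training stops at epoch 5 and never reaches the lower loss at epoch 6. This shows the trade-off in choosing the patience value.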

Dropout is a regularisation method frequently used in neural networks, especially deep learning models. It improves the model's capacity for generalisation and helps avoid overfitting. During training, a dropout layer randomly sets a fraction of its input units to 0 at each update, which temporarily removes those units from the network. The dropout layers in this project's CNN use a rate of 0.3, so approximately 30% of the input units are set to 0 at each update.
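
A minimal sketch of the dropout idea in plain Python follows. It zeroes each value with probability 0.3 and scales the surviving values by 1 / (1 - rate), which is how Keras keeps the expected sum of the inputs unchanged at training time. The function name and the activations are illustrative.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate` and scale the survivors by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0] * 10000       # made-up activations
dropped = dropout(activations, rate=0.3, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
print(round(zero_fraction, 2))    # close to 0.3
print(max(dropped))               # surviving units are scaled to 1 / 0.7
```

At inference time no units are dropped and no scaling is applied.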


In [None]:
# Evaluate the model on the preprocessed test data
loss, accuracy = model.evaluate(X_test_, y_test)

# Make predictions on the preprocessed test data
test_predictions = model.predict(X_test_)

### Hyperparameter Tuning
I chose three hyperparameters to tune: learning rate, number of epochs, and batch size. Every combination of candidate values is evaluated in a grid search, and MLflow is used to track each run.

* Learning Rate: The learning rate is a hyperparameter that sets the step size of each parameter update during training, and so controls the speed and magnitude of those updates. It is crucial to tune: values that are too high can cause unstable training or overshooting, while values that are too low can cause slow convergence or leave the model stuck in local minima. I tried several learning rates (0.001, 0.01, and 0.1) to find one that balances training efficiency and model performance.

* Epochs: An epoch is one full pass over the entire training dataset. The number of epochs is a hyperparameter that controls how many times the model sees the training data. Increasing it may improve performance, but risks overfitting if the model begins to memorise the training data. I ran multiple training runs with different epoch values (5 and 10), observed the model's performance on validation data, and used early stopping to find a number of epochs that maximises performance without overfitting.

* Batch Size: The batch size is a hyperparameter that determines the number of samples processed before each parameter update. It affects both training time and the quality of the updates. Smaller batch sizes give more frequent but noisier updates, which can speed up convergence. Larger batch sizes can stabilize training but may require more memory and computational resources. To tune the batch size, I experimented with values of 16, 32, and 64 and observed the trade-off between training speed and convergence stability.
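
The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², which is an assumption chosen for illustration and has nothing to do with the CNN itself. The value 1.1 is not one of the rates tried in this project and is included only to show a rate that is too high.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimise f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w          # derivative of w**2
        w -= learning_rate * gradient
    return w

for lr in [0.001, 0.1, 1.1]:        # 1.1 is an illustrative "too high" value
    print(lr, gradient_descent(lr))
```

With 0.001 the parameter barely moves from its starting point in 20 steps, with 0.1 it gets close to the minimum at 0, and with 1.1 every step overshoots so the value grows in magnitude instead of shrinking.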

The code iterates over every combination of these hyperparameters and starts a new MLflow run for each one. For each run, the hyperparameter values and performance metrics are logged with MLflow. A validation split is used to evaluate and compare the models trained with different settings. MLflow provides a convenient framework for tracking these values and metrics, which makes the experiments easy to compare and analyse.
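
The size of the search space is easy to check before launching any runs. The sketch below enumerates the same grid with `itertools.product`, which is an alternative to nested loops, and builds run names in the format used for the MLflow runs.

```python
from itertools import product

learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64]
num_epochs = [5, 10]

# Every (learning rate, batch size, epochs) combination, in nested-loop order
grid = list(product(learning_rates, batch_sizes, num_epochs))
run_names = [f"lr_{lr}_bs_{bs}_epochs_{epochs}" for lr, bs, epochs in grid]

print(len(grid))      # 18
print(run_names[0])   # lr_0.001_bs_16_epochs_5
```

Each of the 18 combinations trains a full model, so adding one more candidate value to any list multiplies the total training time accordingly.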

In [None]:
# Split the preprocessed features and one-hot encoded labels into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train_processed, y_train_processed, test_size=0.2, random_state=42)

# Define the hyperparameters to tune
learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64]
num_epochs = [5, 10]

best_accuracy = 0.0
best_model_path = None
best_run_id = None

def train_model(learning_rate, batch_size, num_epochs):

    mlflow.log_param("learning_rate", learning_rate)
    mlflow.log_param("batch_size", batch_size)
    mlflow.log_param("num_epochs", num_epochs)

    global best_accuracy
    global best_model_path
    global best_run_id
    
    # Define the model architecture
    model= Sequential()
    model.add(Convolution1D(32,5,activation='relu',input_shape=(29,1)))
    model.add(Convolution1D(64,5,activation='relu'))         
    model.add(MaxPooling1D(3))
    model.add(Convolution1D(128, 3, activation='relu'))
    model.add(Convolution1D(256, 3, activation='relu'))
    model.add(GlobalAveragePooling1D())
    model.add(Dropout(0.3))
    model.add(Dense(1024,activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(256,activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(32,activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(5,activation='softmax'))
    
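    # Pass the learning rate under test; the string 'adam' would always use the default rate
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss='categorical_crossentropy', metrics=['accuracy'])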
    
    # Train the model
    early_stopping = EarlyStopping(patience=3, monitor='val_loss')
    history = model.fit(X_train, y_train, epochs=num_epochs, batch_size=batch_size, validation_data=(X_val, y_val), 
                        callbacks=[early_stopping],verbose=1)
    
    # Evaluate the model on the validation set
    val_loss, val_accuracy = model.evaluate(X_val, y_val, verbose=0)
    
    # Log the validation loss and accuracy
    mlflow.log_metric("val_loss", val_loss)
    mlflow.log_metric("val_accuracy", val_accuracy)
    
    # Check if the current model is the best so far
    if val_accuracy > best_accuracy:
        best_accuracy = val_accuracy
        
        # Save the model
        best_model_path = f"best_model_lr_{learning_rate}_bs_{batch_size}_epochs_{num_epochs}.h5"
        model.save(best_model_path)
        
        # Get the run ID for the best model
        best_run_id = mlflow.active_run().info.run_id
    
    return model

# Iterate over the hyperparameters
for lr in learning_rates:
    for bs in batch_sizes:
        for epochs in num_epochs:
            # Start a new MLflow run
            with mlflow.start_run(run_name=f"lr_{lr}_bs_{bs}_epochs_{epochs}"):
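                # Train the model; the result is not assigned to `model`, so the
                # initial model defined earlier stays available for later comparison
                train_model(lr, bs, epochs)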
                

# Load the best model
best_model = tf.keras.models.load_model(best_model_path)

# Print the run ID for the best model
print(f"Best Run ID: {best_run_id}")

In [None]:
# Retrieve the run information for the best run
run_info = mlflow.get_run(best_run_id)

# Get the hyperparameters logged for the best run
hyperparams = run_info.data.params

if 'learning_rate' in hyperparams:
    print(f"Learning Rate: {hyperparams['learning_rate']}")

if 'batch_size' in hyperparams:
    print(f"Batch Size: {hyperparams['batch_size']}")

if 'num_epochs' in hyperparams:
    print(f"Number of Epochs: {hyperparams['num_epochs']}")

In [None]:
# Evaluate the best model on the preprocessed test data
loss, accuracy = best_model.evaluate(X_test_, y_test)

# Use the best model for predictions or other tasks
predictions = best_model.predict(X_test_)

In [None]:
# Convert the predicted labels to class indices
predicted_indices = tf.argmax(predictions, axis=1).numpy()

# Convert the actual labels to class indices
actual_indices = tf.argmax(y_test, axis=1).numpy()

# Calculate the accuracy
accuracy = accuracy_score(actual_indices, predicted_indices)

print("Accuracy:", accuracy)


In [None]:
# Plot the training and validation accuracy of the initial model
fig1, ax_acc = plt.subplots()
ax_acc.plot(trained_model.history['accuracy'])
ax_acc.plot(trained_model.history['val_accuracy'])
ax_acc.set_xlabel('Epoch')
ax_acc.set_ylabel('Accuracy')
ax_acc.set_title('Model - Accuracy')
ax_acc.legend(['Training', 'Validation'], loc='lower right')
plt.show()

In [None]:
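# Plot the training and validation loss of the initial model
fig2, ax_loss = plt.subplots()
ax_loss.plot(trained_model.history['loss'])
ax_loss.plot(trained_model.history['val_loss'])
ax_loss.set_xlabel('Epoch')
ax_loss.set_ylabel('Loss')
ax_loss.set_title('Model - Loss')
# The legend is added after plotting so the labels attach to the two curves
ax_loss.legend(['Training', 'Validation'], loc='upper right')
plt.show()

# Class labels for the five heartbeat categories
target_names = ['0', '1', '2', '3', '4']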

In [None]:
cm = confusion_matrix(actual_indices, predicted_indices)

# Plot confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.show()

Accuracy and the confusion matrix were chosen as the evaluation metrics for classifying heartbeat signals into different arrhythmia classes. Both are commonly used and well suited to multi-class tasks. If the task were binary classification instead, such as identifying infected cases or detecting anomalies, recall would be a more appropriate metric: it measures the share of actual positive instances that the model correctly predicts as positive.

* Accuracy: Accuracy measures the overall correctness of the model's predictions. It is the ratio of correctly predicted samples to the total number of samples in the dataset.
* Confusion Matrix: A confusion matrix provides a detailed breakdown of the model's predictions for each class. Each row corresponds to a true class and each column to a predicted class, so the true positives, true negatives, false positives, and false negatives for every class can be derived from it.
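
The relationship between these metrics can be shown on a small example. The sketch below builds a confusion matrix by hand for three classes and derives accuracy and per-class recall from it. The labels are made up and the function name is illustrative; the notebook itself uses scikit-learn's `confusion_matrix` and `accuracy_score`.

```python
def confusion_matrix_counts(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, pred in zip(y_true, y_pred):
        matrix[true][pred] += 1
    return matrix

toy_true = [0, 0, 1, 1, 2, 2, 2]   # made-up true labels
toy_pred = [0, 1, 1, 1, 2, 0, 2]   # made-up predictions

toy_cm = confusion_matrix_counts(toy_true, toy_pred, num_classes=3)
toy_accuracy = sum(toy_cm[i][i] for i in range(3)) / len(toy_true)
toy_recall = [toy_cm[i][i] / sum(toy_cm[i]) for i in range(3)]

print(toy_cm)         # [[1, 1, 0], [0, 2, 0], [1, 0, 2]]
print(toy_accuracy)   # 5/7, about 0.714
print(toy_recall)     # per-class recall: 1/2, 2/2, 2/3
```

Accuracy is the sum of the diagonal divided by the number of samples, while recall for a class is its diagonal entry divided by its row total.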

The initial model showed satisfactory accuracy during training. Hyperparameter tuning was then carried out to improve performance further by searching for the best combination of hyperparameters. The model trained with the best hyperparameters achieved higher validation accuracy, indicating an improvement in performance.


In [None]:
# Create a holdout set by randomly sampling from the training dataset
n_holdout_samples = 1000  # Number of existing training samples to draw

# Sample row indices without replacement
holdout_samples_indices = np.random.choice(len(X_train), size=n_holdout_samples, replace=False)
X_holdout = X_train[holdout_samples_indices]
y_holdout = y_train[holdout_samples_indices]

In [None]:
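# Evaluate the initial model and the best tuned model on the holdout set,
# keeping the results in separate variables so neither is overwritten
initial_loss, initial_accuracy = model.evaluate(X_holdout, y_holdout)
best_loss, best_holdout_accuracy = best_model.evaluate(X_holdout, y_holdout)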

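Performance on the holdout set was comparable to performance on the training data. Both the initial trained model and the best tuned model achieved good accuracy, with the best model slightly outperforming the initial one. Because the holdout samples were drawn from the training data, this check is not an independent estimate of generalisation.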