# 05b: CNN-LSTM Model Design and Training

This notebook provides a comprehensive workflow for designing, training, and evaluating a hybrid CNN-LSTM model for ICU mortality prediction using preprocessed time series and static patient data.


### 1. Imports and Configuration

Import all necessary libraries for deep learning model development, training, and evaluation. This includes TensorFlow/Keras for model building, scikit-learn for metrics and cross-validation, and other utilities for data handling and visualization.

In [None]:
# Imports and Configuration

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from keras_tuner import HyperParameters, Hyperband  # Hyperband is used in the tuning section
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, LSTM, Dense, Dropout, Concatenate
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import pickle
import matplotlib.pyplot as plt
import pandas as pd
import os
from sklearn.metrics import classification_report, roc_curve, auc, confusion_matrix
import seaborn as sns
from sklearn.model_selection import StratifiedKFold


np.random.seed(42)

tf.random.set_seed(42)

### 2. Model Architecture - CNN-LSTM Hybrid

This section defines a hybrid deep learning architecture that combines CNN and LSTM layers. The model uses both temporal patterns in the time series data and static patient features to predict ICU mortality.

**Key Design Choices:**
- **CNN Layers:** Extract local temporal features from the time series input, capturing short-term dependencies and patterns.
- **LSTM Layer:** Captures long-term dependencies and trends in the sequential data, which are important for patient outcome prediction.
- **Static Input:** Static patient features are passed through a separate input branch and concatenated with the sequence features after feature extraction.
- **Dense Layers:** Combine the extracted sequence and static features to learn complex interactions.
- **Regularization:** Dropout and L2 regularization are used to prevent overfitting and improve generalization.
- **Output Layer:** A single neuron with sigmoid activation for binary classification (mortality prediction).

**Inputs:**
- Time series features: shape = [batch, timesteps, features]
- Static features: shape = [batch, static_features]

**Output:**
- Binary prediction: 0 = survived, 1 = deceased
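
To make the tensor shapes concrete, here is a minimal plain-Python sketch (no TensorFlow required) that traces the per-sample shapes through both branches. It assumes the layer settings used by `build_cnn_lstm_model` in this notebook: stride-1 `Conv1D` with `padding='same'`, and `MaxPooling1D` with its default `padding='valid'` and a stride equal to `pool_size`. The helper names `pooled_length` and `trace_shapes` are hypothetical and exist only for illustration; the feature counts (28 time-series features, 7 static features) are taken from the shapes printed later in the notebook.

```python
def pooled_length(timesteps, pool_size):
    """Output length of MaxPooling1D with padding='valid' and strides=pool_size."""
    return (timesteps - pool_size) // pool_size + 1


def trace_shapes(timesteps, features, static_features,
                 lstm_units=64, static_units=16):
    """Trace per-sample shapes through the two branches (batch axis omitted)."""
    shapes = {
        "time_series_input": (timesteps, features),
        "static_input": (static_features,),
    }
    # Conv1D(64, 3, padding='same') keeps the length and sets channels to 64
    shapes["conv1"] = (timesteps, 64)
    # MaxPooling1D(pool_size=2) halves the length, rounding down
    steps = pooled_length(timesteps, 2)
    shapes["pool1"] = (steps, 64)
    # Conv1D(128, 3, padding='same') keeps the length again
    shapes["conv2"] = (steps, 128)
    # MaxPooling1D(pool_size=1) leaves the length unchanged
    steps = pooled_length(steps, 1)
    shapes["pool2"] = (steps, 128)
    # LSTM(return_sequences=False) returns only the final hidden state
    shapes["lstm"] = (lstm_units,)
    # The static branch ends in Dense(16)
    shapes["static_branch"] = (static_units,)
    # Concatenate joins the two feature vectors
    shapes["merged"] = (lstm_units + static_units,)
    return shapes


for hours in (6, 12, 36, 48):
    s = trace_shapes(hours, features=28, static_features=7)
    print(f"{hours}h window: LSTM sees {s['pool2']}, merged vector {s['merged']}")
```

Under these assumptions the LSTM receives half the original number of timesteps, so the 6-hour window is reduced to only three steps before the recurrent layer.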


In [26]:
def build_cnn_lstm_model(time_series_input_shape, static_input_shape):
    """Builds and compiles the CNN-LSTM model."""
    # Time-series branch
    time_series_input = Input(shape=time_series_input_shape, name='time_series_input')

    x = Conv1D(64, 3, activation='relu', padding='same', kernel_regularizer=l2(1e-4))(time_series_input)
    x = MaxPooling1D(pool_size=2)(x)
    x = Dropout(0.3)(x)

    x = Conv1D(128, 3, activation='relu', padding='same', kernel_regularizer=l2(1e-4))(x)
    # pool_size=1 leaves the sequence length unchanged (no downsampling),
    # so short sequences such as the 6-hour window are not shrunk further
    x = MaxPooling1D(pool_size=1)(x)
    x = Dropout(0.3)(x)

    x = LSTM(64, return_sequences=False, kernel_regularizer=l2(1e-4))(x)
    x = Dropout(0.3)(x)

    # Static branch
    static_input = Input(shape=static_input_shape, name='static_input')
    y = Dense(32, activation='relu')(static_input)
    y = Dense(16, activation='relu')(y)

    # Merge branches
    combined = Concatenate()([x, y])

    z = Dense(64, activation='relu')(combined)
    z = Dropout(0.3)(z)
    z = Dense(32, activation='relu')(z)

    output = Dense(1, activation='sigmoid', name='output')(z)

    model = Model(inputs=[time_series_input, static_input], outputs=output)

    # Compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['AUC', 'Precision', 'Recall'])

    return model


In [27]:
class CNN_LSTM_Model:
    def __init__(self, time_series_input_shape, static_input_shape):
        """Initializes the CNN-LSTM model."""
        self.model = build_cnn_lstm_model(time_series_input_shape, static_input_shape)

    def compile_model(self, learning_rate=1e-3):
        """Compiles the model with specified parameters."""
        self.model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
            loss='binary_crossentropy',
            metrics=[
                'accuracy',
                tf.keras.metrics.AUC(name='AUC'),
                tf.keras.metrics.Precision(name='Precision'),
                tf.keras.metrics.Recall(name='Recall')
            ]
        )
        print("Model compiled successfully.")

    def train_model(self, X_train, static_train, y_train, X_val, static_val, y_val, epochs=100, batch_size=32, model_save_path='best_model.keras'):
        """Trains the model with early stopping and model checkpointing."""
        early_stopping = EarlyStopping(
            monitor='val_loss',
            patience=10,
            restore_best_weights=True
        )

        model_checkpoint = ModelCheckpoint(
            filepath=model_save_path,
            monitor='val_loss',
            save_best_only=True,
            save_weights_only=False,
            mode='min',
            verbose=1
        )

        print("Training model...")
        history = self.model.fit(
            x=[X_train, static_train],
            y=y_train,
            epochs=epochs,
            batch_size=batch_size,
            validation_data=([X_val, static_val], y_val),
            callbacks=[early_stopping, model_checkpoint],
            verbose=1
        )
        print("Model training finished.")
        return history

    def evaluate_model(self, X_test, static_test, y_test):
        """Evaluates the model on the test data."""
        print("Evaluating model...")
        loss, accuracy, auc, precision, recall = self.model.evaluate(
            x=[X_test, static_test],
            y=y_test,
            verbose=0
        )
        print("Model evaluation finished.")
        return loss, accuracy, auc, precision, recall

    def predict(self, X_test, static_test):
        """Generates predictions for the test data."""
        print("Generating predictions...")
        predictions = self.model.predict(x=[X_test, static_test], verbose=0)
        print("Prediction generation finished.")
        return predictions
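
The `EarlyStopping` callback in `train_model` stops training once `val_loss` has failed to improve for `patience` consecutive epochs, and `restore_best_weights=True` rolls the model back to the best epoch seen. The plain-Python sketch below illustrates that rule on a made-up list of validation losses. It is a simplification (it ignores options such as `min_delta`), and the function name `simulate_early_stopping` is hypothetical.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (last_epoch_run, best_epoch), both zero-based.

    Training stops as soon as the loss has not improved for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before the patience budget was exhausted
    return len(val_losses) - 1, best_epoch


# Made-up validation losses, for illustration only
losses = [0.60, 0.50, 0.55, 0.52, 0.51, 0.53]
stopped_at, best = simulate_early_stopping(losses, patience=3)
print(f"stopped after epoch index {stopped_at}, best epoch index {best}")
```

With `patience=10`, as used in this notebook, a run is allowed ten non-improving epochs before it is cut off.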


### 3. Data Loading Method

Define a function to load ICU time series and static patient data from .npz files. This step prepares the training, validation, and test sets for model development and evaluation.

In [28]:
def load_icu_data(data_path):
    """Loads ICU data from a specified .npz file.

    Args:
        data_path (str): The path to the .npz data file.

    Returns:
        tuple: A tuple containing the loaded data arrays:
               (X_train, X_val, X_test, static_train, static_val, static_test, y_train, y_val, y_test)
    """
    data = np.load(data_path, allow_pickle=True)

    # Time-series inputs
    X_train = data['X_train']
    X_val = data['X_val']
    X_test = data['X_test']

    # Static inputs
    static_train = data['static_train']
    static_val = data['static_val']
    static_test = data['static_test']

    # Labels
    y_train = data['y_train']
    y_val = data['y_val']
    y_test = data['y_test']

    return X_train, X_val, X_test, static_train, static_val, static_test, y_train, y_val, y_test


### 4. Dataset Selection

Define `run_experiment`, which loads one temporal dataset, trains the CNN-LSTM model on it, and evaluates it on the test split. The dataset paths for each time window (6, 12, 36, and 48 hours) are listed in the cell that follows the function.

In [None]:
def run_experiment(data_path):
    """Runs a complete CNN-LSTM experiment on data from the given path."""

    # Load the data
    X_train, X_val, X_test, static_train, static_val, static_test, y_train, y_val, y_test = load_icu_data(data_path)
    print(f"Data loaded from: {data_path}")
    print(f"X_train shape: {X_train.shape}, static_train shape: {static_train.shape}, y_train shape: {y_train.shape}")

    # Determine input shapes
    time_series_input_shape = (X_train.shape[1], X_train.shape[2])
    static_input_shape = (static_train.shape[1],)
    print(f"Time series input shape: {time_series_input_shape}, Static input shape: {static_input_shape}")

    # Instantiate the model
    cnn_lstm_model = CNN_LSTM_Model(time_series_input_shape, static_input_shape)
    print("CNN_LSTM_Model instantiated.")

    # Compile the model
    cnn_lstm_model.compile_model()


    # Train the model
    epochs = 50  # Upper bound on epochs; EarlyStopping in train_model may stop sooner
    batch_size = 32
    print(f"Starting model training for {epochs} epochs with batch size {batch_size}...")
    history = cnn_lstm_model.train_model(
        X_train, static_train, y_train,
        X_val, static_val, y_val,
        epochs=epochs,
        batch_size=batch_size,
    )
    print("Model training completed.")


    # Evaluate the model on the test data
    print("Evaluating model on test data...")
    loss, accuracy, auc, precision, recall = cnn_lstm_model.evaluate_model(
        X_test, static_test, y_test
    )
    print("Model evaluation completed.")

    # Generate predictions and classification report
    y_pred_prob = cnn_lstm_model.predict(X_test, static_test)
    y_pred_binary = (y_pred_prob > 0.5).astype(int)

    print("\n--- Classification Report ---")
    print(classification_report(y_test, y_pred_binary, target_names=['No Mortality', 'Mortality']))


    # Return evaluation metrics
    return {
        'loss': loss,
        'accuracy': accuracy,
        'auc': auc,
        'precision': precision,
        'recall': recall
    }

In [None]:
dataset_paths = [
    '../data/processed/cnn_lstm_48hour_data.npz',
    '../data/processed/cnn_lstm_6hour_data.npz',
    '../data/processed/cnn_lstm_12hour_data.npz',
    '../data/processed/cnn_lstm_36hour_data.npz'
]

print("Dataset paths list created:")
for path in dataset_paths:
    print(path)

Dataset paths list created:
/content/sample_data/cnn_lstm_48hour_data.npz
/content/sample_data/cnn_lstm_6hour_data.npz
/content/sample_data/cnn_lstm_12hour_data.npz
/content/sample_data/cnn_lstm_36hour_data.npz


### 5. Experiment Loop Across Datasets

Run the CNN-LSTM experiment for each dataset path defined above and collect the evaluation metrics so the temporal windows can be compared.

In [None]:
# List to store results for each dataset
all_results = []

# Iterate through the dataset paths and run experiments
for dataset_path in dataset_paths:
    print(f"\n--- Running experiment for {dataset_path} ---")
    results = run_experiment(dataset_path)
    results['dataset'] = dataset_path  
    all_results.append(results)

print("\n--- All experiments finished ---")


--- Running experiment for /content/sample_data/cnn_lstm_48hour_data.npz ---
Data loaded from: /content/sample_data/cnn_lstm_48hour_data.npz
X_train shape: (4406, 48, 28), static_train shape: (4406, 7), y_train shape: (4406,)
Time series input shape: (48, 28), Static input shape: (7,)
CNN_LSTM_Model instantiated.
Model compiled successfully.
Starting model training for 50 epochs with batch size 32...
Training model...
Epoch 1/50
135/138 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - AUC: 0.7126 - Precision: 0.6603 - Recall: 0.6273 - accuracy: 0.6546 - loss: 0.6440
Epoch 1: val_loss improved from inf to 0.64264, saving model to best_model.keras
138/138 ━━━━━━━━━━━━━━━━━━━━ 6s 16ms/step - AUC: 0.7147 - Precision: 0.6616 - Recall: 0.6301 - accuracy: 0.6563 - loss: 0.6423 - val_AUC: 0.7693 - val_Precision: 0.2545 - val_Recall: 0.7978 - val_accuracy: 0.6469 - val_loss: 0.6426
Epoch 2/50
134/138 ━━━━━━━━━━━━━━━━━━━

In [32]:
# Create a pandas DataFrame from the results
results_df = pd.DataFrame(all_results)

# Reorder columns for better readability
results_df = results_df[['dataset', 'loss', 'accuracy', 'auc', 'precision', 'recall']]

# Display the table
print("\n--- Experiment Results Summary ---")
display(results_df)


--- Experiment Results Summary ---


|   | dataset                                       | loss     | accuracy | auc      | precision | recall   |
|---|-----------------------------------------------|----------|----------|----------|-----------|----------|
| 0 | /content/sample_data/cnn_lstm_48hour_data.npz | 0.442445 | 0.82625  | 0.797657 | 0.375     | 0.378378 |
| 1 | /content/sample_data/cnn_lstm_6hour_data.npz  | 0.533155 | 0.77375  | 0.718315 | 0.291667  | 0.441441 |
| 2 | /content/sample_data/cnn_lstm_12hour_data.npz | 0.542132 | 0.72375  | 0.7316   | 0.272727  | 0.594595 |
| 3 | /content/sample_data/cnn_lstm_36hour_data.npz | 0.465912 | 0.775    | 0.817832 | 0.330049  | 0.603604 |


### 6. Results Summary Table

The table below summarizes the results of training and evaluating the CNN-LSTM model on each time window. Comparing the performance metrics (loss, accuracy, AUC, precision, and recall) shows how the length of the time series (6, 12, 36, or 48 hours) affects the model's ability to predict ICU mortality.

**Experiment Results Summary:**

| Time Window | loss     | accuracy | auc      | precision | recall   |
|-------------|----------|----------|----------|-----------|----------|
| 48hr        | 0.421448 | 0.83875  | 0.819885 | 0.404255  | 0.342342 |
| 6hr         | 0.528127 | 0.74375  | 0.741328 | 0.282407  | 0.549550 |
| 12hr        | 0.538580 | 0.73875  | 0.752736 | 0.285088  | 0.585586 |
| 36hr        | 0.485300 | 0.79125  | 0.824743 | 0.354167  | 0.612613 |

Based on these metrics, the **`36hr`** time window appears to be the most promising for this ICU mortality prediction task. The 48-hour dataset shows the highest accuracy and precision, but the 36-hour dataset achieves the highest recall (0.612613) and the highest AUC (0.824743). In a medical context, high recall is often crucial so that as many patients at risk of death as possible are identified, even at some cost in precision. The high AUC for the 36-hour dataset also indicates good overall discriminative performance.
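
One possible way to put a number on this recall-versus-precision preference is the F-beta score, which weights recall more heavily than precision when beta is greater than 1. The sketch below applies the standard formula to the precision and recall values from the table above. Choosing beta = 2 is an assumption made here for illustration, not a setting used elsewhere in this notebook, and `f_beta` is a hypothetical helper.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


# (precision, recall) pairs copied from the results summary table
table = {
    "48hr": (0.404255, 0.342342),
    "6hr": (0.282407, 0.549550),
    "12hr": (0.285088, 0.585586),
    "36hr": (0.354167, 0.612613),
}

scores = {window: f_beta(p, r) for window, (p, r) in table.items()}
best_window = max(scores, key=scores.get)

for window, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{window}: F2 = {score:.4f}")
print("Highest F2:", best_window)
```

With these table values the 36-hour window also ranks first by F2, which is consistent with the recall-focused reading above.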

### 7. Cross-Validation on the 36-Hour Dataset

**Why Cross-Validation?**

Cross-validation is a robust statistical method used to assess the generalizability and reliability of machine learning models. In this notebook, stratified k-fold cross-validation is applied to the best-performing 36-hour dataset to ensure that the CNN-LSTM model's performance metrics (accuracy, AUC, precision, recall, loss) are not biased by a single train-test split.

**Key Reasons for Cross-Validation:**
- **Generalization:** It helps estimate how well the model will perform on unseen data, reducing the risk of overfitting.
- **Robustness:** By averaging results across multiple folds, we obtain more stable and reliable performance metrics.
- **Class Balance:** Stratified k-fold ensures each fold maintains the same proportion of mortality cases, which is crucial for medical datasets with class imbalance.

**Process:**
- The combined training and validation data are split into k folds (here, k=5).
- For each fold, the model is trained on k-1 folds and validated on the remaining fold.
- Performance metrics are recorded for each fold and then averaged to provide a comprehensive evaluation.

This approach provides a more trustworthy estimate of the model's true predictive power and helps guide model selection and hyperparameter tuning.
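
The splitting step can be pictured with a small plain-Python sketch. It is a simplified illustration of stratification (shuffle the indices of each class, then deal them round-robin into k folds) and is not scikit-learn's `StratifiedKFold` implementation. The function name `stratified_folds` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=42):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class evenly across the folds
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy data: 10 positive and 40 negative samples (20% positive rate)
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)

for fold_id, val_idx in enumerate(folds):
    # Every fold takes one turn as the validation set; the rest is training data
    train_idx = [i for f in folds if f is not val_idx for i in f]
    positives = sum(labels[i] for i in val_idx)
    print(f"fold {fold_id + 1}: train={len(train_idx)}, "
          f"val={len(val_idx)}, val positives={positives}")
```

Each validation fold in this toy example keeps the same 20% positive rate as the full label set, which is the property stratification is meant to guarantee.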

In [None]:
# Load the 36-hour dataset
data_path_36hour = '../data/processed/cnn_lstm_36hour_data.npz'
X_train_36, X_val_36, X_test_36, static_train_36, static_val_36, static_test_36, y_train_36, y_val_36, y_test_36 = load_icu_data(data_path_36hour)

# Reshape y arrays to be 1D for StratifiedKFold split
y_train_36_reshaped = y_train_36.flatten()
y_val_36_reshaped = y_val_36.flatten()


# Combine training and validation data for cross-validation
X_combined_36 = np.vstack((X_train_36, X_val_36))
static_combined_36 = np.vstack((static_train_36, static_val_36))
y_combined_36 = np.hstack((y_train_36_reshaped, y_val_36_reshaped))


# Verify the shapes
print("Shape of combined time-series data (36-hour):", X_combined_36.shape)
print("Shape of combined static data (36-hour):", static_combined_36.shape)
print("Shape of combined target variable (36-hour):", y_combined_36.shape)

# Determine input shapes for the model from the combined data
time_series_input_shape_36 = (X_combined_36.shape[1], X_combined_36.shape[2])
static_input_shape_36 = (static_combined_36.shape[1],)
print(f"Time series input shape (36-hour): {time_series_input_shape_36}, Static input shape (36-hour): {static_input_shape_36}")

Shape of combined time-series data (36-hour): (5046, 36, 28)
Shape of combined static data (36-hour): (5046, 7)
Shape of combined target variable (36-hour): (5046,)
Time series input shape (36-hour): (36, 28), Static input shape (36-hour): (7,)


In [34]:
# Define the number of splits for k-fold cross-validation
n_splits = 5
print(f"Number of splits for k-fold cross-validation: {n_splits}")

# Instantiate StratifiedKFold
kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

# Initialize empty lists to store the evaluation metrics for each fold
fold_loss_36 = []
fold_accuracy_36 = []
fold_auc_36 = []
fold_precision_36 = []
fold_recall_36 = []

# Iterate over the stratified splits
for fold, (train_index, val_index) in enumerate(kf.split(X_combined_36, y_combined_36)):
    # Print the current fold number
    print(f"\n--- Starting Fold {fold + 1}/{n_splits} on 36-hour data ---")

    # Split data into training and validation sets for the current fold
    X_train_fold, X_val_fold = X_combined_36[train_index], X_combined_36[val_index]
    static_train_fold, static_val_fold = static_combined_36[train_index], static_combined_36[val_index]
    y_train_fold, y_val_fold = y_combined_36[train_index], y_combined_36[val_index]

    # Build a fresh model for the current fold
    model = build_cnn_lstm_model(time_series_input_shape_36, static_input_shape_36)

    # Compile the model inside the loop with the correct metrics
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss='binary_crossentropy',
        metrics=[
            'accuracy',
            tf.keras.metrics.AUC(name='AUC'),
            tf.keras.metrics.Precision(name='Precision'),
            tf.keras.metrics.Recall(name='Recall')
        ]
    )

    # Define callbacks for the current fold
    fold_early_stopping = EarlyStopping(
        monitor='val_loss',
        patience=10,
        restore_best_weights=True
    )

    # Train the model on the training data for the current fold
    print("Training model for current fold...")
    history_fold = model.fit(
        x=[X_train_fold, static_train_fold],
        y=y_train_fold,
        epochs=100, # Use a sufficiently large number of epochs, EarlyStopping will manage the actual number
        batch_size=32,
        validation_data=([X_val_fold, static_val_fold], y_val_fold),
        callbacks=[fold_early_stopping],
        verbose=0 # Set to 1 to see training progress per epoch
    )
    print("Training finished for current fold.")

    # Evaluate the model on the validation data for the current fold
    print("Evaluating model for current fold...")
    # The AUC value is named auc_score so it does not overwrite sklearn's auc() imported above
    loss, accuracy, auc_score, precision, recall = model.evaluate(
        x=[X_val_fold, static_val_fold],
        y=y_val_fold,
        verbose=0
    )
    print(f"Fold {fold + 1} - Loss: {loss:.4f}, Accuracy: {accuracy:.4f}, AUC: {auc_score:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}")

    # Store the metrics
    fold_loss_36.append(loss)
    fold_accuracy_36.append(accuracy)
    fold_auc_36.append(auc_score)
    fold_precision_36.append(precision)
    fold_recall_36.append(recall)


print("\n--- Cross-validation finished on 36-hour data ---")

Number of splits for k-fold cross-validation: 5

--- Starting Fold 1/5 on 36-hour data ---
Training model for current fold...
Training finished for current fold.
Evaluating model for current fold...
Fold 1 - Loss: 0.3587, Accuracy: 0.8762, AUC: 0.9435, Precision: 0.8236, Recall: 0.9259

--- Starting Fold 2/5 on 36-hour data ---
Training model for current fold...
Training finished for current fold.
Evaluating model for current fold...
Fold 2 - Loss: 0.3428, Accuracy: 0.8791, AUC: 0.9446, Precision: 0.8636, Recall: 0.8712

--- Starting Fold 3/5 on 36-hour data ---
Training model for current fold...
Training finished for current fold.
Evaluating model for current fold...
Fold 3 - Loss: 0.3110, Accuracy: 0.8890, AUC: 0.9545, Precision: 0.8914, Recall: 0.8603

--- Starting Fold 4/5 on 36-hour data ---
Training model for current fold...
Training finished for current fold.
Evaluating model for current fold...
Fold 4 - Loss: 0.2994, Accuracy: 0.8811, AUC: 0.9605, Precision: 0.9005, Recall: 0.8

In [None]:
# Determine input shapes for the final model based on the combined 36-hour data
time_series_input_shape_final = (X_combined_36.shape[1], X_combined_36.shape[2])
static_input_shape_final = (static_combined_36.shape[1],)

final_model = build_cnn_lstm_model(time_series_input_shape_final, static_input_shape_final)

# Compile the final model
final_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=[
        'accuracy',
        tf.keras.metrics.AUC(name='AUC'),
        tf.keras.metrics.Precision(name='Precision'),
        tf.keras.metrics.Recall(name='Recall')
    ]
)


# No validation data is held out at this stage, so an EarlyStopping callback
# that monitors 'val_loss' cannot be used; train for a fixed number of epochs.
epochs_for_final_training = 25

print("Training final model on combined training and validation data (36-hour dataset)...")


history_final = final_model.fit(
    x=[X_combined_36, static_combined_36],
    y=y_combined_36,
    epochs=epochs_for_final_training,
    batch_size=32,
    verbose=1
)

print("Final model training finished.")

# Save the final model
model_save_path_final = '../models/base_model_36hour.keras'
final_model.save(model_save_path_final)
print(f"Final model saved to: {model_save_path_final}")

Training final model on combined training and validation data (36-hour dataset)...
Epoch 1/25
158/158 ━━━━━━━━━━━━━━━━━━━━ 5s 9ms/step - AUC: 0.6912 - Precision: 0.6270 - Recall: 0.5017 - accuracy: 0.6470 - loss: 0.6471
Epoch 2/25
158/158 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - AUC: 0.8397 - Precision: 0.7386 - Recall: 0.7687 - accuracy: 0.7734 - loss: 0.5167
Epoch 3/25
158/158 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - AUC: 0.8711 - Precision: 0.7566 - Recall: 0.8092 - accuracy: 0.7969 - loss: 0.4688
Epoch 4/25
158/158 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - AUC: 0.8923 - Precision: 0.7658 - Recall: 0.8348 - accuracy: 0.8107 - loss: 0.4340
Epoch 5/25
158/158 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - AUC: 0.9111 - Precision: 0.7966 - Recall: 0.8405 - accuracy: 0.8314 - loss: 0.3995
Epoch 6/25
158/158 ━━━━━━━━━━━━━━━━━━━━

In [36]:
# Evaluate the final model on the test set of the 36-hour data
loss_final, accuracy_final, auc_final, precision_final, recall_final = final_model.evaluate(
    x=[X_test_36, static_test_36],
    y=y_test_36,
    verbose=0
)

print("--- Final Model Evaluation on Test Data (36-hour dataset) ---")
print(f"Test Loss: {loss_final:.4f}")
print(f"Test Accuracy: {accuracy_final:.4f}")
print(f"Test AUC: {auc_final:.4f}")
print(f"Test Precision: {precision_final:.4f}")
print(f"Test Recall: {recall_final:.4f}")

--- Final Model Evaluation on Test Data (36-hour dataset) ---
Test Loss: 0.7402
Test Accuracy: 0.7850
Test AUC: 0.7563
Test Precision: 0.3129
Test Recall: 0.4595


In [37]:
# Aggregate and present the cross-validation results for the 36-hour dataset
print("\n--- Cross-validation Results (36-hour dataset) ---")
print(f"Average Loss: {np.mean(fold_loss_36):.4f} +/- {np.std(fold_loss_36):.4f}")
print(f"Average Accuracy: {np.mean(fold_accuracy_36):.4f} +/- {np.std(fold_accuracy_36):.4f}")
print(f"Average AUC: {np.mean(fold_auc_36):.4f} +/- {np.std(fold_auc_36):.4f}")
print(f"Average Precision: {np.mean(fold_precision_36):.4f} +/- {np.std(fold_precision_36):.4f}")
print(f"Average Recall: {np.mean(fold_recall_36):.4f} +/- {np.std(fold_recall_36):.4f}")


--- Cross-validation Results (36-hour dataset) ---
Average Loss: 0.3307 +/- 0.0220
Average Accuracy: 0.8787 +/- 0.0068
Average AUC: 0.9488 +/- 0.0074
Average Precision: 0.8632 +/- 0.0298
Average Recall: 0.8739 +/- 0.0314


### 8. Hyperparameter Tuning

Hyperparameter tuning is a critical step in deep learning model development. It involves systematically searching for the best combination of model parameters (such as number of layers, units, dropout rates, and learning rate) to maximize predictive performance.


**Why Tune Hyperparameters?**

- Deep learning models have many settings that can dramatically affect their accuracy, generalization, and robustness.
- Manual selection is often suboptimal and time-consuming; automated tuning finds better configurations efficiently.


**How is Tuning Performed Here?**

- We use KerasTuner's Hyperband algorithm, which efficiently explores the hyperparameter space using adaptive resource allocation and early stopping.
- The tuner searches for the best model based on validation recall, which is crucial for medical tasks where identifying true positives is important.
- The process includes varying the number of filters, LSTM units, dense layer sizes, dropout rates, and learning rate.


**Impact:**

- Well-tuned models are more accurate, less prone to overfitting, and better suited for deployment in real-world clinical settings.
- The best hyperparameters found are used to build and evaluate the final enhanced model.
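
To give a feel for the adaptive resource allocation idea, the sketch below computes a successive-halving style schedule: each round keeps roughly the best 1/`factor` of the candidates and trains the survivors up to a budget that is `factor` times larger, ending at `max_epochs`. The rounding rule (ceiling) and the helper names `epoch_schedule` and `survivors` are assumptions made for illustration rather than a description of KerasTuner's internals. The values `max_epochs=50` and `factor=3` match the tuner configuration in this notebook.

```python
import math


def epoch_schedule(max_epochs, factor, rounds):
    """Cumulative epoch budget for each successive-halving round."""
    return [
        math.ceil(max_epochs / factor ** (rounds - 1 - r))
        for r in range(rounds)
    ]


def survivors(n_candidates, factor, rounds):
    """Number of candidate configurations kept in each round."""
    counts = [n_candidates]
    for _ in range(rounds - 1):
        counts.append(max(1, counts[-1] // factor))
    return counts


budgets = epoch_schedule(max_epochs=50, factor=3, rounds=3)
kept = survivors(n_candidates=9, factor=3, rounds=3)  # 9 is an arbitrary example
for r, (epochs, n) in enumerate(zip(budgets, kept)):
    print(f"round {r}: {n} candidates trained up to epoch {epochs}")
```

A three-round schedule gives budgets of 6, 17, and 50 epochs, which is consistent with the `tuner/initial_epoch: 6` and `tuner/epochs: 17` entries reported with the best hyperparameters later in this notebook.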

In [None]:
def build_tunable_cnn_lstm_model(hp: HyperParameters):
    """Builds a CNN-LSTM model with tunable hyperparameters."""

    # Time-series branch
    time_series_input = Input(shape=time_series_input_shape_final, name='time_series_input')

    # Tunable Conv1D layers
    filters_conv1d_1 = hp.Int('filters_conv1d_1', min_value=32, max_value=128, step=32)
    x = Conv1D(filters_conv1d_1, 3, activation='relu', padding='same', kernel_regularizer=l2(1e-4))(time_series_input)
    x = MaxPooling1D(pool_size=2)(x)
    x = Dropout(hp.Float('dropout_1', min_value=0.1, max_value=0.5, step=0.1))(x)

    filters_conv1d_2 = hp.Int('filters_conv1d_2', min_value=64, max_value=256, step=32)
    x = Conv1D(filters_conv1d_2, 3, activation='relu', padding='same', kernel_regularizer=l2(1e-4))(x)
    x = MaxPooling1D(pool_size=1)(x)  # Keep pool_size=1 for short sequences
    x = Dropout(hp.Float('dropout_2', min_value=0.1, max_value=0.5, step=0.1))(x)

    # Tunable LSTM layer
    lstm_units = hp.Int('lstm_units', min_value=32, max_value=128, step=32)
    x = LSTM(lstm_units, return_sequences=False, kernel_regularizer=l2(1e-4))(x)
    x = Dropout(hp.Float('dropout_3', min_value=0.1, max_value=0.5, step=0.1))(x)

    # Static branch
    static_input = Input(shape=static_input_shape_final, name='static_input')

    # Tunable Dense layers for static branch
    dense_units_static_1 = hp.Int('dense_units_static_1', min_value=16, max_value=64, step=16)
    y = Dense(dense_units_static_1, activation='relu')(static_input)
    dense_units_static_2 = hp.Int('dense_units_static_2', min_value=8, max_value=32, step=8)
    y = Dense(dense_units_static_2, activation='relu')(y)

    # Merge branches
    combined = Concatenate()([x, y])

    # Tunable Dense layers after concatenation
    dense_units_combined_1 = hp.Int('dense_units_combined_1', min_value=32, max_value=128, step=32)
    z = Dense(dense_units_combined_1, activation='relu')(combined)
    z = Dropout(hp.Float('dropout_4', min_value=0.1, max_value=0.5, step=0.1))(z)

    dense_units_combined_2 = hp.Int('dense_units_combined_2', min_value=16, max_value=64, step=16)
    z = Dense(dense_units_combined_2, activation='relu')(z)

    # Output layer
    output = Dense(1, activation='sigmoid', name='output')(z)

    # Build the model
    model = Model(inputs=[time_series_input, static_input], outputs=output)

    # Compile the model with a tunable learning rate
    learning_rate = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='LOG')
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='binary_crossentropy',
        metrics=[
            'accuracy',
            tf.keras.metrics.AUC(name='AUC'),
            tf.keras.metrics.Precision(name='Precision'),
            tf.keras.metrics.Recall(name='Recall')
        ]
    )

    return model

In [None]:

# Instantiate the Hyperband tuner
tuner = Hyperband(
    hypermodel=build_tunable_cnn_lstm_model,
    objective='val_Recall',
    max_epochs=50,
    factor=3,
    directory='keras_tuner_dir',
    project_name='cnn_lstm_tuning_recall',
    overwrite=True
)

print("Hyperband tuner instantiated with objective 'val_Recall'.")

Hyperband tuner instantiated with objective 'val_Recall'.


In [40]:
# Start the hyperparameter search
print("Starting hyperparameter search...")
tuner.search(
    x=[X_combined_36, static_combined_36],
    y=y_combined_36,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)]
)

print("Hyperparameter search finished.")

Trial 90 Complete [00h 00m 40s]
val_Recall: 0.8649237751960754

Best val_Recall So Far: 0.9586056470870972
Total elapsed time: 00h 29m 09s
Hyperparameter search finished.


### 9. Best Model Evaluation

Retrieve and evaluate the best model found by KerasTuner on the held-out test set. This step provides final performance metrics for the optimized model.

In [41]:
# Get the best hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print("Best Hyperparameters found:")
print(best_hps.values)

# Retrieve the best trained model from the search
enhanced_model = tuner.get_best_models(num_models=1)[0]

print("\nBest model built successfully.")

Best Hyperparameters found:
{'filters_conv1d_1': 32, 'dropout_1': 0.5, 'filters_conv1d_2': 256, 'dropout_2': 0.2, 'lstm_units': 96, 'dropout_3': 0.1, 'dense_units_static_1': 32, 'dense_units_static_2': 8, 'dense_units_combined_1': 32, 'dropout_4': 0.1, 'dense_units_combined_2': 16, 'learning_rate': 0.0002832574472375902, 'tuner/epochs': 17, 'tuner/initial_epoch': 6, 'tuner/bracket': 2, 'tuner/round': 1, 'tuner/trial_id': '0065'}

Best model built successfully.


In [42]:
# Evaluate the enhanced model on the test set
print("Evaluating the enhanced model on the test set...")
loss_enhanced, accuracy_enhanced, auc_enhanced, precision_enhanced, recall_enhanced = enhanced_model.evaluate(
    x=[X_test_36, static_test_36],  # Use the 36-hour test data
    y=y_test_36,
    verbose=0
)

print("\n--- Enhanced Model Evaluation on Test Data ---")
print(f"Test Loss: {loss_enhanced:.4f}")
print(f"Test Accuracy: {accuracy_enhanced:.4f}")
print(f"Test AUC: {auc_enhanced:.4f}")
print(f"Test Precision: {precision_enhanced:.4f}")
print(f"Test Recall: {recall_enhanced:.4f}")

Evaluating the enhanced model on the test set...

--- Enhanced Model Evaluation on Test Data ---
Test Loss: 0.8503
Test Accuracy: 0.6025
Test AUC: 0.7726
Test Precision: 0.2353
Test Recall: 0.8288


### 10. Model Performance Comparison

Compare the evaluation results of the base model and the enhanced (tuned) model. This section summarizes the trade-offs between accuracy, precision, recall, and AUC for clinical decision-making.

In [43]:
# Create a dictionary to store the results for comparison
comparison_results = {
    'Model': ['Base Model (36-hour)', 'Enhanced Model (Tuned)'],
    'Loss': [loss_final, loss_enhanced],
    'Accuracy': [accuracy_final, accuracy_enhanced],
    'AUC': [auc_final, auc_enhanced],
    'Precision': [precision_final, precision_enhanced],
    'Recall': [recall_final, recall_enhanced]
}

# Create a pandas DataFrame for easy comparison
comparison_df = pd.DataFrame(comparison_results)

print("\n--- Model Performance Comparison ---")
display(comparison_df)


--- Model Performance Comparison ---


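|   | Model                  | Loss    | Accuracy | AUC      | Precision | Recall   |
|---|------------------------|---------|----------|----------|-----------|----------|
| 0 | Base Model (36-hour)   | 0.7402  | 0.785    | 0.756306 | 0.312883  | 0.459459 |
| 1 | Enhanced Model (Tuned) | 0.85032 | 0.6025   | 0.77265  | 0.235294  | 0.828829 |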


### 11. Save Enhanced Model

Save the enhanced CNN-LSTM model with the best hyperparameters for future inference or deployment.

In [None]:
# Save the enhanced model
enhanced_model_save_path = '../models/enhanced_model_36hour.keras'
enhanced_model.save(enhanced_model_save_path)
print(f"Enhanced model saved to: {enhanced_model_save_path}")

Enhanced model saved to: enhanced_model_36hour.keras


### 12. Compare Base and Enhanced Model Performance

Compare the evaluation results of the base model (trained on the 36-hour data without tuning) and the enhanced model (trained with the best hyperparameters found by KerasTuner).

Based on the comparison of the Base Model and the Enhanced Model:

*   **Loss:** The Base Model has a lower test loss (0.74020) than the Enhanced Model (0.85032), suggesting that its predicted probabilities fit the test labels somewhat better in terms of binary cross-entropy.

*   **Accuracy:** The Base Model has significantly higher accuracy (0.7850) than the Enhanced Model (0.6025), indicating better overall correct classification.

*   **AUC (Area Under the ROC Curve):** The Enhanced Model has a slightly higher AUC (0.772650) compared to the Base Model (0.756306), showing a marginal improvement in discriminative power.

*   **Precision:** The Base Model has higher precision (0.312883) than the Enhanced Model (0.235294), meaning it makes fewer false positive predictions.

*   **Recall:** The Enhanced Model has a significantly higher recall (0.828829) compared to the Base Model (0.459459), indicating it is much better at identifying actual mortality cases (fewer false negatives).

**Interpretation:**

The hyperparameter tuning improved Recall significantly, which is often critical in medical prediction tasks to avoid missing true positive cases. However, this came at the cost of lower Accuracy and Precision. The Enhanced Model's slightly higher AUC suggests improved overall discrimination despite the trade-off in other metrics. The choice between the models depends on the specific clinical priorities regarding minimizing false negatives (favoring Enhanced Model) versus minimizing false positives or maximizing overall accuracy (favoring Base Model).
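
The same trade-off can also be explored without retraining by moving the decision threshold, which `run_experiment` fixes at 0.5. The sketch below uses made-up scores and labels (not results from this notebook) to show how lowering the threshold raises recall and, typically, lowers precision. `precision_recall_at` is a hypothetical helper.

```python
def precision_recall_at(threshold, y_true, y_score):
    """Precision and recall when scores above `threshold` count as positive."""
    tp = sum(1 for t, s in zip(y_true, y_score) if s > threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, y_score) if s > threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, y_score) if s <= threshold and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels and predicted probabilities, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.90, 0.70, 0.45, 0.30, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]

for threshold in (0.5, 0.25):
    p, r = precision_recall_at(threshold, y_true, y_score)
    print(f"threshold={threshold}: precision={p:.3f}, recall={r:.3f}")
```

In practice such a threshold would be chosen on validation data rather than on the test set, so that the reported test metrics remain an unbiased estimate.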