# Model Improvements

- Expand map_condition to include additional conditions (e.g., 'intoxicated').
- Increase the model complexity and add regularization.
- Use cross-validation to assess performance more robustly.
- Monitor additional metrics (precision, recall, F1) and add model checkpointing to keep the best-performing weights.
- Visualize training progress to check for overfitting or underfitting.

## Expand 'map_condition'

- Goal: detect intoxicated states.
- Map 'normal' and 'resting' to 0, 'exercising' to 1, and 'intoxicated' to 2.

In [None]:
if self._condition in ['resting', 'normal']:
    high_bpm_threshold = 110
elif self._condition == 'intoxicated':
    high_bpm_threshold = 120  # Example threshold for intoxication
else:
    high_bpm_threshold = 200
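The threshold logic above is a fragment from inside a class that holds self._condition. A standalone lookup-table version makes the per-condition thresholds easy to test in isolation. This is a minimal sketch. The names HIGH_BPM_THRESHOLDS, high_bpm_threshold() and is_high_bpm() are hypothetical. The numbers are the example values from the cell above, not clinical thresholds. Treating a reading as "high" only when it is strictly above the threshold is also an assumption.

```python
# Hypothetical per-condition high-BPM alert thresholds (example values from the cell above)
HIGH_BPM_THRESHOLDS = {
    'resting': 110,
    'normal': 110,
    'intoxicated': 120,  # example value only, not a clinical threshold
}
DEFAULT_HIGH_BPM_THRESHOLD = 200  # any other condition, e.g. 'exercising'


def high_bpm_threshold(condition: str) -> int:
    """Return the BPM above which a reading counts as 'high' for this condition."""
    return HIGH_BPM_THRESHOLDS.get(condition, DEFAULT_HIGH_BPM_THRESHOLD)


def is_high_bpm(bpm: float, condition: str) -> bool:
    """True if the reading is strictly above the threshold for its condition (assumed rule)."""
    return bpm > high_bpm_threshold(condition)


print(high_bpm_threshold('exercising'))  # 200
print(is_high_bpm(115, 'intoxicated'))   # False
print(is_high_bpm(115, 'resting'))       # True
```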


In [None]:
def map_condition(condition):
    if condition in ['normal', 'resting']:
        return 0
    elif condition == 'exercising':
        return 1
    elif condition == 'intoxicated':
        return 2
    return -1  # Default case for unexpected labels 


Use the ML Model for Intoxication Prediction:

If your ML model is trained on labeled data that includes 'intoxicated' as well as 'normal' conditions, you can extend predict_condition to detect intoxication specifically.
The model must output 'intoxicated' as one of its classes. predict_condition then needs to map each predicted class index back to its label:

In [None]:
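prediction = self._model(input_scaled_tensor)
class_idx = int(tf.argmax(prediction, axis=1).numpy()[0])
# Decode with the same indices as map_condition: 0 = normal/resting, 1 = exercising, 2 = intoxicated
condition = {0: 'normal', 1: 'exercising', 2: 'intoxicated'}.get(class_idx, 'unknown')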
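Decoding can also be written as a small helper that works on one row of softmax probabilities. The helper can optionally refuse to commit when the model is unsure. This is a sketch only. decode_prediction and its min_confidence cut-off are hypothetical additions, not part of the original predict_condition. Because map_condition sends both 'normal' and 'resting' to 0, decoding cannot tell them apart, so 'normal' is used as the label for class 0.

```python
import numpy as np

# Inverse of map_condition: class index -> label ('resting' also maps to 0)
INDEX_TO_CONDITION = {0: 'normal', 1: 'exercising', 2: 'intoxicated'}


def decode_prediction(probabilities, min_confidence=0.0):
    """Map one softmax row, shape (n_classes,) or (1, n_classes), to a condition label.

    Returns 'uncertain' when the top probability is below min_confidence
    (a hypothetical cut-off, not part of the original model).
    """
    probs = np.asarray(probabilities, dtype=float).reshape(-1)
    class_idx = int(np.argmax(probs))
    if probs[class_idx] < min_confidence:
        return 'uncertain'
    return INDEX_TO_CONDITION.get(class_idx, 'unknown')


print(decode_prediction([[0.1, 0.2, 0.7]]))                      # intoxicated
print(decode_prediction([0.4, 0.35, 0.25], min_confidence=0.5))  # uncertain
```

Inside the class, this would be called as decode_prediction(self._model(input_scaled_tensor).numpy()).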


Increase Model Complexity for Better Classification:

Since your model has only two hidden layers, adding more layers or more neurons per layer may improve performance, especially if the data contains more complex patterns. A larger network also overfits more easily, so add dropout layers as regularization.

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),  # Dropout for regularization
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax')  # 3 classes: normal/resting, exercising, intoxicated
])
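To see how much capacity the larger network adds, you can count its trainable parameters. A Dense layer mapping n_in inputs to n_out units has an n_in × n_out weight matrix plus n_out biases, which gives (n_in + 1) · n_out parameters. Dropout layers add none. The sketch below assumes two input features ('BPM min' and 'BPM max', as in the data-preparation step later on). dense_param_count is a hypothetical helper. Its result should match the "Trainable params" line of model.summary() for the same input width.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of fully connected layers.

    layer_sizes = [n_inputs, hidden_1, ..., n_outputs]; each Dense layer
    n_in -> n_out contributes (n_in + 1) * n_out weights and biases.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 2 input features -> 128 -> 64 -> 32 -> 3 classes
print(dense_param_count([2, 128, 64, 32, 3]))  # 10819
```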


Optimize Hyperparameters:

Experiment with the optimizer, learning rate, batch size, and number of epochs. For example, you could try the Adam optimizer with a lower learning rate or use a learning rate scheduler.

In [None]:
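from tensorflow import keras
# Adam at half its default learning rate (0.001); categorical_crossentropy expects one-hot labels
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0005),
              loss='categorical_crossentropy', metrics=['accuracy'])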
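A learning rate scheduler can be as simple as a step decay. The rate is multiplied by a fixed factor every few epochs. The function below is a sketch: the name step_decay, the 0.5 drop factor and the 10-epoch interval are assumptions to tune. Keras' LearningRateScheduler callback calls its schedule as schedule(epoch, lr), where epoch is 0-based and lr is the optimizer's current rate. The decay therefore compounds across drops.

```python
def step_decay(epoch, lr, drop=0.5, every=10):
    """Multiply the current learning rate by `drop` at every `every`-th epoch (epoch 0 untouched)."""
    if epoch > 0 and epoch % every == 0:
        return lr * drop
    return lr


# Simulate 30 epochs starting from the learning rate used above: drops at epochs 10 and 20
lr = 0.0005
for epoch in range(30):
    lr = step_decay(epoch, lr)
print(lr)
```

In training, pass keras.callbacks.LearningRateScheduler(step_decay) in the callbacks list of model.fit. keras.callbacks.ReduceLROnPlateau is an alternative that lowers the rate only when the validation loss stops improving.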


Evaluate Model Performance with Cross-Validation:

Instead of a single train-test split, use K-fold cross-validation to better understand the model’s performance across different splits.

In [None]:
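import numpy as np
from sklearn.model_selection import KFold
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping

# Assumes X (NumPy feature array), y (one-hot labels) and the model defined above already exist
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
results = []
for train_idx, test_idx in kfold.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Clone the architecture so every fold starts from freshly initialized weights
    fold_model = keras.models.clone_model(model)
    fold_model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0005),
                       loss='categorical_crossentropy', metrics=['accuracy'])
    fold_model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, callbacks=[early_stopping], verbose=0)
    score = fold_model.evaluate(X_test, y_test, verbose=0)
    results.append(score[1])  # Save accuracy score
print("Cross-validation accuracy scores:", results)
print("Mean cross-validation accuracy:", np.mean(results))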
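KFold(shuffle=True) shuffles the sample indices once and cuts them into k folds. Each fold serves as the held-out set exactly once, and the other k − 1 folds are used for training. The NumPy sketch below illustrates that idea only. kfold_indices is a hypothetical helper, and its shuffling will not reproduce scikit-learn's fold assignments for random_state=42. With an imbalanced label such as 'intoxicated', sklearn's StratifiedKFold is usually the better choice because it keeps the class proportions similar in every fold.

```python
import numpy as np


def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, test_idx): shuffle once, split into n_splits folds,
    and hold out each fold exactly once."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_splits)  # fold sizes differ by at most 1
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


for train_idx, test_idx in kfold_indices(10, n_splits=5):
    print(len(train_idx), len(test_idx))  # 8 2 for every fold
```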


Monitor Additional Metrics:

Use metrics like Precision, Recall, and F1-score to understand class-specific performance, especially for an imbalanced dataset where one condition (e.g., intoxicated) may be less frequent.

In [None]:
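from sklearn.metrics import classification_report
y_test_labels, y_pred_labels = y_test.argmax(axis=1), model.predict(X_test).argmax(axis=1)  # one-hot / softmax -> class indices
print(classification_report(y_test_labels, y_pred_labels, labels=[0, 1, 2],
                            target_names=['Normal', 'Exercising', 'Intoxicated']))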
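To see what the report computes, per-class precision, recall and F1 can be derived from raw counts. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. per_class_scores is a hypothetical pure-Python helper. The labels in the example are made up. In that example overall accuracy is 4/6 ≈ 0.67, while the rare class 2 scores only 0.5 on every metric. This shows why accuracy alone can hide poor minority-class performance.

```python
def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)}; a metric is 0.0 when its denominator is 0."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy labels: 0 = normal, 2 = intoxicated (class 1 absent)
scores = per_class_scores([0, 0, 0, 0, 2, 2], [0, 0, 0, 2, 2, 0], labels=[0, 1, 2])
print(scores[2])  # (0.5, 0.5, 0.5)
```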


Add Model Checkpointing:

Use ModelCheckpoint to save the model from its best-performing epoch. A later drop in validation performance, or an interrupted long training run, then does not cost you the best weights.

In [None]:
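from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
# Write the full model to disk only when val_accuracy beats the best value seen so far
checkpoint = ModelCheckpoint('best_model.keras', monitor='val_accuracy', save_best_only=True, mode='max')
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, callbacks=[early_stopping, checkpoint])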


Visualize Training Performance:

Plot the training and validation accuracy and loss over epochs to identify potential overfitting and check if early stopping is effective.

In [None]:
import matplotlib.pyplot as plt
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
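The plots can be backed by a numeric check. The check finds the epoch with the lowest validation loss and whether validation loss ends well above training loss. This is a sketch. summarize_history and its gap_tolerance cut-off are hypothetical, and the toy history is made up. The input has the same shape as Keras' history.history. Epoch indices are 0-based, like the x-axis of the plots above. With EarlyStopping(restore_best_weights=True), epochs_after_best is typically the patience value.

```python
def summarize_history(history_dict, gap_tolerance=0.1):
    """Best epoch by val_loss plus a simple overfitting flag.

    history_dict: {'loss': [...], 'val_loss': [...]} as in History.history.
    gap_tolerance: assumed cut-off on (val_loss - loss) at the last epoch.
    """
    val_loss = history_dict['val_loss']
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    final_gap = val_loss[-1] - history_dict['loss'][-1]
    return {
        'best_epoch': best_epoch,  # 0-based, same as the plot x-axis
        'epochs_after_best': len(val_loss) - 1 - best_epoch,
        'final_gap': final_gap,
        # Overfitting suspected: val_loss rose after its minimum and ends far above train loss
        'possible_overfitting': final_gap > gap_tolerance and best_epoch < len(val_loss) - 1,
    }


toy = {'loss': [0.9, 0.6, 0.4, 0.3, 0.2], 'val_loss': [0.95, 0.7, 0.55, 0.6, 0.7]}
print(summarize_history(toy)['best_epoch'])  # 2
```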


Consider Data Augmentation or Synthetic Data:

If you have limited samples for some conditions (e.g., intoxicated), consider data augmentation techniques (e.g., adding noise to BPM values) or generating synthetic data to balance the classes.
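One simple form of noise-based augmentation oversamples the minority class. It draws rows of that class at random and adds Gaussian jitter to their BPM features. This is a minimal sketch. The helper augment_with_noise and the 2 BPM noise level are assumptions to tune, and the toy data is made up. Apply it to the training split only, so that no synthetic rows leak into the test set. Jitter can also produce rows where 'BPM min' exceeds 'BPM max', so you may want to sort or clip each row afterwards.

```python
import numpy as np


def augment_with_noise(X, y, target_label, n_new, noise_std=2.0, seed=0):
    """Append n_new jittered copies of random rows labeled target_label.

    X: (n_samples, n_features) raw (unscaled) BPM features; y: (n_samples,) integer labels.
    noise_std is in BPM and is an assumed value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    source_rows = np.flatnonzero(y == target_label)
    if source_rows.size == 0:
        raise ValueError(f"No rows with label {target_label!r} to augment")
    picks = rng.choice(source_rows, size=n_new, replace=True)
    X_new = X[picks] + rng.normal(0.0, noise_std, size=(n_new, X.shape[1]))
    y_new = np.full(n_new, target_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


X = np.array([[60, 80], [62, 85], [90, 130]])  # toy 'BPM min', 'BPM max' rows
y = np.array([0, 0, 1])                         # 1 = intoxicated (minority)
X_aug, y_aug = augment_with_noise(X, y, target_label=1, n_new=2)
print(X_aug.shape, np.bincount(y_aug))  # (5, 2) [2 3]
```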

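Step 1: Load and Label the Data
Load the normal and intoxicated recordings from separate directories and tag every row with a numeric label in a new 'Category' column (0 = normal, 1 = intoxicated). This pipeline trains a binary normal-vs-intoxicated classifier, so its labels differ from the three-class scheme used by map_condition.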


In [None]:
import os
import pandas as pd

# Define the directories
normal_data_dir = 'path/to/normal/data'
intoxicated_data_dir = 'path/to/intoxicated/data'

# Function to load and label data from a directory
def load_data_from_directory(directory, label):
    data = []
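    for filename in sorted(os.listdir(directory)):  # sorted for a reproducible row order
        if filename.endswith('.csv'):  # Adjust if your files have a different extension
            file_path = os.path.join(directory, filename)
            df = pd.read_csv(file_path)
            df['Category'] = label  # Add a new column for the label
            data.append(df)
    if not data:
        # pd.concat fails on an empty list, so report the directory that had no data
        raise ValueError(f"No CSV files found in {directory}")
    return pd.concat(data, ignore_index=True)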

# Load normal and intoxicated data, labeling each
normal_data = load_data_from_directory(normal_data_dir, label=0)  # 0 for normal
intoxicated_data = load_data_from_directory(intoxicated_data_dir, label=1)  # 1 for intoxicated

# Combine both datasets
combined_data = pd.concat([normal_data, intoxicated_data], ignore_index=True)


Step 2: Prepare Data for Model Training
After loading and labeling the data, you can proceed with preprocessing (scaling, splitting, etc.) just like in your initial code.

In [None]:
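from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical
import joblib

features = combined_data[['BPM min', 'BPM max']].to_numpy()
labels = combined_data['Category'].to_numpy()

# Split first (stratified, so both classes appear in each split) so test rows never influence the scaler
X_train, X_test, y_train_raw, y_test_raw = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels)

# Fit the scaler on the training rows only, then apply the same transform to the test rows
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# One-hot encode labels for categorical_crossentropy (2 classes: normal, intoxicated)
y_train = to_categorical(y_train_raw, num_classes=2)
y_test = to_categorical(y_test_raw, num_classes=2)

# Save the scaler so inference applies identical preprocessing
joblib.dump(scaler, 'scaler_for_model.joblib')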


Step 3: Model Training (As in Your Original Code)
After preparing the data, use your existing model training code with early stopping and other improvements.

In [None]:
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

# Define EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Define and compile the model
model = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(32, activation='relu'),
    layers.Dense(2, activation='softmax')  # 2 output neurons for 'normal' and 'intoxicated'
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test), callbacks=[early_stopping])

# Save the model
model.save('intoxication_detection_model.keras')


Step 4: Verify Data Loading and Labeling
To confirm that the data has been loaded correctly, check the label distribution.
The output should show a count for each label (0 for normal, 1 for intoxicated), confirming that the data from each directory was labeled and combined correctly.

In [None]:
print("Label distribution in combined data:")
print(combined_data['Category'].value_counts())
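If these counts turn out to be imbalanced, with far fewer intoxicated rows, you can weight the loss per class instead of, or in addition to, augmentation. The sketch below assumes integer labels like the 'Category' column. It uses the formula behind scikit-learn's class_weight='balanced': n_samples / (n_classes × count_in_class). balanced_class_weights is a hypothetical helper, and the toy labels are made up. The resulting dict can be passed to Keras as model.fit(..., class_weight=weights). Compute it from the training labels (for example y_train_raw.tolist()) rather than from the full dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """{label: n_samples / (n_classes * count)}; rare classes get weights above 1."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in sorted(counts.items())}


print(balanced_class_weights([0, 0, 0, 0, 0, 0, 1, 1]))  # {0: 0.6666666666666666, 1: 2.0}
```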


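Extract BPM Statistics from Intoxicated ECG Recordings:

If the intoxicated data is available only as raw ECG records (WFDB .hea/.dat files) rather than BPM CSVs, the following cell detects R-peaks with wfdb's gqrs_detect and converts the RR intervals to BPM. It then saves each record's minimum, maximum and average BPM to a CSV.

In [None]: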
import os
import numpy as np
import pandas as pd
import wfdb  # For handling ECG files
from wfdb.processing import gqrs_detect  # R-peak detection
from datetime import datetime

# Directory containing the intoxicated ECG files
intoxicated_data_dir = 'path/to/intoxicated/data'

# Prepare a list to hold the extracted data
intoxicated_data = []

# Process each ECG file in the directory
for filename in os.listdir(intoxicated_data_dir):
    if filename.endswith('.dat'):
        # Construct the record name (without the .dat extension)
        record_name = os.path.join(intoxicated_data_dir, filename[:-4])

        try:
            # Load ECG data
            record = wfdb.rdrecord(record_name)
            fs = record.fs  # Sampling frequency

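            # Detect R-peaks
            r_peaks = gqrs_detect(sig=record.p_signal[:, 0], fs=fs)  # Assuming the first channel
            if len(r_peaks) < 2:
                # At least two beats are needed to form an RR interval
                print(f"Skipping {filename}: fewer than two R-peaks detected")
                continue

            # Calculate RR intervals and BPM
            rr_intervals = np.diff(r_peaks) / fs  # Convert samples to seconds
            bpm_values = 60 / rr_intervals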

            # Calculate BPM statistics
            min_bpm = np.min(bpm_values)
            max_bpm = np.max(bpm_values)
            avg_bpm = np.mean(bpm_values)

            # Append data to list
            intoxicated_data.append({
                "ID": filename[:-4],  # Use the filename (without extension) as ID
                "Date": datetime.now().date(),  # You can use actual date if available
                "BPM min": min_bpm,
                "BPM max": max_bpm,
                "BPM avg": avg_bpm,
                "Condition": "intoxicated"
            })

        except Exception as e:
            print(f"Error processing {filename}: {e}")

# Convert the data list to a DataFrame
intoxicated_df = pd.DataFrame(intoxicated_data)

# Save the DataFrame to a CSV file
output_csv_path = 'intoxicated_bpm_data.csv'
intoxicated_df.to_csv(output_csv_path, index=False)

print(f"Intoxicated BPM data saved to {output_csv_path}")
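The R-peak-to-BPM conversion in the cell above can be isolated into a function and tested without any ECG files. The RR interval in seconds is the gap between consecutive peak sample indices divided by the sampling rate fs, and BPM = 60 / RR. This sketch also drops beats outside an assumed plausible range of 30 to 220 BPM, which are likely false detections. That filter is not in the original cell. bpm_stats_from_peaks is a hypothetical helper.

```python
import numpy as np


def bpm_stats_from_peaks(r_peaks, fs, min_bpm=30.0, max_bpm=220.0):
    """Return (min, max, mean) BPM from R-peak sample indices at fs Hz, or None.

    Beats outside [min_bpm, max_bpm] are treated as detection errors (assumed bounds).
    """
    peaks = np.unique(np.asarray(r_peaks))  # sorted, duplicates removed
    if peaks.size < 2:
        return None
    rr_intervals = np.diff(peaks) / fs  # samples -> seconds
    bpm = 60.0 / rr_intervals
    bpm = bpm[(bpm >= min_bpm) & (bpm <= max_bpm)]
    if bpm.size == 0:
        return None
    return float(bpm.min()), float(bpm.max()), float(bpm.mean())


# Peaks exactly 1 s apart at 360 Hz -> 60 BPM
print(bpm_stats_from_peaks([0, 360, 720, 1080], fs=360))  # (60.0, 60.0, 60.0)
```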
