# Machine Learning for Stress Detection on WESAD

In this final notebook you will be guided through the steps to build a simple ML classifier and train it on the ECG features you previously computed. This notebook consists of two parts:

1. Experimental Setup, data loading, classifier initialization
2. Model training, validation, and discussion of results

Throughout this process we will make use of the ``scikit-learn`` library, which streamlines simple ML experiments. Let's start by loading our features:

In [None]:
import numpy as np

# TODO: Load features and labels files
features = ...
labels = ...

print("Features shape:", features.shape)
print("Labels shape:", labels.shape)

As you may recall, we have 90 samples available from 15 subjects. Each sample has 2 values, namely heart rate (HR) and RMSSD values over 5-minute ECG intervals. The labels file also includes 2 values, the first being the subject ID and the other the binary label (0: baseline, 1: stress). We can now proceed to split the data into a training and test set.

To be fair, we should not test a model on data from a subject it has already seen during training. Hence, we will hold out a single subject (subject 10 here, but any subject would do) as our test set:

In [None]:
test_subject = 10

# TODO: Isolate the test subject's samples using np.where
test_subject_indices = ...

# Isolate the test subject's samples
X_test = features[test_subject_indices[0]]
y_test = labels[test_subject_indices[0], 1]

print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)

# TODO: Isolate the training data
train_subject_indices = ...
X_train = ...
y_train = ...

print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
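
The subject-wise split above can be sketched on a small example. The arrays below are hypothetical toy data (not the WESAD features), and the names `toy_features`, `toy_labels`, and `held_out` are made up for illustration. Note that `np.where` called with a single condition returns a tuple, which is why the index array is taken with `[0]`.

```python
import numpy as np

# Hypothetical toy data: 6 samples, 2 features each (NOT the WESAD features)
toy_features = np.array([[60.0, 40.0], [85.0, 20.0], [62.0, 38.0],
                         [90.0, 18.0], [58.0, 45.0], [88.0, 22.0]])
# Same layout as the labels file: column 0 = subject ID, column 1 = binary label
toy_labels = np.array([[1, 0], [1, 1], [2, 0], [2, 1], [3, 0], [3, 1]])

held_out = 2  # hypothetical test subject

# np.where(condition) returns a tuple of index arrays; [0] selects the row indices
test_idx = np.where(toy_labels[:, 0] == held_out)[0]
train_idx = np.where(toy_labels[:, 0] != held_out)[0]

toy_X_test, toy_y_test = toy_features[test_idx], toy_labels[test_idx, 1]
toy_X_train, toy_y_train = toy_features[train_idx], toy_labels[train_idx, 1]

print("test indices:", test_idx)    # [2 3]
print("train indices:", train_idx)  # [0 1 4 5]
```

Because the two conditions are complements of each other, every sample ends up in exactly one of the two sets.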

Now that we have split our data, we can apply some transformations to ease model training. In our case, our feature vector is already simple, so we will just apply z-score normalization:

In [None]:
# TODO: Apply z-score normalization to the features
mean = ...
std = ...

X_train = ...
X_test = ...
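
As a minimal sketch of z-score normalization, the example below uses hypothetical toy arrays (`toy_train`, `toy_test`) rather than the notebook's data. It assumes per-feature statistics (`axis=0`) and NumPy's default population standard deviation; other conventions are possible.

```python
import numpy as np

# Hypothetical toy data: 3 training samples and 1 test sample, 2 features each
toy_train = np.array([[60.0, 40.0], [80.0, 20.0], [70.0, 30.0]])
toy_test = np.array([[75.0, 25.0]])

# Per-feature statistics, computed on the training data only
train_mean = toy_train.mean(axis=0)
train_std = toy_train.std(axis=0)

# Both sets are transformed with the SAME (training) statistics
toy_train_z = (toy_train - train_mean) / train_std
toy_test_z = (toy_test - train_mean) / train_std

print("train mean after scaling:", toy_train_z.mean(axis=0))
print("train std after scaling:", toy_train_z.std(axis=0))
print("scaled test sample:", toy_test_z)
```

After scaling, the training features have zero mean and unit standard deviation, while the test sample is simply expressed on that same scale.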

As you noticed, we normalized the test data in the same manner as the training data, using the training-data statistics. This is important because the model is fitted to features on that scale, and computing the statistics on the test set would leak information about it into the pipeline. It is now time to define a classifier. We are going to use a support vector machine (SVM), which is a popular model choice for narrow tasks. Let us initialize an SVM model from ``scikit-learn``:

In [None]:
from sklearn.svm import SVC

# Let us initialize an SVM classifier
svm = SVC(kernel='rbf', gamma=0.01, random_state=42)

... You can now refer back to your assignment, and then return to implement the second part of the notebook.

In this second part, we will train the SVM classifier and discuss the experimental results. Training the model is as simple as calling the ``fit()`` method:

In [None]:
# TODO: Fit the SVM classifier on the training data (X_train, y_train)
...
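
For reference, the general ``fit()`` / ``predict()`` pattern looks roughly like the sketch below. The data are hypothetical, already-normalized toy points forming two well-separated clusters, and `toy_svm` is a separate classifier created only for this illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in a 2-feature space
toy_X = np.array([[-1.0, 1.0], [-1.2, 0.8], [-0.8, 1.1],
                  [1.0, -1.0], [1.2, -0.8], [0.8, -1.1]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

# Same kernel and gamma as in the notebook, but a separate toy model
toy_svm = SVC(kernel="rbf", gamma=0.01)

# fit() takes the feature matrix first and the label vector second
toy_svm.fit(toy_X, toy_y)

# predict() expects a 2D array of shape (n_samples, n_features)
toy_pred = toy_svm.predict(np.array([[-1.0, 0.9], [1.0, -0.9]]))
print("Toy predictions:", toy_pred)
```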

While training an ML model can take a substantial amount of time, our dataset is relatively small, so training finishes almost instantly. Let's see how the model performs on the unseen subject:

In [None]:
def accuracy(y_true, y_pred):
    # TODO: write a simple function to calculate accuracy %
    acc = ...
    return acc

predictions = svm.predict(X_test)
print("Predictions:", predictions)
print("True labels:", y_test)

print("Accuracy:", accuracy(y_test, predictions), "%")
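
One way to sanity-check your own accuracy function is to compare it against ``accuracy_score`` from ``sklearn.metrics``. The sketch below uses hypothetical toy vectors. Keep in mind that ``accuracy_score`` returns a fraction between 0 and 1, so it is multiplied by 100 here to obtain a percentage.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical toy labels and predictions (3 of 5 are correct)
toy_true = np.array([0, 0, 1, 1, 1])
toy_pred = np.array([0, 1, 1, 1, 0])

# accuracy_score returns a fraction in [0, 1]; scale it to a percentage
toy_acc_percent = 100.0 * accuracy_score(toy_true, toy_pred)
print("Toy accuracy:", toy_acc_percent, "%")
```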

If everything works, you should get perfectly correct predictions! This seems to be a very robust classifier that can predict whether subject 10 is in a baseline or stressful condition. But is this true in general? We can never know for certain. However, we can at least get an estimate by testing on every subject we have available.

To do that, we can apply leave-one-subject-out (LOSO) cross-validation. In this framework, the data are grouped by subject ID and, in each round, a single subject is used for testing while the rest are used for training. We can implement that neatly with ``scikit-learn``:

In [None]:
from sklearn.model_selection import LeaveOneGroupOut

logo = LeaveOneGroupOut()
groups = labels[:, 0]

all_predictions = np.zeros(labels.shape[0])
for train_index, test_index in logo.split(features, labels[:, 1], groups):

    # TODO: Isolate the training and test data
    X_train, X_test = ...
    y_train, y_test = ...

    # TODO: Apply z-score normalization to the features
    mean = ...
    std = ...
    X_train = ...
    X_test = ...

    # TODO: Fit the SVM classifier on the training data (X_train, y_train)
    ...

    # TODO: Make predictions using the trained SVM
    predictions = ...
    all_predictions[test_index] = predictions

print("Overall accuracy:", accuracy(labels[:, 1], all_predictions), "%")
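
To see what ``LeaveOneGroupOut`` actually yields, the sketch below runs it on hypothetical toy arrays with three subjects and two samples each. It only inspects the splits; no model is trained.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Hypothetical toy data: 3 subjects, 2 samples each
toy_groups = np.array([1, 1, 2, 2, 3, 3])
toy_X = np.arange(12, dtype=float).reshape(6, 2)
toy_y = np.array([0, 1, 0, 1, 0, 1])

toy_logo = LeaveOneGroupOut()
held_out_subjects = []
for train_idx, test_idx in toy_logo.split(toy_X, toy_y, toy_groups):
    subject = np.unique(toy_groups[test_idx])
    # Each fold tests on exactly one subject...
    assert len(subject) == 1
    # ...and that subject never appears in the training indices
    assert not np.isin(toy_groups[train_idx], subject).any()
    held_out_subjects.append(int(subject[0]))
    print("Held-out subject:", subject[0], "| test indices:", test_idx)

# The number of folds equals the number of distinct subjects
n_folds = toy_logo.get_n_splits(groups=toy_groups)
print("Number of folds:", n_folds)
```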

How would you interpret this result? What would you change in the setup? Specifically, how would you explain the model performance in terms of:
* label ratio (class imbalance)
* between-subject variability
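
One possible way to examine both aspects is sketched below on hypothetical arrays that stand in for `labels[:, 0]`, `labels[:, 1]`, and `all_predictions`. The values are invented for illustration and say nothing about the real results. The majority-class baseline is the accuracy obtained by always predicting the most frequent label.

```python
import numpy as np

# Hypothetical stand-ins for subject IDs, true labels and LOSO predictions
toy_subjects = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
toy_true = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1])
toy_pred = np.array([0, 0, 1, 0, 0, 0, 0, 1, 1])

# Label ratio: how many samples does each class have?
classes, counts = np.unique(toy_true, return_counts=True)
class_ratio = counts / counts.sum()
print("Classes:", classes, "| ratio:", class_ratio)

# Accuracy (%) of a trivial model that always predicts the majority class
majority_baseline = 100.0 * counts.max() / counts.sum()
print("Majority-class baseline:", majority_baseline, "%")

# Between-subject variability: accuracy (%) computed separately per subject
per_subject_acc = {}
for s in np.unique(toy_subjects):
    mask = toy_subjects == s
    per_subject_acc[int(s)] = 100.0 * np.mean(toy_true[mask] == toy_pred[mask])
print("Per-subject accuracy:", per_subject_acc)
```

Comparing the overall accuracy with the majority-class baseline, and looking at the spread of the per-subject values, gives a first impression of how much each factor matters.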

You can use additional cells to check those aspects of the experiment. Classification accuracy alone is usually not the most representative way of assessing the capabilities of a model. Below you can compute additional useful metrics that may assist you in analyzing the SVM model outcome:

In [None]:
# The main variables of classification performance:
true_positives = np.sum((labels[:, 1] == 1) & (all_predictions == 1))
false_positives = np.sum((labels[:, 1] == 0) & (all_predictions == 1))
true_negatives = np.sum((labels[:, 1] == 0) & (all_predictions == 0))
false_negatives = np.sum((labels[:, 1] == 1) & (all_predictions == 0))

print("True positives:", true_positives)
print("False positives:", false_positives)
print("True negatives:", true_negatives)
print("False negatives:", false_negatives)

# TODO: Calculate the following performance metrics

# Precision: ratio of correctly predicted positive observations to the total predicted positives
precision = ...

# Recall: ratio of correctly predicted positive observations to all observations of this class
recall = ...

# F1 score: harmonic mean of precision and recall
f1 = 2 * (precision * recall) / (precision + recall)

print("Precision:", precision)
print("Recall:", recall)
print("F1 score:", f1)

# TODO: In this domain, researchers are often interested in sensitivity and specificity

# Sensitivity: ratio of correctly predicted positive observations to all observations of this class
sensitivity = ...

# Specificity: ratio of correctly predicted negative observations to all observations of this class
specificity = ...

print("Sensitivity:", sensitivity)
print("Specificity:", specificity)
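
Once you have your own values, you could cross-check them against ``sklearn.metrics``. The sketch below does this on hypothetical toy vectors. It assumes label 1 (stress) is the positive class, and it obtains specificity as the recall of the negative class via `pos_label=0`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical toy labels and predictions
toy_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_pred = np.array([0, 0, 0, 1, 1, 1, 0, 0])

# For binary labels [0, 1], ravel() yields the order: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(toy_true, toy_pred, labels=[0, 1]).ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)

toy_precision = precision_score(toy_true, toy_pred)  # TP / (TP + FP)
toy_recall = recall_score(toy_true, toy_pred)        # TP / (TP + FN)
toy_f1 = f1_score(toy_true, toy_pred)

# Specificity is the recall of the negative class: TN / (TN + FP)
toy_specificity = recall_score(toy_true, toy_pred, pos_label=0)

print("Precision:", toy_precision)
print("Recall:", toy_recall)
print("F1 score:", toy_f1)
print("Specificity:", toy_specificity)
```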

Can you explain which aspect of the classification task each metric indicates? Keep notes so that you can discuss them in our meeting.

#### Thanks for completing the assignment!
