<a href="https://colab.research.google.com/github/page-jerzak/ai_computing/blob/main/DSMDLP_Module14_Neural_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Networks and Deep Learning
In this assignment, you will be training and evaluating neural networks and other models for the task of developing automated sensor-free detectors of student affect (i.e. concentration, boredom, confusion, and frustration).

## Dataset

This assignment will utilize student data collected from the ASSISTments digital learning platform. The dataset contains 48,716 rows of data from 838 unique students. Each row represents one "clip" of student activity; a clip is meant to represent approximately 20 seconds of actions, but may vary. Each clip is described by 92 features; these expert-engineered features are the result of taking the average, sum, minimum, and maximum value of 23 action-level features when aggregating to the clip-level (i.e. 23 features x 4 aggregations = 92). The dataset also contains 3,109 observations of affect collected by human coders using the [BROMP protocol](https://d1wqtxts1xzle7.cloudfront.net/36773439/BROMP_2.0_Final-libre.pdf?1424899724=&response-content-disposition=inline%3B+filename%3DBaker_Rodrigo_Ocumpaugh_Monitoring_Proto.pdf&Expires=1708897741&Signature=MkENDA~A6ZDfWOD--VrdUT73ngf4~bQJ48Nq1DOnyZkq~h8zwcSben4URR8MnGipxbgbzxkpRE4pfIaLSBRBq5G62-C3DYdw60Kjx0qTsBCQoIWu6XmqPz6ACzyslcJwc~LA7vDIiJ3MVs1CGVccZnDFaFP6YzAnkAbK3HuZ1UgkT3OsxorsD7p7pbgF0P0WEb6X9NevtNAxNbEbzSN7r0mrjAoESZdFkat~q1eVyAcPhQ-ONGB-aK-FzDImMlC7gjxRqiq2J7Husp5RVFVuQ3v7v-gfvyq7rC8Clabci1EkaaCjbF9qxLiibwl5End3Tre6MQkgV4tu7tY~kSOciQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA); students were labeled as exhibiting either concentration, boredom, confusion, or frustration. These labels were collected in a round-robin fashion, such that not every row in the dataset contains a label, resulting in a large number of unlabeled data rows.
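As a rough illustration of where the 92 clip-level features come from, the sketch below aggregates a toy action-level table to the clip level with pandas. The feature names (`attemptCount`, `correct`) and all values are made up for this example; only the four aggregations and the `avg_`/`sum_`/`min_`/`max_` naming follow the dataset description.

```python
import pandas as pd

# Toy action-level log: two clips, two hypothetical action-level features.
actions = pd.DataFrame({
    "clip_id":      [1, 1, 1, 2, 2],
    "attemptCount": [1, 2, 3, 1, 1],
    "correct":      [0, 0, 1, 1, 1],
})

# Apply the four aggregations to every action-level feature within each clip.
clip_features = actions.groupby("clip_id").agg(["mean", "sum", "min", "max"])

# Flatten the (feature, aggregation) column index into prefixed names
# such as avg_attemptCount or max_correct.
prefix = {"mean": "avg_", "sum": "sum_", "min": "min_", "max": "max_"}
clip_features.columns = [prefix[agg] + feat for feat, agg in clip_features.columns]

print(clip_features.shape)  # 2 clips, 2 features x 4 aggregations = 8 columns
```

With 23 action-level features instead of 2, the same pattern would yield 23 x 4 = 92 columns per clip.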

**The dataset can be downloaded from this direct link:
[ASSISTments Affect Labels and Features](https://drive.google.com/file/d/19v4vzxsvHM_zm2a_9Zvquhf_Z4HGtu2a/view?usp=sharing)**

Versions of this dataset have been used in prior works:

* [Ocumpaugh, J., Baker, R., Gowda, S., Heffernan, N., & Heffernan, C. (2014). Population validity for educational data mining models: A case study in affect detection. *British Journal of Educational Technology*, 45(3), 487-501.](https://bera-journals.onlinelibrary.wiley.com/doi/full/10.1111/bjet.12156)

* [Botelho, A. F., Baker, R. S., & Heffernan, N. T. (2017, June). Improving Sensor-Free Affect Detection Using Deep Learning. *In Proceedings of the 2017 International Conference on Artificial Intelligence in Education*, 40-51. Springer, Cham.](https://link.springer.com/chapter/10.1007/978-3-319-61425-0_4)

# Data Loading and Preprocessing
Download the **student_affect_with_clip_features_and_folds.csv** file from the link above. Run the first code cell below to upload the dataset. The second code cell uses the pandas library to read the file into a DataFrame and displays the number of rows and columns along with a sample of the loaded data.

*Note: The dataset has already been folded at the student level. We will use the "fold" column of this dataset to apply cross-validation.*
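Folding at the student level means that every clip from a given student shares one fold value, so no student contributes rows to both the training and the test side of a split. A minimal sketch of that property follows. It uses toy `user_id` values and a round-robin assignment that is assumed purely for illustration and is not how the dataset's own `fold` column was built.

```python
import numpy as np

# Toy data: one entry per clip, several clips per (hypothetical) student.
user_id = np.array([10, 10, 11, 12, 12, 12, 13, 14, 14, 15])

# Assign each unique student to one of n_folds folds (round-robin, illustration only).
n_folds = 3
students = np.unique(user_id)
student_fold = {s: i % n_folds + 1 for i, s in enumerate(students)}
fold = np.array([student_fold[u] for u in user_id])

# Leave-one-fold-out splits: all rows of a student land on the same side.
for f in np.unique(fold):
    train_idx = np.argwhere(fold != f).ravel()
    test_idx = np.argwhere(fold == f).ravel()
    overlap = set(user_id[train_idx]) & set(user_id[test_idx])
    assert not overlap  # no student is shared between training and test rows
    print(f"fold {f}: {len(train_idx)} train rows, {len(test_idx)} test rows")
```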



In [1]:
from google.colab import files
dataset = files.upload()
filename = list(dataset.keys())[0]
print(f"{filename} has been uploaded")

Saving student_affect_with_clip_features_and_folds.csv to student_affect_with_clip_features_and_folds (1).csv
student_affect_with_clip_features_and_folds (1).csv has been uploaded


In [2]:
import numpy as np
import pandas as pd
import pickle as pk
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Define the prefixes
prefixes = ["avg_", "sum_", "min_", "max_"]

TARGET_FEATURES = ['confusion', 'concentration', 'frustration', 'boredom']

data = pd.read_csv(filename)
EXPERT_FEATURES = [col for col in data.columns if any(col.lower().startswith(prefix.lower()) for prefix in prefixes)]
data[TARGET_FEATURES] = data[TARGET_FEATURES].fillna(0)

# Print the shape of the dataset
print("\nShape of the dataset (rows, columns):", data.shape)

data


Shape of the dataset (rows, columns): (48716, 106)


Unnamed: 0,row_id,clip_id,skill,problem_id,user_id,assignment_id,assistment_id,avg_attemptCount,avg_bottomHint,avg_correct,...,sum_totalFrPercentPastWrong,sum_totalFrSkillOpportunities,sum_totalFrTimeOnSkill,confusion,concentration,boredom,frustration,urbanicity,clip_sequence,fold
0,0,1,281,136.00000,72720,287761.0,136.0,1.0,0.0,0.000000,...,0.000000,0,0.00000,0.0,0.0,0.0,0.0,1,1,1
1,1,2,281,136.00000,72720,287761.0,136.0,2.0,0.0,0.000000,...,0.000000,1,186.65000,0.0,0.0,0.0,0.0,1,1,1
2,2,3,24,4468.00000,72720,287767.0,4468.0,1.5,0.0,0.000000,...,0.000000,1,54.56500,0.0,0.0,0.0,0.0,1,1,1
3,3,5,24,4464.00000,72720,287767.0,4468.0,1.5,0.0,0.500000,...,0.500000,5,133.86500,0.0,0.0,0.0,0.0,1,1,1
4,4,7,42,4465.00000,72720,287767.0,4468.0,3.5,0.0,0.166667,...,0.000000,15,836.40199,0.0,0.0,0.0,0.0,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48711,48720,92477,78,86278.50000,155781,529087.0,47799.0,1.0,0.0,1.000000,...,0.500000,14,213.01198,0.0,0.0,0.0,0.0,3,838,4
48712,48721,92479,79,87298.50000,155781,529088.0,48669.5,1.0,0.0,1.000000,...,0.904167,31,458.56597,0.0,0.0,0.0,0.0,3,838,4
48713,48722,92481,47,84974.00000,155781,529090.0,46686.0,1.0,0.0,1.000000,...,0.111111,9,343.83000,0.0,0.0,0.0,0.0,3,838,4
48714,48723,92482,47,84952.00000,155781,529090.0,46664.0,1.5,0.0,0.500000,...,0.100000,21,732.08400,0.0,0.0,0.0,0.0,3,838,4


# Defining Utility Functions
The code cell below provides an implementation of AUC for multi-class prediction (following the method suggested by Hand & Till, 2001) as well as for traditional binary prediction tasks.

[Hand, D. J., & Till, R. J. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. *Machine Learning*, 45, 171-186.](https://link.springer.com/article/10.1023/A:1010920819831)

In [3]:
def alen(x):
    return 1 if np.isscalar(x) else len(x)

def auc(actual, predicted, average_over_labels=True, partition=1024.):
    assert len(actual) == len(predicted)

    ac = np.array(actual, dtype=np.float32).reshape((len(actual),-1))
    pr = np.array(predicted, dtype=np.float32).reshape((len(predicted),-1))

    na = np.argwhere([not np.any(np.isnan(i)) for i in ac]).ravel()

    ac = ac[na]
    pr = pr[na]

    label_auc = []
    for i in range(ac.shape[-1]):
        a = np.array(ac[:,i])
        p = np.array(pr[:,i])

        val = np.unique(a)
        if len(val) == 1:
            label_auc.append(np.nan)
            continue

        pos = np.argwhere(a[:] >= np.median(val))
        neg = np.argwhere(a[:] < np.median(val))

        p_div = int(np.ceil(len(pos)/partition))
        n_div = int(np.ceil(len(neg)/partition))

        div = 0
        for j in range(int(p_div)):
            p_range = list(range(int(j * partition), int(np.minimum(int((j + 1) * partition), len(pos)))))
            for k in range(n_div):
                n_range = list(range(int(k * partition), int(np.minimum(int((k + 1) * partition), len(neg)))))


                eq = np.ones((alen(neg[n_range]), alen(pos[p_range]))) * p[pos[p_range]].T == np.ones(
                    (alen(neg[n_range]), alen(pos[p_range]))) * p[neg[n_range]]

                geq = np.array(np.ones((alen(neg[n_range]), alen(pos[p_range]))) *
                               p[pos[p_range]].T >= np.ones((alen(neg[n_range]),
                                                             alen(pos[p_range]))) * p[neg[n_range]],
                               dtype=np.float32)
                geq[eq[:, :] == True] = 0.5
                div += np.sum(geq)

        label_auc.append(div / (alen(pos)*alen(neg)))

    if average_over_labels:
        return np.nanmean(label_auc)
    else:
        return label_auc
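
For a single label, the value accumulated by `auc` above is the share of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below shows the same quantity without the chunking that `auc` uses to limit memory; `pairwise_auc` is a hypothetical helper name introduced here and is not part of the notebook.

```python
import numpy as np

def pairwise_auc(actual, predicted):
    """AUC for one binary label as the fraction of correctly ordered (positive, negative) pairs."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pos = predicted[actual == 1]
    neg = predicted[actual == 0]
    # Compare every positive score with every negative score via broadcasting.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0)
    ties = np.sum(diff == 0)
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, scores))  # 3 of the 4 pairs are ordered correctly -> 0.75
```

Applying this to each of the four label columns and averaging the results corresponds to calling `auc` with `average_over_labels=True`.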

## Part 1: Feed Forward Neural Network

The code cell below formats the data for a non-recurrent model (such as a Feed Forward Neural Network or any of the prediction models that have previously been introduced).

The second code cell applies 5-fold cross-validation to a Feed Forward Neural Network.

**Please follow the instructions in the ASSISTments assignment for modifying and running the cross-validation code cell.**

In [4]:
keepers = data[TARGET_FEATURES].sum(axis=1) == 1
X_nonrecurrent = np.array(data[EXPERT_FEATURES][keepers])
y_nonrecurrent = np.array(data[TARGET_FEATURES][keepers])
fold_nonrecurrent = np.array(data['fold'][keepers])

print(X_nonrecurrent.shape)
print(y_nonrecurrent.shape)

(3109, 92)
(3109, 4)
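
The `keepers` filter works because unlabeled rows have all-zero targets after `fillna(0)`, so a row sum of exactly 1 marks a clip with one observed affect label. The sketch below reproduces that logic on a toy frame with made-up values.

```python
import numpy as np
import pandas as pd

targets = ["confusion", "concentration", "frustration", "boredom"]

# Toy frame: rows 0 and 2 carry a label, rows 1 and 3 were never observed (NaN).
toy = pd.DataFrame({
    "confusion":     [0.0, np.nan, 1.0, np.nan],
    "concentration": [1.0, np.nan, 0.0, np.nan],
    "frustration":   [0.0, np.nan, 0.0, np.nan],
    "boredom":       [0.0, np.nan, 0.0, np.nan],
})
toy[targets] = toy[targets].fillna(0)

# Exactly one target equal to 1 -> the row is labeled.
keepers = toy[targets].sum(axis=1) == 1
y = toy.loc[keepers, targets].to_numpy()

print(keepers.tolist())   # [True, False, True, False]
print(y.argmax(axis=1))   # class index of each labeled row
```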


In [12]:
import keras
import tensorflow as tf
import random
import numpy as np
from keras.models import Sequential
from keras.layers import Masking, Dense, LSTM, TimeDistributed, Dropout, Normalization
from keras.callbacks import EarlyStopping
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, cohen_kappa_score

# Set a seed value
seed_value= 42
np.random.seed(seed_value)
random.seed(seed_value)
tf.random.set_seed(seed_value)

auc_scores = []
kappa_scores = []

for fold in np.unique(fold_nonrecurrent):
    training = np.argwhere(fold_nonrecurrent != fold).ravel()
    testing = np.argwhere(fold_nonrecurrent == fold).ravel()

    X_train, X_test = X_nonrecurrent[training], X_nonrecurrent[testing]
    y_train, y_test = y_nonrecurrent[training], y_nonrecurrent[testing]

    # Define the model
    keras.backend.clear_session()
    model = Sequential([
          Dense(64, activation='relu', input_shape=(92,)),
          Dense(64, activation='tanh'),
          Dense(32, activation='tanh'),
          Dense(4, activation='softmax')
          ])

    # Compile the model
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Define early stopping
    early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

    # Train the model with the new training and validation sets
    history = model.fit(X_train, y_train,
                        epochs=100,
                        validation_split=0.2,
                        verbose=1, # setting this to 0 reduces the amount printed
                        callbacks=[early_stopping])

    # Evaluate the model
    y_pred = model.predict(X_test)

    # AUC
    auc_score = auc(y_test, y_pred)

    # Kappa Score
    y_pred_classes = np.argmax(y_pred, axis=1)
    y_true_classes = np.argmax(y_test, axis=1)
    kappa_score = cohen_kappa_score(y_true_classes, y_pred_classes)

    auc_scores.append(auc_score)
    kappa_scores.append(kappa_score)

# Calculate the average AUC and Kappa scores across all folds
average_auc = np.mean(auc_scores)
average_kappa = np.mean(kappa_scores)

print(f"Average AUC: {average_auc}")
print(f"Average Kappa: {average_kappa}")


Epoch 1/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7493 - loss: 0.7603 - val_accuracy: 0.8327 - val_loss: 0.6588
Epoch 2/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8338 - loss: 0.5722 - val_accuracy: 0.8306 - val_loss: 0.6396
Epoch 3/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8345 - loss: 0.5688 - val_accuracy: 0.8306 - val_loss: 0.6346
Epoch 4/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8336 - loss: 0.5679 - val_accuracy: 0.8286 - val_loss: 0.6405
Epoch 5/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8361 - loss: 0.5685 - val_accuracy: 0.8327 - val_loss: 0.6433
Epoch 6/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8326 - loss: 0.5611 - val_accuracy: 0.8327 - val_loss: 0.6308
Epoch 7/100
62/62 ━━━━━━━━━━━━━━━

62/62 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6894 - loss: 0.8585 - val_accuracy: 0.8367 - val_loss: 0.6627
Epoch 2/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8307 - loss: 0.5714 - val_accuracy: 0.8367 - val_loss: 0.6451
Epoch 3/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8246 - loss: 0.5579 - val_accuracy: 0.8367 - val_loss: 0.6806
Epoch 4/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8304 - loss: 0.5578 - val_accuracy: 0.8367 - val_loss: 0.6714
Epoch 5/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8319 - loss: 0.5508 - val_accuracy: 0.8347 - val_loss: 0.6777
Epoch 6/100
62/62 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8301 - loss: 0.5567 - val_accuracy: 0.8347 - val_loss: 0.6777
Epoch 7/100
62/62 ━━━━━━━━━━━━━━━

64/64 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.7225 - loss: 0.8087 - val_accuracy: 0.8343 - val_loss: 0.6371
Epoch 2/100
64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8164 - loss: 0.6293 - val_accuracy: 0.8343 - val_loss: 0.6213
Epoch 3/100
64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8163 - loss: 0.6199 - val_accuracy: 0.8343 - val_loss: 0.6360
Epoch 4/100
64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8153 - loss: 0.6230 - val_accuracy: 0.8343 - val_loss: 0.6248
Epoch 5/100
64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8164 - loss: 0.6215 - val_accuracy: 0.8343 - val_loss: 0.6364
Epoch 6/100
64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8164 - loss: 0.6262 - val_accuracy: 0.8323 - val_loss: 0.6475
Epoch 7/100
64/64 ━━━━━━━━━━━━━━━

65/65 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7573 - loss: 0.7586 - val_accuracy: 0.8571 - val_loss: 0.5850
Epoch 2/100
65/65 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8322 - loss: 0.5764 - val_accuracy: 0.8571 - val_loss: 0.5903
Epoch 3/100
65/65 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8319 - loss: 0.5645 - val_accuracy: 0.8571 - val_loss: 0.5992
Epoch 4/100
65/65 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8324 - loss: 0.5635 - val_accuracy: 0.8571 - val_loss: 0.5961
Epoch 5/100
65/65 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8324 - loss: 0.5646 - val_accuracy: 0.8571 - val_loss: 0.5974
Epoch 6/100
65/65 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8322 - loss: 0.5618 - val_accuracy: 0.8571 - val_loss: 0.5784
Epoch 7/100
65/65 ━━━━━━━━━━━━━━━

60/60 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7679 - loss: 0.7375 - val_accuracy: 0.7845 - val_loss: 0.7947
Epoch 2/100
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8091 - loss: 0.6347 - val_accuracy: 0.7845 - val_loss: 0.8123
Epoch 3/100
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8091 - loss: 0.6251 - val_accuracy: 0.7845 - val_loss: 0.8053
Epoch 4/100
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8066 - loss: 0.6225 - val_accuracy: 0.7845 - val_loss: 0.8118
Epoch 5/100
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8064 - loss: 0.6203 - val_accuracy: 0.7845 - val_loss: 0.7785
Epoch 6/100
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8065 - loss: 0.6223 - val_accuracy: 0.7845 - val_loss: 0.8033
Epoch 7/100
60/60 ━━━━━━━━━━━━━━━

## Part 2: Long Short-Term Memory (LSTM) Neural Network

The code cell below formats the data for a recurrent model (such as an LSTM Neural Network). Remember that the data needs to be represented in 3 dimensions as opposed to 2 (note the difference in data shape compared to above). In this cell, the sequences are also padded to a common length, and a mask is generated so that the model ignores any padded values (this is for model training efficiency).

The second code cell applies 5-fold cross-validation to an LSTM Neural Network following the structure suggested by Botelho et al. (2017).

[Botelho, A. F., Baker, R. S., & Heffernan, N. T. (2017, June). Improving Sensor-Free Affect Detection Using Deep Learning. *In Proceedings of the 2017 International Conference on Artificial Intelligence in Education*, 40-51. Springer, Cham.](https://link.springer.com/chapter/10.1007/978-3-319-61425-0_4)


**Please follow the instructions in the ASSISTments assignment for modifying and running the cross-validation code cell.**
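
The padding and masking step described above can be sketched on toy data as follows. The sequence lengths and feature count are hypothetical. This sketch marks every real (non-padded) timestep as valid, whereas the notebook's own mask is additionally False for real timesteps that carry no affect label.

```python
import numpy as np

# Three toy sequences with different lengths, each timestep having 2 features.
sequences = [np.ones((3, 2)), np.ones((1, 2)), np.ones((2, 2))]
max_length = max(len(s) for s in sequences)

def pad(a, max_length):
    """Append zero rows so that `a` has exactly `max_length` timesteps."""
    padding = np.zeros((max_length - a.shape[0], a.shape[1]))
    return np.concatenate([a, padding])

# Stack into a 3-D array: (sequences, timesteps, features).
X = np.stack([pad(s, max_length) for s in sequences])

# Mask: True for real timesteps, False for padded ones.
mask = np.stack([np.arange(max_length) < len(s) for s in sequences])

print(X.shape)     # (3, 3, 2)
print(mask.sum())  # 6 real timesteps in total
```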

In [13]:
sequence_lengths = []
X_recurrent = []
y_recurrent = []
mask_recurrent = []
fold_recurrent = []

def parse_data(df):
    if (df[TARGET_FEATURES].sum(axis=1) == 1).sum() > 0:
        keepers = df[EXPERT_FEATURES[0]].notna()
        df = df[keepers]
        sequence_lengths.append(len(df))
        X_recurrent.append(df[EXPERT_FEATURES].values.reshape(-1, len(EXPERT_FEATURES)))
        y_recurrent.append(df[TARGET_FEATURES].values.reshape(-1, len(TARGET_FEATURES)))
        mask_recurrent.append(df[TARGET_FEATURES].sum(axis=1).values.reshape(-1, 1))
        fold_recurrent.append(df['fold'].iloc[0])


def pad_data(a, max_length):
    pad = np.zeros((max_length - a.shape[0], a.shape[1]))
    return np.concatenate([a, pad])

data = data.sort_values(['user_id', 'clip_sequence', 'row_id']).reset_index()
data.groupby(['user_id', 'clip_sequence']).apply(parse_data)

max_length = max(sequence_lengths)
X_recurrent_padded = np.stack([pad_data(i, max_length) for i in X_recurrent])
y_recurrent_padded = np.stack([pad_data(i, max_length) for i in y_recurrent])
mask_recurrent_padded = np.equal(np.stack([pad_data(i, max_length) for i in mask_recurrent]), 1)
fold_recurrent = np.array(fold_recurrent)

print(X_recurrent_padded.shape)
print(y_recurrent_padded.shape)


(472, 432, 92)
(472, 432, 4)
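
With arrays of shape (472, 432, 4), most timesteps are padding or unlabeled clips, so evaluation should compare predictions and labels only at labeled timesteps. The sketch below shows one way to select them with a boolean mask; the arrays are small toy values, and the indexing shortcut is an alternative to looping over sequences.

```python
import numpy as np

# Toy predictions and one-hot labels: 2 sequences, 3 timesteps, 4 classes.
rng = np.random.default_rng(0)
y_pred = rng.random((2, 3, 4))
y_true = np.zeros((2, 3, 4))
y_true[0, 1, 2] = 1  # sequence 0, timestep 1 is labeled as class 2
y_true[1, 0, 0] = 1  # sequence 1, timestep 0 is labeled as class 0

# A timestep is kept when exactly one class is marked.
mask = y_true.sum(axis=-1) == 1  # shape (2, 3)

# Boolean indexing over the first two axes flattens to (n_labeled, 4).
flat_pred = y_pred[mask]
flat_true = y_true[mask]

print(flat_pred.shape)           # (2, 4)
print(flat_true.argmax(axis=1))  # [2 0]
```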


In [15]:
import keras
import random  # needed for random.seed below; previously relied on the import in the Part 1 cell
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Masking, Dense, LSTM, TimeDistributed, Dropout, Normalization
from keras.callbacks import EarlyStopping
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, cohen_kappa_score

# Set a seed value
seed_value= 42
np.random.seed(seed_value)
random.seed(seed_value)
tf.random.set_seed(seed_value)

mms = MinMaxScaler()

auc_scores = []
kappa_scores = []

training_auc = []
training_kappa = []

for fold in np.unique(fold_recurrent):
    training = np.argwhere(fold_recurrent != fold).ravel()
    testing = np.argwhere(fold_recurrent == fold).ravel()

    X_train, X_test = X_recurrent_padded[training], X_recurrent_padded[testing]
    y_train, y_test = y_recurrent_padded[training], y_recurrent_padded[testing]
    m_train, m_test = mask_recurrent_padded[training], mask_recurrent_padded[testing]
    m_train = m_train.squeeze(-1)
    m_train = m_train[..., None]
    m_test = m_test.squeeze(-1)
    m_test = m_test[..., None]

    # Apply masking
    X_train = np.where(m_train, X_train, 0.0)
    X_test = np.where(m_test, X_test, 0.0)

    # Define the model
    keras.backend.clear_session()
    normalization_layer = Normalization(axis=-1)
    normalization_layer.adapt(X_train)
    model = Sequential([
        normalization_layer,
        LSTM(50, activation='tanh', return_sequences=True),
        TimeDistributed(Dense(4, activation='softmax'))
    ])

    # Compile the model
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Define early stopping
    early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

    # Train the model with the new training and validation sets
    history = model.fit(X_train, y_train,
                        epochs=100,
                        validation_split=0.2,
                        verbose=1, # setting this to 0 reduces the amount printed
                        callbacks=[early_stopping])

    # Evaluate the model (TRAINING SET)
    y_pred = model.predict(X_train)

    flat_pred = []
    flat_actual = []

    for p, a, m in zip(y_pred, y_train, m_train):
        flat_pred.append(p[m.flatten(), :])
        flat_actual.append(a[m.flatten(), :])

    flat_pred = np.concatenate(flat_pred)
    flat_pred = mms.fit_transform(flat_pred)
    flat_actual = np.concatenate(flat_actual)

    # AUC
    auc_score = auc(flat_actual, flat_pred)

    # Kappa Score
    y_pred_classes = np.argmax(flat_pred, axis=1)
    y_true_classes = np.argmax(flat_actual, axis=1)
    kappa_score = cohen_kappa_score(y_true_classes, y_pred_classes)

    training_auc.append(auc_score)
    training_kappa.append(kappa_score)


    # Evaluate the model (TEST SET)
    y_pred = model.predict(X_test)

    flat_pred = []
    flat_actual = []

    for p, a, m in zip(y_pred, y_test, m_test):
        flat_pred.append(p[m.flatten(), :])
        flat_actual.append(a[m.flatten(), :])

    flat_pred = np.concatenate(flat_pred)
    flat_pred = mms.fit_transform(flat_pred)
    flat_actual = np.concatenate(flat_actual)

    # AUC
    auc_score = auc(flat_actual, flat_pred)

    # Kappa Score
    y_pred_classes = np.argmax(flat_pred, axis=1)
    y_true_classes = np.argmax(flat_actual, axis=1)
    kappa_score = cohen_kappa_score(y_true_classes, y_pred_classes)

    auc_scores.append(auc_score)
    kappa_scores.append(kappa_score)

# Calculate the average AUC and Kappa scores across all folds
print(f"Average Test Set AUC: {np.mean(auc_scores)}")
print(f"Average Test Set Kappa: {np.mean(kappa_scores)}")

print(f"\nAverage Training Set AUC: {np.mean(training_auc)}")
print(f"Average Training Set Kappa: {np.mean(training_kappa)}")

Epoch 1/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 39s 1s/step - accuracy: 0.0885 - loss: 0.0185 - val_accuracy: 0.9348 - val_loss: 0.0086
Epoch 2/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 14s 454ms/step - accuracy: 0.9066 - loss: 0.0115 - val_accuracy: 0.9129 - val_loss: 0.0072
Epoch 3/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 5s 524ms/step - accuracy: 0.4122 - loss: 0.0099 - val_accuracy: 0.0255 - val_loss: 0.0073
Epoch 4/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 10s 452ms/step - accuracy: 0.0270 - loss: 0.0093 - val_accuracy: 0.0198 - val_loss: 0.0074
Epoch 5/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 6s 608ms/step - accuracy: 0.0228 - loss: 0.0088 - val_accuracy: 0.0178 - val_loss: 0.0075
Epoch 6/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 9s 455ms/step - accuracy: 0.0216 - loss: 0.0085 - val_accuracy: 0.0157 - val_loss: 0.0076
Epoch 7/100
10/10