## 1. Dataset Exploration
The dataset contains EEG recordings from 15 subjects. Each subject performed the experiment 3 times, giving a total of 45 EEG sessions.

For each session, 15 different EEG signals were recorded, each one associated with the task of watching a different movie clip. Each clip is associated with an emotional state, encoded as {sad: -1, neutral: 0, happy: 1}.
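
Classification losses such as cross-entropy generally expect class indices that start at 0, so the {-1, 0, 1} encoding is often shifted to {0, 1, 2} before training. The snippet below is a minimal sketch of that shift. Whether the dataset classes in `dataset.py` already apply it is not shown here, so the helper `to_class_index` is a hypothetical name used only for illustration.

```python
# Emotion encoding as given in the dataset description.
EMOTION_TO_LABEL = {"sad": -1, "neutral": 0, "happy": 1}


def to_class_index(label: int) -> int:
    """Shift a label in {-1, 0, 1} to a class index in {0, 1, 2} (assumed convention)."""
    if label not in (-1, 0, 1):
        raise ValueError(f"unexpected label: {label}")
    return label + 1


class_indices = {name: to_class_index(value) for name, value in EMOTION_TO_LABEL.items()}
print(class_indices)  # {'sad': 0, 'neutral': 1, 'happy': 2}
```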

Each EEG recording comprises 62 channels and lasts around 4 minutes. Specifically, recordings corresponding to the same movie clip have the same length, while recordings corresponding to different clips generally differ in length.

The data have been preprocessed by downsampling the signals to 200 Hz, segmenting each signal so that it matches the length of the corresponding movie clip, and applying a 0-75 Hz band-pass filter. Since recordings are about 4 minutes long and are now sampled at 200 Hz, they contain roughly 48k time points each.
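
The 48k figure follows directly from the duration and the sampling rate. The short calculation below spells it out, using the approximate 4-minute duration quoted above (actual clips vary in length).

```python
SAMPLING_RATE_HZ = 200     # sampling rate after downsampling
APPROX_DURATION_MIN = 4    # "around 4 minutes" per recording
N_CHANNELS = 62

n_time_points = APPROX_DURATION_MIN * 60 * SAMPLING_RATE_HZ
n_values = N_CHANNELS * n_time_points
nyquist_hz = SAMPLING_RATE_HZ / 2

print(n_time_points)  # 48000 time points per channel
print(n_values)       # 2976000 values per recording
print(nyquist_hz)     # 100.0, so the 75 Hz upper cutoff sits below the Nyquist frequency
```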

In [9]:
example_dataset = TrainDataset(
    path_to_data_dir="data/Preprocessed_EEG/test", 
    path_to_labels="data/Preprocessed_EEG/label.mat", 
    win_length=1000, 
    win_overlap=100, 
    transform=False
)

Loading data files: 100%|██████████| 45/45 [01:09<00:00,  1.54s/it]


In [None]:
# Show how you can manipulate and plot data

## 2. Model definition and training
In this section we do the following:

- Load training, validation and test datasets.
- Create an instance of the EEGNet model.
- Train the EEGNet model.

### Load training, validation and test datasets
We create a Dataset and a DataLoader for each data split.

In [10]:
from torch.utils.data import DataLoader
from dataset import TrainDataset, ValidationDataset, TestDataset

Define variables for this task

In [6]:
window_length = 1000
window_overlap = 100

Load training, validation and test datasets with the specific dataset objects

In [28]:
train_dataset = TrainDataset(
    path_to_data_dir="data/train/", 
    path_to_labels="data/label.mat", 
    win_length=window_length, 
    win_overlap=window_overlap, 
    data_augmentation=True
)

val_dataset = ValidationDataset(
    path_to_data_dir="data/valid/", 
    path_to_labels="data/label.mat", 
    win_length=window_length, 
    win_overlap=window_overlap, 
)

test_dataset = TestDataset(
    path_to_data_dir="data/test/", 
    path_to_labels="data/label.mat", 
    win_length=window_length, 
    win_overlap=window_overlap,
)

Loading training data files: 100%|██████████| 36/36 [00:52<00:00,  1.46s/it]
Loading validation data files: 100%|██████████| 6/6 [00:08<00:00,  1.44s/it]
Loading test data files: 100%|██████████| 3/3 [00:04<00:00,  1.45s/it]


To feed the data into the neural network we use a tool called *DataLoader*. This object can automatically *shuffle* the records in the dataset and group them into *batches* of a given size, which are suitable inputs for the model.

Define variables for this task

In [11]:
batch_sz = 16

In [29]:
train_dataloader = DataLoader(train_dataset, batch_size=batch_sz, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_sz, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=batch_sz, shuffle=False)

To better understand what's happening, let's check the content of the dataset and dataloader objects.

In [30]:
print("Length of datasets (number of windows)")
print("--------------------------------------")
print(f"Training dataset: {len(train_dataset)}")
print(f"Validation dataset: {len(val_dataset)}")
print(f"Test dataset: {len(test_dataset)}")

Length of datasets (number of windows)
--------------------------------------
Training dataset: 26820
Validation dataset: 4470
Test dataset: 2235
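
These counts work out to 745 windows per file in every split (26820 / 36, 4470 / 6 and 2235 / 3). The sketch below shows one common way such windows are cut from a single recording. It assumes that consecutive windows share `win_overlap` samples, so the step between window starts is `win_length - win_overlap`, and that incomplete trailing windows are dropped. The actual logic in `dataset.py` is not shown here and may differ; `window_starts` is a hypothetical helper.

```python
def window_starts(n_samples: int, win_length: int, win_overlap: int) -> list:
    """Start indices of all full windows (assumed step: win_length - win_overlap)."""
    if not 0 <= win_overlap < win_length:
        raise ValueError("win_overlap must satisfy 0 <= win_overlap < win_length")
    if n_samples < win_length:
        return []
    step = win_length - win_overlap
    return list(range(0, n_samples - win_length + 1, step))


# A recording of roughly 4 minutes at 200 Hz.
starts = window_starts(n_samples=48_000, win_length=1000, win_overlap=100)
print(len(starts))  # 53
print(starts[:3])   # [0, 900, 1800]
print(starts[-1])   # 46800
```

Under these assumptions a 48k-sample clip yields 53 windows; the observed average of about 50 windows per clip (745 / 15) is consistent with clips that are somewhat shorter than 4 minutes on average.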


In [33]:
first_elem = train_dataset[0]
print("Single dataset item")
print(f"Data: {first_elem[0]}")
print(f"Data shape & type: {first_elem[0].dtype}, {first_elem[0].shape}")
print(f"Label: {first_elem[1]}")
print(f"Label shape & type: {first_elem[1].dtype}, {first_elem[1].shape}")

Single dataset item
Data: tensor([[[  5.6854,  -4.2185, -29.4796,  ..., -32.3536,  -4.6246,  -1.3064],
         [-19.6733, -16.1599, -21.7344,  ..., -34.5277, -30.0970, -33.2164],
         [-23.3691, -28.9452, -35.3163,  ..., -37.2978, -19.0721, -11.3794],
         ...,
         [ -0.5065,  -4.1457,  -2.9170,  ...,  51.4446,  47.2760,  46.2717],
         [-18.0092, -22.8597, -13.6491,  ...,  52.7978,  44.3729,  44.7333],
         [-28.1125, -28.6088, -35.9149,  ...,  50.1409,  55.9867,  51.4929]]])
Data shape & type: torch.float32, torch.Size([1, 62, 1000])
Label: 1
Label shape & type: torch.int16, torch.Size([])
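
A dataloader yields whole batches rather than single items. The sketch below illustrates this with a small synthetic stand-in dataset (random tensors shaped like the item printed above), so it runs without the EEG files. The names `fake_dataset` and `fake_loader` are hypothetical, and only the shapes are meant to mirror the real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in: 40 windows of shape (1, 62, 1000) with labels in {0, 1, 2}.
fake_windows = torch.randn(40, 1, 62, 1000)
fake_labels = torch.randint(low=0, high=3, size=(40,))
fake_dataset = TensorDataset(fake_windows, fake_labels)

fake_loader = DataLoader(fake_dataset, batch_size=16, shuffle=True)

# Each iteration returns a (data, labels) pair with a leading batch dimension.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in fake_loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)

print(f"Number of batches: {len(fake_loader)}")  # ceil(40 / 16) = 3
```

The last batch is smaller (8 items) because 40 is not a multiple of 16; `DataLoader` keeps it unless `drop_last=True` is passed.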


### Create the Neural Network Model
We employ the class named `EEGNetModel` (from the script `model.py`) to build an object of the *EEGNet* neural network model we previously discussed.

Here we create one instance of the model to inspect its structure by calling the `EEGNetModel` constructor (i.e., the name of the class itself).

In [50]:
from model import EEGNetModel
from torchsummary import summary 

Define variables for this task

In [53]:
input_sz = (1, 62, 1000)

In [56]:
eeg_net_model = EEGNetModel(input_size=input_sz)

In [57]:
summary(
    model=eeg_net_model, 
    input_size=input_sz,
    batch_size=batch_sz, 
    device="cpu"
)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [16, 16, 62, 1000]           1,040
       BatchNorm2d-2         [16, 16, 62, 1000]              32
            Conv2d-3         [16, 32, 56, 1000]           3,616
       BatchNorm2d-4         [16, 32, 56, 1000]              64
              ReLU-5         [16, 32, 56, 1000]               0
         AvgPool2d-6          [16, 32, 28, 200]               0
           Dropout-7          [16, 32, 28, 200]               0
            Conv2d-8          [16, 32, 28, 200]          16,416
            Conv2d-9          [16, 32, 28, 200]           1,056
      BatchNorm2d-10          [16, 32, 28, 200]              64
             ReLU-11          [16, 32, 28, 200]               0
        AvgPool2d-12           [16, 32, 14, 40]               0
          Dropout-13           [16, 32, 14, 40]               0
          Flatten-14                [16
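
The parameter counts and output shapes in the summary can be checked by hand. The sketch below does so under assumptions inferred from the numbers themselves, not read from `model.py`: ordinary (non-grouped) convolutions with bias, a temporal kernel of 64 samples, a spatial kernel spanning 7 channels without padding, a 16-sample kernel followed by a 1x1 convolution, and `(2, 5)` average pooling. `conv2d_params` is a hypothetical helper.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel_hw: tuple, bias: bool = True) -> int:
    """Parameter count of a standard (non-grouped) 2D convolution."""
    k_h, k_w = kernel_hw
    return out_ch * in_ch * k_h * k_w + (out_ch if bias else 0)


# Kernel shapes below are assumptions chosen to reproduce the summary table.
print(conv2d_params(1, 16, (1, 64)))   # 1040  -> Conv2d-1
print(conv2d_params(16, 32, (7, 1)))   # 3616  -> Conv2d-3
print(conv2d_params(32, 32, (1, 16)))  # 16416 -> Conv2d-8
print(conv2d_params(32, 32, (1, 1)))   # 1056  -> Conv2d-9

# BatchNorm2d has one scale and one shift per channel.
print(2 * 16, 2 * 32)                  # 32 64

# Output shapes: an unpadded kernel of size 7 over 62 electrodes, then pooling.
height_after_spatial = 62 - 7 + 1               # 56
after_pool_1 = (height_after_spatial // 2, 1000 // 5)      # (28, 200)
after_pool_2 = (after_pool_1[0] // 2, after_pool_1[1] // 5)  # (14, 40)
print(height_after_spatial, after_pool_1, after_pool_2)

# Features per sample if the [32, 14, 40] activation is flattened.
flattened_features = 32 * after_pool_2[0] * after_pool_2[1]
print(flattened_features)              # 17920
```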

### Train the model
We employ the class named `EEGNet` (from the script `model.py`) to build an object of the *EEGNet* neural network model that provides internal functionality for *training*.

In [59]:
from model import EEGNet

Define variables for this task

In [60]:
model_params = {
    "input_size": (1, 62, 1000),
    "num_classes": 3,
    "num_out_channels": 16,
    "temporal_kernel_size": 64,
    "spatial_kernel_size": 7, 
    "separable_kernel_size": 16,
    "pooling_size": (2, 5), 
    "dropout_prob": 0.5, 
    "hidden_size": 128,
}

training_params = {    
    "lr": 1e-4, 
    "betas": [0.9, 0.99], 
    "weight_decay": 1e-6, 
    "epochs": 100, 
}

In [61]:
eeg_net = EEGNet(
    model_parameters=model_params,
    **training_params
)
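
The internals of `EEGNet` are not shown here. The `lr`, `betas` and `weight_decay` entries suggest an Adam-style optimizer, so the sketch below shows how such parameters would typically drive a single training step in PyTorch. This is an assumption, not the class's actual implementation: `toy_model` is a tiny hypothetical stand-in, not the EEGNet architecture, and the inputs are random.

```python
import torch
from torch import nn

training_params = {"lr": 1e-4, "betas": [0.9, 0.99], "weight_decay": 1e-6, "epochs": 100}

torch.manual_seed(0)

# Tiny stand-in network (hypothetical); NOT the EEGNet architecture.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(62 * 10, 3))

# Assumed optimizer: Adam configured from the training parameters.
optimizer = torch.optim.Adam(
    toy_model.parameters(),
    lr=training_params["lr"],
    betas=tuple(training_params["betas"]),
    weight_decay=training_params["weight_decay"],
)
loss_fn = nn.CrossEntropyLoss()

# Random batch of 16 short windows; labels stored as int16, like the dataset items.
x = torch.randn(16, 1, 62, 10)
y = torch.randint(0, 3, (16,)).to(torch.int16)

# One optimization step.
toy_model.train()
optimizer.zero_grad()
logits = toy_model(x)
loss = loss_fn(logits, y.long())  # class-index targets must be int64
loss.backward()
optimizer.step()

print(logits.shape)  # torch.Size([16, 3])
print(f"loss: {loss.item():.4f}")
```

A full training run would repeat this step over every batch of `train_dataloader` for `epochs` passes, evaluating on `val_dataloader` between epochs.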