## Import HFO data from .mat files and process it

In [79]:
# Show current directory
import os
curr_dir = os.getcwd()
print(curr_dir)

PATH_TO_FILE = 'src/seeg_data'  # This is needed if the WD is not the same as the file location

/home/monkin/Desktop/feup/thesis/thesis-lava


In [80]:
import scipy.io as sio
import numpy as np

# Load the data
# simulation_rate_3.mat is the input data from the synthetic dataset
data = sio.loadmat(f'{PATH_TO_FILE}/simulation_rate_3.mat')

# Print the data structure
print(data.keys())

dict_keys(['__header__', '__version__', '__globals__', 'sr', 'channels', 'duration', 'samples', 'channel_types', 'markers', 'marker_positions', 'marker_durations', 'marker_values', 'data'])


# Print the content of the .mat file

## General experiment information

In [81]:
sampling_rate = data['sr'][0][0]
input_duration = data['duration'][0][0]
num_samples = data['samples'][0][0]

print(f"sr: {sampling_rate}")  # Sampling rate

print(f"duration: {input_duration}")

print(f"samples: {num_samples}")

sr: 2048.0
duration: 120.0
samples: 245760


- The **sampling rate** is 2048 Hz, which means that 2048 samples are recorded per second (for each electrode).
- The **duration** of the experiment is 120 s.
- There is a **total of 245760 samples** per electrode (sampling_rate * duration = 2048 * 120).
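The relation between these three numbers can be checked directly. The snippet below is a minimal sketch that hard-codes the values printed above instead of reading the .mat file; `expected_samples` and `sample_period_ms` are names introduced here for illustration.

```python
# Values copied from the output of the cell above
sampling_rate = 2048.0   # Hz
input_duration = 120.0   # seconds
num_samples = 245760

# Number of samples implied by the sampling rate and the duration
expected_samples = int(round(sampling_rate * input_duration))

# Time between two consecutive samples of one electrode, in milliseconds
sample_period_ms = 1000.0 / sampling_rate

print(expected_samples == num_samples)
print(f"{sample_period_ms:.4f} ms per sample")
```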

## Channel information

In [82]:
num_channels = data['channels'].shape[1]

print(f"Shape of channels: {data['channels'].shape}")

print(f"channels: {data['channels']}")

print("=====================================\n\n\n")

print(f"channel_types: {data['channel_types']}")

Shape of channels: (1, 960)
channels: [[array(["OT'8- SNR 0dB - #1"], dtype='<U18')
  array(["OT'8- SNR 0dB - #2"], dtype='<U18')
  array(["OT'8- SNR 0dB - #3"], dtype='<U18')
  array(["OT'8- SNR 0dB - #4"], dtype='<U18')
  array(["OT'8- SNR 0dB - #5"], dtype='<U18')
  array(["OT'8- SNR 0dB - #6"], dtype='<U18')
  array(["OT'8- SNR 0dB - #7"], dtype='<U18')
  array(["OT'8- SNR 0dB - #8"], dtype='<U18')
  array(["OT'8- SNR 0dB - #9"], dtype='<U18')
  array(["OT'8- SNR 0dB - #10"], dtype='<U19')
  array(["OT'8- SNR 0dB - #11"], dtype='<U19')
  array(["OT'8- SNR 0dB - #12"], dtype='<U19')
  array(["OT'8- SNR 0dB - #13"], dtype='<U19')
  array(["OT'8- SNR 0dB - #14"], dtype='<U19')
  array(["OT'8- SNR 0dB - #15"], dtype='<U19')
  array(["OT'8- SNR 0dB - #16"], dtype='<U19')
  array(["OT'8- SNR 0dB - #17"], dtype='<U19')
  array(["OT'8- SNR 0dB - #18"], dtype='<U19')
  array(["OT'8- SNR 0dB - #19"], dtype='<U19')
  array(["OT'8- SNR 0dB - #20"], dtype='<U19')
  array(["OT'8- SNR 0dB - #21"]

This recording has 960 channels, which correspond to **960 electrodes**. 

The channels are divided into **8 groups of 120 channels each**. Each **group** corresponds to a **different brain region**:
- **Group 1**: OT'8
- **Group 2**: B'1
- **Group 3**: GPH'2
- **Group 4**: A'1
- **Group 5**: I6
- **Group 6**: PM6
- **Group 7**: PM10
- **Group 8**: CR5

For each group, **the 120 electrodes are divided into 4 subgroups of 30 electrodes each**. Each subgroup has a **different** Signal-to-Noise Ratio (**SNR**):
- **Subgroup 1**: 0dB
- **Subgroup 2**: 5dB
- **Subgroup 3**: 10dB
- **Subgroup 4**: 15dB
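The group, SNR and electrode number are all encoded in the channel name, so they can be extracted with a regular expression. The sketch below assumes every name follows the pattern seen in the printed output (`"OT'8- SNR 0dB - #1"`); `CHANNEL_PATTERN` and `parse_channel_name` are hypothetical helpers, not part of the dataset or of any library.

```python
import re

# Assumed name layout: "<region>- SNR <n>dB - #<index>"
CHANNEL_PATTERN = re.compile(
    r"^(?P<region>.+?)\s*- SNR (?P<snr_db>\d+)dB - #(?P<index>\d+)$"
)

def parse_channel_name(name):
    """Split a channel name into (region, SNR in dB, electrode number)."""
    match = CHANNEL_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unexpected channel name: {name!r}")
    return match["region"], int(match["snr_db"]), int(match["index"])

print(parse_channel_name("OT'8- SNR 0dB - #1"))
```

With the loaded file, the same helper could be applied to every channel with `[parse_channel_name(c[0]) for c in data['channels'][0]]`, provided the remaining names really share this layout.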

## Markers information

In [83]:
print("Shape of markers: ", data['markers'].shape)

print(f"markers: {data['markers']}")

print(f"marker_positions: {data['marker_positions']}")

print(f"marker_durations: {data['marker_durations']}")

print(f"marker_values: {data['marker_values']}")

Shape of markers:  (1, 40320)
markers: [[array(['Spike+Ripple+Fast-Ripple'], dtype='<U24')
  array(['Fast-Ripple'], dtype='<U11')
  array(['Fast-Ripple'], dtype='<U11') ...
  array(['Spike+Ripple'], dtype='<U12') array(['Spike'], dtype='<U5')
  array(['Spike+Fast-Ripple'], dtype='<U17')]]
marker_positions: [[  1.   1.   1. ... 119. 119. 119.]]
marker_durations: [[0. 0. 0. ... 0. 0. 0.]]
marker_values: [[14.  0.  0. ...  8. 10.  6.]]


### Transform the markers into a numpy array of shape (n_markers, )

In [84]:
marker_labels = data['markers'].T

# Remove useless dimensions from the numpy array
marker_labels = np.array(list(map(lambda x: x[0][0], marker_labels)))

print(marker_labels.shape)
print(marker_labels)

(40320,)
['Spike+Ripple+Fast-Ripple' 'Fast-Ripple' 'Fast-Ripple' ... 'Spike+Ripple'
 'Spike' 'Spike+Fast-Ripple']


`markers` is stored as an array in which **each element labels the type of event** at the corresponding marker position. A label is either one of the following event types or several of them joined with `+`:
- 'Spike'
- 'Ripple'
- 'Fast-Ripple'
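Since a label can combine several event types, it can be convenient to turn each label into one boolean flag per type. The following is a minimal sketch on a few labels copied from the output above; `EVENT_TYPES` and `label_to_flags` are names made up for this example. Splitting on `+` matters here, because a plain substring test for `'Ripple'` would also match `'Fast-Ripple'`.

```python
import numpy as np

EVENT_TYPES = ("Spike", "Ripple", "Fast-Ripple")

def label_to_flags(label):
    """Return one boolean per event type, in the order of EVENT_TYPES."""
    parts = set(label.split("+"))
    return tuple(event in parts for event in EVENT_TYPES)

# A few labels taken from the printed markers
sample_labels = np.array(
    ["Spike+Ripple+Fast-Ripple", "Fast-Ripple", "Spike+Ripple", "Spike"]
)

# Shape (n_markers, 3): columns are Spike, Ripple, Fast-Ripple
flags = np.array([label_to_flags(label) for label in sample_labels], dtype=bool)
print(flags)
```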

### Let's also fix the dimensions of `marker_positions`, `marker_durations` and `marker_values`

In [85]:
marker_positions = data['marker_positions'].T

# Remove useless dimensions from the numpy array and convert from seconds to milliseconds
marker_positions = np.array(list(map(lambda x: x[0] * 1000, marker_positions)))

print(marker_positions.shape)
print(marker_positions)

(40320,)
[  1000.   1000.   1000. ... 119000. 119000. 119000.]


`marker_positions` gives the start of each event in milliseconds, assuming the raw values in the .mat file are expressed in seconds.
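A position in milliseconds can be mapped to a sample index through the sampling rate. This is a small sketch using the first and last positions printed above and the 2048 Hz sampling rate; `sample_indices` is a name introduced here.

```python
import numpy as np

sampling_rate = 2048.0  # Hz, as printed earlier
marker_positions_ms = np.array([1000.0, 119000.0])

# milliseconds -> seconds -> sample index
sample_indices = np.round(
    marker_positions_ms / 1000.0 * sampling_rate
).astype(np.int64)

print(sample_indices)
```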

In [86]:
marker_durations = data['marker_durations'].T

# Remove useless dimensions from the numpy array
# TODO: Check if the duration is in milliseconds or seconds
marker_durations = np.array(list(map(lambda x: x[0], marker_durations)))

print(marker_durations.shape)
print(marker_durations)

(40320,)
[0. 0. 0. ... 0. 0. 0.]


`marker_durations` indicates how long each event lasts. **In the synthetic dataset, the duration is always 0 (i.e. not specified).**

In [112]:
marker_values = data['marker_values'].T

# Remove useless dimensions from the numpy array
marker_values = np.array(list(map(lambda x: x[0], marker_values)))

print(marker_values.shape)
print(marker_values)

(40320,)
[14.  0.  0. ...  8. 10.  6.]


What do the `marker_values` represent? They could be related to the amplitude of the event or to the brain region where it occurred. The values are stored as floats but are whole numbers ranging from 0 to 15.
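One way to investigate is to tabulate the values per kind of label. In the six entries visible in the outputs above, the labels without a `Spike` component have value 0; whether that holds for the whole array is not established here. The sketch below only uses those six entries, and `has_spike` is a name introduced for illustration.

```python
import numpy as np

# First three and last three entries from the printed labels and values
labels = np.array([
    "Spike+Ripple+Fast-Ripple", "Fast-Ripple", "Fast-Ripple",
    "Spike+Ripple", "Spike", "Spike+Fast-Ripple",
])
values = np.array([14, 0, 0, 8, 10, 6])

# True where the label contains a Spike component
has_spike = np.array(["Spike" in label.split("+") for label in labels])

for flag in (True, False):
    print(f"has_spike={flag}: {np.unique(values[has_spike == flag])}")
```

Running the same grouping on the full `marker_labels` and `marker_values` arrays would show whether the pattern is general.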

### Join all the markers data into a single structured datatype

The structured datatype will have the following fields:
- `label`: the type of event that occurred (can be a combination of 'Spike', 'Ripple' and 'Fast-Ripple')
- `position`: the position of the marker in milliseconds
- `duration`: the duration of the event (units still to be confirmed; always 0 in this dataset)
- `value`: the value of the marker

In [110]:
# Create a structured array with the markers
num_markers = len(marker_labels)
markers = np.zeros(num_markers, dtype=[('label', 'U64'), ('position', np.float32), ('duration', np.float32), ('value', np.int16)])

# Assign the columns to the structured array
markers['label'] = marker_labels
markers['position'] = marker_positions
markers['duration'] = marker_durations
markers['value'] = marker_values

print(markers.shape)
print(markers[:100])

(40320,)
[('Spike+Ripple+Fast-Ripple', 1000., 0., 14)
 ('Fast-Ripple', 1000., 0.,  0) ('Fast-Ripple', 1000., 0.,  0)
 ('Fast-Ripple', 1000., 0.,  0) ('Fast-Ripple', 1000., 0.,  0)
 ('Spike+Ripple+Fast-Ripple', 1000., 0.,  2) ('Spike', 1000., 0.,  3)
 ('Spike+Ripple', 1000., 0.,  4) ('Ripple', 1000., 0.,  0)
 ('Spike+Fast-Ripple', 1000., 0.,  1) ('Spike', 1000., 0., 11)
 ('Spike', 1000., 0., 13) ('Spike+Fast-Ripple', 1000., 0., 13)
 ('Spike+Fast-Ripple', 1000., 0.,  7) ('Spike', 1000., 0.,  4)
 ('Spike+Ripple+Fast-Ripple', 1000., 0.,  7) ('Ripple', 1000., 0.,  0)
 ('Ripple+Fast-Ripple', 1000., 0.,  0) ('Spike', 1000., 0., 12)
 ('Spike+Fast-Ripple', 1000., 0.,  2) ('Spike+Ripple', 1000., 0.,  6)
 ('Spike+Fast-Ripple', 1000., 0., 14) ('Ripple', 1000., 0.,  0)
 ('Fast-Ripple', 1000., 0.,  0) ('Spike+Fast-Ripple', 1000., 0.,  4)
 ('Spike+Fast-Ripple', 1000., 0.,  2)
 ('Spike+Ripple+Fast-Ripple', 1000., 0., 11)
 ('Spike+Ripple+Fast-Ripple', 1000., 0.,  6)
 ('Spike+Ripple+Fast-Ripple', 1000.,

### Write the processed markers into a .npy file

In [109]:
file_name = f"{PATH_TO_FILE}/seeg_synthetic_humans_markers.npy"

np.save(file_name, markers)   # Save the data to a numpy file (not stored in git due to size)
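The saved structured array can be read back with `np.load` and filtered by field. The following sketch uses a two-row stand-in array and a temporary directory rather than the real file; `marker_dtype`, `demo_markers` and the file name `markers_demo.npy` are assumptions made for this example.

```python
import os
import tempfile

import numpy as np

# Same field layout as the structured array built above
marker_dtype = [
    ("label", "U64"),
    ("position", np.float32),
    ("duration", np.float32),
    ("value", np.int16),
]
demo_markers = np.array(
    [("Spike", 1000.0, 0.0, 3), ("Fast-Ripple", 1000.0, 0.0, 0)],
    dtype=marker_dtype,
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "markers_demo.npy")
    np.save(path, demo_markers)
    loaded = np.load(path)  # field names and dtypes are preserved

# Boolean mask on one field selects whole records
spikes = loaded[loaded["label"] == "Spike"]
print(spikes)
```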

## SEEG Data

In [9]:
print(f"data: {data['data']}")

print(f"Shape of data: {data['data'].shape}")
# Shape of the data is (channels, samples)
# Nº of channels = 960 (Each channel represents a different electrode)
# Nº of samples = 245760

data: [[ 3.2352024e-01 -6.9759099e-04  1.9026639e+00 ...  1.8771046e+01
   1.8270973e+01  1.7719765e+01]
 [-1.3235390e+00 -3.5122361e+00 -5.6726017e+00 ... -4.5149647e+01
  -4.5094780e+01 -4.5308308e+01]
 [-5.9668809e-01 -4.8766956e-01  9.8274893e-01 ...  1.8622489e+00
   1.0670514e+00  1.6390228e+00]
 ...
 [-1.9608999e+00 -5.8757830e+00 -6.6182971e+00 ... -3.1880032e+01
  -3.0655754e+01 -2.9434477e+01]
 [-1.9769822e-01 -7.4400985e-01 -8.3053267e-01 ... -3.2245569e+00
  -1.9809768e+00 -2.1984828e+00]
 [-1.2078454e+00 -5.1096064e-01 -8.1596655e-01 ...  2.5964035e+01
   2.7849955e+01  2.9322460e+01]]
Shape of data: (960, 245760)


### Shape of the data
The shape of the data is (n_channels, n_samples)
- Number of channels: 960
- Number of samples: 245760

The **data** holds the SEEG recordings from 960 channels, with 245760 samples per channel.

In [10]:
recorded_data = data['data']

Each value in `recorded_data` is a float that represents the amplitude of the signal (voltage) at a specific time point. The voltage may be measured in millivolts (mV), but this is unverified. TODO: Check units

# Change the structure of the data

Let's change the structure of the data to a 2D array that is ordered by time. This way, we can use the input data of various channels together by following the time order. 

Therefore, let's **transform the shape from (num_channels, num_samples) to (num_samples, num_channels)**. This way, each row will represent a time point and contains the voltage values of all channels at that time point. 

It is not necessary to store the time of each row: the sampling rate is fixed, so row `i` corresponds to time `i / sampling_rate` seconds.

The structure is illustrated below with placeholder values; the real array has a total of 245760 rows:

| Channel 1       | Channel 2     | Channel ...     | Channel 960   |
|-----------------|---------------|-----------------|---------------|
| 3               | 1             | 3               | 1             |
| 8               | 0             | 7               | 15            |
| ...             | ...           | ...             | ...           |
| 14              | 5             | 3               | 1             |
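With this layout, a time window becomes a simple row slice. The sketch below assumes the 2048 Hz sampling rate and uses a small zero-filled array as a stand-in for the real recording; `time_slice` is a hypothetical helper.

```python
import numpy as np

def time_slice(samples_by_channel, start_s, stop_s, sampling_rate):
    """Return the rows recorded in the interval [start_s, stop_s) seconds."""
    start = int(round(start_s * sampling_rate))
    stop = int(round(stop_s * sampling_rate))
    return samples_by_channel[start:stop]

sampling_rate = 2048.0
# Stand-in for the real data: 2 seconds of signal for 4 channels
demo = np.zeros((int(2 * sampling_rate), 4), dtype=np.float32)

window = time_slice(demo, 0.5, 1.0, sampling_rate)
print(window.shape)
```

Each row of `window` holds the values of all channels at one time point, so the rows can be fed to a model in time order.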


## Select the channels to be used
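For the sake of simplicity, we define the set of channels to be used, numbered from 1. By default it contains all channels; the commented-out line shows how to restrict it to a subset.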

In [11]:
# channels_used: set = {1, 2, 3, 4, 5, 6, 7, 8}
channels_used = set(range(1, num_channels+1, 1))

print(channels_used)

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 22

In [12]:
ordered_recorded_data = recorded_data.T     # Swap the structure of the recorded_data to (num_samples, num_channels)

ordered_recorded_data.shape

(245760, 960)
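The `channels_used` set is 1-based, while array columns are 0-based. A minimal sketch of how the set could be used to keep only the selected columns, shown on a small stand-in array (`demo`, `column_indices` and `selected` are names introduced here):

```python
import numpy as np

num_channels = 6
# Stand-in for ordered_recorded_data: 3 time points, 6 channels
demo = np.arange(3 * num_channels).reshape(3, num_channels)

channels_used = {1, 2, 5}  # 1-based channel numbers

# Convert to sorted 0-based column indices (sets have no guaranteed order)
column_indices = sorted(channel - 1 for channel in channels_used)

selected = demo[:, column_indices]
print(selected)
```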

# Write the processed data to a .npy file

Finally, we write the processed data to a .npy file. This way, we can use it in the Spiking Neural Network (SNN) model.

The .npy file is a binary file that contains the processed data in a numpy array format. This format is easy to read and write, and it is compatible with the numpy library.

In [13]:
file_name = f"{PATH_TO_FILE}/seeg_synthetic_humans.npy"

np.save(file_name, ordered_recorded_data)   # Save the data to a numpy file (not stored in git due to size)
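Note that `.T` returns a view rather than a copy, so the rows of the transposed array are not contiguous in memory. If row-by-row (time-ordered) access turns out to be slow, one option is to make a C-contiguous copy before saving. The sketch below illustrates this on a tiny array and a temporary file; it is a suggestion under that assumption, not a required step.

```python
import os
import tempfile

import numpy as np

# Stand-in for recorded_data: 3 channels, 4 samples
channels_by_samples = np.arange(12, dtype=np.float32).reshape(3, 4)

samples_by_channels = channels_by_samples.T  # view, shape (4, 3)
print(samples_by_channels.flags["C_CONTIGUOUS"])

# Copy into row-major order so each time point is one contiguous row
contiguous = np.ascontiguousarray(samples_by_channels)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.npy")
    np.save(path, contiguous)
    loaded = np.load(path)

print(loaded.shape)
```

For the full-size file, `np.load` also accepts `mmap_mode='r'`, which maps the file instead of reading it entirely into memory.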