## Import HFO data from .mat files and process it

In [1]:
import scipy.io as sio

# Load the data
# simulation_rate_3.mat is the input data from the synthetic dataset
data = sio.loadmat('simulation_rate_3.mat')

# Print the data structure
print(data.keys())

dict_keys(['__header__', '__version__', '__globals__', 'sr', 'channels', 'duration', 'samples', 'channel_types', 'markers', 'marker_positions', 'marker_durations', 'marker_values', 'data'])


# Print the content of the .mat file

## General experiment information

In [7]:
print(f"sr: {data['sr']}")  # Sampling rate

print(f"duration: {data['duration']}")

print(f"samples: {data['samples']}")

sr: [[2048.]]
duration: [[120.]]
samples: [[245760]]


- The **sampling rate** is 2048 Hz, which means that 2048 samples are recorded per second.
- The **duration** of the experiment is 120 s.
- There are **245760 samples in total** per channel (sampling rate × duration = 2048 × 120).
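
`loadmat` returns these scalars wrapped in 2D arrays (for example `[[2048.]]`), so they need to be unwrapped before they can be used in arithmetic. A minimal sketch of that step, using hand-built stand-ins for the printed values rather than the actual file:

```python
import numpy as np

# Stand-ins that mimic the values printed above (not loaded from the file)
mock = {
    'sr': np.array([[2048.0]]),
    'duration': np.array([[120.0]]),
    'samples': np.array([[245760]]),
}

# .item() extracts the single Python scalar from a size-1 array
sr = mock['sr'].item()
duration = mock['duration'].item()
samples = mock['samples'].item()

# Check that samples == sampling rate * duration
expected_samples = int(sr * duration)
print(expected_samples == samples)
```

The same `.item()` call should work on the real `data['sr']`, `data['duration']`, and `data['samples']` entries, assuming they keep the `(1, 1)` shape shown above.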

## Channel information

In [14]:
print(f"Shape of channels: {data['channels'].shape}")

print(f"channels: {data['channels']}")

print("=====================================\n\n\n")

print(f"channel_types: {data['channel_types']}")

Shape of channels: (1, 960)
channels: [[array(["OT'8- SNR 0dB - #1"], dtype='<U18')
  array(["OT'8- SNR 0dB - #2"], dtype='<U18')
  array(["OT'8- SNR 0dB - #3"], dtype='<U18')
  array(["OT'8- SNR 0dB - #4"], dtype='<U18')
  array(["OT'8- SNR 0dB - #5"], dtype='<U18')
  array(["OT'8- SNR 0dB - #6"], dtype='<U18')
  array(["OT'8- SNR 0dB - #7"], dtype='<U18')
  array(["OT'8- SNR 0dB - #8"], dtype='<U18')
  array(["OT'8- SNR 0dB - #9"], dtype='<U18')
  array(["OT'8- SNR 0dB - #10"], dtype='<U19')
  array(["OT'8- SNR 0dB - #11"], dtype='<U19')
  array(["OT'8- SNR 0dB - #12"], dtype='<U19')
  array(["OT'8- SNR 0dB - #13"], dtype='<U19')
  array(["OT'8- SNR 0dB - #14"], dtype='<U19')
  array(["OT'8- SNR 0dB - #15"], dtype='<U19')
  array(["OT'8- SNR 0dB - #16"], dtype='<U19')
  array(["OT'8- SNR 0dB - #17"], dtype='<U19')
  array(["OT'8- SNR 0dB - #18"], dtype='<U19')
  array(["OT'8- SNR 0dB - #19"], dtype='<U19')
  array(["OT'8- SNR 0dB - #20"], dtype='<U19')
  array(["OT'8- SNR 0dB - #21"]

This recording has 960 channels, which correspond to **960 electrodes**. 

The channels are divided into **8 groups of 120 channels each**. Each **group** corresponds to a **different brain region**:
- **Group 1**: OT'8
- **Group 2**: B'1
- **Group 3**: GPH'2
- **Group 4**: A'1
- **Group 5**: I6
- **Group 6**: PM6
- **Group 7**: PM10
- **Group 8**: CR5

For each group, **the 120 electrodes are divided into 4 subgroups of 30 electrodes each**. Each subgroup has a **different** signal-to-noise ratio (**SNR**):
- **Subgroup 1**: 0dB
- **Subgroup 2**: 5dB
- **Subgroup 3**: 10dB
- **Subgroup 4**: 15dB
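
The region, SNR, and electrode number are all encoded in the channel label, so they can be recovered by parsing it. The sketch below assumes every label follows the pattern of the ones printed above (`"OT'8- SNR 0dB - #10"`); only the first group's labels are visible in the output, so the pattern is an assumption for the other groups. `parse_channel_label` is a hypothetical helper, not part of the dataset tooling.

```python
import re

# Pattern assumed from the labels printed above, e.g. "OT'8- SNR 0dB - #10"
LABEL_PATTERN = re.compile(r"^(?P<region>.+?)- SNR (?P<snr>\d+)dB - #(?P<index>\d+)$")

def parse_channel_label(label):
    """Split a channel label into (region, snr_db, index). Hypothetical helper."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unexpected channel label: {label!r}")
    return match['region'], int(match['snr']), int(match['index'])

print(parse_channel_label("OT'8- SNR 0dB - #10"))
```

Raising on an unexpected label makes it obvious if some group uses a different naming scheme.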

## Markers information

In [17]:
print("Shape of markers: ", data['markers'].shape)

print(f"markers: {data['markers']}")

print(f"marker_positions: {data['marker_positions']}")

print(f"marker_durations: {data['marker_durations']}")

print(f"marker_values: {data['marker_values']}")

Shape of markers:  (1, 40320)
markers: [[array(['Spike+Ripple+Fast-Ripple'], dtype='<U24')
  array(['Fast-Ripple'], dtype='<U11')
  array(['Fast-Ripple'], dtype='<U11') ...
  array(['Spike+Ripple'], dtype='<U12') array(['Spike'], dtype='<U5')
  array(['Spike+Fast-Ripple'], dtype='<U17')]]
marker_positions: [[  1.   1.   1. ... 119. 119. 119.]]
marker_durations: [[0. 0. 0. ... 0. 0. 0.]]
marker_values: [[14.  0.  0. ...  8. 10.  6.]]


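The markers are stored in a `(1, 40320)` object array in which each element is a label describing the event type (for example `'Spike'`, `'Fast-Ripple'`, or `'Spike+Ripple+Fast-Ripple'`). The arrays `marker_positions`, `marker_durations`, and `marker_values` appear to hold one entry per marker. The positions shown range from 1 to 119, which suggests that they are expressed in seconds within the 120 s recording, although this has not been verified here.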
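
Because each marker label is wrapped in its own one-element array, the labels have to be flattened before they can be counted or filtered. A minimal sketch, using a hand-built stand-in with the same nesting as `data['markers']` (the four labels are illustrative, not the real data):

```python
from collections import Counter
import numpy as np

# Hand-built stand-in with the same nesting as data['markers'] (not the real data)
labels = ['Spike+Ripple+Fast-Ripple', 'Fast-Ripple', 'Fast-Ripple', 'Spike']
mock_markers = np.empty((1, len(labels)), dtype=object)
for i, label in enumerate(labels):
    mock_markers[0, i] = np.array([label])

# Each cell wraps one string, so flatten the cell and take its first element
flat_labels = [str(np.asarray(cell).ravel()[0]) for cell in mock_markers[0]]

# Count how many markers there are of each type
label_counts = Counter(flat_labels)
print(label_counts)
```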

## Data

In [22]:
print(f"data: {data['data']}")

print(f"Shape of data: {data['data'].shape}")
# Shape of the data is (channels, samples)
# Nº of channels = 960 (Each channel represents a different electrode)
# Nº of samples = 245760

data: [[ 3.2352024e-01 -6.9759099e-04  1.9026639e+00 ...  1.8771046e+01
   1.8270973e+01  1.7719765e+01]
 [-1.3235390e+00 -3.5122361e+00 -5.6726017e+00 ... -4.5149647e+01
  -4.5094780e+01 -4.5308308e+01]
 [-5.9668809e-01 -4.8766956e-01  9.8274893e-01 ...  1.8622489e+00
   1.0670514e+00  1.6390228e+00]
 ...
 [-1.9608999e+00 -5.8757830e+00 -6.6182971e+00 ... -3.1880032e+01
  -3.0655754e+01 -2.9434477e+01]
 [-1.9769822e-01 -7.4400985e-01 -8.3053267e-01 ... -3.2245569e+00
  -1.9809768e+00 -2.1984828e+00]
 [-1.2078454e+00 -5.1096064e-01 -8.1596655e-01 ...  2.5964035e+01
   2.7849955e+01  2.9322460e+01]]
Shape of data: (960, 245760)


### Shape of the data
The shape of the data is (n_channels, n_samples)
- Number of channels: 960
- Number of samples: 245760

The **data** holds the synthetic SEEG signals for 960 channels, with 245760 samples per channel.
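
Since the array is laid out as (n_channels, n_samples), a time window of a single channel can be selected by converting seconds to sample indices with the sampling rate. The sketch below uses a small random stand-in array instead of the real recording, and `slice_window` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def slice_window(signals, sr, channel, t_start, t_end):
    """Return the samples of one channel between t_start and t_end (in seconds)."""
    first = int(round(t_start * sr))
    last = int(round(t_end * sr))
    return signals[channel, first:last]

# Small random stand-in with the same (n_channels, n_samples) layout
sr = 2048.0
rng = np.random.default_rng(0)
mock_signals = rng.standard_normal((4, int(sr * 2))).astype(np.float32)

# Half a second of channel 1, starting at 0.5 s
window = slice_window(mock_signals, sr, channel=1, t_start=0.5, t_end=1.0)
print(window.shape)
```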

In [16]:
recorded_data = data['data']

Each value in `recorded_data` is a floating-point number that represents the amplitude of the signal (voltage) at a given sample. The voltage is presumably measured in millivolts (mV). TODO: Check units

The spike times are stored in a 1D array, `spike_times`. Each element of the array is a list of spike times for a given channel (neuron). The spike times are in ms.

# Change the structure of the data

Let's change the structure of the data to a 2D array that is ordered by time. This way, the events from several channels can be processed together in chronological order.

Thus, each row will represent a spiking event and contain 2 columns for the spike time and the channel index respectively. The structure is exemplified below:

| Spike Time (ms) | Channel Index |
|-----------------|---------------|
| 3               | 1             |
| 8               | 0             |
| 12              | 2             |
| 13              | 3             |
| 13              | 6             |
| 14              | 5             |


## Select the channels to be used
For the sake of simplicity, we can define the set of channels to be used. An empty set means that all the channels are used.

In [27]:
# channels_used: set = {1, 2, 3, 4, 5, 6, 7, 8}
# Use all the channels in the given range (0 to 251, inclusive)
channels_used = set(range(0, 252))

In [28]:
import numpy as np

# Create a list to store the ordered spike times
all_spike_times = []

# Iterate over each neuron channel
# NOTE: `spike_times` is not created in the cells above and must be defined beforehand.
# It is expected to hold one array of spike times (in ms) per channel.
for (idx, channel) in enumerate(spike_times):
    # print("index: ", idx, "channel: ", channel)
    # Skip channels that are not selected; an empty set means "use all channels"
    if channels_used and idx not in channels_used:
        continue

    curr_spike_times = channel[0] if len(channel) > 0 else channel     # Remove the extra dimension
    # print(f"Processing channel with shape {curr_spike_times.shape}")

    for spike_time in curr_spike_times.flatten():   # Flatten the array to iterate over all the spike times
        # print(f"Processing spike {spike_time} from channel {idx}")
        # Add the spike time and the channel to the list of all spikes
        all_spike_times.append((spike_time, idx))

# Define the data type for the numpy array
dtype = [('time', float), ('channel', int)]

# Convert the list to a numpy array
all_spike_times = np.array(all_spike_times, dtype=dtype)

# Print the first 10 spike times
print(all_spike_times[:10])

# Show the shape of the all_spike_times array
print(all_spike_times.shape)


[( 69486.8, 0) (173984.7, 0) (193738.7, 0) (210319.3, 0) (269287.5, 0)
 (270162.6, 0) (  1427.1, 1) (  1430.4, 1) (  1433.3, 1) (  1462.8, 1)]
(40020,)


As we can see, we now have a one-dimensional structured `numpy` array with two fields and one entry per spike in the selected channels (40020 in total). The `time` field contains the spike time and the `channel` field contains the channel index.

The next step is to sort the array by the spike times.

In [29]:
# Sort the spike times array by the time field (np.sort returns a sorted copy)
ordered_spike_times = np.sort(all_spike_times, order='time')

# Print the first 10 spike times
print(ordered_spike_times[0:10])

# Print the shape of the ordered spike times
print(ordered_spike_times.shape)

[( 99.5, 229) (303.6,   7) (502.5, 229) (510.6,  71) (528. ,  54)
 (540.9,   7) (589.3, 225) (631.6, 100) (633.8, 100) (758.3, 229)]
(40020,)


## Validate that the data is sorted correctly

In [30]:
# Print the spiking times of the channel 229
print("First 5 spikes of channel 229: ", all_spike_times[all_spike_times['channel'] == 229][:5])

# Print the spiking times of the channel 7
print("First 5 spikes of channel 7: ", all_spike_times[all_spike_times['channel'] == 7][:5])

First 5 spikes of channel 229:  [(  99.5, 229) ( 502.5, 229) ( 758.3, 229) ( 802.4, 229) (1326.8, 229)]
First 5 spikes of channel 7:  [( 303.6, 7) ( 540.9, 7) ( 782.6, 7) (1063.8, 7) (1434.7, 7)]


Indeed, each spike time is still paired with its original channel index after sorting: the first spikes of channels 229 and 7 match the entries at the top of the sorted array. We now have an array that is ordered by time, `ordered_spike_times`.
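
The spot check above can be complemented by a programmatic one. The sketch below tests two properties: the times never decrease, and the set of (time, channel) pairs is the same before and after sorting. `is_valid_sort` is a hypothetical helper, and the toy values are taken from the outputs above.

```python
import numpy as np

def is_valid_sort(unsorted_events, sorted_events):
    """Check the time ordering and that the (time, channel) pairs are unchanged."""
    times_ordered = bool(np.all(np.diff(sorted_events['time']) >= 0))
    # Sorting both arrays on all fields gives a canonical order for comparison
    a = np.sort(unsorted_events, order=['time', 'channel'])
    b = np.sort(sorted_events, order=['time', 'channel'])
    same_pairs = (np.array_equal(a['time'], b['time'])
                  and np.array_equal(a['channel'], b['channel']))
    return times_ordered and same_pairs

dtype = [('time', float), ('channel', int)]
toy = np.array([(502.5, 229), (99.5, 229), (303.6, 7)], dtype=dtype)
toy_sorted = np.sort(toy, order='time')
print(is_valid_sort(toy, toy_sorted))
```

On the real data, the call would be `is_valid_sort(all_spike_times, ordered_spike_times)`.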

# Write the processed data to a .csv file

Finally, we write the processed data to a .csv file. This way, we can use it in the Spiking Neural Network (SNN) model.

The .csv file will have the following structure:

| time (ms)       | channel_idx   |
|-----------------|---------------|
| 3               | 1             |
| 8               | 0             |
| 12              | 2             |
| 13              | 3             |

In [31]:
import csv

file_name = "lab_data_all_channels.csv"
csv_cols = ['time', 'channel_idx']

with open(file_name, 'w', newline='') as csvfile:
    # Create a CSV writer
    writer = csv.DictWriter(csvfile, fieldnames=csv_cols)

    # Write the header
    writer.writeheader()

    # Write the data
    for spike in ordered_spike_times:
        # Access the structured array fields by name
        writer.writerow({'time': spike['time'], 'channel_idx': spike['channel']})
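
To confirm that a file written this way can be read back without losing information, a round trip can be tested on a few rows. The sketch below writes toy rows to a temporary file (the file name `toy_spikes.csv` and the values are illustrative) with the same header as above and reads them back.

```python
import csv
import os
import tempfile

# Toy rows standing in for ordered_spike_times (illustrative values)
rows = [(99.5, 229), (303.6, 7), (502.5, 229)]

# Write to a temporary file using the same header as above
path = os.path.join(tempfile.mkdtemp(), "toy_spikes.csv")
with open(path, 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['time', 'channel_idx'])
    writer.writeheader()
    for time_ms, channel in rows:
        writer.writerow({'time': time_ms, 'channel_idx': channel})

# Read the file back and convert the strings to numbers
with open(path, newline='') as csvfile:
    loaded = [(float(r['time']), int(r['channel_idx'])) for r in csv.DictReader(csvfile)]

print(loaded == rows)
```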