# Filtering - Pre-Processing of the SEEG Signal (1/2)
This notebook presents **stage 1 of the pre-processing** that the SEEG signal goes through before being fed to the spiking neural network (SNN). The pre-processing stages are as follows:
1. **Filtering**: The SEEG signal is bandpass filtered to remove noise and artifacts. The bandpass filter is a Butterworth design and, since we are working with *iEEG*, the signal is filtered in the ripple and fast ripple (FR) bands. The co-occurrence of high-frequency oscillations (HFOs) in both bands is used as a predictor of post-surgical seizure freedom, because it delineates an "HFO area" that approximates the epileptogenic zone (EZ).
2. **Signal-to-Spike Conversion**: To interface and communicate with the silicon neurons in the SNN, the SEEG signal must be converted to spikes.

## Filtering
Depending on the EEG modality, the signal is filtered in different frequency bands. Since we are handling *iEEG* (*sEEG*) data, the signal is filtered in both the ripple (80-250 Hz) and FR (250-500 Hz) bands. HFOs that co-occur in these two bands are the events of interest, for the reason given in the introduction.

The filter is implemented in different ways depending on the setup it will run on.
1. **Neuromorphic Hardware**: The filter is implemented using analog filters. 
2. **Software Simulation**: *Butterworth filters* are utilized since they are a good approximation of the tuned *Tow-Thomas* architectures implemented in hardware.

The frequency response of the *Butterworth filter* is maximally flat in the passband and rolls off towards 0 in the stopband.
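As a rough illustration of that shape, the sketch below designs a ripple-band Butterworth filter and samples its magnitude response. The order (4), the 2048 Hz sampling rate and the helper `gain_at` are assumptions chosen for the example, not values taken from the hardware design.

```python
import numpy as np
from scipy.signal import butter, sosfreqz

fs = 2048.0  # assumed sampling rate in Hz
# 4th-order Butterworth bandpass for the ripple band, as second-order sections
sos = butter(4, [80.0, 250.0], btype="band", fs=fs, output="sos")

# Frequency response on 4096 points between 0 Hz and the Nyquist frequency
freqs, response = sosfreqz(sos, worN=4096, fs=fs)
gain = np.abs(response)

def gain_at(freq_hz):
    """Gain at the grid frequency closest to freq_hz (hypothetical helper)."""
    return float(gain[np.argmin(np.abs(freqs - freq_hz))])

for f in (20, 80, 140, 250, 800):
    print(f"{f:4d} Hz -> gain {gain_at(f):.4f}")
```

The gain should be close to 1 in the middle of the band, about 0.707 (-3 dB) at the two cutoff frequencies, and close to 0 far outside the band.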

### Check WD (change if necessary) and file loading

In [12]:
# Show current directory
import os
curr_dir = os.getcwd()
print(curr_dir)

# Check if the current WD is the file location
if "/src/hfo/filter" not in os.getcwd():
    # Set working directory to this file location
    file_location = f"{os.getcwd()}/thesis-lava/src/hfo/filter"
    print("File Location: ", file_location)

    # Change the current working Directory
    os.chdir(file_location)

    # New Working Directory
    print("New Working Directory: ", os.getcwd())

PATH_TO_FILE = '' # 'src/hfo/'  # This is needed if the WD is not the same as the file location

/home/monkin/Desktop/feup/thesis/thesis-lava/src/hfo/filter


In [13]:
import numpy as np
import math

INPUT_FILE_COMMON = "seeg_csl"  # "seeg_synthetic_humans"
seeg_file_name = f"{INPUT_FILE_COMMON}.npy"   # "seeg_synthetic_humans.npy"
markers_seeg_file_name = f"{INPUT_FILE_COMMON}_markers.npy"

recorded_data = np.load(f"{PATH_TO_FILE}data/{seeg_file_name}")

print("Data shape: ", recorded_data.shape)
print("First time steps: ", recorded_data[:10])

Data shape:  (129239, 86)
First time steps:  [[  1.0633698  34.293705   -5.3168535 ... -54.76359    13.557976
  -24.457527 ]
 [  1.8608971  35.888763   -6.380224  ... -57.687862   15.684719
  -25.78674  ]
 [ -1.8608971  37.21797    -7.1777525 ... -62.738873   19.406517
  -22.862473 ]
 ...
 [ -1.3292122  34.559547   -8.772809  ... -72.57505    20.204044
   -2.9242706]
 [ -0.7975273  37.74966   -13.026291  ... -67.78989    16.216404
   -1.8608971]
 [ -1.3292122  34.027863   -9.304494  ... -68.85326    14.355507
    0.5316849]]


## Define the Filter

In [14]:
from scipy.signal import butter, lfilter

# ================================================================ #
# ============ Butterworth Filter Coefficients =================== #
# ================================================================ #
def butter_bandpass(lowcut, highcut, sampling_freq, order=5):
    """
    Generate the coefficients of a Butterworth bandpass filter.
    @lowcut, highcut (int): cutoff frequencies of the bandpass filter, in Hz
    @sampling_freq (float): sampling frequency of the wideband signal, in Hz
    @order (int): filter order

    - return b, a (float): filtering coefficients that will be applied on the wideband signal
    """
    nyq = 0.5 * sampling_freq   # Nyquist frequency
    low = lowcut / nyq          # Normalizing the cutoff frequencies
    high = highcut / nyq        # Normalizing the cutoff frequencies

    return butter(order, [low, high], btype='band')    

# ================================================================ #
# ====================== Butterworth Filters ===================== #
# ================================================================ #
def butter_bandpass_filter(data, lowcut, highcut, sampling_freq, order=5):
    """
    This function applies the filtering coefficients calculated above to the wideband signal (original signal).
    @data (array): Array with the amplitude values of the wideband signal.
    @lowcut, highcut (int): cutoff frequencies for the bandpass filter.
    @sampling_freq (float): sampling frequency of the original signal.
    @order (int): filter order.

    - return (array): Array with the amplitude values of the filtered signal.
    """
    coef_b, coef_a = butter_bandpass(lowcut, highcut, sampling_freq, order)

    return lfilter(coef_b, coef_a, data)
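
For higher filter orders, the SciPy documentation recommends the second-order-sections (SOS) representation over `(b, a)` coefficients, because the polynomial form is numerically sensitive. The sketch below is one possible variant of the function above, not something the rest of the notebook relies on: `butter_bandpass_filter_sos` is a hypothetical name, and `sosfilt` is used so the filter stays causal like `lfilter`.

```python
import numpy as np
from scipy.signal import butter, lfilter, sosfilt

def butter_bandpass_filter_sos(data, lowcut, highcut, sampling_freq, order=5, axis=0):
    """Causal Butterworth bandpass filter using second-order sections."""
    sos = butter(order, [lowcut, highcut], btype="band", fs=sampling_freq, output="sos")
    return sosfilt(sos, data, axis=axis)

# Synthetic test signal: a slow 10 Hz wave plus a 150 Hz ripple-band component
fs = 2048.0
t = np.arange(0, 1.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 150 * t)

# At a low order both representations are well conditioned and should agree
b, a = butter(2, [80, 250], btype="band", fs=fs)
reference = lfilter(b, a, signal)
filtered = butter_bandpass_filter_sos(signal, 80, 250, fs, order=2)
print(np.max(np.abs(filtered - reference)))
```

If a zero-phase result is preferred over a causal one, `scipy.signal.sosfiltfilt` takes the same `sos` argument.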
    

## Define Global Parameters of the Experiment

In [15]:
from utils.input import SAMPLING_RATE, X_STEP

sampling_rate = SAMPLING_RATE    # 2048 Hz
x_step = X_STEP  # 0.48828125 ms

num_samples = recorded_data.shape[0]    # 129239 for the recording loaded above
num_channels = recorded_data.shape[1]   # 86 for the recording loaded above
input_duration = num_samples / sampling_rate

print(f"Input Duration: {input_duration} seconds")

Input Duration: 63.10498046875 seconds
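
The sampling period and the duration both follow from the sampling rate. A small check, assuming the 2048 Hz rate and the 129239-sample recording shown earlier (the variable names are local to this example):

```python
sampling_rate_hz = 2048
n_samples = 129239

step_ms = 1000 / sampling_rate_hz          # time between consecutive samples, in ms
duration_s = n_samples / sampling_rate_hz  # total recording length, in seconds

print(step_ms)     # 0.48828125
print(duration_s)  # 63.10498046875
```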


### Extract a window of channels from the SEEG data
Let's define the window first.

If we want to extract a single channel, set the variable `is_single_channel` to `True` and the variable `min_channel_idx` to the desired channel number.

In [16]:
is_single_channel = False   # Set to True if you want to use only one channel

# Define the window of channels to be used
BRAIN_REGION_IDX = 0
BRAIN_REGION_OFFSET = BRAIN_REGION_IDX * 120
SNR_OFFSET = 0 # 90     # Choose the highest SNR (channels 90-120)
min_channel_idx = BRAIN_REGION_OFFSET + SNR_OFFSET
max_channel_idx = min_channel_idx + 30

if is_single_channel:
    # Set the window to size 1
    max_channel_idx = min_channel_idx + 1

In [17]:
from utils.io import preview_np_array
seeg_window = recorded_data[:, min_channel_idx:max_channel_idx]

preview_np_array(seeg_window, "SEEG Window")

SEEG Window Shape: (129239, 30).
Preview: [[   1.0633698    34.293705     -5.3168535    31.369436    -37.483818
  ...  -44.927414     12.228767    106.33707     -26.052582
  -141.16248   ]
 [   1.8608971    35.888763     -6.380224     32.964493    -36.952133
  ...  -45.72494      12.228767    106.33707     -25.78674
  -142.22583   ]
 [  -1.8608971    37.21797      -7.1777525    35.091232    -37.217976
  ...  -43.066513      9.0386505   107.1346      -25.255054
  -144.61841   ]
 [  -2.1267433    36.952133     -6.9119096    31.635279    -34.559547
  ...  -46.522472      7.443596    109.52718     -25.255054
  -145.94763   ]
 [  -0.5316849    33.76202      -6.114382     32.69865     -36.952133
  ...  -48.38337       6.911911    110.8564      -26.318426
  -147.5427    ]
 ...
 [   0.26584435    0.5316849     5.582696     34.027863   -123.88269
  ...    1.5950546    -6.380224      0.26584244  -25.78674
   -57.687862  ]
 [   1.0633717    -0.7975273     6.114381     34.2937     -123.351
  ...  

## Apply the Butterworth filter to each channel

In [18]:
# Apply the Butterworth filter to the window of channels in the Ripple Band
ripple_lowcut_freq = 80
ripple_highcut_freq = 250
BUTTER_FILTER_ORDER = 9

ripple_band_seeg_window = [ butter_bandpass_filter(seeg_window[:, i], ripple_lowcut_freq, ripple_highcut_freq, sampling_rate, BUTTER_FILTER_ORDER) for i in range(seeg_window.shape[1]) ]
ripple_band_seeg_window = np.array(ripple_band_seeg_window).T
preview_np_array(ripple_band_seeg_window, "Ripple Band SEEG Window", edge_items=3)

Ripple Band SEEG Window Shape: (129239, 30).
Preview: [[ 1.58888585e-06  5.12416143e-05 -7.94443639e-06 ...  1.58888722e-04
  -3.89277376e-05 -2.10924808e-04]
 [ 2.43530459e-05  7.49338554e-04 -1.17395899e-04 ...  2.31614015e-03
  -5.67057125e-04 -3.07626534e-03]
 [ 1.73387021e-04  5.24759388e-03 -8.32239218e-04 ...  1.61589141e-02
  -3.95205727e-03 -2.14761155e-02]
 ...
 [-2.00463421e+00  4.32507116e-01 -7.82314044e-01 ...  8.74048712e-01
  -4.89444665e-01  1.07881892e+00]
 [-2.34646054e+00  1.16366188e+00 -1.19675063e+00 ...  3.96790980e-01
  -4.76568038e-01  1.06754919e+00]
 [-2.13219194e+00  1.66497204e+00 -1.45428660e+00 ... -1.79986688e-01
  -2.01523335e-01  5.85677721e-01]]
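
`lfilter` accepts an `axis` argument, so the per-channel loop can also be written as a single call over the time axis. A minimal sketch on random data (the array sizes and the order 3 are arbitrary choices for the example):

```python
import numpy as np
from scipy.signal import butter, lfilter

rng = np.random.default_rng(0)
demo_window = rng.standard_normal((4096, 5))   # (samples, channels), stand-in data

b, a = butter(3, [80, 250], btype="band", fs=2048)

# Per-channel loop, as in the cell above
looped = np.array(
    [lfilter(b, a, demo_window[:, i]) for i in range(demo_window.shape[1])]
).T

# Single call: filter along axis 0 (time) for all channels at once
vectorized = lfilter(b, a, demo_window, axis=0)

print(np.allclose(looped, vectorized))
```

Both forms keep the `(samples, channels)` layout, so the transpose step is only needed for the loop version.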


In [19]:
# Apply the Butterworth filter to the window of channels in the Fast Ripple Band
fr_lowcut_freq = 250
fr_highcut_freq = 500

fr_band_seeg_window = [ butter_bandpass_filter(seeg_window[:, i], fr_lowcut_freq, fr_highcut_freq, sampling_rate, BUTTER_FILTER_ORDER) for i in range(seeg_window.shape[1]) ]
fr_band_seeg_window = np.array(fr_band_seeg_window).T
preview_np_array(fr_band_seeg_window, "FR Band SEEG Window", edge_items=3)

FR Band SEEG Window Shape: (129239, 30).
Preview: [[ 3.07483489e-05  9.91635134e-04 -1.53741882e-04 ...  3.07483754e-03
  -7.53335209e-04 -4.08184740e-03]
 [ 2.37688492e-04  6.96785698e-03 -1.10388549e-03 ...  2.14627415e-02
  -5.25068468e-03 -2.85225413e-02]
 [ 4.57120045e-04  1.33819337e-02 -2.25653474e-03 ...  4.03999805e-02
  -9.82331437e-03 -5.38841754e-02]
 ...
 [-4.31730295e-01 -8.11727306e-01  8.44248120e-01 ...  1.49546285e-01
  -5.79074194e-01  1.10558591e+00]
 [-2.28771275e-01 -1.07887853e-01 -2.08512410e-03 ... -7.76373503e-01
   4.01760384e-01 -2.31285927e-01]
 [ 9.33223677e-01  3.62512199e-01 -7.29889627e-01 ... -3.66630013e-01
   6.51022118e-01 -9.24987535e-01]]


Apply the Butterworth filter in the combined Ripple+FR Band

In [20]:
# Apply the Butterworth filter to the window of channels in the Combined Ripple and Fast Ripple Band
both_band_seeg_window = [ butter_bandpass_filter(seeg_window[:, i], ripple_lowcut_freq, fr_highcut_freq, sampling_rate, BUTTER_FILTER_ORDER) for i in range(seeg_window.shape[1]) ]
both_band_seeg_window = np.array(both_band_seeg_window).T
preview_np_array(both_band_seeg_window, "Both Bands SEEG Window", edge_items=3)

Both Bands SEEG Window Shape: (129239, 30).
Preview: [[ 1.35444513e-03  4.36808944e-02 -6.77223173e-03 ...  1.35444630e-01
  -3.31839349e-02 -1.79802771e-01]
 [ 1.36867196e-02  4.10668102e-01 -6.47089319e-02 ...  1.26708967e+00
  -3.10098362e-01 -1.68341619e+00]
 [ 5.46471681e-02  1.62947797e+00 -2.63109765e-01 ...  4.98947631e+00
  -1.21832789e+00 -6.63790039e+00]
 ...
 [ 2.77658767e-01  6.58586971e-02 -6.67332095e-01 ... -4.13114402e-01
   8.98186892e-01 -1.62475645e+00]
 [ 9.46050828e-01 -3.05484659e-01 -6.44122741e-01 ...  2.67005505e-01
   4.62394055e-01 -1.84057544e+00]
 [ 1.41879006e+00 -1.16945237e-01  1.98865993e-01 ...  5.43896958e-01
   2.27998424e-01 -1.31559355e+00]]


## Import the Markers (Annotated Events) 
The markers are stored in a numpy array of shape (num_channels, events):
- Each row represents the events of a channel
- Each event is composed of the following 3 fields (Label, Position, Duration)
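
To make the layout concrete, here is a small stand-in array with the same field names that the plotting code accesses (`label`, `position`, `duration`). The dtype and the values are assumptions for illustration; the real file may store the fields differently.

```python
import numpy as np

# Hypothetical dtype: only the field names are taken from this notebook
marker_dtype = np.dtype([("label", "U32"), ("position", "f8"), ("duration", "f8")])

# 2 channels x 3 events each; a duration of 0 stands for a missing duration
demo_markers = np.array(
    [
        [("Spike", 1000.0, 0.0), ("Ripple", 4218.75, 0.0), ("Fast-Ripple", 6966.8, 0.0)],
        [("Ripple", 1000.0, 0.0), ("Spike", 3917.97, 0.0), ("Spike+Ripple", 6794.92, 0.0)],
    ],
    dtype=marker_dtype,
)

print(demo_markers.shape)            # (2, 3)
first_event = demo_markers[0][0]
print(first_event["label"], first_event["position"], first_event["duration"])
```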

In [21]:
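# The markers are stored as an object array, so np.load needs allow_pickle=True;
# with the default (allow_pickle=False) it raises
# "ValueError: Object arrays cannot be loaded when allow_pickle=False".
# Only enable pickle loading for files from a trusted source.
markers = np.load(f"{PATH_TO_FILE}data/{markers_seeg_file_name}", allow_pickle=True)

preview_np_array(markers, "Markers", edge_items=3)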

### Define the set of channels the markers will be extracted from

In [None]:
channels_used = set(range(min_channel_idx, max_channel_idx))
print("Channels used: ", channels_used)

Channels used:  {90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119}


## Visualize the filtered signals

In [None]:
# Interactive Plot for the HFO detection
# bokeh docs: https://docs.bokeh.org/en/2.4.1/docs/first_steps/first_steps_1.html

from utils.line_plot import create_fig  # Import the function to create the figure
from bokeh.models import Range1d

# Define the x and y values
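# Time axis in ms, one value per sample. The first sample is placed at x_step
# (use np.arange(num_samples) * x_step to start at 0 instead).
# The axis is built from integer sample indices because x_step is in ms while
# input_duration is in seconds, and because a float-step np.arange can return
# one element more or fewer than expected.
x = (np.arange(1, num_samples + 1) * x_step).tolist()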
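
A note on float ranges: `np.arange` with a non-integer step can return one element more or fewer than expected because of rounding, so deriving the axis from integer sample indices is the more predictable option. A sketch with stand-in values (2048 Hz and 1000 samples are assumptions for the example):

```python
import numpy as np

n_samples = 1000
step_ms = 1000 / 2048   # sampling period in ms at an assumed 2048 Hz

# One time stamp per sample, starting at step_ms
time_ms = np.arange(1, n_samples + 1) * step_ms

print(len(time_ms), time_ms[0], time_ms[-1])
```

Built this way, the axis always has exactly as many points as the signal has samples.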

## Create the Plot

In [None]:
# Create the plot
# List of tuples containing the y values and the legend label
hfo_y_arrays = []

PLOT_RIPPLE_BAND = False
PLOT_FR_BAND = False
PLOT_BOTH_BAND = True

if is_single_channel:
    # Add the Ripple and FR bands of the single channel
    hfo_y_arrays.append((ripple_band_seeg_window[:, 0], f"Ripple Band Ch. {min_channel_idx}"))
    hfo_y_arrays.append((fr_band_seeg_window[:, 0], f"Fast Ripple Band Ch. {min_channel_idx}"))
else:
    # Add the Ripple, FR and both bands of each channel in the range defined below
    min_hfo_idx = 0
    max_hfo_idx = 8
    if PLOT_RIPPLE_BAND:
        for hfo_idx in range(min_hfo_idx, max_hfo_idx, 1):
            hfo_y_arrays.append((ripple_band_seeg_window[:, hfo_idx], f"Ripple Band Ch. {min_channel_idx + hfo_idx}"))
    if PLOT_FR_BAND:
        for hfo_idx in range(min_hfo_idx, max_hfo_idx, 1):
            hfo_y_arrays.append((fr_band_seeg_window[:, hfo_idx], f"Fast Ripple Band Ch. {min_channel_idx + hfo_idx}"))
    if PLOT_BOTH_BAND:
        for hfo_idx in range(min_hfo_idx, max_hfo_idx, 1):
            hfo_y_arrays.append((both_band_seeg_window[:, hfo_idx], f"Both Bands Ch. {min_channel_idx + hfo_idx}"))


# Create the SEEG Voltage plot
hfo_plot = create_fig(
    title="SEEG Voltage dynamics of Filtered Both Bands", 
    x_axis_label='time (ms)', 
    y_axis_label='Voltage (μV)',
    x=x, 
    y_arrays=hfo_y_arrays, 
    sizing_mode="stretch_both", 
    tools="pan, box_zoom, wheel_zoom, hover, undo, redo, zoom_in, zoom_out, reset, save",
    tooltips="Data point @x: @y",
    legend_location="top_right",
    legend_bg_fill_color="navy",
    legend_bg_fill_alpha=0.1,
    # y_range=Range1d(-0.05, 1.05)
)

# If more than 30 traces are plotted, hide the legend
if len(hfo_y_arrays) > 30:
    # Hide the legend
    hfo_plot.legend.visible = False

## Add Box Annotations to the plot to identify the marked HFOs (ground truth)

In [None]:
from bokeh.models import BoxAnnotation
# from utils.line_plot import color_map

show_markers = False    # Boolean to show the markers

color_map = {                  
    'Spike': 'red',
    'Fast-Ripple': 'blue',
    'Ripple': 'green',  
    'Spike+Ripple': 'yellow',
    'Spike+Fast-Ripple': 'pink',
    'Ripple+Fast-Ripple': 'cyan',
    'Spike+Ripple+Fast-Ripple': 'black'
}

confidence_range = 100          # TODO: Check this value. When the duration is missing (0), we consider the 200ms window around the marked position 
visited_markers = {}    # Avoid inserting multiple boxes for the same marker (only one of each label)
use_visited = False     # Boolean controlling if we remove duplicate markers
plot_instant = True     # Boolean to plot the markers as instant events or as boxes
instant_width = 100 # 20       # Width of the instant event for visualization purposes

if show_markers:
    for ch_idx in channels_used:
        channel_markers = markers[ch_idx]
        # print("channel_markers", channel_markers)
        for idx2, marker in enumerate(channel_markers):
            # print("marker:", marker)
            
            if use_visited:
                # Check if the marker has already been visited and skip it if it has
                if marker['position'] in visited_markers:
                    visited_labels = visited_markers[marker['position']]    # Get the labels that already have an annotation for this position
                    if marker['label'] in visited_labels:
                        # print("Skipping marker", marker['position'], marker['label'])
                        continue    # Skip this marker
                    else:
                        visited_labels.append(marker['label'])  # Add the label to the visited labels
                else:
                    visited_markers[marker['position']] = [marker['label']] # Add the marker to the visited markers

            # Add a box annotation for each marker
            has_duration = marker['duration'] > 0
            
            confidence_constant = 0 if plot_instant or has_duration else confidence_range

            left = marker['position'] - confidence_constant
            right = marker['position'] + confidence_constant + instant_width
            box_color = color_map[marker['label']]  # Choose a color according to the label
            
            # if left < min_t or right > max_t:
            #     continue    # Skip this marker
            

            box = BoxAnnotation(left=left, right=right, fill_color=box_color, fill_alpha=0.35)
            # print("Added marker for channel: ", ch_idx, " at position: ", left)
            hfo_plot.add_layout(box)
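
The box geometry above can be isolated in a small pure function, which makes the three cases easier to check. This is a sketch: `marker_box_bounds` is a hypothetical helper, and its defaults simply mirror the constants defined in this cell.

```python
def marker_box_bounds(position, duration, plot_instant=True,
                      confidence_range=100, instant_width=100):
    """Return (left, right) of the annotation box, in the same units as position."""
    has_duration = duration > 0
    # The confidence margin only applies to markers without a duration
    # that are not drawn as instant events
    margin = 0 if plot_instant or has_duration else confidence_range
    left = position - margin
    right = position + margin + instant_width
    return left, right

print(marker_box_bounds(1000.0, 0.0))                       # (1000.0, 1100.0)
print(marker_box_bounds(1000.0, 0.0, plot_instant=False))   # (900.0, 1200.0)
print(marker_box_bounds(1000.0, 50.0, plot_instant=False))  # (1000.0, 1100.0)
```

Note that, as in the cell above, a non-zero duration only switches the margin off; the duration value itself does not set the box width.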

## Show the Plot

In [None]:
import bokeh.plotting as bplt

showPlot = True
if showPlot:
    bplt.show(hfo_plot)

## Export the plot to a file

In [None]:
export = False
file_name = f"filtered_seeg_ch{min_channel_idx}" if is_single_channel else f"filtered_seeg_ch{min_channel_idx}-{max_channel_idx - 1}"

if export:
    file_path = f"{PATH_TO_FILE}plots/synthetic/{file_name}.html"

    # Customize the output file settings
    bplt.output_file(filename=file_path, title="SEEG Data - Filtered Voltage dynamics across time")

    # Save the plot
    bplt.save(hfo_plot)

## Export the filtered signals to a numpy file

### Get the relevant Markers

In [None]:
# Save the relevant markers in a variable
relevant_markers = markers[min_channel_idx:max_channel_idx]
preview_np_array(relevant_markers, "Relevant Markers", edge_items=3)

Relevant Markers Shape: (30, 42).
Preview: [[('Spike',   1000.  , 0.) ('Spike+Fast-Ripple',   4218.75, 0.)
  ('Ripple+Fast-Ripple',   6966.8 , 0.) ... ('Ripple', 114551.  , 0.)
  ('Spike+Ripple', 116517.  , 0.) ('Spike+Ripple', 119000.  , 0.)]
 [('Spike+Ripple',   1000.  , 0.)
  ('Spike+Ripple+Fast-Ripple',   3917.97, 0.)
  ('Spike+Fast-Ripple',   6794.92, 0.) ... ('Spike', 113382.  , 0.)
  ('Ripple+Fast-Ripple', 116656.  , 0.) ('Spike+Ripple', 119000.  , 0.)]
 [('Fast-Ripple',   1000.  , 0.) ('Spike',   3655.76, 0.)
  ('Spike',   7188.96, 0.) ...
  ('Spike+Ripple+Fast-Ripple', 112797.  , 0.)
  ('Fast-Ripple', 115363.  , 0.) ('Spike+Fast-Ripple', 119000.  , 0.)]
 ...
 [('Ripple+Fast-Ripple',   1000.  , 0.) ('Ripple',   3543.46, 0.)
  ('Ripple+Fast-Ripple',   6759.77, 0.) ...
  ('Fast-Ripple', 112950.  , 0.) ('Spike+Ripple', 115987.  , 0.)
  ('Spike+Ripple', 119000.  , 0.)]
 [('Spike+Fast-Ripple',   1000.  , 0.) ('Fast-Ripple',   4774.41, 0.)
  ('Ripple',   7656.74, 0.) ... ('Spike', 11

In [None]:
from utils.input import RIPPLE_BAND_FILENAME, FR_BAND_FILENAME, BOTH_BAND_FILENAME

EXPORT_FILTERED_SIGNAL = True
file_name = f"filtered_seeg_ch{min_channel_idx}" if is_single_channel else f"filtered_seeg_ch{min_channel_idx}-{max_channel_idx-1}"
if EXPORT_FILTERED_SIGNAL:
    # Export the filtered signals
    np.save(f"{PATH_TO_FILE}results/synthetic/{file_name}_{RIPPLE_BAND_FILENAME}_band.npy", ripple_band_seeg_window)
    np.save(f"{PATH_TO_FILE}results/synthetic/{file_name}_{FR_BAND_FILENAME}_band.npy", fr_band_seeg_window)
    np.save(f"{PATH_TO_FILE}results/synthetic/{file_name}_{BOTH_BAND_FILENAME}_band.npy", both_band_seeg_window)

    # Export the markers
    np.save(f"{PATH_TO_FILE}results/synthetic/{file_name}_markers.npy", relevant_markers)
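
When the exported files are read back, the signal arrays load with a plain `np.load`, while an object-dtype markers array needs `allow_pickle=True` (only advisable for files you trust). A sketch using a temporary directory and stand-in arrays; the file names here are placeholders, not the ones produced above.

```python
import os
import tempfile
import numpy as np

demo_signal = np.zeros((8, 2))                                    # stand-in filtered window
demo_markers = np.array([("Spike", 1000.0, 0.0)], dtype=object)   # object-dtype stand-in

with tempfile.TemporaryDirectory() as tmp_dir:
    signal_path = os.path.join(tmp_dir, "demo_band.npy")
    markers_path = os.path.join(tmp_dir, "demo_markers.npy")
    np.save(signal_path, demo_signal)
    np.save(markers_path, demo_markers)   # object arrays are pickled on save

    loaded_signal = np.load(signal_path)
    loaded_markers = np.load(markers_path, allow_pickle=True)

print(loaded_signal.shape, loaded_markers.shape)
```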