# Filtering - Pre-Processing of the SEEG Signal (1/2)
This notebook presents **stage 1 of the pre-processing** that the SEEG signal goes through before being fed to the SNN. The pre-processing stages are as follows:
1. **Filtering**: The SEEG signal is bandpass filtered to remove noise and artifacts. The bandpass filter is a Butterworth design and, since we are working with *iEEG*, the signal is filtered in the ripple and fast ripple (FR) bands. The co-occurrence of HFOs in both bands is considered a strong predictor of post-surgical seizure freedom, since it delineates an "HFO area" that approximates the epileptogenic zone (EZ).
2. **Signal-to-Spike Conversion**: To interface and communicate with the silicon neurons in the SNN, the SEEG signal must be converted to spikes.

## Filtering
Depending on the EEG modality, the signal is filtered in different frequency bands. Since we are handling *iEEG* (*sEEG*) data, the signal is filtered in both the ripple (80-250 Hz) and FR (250-500 Hz) bands. As noted above, the co-occurrence of HFOs in these two bands is what defines the "HFO area" used to approximate the EZ.

The filter is implemented in different ways depending on the setup it will run on.
1. **Neuromorphic Hardware**: The filter is implemented using analog filters. 
2. **Software Simulation**: *Butterworth filters* are utilized since they are a good approximation of the tuned *Tow-Thomas* architectures implemented in hardware.

The frequency response of the *Butterworth filter* is maximally flat in the passband and rolls off towards 0 in the stopband.
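
As a quick check of this behaviour, the sketch below designs an order-9 Butterworth bandpass for the ripple band and evaluates its gain at a few frequencies. The sampling rate (2048 Hz), order and cutoffs mirror the values used later in this notebook; the second-order-sections form (`output='sos'`) and the probe frequencies are choices made for this illustration only.

```python
import numpy as np
from scipy.signal import butter, sosfreqz

FS = 2048.0                     # sampling rate in Hz (value used later in this notebook)
LOWCUT, HIGHCUT = 80.0, 250.0   # ripple band edges in Hz
ORDER = 9

# Design the bandpass in second-order sections (illustrative choice)
sos = butter(ORDER, [LOWCUT, HIGHCUT], btype='band', fs=FS, output='sos')

# Probe frequencies: stopband, lower edge, mid passband, upper edge
probe_hz = np.array([20.0, LOWCUT, 150.0, HIGHCUT])
_, response = sosfreqz(sos, worN=probe_hz, fs=FS)
gain_db = 20 * np.log10(np.abs(response))

for f, g in zip(probe_hz, gain_db):
    print(f"{f:6.1f} Hz -> {g:8.2f} dB")
```

The gain should sit near 0 dB in the middle of the passband, near -3 dB at the two cutoff frequencies, and fall steeply outside the band.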

### Check WD (change if necessary) and file loading

In [109]:
# Show current directory
import os
curr_dir = os.getcwd()
print(curr_dir)

# Check if the current WD is the file location
if "/src/hfo/filter" not in os.getcwd():
    # Set working directory to this file location
    file_location = f"{os.getcwd()}/thesis-lava/src/hfo/filter"
    print("File Location: ", file_location)

    # Change the current working Directory
    os.chdir(file_location)

    # New Working Directory
    print("New Working Directory: ", os.getcwd())

PATH_TO_FILE = '' # 'src/hfo/'  # This is needed if the WD is not the same as the file location

/home/monkin/Desktop/feup/thesis/thesis-lava/src/hfo/filter


In [110]:
import numpy as np
import math

INPUT_FILE_COMMON = "seeg_ics"  # "seeg_csl"  # "seeg_synthetic_humans"
seeg_file_name = f"{INPUT_FILE_COMMON}.npy"   # "seeg_synthetic_humans.npy"
markers_seeg_file_name = f"{INPUT_FILE_COMMON}_markers.npy"

IS_CLINICAL = True
OUTPUT_PATH = "/clinical" if IS_CLINICAL else "/synthetic"
OUT_FILE_PREFIX = f"filtered_{INPUT_FILE_COMMON}_ch" if IS_CLINICAL else "filtered_seeg_ch"

recorded_data = np.load(f"{PATH_TO_FILE}data/{seeg_file_name}")

print("Data shape: ", recorded_data.shape)
print("First time steps: ", recorded_data[:10])

Data shape:  (125056, 101)
First time steps:  [[-107.66629    -81.347855    19.14067   ...  -25.25505     18.077301
   -10.6336975]
 [-111.653915   -82.942924    17.811462  ...  -26.31842     16.2164
   -13.026291 ]
 [-113.51482    -81.61371     10.3678665 ...  -26.052582    17.27977
   -16.2164   ]
 ...
 [-121.49011    -87.99393     -8.506966  ...  -25.52089     13.026291
   -18.077309 ]
 [-123.08516    -87.462234   -15.950562  ...  -24.723358    14.621338
   -21.799095 ]
 [-123.61684    -87.72809    -23.128311  ...  -24.723373    14.089661
   -23.39415  ]]


## Define the Filter

In [111]:
from scipy.signal import butter, lfilter

# ================================================================ #
# ============ Butterworth Filter Coefficients =================== #
# ================================================================ #
def butter_bandpass(lowcut, highcut, sampling_freq, order=5):
    """
    Generate the coefficients of a Butterworth bandpass filter.
    @lowcut, highcut (float): cutoff frequencies (Hz) of the bandpass filter
    @sampling_freq (float): sampling frequency (Hz) of the wideband signal
    @order (int): filter order

    - return b, a (float): filtering coefficients that will be applied on the wideband signal
    """
    nyq = 0.5 * sampling_freq   # Nyquist frequency
    low = lowcut / nyq          # Normalizing the cutoff frequencies
    high = highcut / nyq        # Normalizing the cutoff frequencies

    return butter(order, [low, high], btype='band')    

# ================================================================ #
# ====================== Butterworth Filters ===================== #
# ================================================================ #
def butter_bandpass_filter(data, lowcut, highcut, sampling_freq, order=5):
    """
    This function applies the filtering coefficients calculated above to the wideband signal (original signal).
    @data (array): Array with the amplitude values of the wideband signal.
    @lowcut, highcut (int): cutoff frequencies for the bandpass filter.
    @sampling_freq (float): sampling frequency of the original signal.
    @order (int): filter order.

    - return (array): Array with the amplitude values of the filtered signal.
    """
    coef_b, coef_a = butter_bandpass(lowcut, highcut, sampling_freq, order)

    return lfilter(coef_b, coef_a, data)
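
SciPy's documentation recommends the second-order-sections (SOS) representation over `(b, a)` coefficients for most filtering tasks, because high-order transfer-function polynomials can lose numerical precision. The sketch below is an alternative formulation, not the function used in this notebook: `butter_bandpass_filter_sos` is a hypothetical name, and `sosfiltfilt` applies the filter forward and backward, giving zero phase shift at the cost of being non-causal (unlike `lfilter`, and unlike an analog hardware filter).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def butter_bandpass_filter_sos(data, lowcut, highcut, sampling_freq, order=5):
    """Hypothetical zero-phase variant: Butterworth bandpass applied in SOS form."""
    sos = butter(order, [lowcut, highcut], btype='band', fs=sampling_freq, output='sos')
    # Forward-backward filtering: no phase delay, but the magnitude response is applied twice
    return sosfiltfilt(sos, data, axis=0)

# Synthetic test signal (assumption for illustration): 20 Hz drift plus a 150 Hz "ripple"
fs = 2048.0
t = np.arange(4096) / fs
ripple = 0.5 * np.sin(2 * np.pi * 150 * t)
signal = np.sin(2 * np.pi * 20 * t) + ripple

filtered = butter_bandpass_filter_sos(signal, 80, 250, fs, order=9)

# Away from the edges, the output should closely track the 150 Hz component
max_err = np.max(np.abs(filtered[500:-500] - ripple[500:-500]))
print(f"Max deviation from the 150 Hz component: {max_err:.4f}")
```

Whether zero-phase filtering is appropriate depends on the goal: it preserves event timing for offline analysis, whereas the causal `lfilter` is closer to what a hardware filter would produce.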
    

## Define Global Parameters of the Experiment

In [112]:
from utils.input import SAMPLING_RATE, X_STEP

sampling_rate = SAMPLING_RATE    # 2048 Hz
x_step = X_STEP  # 0.48828125 ms

num_samples = recorded_data.shape[0]    # 125056 for the file loaded above
num_channels = recorded_data.shape[1]   # 101 for the file loaded above
input_duration = (num_samples / sampling_rate) * 1000

print(f"Input Duration: {input_duration} ms")

Input Duration: 61062.5 ms
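
The relationship between these quantities can be checked with plain arithmetic. The sketch below hard-codes the sampling rate and sample count reported above instead of importing them from `utils.input`, so the variable names are local to this example.

```python
SAMPLING_RATE_HZ = 2048      # value noted in the cell above
NUM_SAMPLES = 125056         # first dimension of the loaded recording

x_step_ms = 1000 / SAMPLING_RATE_HZ          # time between consecutive samples, in ms
input_duration_ms = NUM_SAMPLES * x_step_ms  # total recording length, in ms
nyquist_hz = SAMPLING_RATE_HZ / 2            # highest frequency the sampled signal can represent

print(f"x_step = {x_step_ms} ms")
print(f"duration = {input_duration_ms} ms")
print(f"Nyquist = {nyquist_hz} Hz")

# The highest band edge used in this notebook (500 Hz) must lie below Nyquist
assert 500 < nyquist_hz
```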


### Extract a window of channels from the SEEG data
Let's define the window first.

If we want to extract a single channel, set the variable `is_single_channel` to `True` and the variable `min_channel_idx` to the desired channel number.

In [113]:
is_single_channel = False   # Set to True if you want to use only one channel

# Define the window of channels to be used
BRAIN_REGION_IDX = 0
BRAIN_REGION_OFFSET = BRAIN_REGION_IDX * 120
SNR_OFFSET = 90     # Choose the highest SNR (channels 90-120)
min_channel_idx = 30  # BRAIN_REGION_OFFSET + SNR_OFFSET
max_channel_idx = min_channel_idx + 30

if is_single_channel:
    # Set the window to size 1
    max_channel_idx = min_channel_idx + 1

In [114]:
from utils.io import preview_np_array
seeg_window = recorded_data[:, min_channel_idx:max_channel_idx]

preview_np_array(seeg_window, "SEEG Window")

SEEG Window Shape: (125056, 30).
Preview: [[ -23.128326    -48.38336     -13.557968      2.3925781     8.506958
  ...  103.4128       68.055725     78.42359    -141.16246
   160.0373    ]
 [ -24.19168     -46.52246     -13.292145      1.0633698    12.760452
  ...  107.93213      68.58741      80.28449    -139.56741
   159.50562   ]
 [ -24.989212    -47.585846     -8.772797     -0.26585388   13.557983
  ...  108.72965      67.258194     81.87955    -139.83325
   159.23976   ]
 [ -24.191696    -45.72493     -10.10202      -1.5950623    14.355499
  ...  108.19797      68.58741      78.157745   -139.03572
   161.10066   ]
 [ -23.925842    -44.927414    -10.367874     -1.8608856    15.950562
  ...  110.05887      67.78989      77.094376   -136.90898
   161.10066   ]
 ...
 [ -30.040222    -11.431236     -8.241127    -23.128311     65.928986
  ... -103.14696      45.72494      83.4746       -6.1143837
   134.51639   ]
 [ -29.508537    -10.3678665   -11.697075    -22.862469     62.473026
  ...

## Apply the Butterworth filter to each channel

In [115]:
# Apply the Butterworth filter to the window of channels in the Ripple Band
ripple_lowcut_freq = 80
ripple_highcut_freq = 250
BUTTER_FILTER_ORDER = 9

ripple_band_seeg_window = [ butter_bandpass_filter(seeg_window[:, i], ripple_lowcut_freq, ripple_highcut_freq, sampling_rate, BUTTER_FILTER_ORDER) for i in range(seeg_window.shape[1]) ]
ripple_band_seeg_window = np.array(ripple_band_seeg_window).T
preview_np_array(ripple_band_seeg_window, "Ripple Band SEEG Window", edge_items=3)

Ripple Band SEEG Window Shape: (125056, 30).
Preview: [[-3.45583186e-05 -7.22943633e-05 -2.02583003e-05 ...  1.17180440e-04
  -2.10924785e-04  2.39127537e-04]
 [-5.05349660e-04 -1.05106314e-03 -2.94910505e-04 ...  1.71093402e-03
  -3.07229283e-03  3.48499665e-03]
 [-3.53865953e-03 -7.31281957e-03 -2.04756568e-03 ...  1.19592369e-02
  -2.14150325e-02  2.43053956e-02]
 ...
 [ 1.50714903e+00 -6.00145536e-01  5.61266194e-01 ...  1.41499998e-01
  -3.58904047e-02  2.71875797e+00]
 [ 1.20587999e+00 -6.45394297e-01  1.23579680e+00 ...  1.91074659e-01
  -2.85881185e-01  3.25530382e+00]
 [ 5.39099496e-01 -5.29644202e-01  1.55495989e+00 ...  2.49057138e-01
  -8.25277516e-01  3.60921031e+00]]
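
The per-channel list comprehension can also be written as a single call, since `lfilter` accepts an `axis` argument. The sketch below uses random data as a stand-in for `seeg_window` (an assumption for illustration) and checks that filtering along `axis=0` matches the column-by-column loop.

```python
import numpy as np
from scipy.signal import butter, lfilter

rng = np.random.default_rng(0)
fake_window = rng.standard_normal((4096, 5))   # stand-in for seeg_window: (time, channels)

b, a = butter(9, [80, 250], btype='band', fs=2048)

# Column-by-column, as in the cell above
looped = np.array([lfilter(b, a, fake_window[:, i]) for i in range(fake_window.shape[1])]).T

# Single call: filter every channel along the time axis
vectorized = lfilter(b, a, fake_window, axis=0)

print("Shapes match:", looped.shape == vectorized.shape)
print("Values match:", np.allclose(looped, vectorized))
```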


In [116]:
# Apply the Butterworth filter to the window of channels in the Fast Ripple Band
fr_lowcut_freq = 250
fr_highcut_freq = 500

fr_band_seeg_window = [ butter_bandpass_filter(seeg_window[:, i], fr_lowcut_freq, fr_highcut_freq, sampling_rate, BUTTER_FILTER_ORDER) for i in range(seeg_window.shape[1]) ]
fr_band_seeg_window = np.array(fr_band_seeg_window).T
preview_np_array(fr_band_seeg_window, "FR Band SEEG Window", edge_items=3)

FR Band SEEG Window Shape: (125056, 30).
Preview: [[-6.68777581e-04 -1.39905098e-03 -3.92041559e-04 ...  2.26769283e-03
  -4.08184695e-03  4.62763070e-03]
 [-4.69889710e-03 -9.71173697e-03 -2.72881143e-03 ...  1.58825826e-02
  -2.84456680e-02  3.22860535e-02]
 [-9.01967091e-03 -1.80266472e-02 -4.96371999e-03 ...  3.01997007e-02
  -5.32861094e-02  6.06522665e-02]
 ...
 [ 5.59608488e-01 -1.34049590e-02 -6.30097123e-01 ...  1.32435808e-01
   2.16404503e-01 -1.49831346e+00]
 [ 8.15717077e-01 -4.35934484e-01  3.00535918e-01 ...  5.62505164e-01
   3.14394276e-01 -9.11249438e-01]
 [-7.03969363e-01 -3.12499050e-01  1.16348157e+00 ... -6.88015476e-02
   4.60469373e-01  9.18939256e-01]]


Apply the Butterworth filter in the combined Ripple+FR Band

In [117]:
# Apply the Butterworth filter to the window of channels in the Combined Ripple and Fast Ripple Band
both_band_seeg_window = [ butter_bandpass_filter(seeg_window[:, i], ripple_lowcut_freq, fr_highcut_freq, sampling_rate, BUTTER_FILTER_ORDER) for i in range(seeg_window.shape[1]) ]
both_band_seeg_window = np.array(both_band_seeg_window).T
preview_np_array(both_band_seeg_window, "Both Bands SEEG Window", edge_items=3)

Both Bands SEEG Window Shape: (125056, 30).
Preview: [[-0.02945923 -0.0616273  -0.01726918 ...  0.09989042 -0.17980275
   0.20384418]
 [-0.2769466  -0.57415547 -0.16121525 ...  0.93684897 -1.68002993
   1.90629282]
 [-1.09867737 -2.24892972 -0.62710442 ...  3.70319559 -6.60351385
   7.50095936]
 ...
 [-0.99666961  0.77272963 -0.76120399 ...  2.35955208 -0.64079101
   0.6181213 ]
 [-0.45465793  0.52715916 -1.02956973 ...  3.30577122 -0.93174099
   1.92176336]
 [ 0.31837213  0.01130678 -1.18531412 ...  2.79142579  0.29668694
   2.4572324 ]]


## Import the Markers (Annotated Events) 
The markers are stored in a NumPy object array with one entry per channel (shape `(num_channels,)`):
- Each entry holds the events annotated on that channel
- Each event is composed of 3 fields: (`label`, `position`, `duration`)

In [118]:
markers = np.load(f"{PATH_TO_FILE}data/{markers_seeg_file_name}", allow_pickle=True)

preview_np_array(markers, "Markers", edge_items=3)

Markers Shape: (101,).
Preview: [list([array([],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])])
 list([array([],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])])
 list([array([],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])])
 ...
 list([array([],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])])
 list([array([],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])])
 list([array([],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])])]
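
To make this layout concrete, the sketch below builds a small object array with the same structured dtype shown in the preview and counts the events per label. The two channels and their values are toy data invented for illustration; they are not taken from the recording.

```python
import numpy as np
from collections import Counter

MARKER_DTYPE = [('label', '<U64'), ('position', '<f4'), ('duration', '<f4')]

# Toy markers for two channels (illustrative values only)
ch0 = np.array([('Ripple', 100.0, 0.0), ('Fast Ripple', 250.0, 0.0)], dtype=MARKER_DTYPE)
ch1 = np.array([('Ripple', 100.0, 0.0)], dtype=MARKER_DTYPE)

# Object array with one (variable-length) structured array per channel
toy_markers = np.empty(2, dtype=object)
toy_markers[0] = ch0
toy_markers[1] = ch1

# Fields are accessed by name; count events per label across all channels
label_counts = Counter(str(m['label']) for channel in toy_markers for m in channel)
print(dict(label_counts))
```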


### Define the set of channels the markers will be extracted from

In [119]:
channels_used = set(range(min_channel_idx, max_channel_idx))
print("Channels used: ", channels_used)

Channels used:  {30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59}


## Visualize the filtered signals

In [120]:
# Interactive Plot for the HFO detection
# bokeh docs: https://docs.bokeh.org/en/2.4.1/docs/first_steps/first_steps_1.html

from utils.line_plot import create_fig  # Import the function to create the figure
from bokeh.models import Range1d

# Define the x and y values
# Should the first input start at 0 or x_step?
# Build the time axis from integer sample indices to avoid accumulating float-step errors
x = ((np.arange(num_samples) + 1) * x_step).tolist()

## Create the Plot

In [121]:
# Create the plot
# List of tuples containing the y values and the legend label
hfo_y_arrays = []

PLOT_RIPPLE_BAND = False
PLOT_FR_BAND = False
PLOT_BOTH_BAND = True

if is_single_channel:
    # Add the Ripple and FR bands of the single channel
    hfo_y_arrays.append((ripple_band_seeg_window[:, 0], f"Ripple Band Ch. {min_channel_idx}"))
    hfo_y_arrays.append((fr_band_seeg_window[:, 0], f"Fast Ripple Band Ch. {min_channel_idx}"))
else:
    # Add the Ripple, FR and both bands of each channel in the range defined below
    min_hfo_idx = 0
    max_hfo_idx = 8
    if PLOT_RIPPLE_BAND:
        for hfo_idx in range(min_hfo_idx, max_hfo_idx, 1):
            hfo_y_arrays.append((ripple_band_seeg_window[:, hfo_idx], f"Ripple Band Ch. {min_channel_idx + hfo_idx}"))
    if PLOT_FR_BAND:
        for hfo_idx in range(min_hfo_idx, max_hfo_idx, 1):
            hfo_y_arrays.append((fr_band_seeg_window[:, hfo_idx], f"Fast Ripple Band Ch. {min_channel_idx + hfo_idx}"))
    if PLOT_BOTH_BAND:
        for hfo_idx in range(min_hfo_idx, max_hfo_idx, 1):
            hfo_y_arrays.append((both_band_seeg_window[:, hfo_idx], f"Both Bands Ch. {min_channel_idx + hfo_idx}"))


# Create the SEEG Voltage plot
hfo_plot = create_fig(
    title="SEEG Voltage dynamics of Filtered Both Bands", 
    x_axis_label='time (ms)', 
    y_axis_label='Voltage (μV)',
    x=x, 
    y_arrays=hfo_y_arrays, 
    sizing_mode="stretch_both", 
    tools="pan, box_zoom, wheel_zoom, hover, undo, redo, zoom_in, zoom_out, reset, save",
    tooltips="Data point @x: @y",
    legend_location="top_right",
    legend_bg_fill_color="navy",
    legend_bg_fill_alpha=0.1,
    # y_range=Range1d(-0.05, 1.05)
)

# If there are more than 30 channels, hide the legend
if len(hfo_y_arrays) > 30:
    # Hide the legend
    hfo_plot.legend.visible = False

## Add Box Annotations to the plot to identify the marked HFOs (ground truth)

In [122]:
from bokeh.models import BoxAnnotation
# from utils.line_plot import color_map

show_markers = False    # Boolean to show the markers

color_map = {                  
    'Spike': 'red',
    'Fast-Ripple': 'blue',
    'Ripple': 'green',  
    'Spike+Ripple': 'yellow',
    'Spike+Fast-Ripple': 'pink',
    'Ripple+Fast-Ripple': 'cyan',
    'Spike+Ripple+Fast-Ripple': 'black'
}

confidence_range = 100          # TODO: Check this value. When the duration is missing (0), we consider the 200ms window around the marked position 
visited_markers = {}    # Avoid inserting multiple boxes for the same marker (only one of each label)
use_visited = False     # Boolean controlling if we remove duplicate markers
plot_instant = True     # Boolean to plot the markers as instant events or as boxes
instant_width = 100 # 20       # Width of the instant event for visualization purposes

if show_markers:
    for ch_idx in channels_used:
        channel_markers = markers[ch_idx]
        # print("channel_markers", channel_markers)
        for idx2, marker in enumerate(channel_markers):
            # print("marker:", marker)
            
            if use_visited:
                # Check if the marker has already been visited and skip it if it has
                if marker['position'] in visited_markers:
                    visited_labels = visited_markers[marker['position']]    # Get the labels that already have an annotation for this position
                    if marker['label'] in visited_labels:
                        # print("Skipping marker", marker['position'], marker['label'])
                        continue    # Skip this marker
                    else:
                        visited_labels.append(marker['label'])  # Add the label to the visited labels
                else:
                    visited_markers[marker['position']] = [marker['label']] # Add the marker to the visited markers

            # Add a box annotation for each marker
            has_duration = marker['duration'] > 0
            
            confidence_constant = 0 if plot_instant or has_duration else confidence_range

            left = marker['position'] - confidence_constant
            right = marker['position'] + confidence_constant + instant_width
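            # Labels in the data may use a space ('Fast Ripple') while color_map keys use a hyphen
            label_key = str(marker['label']).replace(' ', '-')
            box_color = color_map.get(label_key, 'gray')  # Choose a color according to the label (gray if unknown)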
            
            # if left < min_t or right > max_t:
            #     continue    # Skip this marker
            

            box = BoxAnnotation(left=left, right=right, fill_color=box_color, fill_alpha=0.35)
            # print("Added marker for channel: ", ch_idx, " at position: ", left)
            hfo_plot.add_layout(box)
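
The box limits computed in the loop can be isolated in a small pure function, which makes them easy to test without Bokeh. `marker_bounds` below is a hypothetical helper, not part of this notebook. Its second and third branches follow the arithmetic of the cell above, while the first branch (using the annotated duration as the box width) is an assumption that differs from the cell, which always adds `instant_width`.

```python
def marker_bounds(position, duration, instant_width=100.0, confidence_range=100.0, plot_instant=True):
    """Return (left, right) x-limits in ms for one marker (hypothetical helper)."""
    if duration > 0:
        # Assumption: an annotated duration defines the box directly
        return position, position + duration
    if plot_instant:
        # Instantaneous event: draw a fixed-width box starting at the marked position
        return position, position + instant_width
    # No duration and not instant: widen the box by the confidence range on both sides
    return position - confidence_range, position + confidence_range + instant_width

print(marker_bounds(1000.0, 0.0))                      # instant event
print(marker_bounds(1000.0, 0.0, plot_instant=False))  # confidence window
print(marker_bounds(1000.0, 50.0))                     # event with a duration
```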

## Show the Plot

In [123]:
import bokeh.plotting as bplt

showPlot = True
if showPlot:
    bplt.show(hfo_plot)

## Export the plot to a file

In [124]:
export = True
CH_SUFFIX = f"{min_channel_idx}" if is_single_channel else f"{min_channel_idx}-{max_channel_idx - 1}"
file_name = f"{OUT_FILE_PREFIX}{CH_SUFFIX}"

if export:
    file_path = f"plots{OUTPUT_PATH}/{file_name}.html"

    # Customize the output file settings
    bplt.output_file(filename=file_path, title="SEEG Data - Filtered Voltage dynamics across time")

    # Save the plot
    bplt.save(hfo_plot)

# Close the plot
bplt.curdoc().clear()
bplt.reset_output()

## Export the filtered signals to a numpy file

### Get the relevant Markers

In [125]:
# Save the relevant markers in a variable
relevant_markers = markers[min_channel_idx:max_channel_idx]
preview_np_array(relevant_markers, "Relevant Markers", edge_items=3)

Relevant Markers Shape: (30,).
Preview: [array([('Ripple',  2598.6328, 0.), ('Ripple', 32159.18  , 0.),
        ('Ripple', 35217.773 , 0.), ('Ripple', 49569.824 , 0.)],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])
 array([('Ripple',  2598.6328, 0.), ('Ripple', 32159.18  , 0.),
        ('Ripple', 35217.773 , 0.), ('Ripple', 49569.824 , 0.)],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])
 array([('Ripple',  2598.6328, 0.), ('Ripple', 32159.18  , 0.),
        ('Ripple', 35217.773 , 0.)],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])
 ...
 array([('Fast Ripple',  3478.5156, 0.), ('Ripple',  4852.539 , 0.),
        ('Ripple',  5303.711 , 0.), ..., ('Ripple', 32159.18  , 0.),
        ('Ripple', 34225.586 , 0.), ('Ripple', 38057.13  , 0.)],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4')])
 list([array([],
       dtype=[('label', '<U64'), ('position', '<f4'), ('duration', '<f4'

In [126]:
from utils.input import RIPPLE_BAND_FILENAME, FR_BAND_FILENAME, BOTH_BAND_FILENAME

EXPORT_FILTERED_SIGNAL = True
if EXPORT_FILTERED_SIGNAL:
    # Export the filtered signals
    np.save(f"{PATH_TO_FILE}results{OUTPUT_PATH}/{file_name}_{RIPPLE_BAND_FILENAME}_band.npy", ripple_band_seeg_window)
    np.save(f"{PATH_TO_FILE}results{OUTPUT_PATH}/{file_name}_{FR_BAND_FILENAME}_band.npy", fr_band_seeg_window)
    np.save(f"{PATH_TO_FILE}results{OUTPUT_PATH}/{file_name}_{BOTH_BAND_FILENAME}_band.npy", both_band_seeg_window)

    # Export the markers
    np.save(f"{PATH_TO_FILE}results{OUTPUT_PATH}/{file_name}_markers.npy", relevant_markers)
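
Two details of this export are easy to trip over: `np.save` does not create missing directories, and an object array such as the markers is pickled inside the `.npy` file, so it must be reloaded with `allow_pickle=True`. The sketch below shows a round trip in a temporary directory, using toy markers and a made-up file name for illustration.

```python
import os
import tempfile
import numpy as np

MARKER_DTYPE = [('label', '<U64'), ('position', '<f4'), ('duration', '<f4')]

# Toy per-channel markers (illustrative values, not taken from the recording)
toy_markers = np.empty(2, dtype=object)
toy_markers[0] = np.array([('Ripple', 100.0, 0.0)], dtype=MARKER_DTYPE)
toy_markers[1] = np.array([], dtype=MARKER_DTYPE)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = os.path.join(tmp_dir, "results", "clinical")
    os.makedirs(out_dir, exist_ok=True)      # create the output folder if it is missing

    out_path = os.path.join(out_dir, "toy_markers.npy")
    np.save(out_path, toy_markers)           # object arrays are pickled inside the .npy file

    # Object arrays require allow_pickle=True when loading
    reloaded = np.load(out_path, allow_pickle=True)

print("Channels reloaded:", reloaded.shape[0])
print("First label:", reloaded[0][0]['label'])
```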