# Filtering - Pre-Processing of the SEEG Signal (1/2)
This notebook presents **pre-processing stage 1** of the SEEG signal, which is applied before the signal is fed to the SNN. The pre-processing stages are as follows:
1. **Filtering**: The SEEG signal is bandpass filtered to remove noise and artifacts. The bandpass filter is a Butterworth design and, since we are working with *iEEG*, the signal is filtered in the ripple and fast ripple (FR) bands. The co-occurrence of high-frequency oscillations (HFOs) in both bands is an optimal predictor of post-surgical seizure freedom, as it defines an optimal "HFO area" or epileptogenic zone (EZ).
2. **Signal-to-Spike Conversion**: To interface and communicate with the silicon neurons in the SNN, the SEEG signal must be converted to spikes.

## Filtering
Depending on the EEG modality, the signal is filtered in different frequency bands. Since we are handling *iEEG* (specifically *sEEG*) data, the signal is filtered in both the ripple (80-250 Hz) and fast ripple (FR, 250-500 Hz) bands. The co-occurrence of HFOs in these bands is an optimal predictor of post-surgical seizure freedom, as it defines an optimal "HFO area" or EZ.
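As a quick sanity check on these bands, the sketch below normalizes each band edge by the Nyquist frequency, which is what `butter_bandpass` (defined later) does before calling `butter`. It assumes the 2048 Hz sampling rate used later in this notebook; the `BANDS` dictionary is illustrative and not part of the original code, which keeps one pair of variables per band.

```python
# Illustrative check (not part of the original pipeline): normalize each HFO
# band edge by the Nyquist frequency, as butter_bandpass does internally.
SAMPLING_RATE = 2048         # Hz, assumed to match the value used later on
NYQUIST = SAMPLING_RATE / 2  # 1024 Hz

# Hypothetical band table; the notebook uses separate variables per band.
BANDS = {
    "ripple": (80, 250),        # Hz
    "fast_ripple": (250, 500),  # Hz
}

normalized = {
    name: (low / NYQUIST, high / NYQUIST) for name, (low, high) in BANDS.items()
}

for name, (low, high) in normalized.items():
    # butter() requires 0 < Wn < 1 for digital filters
    assert 0 < low < high < 1, f"{name} band is not valid at {SAMPLING_RATE} Hz"
    print(f"{name}: Wn = [{low:.6f}, {high:.6f}]")
```

Both bands sit well below the 1024 Hz Nyquist limit, so the 500 Hz upper edge of the FR band is valid at this sampling rate.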

The filter is implemented differently depending on the setup it runs on:
1. **Neuromorphic Hardware**: The filter is implemented with analog circuits.
2. **Software Simulation**: *Butterworth filters* are used, since they are a good approximation of the tuned *Tow-Thomas* architectures implemented in hardware.

The frequency response of the *Butterworth filter* is maximally flat in the passband and rolls off towards zero in the stopband.
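The flat passband and the roll-off can be checked numerically. A minimal sketch, assuming SciPy's `butter` and `freqz` with the `fs` keyword and the order-5 ripple-band design used below: with `btype='band'`, SciPy returns a filter of order 2N, and the -3 dB points (gain 1/√2) land on the requested cutoffs because the design pre-warps the band edges.

```python
import numpy as np
from scipy.signal import butter, freqz

fs = 2048.0              # Hz, the sampling rate used in this notebook
order = 5                # design order passed to butter()
low, high = 80.0, 250.0  # ripple band edges in Hz

# Passing fs lets butter() take the cutoffs directly in Hz.
b, a = butter(order, [low, high], btype="band", fs=fs)

# A bandpass of design order N has 2N poles.
print("Filter order:", len(a) - 1)

# Gain at the cutoffs: ~1/sqrt(2) (-3 dB) for a Butterworth design.
_, h_edges = freqz(b, a, worN=np.array([low, high]), fs=fs)
print("Gain at cutoffs:", np.abs(h_edges))

# Gain on a dense grid (0 to fs/2, 0.25 Hz spacing): peaks at ~1 inside
# the band and decays towards 0 outside it.
freqs, h = freqz(b, a, worN=4096, fs=fs)
gain = np.abs(h)
print("Peak gain:", gain.max())
print("Gain at 20 Hz:", gain[np.argmin(np.abs(freqs - 20))])
```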

### Check the working directory (change it if necessary) and load the data

In [61]:
# Show current directory
import os
curr_dir = os.getcwd()
print(curr_dir)

# Check if the current WD is the file location
if "/src/hfo/filter" not in os.getcwd():
    # Set working directory to this file location
    file_location = f"{os.getcwd()}/thesis-lava/src/hfo/filter"
    print("File Location: ", file_location)

    # Change the current working Directory
    os.chdir(file_location)

    # New Working Directory
    print("New Working Directory: ", os.getcwd())

PATH_TO_FILE = '' # 'src/hfo/'  # This is needed if the WD is not the same as the file location

/home/monkin/Desktop/feup/thesis/thesis-lava/src/hfo/filter


In [62]:
import numpy as np
import math

seeg_file_name = "seeg_synthetic_humans.npy"
recorded_data = np.load(f"{PATH_TO_FILE}data/{seeg_file_name}")

print("Data shape: ", recorded_data.shape)
print("First time steps: ", recorded_data[:10])

Data shape:  (245760, 960)
First time steps:  [[ 3.2352024e-01 -1.3235390e+00 -5.9668809e-01 ... -1.9608999e+00
  -1.9769822e-01 -1.2078454e+00]
 [-6.9759099e-04 -3.5122361e+00 -4.8766956e-01 ... -5.8757830e+00
  -7.4400985e-01 -5.1096064e-01]
 [ 1.9026639e+00 -5.6726017e+00  9.8274893e-01 ... -6.6182971e+00
  -8.3053267e-01 -8.1596655e-01]
 ...
 [ 3.2172418e+00 -8.4650068e+00  1.5216088e+00 ... -4.1081657e+00
   2.0085973e-01 -4.7539668e+00]
 [ 1.7725919e+00 -9.4744024e+00  1.6776791e+00 ... -4.1469693e+00
   1.6412770e+00 -3.4672713e+00]
 [ 7.8109097e-01 -1.0500931e+01  2.3717029e+00 ... -5.1762242e+00
   1.0715837e+00 -4.4489903e+00]]


## Define the Filter

In [63]:
from scipy.signal import butter, lfilter

# ================================================================ #
# ============ Butterworth Filter Coefficients =================== #
# ================================================================ #
def butter_bandpass(lowcut, highcut, sampling_freq, order=5):
    """
    This function is used to generate the coefficients for lowpass, highpass and bandpass
    filtering for Butterworth filters.
    @lowcut, highcut (int): cutoff frequencies for the bandpass filter
    @sampling_freq (float): sampling_frequency frequency of the wideband signal
    @order (int): filter order

    - return b, a (float): filtering coefficients that will be applied on the wideband signal
    """
    nyq = 0.5 * sampling_freq   # Nyquist frequency
    low = lowcut / nyq          # Normalizing the cutoff frequencies
    high = highcut / nyq        # Normalizing the cutoff frequencies

    return butter(order, [low, high], btype='band')    

# ================================================================ #
# ====================== Butterworth Filters ===================== #
# ================================================================ #
def butter_bandpass_filter(data, lowcut, highcut, sampling_freq, order=5):
    """
    This function applies the filtering coefficients calculated above to the wideband signal (original signal).
    @data (array): Array with the amplitude values of the wideband signal.
    @lowcut, highcut (int): cutoff frequencies for the bandpass filter.
    @sampling_freq (float): sampling frequency of the original signal.
    @order (int): filter order.

    - return (array): Array with the amplitude values of the filtered signal.
    """
    coef_b, coef_a = butter_bandpass(lowcut, highcut, sampling_freq, order)

    return lfilter(coef_b, coef_a, data)
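SciPy's documentation for `butter` recommends the second-order-sections representation (`output='sos'`) over `[b, a]`, because converting to polynomial coefficients is numerically sensitive at higher orders, and an order-5 band design is already a 10th-order filter. The sketch below is a hedged alternative to `butter_bandpass_filter`, not the notebook's method: `butter_bandpass_filter_sos` is a hypothetical name, and it filters all channels at once along the time axis instead of looping over columns. Like `lfilter`, `sosfilt` is causal, so the output matches the original approach up to rounding; a zero-phase variant would use `sosfiltfilt` instead.

```python
import numpy as np
from scipy.signal import butter, lfilter, sosfilt


def butter_bandpass_filter_sos(data, lowcut, highcut, sampling_freq, order=5, axis=0):
    """Hypothetical SOS-based counterpart of butter_bandpass_filter.

    @data (array): signal of shape (num_samples,) or (num_samples, num_channels).
    @axis (int): time axis along which to filter (0 for this notebook's layout).
    """
    # Design directly in second-order sections; fs lets us pass cutoffs in Hz.
    sos = butter(order, [lowcut, highcut], btype="band", fs=sampling_freq, output="sos")
    # Causal filtering with the same transfer function as the [b, a] version.
    return sosfilt(sos, data, axis=axis)


# Usage on random data shaped like a short SEEG window: samples x channels.
rng = np.random.default_rng(0)
window = rng.standard_normal((4096, 3))

ripple_sos = butter_bandpass_filter_sos(window, 80, 250, 2048)

# Reference: the per-channel lfilter approach used in this notebook.
b, a = butter(5, [80 / 1024, 250 / 1024], btype="band")
ripple_ba = np.column_stack([lfilter(b, a, window[:, i]) for i in range(window.shape[1])])

print("Shapes:", ripple_sos.shape, ripple_ba.shape)
print("Max abs difference:", np.max(np.abs(ripple_sos - ripple_ba)))
```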
    

## Define Global Parameters of the Experiment

In [64]:
sampling_rate = 2048    # 2048 Hz
input_duration = 120 * (10**3)    # 120000 ms or 120 seconds
num_samples = recorded_data.shape[0]    # 2048 * 120 = 245760
num_channels = recorded_data.shape[1]   # 960

x_step = 1/sampling_rate * (10**3)  # 0.48828125 ms
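The hard-coded `input_duration` can be cross-checked against the data itself: 245760 samples at 2048 Hz is exactly 120 s, and consecutive samples are 1000 / 2048 ms apart. A small sketch, assuming the array shape printed above; in practice `input_duration` could be derived as `num_samples / sampling_rate * 10**3` instead of being typed in.

```python
sampling_rate = 2048   # Hz
num_samples = 245760   # from recorded_data.shape[0] above

duration_s = num_samples / sampling_rate    # seconds
duration_ms = duration_s * 10**3            # milliseconds
x_step = 1 / sampling_rate * 10**3          # ms between consecutive samples

print(duration_s, duration_ms, x_step)

# The notebook hard-codes input_duration = 120 * 10**3 ms; this confirms it
# matches the number of samples actually loaded.
assert duration_ms == 120 * 10**3
```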

### Extract a window of channels from the SEEG data
Let's define the window first.

To extract a single channel, set `is_single_channel` to `True` and `min_channel_idx` to the desired channel index. The window covers channels `min_channel_idx` up to `max_channel_idx - 1` (the upper bound is exclusive).

In [65]:
is_single_channel = False   # Set to True if you want to use only one channel

# Define the window of channels to be used
min_channel_idx = 90
max_channel_idx = min_channel_idx + 30

if is_single_channel:
    # Set the window to size 1
    max_channel_idx = min_channel_idx + 1

In [66]:
from utils.io import preview_np_array
seeg_window = recorded_data[:, min_channel_idx:max_channel_idx]

preview_np_array(seeg_window, "SEEG Window")

SEEG Window Shape: (245760, 30).
Preview: [[ 3.2148191e-01  7.6039447e-03  4.7144434e-01  1.6629694e-01
   6.4306718e-01 ... -1.0047178e+00 -2.3591769e+00 -9.1859162e-01
  -1.1903234e+00 -1.0296663e+00]
 [-2.0508800e-02 -4.6258485e-01 -7.4022943e-01 -1.8455078e-01
   2.1398619e-01 ... -1.3898176e+00 -3.7664881e+00 -1.3889902e+00
  -5.1110125e-01 -1.5383426e+00]
 [-1.2178916e+00 -2.0358562e+00 -1.2876116e+00 -1.3673829e+00
   1.2640097e+00 ... -2.6840944e+00 -4.0100474e+00 -1.9828362e+00
  -1.4741321e+00 -1.6997232e+00]
 [-1.0280560e+00 -3.0880692e+00 -1.2848712e+00  2.0071094e-01
   2.3219368e+00 ... -1.9042799e+00 -2.8593712e+00 -1.8951950e+00
  -2.2079365e+00 -2.1716366e+00]
 [-1.6502820e+00 -2.6590590e+00 -2.6356335e+00 -1.8437275e-01
   4.8591003e+00 ... -7.3706615e-01 -3.4916322e+00 -3.4061065e+00
  -1.9112504e+00 -2.1673732e+00]
 ...
 [ 5.8455471e+01  5.3846050e+01 -2.0737498e+00 -3.2716221e-01
   7.4935384e+00 ...  5.5184326e+01  6.2657150e+01 -5.0081539e+01
  -1.8071165e+01 -1.

## Apply the Butterworth filter to each channel

In [67]:
# Apply the Butterworth filter to the window of channels in the Ripple Band
ripple_lowcut_freq = 80
ripple_highcut_freq = 250

ripple_band_seeg_window = [ butter_bandpass_filter(seeg_window[:, i], ripple_lowcut_freq, ripple_highcut_freq, sampling_rate) for i in range(seeg_window.shape[1]) ]
ripple_band_seeg_window = np.array(ripple_band_seeg_window).T
preview_np_array(ripple_band_seeg_window, "Ripple Band SEEG Window", edge_items=3)

Ripple Band SEEG Window Shape: (245760, 30).
Preview: [[ 1.84829940e-04  4.37174409e-06  2.71047998e-04 ... -5.28126862e-04
  -6.84353879e-04 -5.91987150e-04]
 [ 1.37975809e-03 -2.33040380e-04  1.61508759e-03 ... -4.77473965e-03
  -5.44621734e-03 -5.34139889e-03]
 [ 3.92964380e-03 -3.06118826e-03  2.97532191e-03 ... -2.06351304e-02
  -2.05310798e-02 -2.27491617e-02]
 ...
 [ 6.57385201e-01  7.72400799e-01 -4.98205341e-01 ... -2.25650820e-01
  -1.15243564e+00  1.23691538e+00]
 [ 5.95894958e-01  5.34762270e-01 -5.39540198e-01 ... -4.98290647e-01
  -1.20278059e+00  7.54000855e-01]
 [ 3.30411778e-01  3.01232878e-01 -5.59614286e-01 ... -8.15564053e-01
  -1.14933726e+00  1.73992494e-01]]


In [68]:
# Apply the Butterworth filter to the window of channels in the Fast Ripple Band
fr_lowcut_freq = 250
fr_highcut_freq = 500

fr_band_seeg_window = [ butter_bandpass_filter(seeg_window[:, i], fr_lowcut_freq, fr_highcut_freq, sampling_rate) for i in range(seeg_window.shape[1]) ]
fr_band_seeg_window = np.array(fr_band_seeg_window).T
preview_np_array(fr_band_seeg_window, "FR Band SEEG Window", edge_items=3)

FR Band SEEG Window Shape: (245760, 30).
Preview: [[ 9.55236180e-04  2.25940024e-05  1.40082745e-03 ... -2.72945977e-03
  -3.53687062e-03 -3.05950185e-03]
 [ 3.10524546e-03 -1.29961370e-03  2.44363903e-03 ... -1.31741286e-02
  -1.32418198e-02 -1.47118505e-02]
 [-4.78473281e-03 -1.06279148e-02 -1.25298819e-02 ... -1.68170844e-02
  -5.84467919e-03 -1.71137287e-02]
 ...
 [ 2.70639699e-01  4.39370956e-01  1.64996622e-01 ... -1.70404540e-02
   5.26999132e-01 -4.95964763e-01]
 [-1.83190240e-01 -4.41412299e-01  4.53359242e-01 ... -5.10423230e-01
   1.10863199e-01 -2.75361798e-01]
 [-3.01458872e-01 -2.90944680e-01  2.09993078e-01 ... -4.26309677e-01
  -4.75959832e-01  2.86613031e-01]]


## Import the Markers (Annotated Events)
The markers are stored in a NumPy structured array of shape (num_channels, num_events):
- Each row holds the events of one channel.
- Each event is composed of 3 fields: `label`, `position` (ms) and `duration` (0 when no duration is annotated).
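Because each event is a record with named fields, the markers can be indexed by field name, which is how the plotting code below reads `marker['label']`, `marker['position']` and `marker['duration']`. The sketch below builds a tiny stand-in array from a few events of the first two channels in the marker preview; the field dtypes (`U32`, `f8`) are assumptions, since the real dtype of `seeg_synthetic_humans_markers.npy` is not shown here.

```python
import numpy as np

# Assumed record layout; the real file may use other string/float widths.
marker_dtype = np.dtype([("label", "U32"), ("position", "f8"), ("duration", "f8")])

# A few events per channel, copied from the first two rows of the marker preview.
toy_markers = np.array(
    [
        [("Spike+Ripple+Fast-Ripple", 1000.0, 0.0), ("Ripple", 113024.0, 0.0), ("Spike+Ripple", 119000.0, 0.0)],
        [("Spike+Fast-Ripple", 1000.0, 0.0), ("Fast-Ripple", 114176.0, 0.0), ("Spike+Fast-Ripple", 116672.0, 0.0)],
    ],
    dtype=marker_dtype,
)

print(toy_markers.shape)           # (num_channels, num_events)
print(toy_markers["position"][0])  # event positions (ms) of channel 0

# Events of channel 0 that include a fast ripple. Searching for "Ripple" alone
# would also match "Fast-Ripple", so the full label fragment is used.
ch0 = toy_markers[0]
has_fr = np.char.find(ch0["label"], "Fast-Ripple") >= 0
print(ch0["position"][has_fr])
```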

In [69]:
markers_seeg_file_name = "seeg_synthetic_humans_markers.npy"
markers = np.load(f"{PATH_TO_FILE}data/{markers_seeg_file_name}")

preview_np_array(markers, "Markers", edge_items=3)

Markers Shape: (960, 42).
Preview: [[('Spike+Ripple+Fast-Ripple',   1000.  , 0.)
  ('Spike+Ripple+Fast-Ripple',   4537.6 , 0.)
  ('Ripple+Fast-Ripple',   7610.84, 0.) ... ('Ripple', 113024.  , 0.)
  ('Fast-Ripple', 116549.  , 0.) ('Spike+Ripple', 119000.  , 0.)]
 [('Spike+Fast-Ripple',   1000.  , 0.)
  ('Spike+Ripple+Fast-Ripple',   3849.12, 0.)
  ('Ripple+Fast-Ripple',   7010.25, 0.) ...
  ('Fast-Ripple', 114176.  , 0.) ('Spike+Fast-Ripple', 116672.  , 0.)
  ('Fast-Ripple', 119000.  , 0.)]
 [('Fast-Ripple',   1000.  , 0.)
  ('Spike+Ripple+Fast-Ripple',   4357.42, 0.)
  ('Fast-Ripple',   7062.01, 0.) ... ('Spike+Fast-Ripple', 113759.  , 0.)
  ('Ripple+Fast-Ripple', 116295.  , 0.) ('Spike', 119000.  , 0.)]
 ...
 [('Spike+Fast-Ripple',   1000.  , 0.) ('Spike',   3671.88, 0.)
  ('Fast-Ripple',   6912.6 , 0.) ... ('Ripple', 114088.  , 0.)
  ('Spike', 116028.  , 0.) ('Spike+Ripple', 119000.  , 0.)]
 [('Spike+Fast-Ripple',   1000.  , 0.) ('Fast-Ripple',   3782.23, 0.)
  ('Ripple+Fast-Ripple'

### Define the set of channels the markers will be extracted from

In [70]:
channels_used = set(range(min_channel_idx, max_channel_idx))
print("Channels used: ", channels_used)

Channels used:  {90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119}


## Visualize the filtered signals

In [71]:
# Interactive Plot for the HFO detection
# bokeh docs: https://docs.bokeh.org/en/2.4.1/docs/first_steps/first_steps_1.html

from utils.line_plot import create_fig  # Import the function to create the figure
from bokeh.models import Range1d

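# Define the x values: the time (ms) of each sample
# Should the first input start at 0 or x_step?
# Building the axis from integer sample indices guarantees exactly num_samples
# points, avoiding the off-by-one lengths np.arange can produce with a float step.
x = (np.arange(1, num_samples + 1) * x_step).tolist()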

## Create the Plot

In [72]:
# Create the plot
# List of tuples containing the y values and the legend label
hfo_y_arrays = []

if is_single_channel:
    # Add the Ripple and FR bands of the single channel
    hfo_y_arrays.append((ripple_band_seeg_window[:, 0], f"Ripple Band Ch. {min_channel_idx}"))
    hfo_y_arrays.append((fr_band_seeg_window[:, 0], f"Fast Ripple Band Ch. {min_channel_idx}"))
else:
    # Add the Ripple and FR bands of each channel in the range defined below
    min_hfo_idx = 20
    max_hfo_idx = 24
    for hfo_idx in range(min_hfo_idx, max_hfo_idx, 1):
        hfo_y_arrays.append((ripple_band_seeg_window[:, hfo_idx], f"Ripple Band Ch. {min_channel_idx + hfo_idx}"))
        hfo_y_arrays.append((fr_band_seeg_window[:, hfo_idx], f"Fast Ripple Band Ch. {min_channel_idx + hfo_idx}"))


# Create the SEEG Voltage plot
hfo_plot = create_fig(
    title="SEEG Voltage dynamics of Filtered Ripple and Fast Ripple Bands", 
    x_axis_label='time (ms)', 
    y_axis_label='Voltage (μV)',
    x=x, 
    y_arrays=hfo_y_arrays, 
    sizing_mode="stretch_both", 
    tools="pan, box_zoom, wheel_zoom, hover, undo, redo, zoom_in, zoom_out, reset, save",
    tooltips="Data point @x: @y",
    legend_location="top_right",
    legend_bg_fill_color="navy",
    legend_bg_fill_alpha=0.1,
    # y_range=Range1d(-0.05, 1.05)
)

# If there are more than 30 traces (two per channel), hide the legend
if len(hfo_y_arrays) > 30:
    # Hide the legend
    hfo_plot.legend.visible = False

## Add Box Annotations to the plot to identify the marked HFOs (ground truth)

In [73]:
from bokeh.models import BoxAnnotation
# from utils.line_plot import color_map

show_markers = False    # Boolean to show the markers

color_map = {                  
    'Spike': 'red',
    'Fast-Ripple': 'blue',
    'Ripple': 'green',  
    'Spike+Ripple': 'yellow',
    'Spike+Fast-Ripple': 'pink',
    'Ripple+Fast-Ripple': 'cyan',
    'Spike+Ripple+Fast-Ripple': 'black'
}

confidence_range = 100          # TODO: Check this value. When the duration is missing (0), we consider the 200ms window around the marked position 
visited_markers = {}    # Avoid inserting multiple boxes for the same marker (only one of each label)
use_visited = False     # Boolean controlling if we remove duplicate markers
plot_instant = True     # Boolean to plot the markers as instant events or as boxes
instant_width = 100 # 20       # Width of the instant event for visualization purposes

if show_markers:
    for ch_idx in channels_used:
        channel_markers = markers[ch_idx]
        # print("channel_markers", channel_markers)
        for idx2, marker in enumerate(channel_markers):
            # print("marker:", marker)
            
            if use_visited:
                # Check if the marker has already been visited and skip it if it has
                if marker['position'] in visited_markers:
                    visited_labels = visited_markers[marker['position']]    # Get the labels that already have an annotation for this position
                    if marker['label'] in visited_labels:
                        # print("Skipping marker", marker['position'], marker['label'])
                        continue    # Skip this marker
                    else:
                        visited_labels.append(marker['label'])  # Add the label to the visited labels
                else:
                    visited_markers[marker['position']] = [marker['label']] # Add the marker to the visited markers

            # Add a box annotation for each marker
            has_duration = marker['duration'] > 0
            
            confidence_constant = 0 if plot_instant or has_duration else confidence_range

            left = marker['position'] - confidence_constant
            right = marker['position'] + confidence_constant + instant_width
            box_color = color_map[marker['label']]  # Choose a color according to the label
            
            # if left < min_t or right > max_t:
            #     continue    # Skip this marker
            

            box = BoxAnnotation(left=left, right=right, fill_color=box_color, fill_alpha=0.35)
            # print("Added marker for channel: ", ch_idx, " at position: ", left)
            hfo_plot.add_layout(box)
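The box-placement rule above can be summarized as pure functions, which makes the effect of `plot_instant`, `confidence_range` and `instant_width` easier to see and test. This is a hypothetical refactor (`marker_box_edges` and `unique_markers` are not part of the notebook's `utils` package); it reproduces the same arithmetic, including the fact that a marker with a non-zero duration still gets a box of width `instant_width` rather than one spanning its duration. The set of `(position, label)` pairs plays the role of the `visited_markers` dictionary.

```python
def marker_box_edges(position, duration, plot_instant=True,
                     confidence_range=100, instant_width=100):
    """Return (left, right) in ms for a marker's BoxAnnotation.

    Mirrors the notebook's rule: widen the box by confidence_range on each
    side only when markers are not plotted as instants and the duration is 0.
    """
    has_duration = duration > 0
    confidence = 0 if plot_instant or has_duration else confidence_range
    left = position - confidence
    right = position + confidence + instant_width
    return left, right


def unique_markers(markers):
    """Yield (label, position, duration), skipping repeated (position, label) pairs."""
    seen = set()
    for label, position, duration in markers:
        key = (position, label)
        if key in seen:
            continue
        seen.add(key)
        yield label, position, duration


events = [("Spike", 1000.0, 0.0), ("Spike", 1000.0, 0.0), ("Ripple", 1000.0, 0.0)]
for label, position, duration in unique_markers(events):
    print(label,
          marker_box_edges(position, duration),
          marker_box_edges(position, duration, plot_instant=False))
```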

## Show the Plot

In [74]:
import bokeh.plotting as bplt

showPlot = True
if showPlot:
    bplt.show(hfo_plot)

## Export the plot to a file

In [75]:
export = False
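# min_hfo_idx/max_hfo_idx are window-relative and only exist in the multi-channel
# branch, so convert them to absolute channel numbers (max_hfo_idx is exclusive).
file_name = f"filtered_seeg_ch{min_channel_idx}" if is_single_channel else f"filtered_seeg_ch{min_channel_idx + min_hfo_idx}-{min_channel_idx + max_hfo_idx - 1}"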

if export:
    file_path = f"{PATH_TO_FILE}plots/synthetic/{file_name}.html"

    # Customize the output file settings
    bplt.output_file(filename=file_path, title="SEEG Data - Filtered Voltage dynamics across time")

    # Save the plot
    bplt.save(hfo_plot)

## Export the filtered signals to a numpy file

### Get the relevant Markers

In [76]:
# Save the relevant markers in a variable
relevant_markers = markers[min_channel_idx:max_channel_idx]
preview_np_array(relevant_markers, "Relevant Markers", edge_items=3)

Relevant Markers Shape: (30, 42).
Preview: [[('Spike',   1000.  , 0.) ('Spike+Fast-Ripple',   4218.75, 0.)
  ('Ripple+Fast-Ripple',   6966.8 , 0.) ... ('Ripple', 114551.  , 0.)
  ('Spike+Ripple', 116517.  , 0.) ('Spike+Ripple', 119000.  , 0.)]
 [('Spike+Ripple',   1000.  , 0.)
  ('Spike+Ripple+Fast-Ripple',   3917.97, 0.)
  ('Spike+Fast-Ripple',   6794.92, 0.) ... ('Spike', 113382.  , 0.)
  ('Ripple+Fast-Ripple', 116656.  , 0.) ('Spike+Ripple', 119000.  , 0.)]
 [('Fast-Ripple',   1000.  , 0.) ('Spike',   3655.76, 0.)
  ('Spike',   7188.96, 0.) ...
  ('Spike+Ripple+Fast-Ripple', 112797.  , 0.)
  ('Fast-Ripple', 115363.  , 0.) ('Spike+Fast-Ripple', 119000.  , 0.)]
 ...
 [('Ripple+Fast-Ripple',   1000.  , 0.) ('Ripple',   3543.46, 0.)
  ('Ripple+Fast-Ripple',   6759.77, 0.) ...
  ('Fast-Ripple', 112950.  , 0.) ('Spike+Ripple', 115987.  , 0.)
  ('Spike+Ripple', 119000.  , 0.)]
 [('Spike+Fast-Ripple',   1000.  , 0.) ('Fast-Ripple',   4774.41, 0.)
  ('Ripple',   7656.74, 0.) ... ('Spike', 11

In [79]:
EXPORT_FILTERED_SIGNAL = True
file_name = f"filtered_seeg_ch{min_channel_idx}" if is_single_channel else f"filtered_seeg_ch{min_channel_idx}-{max_channel_idx-1}"
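if EXPORT_FILTERED_SIGNAL:
    # Make sure the output folder exists before saving
    os.makedirs(f"{PATH_TO_FILE}results/synthetic", exist_ok=True)

    # Export the filtered signals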
    np.save(f"{PATH_TO_FILE}results/synthetic/{file_name}_ripple_band.npy", ripple_band_seeg_window)
    np.save(f"{PATH_TO_FILE}results/synthetic/{file_name}_fr_band.npy", fr_band_seeg_window)

    # Export the markers
    np.save(f"{PATH_TO_FILE}results/synthetic/{file_name}_markers.npy", relevant_markers)
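The next stage (signal-to-spike conversion) would presumably load these files back with `np.load`. A minimal round-trip sketch, using a temporary directory and small stand-in arrays instead of the real `results/synthetic` folder; the marker dtype is an assumption. `allow_pickle` is not needed as long as the markers use a fixed-width (non-object) dtype, which is consistent with this notebook loading them without it.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = "filtered_seeg_ch90-119"  # same naming scheme as above
    ripple = np.zeros((2048, 30))         # stand-in for ripple_band_seeg_window
    markers = np.array(
        [[("Spike", 1000.0, 0.0)]],
        dtype=[("label", "U32"), ("position", "f8"), ("duration", "f8")],
    )

    ripple_path = os.path.join(tmp_dir, f"{file_name}_ripple_band.npy")
    markers_path = os.path.join(tmp_dir, f"{file_name}_markers.npy")
    np.save(ripple_path, ripple)
    np.save(markers_path, markers)

    # Load back; structured arrays without object fields do not need allow_pickle.
    ripple_loaded = np.load(ripple_path)
    markers_loaded = np.load(markers_path)

    print(ripple_loaded.shape, markers_loaded.dtype.names)
```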