# 2. Data Preprocessing

This notebook preprocesses the raw electrodermal activity (EDA) signal from the WESAD dataset. It covers:
1.  Loading raw data for a subject.
2.  Cleaning the signal and decomposing it into tonic and phasic components with our `preprocess_eda` function.
3.  Visualizing the results.
4.  Segmenting the data into windows for model training.

In [None]:
import pandas as pd
import numpy as np
import pickle
import sys

# Add src to path to import custom modules
sys.path.append('../src')

from data.preprocess import preprocess_eda
from visualization.plot import plot_eda_comparison

# Load raw data (same as in notebook 01)
WESAD_PATH = '../data/raw/WESAD/'
subject_id = 'S2'
with open(f'{WESAD_PATH}{subject_id}/{subject_id}.pkl', 'rb') as f:
    data = pickle.load(f, encoding='latin1')
# Chest (RespiBAN) EDA is stored as an (N, 1) array; flatten it to 1-D
raw_eda = data['signal']['chest']['EDA'].flatten()

## 2.1 Apply Preprocessing

We use the `preprocess_eda` function from `src/data/preprocess.py`, which uses `NeuroKit2` to clean the signal and decompose it into a tonic component (the slowly varying skin conductance level) and a phasic component (the fast skin conductance responses).
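To make the clean-then-decompose idea concrete, here is a minimal, hypothetical stand-in for `preprocess_eda`. It is not the project's implementation and does not reproduce NeuroKit2 internals (NeuroKit2 exposes these steps as `nk.eda_clean` and `nk.eda_phasic`, combined in `nk.eda_process`). Instead it swaps in a plain SciPy approach: a zero-phase Butterworth low-pass filter for cleaning, a very low cutoff low-pass for the tonic part, and the remainder as the phasic part. The function name `preprocess_eda_sketch`, the cutoffs (3 Hz and 0.05 Hz), the filter order, and the `EDA_Tonic`/`EDA_Phasic` column names are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, sosfiltfilt


def preprocess_eda_sketch(raw_eda, sampling_rate=700,
                          clean_cutoff_hz=3.0, tonic_cutoff_hz=0.05, order=4):
    """Hypothetical stand-in for preprocess_eda (not the project's code).

    Cleans EDA with a zero-phase Butterworth low-pass filter, then splits the
    cleaned signal into a tonic part (very low-pass) and a phasic part
    (cleaned signal minus tonic). All cutoffs are assumed values.
    """
    raw = np.asarray(raw_eda, dtype=float)

    # Cleaning: remove high-frequency noise. Second-order sections (sos)
    # keep the filter numerically stable at low normalized cutoffs.
    sos_clean = butter(order, clean_cutoff_hz, btype="lowpass",
                       fs=sampling_rate, output="sos")
    clean = sosfiltfilt(sos_clean, raw)

    # Tonic: slowly varying skin conductance level
    sos_tonic = butter(order, tonic_cutoff_hz, btype="lowpass",
                       fs=sampling_rate, output="sos")
    tonic = sosfiltfilt(sos_tonic, clean)

    # Phasic: fast skin conductance responses, i.e. what the tonic misses
    phasic = clean - tonic

    return pd.DataFrame({
        "EDA_Raw": raw,
        "EDA_Clean": clean,
        "EDA_Tonic": tonic,
        "EDA_Phasic": phasic,
    })
```

Defining the phasic part as the residual guarantees `EDA_Tonic + EDA_Phasic == EDA_Clean`, which keeps the decomposition easy to sanity-check; a separate high-pass filter would be an equally reasonable choice.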

In [None]:
SAMPLING_RATE = 700
processed_df = preprocess_eda(raw_eda, sampling_rate=SAMPLING_RATE)

# Compare raw vs. cleaned signal on a 20-second window (t = 100 s to 120 s)
# chosen because it contains a visible artifact
start, end = SAMPLING_RATE * 100, SAMPLING_RATE * 120

# Use .iloc for explicit positional slicing
raw_segment = processed_df['EDA_Raw'].iloc[start:end]
cleaned_segment = processed_df['EDA_Clean'].iloc[start:end]

plot_eda_comparison(
    raw_segment.to_numpy(),
    {'NeuroKit Cleaned': cleaned_segment.to_numpy()},
    title='Preprocessing Comparison',
)
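Step 4 of the outline, segmenting the signal into windows for model training, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function name `segment_windows` is hypothetical, and the window and step lengths are left to the caller. If a per-sample integer label array is supplied, each window also gets a majority-vote label.

```python
import numpy as np


def segment_windows(signal, sampling_rate, window_s, step_s, labels=None):
    """Hypothetical helper: split a 1-D signal into overlapping windows.

    Returns an array of shape (n_windows, window_len). If `labels` (one
    non-negative integer per sample) is given, also returns the majority
    label of each window.
    """
    x = np.asarray(signal, dtype=float)
    win = int(round(window_s * sampling_rate))
    step = int(round(step_s * sampling_rate))
    if win < 1 or step < 1:
        raise ValueError("window and step must span at least one sample")
    if x.size < win:
        raise ValueError("signal is shorter than one window")

    # Read-only view of every possible window start; keep every `step`-th one
    windows = np.lib.stride_tricks.sliding_window_view(x, win)[::step]

    if labels is None:
        return windows

    y = np.asarray(labels, dtype=np.int64)
    if y.shape != x.shape:
        raise ValueError("labels must have the same length as the signal")
    label_windows = np.lib.stride_tricks.sliding_window_view(y, win)[::step]
    # Majority vote per window (bincount needs non-negative integers)
    window_labels = np.array([np.bincount(w).argmax() for w in label_windows])
    return windows, window_labels
```

For example, assuming `processed_df` has one row per raw sample and using WESAD's per-sample `data['label']` array, a call such as `segment_windows(processed_df['EDA_Clean'], SAMPLING_RATE, 60, 0.25, labels=data['label'])` would yield 60-second windows every 0.25 seconds (illustrative values only). Because `sliding_window_view` returns a view rather than a copy, heavily overlapping windows cost little memory; call `.copy()` only if the windows must be modified.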