# 1. Data Exploration

This notebook explores the raw electrodermal activity (EDA) datasets (WESAD, DEAP, etc.). The goal is to understand the data format and signal characteristics, and to identify potential challenges such as artifacts, noise, and baseline drift.

## 1.1 Setup and Configuration

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import sys

# Add src to path to import custom modules
sys.path.append('../src')

from visualization.plot import plot_eda_comparison

sns.set(style="whitegrid")

# Define paths to the datasets
WESAD_PATH = '../data/raw/WESAD/'

## 1.2 Loading WESAD Data

Let's load the data for a single subject from the WESAD dataset. Each subject's data is stored in its own `.pkl` file, named after the subject ID.

In [None]:
# Load data for subject S2
subject_id = 'S2'
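# WESAD_PATH already ends with '/', so no extra separator is added here
with open(f'{WESAD_PATH}{subject_id}/{subject_id}.pkl', 'rb') as f:
    # encoding='latin1' lets Python 3 read pickles written by Python 2
    data = pickle.load(f, encoding='latin1')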

# Extract the raw EDA signal from the chest sensor
raw_eda = data['signal']['chest']['EDA']
labels = data['label']

print(f"Loaded EDA signal for subject {subject_id} with {raw_eda.shape[0]} samples.")
# The rate is hard-coded here, not read from the file
print("Signal sampling rate: 700 Hz")
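
Before plotting, it can help to check how much of the recording falls under each condition. The sketch below summarises a label array as seconds per condition. The label codes in `LABEL_NAMES` follow the WESAD readme as far as I recall it (0 = transient/undefined, 1 = baseline, 2 = stress, 3 = amusement, 4 = meditation, other codes to be ignored), so verify them against your copy of the dataset. `summarize_labels` and `demo_labels` are hypothetical names introduced for illustration, and the demo uses synthetic labels, not real data.

```python
import numpy as np

FS = 700  # Hz, the chest sampling rate printed above

# Assumed label codes (check against the WESAD readme shipped with the data)
LABEL_NAMES = {
    0: 'transient/undefined',
    1: 'baseline',
    2: 'stress',
    3: 'amusement',
    4: 'meditation',
}

def summarize_labels(labels, fs=FS):
    """Return {condition name: duration in seconds} for each code present."""
    codes, counts = np.unique(np.asarray(labels).ravel(), return_counts=True)
    return {
        LABEL_NAMES.get(int(code), f'ignored ({int(code)})'): count / fs
        for code, count in zip(codes, counts)
    }

# Synthetic stand-in for data['label']: 1 s undefined, 2 s baseline, 3 s stress
demo_labels = np.repeat([0, 1, 2], [700, 1400, 2100])
print(summarize_labels(demo_labels))
```

In the notebook, calling `summarize_labels(labels)` on the array loaded above should give the per-condition durations for subject S2, assuming the labels are sampled at the same 700 Hz as the chest signals.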

## 1.3 Visualizing a Raw EDA Segment

Let's plot a segment of the raw signal to observe its characteristics. Abrupt jumps or spikes in the trace are candidate motion artifacts, and a slow trend in the overall level points to baseline drift.

In [None]:
# Plot the first 10 minutes (700 samples/sec * 60 sec/min * 10 min)
plt.figure(figsize=(20, 6))
plt.plot(raw_eda[:700*60*10])
plt.title(f'Raw EDA Signal (First 10 Minutes) - Subject {subject_id}')
plt.xlabel('Samples')
plt.ylabel('Conductance (μS)')
plt.show()
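
Visual inspection can be complemented with a crude numerical check. The sketch below flags samples where the signal changes faster than a chosen rate, which is one simple way to locate candidate motion artifacts. `flag_abrupt_changes` is a hypothetical helper, and the default `max_rate` of 10 μS/s is an illustrative placeholder, not a recommended value; tune it against the plot. The demo runs on a synthetic signal, not on WESAD data.

```python
import numpy as np

def flag_abrupt_changes(signal, fs=700, max_rate=10.0):
    """Return times (in seconds) where |d(signal)/dt| exceeds max_rate.

    signal   : 1-D array or (N, 1) column of conductance values in μS
    fs       : sampling rate in Hz
    max_rate : threshold in μS per second (illustrative placeholder)
    """
    x = np.asarray(signal, dtype=float).ravel()
    # First difference scaled by fs approximates the slope in μS/s
    rate = np.abs(np.diff(x)) * fs
    # +1 so the index points at the sample after the jump
    idx = np.flatnonzero(rate > max_rate) + 1
    return idx / fs

# Synthetic signal: slow upward drift with one 0.5 μS step at t = 5 s
t = np.arange(7000) / 700
demo = 2.0 + 0.01 * t
demo[3500:] += 0.5
print(flag_abrupt_changes(demo))
```

Applied to the notebook's data, `flag_abrupt_changes(raw_eda[:700*60*10])` would return the timestamps of steep changes in the plotted segment. A fixed slope threshold cannot tell a genuine fast skin conductance response from an artifact, so treat the output as a list of places to look at, not as a final artifact mask.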