# Overview
- Load EEG recordings from EDF files, compute per-channel statistics for each epoch and collect them in a dataframe
- Inspect the dataframe's dimensions and data types
- Save the dataframe to a CSV file

In [10]:
# Import the packages re-exported by the project's utils module
from utils import pd, np, mne, os

In [11]:
# Directory holding the raw EDF recordings (PhysioNet distributes the data as .edf files)
path_name = '../data/raw/'

# Get all EDF files from the raw directory
edf_files = [f for f in os.listdir(path_name) if f.endswith('.edf')]

# Accumulators: one feature row and one label per epoch
features = []
labels = []

# Create column names for features
feature_names = ['trial_id','subject_id']
for ch in range(1, 65):  # 64 channels
    feature_names.extend([
        f'ch{ch}_std',
        f'ch{ch}_mean',
        f'ch{ch}_max',
        f'ch{ch}_min'
    ])

# Process each file
raw_data = []
for file_name in edf_files:
    raw = mne.io.read_raw_edf(os.path.join(path_name, file_name), preload=True, encoding='latin1')
    raw.filter(1., 40.)  # Bandpass filtering
    raw_data.append(raw)

    # Convert the T0/T1/T2 annotations stored in the EDF file into an events array
    events, event_id = mne.events_from_annotations(raw)

    # Epoching: cut a 0–1 s window after each event onset (161 samples at 160 Hz)
    epochs = mne.Epochs(raw, events, event_id=event_id, tmin=0, tmax=1, baseline=None, preload=True)

    # Extract trial IDs (sample index of each event)
    trial_ids = epochs.events[:, 0]  # Sample index of event onset

    for trial_id, epoch, label in zip(trial_ids, epochs.get_data(), epochs.events[:, -1]):
        # epoch shape: (n_channels, n_times)
        channel_features = [trial_id, file_name.split('.')[0]]  # File stem (e.g. S012R04) used as subject_id
        for channel in epoch:
            channel_features.extend([
                np.std(channel),
                np.mean(channel),
                np.max(channel),
                np.min(channel)
            ])
        features.append(channel_features)
        labels.append(label)

Extracting EDF parameters from /Users/miriamlandau/Documents/predict_hand_imagery/data/raw/S012R04.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 19679  =      0.000 ...   122.994 secs...
Filtering raw data in 1 contiguous segment
Setting up band-pass filter from 1 - 40 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Upper passband edge: 40.00 Hz
- Upper transition bandwidth: 10.00 Hz (-6 dB cutoff frequency: 45.00 Hz)
- Filter length: 529 samples (3.306 s)

Used Annotations descriptions: [np.str_('T0'), np.str_('T1'), np.str_('T2')]
Not setting metadata
30 matching events found
No baseline correction applied
0 projection items activated
Using data from preloaded Raw for 30 events and 161 original time points ...
0 bad epochs dropped


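The processing cell computes four NumPy reductions one channel at a time. The sketch below computes the same features in one pass, reducing along the time axis. It assumes an input shaped like `epochs.get_data()`, i.e. (n_epochs, n_channels, n_times). The helpers `epoch_features` and `feature_columns` are hypothetical names, and random numbers stand in for real EEG:

```python
import numpy as np
import pandas as pd

STATS = ("std", "mean", "max", "min")  # same order as the feature_names loop


def feature_columns(n_channels: int) -> list[str]:
    """Column names ch1_std, ch1_mean, ..., ch{n}_min (hypothetical helper)."""
    return [f"ch{ch}_{stat}" for ch in range(1, n_channels + 1) for stat in STATS]


def epoch_features(data: np.ndarray) -> np.ndarray:
    """Per-channel std/mean/max/min for every epoch (hypothetical helper).

    data: array shaped (n_epochs, n_channels, n_times), like epochs.get_data().
    Returns an array shaped (n_epochs, n_channels * 4).
    """
    # Reduce over the time axis; each result is (n_epochs, n_channels)
    stats = np.stack(
        [data.std(axis=2), data.mean(axis=2), data.max(axis=2), data.min(axis=2)],
        axis=-1,
    )  # -> (n_epochs, n_channels, 4)
    # C-order reshape keeps each channel's four statistics adjacent
    return stats.reshape(data.shape[0], -1)


# Synthetic stand-in for epochs.get_data(): 3 epochs, 64 channels, 161 samples
rng = np.random.default_rng(0)
fake = rng.normal(scale=3e-5, size=(3, 64, 161))

X = epoch_features(fake)
cols = feature_columns(64)
df_demo = pd.DataFrame(X, columns=cols)
```

With real data, `pd.DataFrame(epoch_features(epochs.get_data()), columns=feature_columns(64))` would give the feature block for one file. The `trial_id`, `subject_id` and `label` columns can then be added to it. `ndarray.std` defaults to `ddof=0`, which matches the `np.std(channel)` calls in the loop.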
In [12]:
# Build the dataframe from the collected feature rows, then append the labels
df = pd.DataFrame(features, columns=feature_names)
df['label'] = labels
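`trial_id` is a sample index within one recording, so the same value can appear in several files. The check below shows which columns actually identify a row. It runs on a toy frame whose values are illustrative, not taken from `df`:

```python
import pandas as pd

# Toy stand-in for df: two recordings whose events share onset samples
df_demo = pd.DataFrame({
    "trial_id": [0, 656, 0, 656],
    "subject_id": ["S001R04", "S001R04", "S002R04", "S002R04"],
    "label": [1, 2, 1, 3],
})

# trial_id alone repeats across recordings
trial_id_unique = df_demo["trial_id"].is_unique  # False for this toy frame

# The (subject_id, trial_id) pair has no duplicates
key_unique = not df_demo.duplicated(subset=["subject_id", "trial_id"]).any()

# verify_integrity=True raises ValueError if the pair is ever duplicated
indexed = df_demo.set_index(["subject_id", "trial_id"], verify_integrity=True)
```

Using `set_index(..., verify_integrity=True)` on the real `df` is a cheap guard before merging the table or splitting it by subject.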

In [13]:
# Analyze dataframe shape, columns, data types, and sample rows
print(df.info())
display(df.describe())
display(df.head())
print(f" dataframe shape {df.shape}")
df['subject_id'].value_counts()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 420 entries, 0 to 419
Columns: 259 entries, trial_id to label
dtypes: float64(256), int64(2), object(1)
memory usage: 850.0+ KB
None


| | trial_id | ch1_std | ch1_mean | ch1_max | ch1_min | ch2_std | ch2_mean | ch2_max | ch2_min | ch3_std | ... | ch62_min | ch63_std | ch63_mean | ch63_max | ch63_min | ch64_std | ch64_mean | ch64_max | ch64_min | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | ... | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 | 420.0 |
| mean | 9537.714286 | 3.9e-05 | -4.448134e-08 | 0.000109 | -8.5e-05 | 3.5e-05 | -2.560149e-08 | 9.5e-05 | -7.5e-05 | 3.5e-05 | ... | -6.4e-05 | 2.7e-05 | 3.501155e-07 | 7.7e-05 | -6.3e-05 | 2.5e-05 | 1.382e-07 | 8.6e-05 | -5.6e-05 | 1.745238 |
| std | 5699.931291 | 2e-05 | 8.439925e-06 | 6.4e-05 | 4.1e-05 | 1.7e-05 | 6.784997e-06 | 5.3e-05 | 3.7e-05 | 1.7e-05 | ... | 3e-05 | 1.2e-05 | 5.929466e-06 | 4.1e-05 | 3.1e-05 | 1.1e-05 | 4.736894e-06 | 4.9e-05 | 2.6e-05 | 0.825808 |
| min | 0.0 | 1.1e-05 | -6.903448e-05 | 2e-05 | -0.000301 | 1e-05 | -2.469015e-05 | 2e-05 | -0.000271 | 1.1e-05 | ... | -0.000207 | 9e-06 | -4.070357e-05 | 2.1e-05 | -0.000212 | 8e-06 | -4.666118e-05 | 2e-05 | -0.00016 | 1.0 |
| 25% | 4592.0 | 2.5e-05 | -3.685732e-06 | 6.4e-05 | -0.000103 | 2.1e-05 | -3.586249e-06 | 5.3e-05 | -9.4e-05 | 2.1e-05 | ... | -8.5e-05 | 1.7e-05 | -1.95049e-06 | 4.6e-05 | -8.2e-05 | 1.6e-05 | -1.988964e-06 | 4.4e-05 | -7.3e-05 | 1.0 |
| 50% | 9568.0 | 3.6e-05 | -3.531032e-08 | 9.6e-05 | -7.6e-05 | 3.1e-05 | 1.376996e-07 | 8.4e-05 | -6.5e-05 | 3.3e-05 | ... | -5.8e-05 | 2.5e-05 | 4.503886e-07 | 6.5e-05 | -5.6e-05 | 2.2e-05 | 2.129733e-07 | 6.9e-05 | -5.1e-05 | 1.5 |
| 75% | 14432.0 | 4.8e-05 | 3.470348e-06 | 0.000135 | -5.6e-05 | 4.4e-05 | 3.161087e-06 | 0.000122 | -4.7e-05 | 4.5e-05 | ... | -3.9e-05 | 3.6e-05 | 2.750875e-06 | 0.000101 | -3.8e-05 | 3.4e-05 | 2.059518e-06 | 0.000125 | -3.7e-05 | 2.0 |
| max | 19264.0 | 0.00014 | 3.992565e-05 | 0.00043 | -2.3e-05 | 0.000125 | 3.640876e-05 | 0.000324 | -1.7e-05 | 0.000112 | ... | -2e-05 | 6.6e-05 | 3.207527e-05 | 0.000192 | -1.8e-05 | 6.5e-05 | 2.331532e-05 | 0.000217 | -1.6e-05 | 3.0 |


| | trial_id | subject_id | ch1_std | ch1_mean | ch1_max | ch1_min | ch2_std | ch2_mean | ch2_max | ch2_min | ... | ch62_min | ch63_std | ch63_mean | ch63_max | ch63_min | ch64_std | ch64_mean | ch64_max | ch64_min | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | S012R04 | 2.8e-05 | -1e-05 | 5.9e-05 | -0.000111 | 2.8e-05 | -6e-06 | 6.2e-05 | -9.3e-05 | ... | -6.3e-05 | 2.1e-05 | -4.080704e-06 | 6.7e-05 | -5.6e-05 | 1.9e-05 | -6.62698e-07 | 5.6e-05 | -5.3e-05 | 1 |
| 1 | 656 | S012R04 | 3.3e-05 | 6e-06 | 0.000111 | -7.9e-05 | 2.2e-05 | 3e-06 | 5.7e-05 | -4.7e-05 | ... | -3.6e-05 | 2e-05 | 3.332963e-06 | 4.5e-05 | -4.3e-05 | 1.9e-05 | 3.642887e-06 | 4.7e-05 | -4e-05 | 2 |
| 2 | 1312 | S012R04 | 2.6e-05 | -3e-06 | 5.8e-05 | -7.7e-05 | 1.8e-05 | -4e-06 | 4.9e-05 | -5.2e-05 | ... | -4.7e-05 | 1.8e-05 | -1.986943e-06 | 3.6e-05 | -4.3e-05 | 1.7e-05 | -2.827891e-06 | 3.7e-05 | -4.3e-05 | 1 |
| 3 | 1968 | S012R04 | 2.6e-05 | -4e-06 | 5.6e-05 | -7.9e-05 | 2.7e-05 | -2e-06 | 6.3e-05 | -6.8e-05 | ... | -6.2e-05 | 2.4e-05 | 1.415632e-06 | 6.4e-05 | -5.6e-05 | 2.5e-05 | 2.380007e-06 | 6.2e-05 | -5.8e-05 | 3 |
| 4 | 2624 | S012R04 | 3e-05 | 5e-06 | 0.000105 | -5.6e-05 | 2.4e-05 | 3e-06 | 6.6e-05 | -4.6e-05 | ... | -5.5e-05 | 2.1e-05 | 4.673331e-07 | 4.7e-05 | -5.3e-05 | 1.8e-05 | 8.674011e-07 | 4.9e-05 | -4.1e-05 | 1 |


 dataframe shape (420, 259)


subject_id
S012R04    30
S010R04    30
S014R04    30
S006R04    30
S004R04    30
S008R04    30
S013R04    30
S001R04    30
S003R04    30
S011R04    30
S015R04    30
S007R04    30
S005R04    30
S009R04    30
Name: count, dtype: int64

In [15]:
# Save the dataframe to CSV in the data/interim folder
df.to_csv('../data/interim/eeg_motor_imagery.csv',index=False)
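`to_csv` expects `../data/interim/` to exist already. The sketch below saves a toy frame and reloads it to check the round trip, working in a temporary directory. The `os.makedirs` call is an addition that is not in the original cell:

```python
import os
import tempfile

import pandas as pd

# Toy frame with one column of each kind found in df
df_demo = pd.DataFrame({
    "trial_id": [0, 656],
    "subject_id": ["S012R04", "S012R04"],
    "ch1_std": [2.8e-05, 3.3e-05],
    "label": [1, 2],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "interim", "eeg_motor_imagery.csv")
    # to_csv does not create missing folders, so create them first
    os.makedirs(os.path.dirname(path), exist_ok=True)
    df_demo.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Column names, dtypes and values (floats within tolerance) should survive
pd.testing.assert_frame_equal(reloaded, df_demo)
```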


# Key Takeaways

**Dimension**
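- 420 rows (14 recordings × 30 epochs) × 259 columns
- The 256 feature columns are float64; `trial_id` and `label` are int64, and `subject_id` is a string (object) column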

**Column Naming Convention**
- Features follow the pattern: `ch{channel_number}_{statistic}`
- Example: `ch1_std`, `ch1_mean`, `ch1_max`, `ch1_min`
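Every feature column follows `ch{channel_number}_{statistic}`, so the names can be parsed back into (channel, statistic) pairs. That makes it easy to select one statistic across all channels. In the sketch below, `PATTERN` and `split_feature_name` are hypothetical names, and the column list is a shortened stand-in for `df.columns`:

```python
import re

import pandas as pd

# Feature columns follow ch{channel_number}_{statistic}
PATTERN = re.compile(r"^ch(\d+)_(std|mean|max|min)$")


def split_feature_name(name: str) -> tuple[int, str]:
    """Return (channel_number, statistic) for a feature column name."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a feature column: {name!r}")
    return int(match.group(1)), match.group(2)


# Shortened stand-in for df.columns
columns = ["trial_id", "subject_id", "ch1_std", "ch1_mean", "ch64_min", "label"]
feature_cols = [c for c in columns if PATTERN.match(c)]
parsed = [split_feature_name(c) for c in feature_cols]
multi = pd.MultiIndex.from_tuples(parsed, names=["channel", "statistic"])

# Relabel the feature columns hierarchically, then pick one statistic for all channels
wide = pd.DataFrame([[1.0, 2.0, 3.0]], columns=feature_cols).set_axis(multi, axis=1)
std_only = wide.xs("std", axis=1, level="statistic")
```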

**Key Identifiers**
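- `subject_id`: file name stem (e.g. `S012R04`) identifying the recording; it encodes both the subject (`S012`) and the run (`R04`)
- `trial_id`: sample index of the event onset within its recording (divide by the 160 Hz sampling rate to get seconds); the same value can repeat across recordings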
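Both identifiers can be decoded further. The sketch assumes two things: every file stem follows the `S<subject>R<run>` pattern seen in the `value_counts` output, and the sampling rate is 160 Hz. That rate is consistent with the MNE log (19680 samples over about 123 s, 161 samples per 0 to 1 s epoch). With real data, `raw.info['sfreq']` holds the rate:

```python
import pandas as pd

SFREQ = 160.0  # Hz; assumed here, read raw.info['sfreq'] in practice

# Toy rows shaped like the identifier columns of df
df_demo = pd.DataFrame({
    "trial_id": [0, 656, 1312],
    "subject_id": ["S012R04", "S012R04", "S006R04"],
})

# Split the file stem into its subject and run numbers
parts = df_demo["subject_id"].str.extract(r"^S(?P<subject>\d{3})R(?P<run>\d{2})$")
df_demo["subject"] = pd.to_numeric(parts["subject"])
df_demo["run"] = pd.to_numeric(parts["run"])

# trial_id is a sample index, so dividing by the sampling rate gives seconds from recording start
df_demo["onset_s"] = df_demo["trial_id"] / SFREQ
```

On the rows shown by `df.head()`, consecutive `trial_id` values differ by 656 samples, which at 160 Hz is 4.1 s between event onsets.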

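**Note:**
The first version of the dataset was wider than it was tall (fewer trials than feature columns), so more trials were needed. Update: more files have been added to balance the dataset, which now has 420 rows and 259 columns.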