# Create SPES response dataset
This notebook builds a dataset for subsequent PyTorch analyses: single-pulse electrical stimulation (SPES) responses are extracted for each patient and then written to NumPy files.

### Extract mean and standard deviation for each unique stimulation/response pair, across all patients
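
When the same stimulation/response pair occurs in several runs, the per-run means and standard deviations have to be merged into a single value per pair. The sketch below shows one standard way to do that, weighting each run by its trial count. It is an illustration only: `pool_stats` and the toy runs are hypothetical, and the `combine_stats` function used later in this notebook (its implementation is not shown here) may work differently.

```python
import numpy as np

def pool_stats(means, stds, counts):
    """Pool per-run means and population standard deviations (ddof=0).

    means, stds: arrays of shape (n_runs, ...); counts: shape (n_runs,).
    Hypothetical helper, not the notebook's combine_stats.
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    counts = np.asarray(counts, dtype=float)

    # Trial-weighted mean across runs
    pooled_mean = np.average(means, axis=0, weights=counts)
    # Per-run E[x^2] is std^2 + mean^2; average it, then subtract pooled mean^2
    second_moment = np.average(stds**2 + means**2, axis=0, weights=counts)
    pooled_var = np.maximum(second_moment - pooled_mean**2, 0.0)
    return pooled_mean, np.sqrt(pooled_var)

# Two hypothetical runs of the same stimulation/response pair
run_a = np.array([1.0, 2.0, 3.0])
run_b = np.array([4.0, 6.0])
mean, std = pool_stats(
    [run_a.mean(), run_b.mean()],
    [run_a.std(), run_b.std()],
    [run_a.size, run_b.size],
)
```

Pooling this way gives the same mean and standard deviation as computing them over all trials at once, without keeping every trial in memory.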

In [1]:
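import mne
import pandas as pd
from tqdm import tqdm

# Provides BIDSDataLoader, StimulationDataProcessor, combine_stats and DatasetCreator
from create_dataset import *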

mne.set_log_level('WARNING')

# Set root directory for BIDS dataset
bids_root = '/Users/jamienorris/ds004080'

# Define the BIDSDataLoader and StimulationDataProcessor
bids_loader = BIDSDataLoader(bids_root=bids_root)
subjects = bids_loader.load_subjects()
stim_processor = StimulationDataProcessor(tmin=0.009, tmax=1)

In [2]:
# Create an empty list to store the response data
response_df = []

# Iterate over each subject and process the data, adding to the list
for subject in tqdm(subjects):

    # Load the session data
    session_data = bids_loader.load_session_data(subject)

    # Create an empty list to store the response data for each run of the current patient
    patient_response_df = []
    
    # Iterate over each run
    for run in session_data['runs']:

        # Load the run data
        run_data = bids_loader.load_run_data(subject, run)

        # Process the run data and append to the list
        patient_response_df.append(
            stim_processor.process_run_data(
                run_data['eeg'],
                run_data['events_df'],
                run_data['channels_df'],
                subject,
            )
        )
    
    # Concatenate the response data across runs
    patient_response_df = pd.concat(patient_response_df)
    
    # Group by recording, stim_1 and stim_2
    grouped = patient_response_df.groupby(['recording', 'stim_1', 'stim_2'])

    # Apply combine_stats to each group to combine the data across runs
    patient_response_df = pd.concat([pd.concat(combine_stats(group)) for _, group in grouped])

    # Append this patient's dataframe to the list
    response_df.append(patient_response_df)

# Concatenate the response data across subjects
response_df = pd.concat(response_df)
# response_df = response_df.astype({col: 'float32' for col in response_df.columns if response_df[col].dtype == 'float64'})

 56%|█████▌    | 20/36 [10:16<06:27, 24.24s/it]  

ccepAgeUMCU47 538 76


 86%|████████▌ | 31/36 [15:18<02:56, 35.33s/it]

ccepAgeUMCU61 560 470


 89%|████████▉ | 32/36 [15:46<02:13, 33.30s/it]

ccepAgeUMCU62 629 479


100%|██████████| 36/36 [17:28<00:00, 29.13s/it]


Note: each line printed above names a patient for whom some trials fall outside the time frame of the file. The first number is the total number of trials; the second is the number of trials that fall within the time frame of the file.
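
One plausible reading of this check can be sketched as follows: a trial is kept only if its whole epoch window, from `tmin` to `tmax` after stimulation onset, lies inside the recording. The function name, the onsets and the recording duration are hypothetical, and treating `tmin` and `tmax` as seconds is an assumption; only the values `0.009` and `1` come from the cell above, and the actual logic in `StimulationDataProcessor` may differ.

```python
def trials_within_recording(onsets, duration, tmin=0.009, tmax=1.0):
    """Return the onsets (s) whose window [onset + tmin, onset + tmax]
    lies inside the recording [0, duration]. Hypothetical helper."""
    return [t for t in onsets if t + tmin >= 0 and t + tmax <= duration]

onsets = [5.0, 120.5, 299.6, 310.0]  # hypothetical stimulation onsets (s)
duration = 300.0                     # hypothetical recording length (s)
kept = trials_within_recording(onsets, duration)
print(len(onsets), len(kept))        # prints "4 2"
```

The two printed numbers mirror the format of the output above: total trials first, then trials that fit inside the file.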

### Save or load the dataframe (saving avoids re-running the extraction above)

In [4]:
response_df_filepath = '../response_df.csv'
response_df.to_csv(response_df_filepath)
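# To reload instead; index_col=0 restores the index that to_csv wrote as the first column
# response_df = pd.read_csv(response_df_filepath, index_col=0)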
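
A small round-trip sketch shows why the index matters when reloading: `to_csv` writes the index as the first column by default, so a plain `read_csv` brings it back as an extra `Unnamed: 0` column. The toy dataframe is a stand-in for `response_df`; its `mean` column and all values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for response_df; the 'mean' column and values are hypothetical
df = pd.DataFrame(
    {
        "recording": ["A1", "A2"],
        "stim_1": ["B1", "B1"],
        "stim_2": ["B2", "B2"],
        "mean": [0.5, -1.25],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "response_df.csv")
    df.to_csv(path)                             # index written as first column
    reloaded = pd.read_csv(path, index_col=0)   # first column restored as index
    naive = pd.read_csv(path)                   # index comes back as a data column
```

Here `reloaded` has the same columns as `df`, while `naive` carries the additional `Unnamed: 0` column.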

### Create dataset(s)
First, create the output folders:

In [6]:
!mkdir -p ../data/mean
!mkdir -p ../data/std
!mkdir -p ../data/main
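
The `!mkdir -p` commands rely on a Unix-like shell. A portable alternative, sketched here with the standard-library `pathlib`, creates the same three folders; the helper name `make_output_dirs` is hypothetical.

```python
from pathlib import Path

def make_output_dirs(base="../data", names=("mean", "std", "main")):
    """Create base/<name> for each name, like `mkdir -p` (hypothetical helper)."""
    created = []
    for name in names:
        folder = Path(base) / name
        # parents=True creates missing parents; exist_ok=True tolerates reruns
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `make_output_dirs()` with its defaults would stand in for the three shell commands above.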

Initialise DatasetCreator with the dataframe created above, then process each patient to create the NumPy files.

In [7]:
datasetcreator = DatasetCreator(response_df)

# Uses subjects and bids_loader defined in the first cell
for subject in tqdm(subjects):
    session_data = bids_loader.load_session_data(subject)
    datasetcreator.process_for_analysis(subject, session_data['electrodes_tsv'])

100%|██████████| 36/36 [02:50<00:00,  4.74s/it]
