## Project Overview

This notebook processes physiological recordings from a family therapy session. The goal is to organize, preprocess, and segment the data for further analysis.

### What are we doing?
We load the raw physiological data for each participant in a session, preprocess it, segment it into epochs, and save the processed session for later use.

### Why are we doing it?
Segmenting and processing physiological data enables us to analyze patterns and interactions between participants during therapy sessions. This can provide insights into physiological synchrony, emotional responses, and other relevant metrics.

### How are we doing it?
1. **Setup:** Import necessary modules and helper functions.
2. **Load COI Structure:** Read the configuration file that maps data files to each participant.
3. **Session Initialization:** Create a `Session` object and add `Subject` and `PhysioRecording` objects for each participant.
4. **Data Loading:** Load raw physiological data for each participant.
5. **Processing:** Apply preprocessing steps to clean and standardize the data.
6. **Epoching:** Segment the processed data into fixed-duration epochs.
7. **Saving:** Save the processed session object for future analysis.

### Expected Results
At the end of this notebook, we expect to have a serialized session object containing preprocessed and segmented physiological data for all participants, ready for downstream analysis.

In [1]:
import sys
sys.path.append('../src/')

import warnings
#warnings.filterwarnings('ignore')  # Suppress warnings for cleaner output

import os
from pathlib import Path
import json
import pickle

from subject import Subject
from physio_recording import PhysioRecording
from session import Session
from helpers import *

# Loading the session structure

The following cell loads the COI (Case of Interest) structure from a JSON file. This structure maps each participant in a session to their physiological data file and metadata, so it determines which data is read for each subject.

**Example of a COI structure entry:**
```json
{
    "session_code": "fam4_session_2",
    "family": 4,
    "session": 2,
    "sensor": "1723456789_B12CDE",
    "role": "MOTHER",
    "index": 1
}
```

**Explanation of fields:**
- `session_code`: Unique identifier for the session (e.g., "fam4_session_2").
- `family`: Family ID number (e.g., 4).
- `session`: Session number within the family (e.g., 2).
- `sensor`: Identifier for the physiological sensor device (e.g., "1723456789_B12CDE").
- `role`: Role of the participant in the session (e.g., "MOTHER").
- `index`: Index or ID for the participant (e.g., 1).

By loading this structure, we can programmatically link each subject to their data files and relevant metadata for further processing.
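
As a minimal sketch of such a lookup, assuming the COI structure is a list of entries shaped like the example above: the function `find_coi_entry` and the second sample entry are hypothetical and only illustrate the idea. The notebook itself relies on the project helper `extract_raw_pathname_from_coi_structure`, whose implementation is not shown here.

```python
def find_coi_entry(coi_structure, family, session, role):
    """Return the first entry matching family, session, and role."""
    for entry in coi_structure:
        key = (entry["family"], entry["session"], entry["role"])
        if key == (family, session, role):
            return entry
    raise KeyError(f"No COI entry for family={family}, session={session}, role={role}")


# Sample data: the first entry is the example above, the second is made up.
example_structure = [
    {"session_code": "fam4_session_2", "family": 4, "session": 2,
     "sensor": "1723456789_B12CDE", "role": "MOTHER", "index": 1},
    {"session_code": "fam4_session_2", "family": 4, "session": 2,
     "sensor": "0000000000_XXXXXX", "role": "CHILD", "index": 2},
]

mother_entry = find_coi_entry(example_structure, family=4, session=2, role="MOTHER")
print(mother_entry["sensor"])
```

Raising an error for a missing entry, rather than returning `None`, makes a misconfigured participant visible before any data is loaded.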

In [2]:
coi_structure_pathname = Path("../data/coi_structure.json")

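# Fail early, naming the offending path, if the configuration file is missing
if not coi_structure_pathname.exists():
    raise FileNotFoundError(f"COI structure file not found: {coi_structure_pathname}")

with open(coi_structure_pathname, "r", encoding="utf-8") as f:
    coi_structure = json.load(f)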

# Creating session object

This section initializes the session and subject objects and uses the COI structure to link each subject to their physiological data file. Each participant's data is thereby associated with their role in the session before it is loaded and processed in the following steps.

In [3]:
FAMILY_ID = 4
SEANCE_ID = 1
SESSION_ID = 0

verbose = True

session = Session(session_id = SESSION_ID, family_id = FAMILY_ID, seance_id = SEANCE_ID, verbose = verbose)

subjects: list[Subject] = []

subjects.append(Subject(id = 0, role_id = 0, role_desc = "THERAPIST"))
subjects.append(Subject(id = 1, role_id = 1, role_desc = "MOTHER"))
subjects.append(Subject(id = 2, role_id = 2, role_desc = "CHILD"))

for subject in subjects:
    physio_recording = PhysioRecording(subject_id = subject.id, seance_id = SEANCE_ID, session_id = SESSION_ID)
    physio_filepath = extract_raw_pathname_from_coi_structure(
        coi_structure = coi_structure,
        FAMILY_ID = FAMILY_ID,
        SEANCE_ID = SEANCE_ID,
        ROLE_ID = subject.role_id)
    physio_recording.set_physio_filepath(physio_filepath = physio_filepath)
    session.add_physio_recording(physio_recording = physio_recording)

Initializing session 0 for family 4, seance 1...
Session 0 initialized with family 4 and seance 1.
	Physio file path set to ../data/raw/segmented_physio/fam4_FB_session_1/1713204887_A00275_segmented.xlsx
Adding physio recording for subject 0 to session 0, family 4, seance 1...
Physio recording for subject 0 added to session 0.
	Physio file path set to ../data/raw/segmented_physio/fam4_FB_session_1/1713204885_A04ADA_segmented.xlsx
Adding physio recording for subject 1 to session 0, family 4, seance 1...
Physio recording for subject 1 added to session 0.
	Physio file path set to ../data/raw/segmented_physio/fam4_FB_session_1/1713204884_A04D21_segmented.xlsx
Adding physio recording for subject 2 to session 0, family 4, seance 1...
Physio recording for subject 2 added to session 0.


# Loading physiological data

The following cell loads the raw physiological data for each participant. Calling `session.load_physio_recordings_data()` reads the recording linked to each subject from its file and stores it in the session object. As the log shows, each recording contains temperature, heart rate (HR), electrodermal activity (EDA), and blood volume pulse (BVP) signals, each split into a resting-state part and a session part.

In [4]:
session.load_physio_recordings_data()

Loading physio recordings for session 0, family 4, seance 1...
	Loading raw data for session 0 and subject 0
		Loaded resting state Temperature data from TEMP_rs with 239 samples at 4 Hz.
		Loaded session Temperature data from TEMP_session with 19891 samples at 4 Hz.
		Loaded resting state HR data from HR_rs with 59 samples at 1 Hz.
		Loaded session HR data from HR_session with 4974 samples at 1 Hz.
		Loaded resting state EDA data from EDA_rs with 239 samples at 4 Hz.
		Loaded session EDA data from EDA_session with 19891 samples at 4 Hz.
		Loaded resting state BVP data from BVP_rs with 3839 samples at 64 Hz.
		Loaded session BVP data from BVP_session with 318394 samples at 64 Hz.
	Loading raw data for session 0 and subject 1
		Loaded resting state Temperature data from TEMP_rs with 239 samples at 4 Hz.
		Loaded session Temperature data from TEMP_session with 19867 samples at 4 Hz.
		Loaded resting state HR data from HR_rs with 59 samples at 1 Hz.
		Loaded session HR data from HR_sessio
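
A quick sanity check on the loaded data is to convert sample counts into durations and confirm that the channels cover roughly the same time span. The sketch below uses the subject 0 session counts from the log above; the variable names are illustrative only.

```python
# (sample count, sampling rate in Hz) for subject 0, taken from the log above
channels = {
    "TEMP_session": (19891, 4),
    "HR_session": (4974, 1),
    "EDA_session": (19891, 4),
    "BVP_session": (318394, 64),
}

# Duration in seconds is the number of samples divided by the sampling rate
durations = {name: n_samples / rate for name, (n_samples, rate) in channels.items()}

for name, seconds in durations.items():
    print(f"{name}: {seconds:.2f} s ({seconds / 60:.1f} min)")

# Largest disagreement between channels, in seconds
spread = max(durations.values()) - min(durations.values())
print(f"Spread between channels: {spread:.2f} s")
```

A spread of a few seconds between channels is plausible for signals recorded at different rates; a much larger one would suggest a truncated or misaligned file.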

# Data processing

The following cell preprocesses the physiological recordings of each participant. Calling `session.process_physio_recordings()` applies cleaning and standardization steps (such as filtering, artifact removal, and normalization) to the raw data. The log below shows, for example, EDA being cleaned and resampled from 4 Hz to 64 Hz, and RR intervals being extracted from the BVP signal, with large intervals split into equal parts. These steps prepare the signals for segmentation and feature extraction.
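
The processing log reports EDA being resampled from 4 Hz to 64 Hz (for example, 239 samples become 3824). The resampling method used inside `process_physio_recordings()` is not shown in this notebook, so the sketch below uses plain linear interpolation with `numpy.interp` as a stand-in; the function name `upsample_linear` is hypothetical.

```python
import numpy as np


def upsample_linear(signal, source_rate, target_rate):
    """Upsample a 1-D signal by linear interpolation (illustrative stand-in)."""
    signal = np.asarray(signal, dtype=float)
    n_out = int(len(signal) * target_rate / source_rate)
    t_in = np.arange(len(signal)) / source_rate   # original sample times (s)
    t_out = np.arange(n_out) / target_rate        # target sample times (s)
    # np.interp holds the last value for target times past the final input sample
    return np.interp(t_out, t_in, signal)


eda_rs = np.linspace(0.5, 1.5, 239)               # synthetic 4 Hz signal
eda_rs_64 = upsample_linear(eda_rs, source_rate=4, target_rate=64)
print(len(eda_rs_64))
```

With a rate ratio of 16, the output length is 16 times the input length, which matches the sample counts reported in the log.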

In [5]:
session.process_physio_recordings()

Processing physio recordings for session 0, family 4, seance 1...
	Processing physio data for subject 0 and session 0...
		Processing EDA data... 
		Cleaned EDA resting state data with 239 samples at 4 Hz.
		Cleaned EDA session data with 19891 samples at 4 Hz.
		Resampled EDA resting state data to 64 Hz with 3824 samples.
		Resampled EDA session data to 64 Hz with 318256 samples.


		Processed EDA resting state data with 239 samples at 4 Hz.
		Processed EDA session data with 19891 samples at 4 Hz.
		Processing BVP data...
		Computing RR intervals from BVP data...
		Extracting RR intervals from BVP data with sampling rate 64 Hz...
		Splitting large RR interval 2281.25 ms into 2 parts of 1140.62 ms each.
		Detected 1 anomalies (1.27% of total RR intervals).
		Splitting large RR interval 2062.5 ms into 3 parts of 687.50 ms each.
		Splitting large RR interval 2281.25 ms into 2 parts of 1140.62 ms each.
		Splitting large RR interval 7984.375 ms into 7 parts of 1140.62 ms each.
		Splitting large RR interval 3593.75 ms into 2 parts of 1796.88 ms each.
		Splitting large RR interval 2890.625 ms into 2 parts of 1445.31 ms each.
		Splitting large RR interval 2296.875 ms into 2 parts of 1148.44 ms each.
		Splitting large RR interval 2906.25 ms into 2 parts of 1453.12 ms each.
		Splitting large RR interval 2250.0 ms into 2 parts of 1125.00 ms each.
		Splitting large RR interv

		Processed EDA resting state data with 239 samples at 4 Hz.
		Processed EDA session data with 19865 samples at 4 Hz.
		Processing BVP data...
		Computing RR intervals from BVP data...
		Extracting RR intervals from BVP data with sampling rate 64 Hz...
		Splitting large RR interval 3656.25 ms into 3 parts of 1218.75 ms each.
		Detected 1 anomalies (1.56% of total RR intervals).
		Splitting large RR interval 2531.25 ms into 2 parts of 1265.62 ms each.
		Splitting large RR interval 2218.75 ms into 3 parts of 739.58 ms each.
		Splitting large RR interval 3812.5 ms into 6 parts of 635.42 ms each.
		Splitting large RR interval 7234.375 ms into 11 parts of 657.67 ms each.
		Splitting large RR interval 9562.5 ms into 14 parts of 683.04 ms each.
		Splitting large RR interval 4312.5 ms into 6 parts of 718.75 ms each.
		Splitting large RR interval 2140.625 ms into 2 parts of 1070.31 ms each.
		Splitting large RR interval 7156.25 ms into 7 parts of 1022.32 ms each.
		Splitting large RR interval 3

		Processed EDA resting state data with 239 samples at 4 Hz.
		Processed EDA session data with 19879 samples at 4 Hz.
		Processing BVP data...
		Computing RR intervals from BVP data...
		Extracting RR intervals from BVP data with sampling rate 64 Hz...
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR intervals).
		Detected 0 anomalies (0.00% of total RR int
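
The log above reports large RR intervals being split into equal parts, for example 2281.25 ms into 2 parts of 1140.62 ms. The rule that decides when to split and into how many parts is not visible in this notebook. The sketch below shows one plausible approach under stated assumptions: the function `split_large_rr`, the reference interval, and the `max_ratio` threshold are all hypothetical.

```python
def split_large_rr(rr_ms, reference_ms, max_ratio=1.8):
    """Split intervals longer than max_ratio * reference_ms into equal parts.

    Assumption: a large interval is treated as several missed beats, so it is
    divided into the number of reference-length beats that fits it best.
    """
    cleaned = []
    for rr in rr_ms:
        if rr > max_ratio * reference_ms:
            n_parts = max(2, round(rr / reference_ms))
            cleaned.extend([rr / n_parts] * n_parts)
        else:
            cleaned.append(rr)
    return cleaned


rr_intervals = [800.0, 2281.25, 820.0]            # synthetic series, in ms
cleaned_rr = split_large_rr(rr_intervals, reference_ms=1100.0)
print(cleaned_rr)
```

Splitting into equal parts keeps the total duration of the series unchanged, so later time-based epoching is not shifted by the correction.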

# Segmenting data

The following cell segments the preprocessed physiological recordings into fixed-duration epochs. Calling `session.epoch_physio_recordings(method = "fixed_duration", duration = 30, overlap = 0)` divides each participant's data into consecutive, non-overlapping segments of 30 seconds. This makes it possible to analyze physiological responses over time and to compare signal features across consistent intervals.

In addition to fixed-duration segmentation, the `session.epoch_physio_recordings()` method supports other epoching strategies:

- **Fixed Number of Epochs:**  
    By specifying `method = "fixed_number"` and providing `n_epochs`, the data for each participant is divided into a set number of equally sized epochs. This is useful when you want the same number of epochs for every recording, regardless of its absolute duration.

    ```python
    session.epoch_physio_recordings(method = "fixed_number", n_epochs = 30)
    ```

- **Sliding Window Epochs:**  
    Using `method = "sliding_window"` with parameters like `duration` (window length) and `step` (stride), the data is segmented into overlapping windows. This approach allows for more granular, continuous analysis of physiological changes over time.

    ```python
    session.epoch_physio_recordings(method = "sliding_window", duration = 30, step = 5)
    ```

Each method provides a different perspective on the data, enabling flexible analysis tailored to specific research questions.
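
The three strategies can be sketched on a plain NumPy array. This is a minimal illustration, not the implementation behind `session.epoch_physio_recordings()`: the function names, the `keep_partial` flag, and the handling of the final incomplete epoch are assumptions.

```python
import numpy as np


def fixed_duration_epochs(signal, sampling_rate, duration, keep_partial=True):
    """Consecutive, non-overlapping epochs of `duration` seconds."""
    size = int(duration * sampling_rate)
    epochs = [signal[i:i + size] for i in range(0, len(signal), size)]
    if not keep_partial and epochs and len(epochs[-1]) < size:
        epochs.pop()  # drop the trailing incomplete epoch
    return epochs


def fixed_number_epochs(signal, n_epochs):
    """Split the signal into n_epochs parts of (nearly) equal length."""
    return np.array_split(signal, n_epochs)


def sliding_window_epochs(signal, sampling_rate, duration, step):
    """Overlapping full-length windows of `duration` seconds, every `step` seconds."""
    size = int(duration * sampling_rate)
    hop = int(step * sampling_rate)
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]


signal = np.arange(19891)  # same length as a 4 Hz session signal in the logs
fixed = fixed_duration_epochs(signal, sampling_rate=4, duration=30)
by_count = fixed_number_epochs(signal, n_epochs=30)
sliding = sliding_window_epochs(signal, sampling_rate=4, duration=30, step=5)
print(len(fixed), len(by_count), len(sliding))
```

With 19891 samples at 4 Hz, the sketch yields 166 fixed-duration epochs when the trailing partial epoch is kept, which is consistent with the EDA epoch count in the log below, although the log alone does not confirm how the real method treats the remainder.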

In [6]:
session.epoch_physio_recordings(method = "fixed_duration", duration = 30, overlap = 0)
#session.epoch_physio_recordings(method = "fixed_number", n_epochs = 30)
#session.epoch_physio_recordings(method = "sliding_window", duration = 30, step = 5)

Epoching physio recordings for session 0, family 4, seance 1...
	Epoching signal for subject 0 and session 0 using method 'fixed_duration'...
		Epoching EDA metric for 'EDA_Tonic' using fixed_duration...
		Epoching EDA time series for 'EDA_Tonic' with duration 30s and overlap 0s.
		Created 166 epochs of 30s from EDA 'EDA_Tonic' data.
		Epoching complete for EDA metric 'EDA_Tonic' using fixed_duration method.
		Epoching EDA metric for 'EDA_Phasic' using fixed_duration...
		Epoching EDA time series for 'EDA_Phasic' with duration 30s and overlap 0s.
		Created 166 epochs of 30s from EDA 'EDA_Phasic' data.
		Epoching complete for EDA metric 'EDA_Phasic' using fixed_duration method.
		Epoching BVP metric for 'RR_Intervals' using fixed_duration...
		Epoching BVP interval series for 'RR_Intervals' with fixed duration 30s...
		Created 165 epochs of 30s for BVP 'RR_Intervals' data.
		Epoching complete for BVP metric 'RR_Intervals' using fixed_duration method.
		Epoching HEARTRATE metric for 'Hea

# Saving the processed session to a file

This cell saves the processed session object to a pickle file. It ensures the output directory exists, then serializes the `session` object for future use or analysis.

In [7]:
session_pathname = Path(f"../data/output/processed/session_{SESSION_ID}_family_{FAMILY_ID}_seance_{SEANCE_ID}.pkl")
# Create the output directory (and any missing parents) if needed
session_pathname.parent.mkdir(parents=True, exist_ok=True)

with open(session_pathname, "wb") as f:
    pickle.dump(session, f)
print(f"Session saved to {session_pathname}")

Session saved to ../data/output/processed/session_0_family_4_seance_1.pkl
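
To reuse the saved session in a later notebook, the file can be read back with `pickle.load`. The sketch below demonstrates the save and load round trip on a stand-in dictionary in a temporary directory, so it runs without the project classes; the helper names `save_object` and `load_object` are illustrative.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, pathname):
    """Serialize obj to pathname, creating the parent directory if needed."""
    pathname.parent.mkdir(parents=True, exist_ok=True)
    with open(pathname, "wb") as f:
        pickle.dump(obj, f)


def load_object(pathname):
    """Deserialize and return the object stored at pathname."""
    with open(pathname, "rb") as f:
        return pickle.load(f)


# Stand-in for the real Session object
stand_in = {"session_id": 0, "family_id": 4, "seance_id": 1}

with tempfile.TemporaryDirectory() as tmp_dir:
    pathname = Path(tmp_dir) / "processed" / "session_0_family_4_seance_1.pkl"
    save_object(stand_in, pathname)
    restored = load_object(pathname)

print(restored)
```

Loading the real session additionally requires that the `Session`, `Subject`, and `PhysioRecording` classes are importable, for example by adding `../src/` to `sys.path` as in the first cell. Pickle files can execute arbitrary code when loaded, so only open files from a trusted source.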
