## Project Overview

This notebook processes physiological recordings from a family therapy session. The goal is to organize, preprocess, and segment the physiological data for further analysis.

### What are we doing?
We are loading raw physiological data for each participant in a session, processing the data, segmenting it into epochs, and saving the processed session for future use.

### Why are we doing it?
Segmenting and processing physiological data enables us to analyze patterns and interactions between participants during therapy sessions. This can provide insights into physiological synchrony, emotional responses, and other relevant metrics.

### How are we doing it?
1. **Setup:** Import necessary modules and helper functions.
2. **Load COI Structure:** Read the configuration file that maps data files to each participant.
3. **Session Initialization:** Create a `Session` object and add `Subject` and `PhysioRecording` objects for each participant.
4. **Data Loading:** Load raw physiological data for each participant.
5. **Processing:** Apply preprocessing steps to clean and standardize the data.
6. **Epoching:** Segment the processed data into fixed-duration epochs.
7. **Saving:** Save the processed session object for future analysis.

### Expected Results
At the end of this notebook, we expect to have a serialized session object containing preprocessed and segmented physiological data for all participants, ready for downstream analysis.

In [None]:
import sys
sys.path.append('../src/')

import warnings
#warnings.filterwarnings('ignore')  # Suppress warnings for cleaner output

import os
from pathlib import Path
import json
import pickle

from subject import Subject
from physio_recording import PhysioRecording
from session import Session
from helpers import *

# Loading sessions structure

In the following cell, we are loading the COI (Case of Interest) structure from a JSON file. This structure maps each participant in the session to their corresponding physiological data file and metadata. The COI structure is essential for organizing and accessing the correct data for each subject in the analysis.

**Example of a COI structure entry:**
```json
{
    "session_code": "fam4_session_2",
    "family": 4,
    "session": 2,
    "sensor": "1723456789_B12CDE",
    "role": "MOTHER",
    "index": 1
}
```

**Explanation of fields:**
- `session_code`: Unique identifier for the session (e.g., "fam4_session_2").
- `family`: Family ID number (e.g., 4).
- `session`: Session number within the family (e.g., 2).
- `sensor`: Identifier for the physiological sensor device (e.g., "1723456789_B12CDE").
- `role`: Role of the participant in the session (e.g., "MOTHER").
- `index`: Index or ID for the participant (e.g., 1).

By loading this structure, we can programmatically link each subject to their data files and relevant metadata for further processing.
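
As a rough illustration of how such entries can be used, the sketch below groups entries by `session_code` so that all participants of one session can be looked up together. It assumes the JSON file holds a flat list of entries shaped like the example above; the actual layout of `coi_structure.json` may differ. The helper `group_by_session` and the second sample entry are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical sample data: a flat list of entries shaped like the example above.
# The real coi_structure.json may be organized differently.
sample_entries = [
    {"session_code": "fam4_session_2", "family": 4, "session": 2,
     "sensor": "1723456789_B12CDE", "role": "MOTHER", "index": 1},
    {"session_code": "fam4_session_2", "family": 4, "session": 2,
     "sensor": "1723456790_F34GHI", "role": "FATHER", "index": 2},
]


def group_by_session(entries):
    """Return a dict mapping each session_code to its list of entries."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["session_code"]].append(entry)
    return dict(grouped)


sessions = group_by_session(sample_entries)
# Collect the roles recorded for one session, in file order.
roles = [entry["role"] for entry in sessions["fam4_session_2"]]
print(roles)
```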

In [None]:
coi_structure_pathname = Path("../data/coi_structure.json")

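if not coi_structure_pathname.exists():
    # Include the path in the message so a missing file is easy to locate
    raise FileNotFoundError(f"The COI structure file does not exist: {coi_structure_pathname}")

with open(coi_structure_pathname, "r", encoding="utf-8") as f:
    coi_structure = json.load(f)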

# Creating session object

In this section, we initialize the session and subject objects and link each subject to their physiological data file. Note that the cell below sets the file paths explicitly for each role (`DANCER` and `RESIDENT`) rather than reading them from the COI structure. This setup ensures that each participant's data is correctly associated with their role in the session, preparing the data for loading, processing, and analysis in the subsequent steps.

In [None]:
FAMILY_ID = 4
SEANCE_ID = 1
SESSION_ID = 0

verbose = True

session = Session(session_id = SESSION_ID, family_id = FAMILY_ID, seance_id = SEANCE_ID, verbose = verbose)

subjects: list[Subject] = []

FILEPATH_DANCER = Path("../data/dancer.xlsx")
FILEPATH_RESIDENT = Path("../data/resident.xlsx")

subjects.append(Subject(id = 0, role_id = 0, role_desc = "DANCER"))
subjects.append(Subject(id = 1, role_id = 1, role_desc = "RESIDENT"))

for subject in subjects:
    physio_recording = PhysioRecording(subject_id = subject.id, seance_id = SEANCE_ID, session_id = SESSION_ID)
    physio_filepath = FILEPATH_DANCER if subject.role_id == 0 else FILEPATH_RESIDENT
    physio_recording.set_physio_filepath(physio_filepath = physio_filepath)
    session.add_physio_recording(physio_recording = physio_recording)

# Loading physiological data

In this cell, we are loading the raw physiological data for each participant in the session. By invoking `session.load_physio_recordings_data()`, we ensure that the physiological recordings linked to each subject are read from their respective files and stored within the session object. This step is crucial for making the data available for subsequent preprocessing, segmentation, and analysis.

In [None]:
session.load_physio_recordings_data()

# Data processing

In this cell, we are preprocessing the physiological recordings for each participant in the session. By calling `session.process_physio_recordings()`, we apply a series of cleaning and standardization steps to the raw data, such as filtering, artifact removal, and normalization. This preprocessing ensures that the physiological signals are of high quality and suitable for further analysis, such as segmentation and feature extraction.
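
The exact steps applied by `session.process_physio_recordings()` are not shown in this notebook. As one illustration of the normalization step mentioned above, the sketch below applies z-score standardization to a short list of values. This is an assumption about what "normalization" could mean here, not the method's actual implementation; `zscore` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(signal):
    """Standardize a sequence to zero mean and unit (population) variance."""
    mu = mean(signal)
    sigma = pstdev(signal)
    if sigma == 0:
        # A constant signal has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in signal]
    return [(x - mu) / sigma for x in signal]


# Hypothetical raw samples (mean 5.0, population standard deviation 2.0)
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized = zscore(raw)
print(normalized)
```

Standardizing each participant's signal in this way puts recordings with different baselines and amplitudes on a common scale, which is one reason such a step is often applied before comparing participants.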

In [None]:
session.process_physio_recordings()

# Segmenting data

In the following cell, we are segmenting the preprocessed physiological recordings into fixed-duration epochs. By calling `session.epoch_physio_recordings(method = "fixed_duration", duration = 30, overlap = 0)`, we divide each participant's physiological data into consecutive, non-overlapping segments of 30 seconds each. This segmentation facilitates time-based analysis of physiological responses and enables comparison of signal features across consistent time intervals.

In addition to fixed-duration segmentation, the `session.epoch_physio_recordings()` method supports other epoching strategies:

- **Fixed Number of Epochs:**  
    By specifying `method = "fixed_number"` and providing `n_epochs`, the data for each participant is divided into a set number of equally sized epochs. This is useful when every recording should yield the same number of epochs, regardless of its total duration.

    ```python
    session.epoch_physio_recordings(method = "fixed_number", n_epochs = 30)
    ```

- **Sliding Window Epochs:**  
    Using `method = "sliding_window"` with parameters like `duration` (window length) and `step` (stride), the data is segmented into windows that overlap whenever `step` is smaller than `duration`. This approach allows for more granular, continuous analysis of physiological changes over time.

    ```python
    session.epoch_physio_recordings(method = "sliding_window", duration = 30, step = 5)
    ```

Each method provides a different perspective on the data, enabling flexible analysis tailored to specific research questions.
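
To make the difference between the three strategies concrete, the sketch below computes the `(start, end)` boundaries in seconds that each one would produce for a recording of a given length. It is a minimal illustration, not the code behind `session.epoch_physio_recordings()`; the function names are hypothetical, and dropping a trailing remainder shorter than one window is an assumption.

```python
def fixed_duration_epochs(total, duration):
    """Consecutive, non-overlapping windows of `duration` seconds.

    Assumption: a trailing remainder shorter than `duration` is dropped.
    """
    n = int(total // duration)
    return [(i * duration, (i + 1) * duration) for i in range(n)]


def fixed_number_epochs(total, n_epochs):
    """Exactly `n_epochs` windows of equal length covering the whole recording."""
    length = total / n_epochs
    return [(i * length, (i + 1) * length) for i in range(n_epochs)]


def sliding_window_epochs(total, duration, step):
    """Windows of `duration` seconds whose starts are `step` seconds apart."""
    epochs = []
    start = 0
    while start + duration <= total:
        epochs.append((start, start + duration))
        start += step
    return epochs


total_seconds = 100  # hypothetical recording length

print(fixed_duration_epochs(total_seconds, 30))         # [(0, 30), (30, 60), (60, 90)]
print(len(fixed_number_epochs(total_seconds, 4)))       # 4
print(sliding_window_epochs(total_seconds, 30, 5)[:3])  # [(0, 30), (5, 35), (10, 40)]
```

With a 100-second recording, fixed-duration epoching yields three 30-second windows, while a 30-second sliding window with a 5-second step yields fifteen, each sharing 25 seconds with its neighbor.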

In [None]:
session.epoch_physio_recordings(method = "fixed_duration", duration = 30, overlap = 0)
#session.epoch_physio_recordings(method = "fixed_number", n_epochs = 30)
#session.epoch_physio_recordings(method = "sliding_window", duration = 30, step = 5)

# Save processed session in a file

This cell saves the processed session object to a pickle file. It ensures the output directory exists, then serializes the `session` object for future use or analysis.

In [None]:
session_pathname = Path(f"../data/output/processed/session_{SESSION_ID}_family_{FAMILY_ID}_seance_{SEANCE_ID}.pkl")
# Create the output directory (and any missing parents) if it does not exist yet
session_pathname.parent.mkdir(parents=True, exist_ok=True)

with open(session_pathname, "wb") as f:
    pickle.dump(session, f)
print(f"Session saved to {session_pathname}")
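
For later analysis, the saved file can be read back with `pickle.load`. The sketch below shows the full save-and-restore round trip on a stand-in dictionary written to a temporary directory, so it runs without the project classes; the stand-in object and the file name are hypothetical. When loading a real session file, the modules that define `Session`, `Subject`, and `PhysioRecording` must be importable (for example via the same `sys.path.append('../src/')` used at the top of this notebook), because pickle stores a reference to each class rather than its code.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the processed session object (hypothetical; a real run would
# pickle the Session instance instead).
processed = {"session_id": 0, "family_id": 4, "seance_id": 1}

with tempfile.TemporaryDirectory() as tmp_dir:
    pathname = Path(tmp_dir) / "processed" / "session_demo.pkl"
    pathname.parent.mkdir(parents=True, exist_ok=True)

    # Save: serialize the object to disk in binary mode.
    with open(pathname, "wb") as f:
        pickle.dump(processed, f)

    # Restore: read the same file back into a new object.
    with open(pathname, "rb") as f:
        restored = pickle.load(f)

print(restored == processed)  # True
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.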