# Demo 1: Loading and Preparing a Baseline Experiment

**Goal:** This notebook demonstrates the fundamental workflow for loading raw experimental data into our analysis pipeline. We'll take a 'Baseline' experiment, load each session into a special `Session` object, synchronize the photometry and behavioral data, and prepare it for analysis.

In [1]:
import sys
sys.path.append('../../../')

First, we import the library. This is mostly boilerplate; in most cases a plain `import fiberphotometry` will suffice.

In [2]:
import warnings
import pytest
from pathlib import Path
from collections import defaultdict

from fiberphotometry.config import PLOTTING_CONFIG
from fiberphotometry.data.data_loading import DataContainer, load_all_sessions
from fiberphotometry.data.session_loading import populate_containers
from fiberphotometry.data.syncer import sync_session
from fiberphotometry.data.timepoint_processing import create_event_idxs_container_for_sessions
from fiberphotometry.processing.plotting_setup import PlottingSetup
from fiberphotometry.processing.signal_info_setup import assign_sessions_signal_info

Define the path where the trial directories for our experiment are located.

In [3]:
baseline_path = Path("/Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline")

In [4]:
import os

# Let's look at the raw files in one trial directory to see what we're starting with.
# iterdir() yields entries in arbitrary order and may include stray files,
# so we keep only directories.
try:
    first_dir = next(p for p in baseline_path.iterdir() if p.is_dir())
    print(f"Raw data files in: {first_dir.name}")
    for fname in sorted(os.listdir(first_dir)):
        print(f"  - {fname}")
except StopIteration:
    print("No directories found in the baseline path.")

Raw data files in: T20_31.33.43.45
  - DigInput_A2023-02-09T13_48_22.csv
  - DigInput_B2023-02-09T13_48_22.csv
  - DigInput_C2023-02-09T13_48_22.csv
  - DigInput_D2023-02-09T13_48_22.csv
  - RAW_A.csv
  - RAW_B.csv
  - RAW_C.csv
  - RAW_D.csv
  - T20_trial_guide.csv
  - T20_trial_guide.xlsx
  - photometry_data_combined_SetupA_2023-02-09T13_48_23.csv
  - photometry_data_combined_SetupB_2023-02-09T13_48_23.csv


The listing above shows what a trial folder contains: syncing files (`DigInput_*.csv`), CPT records (`RAW_*.csv`), photometry data (`photometry_data_combined_*.csv`), and the manually created trial guides (`T*_trial_guide.csv`).
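To make the four file categories concrete, here is a minimal sketch that groups file names by glob pattern. The patterns are inferred from the listing above, and `FILE_KINDS` and `classify_files` are hypothetical names used only for illustration; the library may detect files differently.

```python
from fnmatch import fnmatch

# Glob patterns inferred from the example listing; adjust if your naming differs.
FILE_KINDS = {
    "sync": "DigInput_*.csv",
    "cpt_record": "RAW_*.csv",
    "photometry": "photometry_data_combined_*.csv",
    "trial_guide": "T*_trial_guide.csv",
}


def classify_files(filenames):
    """Group file names by the first pattern they match; the rest go under 'other'."""
    groups = {kind: [] for kind in FILE_KINDS}
    groups["other"] = []
    for fname in filenames:
        for kind, pattern in FILE_KINDS.items():
            if fnmatch(fname, pattern):
                groups[kind].append(fname)
                break
        else:
            # No pattern matched this file name.
            groups["other"].append(fname)
    return groups


example = [
    "DigInput_A2023-02-09T13_48_22.csv",
    "RAW_A.csv",
    "T20_trial_guide.csv",
    "T20_trial_guide.xlsx",
    "photometry_data_combined_SetupA_2023-02-09T13_48_23.csv",
]
groups = classify_files(example)
for kind, names in groups.items():
    print(kind, names)
```

A check like this can be handy before loading, to spot folders that are missing one of the expected file types.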

## Step 1: Creating Session Objects

The `load_all_sessions` function scans the directories and creates a list of `Session` objects. Each object acts as a container for all the data and metadata related to a single recording session, making it easy to manage. Assuming everything is structured according to the data collection standard, this can be simply run as follows.

In [5]:
sessions = load_all_sessions(
    baseline_dir=str(baseline_path),
    session_type="cpt",
    first_n_dirs=10,
    remove_bad_signal_sessions=True,
)

Processing trial directories: 100%|██████████| 10/10 [00:00<00:00, 32.18it/s]


In [6]:
for session in sessions:
    print('mouse_id', session.mouse_id, 'trial_id', session.trial_id)

mouse_id 23 trial_id T1_23.25.29.e
mouse_id 25 trial_id T2_23.25.29.e_2
mouse_id 23 trial_id T3_23.25.29.e_4
mouse_id 25 trial_id T3_23.25.29.e_4
mouse_id 31 trial_id T4_31.33.35.37
mouse_id 35 trial_id T4_31.33.35.37
mouse_id 37 trial_id T4_31.33.35.37
mouse_id 35 trial_id T5_31.33.35.37_3
mouse_id 37 trial_id T5_31.33.35.37_3
mouse_id 33 trial_id T6_31.33.35.37_4
mouse_id 39 trial_id T7_39.e.43.45
mouse_id 45 trial_id T7_39.e.43.45
mouse_id 39 trial_id T8_39.41.43.45_2
mouse_id 43 trial_id T8_39.41.43.45_2
mouse_id 47 trial_id T10_47.49.e.53_2
mouse_id 49 trial_id T10_47.49.e.53_2
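The trial IDs above appear to follow a pattern: a trial number, then up to four dot-separated slots, then an optional numeric suffix. The sketch below parses that pattern. It is a guess from the listing, not the library's parser: `parse_trial_dir` is a hypothetical helper, treating `e` as an empty slot is an assumption, and the meaning of the suffix is not stated in this notebook, so it is returned uninterpreted.

```python
import re

# Assumed layout: T<trial>_<slot>.<slot>.<slot>.<slot>[_<suffix>]
TRIAL_DIR_RE = re.compile(r"^T(?P<trial>\d+)_(?P<slots>[^_]+)(?:_(?P<suffix>\d+))?$")


def parse_trial_dir(name):
    """Split a trial directory name into trial number, slots, and optional suffix."""
    match = TRIAL_DIR_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognised trial directory name: {name!r}")
    # Numeric slots become mouse IDs; anything else (e.g. 'e') is treated as empty.
    slots = [int(s) if s.isdigit() else None for s in match["slots"].split(".")]
    return {"trial": int(match["trial"]), "slots": slots, "suffix": match["suffix"]}


print(parse_trial_dir("T7_39.e.43.45"))
print(parse_trial_dir("T10_47.49.e.53_2"))
```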


The function automatically picks up all columns in the trial guide (with some reserved exceptions), making it easy to add new attributes like genotype, weight, or age. Below we print all the attributes that the first session has picked up from the trial guide.

In [7]:
obj = sessions[0]

for name in dir(obj):
    if name.startswith("__"):
        continue
    value = getattr(obj, name)
    if callable(value) or name == 'session_guide':
        continue
    print(f"{name}: {value}")


brain_regions: [('DLS', 'left', 'G'), ('DMS', 'right', 'G')]
chamber_id: A
dfs: <fiberphotometry.data.data_loading.DataContainer object at 0x15109e7d0>
dig_input: 0
drug_infos: []
fiber_to_region: {'0': ('DLS', 'left', 'G'), '2': ('DMS', 'right', 'G')}
genotype: WT
mouse_id: 23
notes: nan
remove_bad_signal_sessions: True
session_type: cpt
setup_id: A
task: CPT_stage4_baseline
trial_dir: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline/T1_23.25.29.e
trial_id: T1_23.25.29.e
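One way to picture how guide columns become session attributes is the sketch below. `SessionSketch` and `RESERVED` are hypothetical stand-ins, and the actual set of reserved column names is not listed in this notebook, so the ones used here are placeholders.

```python
# Placeholder reserved names; the real list lives in the library.
RESERVED = {"dfs", "trial_dir"}


class SessionSketch:
    """Toy session that copies every non-reserved guide column onto itself."""

    def __init__(self, guide_row, reserved=RESERVED):
        for column, value in guide_row.items():
            if column in reserved:
                continue  # never let a guide column overwrite internal state
            setattr(self, column, value)


# A hypothetical trial guide row with one extra column ('weight_g').
row = {"mouse_id": 23, "genotype": "WT", "weight_g": 24.5, "dfs": "ignored"}
session_sketch = SessionSketch(row)
print(vars(session_sketch))
```

With this design, adding a column to the guide is enough to expose it on every session, with no code change.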


In [8]:
# Now we have a list of session objects. Let's see what was loaded.
print("Loaded sessions and their corresponding mouse IDs:")
for s in sessions:
    # This info is automatically parsed from the directory and trial guide files.
    print(f"Directory: {s.trial_dir}, Mouse ID: {s.mouse_id}")

Loaded sessions and their corresponding mouse IDs:
Directory: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline/T1_23.25.29.e, Mouse ID: 23
Directory: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline/T2_23.25.29.e_2, Mouse ID: 25
Directory: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline/T3_23.25.29.e_4, Mouse ID: 23
Directory: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline/T3_23.25.29.e_4, Mouse ID: 25
Directory: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline/T4_31.33.35.37, Mouse ID: 31
Directory: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline/T4_31.33.35.37, Mouse ID: 35
Directory: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline/T4_31.33.35.37, Mouse ID: 37
Directory: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Baseline/T5_31.33.35.37_3, Mouse ID: 35
Directory: /Users/fsp585/Desktop/GetherLabCode/FiberphotometryCode/src/Ba

## Step 2: Populating the Session Objects

Our `Session` objects currently hold metadata, but the actual experimental data (from the `.csv` files) isn't loaded into memory yet.

The `populate_containers` function handles this critical step. It finds the raw data files on the disk and loads them into pandas DataFrames, "filling" each session with the data it needs for analysis.

The next cell demonstrates this by showing how the row count of each data table changes from **"Not loaded"** to the number of rows read from disk.

In [9]:
from fiberphotometry.data.data_loading import DataContainer # Required for the reset

# Use a single session for the demonstration
s0_for_populate_demo = sessions[0]
# Define the data tables we expect the function to load
expected_dfs = ['raw', 'ttl', 'phot_415', 'phot_470']

# ---
# 1. The "Before" State
# ---
print("## 1. Row Counts Before 'populate_containers' (Simulated Fresh Session)\n")

# To ensure the demo works correctly, we'll simulate a fresh session
# by replacing its container with a new, empty one.
s0_for_populate_demo.dfs = DataContainer()

print("Number of rows in each data table:")
for name in expected_dfs:
    # Before populating, the data tables don't exist in the container yet
    try:
        num_rows = len(s0_for_populate_demo.dfs.data[name])
    except KeyError:
        num_rows = "Not loaded"
    print(f"- {name}: {num_rows}")


# ---
# 2. Run the populate_containers function
# ---
populate_containers([s0_for_populate_demo])


# ---
# 3. The "After" State
# ---
print("\n## 2. Row Counts After 'populate_containers'\n")
print("Number of rows in each data table:")
for name in expected_dfs:
    try:
        num_rows = len(s0_for_populate_demo.dfs.data[name])
    except KeyError:
        num_rows = "Not loaded"
    print(f"- {name}: {num_rows}")


# ---
# 4. The Delta
# ---
print("\n## 3. The Delta (The Data We Filled In) ✅\n")
print("The function read the raw data files from disk and filled the data tables with rows.")


# Important: Run on all sessions now to prepare for the rest of the notebook.
populate_containers(sessions)

## 1. Row Counts Before 'populate_containers' (Simulated Fresh Session)

Number of rows in each data table:
- raw: Not loaded
- ttl: Not loaded
- phot_415: Not loaded
- phot_470: Not loaded

## 2. Row Counts After 'populate_containers'

Number of rows in each data table:
- raw: 8802
- ttl: 66
- phot_415: 103007
- phot_470: 103008

## 3. The Delta (The Data We Filled In) ✅

The function read the raw data files from disk and filled the data tables with rows.
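As a rough picture of what "populating" means here, the sketch below uses a thin dictionary wrapper and the standard `csv` module. `ContainerSketch`, `populate`, and `row_count` are hypothetical names, and the CSV content is made up; the real pipeline stores pandas DataFrames, which this dependency-free version replaces with lists of dicts.

```python
import csv
import io


class ContainerSketch:
    """Hypothetical stand-in for DataContainer: a thin wrapper around a dict."""

    def __init__(self):
        self.data = {}


def row_count(container, name):
    """Return the number of rows in a table, or 'Not loaded' if it is absent."""
    try:
        return len(container.data[name])
    except KeyError:
        return "Not loaded"


def populate(container, sources):
    """Read each CSV source into a list of row dicts and store it under its name."""
    for name, handle in sources.items():
        container.data[name] = list(csv.DictReader(handle))


container = ContainerSketch()
before = row_count(container, "ttl")

# In-memory CSV standing in for a file on disk (made-up columns and values).
sources = {"ttl": io.StringIO("SystemTimestamp,state\n0.0,1\n0.5,0\n")}
populate(container, sources)
after = row_count(container, "ttl")

print(before, "->", after)
```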


## Step 3: Synchronizing Timestamps

Now that our data is loaded, we face the next challenge: the clocks from the photometry camera and the behavioral computer were not started at the exact same time.

The `sync_session` function solves this by finding a common reference point in the data, the "Set Blank Images" event, and creating new timestamp columns aligned to it.
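The core idea can be sketched as subtracting, in each data stream, the time at which the shared reference event occurred on that stream's own clock. This is a simplified illustration that only removes a constant offset. `align_to_reference`, the example timestamps, and the event names other than "Set Blank Images" and "Hit" are assumptions, and `sync_session` itself may do more (for example, handle clock drift).

```python
def align_to_reference(timestamps, reference_time):
    """Shift timestamps so that the reference event sits at t = 0."""
    return [t - reference_time for t in timestamps]


# Behavioural log on the behaviour computer's clock: (seconds, event name).
behaviour_log = [
    (0.0, "Session Start"),      # hypothetical event name
    (12.5, "Set Blank Images"),  # the shared reference event
    (14.0, "Display Image"),     # hypothetical event name
    (15.25, "Hit"),
]
# Photometry frame times on the camera's clock (seconds, made-up values).
phot_times = [100.0, 100.5, 101.0, 101.5]
# Assumed: the time at which the same reference event was registered on the camera clock.
reference_on_camera_clock = 100.5

reference_on_behaviour_clock = next(
    t for t, name in behaviour_log if name == "Set Blank Images"
)

behaviour_synced = align_to_reference(
    [t for t, _ in behaviour_log], reference_on_behaviour_clock
)
phot_synced = align_to_reference(phot_times, reference_on_camera_clock)

print(behaviour_synced)
print(phot_synced)
```

After the shift, both streams share a time axis whose zero is the reference event, which is the role a column like `sec_from_trial_start` plays.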

The cell below will show this process in action. You'll see it adds new columns like `sec_from_trial_start` to our data tables, making them ready for analysis.

In [10]:
import pandas as pd

# Use a single session for the demonstration
s0_for_sync_demo = sessions[0]

# 1. Populate containers to load the data and capture the "before" state
populate_containers([s0_for_sync_demo])
cols_phot_before = set(s0_for_sync_demo.dfs.data['phot_470'].columns)
cols_raw_before = set(s0_for_sync_demo.dfs.data['raw'].columns)

print("## 1. Columns Before Syncing\n")
print(f"Photometry 'phot_470': {list(cols_phot_before)}")
print(f"Behavioral 'raw':      {list(cols_raw_before)}")

# ---
# 2. Run the synchronization function
sync_session(s0_for_sync_demo)
# ---

# 3. Capture the "after" state
cols_phot_after = set(s0_for_sync_demo.dfs.data['phot_470'].columns)
cols_raw_after = set(s0_for_sync_demo.dfs.data['raw'].columns)

print("\n## 2. Columns After Syncing\n")
print(f"Photometry 'phot_470': {list(cols_phot_after)}")
print(f"Behavioral 'raw':      {list(cols_raw_after)}")

# 4. Calculate and show the delta
delta_phot = cols_phot_after - cols_phot_before
delta_raw = cols_raw_after - cols_raw_before

print("\n## 3. The Delta (Our New Synced Columns) ✅\n")
print(f"New columns in 'phot_470': {list(delta_phot)}")
print(f"New columns in 'raw':      {list(delta_raw)}")


# Important: Run on all sessions now to prepare for the rest of the notebook
populate_containers(sessions)
for session in sessions:
    sync_session(session)

## 1. Columns Before Syncing

Photometry 'phot_470': ['LedState', 'FrameCounter', 'signal_0', 'signal_1', 'SystemTimestamp']
Behavioral 'raw':      ['Evnt_ID', 'Arg2_Value', 'Arg3_Name', 'Num_Args', 'Item_Name', 'Evnt_Time', 'Arg1_Value', 'Evnt_Name', 'Alias_Name', 'Group_ID', 'Arg3_Value', 'Arg4_Name', 'Arg4_Value', 'Arg5_Value', 'Arg1_Name', 'Arg2_Name', 'Arg5_Name']

## 2. Columns After Syncing

Photometry 'phot_470': ['sec_from_zero', 'LedState', 'FrameCounter', 'signal_0', 'signal_1', 'SystemTimestamp', 'sec_from_trial_start']
Behavioral 'raw':      ['Evnt_ID', 'Arg3_Name', 'Evnt_Time', 'Arg1_Value', 'Group_ID', 'Arg2_Name', 'Arg5_Name', 'Item_Name', 'Evnt_Name', 'Arg3_Value', 'sec_from_trial_start', 'Arg5_Value', 'sec_from_zero', 'Arg2_Value', 'Num_Args', 'Arg4_Value', 'Alias_Name', 'Arg1_Name', 'Arg4_Name']

## 3. The Delta (Our New Synced Columns) ✅

New columns in 'phot_470': ['sec_from_zero', 'sec_from_trial_start']
New columns in 'raw':      ['sec_from_zero', 'sec_from_trial

## Step 4: Creating an Index of Event Occurrences

While our data is loaded and synced, we still need an efficient way to find every instance of a specific behavior.

The `create_event_idxs_container_for_sessions` function builds a dictionary that points to the exact **row number** for each event of interest in our raw data file.

1.  **Define a Mapping**: The `actions_attr_dict` and `reward_attr_dict` serve as a map, telling the function which raw event names (like `"Correct Rejection"`) should be categorized under a simpler key (like `'cor_reject'`).

2.  **Collect Row Indices**: The function then scans the event log and creates a list of every row index that corresponds to a given key. For example, it will create a list containing the row numbers for every single `'hit'`.

3.  **Store the Index Lists**: These lists are stored in a new container on the session, `event_idxs_container`. This gives us a direct pointer to every occurrence of any behavior we want to analyze. The function also creates special indices, like finding the event that occurred immediately before or after a `'dispimg'` event.
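The three steps above can be sketched in a few lines. `build_event_idxs` is a hypothetical helper, the event log is invented, "Display Image" is a made-up raw event name, and matching on exact event-name equality is an assumption about how the library does the lookup.

```python
from collections import defaultdict


def build_event_idxs(event_names, name_to_key):
    """Map each simplified key to the row indices where its raw events occur."""
    idxs = defaultdict(list)
    for row, name in enumerate(event_names):
        key = name_to_key.get(name)
        if key is not None:  # ignore events that are not in the mapping
            idxs[key].append(row)
    return dict(idxs)


# Invented event log: one raw event name per row.
event_log = [
    "Display Image",
    "Hit",
    "Reward Collected Start ITI",
    "Display Image",
    "Correct Rejection",
    "Display Image",
    "Correction Trial Correct Rejection",
]
# Two raw names deliberately share the key 'cor_reject'.
mapping = {
    "Hit": "hit",
    "Correct Rejection": "cor_reject",
    "Correction Trial Correct Rejection": "cor_reject",
    "Reward Collected Start ITI": "reward_collect",
}

event_idxs = build_event_idxs(event_log, mapping)
print(event_idxs)
```

Because several raw names can map to one key, variants such as correction-trial rejections end up pooled in the same index list.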

In [11]:
actions_attr_dict = {
    "Hit": "hit",
    "Mistake": "mistake",
    "Missed Hit": "miss",
    "Correction Trial Correct Rejection": "cor_reject",
    "Correct Rejection": "cor_reject",
}

reward_attr_dict = {"Reward Collected Start ITI": "reward_collect"}

In [12]:
create_event_idxs_container_for_sessions(sessions, actions_attr_dict, reward_attr_dict)

In [13]:
for k, v in sessions[0].event_idxs_container.data.items():
    if k.startswith('before') or k.startswith('after'):
        continue
    print(k, v)

iti_touch [13, 589, 618, 655, 1737, 1745, 1753, 1938, 1947, 2466, 2521, 2654, 2686, 3864, 4524, 4533, 4543, 4783, 4861, 5428, 5436, 5444, 5453, 5461, 5704, 6024, 6032, 6543, 6551, 6900, 7403, 7670, 7853, 8292, 8300, 8380, 8636, 8644, 8652, 8660, 8690]
dispimg [97, 121, 145, 168, 190, 230, 255, 283, 307, 331, 369, 391, 414, 439, 462, 496, 519, 541, 565, 594, 627, 662, 684, 706, 744, 767, 790, 812, 834, 856, 878, 900, 922, 944, 966, 988, 1010, 1032, 1054, 1076, 1098, 1120, 1142, 1164, 1186, 1208, 1232, 1256, 1281, 1304, 1330, 1352, 1374, 1396, 1418, 1440, 1462, 1486, 1508, 1530, 1552, 1574, 1596, 1618, 1640, 1663, 1688, 1712, 1760, 1782, 1804, 1826, 1856, 1891, 1913, 1954, 1976, 1998, 2020, 2042, 2064, 2086, 2108, 2131, 2153, 2176, 2198, 2238, 2266, 2288, 2310, 2335, 2359, 2385, 2413, 2440, 2471, 2493, 2527, 2549, 2574, 2597, 2628, 2662, 2692, 2717, 2739, 2761, 2783, 2805, 2827, 2851, 2873, 2913, 2958, 2980, 3009, 3032, 3055, 3077, 3099, 3121, 3143, 3165, 3187, 3209, 3232, 3254, 3276, 32

## Step 5: Applying the Signal Processing Pipeline

This step applies our chosen signal processing pipeline to the raw fluorescence data.

1.  **Define Processing Parameters**: The `PLOTTING_CONFIG` dictionary sets the parameters for the analysis. In particular, it defines `fit_window_start` and `fit_window_end`, which specify the time window used to calculate the baseline for the dF/F calculation.

2.  **Apply the Pipeline**: The `PlottingSetup` class uses these parameters to process each signal. In this notebook, we use the `calculate_dff_and_zscore` method by default, which performs an isosbestic-based motion correction and then computes the z-scored dF/F.

3.  **Modular Design**: The code is structured so that different processing methods can be easily swapped. The main `apply_phot_iso_calculation` function can be passed any compatible function, like `calculate_dff_exp2_iso`, allowing for flexible analysis without changing the core workflow.

The final, processed signal is saved in new columns with a `_dff` suffix (for example `signal_0_dff`) in the photometry data tables.
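For readers unfamiliar with isosbestic correction, the sketch below shows one common variant: fit the 415 nm channel to the 470 nm channel by least squares over the fit window, compute dF/F against the fitted trace, and z-score against the same window. It is not necessarily what `calculate_dff_and_zscore` does; `fit_line` and `dff_zscore` are hypothetical helpers, the data is invented, and z-scoring against the fit window is an assumption.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def dff_zscore(signal_470, iso_415, fit_start, fit_end):
    """Isosbestic-corrected dF/F, z-scored against the fit window."""
    window = slice(fit_start, fit_end)
    slope, intercept = fit_line(iso_415[window], signal_470[window])
    # Scale the isosbestic channel so it predicts the signal channel.
    fitted = [slope * v + intercept for v in iso_415]
    dff = [(s - f) / f for s, f in zip(signal_470, fitted)]
    # Standardise using the mean and spread of dF/F inside the fit window.
    mu, sigma = mean(dff[window]), pstdev(dff[window])
    return [(d - mu) / sigma for d in dff]


# Invented traces: sample 5 carries a transient that the 415 nm channel lacks.
iso_415 = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.0]
signal_470 = [2.0, 2.5, 1.5, 2.1, 1.9, 3.0, 2.0]

z = dff_zscore(signal_470, iso_415, fit_start=0, fit_end=5)
print([round(v, 2) for v in z])
```

Fluctuations shared by both channels (typically motion) are absorbed by the fit, so only activity specific to the 470 nm channel stands out in the result.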

In [14]:
import pandas as pd

# Use a single session for the demonstration
s0_for_dff_demo = sessions[0]

# ---
# 1. The "Before" State
# ---
print("## 1. State Before Signal Processing\n")
phot_470_cols_before = set(s0_for_dff_demo.dfs.data['phot_470'].columns)
print(f"Columns in 'phot_470': {list(phot_470_cols_before)}")
print(f"Session has 'trial_start_idx' attribute: {hasattr(s0_for_dff_demo, 'trial_start_idx')}")

# ---
# 2. Run the PlottingSetup function
# ---
PlottingSetup(**PLOTTING_CONFIG['cpt']).apply_plotting_setup_to_sessions([s0_for_dff_demo])

# ---
# 3. The "After" State
# ---
print("\n## 2. State After Signal Processing\n")
phot_470_cols_after = set(s0_for_dff_demo.dfs.data['phot_470'].columns)
print(f"Columns in 'phot_470': {list(phot_470_cols_after)}")
print(f"Session has 'trial_start_idx' attribute: {hasattr(s0_for_dff_demo, 'trial_start_idx')}")
print(f"Value of 'trial_start_idx': {s0_for_dff_demo.trial_start_idx}")
print(f"Value of 'fit_start' index: {s0_for_dff_demo.fit_start}")


# ---
# 4. The Delta
# ---
delta_cols = phot_470_cols_after - phot_470_cols_before
print("\n## 3. The Delta (What We Added) ✅\n")
print("The function added new attributes to the session object to define key time windows.")
print(f"It also added the final processed dF/F columns to the data tables: {list(delta_cols)}")

## 1. State Before Signal Processing

Columns in 'phot_470': ['sec_from_zero', 'LedState', 'FrameCounter', 'signal_0', 'signal_1', 'SystemTimestamp', 'sec_from_trial_start']
Session has 'trial_start_idx' attribute: False

## 2. State After Signal Processing

Columns in 'phot_470': ['sec_from_zero', 'LedState', 'FrameCounter', 'signal_0', 'signal_0_dff', 'signal_1', 'SystemTimestamp', 'sec_from_trial_start', 'signal_1_dff']
Session has 'trial_start_idx' attribute: True
Value of 'trial_start_idx': 40392
Value of 'fit_start' index: 21192

## 3. The Delta (What We Added) ✅

The function added new attributes to the session object to define key time windows.
It also added the final processed dF/F columns to the data tables: ['signal_0_dff', 'signal_1_dff']


## Step 6: Creating Event-Triggered Signal Matrices

This is the final step in our data preparation pipeline. We now have our processed neural signals (dF/F) and a complete index of when every behavioral event occurred (`event_idxs_container`). The `assign_sessions_signal_info` function brings them together to create data structures ready for analysis and plotting.

Here’s how it works:
1.  **Iterate Through Events**: The code goes through every event type we defined (e.g., 'hit', 'miss', 'reward_collect').
2.  **Extract Signal Snippets**: For each individual occurrence of an event, it finds the corresponding time in the dF/F signal and cuts out a small window of data (a "snippet") from a few seconds before to a few seconds after the event.
3.  **Stack into a Matrix**: All the snippets for a given event type are collected and stacked into a **`signal_matrix`** (a 2D NumPy array). Each row in this matrix represents the neural signal for a single trial of that event type.

The final `signal_info` object contains these matrices, which makes downstream analysis, such as calculating the average response to an event or plotting a heatmap of all trials, straightforward.
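The snippet-and-stack idea can be sketched as follows. `extract_windows` is a hypothetical helper and the signal is invented. For simplicity the sketch assumes the event positions are already sample indices into the signal, whereas the pipeline first has to map behavioral events to the matching time in the photometry trace. It also uses plain lists in place of the 2D NumPy array.

```python
def extract_windows(signal, event_samples, n_before, n_after):
    """Return one row per event covering signal[i - n_before : i + n_after].

    Events whose window would run past either end of the signal are skipped,
    so every row has the same length (n_before + n_after).
    """
    matrix = []
    for i in event_samples:
        start, stop = i - n_before, i + n_after
        if start < 0 or stop > len(signal):
            continue
        matrix.append(signal[start:stop])
    return matrix


signal = [float(i) for i in range(20)]  # invented dF/F trace
# The events at samples 1 and 19 are too close to the edges and get dropped.
signal_matrix = extract_windows(signal, [1, 5, 10, 19], n_before=2, n_after=3)

# Average response across trials: the mean of each column.
mean_response = [sum(col) / len(col) for col in zip(*signal_matrix)]

print(signal_matrix)
print(mean_response)
```

Each row is one trial and each column one time offset relative to the event, which is why a column-wise mean gives the average event-triggered response.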

In [15]:
# Use a single session for the demonstration
s0_for_matrix_demo = sessions[0]

# ---
# 1. The "Before" State
# ---
print("## 1. State Before Creating Signal Matrices\n")
print(f"Session has 'signal_info' attribute: {hasattr(s0_for_matrix_demo, 'signal_info')}")

# ---
# 2. Run the assign_sessions_signal_info function
# ---
assign_sessions_signal_info([s0_for_matrix_demo])

# ---
# 3. The "After" State
# ---
print("\n## 2. State After Creating Signal Matrices\n")
print(f"Session has 'signal_info' attribute: {hasattr(s0_for_matrix_demo, 'signal_info')}")

# Show a few examples of what's inside the new 'signal_info' dictionary
print("\nExample contents of 'signal_info':")
count = 0
for key, value in s0_for_matrix_demo.signal_info.items():
    if 'before' not in key[2] and 'after' not in key[2]: # Filter for clarity
        matrix_shape = value['signal_matrix'].shape
        print(f"- Key: {key}, Matrix Shape: {matrix_shape}")
        count += 1
        if count >= 3:
            break

# ---
# 4. The Delta
# ---
print("\n## 3. The Delta (What We Added) ✅\n")
print("The function added the 'signal_info' attribute to the session.")
print("This dictionary contains the final event-triggered signal matrices, ready for analysis.")

## 1. State Before Creating Signal Matrices

Session has 'signal_info' attribute: False

## 2. State After Creating Signal Matrices

Session has 'signal_info' attribute: True

Example contents of 'signal_info':
- Key: ('DLS', 'G', 'iti_touch'), Matrix Shape: (41, 400)
- Key: ('DLS', 'G', 'dispimg'), Matrix Shape: (345, 400)
- Key: ('DLS', 'G', 'hit'), Matrix Shape: (18, 400)

## 3. The Delta (What We Added) ✅

The function added the 'signal_info' attribute to the session.
This dictionary contains the final event-triggered signal matrices, ready for analysis.


In [16]:
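# Step 5 ran the signal processing pipeline on sessions[0] only, so the other
# sessions have no 'signal_*_dff' columns yet. Calling assign_sessions_signal_info
# on them at this point raised KeyError: 'signal_0_dff', most likely for that reason.
# Process every session first, then build the signal matrices.
PlottingSetup(**PLOTTING_CONFIG['cpt']).apply_plotting_setup_to_sessions(sessions)
assign_sessions_signal_info(sessions)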