# Notebook 1: Extract LFP

### Overview
This Jupyter notebook extracts local field potential (LFP) traces from SpikeGadgets `.rec` files recorded during social competition trials. It preprocesses the raw electrophysiology data, synchronizes it with the corresponding video data, and computes derived metrics, including Z-scored LFPs.

### Inputs & Data Sources
- **Electrophysiology and LFP Parameters**: Constants like `EPHYS_SAMPLING_RATE`, `LFP_SAMPLING_RATE`, `TRIAL_DURATION`, etc., define basic parameters for LFP data processing.
- **Recording Information**: Stream IDs (`ECU_STREAM_ID`, `TRODES_STREAM_ID`), recording extension (`RECORDING_EXTENTION`, as spelled in the code), and paths to recording directories (`ALL_SESSION_DIR`).
- **DataFrames for Mapping and Timestamps**: `CHANNEL_MAPPING_DF` for per-subject channel mapping, `CHANNEL_CONVERSION_DF` for EIB-to-channel conversion, and `TONE_TIMESTAMP_DF` for tone timestamps, loaded from external sources.
- **Constants for DataFrame Columns**: Names for various columns in the DataFrame, defined in an all-caps snake case format, such as `EPHYS_INDEX_COL`, `LFP_INDEX_COL`, etc.

### Output & Utility
- **Processed Data**: The notebook outputs processed data, chiefly the Z-scored LFP traces that downstream analyses build on.
- **Data Files**: Outputs are saved as `CSV` and `Pickle` files in a specified output directory (`OUTPUT_DIR`).
- **Visualization**: The notebook does not explicitly include plotting, but the processed LFP data can be used to produce plots.

### Processing Workflow
1. **LFP Extraction and Preprocessing**: 
    - Iterates through recording sessions to process `.rec` files.
    - Applies a series of preprocessing steps like bandpass filtering, notch filtering, resampling, and Z-scoring on the LFP data.
    - Handles recordings that do not contain the specified stream IDs through exception handling.

2. **DataFrame Manipulation and Merging**:
    - Filtering `TONE_TIMESTAMP_DF` for trials with obtained LFP.
    - Addition of trial numbers and merging with `CHANNEL_MAPPING_DF`.
    - Dropping unnecessary columns and restructuring for analysis.

3. **LFP Trace Extraction for Each Trial and Brain Region**: 
    - Linking LFP calculations with trials.
    - Creating new rows for each brain region, extracting baseline, trial, and combined LFP traces.
    - Results in a comprehensive DataFrame that combines trial information with corresponding LFP traces.

4. **Data Storage**:
    - Saving processed DataFrame in both `CSV` and `Pickle` formats for easy access and future use.
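
The preprocessing chain in step 1 (bandpass, notch, resample, Z-score) can be sketched on a single synthetic trace. This is only an illustration: the notebook itself imports `spikeinterface.preprocessing`, whereas the sketch below swaps in `scipy.signal` as a stand-in. The function name `preprocess_lfp`, the filter order, and the notch quality factor `Q=30` are assumptions, not values taken from the notebook.

```python
import numpy as np
from scipy import signal

EPHYS_SAMPLING_RATE = 20000  # Hz, raw recording rate (notebook constant)
LFP_SAMPLING_RATE = 1000     # Hz, target LFP rate (notebook constant)
LFP_FREQ_MIN, LFP_FREQ_MAX = 0.5, 300  # Hz, bandpass edges (notebook constants)
ELECTRIC_NOISE_FREQ = 60     # Hz, mains interference (notebook constant)


def preprocess_lfp(raw, fs=EPHYS_SAMPLING_RATE):
    """Bandpass, notch, downsample, then z-score a 1-D trace (hypothetical helper)."""
    # Second-order sections stay numerically stable with a very low cutoff.
    sos = signal.butter(2, [LFP_FREQ_MIN, LFP_FREQ_MAX], btype="bandpass",
                        fs=fs, output="sos")
    filtered = signal.sosfiltfilt(sos, raw)
    # Narrow notch at the mains frequency; Q=30 is an illustrative choice.
    b, a = signal.iirnotch(ELECTRIC_NOISE_FREQ, Q=30, fs=fs)
    filtered = signal.filtfilt(b, a, filtered)
    # Polyphase resampling applies its own anti-aliasing filter.
    resampled = signal.resample_poly(filtered, up=1, down=fs // LFP_SAMPLING_RATE)
    # Z-score: zero mean, unit standard deviation.
    return (resampled - resampled.mean()) / resampled.std()


# Two seconds of synthetic data: a 10 Hz oscillation plus 60 Hz line noise.
t = np.arange(2 * EPHYS_SAMPLING_RATE) / EPHYS_SAMPLING_RATE
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * ELECTRIC_NOISE_FREQ * t)
lfp = preprocess_lfp(raw)
```

Filtering before downsampling keeps the notch and bandpass operating at the original sampling rate; the Z-score comes last so that it normalizes the final trace.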

### Usage Notes
- The notebook is project-specific and tailored for a particular dataset structure, requiring modifications for different data formats.
- Users should ensure file paths and directory names match their project's structure and adjust constants and parameters as needed for their specific analysis requirements.
- The notebook forms a part of a larger research framework, thus necessitating compatibility checks with other components of the project.

### Dependencies
- Python Libraries: `sys`, `os`, `glob`, `git` (GitPython), `numpy`, `pandas`, `spikeinterface`
- External Data: Channel mapping and tone timestamp files, along with SpikeGadgets `.rec` files.

### Customization and Scalability
- The notebook's modular design allows for easy adaptation to different datasets or extensions to include additional processing steps.
- Functions and processing steps are clearly demarcated, facilitating straightforward updates or enhancements.

### Conclusion
This notebook preprocesses LFP data from SpikeGadgets recordings for research on social competition trials. It offers a structured way to handle, process, and store electrophysiological data, which supports reproducible and efficient research workflows.

In [1]:
import sys
import os
import git

In [2]:
git_repo = git.Repo(".", search_parent_directories=True)
git_root = git_repo.git.rev_parse("--show-toplevel")

In [3]:
git_root

'/nancy/user/riwata/projects/reward_comp_ext'

In [4]:
sys.path.insert(0, os.path.join(git_root, 'src'))

In [5]:
# Imports of all used packages and libraries
import glob
import numpy as np
import pandas as pd

In [6]:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as sp

In [7]:
from utilities import helper

## Inputs & Data

Explanation of each input and where it comes from.

In [8]:
EPHYS_SAMPLING_RATE = 20000
LFP_SAMPLING_RATE = 1000
LFP_RESAMPLE_RATIO = EPHYS_SAMPLING_RATE / LFP_SAMPLING_RATE
TRIAL_DURATION = 10
FRAME_RATE = 22
ECU_STREAM_ID = "ECU"
TRODES_STREAM_ID = "trodes"
LFP_FREQ_MIN = 0.5
LFP_FREQ_MAX = 300
ELECTRIC_NOISE_FREQ = 60
RECORDING_EXTENTION = "*.rec"

In [9]:
EPHYS_INDEX_COL = "time_stamp_index"
LFP_INDEX_COL = "lfp_index"
EPHYS_TIMESTAMP_COL = "time"
RECORDING_FILE_COL = "recording_file"
RECORDING_DIR_COL = "recording_dir"
BASELINE_LFP_INDEX_RANGE_COL = "baseline_lfp_index_range"
TRIAL_LFP_INDEX_RANGE_COL = "trial_lfp_index_range"
BASELINE_EPHYS_INDEX_RANGE_COL = "baseline_ephys_index_range"
TRIAL_EPHYS_INDEX_RANGE_COL = "trial_ephys_index_range"
BASELINE_VIDEOFRAME_RANGE_COL = "baseline_videoframe_range"
TRIAL_VIDEOFRAME_RANGE_COL = "trial_videoframe_range"
CURRENT_SUBJECT_COL = "current_subject"
ALL_CH_LFP_COL = "all_ch_lfp"
SUBJECT_COL = "Subject"
TRIAL_NUMBER_COL = "trial_number"
SPIKE_INTERFACE_COL = "spike_interface"
EIB_COL = "eib"
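
The `*_lfp_index_range` column names above refer to index ranges on the downsampled LFP trace. A minimal sketch of how such ranges could be derived from a tone onset given in raw ephys samples follows. It assumes, without confirmation from the notebook, that the baseline is the `TRIAL_DURATION` seconds immediately before the tone; `lfp_index_ranges` is a hypothetical helper.

```python
EPHYS_SAMPLING_RATE = 20000
LFP_SAMPLING_RATE = 1000
LFP_RESAMPLE_RATIO = EPHYS_SAMPLING_RATE / LFP_SAMPLING_RATE  # 20.0
TRIAL_DURATION = 10  # seconds


def lfp_index_ranges(tone_onset_ephys_index):
    """Return (baseline, trial) LFP index ranges around a tone onset.

    Assumption: the baseline window is the TRIAL_DURATION seconds
    immediately preceding the tone.
    """
    # Convert from raw ephys samples to LFP samples.
    onset = int(tone_onset_ephys_index // LFP_RESAMPLE_RATIO)
    # Number of LFP samples in one trial-length window.
    n_samples = TRIAL_DURATION * LFP_SAMPLING_RATE
    return (onset - n_samples, onset), (onset, onset + n_samples)


baseline_range, trial_range = lfp_index_ranges(1_000_000)
```

A trace could then be sliced with `trace[slice(*trial_range)]`.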

In [10]:
LFP_TRACE_COLUMNS = ["session_dir", "recording", "metadata_dir", "metadata_file", "first_dtype_name", "first_item_data", "last_dtype_name", "last_item_data", 'all_subjects', 'current_subject', 'filename']
VIDEO_COLUMNS = ['session_dir', 'recording', 'metadata_dir', 'metadata_file',
      'session_path', 'first_dtype_name', 'first_item_data',
       'all_subjects', 'current_subject', 'filename']

In [11]:
# NOTE: Change based on individual project data location

# Spreadsheet of channel mapping
CHANNEL_MAPPING_DF = pd.read_excel("./data/rce3_per_subject_channel_mapping.xlsx")
# Spreadsheet converting EIB board channels to Trodes and SpikeInterface channel numbers
CHANNEL_CONVERSION_DF = pd.read_excel("./data/EIB_to_channel_mapping.xlsx", header=1)

In [12]:
CHANNEL_CONVERSION_DF

Unnamed: 0,eib_board,Tetrode,spike_gadgets_minilogger_left,trodes_left,spike_interface_left,Unnamed: 6,spike_gadgets_minilogger_right,trodes_right,spike_interface_right,Unnamed: 9,spike_gadgets_minilogger_notion,trodes_notion,spike_interface_notion
0,Reference 1,Reference 1,Reference,,,,Reference,,,,,,
1,0,1,0,1.0,0.0,,16,17.0,16.0,,31.0,32.0,31.0
2,1,1,1,2.0,1.0,,17,18.0,17.0,,30.0,31.0,30.0
3,2,1,2,3.0,2.0,,18,19.0,18.0,,29.0,30.0,29.0
4,3,1,3,4.0,3.0,,19,20.0,19.0,,28.0,29.0,28.0
5,4,2,4,5.0,4.0,,20,21.0,20.0,,27.0,28.0,27.0
6,5,2,5,6.0,5.0,,21,22.0,21.0,,26.0,27.0,26.0
7,6,2,6,7.0,6.0,,22,23.0,22.0,,25.0,26.0,25.0
8,7,2,7,8.0,7.0,,23,24.0,23.0,,24.0,25.0,24.0
9,8,3,8,9.0,8.0,,24,25.0,24.0,,23.0,24.0,23.0


In [13]:
CHANNEL_MAPPING_DF.head()

Unnamed: 0,Cohort,Subject,eib_mPFC,eib_vHPC,eib_BLA,eib_LH,eib_MD,spike_interface_mPFC,spike_interface_vHPC,spike_interface_BLA,spike_interface_LH,spike_interface_MD,notes,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19
0,,3.1,,31.0,30.0,29.0,28.0,23.0,16.0,17.0,18.0,19.0,,,,,,"Here the channels, Leo",,
1,,3.3,,15.0,14.0,13.0,12.0,21.0,15.0,14.0,13.0,12.0,,,,,,Mouse,Channel,Brain region
2,,3.4,,31.0,30.0,29.0,28.0,22.0,16.0,17.0,18.0,19.0,,,,,,3.3,12,MD
3,,4.2,,15.0,14.0,13.0,12.0,5.0,15.0,14.0,13.0,12.0,,,,,,,13,LH
4,,4.3,,31.0,30.0,29.0,28.0,14.0,16.0,17.0,18.0,19.0,,,,,,,14,BLA


## Outputs

Describe each output that the notebook creates. 

- Is it a plot or is it data?

- How valuable is the output and why is it valuable or useful?

In [14]:
# Inputs and required data loading
# Input variable names are in all-caps snake case
# Whenever an input changes or is used for processing,
# the variables are in lowercase snake case
OUTPUT_DIR = r"./proc/" # where data is saved should always be shown in the inputs
os.makedirs(OUTPUT_DIR, exist_ok=True)
OUTPUT_PREFIX = "rce_pilot_3_alone_comp"

In [15]:
FULL_LFP_TRACES_PKL = "{}_01_lfp_traces_and_frames.pkl".format(OUTPUT_PREFIX)
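
A short sketch of why a pickle output is useful alongside CSV when cells hold arrays, as the `all_ch_lfp` column name suggests. The one-column example frame and the temporary directory are hypothetical stand-ins for the real data and `OUTPUT_DIR`.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

OUTPUT_PREFIX = "rce_pilot_3_alone_comp"
FULL_LFP_TRACES_PKL = "{}_01_lfp_traces_and_frames.pkl".format(OUTPUT_PREFIX)

# Hypothetical frame whose cells hold NumPy arrays.
example_df = pd.DataFrame({"all_ch_lfp": [np.zeros(5), np.ones(5)]})

# Pickle round trip: array cells come back as arrays.
with tempfile.TemporaryDirectory() as output_dir:
    pkl_path = os.path.join(output_dir, FULL_LFP_TRACES_PKL)
    example_df.to_pickle(pkl_path)
    restored_df = pd.read_pickle(pkl_path)

# CSV round trip: array cells are written as text and read back as strings.
csv_df = pd.read_csv(io.StringIO(example_df.to_csv(index=False)))

pickle_cell = restored_df.loc[0, "all_ch_lfp"]
csv_cell = csv_df.loc[0, "all_ch_lfp"]
```

The CSV copy stays convenient for quick inspection, while the pickle preserves the array-valued columns for later notebooks.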

# Functions

## Processing

Describe what is done to the data here and how inputs are manipulated to generate outputs. 

In [16]:
CHANNEL_CONVERSION_DF.columns

Index(['eib_board', 'Tetrode', 'spike_gadgets_minilogger_left', 'trodes_left',
       'spike_interface_left', ' ', 'spike_gadgets_minilogger_right',
       'trodes_right', 'spike_interface_right', 'Unnamed: 9',
       'spike_gadgets_minilogger_notion', 'trodes_notion',
       'spike_interface_notion'],
      dtype='object')

In [17]:
# Cast every column to string and strip surrounding whitespace so values can serve as lookup keys
for col in CHANNEL_CONVERSION_DF:
    CHANNEL_CONVERSION_DF[col] = CHANNEL_CONVERSION_DF[col].astype(str).str.strip()

In [19]:
left_conversion_dict = dict(zip(CHANNEL_CONVERSION_DF['eib_board'], CHANNEL_CONVERSION_DF['spike_interface_left']))
right_conversion_dict = dict(zip(CHANNEL_CONVERSION_DF['eib_board'], CHANNEL_CONVERSION_DF['spike_interface_right']))

In [21]:
notion_conversion_dict = dict(zip(CHANNEL_CONVERSION_DF['eib_board'], CHANNEL_CONVERSION_DF['spike_interface_notion']))

In [22]:
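# Keep only rows that have a subject; copy so later column assignments act on an independent frame
CHANNEL_MAPPING_DF = CHANNEL_MAPPING_DF.dropna(subset=["Subject"]).copy()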

In [23]:
for col in CHANNEL_MAPPING_DF:
    if "eib" in col and "mPFC" not in col:
        CHANNEL_MAPPING_DF[col] = CHANNEL_MAPPING_DF[col].astype(int).astype(str)
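
The cast through `int` before `str` matters because the EIB values are read from the spreadsheet as floats. A small sketch with made-up values shows the difference. It also shows that a column containing NaN cannot be cast to `int`, which is presumably why `eib_mPFC` (empty in the output above) is skipped.

```python
import pandas as pd

# Floats, as channel numbers typically arrive from a spreadsheet.
eib_channels = pd.Series([31.0, 15.0])

# Casting straight to str keeps the decimal point.
direct_keys = eib_channels.astype(str).tolist()
# Going through int first yields clean integer strings.
clean_keys = eib_channels.astype(int).astype(str).tolist()

# A column containing NaN cannot be cast to int.
try:
    pd.Series([float("nan")]).astype(int)
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True
```

Only the clean form (`"31"`) would match dictionary keys that were built from integer-like strings.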

In [25]:
# Add SpikeInterface channel columns for each candidate layout (left, right, notion)
eib_columns = [col for col in CHANNEL_MAPPING_DF if "eib" in col and "mPFC" not in col]
conversion_dicts = {
    "left": left_conversion_dict,
    "right": right_conversion_dict,
    "notion": notion_conversion_dict,
}
for layout, conversion_dict in conversion_dicts.items():
    for col in eib_columns:
        brain_region = col.replace("eib_", "")
        CHANNEL_MAPPING_DF["{}_{}".format(layout, brain_region)] = CHANNEL_MAPPING_DF[col].map(conversion_dict)
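
The lookup pattern used here (a dictionary built with `zip`, applied with `Series.map`) can be illustrated on a miniature table. The frames below are hypothetical; the four channel pairs mirror the `left` layout visible in the notebook output, and subject `9.9` with channel `"99"` is invented to show an unmatched key.

```python
import pandas as pd

# Hypothetical miniature of the EIB-to-SpikeInterface conversion table.
conversion_df = pd.DataFrame({
    "eib_board": ["28", "29", "30", "31"],
    "spike_interface_left": ["19", "18", "17", "16"],
})
left_lookup = dict(zip(conversion_df["eib_board"],
                       conversion_df["spike_interface_left"]))

# Hypothetical per-subject mapping; "99" has no entry in the lookup.
mapping_df = pd.DataFrame({"Subject": [3.1, 9.9], "eib_vHPC": ["31", "99"]})
mapping_df["left_vHPC"] = mapping_df["eib_vHPC"].map(left_lookup)
```

`Series.map` leaves NaN wherever a key is missing from the dictionary, so a mismatch in key type or formatting shows up as NaN rather than raising an error.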

In [26]:
CHANNEL_MAPPING_DF

Unnamed: 0,Cohort,Subject,eib_mPFC,eib_vHPC,eib_BLA,eib_LH,eib_MD,spike_interface_mPFC,spike_interface_vHPC,spike_interface_BLA,...,left_LH,left_MD,right_vHPC,right_BLA,right_LH,right_MD,notion_vHPC,notion_BLA,notion_LH,notion_MD
0,,3.1,,31,30,29,28,23.0,16.0,17.0,...,18,19,0,1,2,3,0.0,1.0,2.0,3.0
1,,3.3,,15,14,13,12,21.0,15.0,14.0,...,13,12,31,30,29,28,16.0,17.0,18.0,19.0
2,,3.4,,31,30,29,28,22.0,16.0,17.0,...,18,19,0,1,2,3,0.0,1.0,2.0,3.0
3,,4.2,,15,14,13,12,5.0,15.0,14.0,...,13,12,31,30,29,28,16.0,17.0,18.0,19.0
4,,4.3,,31,30,29,28,14.0,16.0,17.0,...,18,19,0,1,2,3,0.0,1.0,2.0,3.0
5,,4.4,,15,14,13,12,22.0,15.0,14.0,...,13,12,31,30,29,28,16.0,17.0,18.0,19.0
6,,5.2,,15,14,13,12,22.0,15.0,14.0,...,13,12,31,30,29,28,16.0,17.0,18.0,19.0
7,,5.3,,15,14,13,12,10.0,15.0,14.0,...,13,12,31,30,29,28,16.0,17.0,18.0,19.0
8,,5.4,,31,30,29,28,22.0,16.0,17.0,...,18,19,0,1,2,3,0.0,1.0,2.0,3.0


In [27]:
CHANNEL_MAPPING_DF[["Subject"] + [col for col in CHANNEL_MAPPING_DF if "spike_interface" in col]]

Unnamed: 0,Subject,spike_interface_mPFC,spike_interface_vHPC,spike_interface_BLA,spike_interface_LH,spike_interface_MD
0,3.1,23.0,16.0,17.0,18.0,19.0
1,3.3,21.0,15.0,14.0,13.0,12.0
2,3.4,22.0,16.0,17.0,18.0,19.0
3,4.2,5.0,15.0,14.0,13.0,12.0
4,4.3,14.0,16.0,17.0,18.0,19.0
5,4.4,22.0,15.0,14.0,13.0,12.0
6,5.2,22.0,15.0,14.0,13.0,12.0
7,5.3,10.0,15.0,14.0,13.0,12.0
8,5.4,22.0,16.0,17.0,18.0,19.0


In [28]:
CHANNEL_MAPPING_DF[["Subject"] + [col for col in CHANNEL_MAPPING_DF if "eib" in col]]

Unnamed: 0,Subject,eib_mPFC,eib_vHPC,eib_BLA,eib_LH,eib_MD
0,3.1,,31,30,29,28
1,3.3,,15,14,13,12
2,3.4,,31,30,29,28
3,4.2,,15,14,13,12
4,4.3,,31,30,29,28
5,4.4,,15,14,13,12
6,5.2,,15,14,13,12
7,5.3,,15,14,13,12
8,5.4,,31,30,29,28


In [31]:
CHANNEL_MAPPING_DF[["Subject"] + [col for col in CHANNEL_MAPPING_DF if "notion" in col]]

Unnamed: 0,Subject,notion_vHPC,notion_BLA,notion_LH,notion_MD
0,3.1,0.0,1.0,2.0,3.0
1,3.3,16.0,17.0,18.0,19.0
2,3.4,0.0,1.0,2.0,3.0
3,4.2,16.0,17.0,18.0,19.0
4,4.3,0.0,1.0,2.0,3.0
5,4.4,16.0,17.0,18.0,19.0
6,5.2,16.0,17.0,18.0,19.0
7,5.3,16.0,17.0,18.0,19.0
8,5.4,0.0,1.0,2.0,3.0


In [29]:
CHANNEL_MAPPING_DF[["Subject"] + [col for col in CHANNEL_MAPPING_DF if "left" in col]]

Unnamed: 0,Subject,left_vHPC,left_BLA,left_LH,left_MD
0,3.1,16,17,18,19
1,3.3,15,14,13,12
2,3.4,16,17,18,19
3,4.2,15,14,13,12
4,4.3,16,17,18,19
5,4.4,15,14,13,12
6,5.2,15,14,13,12
7,5.3,15,14,13,12
8,5.4,16,17,18,19


In [30]:
CHANNEL_MAPPING_DF[["Subject"] + [col for col in CHANNEL_MAPPING_DF if "right" in col]]

Unnamed: 0,Subject,right_vHPC,right_BLA,right_LH,right_MD
0,3.1,0,1,2,3
1,3.3,31,30,29,28
2,3.4,0,1,2,3
3,4.2,31,30,29,28
4,4.3,0,1,2,3
5,4.4,31,30,29,28
6,5.2,31,30,29,28
7,5.3,31,30,29,28
8,5.4,0,1,2,3


## Notes on subject channel mapping

3.1 (confirmed with 3)
    - 20240320
        - 31 looks different but 28, 29, 30 look similar
        - 0, 1, 2, 3 look different
        - matches right
    - 20240323
        - 31 looks broken or at least very different
        - double check by making bigger
        - 0 to 3 looks different
            - matches right
    - 20240317
        - 0 to 3 look different
            - matches right

3.3 (confirmed with 3)
    - 20240320
        - 28, 29, 30, 31 look different-ish. Need to reconfirm, maybe with a length shorter than 500?
    - 20240322
        - 1, 16 to 19 might be broken
        - hard to know which ones are actually different
    - 20240317
        - maybe 28 to 31 are different? hard to tell
            - matches right
    - 20240318 
        - 28 to 31 definitely looks different
            - matches right alignment
3.4 (kinda confirmed with 3)
    - 20240323
        - maybe 28 to 31 are different??
    - 20240322
        - maybe 28 to 31 are different??
            - Doesn't match any of the mappings
            - Possibly 16 to 19 are different which matches left
    - 20240318
        - maybe 28 to 31 are different?? 
        
4.2
    - 20240320
        - 28 to 31 are different
            - matches right
    - 20240323
        - 28 to 31 are different
            - matches right
    - 20240317
        - 28 to 31 are different
4.3 (confirmed by 3)
    - 20240320
        - 0 to 3 and 28 to 31 look different
    - 20240322
        - Maybe 28 to 31 are different
            - Doesn't match any mapping
    - 20240317
        - 28 to 31 are different
            - Doesn't match any mapping
4.4
    - 20240323 
        - 28 to 31 are a lot bigger
        - double check by making smaller
    - 20240322
        - 28 to 31 are a lot bigger
            - matches right
5.2
    - 20240323
        - Maybe 0 to 3 are different? Doesn't match any mapping
        - Maybe 28 to 31 are different? Matches right
5.3
    - 20240323
        - 12 to 15 are different
            - matches left
        - 28 to 31 are different
            - matches right
5.4