# Notebook 1: Extract Z-scored LFP

### Overview
This Jupyter notebook extracts local field potential (LFP) traces from Spikegadgets `.rec` files recorded during social competition trials. It includes procedures for preprocessing the raw electrophysiology data, synchronizing it with the corresponding video data, and computing metrics such as Z-scored LFPs.

### Inputs & Data Sources
- **Electrophysiology and LFP Parameters**: Constants like `EPHYS_SAMPLING_RATE`, `LFP_SAMPLING_RATE`, `TRIAL_DURATION`, etc., define basic parameters for LFP data processing.
- **Recording Information**: Stream IDs (`ECU_STREAM_ID`, `TRODES_STREAM_ID`), the recording file pattern (`RECORDING_EXTENTION`, spelled this way in the code), and the list of session directories (`ALL_SESSION_DIR`).
- **DataFrames for Mapping and Timestamps**: `CHANNEL_MAPPING_DF` for channel mapping, and `TONE_TIMESTAMP_DF` for tone timestamps, loaded from external sources.
- **Constants for DataFrame Columns**: Names for various columns in the DataFrame, defined in an all-caps snake case format, such as `EPHYS_INDEX_COL`, `LFP_INDEX_COL`, etc.

### Output & Utility
- **Processed Data**: The main output is a DataFrame of Z-scored LFP traces, with one trace column per brain region, for use in downstream analyses.
- **Data Files**: Outputs are saved in various formats (`CSV`, `Pickle`) in a specified output directory (`OUTPUT_DIR`).
- **Visualization**: The notebook does not produce plots itself; the processed LFP traces can be plotted in downstream analyses.

### Processing Workflow
1. **LFP Extraction and Preprocessing**: 
    - Iterates through recording sessions to process `.rec` files.
    - Applies a series of preprocessing steps like bandpass filtering, notch filtering, resampling, and Z-scoring on the LFP data.
    - Exception handling for cases where the recording doesn't contain specified stream IDs.

2. **DataFrame Manipulation and Merging**:
    - Filtering `TONE_TIMESTAMP_DF` for trials with obtained LFP.
    - Addition of trial numbers and merging with `CHANNEL_MAPPING_DF`.
    - Dropping unnecessary columns and restructuring for analysis.

3. **LFP Trace Extraction for Each Trial and Brain Region**: 
    - Linking LFP calculations with trials.
    - Creating new rows for each brain region, extracting baseline, trial, and combined LFP traces.
    - Results in a comprehensive DataFrame that combines trial information with corresponding LFP traces.

4. **Data Storage**:
    - Saving processed DataFrame in both `CSV` and `Pickle` formats for easy access and future use.
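
The storage format in step 4 matters because the trace columns hold NumPy arrays. The sketch below uses a toy DataFrame (invented values, a temporary directory, and hypothetical file names) to show that Pickle returns the array unchanged while CSV returns only its string representation.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy stand-in for the final DataFrame: one row, one array-valued trace column
toy_df = pd.DataFrame({
    "recording_file": ["example_recording"],
    "mPFC_lfp_trace": [np.array([0.1, -0.2, 0.3], dtype=np.float32)],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "toy_traces.pkl")  # hypothetical file name
    csv_path = os.path.join(tmp_dir, "toy_traces.csv")  # hypothetical file name
    toy_df.to_pickle(pkl_path)
    toy_df.to_csv(csv_path, index=False)
    from_pickle = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)

# Pickle preserves the array object; CSV gives back text that would need re-parsing
pickle_value = from_pickle["mPFC_lfp_trace"].iloc[0]
csv_value = from_csv["mPFC_lfp_trace"].iloc[0]
print(type(pickle_value).__name__, type(csv_value).__name__)
```

For long traces the CSV text is also truncated with an ellipsis by NumPy's default printing, so Pickle is the safer choice for array-valued columns.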

### Usage Notes
- The notebook is project-specific and tailored for a particular dataset structure, requiring modifications for different data formats.
- Users should ensure file paths and directory names match their project's structure and adjust constants and parameters as needed for their specific analysis requirements.
- The notebook forms a part of a larger research framework, thus necessitating compatibility checks with other components of the project.

### Dependencies
- Python Libraries: `sys`, `os`, `glob`, `git` (GitPython), `numpy`, `pandas`, `spikeinterface`
- External Data: Channel mapping and tone timestamp files, along with Spikegadgets `.rec` files.

### Customization and Scalability
- The notebook's modular design allows for easy adaptation to different datasets or extensions to include additional processing steps.
- Functions and processing steps are clearly demarcated, facilitating straightforward updates or enhancements.

### Conclusion
This notebook is a vital tool in the preprocessing and analysis of LFP data from Spikegadgets recordings, integral to neuroscience research focused on social competition trials. It offers a structured approach to handle, process, and store electrophysiological data, ensuring reproducibility and efficiency in research workflows.

In [1]:
import sys
import os
import git

In [2]:
git_repo = git.Repo(".", search_parent_directories=True)
git_root = git_repo.git.rev_parse("--show-toplevel")

In [3]:
git_root

'/blue/npadillacoreano/ryoi360/projects/reward_competition_extention'

In [4]:
sys.path.insert(0, os.path.join(git_root, 'src'))

In [5]:
# Imports of all used packages and libraries
import glob
import numpy as np
import pandas as pd

In [6]:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as sp

In [7]:
from utilities import helper

## Inputs & Data

Explanation of each input and where it comes from.

In [8]:
EPHYS_SAMPLING_RATE = 20000
LFP_SAMPLING_RATE = 1000
TRIAL_DURATION = 10
FRAME_RATE = 22
ECU_STREAM_ID = "ECU"
TRODES_STREAM_ID = "trodes"
LFP_FREQ_MIN = 0.5
LFP_FREQ_MAX = 300
ELECTRIC_NOISE_FREQ = 60
RECORDING_EXTENTION = "*.rec"
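
These constants tie together the three clocks in the dataset: raw ephys samples, LFP samples, and video frames. The sketch below reproduces the index ranges of the first row of `TONE_TIMESTAMP_DF` displayed later in this notebook (tone at ephys index 982229). The helper `ephys_to_lfp_index` is hypothetical, and the use of floor division is an assumption that happens to match that row.

```python
EPHYS_SAMPLING_RATE = 20000  # Hz
LFP_SAMPLING_RATE = 1000     # Hz
TRIAL_DURATION = 10          # seconds
FRAME_RATE = 22              # video frames per second

# Number of raw ephys samples per LFP sample after resampling
DOWNSAMPLE_FACTOR = EPHYS_SAMPLING_RATE // LFP_SAMPLING_RATE


def ephys_to_lfp_index(ephys_index):
    """Hypothetical helper: map a raw ephys sample index to an LFP sample index."""
    return ephys_index // DOWNSAMPLE_FACTOR


tone_ephys_index = 982229  # time_stamp_index of the first trial in TONE_TIMESTAMP_DF
window = TRIAL_DURATION * EPHYS_SAMPLING_RATE  # samples in one 10 s window

baseline_ephys_range = (tone_ephys_index - window, tone_ephys_index)
trial_ephys_range = (tone_ephys_index, tone_ephys_index + window)

baseline_lfp_range = tuple(ephys_to_lfp_index(i) for i in baseline_ephys_range)
trial_lfp_range = tuple(ephys_to_lfp_index(i) for i in trial_ephys_range)
frames_per_window = TRIAL_DURATION * FRAME_RATE

print(baseline_lfp_range, trial_lfp_range, frames_per_window)
```

Each 10 s window therefore spans 200000 ephys samples, 10000 LFP samples, and 220 video frames.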

In [9]:
EPHYS_INDEX_COL = "time_stamp_index"
LFP_INDEX_COL = "lfp_index"
EPHYS_TIMESTAMP_COL = "time"

In [10]:
RECORDING_FILE_COL = "recording_file"
RECORDING_DIR_COL = "recording_dir"
BASELINE_LFP_INDEX_RANGE_COL = "baseline_lfp_index_range"
TRIAL_LFP_INDEX_RANGE_COL = "trial_lfp_index_range"
BASELINE_EPHYS_INDEX_RANGE_COL = "baseline_ephys_index_range"
TRIAL_EPHYS_INDEX_RANGE_COL = "trial_ephys_index_range"
BASELINE_VIDEOFRAME_RANGE_COL = "baseline_videoframe_range"
TRIAL_VIDEOFRAME_RANGE_COL = "trial_videoframe_range"
CURRENT_SUBJECT_COL = "current_subject"
ALL_CH_LFP_COL = "all_ch_lfp"
SUBJECT_COL = "Subject"
TRIAL_NUMBER_COL = "trial_number"
SPIKE_INTERFACE_COL = "spike_interface"
EIB_COL = "eib"

In [11]:
SPIKE_INTERFACE_COL.upper()

'SPIKE_INTERFACE'

In [12]:
# NOTE: Change based on individual project data location

# Spreadsheet of channel mapping
CHANNEL_MAPPING_DF = pd.read_excel("../../data/rce_per_subject_channel_mapping.xlsx")
# Spreadsheet of tone time
TONE_TIMESTAMP_DF = pd.read_pickle("../../proc/rce_tone_timestamps.pkl")
# TONE_TIMESTAMP_DF = pd.read_pickle("../../data/rce_per_trial_labeling.xlsx")

In [13]:
CHANNEL_MAPPING_DF.head()

Unnamed: 0,Cohort,Subject,eib_mPFC,eib_vHPC,eib_BLA,eib_LH,eib_MD,spike_interface_mPFC,spike_interface_vHPC,spike_interface_BLA,spike_interface_LH,spike_interface_MD
0,1,6.1,,15,14,13,31,21.0,15.0,14.0,13.0,16.0
1,1,6.2,,15,14,13,31,,,,,
2,1,6.3,,15,14,13,31,,,,,
3,1,6.4,,15,14,13,31,,,,,
4,2,1.1,,16,17,18,19,5.0,31.0,30.0,29.0,28.0


In [14]:
# NOTE: Change based on individual project data location
# Where all the recording files are being saved
ALL_SESSION_DIR = glob.glob("/blue/npadillacoreano/ryoi360/reward_competition_extention/data/standard/2023_06_16/*.rec")

In [15]:
ALL_SESSION_DIR

['/blue/npadillacoreano/ryoi360/reward_competition_extention/data/standard/2023_06_16/20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2.rec']

## Outputs

Describe each output that the notebook creates. 

- Is it a plot or is it data?

- How valuable is the output and why is it valuable or useful?

In [16]:
# Inputs and Required data loading
# Input variable names are in all-caps snake case
# Whenever an input changes or is used for processing,
# the variables are in all-lowercase snake case
OUTPUT_DIR = r"./proc/" # where data is saved should always be shown in the inputs
os.makedirs(OUTPUT_DIR, exist_ok=True)

In [17]:
TONE_TIMESTAMPS_CSV = "rce_tone_timestamps.csv"
TONE_TIMESTAMPS_PKL = "rce_tone_timestamps.pkl"
FULL_LFP_TRACES_PKL = "full_baseline_and_trial_lfp_traces.pkl"

# Functions

## Processing

Describe what is done to the data here and how inputs are manipulated to generate outputs. 

In [18]:
# As much code and as many cells as required
# includes EDA and playing with data
# GO HAM!

# Ideally functions are defined here first and then data is processed using the functions

# function names are short and in snake case all lowercase
# a function name should be unique but does not have to describe the function
# doc strings describe functions not function names




# Extracting the LFP

In [19]:
recording_name_to_all_ch_lfp = {}
# Going through all the recording sessions 
for session_dir in ALL_SESSION_DIR:
    # Going through all the recordings in each session
    for recording_path in glob.glob(os.path.join(session_dir, RECORDING_EXTENTION)):
        try:
            recording_basename = os.path.splitext(os.path.basename(recording_path))[0]
            # Reading the ECU stream first: this raises if the recording has no ECU component,
            # in which case the recording is skipped and the loop moves on to the next one
            current_recording = se.read_spikegadgets(recording_path, stream_id=ECU_STREAM_ID)
            # Reading the trodes stream, which is the one used for the LFP below
            current_recording = se.read_spikegadgets(recording_path, stream_id=TRODES_STREAM_ID)
            print(recording_basename)
            # Preprocessing the LFP
            current_recording = sp.bandpass_filter(current_recording, freq_min=LFP_FREQ_MIN, freq_max=LFP_FREQ_MAX)
            current_recording = sp.notch_filter(current_recording, freq=ELECTRIC_NOISE_FREQ)
            current_recording = sp.resample(current_recording, resample_rate=LFP_SAMPLING_RATE)
            current_recording = sp.zscore(current_recording)
            recording_name_to_all_ch_lfp[recording_basename] = current_recording
        except Exception as error:
            # Report the problem (e.g. a missing stream ID) and continue with the next recording
            print("An exception occurred:", error)
    




20230616_111904_standard_comp_to_training_D4_subj_1-4_t4b3L_box1_merged
An exception occurred: stream_id trodes is not in ['ECU']
20230616_111904_standard_comp_to_training_D4_subj_1-2_t2b2L_box2_merged
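
The last preprocessing step rescales each channel so that amplitudes are comparable across channels and recordings. SpikeInterface's `zscore` appears to default to a robust median/MAD estimate rather than mean and standard deviation; that is an assumption worth checking against the documentation of the installed version. The NumPy-only sketch below illustrates both variants on synthetic data. The function names are hypothetical and this is not the library's implementation.

```python
import numpy as np


def zscore_mean_std(traces):
    """Classic z-score per channel; traces has shape (n_samples, n_channels)."""
    return (traces - traces.mean(axis=0)) / traces.std(axis=0)


def zscore_median_mad(traces):
    """Robust z-score per channel using the median and the scaled MAD."""
    medians = np.median(traces, axis=0)
    mads = np.median(np.abs(traces - medians), axis=0)
    # Dividing the MAD by ~0.6745 makes it estimate the standard deviation of Gaussian data
    return (traces - medians) / (mads / 0.6744897501960817)


rng = np.random.default_rng(0)
# Synthetic stand-in for two LFP channels with different offsets and amplitudes
fake_lfp = rng.normal(loc=[50.0, -20.0], scale=[5.0, 200.0], size=(10_000, 2))

z_classic = zscore_mean_std(fake_lfp)
z_robust = zscore_median_mad(fake_lfp)
print(z_classic.std(axis=0), z_robust.std(axis=0))
```

The robust variant is less sensitive to large artifacts, since a few extreme samples barely move the median or the MAD.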


In [20]:
TONE_TIMESTAMP_DF

Unnamed: 0,time,recording_dir,recording_file,time_stamp_index,video_file,video_frame,reward_frame,video_number,subject_info,competition_closeness,...,video_name,baseline_lfp_index_range,trial_lfp_index_range,baseline_ephys_index_range,trial_ephys_index_range,baseline_videoframe_range,trial_videoframe_range,all_subjects,current_subject,trial_outcome
0,9781115,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,982229,20230612_101430_standard_comp_to_training_D1_s...,980,1060.0,1.0,1-3_t3b3L_box2,lose_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(39111, 49111)","(49111, 59111)","(782229, 982229)","(982229, 1182229)","(760, 980)","(980, 1200)","(1.3, 1.4)",1.3,lose
1,12181113,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,3382227,20230612_101430_standard_comp_to_training_D1_s...,3376,3456.0,1.0,1-3_t3b3L_box2,win_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(159111, 169111)","(169111, 179111)","(3182227, 3382227)","(3382227, 3582227)","(3156, 3376)","(3376, 3596)","(1.3, 1.4)",1.3,win
2,14481111,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,5682225,20230612_101430_standard_comp_to_training_D1_s...,5671,5751.0,1.0,1-3_t3b3L_box2,lose_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(274111, 284111)","(284111, 294111)","(5482225, 5682225)","(5682225, 5882225)","(5451, 5671)","(5671, 5891)","(1.3, 1.4)",1.3,lose
3,16281110,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,7482224,20230612_101430_standard_comp_to_training_D1_s...,7468,7548.0,1.0,1-3_t3b3L_box2,lose_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(364111, 374111)","(374111, 384111)","(7282224, 7482224)","(7482224, 7682224)","(7248, 7468)","(7468, 7688)","(1.3, 1.4)",1.3,lose
4,17381106,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,8582220,20230612_101430_standard_comp_to_training_D1_s...,8566,8646.0,1.0,1-3_t3b3L_box2,lose_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(419111, 429111)","(429111, 439111)","(8382220, 8582220)","(8582220, 8782220)","(8346, 8566)","(8566, 8786)","(1.3, 1.4)",1.3,lose
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
765,28120638,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,24708627,20230628_111202_standard_comp_to_novel_agent_D...,24552,24632.0,1.0,1-2vs1-1and2-1,win_comp,...,20230628_111202_standard_comp_to_novel_agent_D...,"(1225431, 1235431)","(1235431, 1245431)","(24508627, 24708627)","(24708627, 24908627)","(24332, 24552)","(24552, 24772)","(1.1vs2.2, 1.2vs2.1)",1.2,win
766,29720656,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,26308645,20230628_111202_standard_comp_to_novel_agent_D...,26149,26229.0,1.0,1-2vs1-1and2-1,lose_comp,...,20230628_111202_standard_comp_to_novel_agent_D...,"(1305432, 1315432)","(1315432, 1325432)","(26108645, 26308645)","(26308645, 26508645)","(25929, 26149)","(26149, 26369)","(1.1vs2.2, 1.2vs2.1)",1.2,lose
767,31120674,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,27708663,20230628_111202_standard_comp_to_novel_agent_D...,27547,27627.0,1.0,1-2vs1-1and2-1,lose_comp,...,20230628_111202_standard_comp_to_novel_agent_D...,"(1375433, 1385433)","(1385433, 1395433)","(27508663, 27708663)","(27708663, 27908663)","(27327, 27547)","(27547, 27767)","(1.1vs2.2, 1.2vs2.1)",1.2,lose
768,33320701,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,29908690,20230628_111202_standard_comp_to_novel_agent_D...,29743,29823.0,1.0,1-2vs1-1and2-1,lose_comp,...,20230628_111202_standard_comp_to_novel_agent_D...,"(1485434, 1495434)","(1495434, 1505434)","(29708690, 29908690)","(29908690, 30108690)","(29523, 29743)","(29743, 29963)","(1.1vs2.2, 1.2vs2.1)",1.2,lose


In [21]:
RECORDING_TO_SUBJECT = TONE_TIMESTAMP_DF.drop_duplicates(subset=[RECORDING_FILE_COL, CURRENT_SUBJECT_COL])[[RECORDING_DIR_COL, RECORDING_FILE_COL, CURRENT_SUBJECT_COL]].copy()

In [22]:
RECORDING_TO_SUBJECT = RECORDING_TO_SUBJECT[RECORDING_TO_SUBJECT[RECORDING_FILE_COL].isin(recording_name_to_all_ch_lfp)].reset_index(drop=True)

In [23]:
RECORDING_TO_SUBJECT

Unnamed: 0,recording_dir,recording_file,current_subject
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,1.4


## Adding the channel mapping

In [24]:
CHANNEL_MAPPING_DF = CHANNEL_MAPPING_DF.drop(columns=[col for col in CHANNEL_MAPPING_DF.columns if "eib" in col], errors="ignore")

- Casting the subject IDs to strings so they can be matched against `current_subject` in the merge

In [25]:
CHANNEL_MAPPING_DF[SUBJECT_COL] = CHANNEL_MAPPING_DF[SUBJECT_COL].astype(str)

- Merging the recording and the channel dataframes

In [26]:
RECORDING_TO_SUBJECT = pd.merge(RECORDING_TO_SUBJECT, CHANNEL_MAPPING_DF, left_on=CURRENT_SUBJECT_COL, right_on=SUBJECT_COL, how="left")
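
A left merge keeps every recording even when its subject has no channel mapping, which then surfaces later as missing channel numbers. The sketch below, on invented toy data, shows the string cast of the key followed by a merge with `indicator=True` to list such recordings. That the subject column is read from the spreadsheet as floats is an assumption.

```python
import pandas as pd

# Toy stand-ins; all values are invented for illustration
recordings = pd.DataFrame({
    "recording_file": ["rec_a", "rec_b"],
    "current_subject": ["1.4", "9.9"],  # "9.9" has no channel mapping
})
channel_mapping = pd.DataFrame({
    "Subject": [1.2, 1.4],  # assumed to arrive as floats from the spreadsheet
    "spike_interface_mPFC": [5.0, 15.0],
})

# Cast the key to str so both sides of the merge share a dtype
channel_mapping["Subject"] = channel_mapping["Subject"].astype(str)

merged = pd.merge(
    recordings,
    channel_mapping,
    left_on="current_subject",
    right_on="Subject",
    how="left",
    indicator=True,  # adds a "_merge" column describing where each row came from
)

# Rows marked "left_only" found no matching subject in the channel mapping
unmatched = merged.loc[merged["_merge"] == "left_only", "recording_file"].tolist()
print(unmatched)
```

One caveat of float subject IDs: a label such as `1.10` would be read as `1.1`, so storing the IDs as text in the spreadsheet is safer if such labels exist.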



In [27]:
RECORDING_TO_SUBJECT

Unnamed: 0,recording_dir,recording_file,current_subject,Cohort,Subject,spike_interface_mPFC,spike_interface_vHPC,spike_interface_BLA,spike_interface_LH,spike_interface_MD
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,1.4,2,1.4,15.0,31.0,30.0,29.0,28.0


In [28]:
columns_to_convert = [col for col in RECORDING_TO_SUBJECT.columns if "spike_interface" in col]

for col in columns_to_convert:
    RECORDING_TO_SUBJECT[col] = RECORDING_TO_SUBJECT[col].astype(int).astype(str)
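
The `astype(int)` conversion above works here because the one remaining subject has a complete channel mapping; it would fail for subjects whose `spike_interface_*` entries are empty in the spreadsheet. The sketch below, on invented values, shows the failure and two possible workarounds. Neither is part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy channel column with one missing mapping; values are invented
channels = pd.Series([15.0, np.nan, 31.0], name="spike_interface_mPFC")

# A plain integer cast fails as soon as a NaN is present
try:
    channels.astype(int)
    plain_cast_failed = False
except (ValueError, TypeError):
    plain_cast_failed = True

# Option 1: convert only the rows that actually have a mapping
present = channels.dropna().astype(int).astype(str)

# Option 2: pandas' nullable integer dtype keeps the missing rows as <NA>
nullable = channels.astype("Int64")

print(plain_cast_failed, present.tolist(), nullable.isna().tolist())
```

Which option fits depends on whether recordings without a mapping should be dropped or carried along for later inspection.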


# Getting the channel specific LFP traces

- Linking up all LFP calculations with all the trials

In [29]:
RECORDING_TO_SUBJECT[ALL_CH_LFP_COL] = RECORDING_TO_SUBJECT[RECORDING_FILE_COL].map(recording_name_to_all_ch_lfp)



In [30]:
RECORDING_TO_SUBJECT

Unnamed: 0,recording_dir,recording_file,current_subject,Cohort,Subject,spike_interface_mPFC,spike_interface_vHPC,spike_interface_BLA,spike_interface_LH,spike_interface_MD,all_ch_lfp
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,1.4,2,1.4,15,31,30,29,28,ZScoreRecording: 32 channels - 1.0kHz - 1 segm...


- Extracting the traces for each brain region

In [31]:
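for col in columns_to_convert:
    print(col)
    # Remove the "spike_interface_" prefix; str.strip() removes a set of characters, not a prefix,
    # and would mangle region names made only of those characters
    brain_region = col.replace(SPIKE_INTERFACE_COL + "_", "", 1)
    trace_column = "{}_lfp_trace".format(brain_region)
    # get_traces returns an array of shape (n_samples, n_channels); keep the single channel as 1-D
    RECORDING_TO_SUBJECT[trace_column] = RECORDING_TO_SUBJECT.apply(lambda row: row[ALL_CH_LFP_COL].get_traces(channel_ids=[row[col]]).T[0], axis=1)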

spike_interface_mPFC
spike_interface_vHPC
spike_interface_BLA
spike_interface_LH
spike_interface_MD


In [32]:
RECORDING_TO_SUBJECT = RECORDING_TO_SUBJECT.drop(columns=[ALL_CH_LFP_COL], errors="ignore")

In [33]:
RECORDING_TO_SUBJECT = RECORDING_TO_SUBJECT.drop(columns=[col for col in RECORDING_TO_SUBJECT if SPIKE_INTERFACE_COL in col], errors="ignore")

In [34]:
RECORDING_TO_SUBJECT.head()

Unnamed: 0,recording_dir,recording_file,current_subject,Cohort,Subject,mPFC_lfp_trace,vHPC_lfp_trace,BLA_lfp_trace,LH_lfp_trace,MD_lfp_trace
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,1.4,2,1.4,"[-0.30157474, -0.25617638, -0.045398347, 0.252...","[0.316397, 0.54940253, 0.72599626, 0.66713166,...","[0.20018278, 0.35167247, 0.5121794, 0.6059587,...","[0.03897052, 0.044965982, 0.14688888, 0.350734...","[0.6358626, 0.98053575, 1.1825856, 1.224184, 1..."


In [35]:
RECORDING_TO_SUBJECT.tail()

Unnamed: 0,recording_dir,recording_file,current_subject,Cohort,Subject,mPFC_lfp_trace,vHPC_lfp_trace,BLA_lfp_trace,LH_lfp_trace,MD_lfp_trace
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,1.4,2,1.4,"[-0.30157474, -0.25617638, -0.045398347, 0.252...","[0.316397, 0.54940253, 0.72599626, 0.66713166,...","[0.20018278, 0.35167247, 0.5121794, 0.6059587,...","[0.03897052, 0.044965982, 0.14688888, 0.350734...","[0.6358626, 0.98053575, 1.1825856, 1.224184, 1..."


In [36]:
RECORDING_TO_SUBJECT.columns

Index(['recording_dir', 'recording_file', 'current_subject', 'Cohort',
       'Subject', 'mPFC_lfp_trace', 'vHPC_lfp_trace', 'BLA_lfp_trace',
       'LH_lfp_trace', 'MD_lfp_trace'],
      dtype='object')

In [37]:
RECORDING_TO_SUBJECT.to_pickle(os.path.join(OUTPUT_DIR, FULL_LFP_TRACES_PKL))
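
The traces saved above span the whole recording. Step 3 of the workflow describes cutting baseline and trial windows out of them; a minimal sketch of that slicing follows, using the index ranges from the first row of `TONE_TIMESTAMP_DF` and a synthetic trace. The helper `slice_trace` is hypothetical, and treating the ranges as half-open `(start, stop)` pairs is an assumption based on their width of 10000 samples.

```python
import numpy as np

# Synthetic stand-in for one full-length, Z-scored channel trace at 1 kHz
full_trace = np.arange(60_000, dtype=np.float32)

# Index ranges copied from the first row of TONE_TIMESTAMP_DF
baseline_lfp_index_range = (39111, 49111)
trial_lfp_index_range = (49111, 59111)


def slice_trace(trace, index_range):
    """Hypothetical helper: return trace[start:stop] for a (start, stop) tuple."""
    start, stop = index_range
    return trace[start:stop]


baseline_trace = slice_trace(full_trace, baseline_lfp_index_range)
trial_trace = slice_trace(full_trace, trial_lfp_index_range)
# Baseline followed by trial, as one continuous window around the tone
baseline_and_trial_trace = np.concatenate([baseline_trace, trial_trace])

print(len(baseline_trace), len(trial_trace), len(baseline_and_trial_trace))
```

With real data, `full_trace` would be one of the `*_lfp_trace` arrays for the recording that the trial belongs to.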