# Notebook 0: Setup LFP trial dataframe

Purpose:
---------
This notebook builds the per-trial DataFrame used to extract local field potential (LFP) traces from Spikegadgets .rec files 
for the analysis of social competition trials. It loads the labeled tone timestamps, aligns each trial with its 
video frames, and computes baseline and trial index ranges for the LFP, raw ephys, and video data.

Dataframe Columns Description:
------------------------------
- time: Spikegadgets timestamp used during recording.
- state: The state of the environmental control unit's (ECU) input for Spikegadgets recordings.
- recording_dir: Directory path where the recording file is located.
- recording_file: Base name of the recording file without the extension.
- din: Identifier of the input from the Spikegadgets ECU.
- time_stamp_index: Index of the timestamp in ephys samples (20 kHz in this notebook); used to derive lfp_index.
- video_file: Basename of the associated video file for the trial.
- video_frame: Specific video frame number corresponding to the trial.
- video_number: Unique number identifying the video associated with the trial.
- subject_info: Information about the subject, such as ID, age, or other relevant details.
- condition: Label categorizing the trial, typically based on experimental conditions.
- competition_closeness: Secondary label detailing the competitive closeness of the trial.
- lfp_index: time_stamp_index converted to the LFP sampling rate (1 kHz in this notebook).
- video_name: video_file with the ".videoTimeStamps.cameraHWSync" suffix removed.
- baseline_lfp_index_range: Range of LFP indices representing the baseline period of the trial.
- trial_lfp_index_range: Range of LFP indices corresponding to the actual trial period.
- baseline_ephys_index_range: Range of electrophysiology (ephys) indices representing the baseline period.
- trial_ephys_index_range: Range of ephys indices corresponding to the trial period.
- baseline_videoframe_range: Range of video frames representing the baseline period of the trial.
- trial_videoframe_range: Range of video frames corresponding to the trial period.
- all_subjects: List or array of all subjects involved in the trials.
- current_subject: ID number or identifier of the subject for the current row in the DataFrame.
- trial_outcome: 'win' or 'lose' relative to current_subject, derived from the "condition" column; other labels (e.g. 'rewarded', 'omission') are kept unchanged.

Processing Steps:
-----------------
1. Importing necessary libraries and setting up the environment.
2. Loading and preprocessing the tone timestamp data from an Excel file.
3. Identifying and processing all session directories containing .rec files.
4. Reformatting the DataFrame for analysis, including dropping unnecessary rows and columns.
5. Adding columns for different timestamps and ranges (LFP, ephys, video frames).
6. Converting trial labels to 'win' or 'lose' based on trial outcomes.
7. Adding a column for competition closeness and reformatting it.
8. Saving the processed DataFrame to a specified output directory.

Outputs:
--------
- Processed DataFrame saved as a CSV file and a pickle file, containing synchronized and labeled LFP and video indices.
- Plots or other outputs can be added as per the project's specific analysis requirements.

Usage:
------
- Run the cells in order as they appear.
- Ensure that the file paths and directory names match your project's structure.
- Customize processing steps according to your project's specific needs.

Note:
-----
- This notebook is part of a larger research project. Ensure compatibility with other components of the project.
- The notebook is configured for a specific dataset structure and might require modifications for different data formats.

In [1]:
import sys
import os
import git

In [2]:
git_repo = git.Repo(".", search_parent_directories=True)
git_root = git_repo.git.rev_parse("--show-toplevel")

In [3]:
git_root

'/blue/npadillacoreano/ryoi360/projects/reward_competition_extention'

In [4]:
sys.path.insert(0, os.path.join(git_root, 'src'))

In [5]:
# Imports of all used packages and libraries
import glob
import numpy as np
import pandas as pd

In [6]:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as sp

In [7]:
from utilities import helper

## Inputs & Data

Explanation of each input and where it comes from.

In [8]:
EPHYS_SAMPLING_RATE = 20000
LFP_SAMPLING_RATE = 1000
TRIAL_DURATION = 10
FRAME_RATE = 22
ECU_STREAM_ID = "ECU"
TRODES_STREAM_ID = "trodes"
LFP_FREQ_MIN = 0.5
LFP_FREQ_MAX = 300
ELECTRIC_NOISE_FREQ = 60
RECORDING_EXTENTION = "*.rec"

In [9]:
EPHYS_INDEX_COL = "time_stamp_index"
LFP_INDEX_COL = "lfp_index"
EPHYS_TIMESTAMP_COL = "time"

In [10]:
ORIGINAL_TRIAL_COL = "condition"
TRIAL_OUTCOME_COL = "trial_outcome"
RECORDING_DIR_COL = "recording_dir"
VIDEO_FRAME_COL = "video_frame"

In [11]:
STATE_COL = "state"
RECORDING_FILE_COL = "recording_file"
DIN_COL = "din"
TIME_STAMP_INDEX_COL = "time_stamp_index"
VIDEO_FILE_COL = "video_file"
VIDEO_NUMBER_COL = "video_number"
SUBJECT_INFO_COL = "subject_info"
COMPETITION_CLOSENESS_COL = "competition_closeness"
VIDEO_NAME_COL = "video_name"
BASELINE_LFP_INDEX_RANGE_COL = "baseline_lfp_index_range"
TRIAL_LFP_INDEX_RANGE_COL = "trial_lfp_index_range"
BASELINE_EPHYS_INDEX_RANGE_COL = "baseline_ephys_index_range"
TRIAL_EPHYS_INDEX_RANGE_COL = "trial_ephys_index_range"
BASELINE_VIDEOFRAME_RANGE_COL = "baseline_videoframe_range"
TRIAL_VIDEOFRAME_RANGE_COL = "trial_videoframe_range"
ALL_SUBJECTS_COL = "all_subjects"
CURRENT_SUBJECT_COL = "current_subject"


In [12]:
# NOTE: Change based on individual project data location

# Spreadsheet of tone time
TONE_TIMESTAMP_DF = pd.read_excel(os.path.join(git_root, "data/rce_per_trial_labeling.xlsx"), index_col=0)

In [13]:
TONE_TIMESTAMP_DF.head()

Unnamed: 0,time,state,recording_dir,recording_file,din,time_stamp_index,video_file,video_frame,reward_frame,video_number,subject_info,condition,competition_closeness,Unnamed: 14
2299.0,3772337.0,1.0,20221122_161341_omission_subject_6_1_and_6_3,20221122_161341_omission_subject_6_1_top_4_base_2,dio_ECU_Din1,0.0,20221122_161341_omission_subject_6_1_and_6_3.1...,0.0,80.0,1.0,6_1_top_4_base_2,,,
2300.0,5204112.0,1.0,20221122_161341_omission_subject_6_1_and_6_3,20221122_161341_omission_subject_6_1_top_4_base_2,dio_ECU_Din1,1431775.0,20221122_161341_omission_subject_6_1_and_6_3.1...,1784.0,1864.0,1.0,6_1_top_4_base_2,,,
2301.0,6804107.0,1.0,20221122_161341_omission_subject_6_1_and_6_3,20221122_161341_omission_subject_6_1_top_4_base_2,dio_ECU_Din1,3031770.0,20221122_161341_omission_subject_6_1_and_6_3.1...,3779.0,3859.0,1.0,6_1_top_4_base_2,,,
2302.0,8604101.0,1.0,20221122_161341_omission_subject_6_1_and_6_3,20221122_161341_omission_subject_6_1_top_4_base_2,dio_ECU_Din1,4831764.0,20221122_161341_omission_subject_6_1_and_6_3.1...,6021.0,6101.0,1.0,6_1_top_4_base_2,,,
2303.0,10204096.0,1.0,20221122_161341_omission_subject_6_1_and_6_3,20221122_161341_omission_subject_6_1_top_4_base_2,dio_ECU_Din1,6431759.0,20221122_161341_omission_subject_6_1_and_6_3.1...,8015.0,8095.0,1.0,6_1_top_4_base_2,,,


In [14]:
# NOTE: Change based on individual project data location
# Where all the recording files are being saved
ALL_SESSION_DIR = glob.glob("/blue/npadillacoreano/ryoi360/reward_competition_extention/data/standard/2023_06_16/*.rec")

In [15]:
ALL_SESSION_DIR

['/blue/npadillacoreano/ryoi360/reward_competition_extention/data/standard/2023_06_16/20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2.rec']

## Outputs

Describe each output that the notebook creates. 

- Is it a plot or is it data?

- How valuable is the output and why is it valuable or useful?

In [16]:
# Inputs and Required data loading
# input variable names are in all-caps snake case
# Whenever an input changes or is used for processing 
# the variables are in all-lowercase snake case
OUTPUT_DIR = r"../../proc/" # where data is saved should always be shown in the inputs
os.makedirs(OUTPUT_DIR, exist_ok=True)

In [17]:
TONE_TIMESTAMPS_CSV = "rce_tone_timestamps.csv"
TONE_TIMESTAMPS_PKL = "rce_tone_timestamps.pkl"

# Functions

## Processing

Describe what is done to the data here and how inputs are manipulated to generate outputs. 

In [18]:
# As much code and as many cells as required
# includes EDA and playing with data
# GO HAM!

# Ideally functions are defined here first and then data is processed using the functions

# function names are short and in snake case all lowercase
# a function name should be unique but does not have to describe the function
# doc strings describe functions not function names




# Reformatting Dataframe

- Dropping all rows that have not been labeled

In [19]:
TONE_TIMESTAMP_DF = TONE_TIMESTAMP_DF.dropna(subset=ORIGINAL_TRIAL_COL).reset_index(drop=True)

In [20]:
sorted(TONE_TIMESTAMP_DF[RECORDING_DIR_COL].unique())

['20221202_134600_omission_and_competition_subject_6_1_and_6_2',
 '20221203_154800_omission_and_competition_subject_6_4_and_6_1',
 '20221214_125409_om_and_comp_6_1_and_6_3',
 '20221215_145401_comp_amd_om_6_1_and_6_3',
 '20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3',
 '20230612_112630_standard_comp_to_training_D1_subj_1-2_and_1-1',
 '20230613_105657_standard_comp_to_training_D2_subj_1-1_and_1-4',
 '20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2',
 '20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2',
 '20230617_115521_standard_comp_to_omission_D1_subj_1-1_and_1-2',
 '20230618_100636_standard_comp_to_omission_D2_subj_1-4_and_1-1',
 '20230619_115321_standard_comp_to_omission_D3_subj_1-2_and_1-4',
 '20230620_114347_standard_comp_to_omission_D4_subj_1-2_and_1-1',
 '20230621_111240_standard_comp_to_omission_D5_subj_1-4_and_1-2',
 '20230622_110832_standard_comp_to_both_rewarded_D1_subj_1-1_and_1-2',
 '20230624_105855_standard_comp_to_both_rewarded_D3

- Adding the LFP index

In [21]:
TONE_TIMESTAMP_DF[EPHYS_INDEX_COL] = TONE_TIMESTAMP_DF[EPHYS_INDEX_COL].astype(int)

In [22]:
TONE_TIMESTAMP_DF[EPHYS_TIMESTAMP_COL] = TONE_TIMESTAMP_DF[EPHYS_TIMESTAMP_COL].astype(int)

In [23]:
TONE_TIMESTAMP_DF[LFP_INDEX_COL] = (TONE_TIMESTAMP_DF[EPHYS_INDEX_COL] // (EPHYS_SAMPLING_RATE/LFP_SAMPLING_RATE)).astype(int)
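
The conversion above amounts to integer division by the ratio of the two sampling rates. A minimal sketch, assuming the ephys rate is an integer multiple of the LFP rate (20000 / 1000 = 20 here); `ephys_to_lfp_index` is a hypothetical helper name, not part of the notebook:

```python
EPHYS_SAMPLING_RATE = 20000  # Hz, raw ephys
LFP_SAMPLING_RATE = 1000     # Hz, downsampled LFP


def ephys_to_lfp_index(ephys_index: int) -> int:
    """Map a raw ephys sample index to the corresponding LFP sample index."""
    # Hypothetical helper; assumes the rates divide evenly.
    downsample_factor = EPHYS_SAMPLING_RATE // LFP_SAMPLING_RATE  # 20
    return ephys_index // downsample_factor


lfp_index = ephys_to_lfp_index(982229)  # 49111
```

Using an integer factor keeps the arithmetic in integers, so no float rounding is involved before the cast.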

- Making the video frame number usable

In [24]:
TONE_TIMESTAMP_DF[VIDEO_FRAME_COL] = TONE_TIMESTAMP_DF[VIDEO_FRAME_COL].astype(int)

- Getting the name of the video so that we can sync it up with the ephys recording

In [25]:
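# str.strip() removes a set of characters from both ends, not a literal suffix, so remove the suffix text explicitly
TONE_TIMESTAMP_DF[VIDEO_NAME_COL] = TONE_TIMESTAMP_DF[VIDEO_FILE_COL].apply(lambda x: x.replace(".videoTimeStamps.cameraHWSync", ""))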
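
When deriving the video name, keep in mind that `str.strip` treats its argument as a set of characters to remove from both ends, whereas the intent here is to drop one literal suffix. The sketch below contrasts the two; `video_name_from_file` and the example file name are hypothetical and only serve as illustration:

```python
SUFFIX = ".videoTimeStamps.cameraHWSync"


def video_name_from_file(video_file: str) -> str:
    """Return the file name with the literal suffix removed, if present."""
    # Hypothetical helper; plain slicing works on every Python 3 version.
    if video_file.endswith(SUFFIX):
        return video_file[: -len(SUFFIX)]
    return video_file


example = "subject_video.1" + SUFFIX   # made-up file name for illustration
naive = example.strip(SUFFIX)          # 'ubject_video.1': the leading "s" is in the character set
exact = video_name_from_file(example)  # 'subject_video.1'
```

File names that start with a digit, as in this dataset, happen to be unaffected by the character-set behavior, which is why the difference is easy to miss.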

- Making columns of the different timestamps

In [26]:
TONE_TIMESTAMP_DF[BASELINE_LFP_INDEX_RANGE_COL] = TONE_TIMESTAMP_DF[LFP_INDEX_COL].apply(lambda x: (x - TRIAL_DURATION * LFP_SAMPLING_RATE, x))

In [27]:
TONE_TIMESTAMP_DF[TRIAL_LFP_INDEX_RANGE_COL] = TONE_TIMESTAMP_DF[LFP_INDEX_COL].apply(lambda x: (x, x + TRIAL_DURATION * LFP_SAMPLING_RATE))

In [28]:
TONE_TIMESTAMP_DF[BASELINE_EPHYS_INDEX_RANGE_COL] = TONE_TIMESTAMP_DF[EPHYS_INDEX_COL].apply(lambda x: (x - TRIAL_DURATION * EPHYS_SAMPLING_RATE, x))

In [29]:
TONE_TIMESTAMP_DF[TRIAL_EPHYS_INDEX_RANGE_COL] = TONE_TIMESTAMP_DF[EPHYS_INDEX_COL].apply(lambda x: (x, x + TRIAL_DURATION * EPHYS_SAMPLING_RATE))

In [30]:
TONE_TIMESTAMP_DF[BASELINE_VIDEOFRAME_RANGE_COL] = TONE_TIMESTAMP_DF[VIDEO_FRAME_COL].apply(lambda x: (x - TRIAL_DURATION * FRAME_RATE, x))

In [31]:
TONE_TIMESTAMP_DF[TRIAL_VIDEOFRAME_RANGE_COL] = TONE_TIMESTAMP_DF[VIDEO_FRAME_COL].apply(lambda x: (x, x + TRIAL_DURATION * FRAME_RATE))
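
All six range columns follow the same pattern: a window of `TRIAL_DURATION` seconds before the onset (baseline) and one after it (trial), expressed in samples or frames at the relevant rate. A minimal sketch of that pattern, with `onset_windows` as a hypothetical helper name:

```python
def onset_windows(onset_index: int, duration_s: int, rate_hz: int):
    """Return (baseline_range, trial_range) as (start, stop) index tuples."""
    # Hypothetical helper mirroring the lambdas used in the cells above.
    n_samples = duration_s * rate_hz
    baseline = (onset_index - n_samples, onset_index)
    trial = (onset_index, onset_index + n_samples)
    return baseline, trial


# LFP at 1000 Hz and video at 22 frames per second, both with 10 s windows
lfp_baseline, lfp_trial = onset_windows(49111, duration_s=10, rate_hz=1000)
video_baseline, video_trial = onset_windows(980, duration_s=10, rate_hz=22)
```

An onset within the first 10 seconds of a recording yields a negative baseline start. NumPy-style slicing would interpret that as counting from the end of the array, so such ranges may need to be clipped or excluded before they are used to index traces.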

In [32]:
TONE_TIMESTAMP_DF.columns

Index(['time', 'state', 'recording_dir', 'recording_file', 'din',
       'time_stamp_index', 'video_file', 'video_frame', 'reward_frame',
       'video_number', 'subject_info', 'condition', 'competition_closeness',
       'Unnamed: 14', 'lfp_index', 'video_name', 'baseline_lfp_index_range',
       'trial_lfp_index_range', 'baseline_ephys_index_range',
       'trial_ephys_index_range', 'baseline_videoframe_range',
       'trial_videoframe_range'],
      dtype='object')

In [33]:
TONE_TIMESTAMP_DF = TONE_TIMESTAMP_DF.drop(columns=["state", "din", "Unnamed: 14"], errors="ignore")

In [34]:
TONE_TIMESTAMP_DF.to_pickle(os.path.join(OUTPUT_DIR, TONE_TIMESTAMPS_PKL))

# Custom processing

- NOTE: The rest of the notebook is project-specific processing of the collected data. Run the cells below only if your data format is similar to that of the original project. 

In [35]:
# Intentional stop: prevents "Run All" from executing the project-specific cells below
raise ValueError()

ValueError: 

- Getting all subject IDs for a given recording

In [36]:
TONE_TIMESTAMP_DF = TONE_TIMESTAMP_DF[TONE_TIMESTAMP_DF[RECORDING_DIR_COL].str.contains("2023")].reset_index(drop=True)

In [37]:
# using different id extractions for different file formats
TONE_TIMESTAMP_DF[ALL_SUBJECTS_COL] = TONE_TIMESTAMP_DF[RECORDING_DIR_COL].apply(lambda x: x if "2023" in x else "subj" + "_".join(x.split("_")[-5:]))
TONE_TIMESTAMP_DF[ALL_SUBJECTS_COL] = TONE_TIMESTAMP_DF[ALL_SUBJECTS_COL].apply(lambda x: tuple(sorted([num.strip("_").replace("_",".") for num in x.replace("-", "_").split("subj")[-1].strip("_").split("and")])))
# TONE_TIMESTAMP_DF[ALL_SUBJECTS_COL] = TONE_TIMESTAMP_DF[ALL_SUBJECTS_COL].apply(lambda x: num.split("vs") for num in x)
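
The chained string operations above can be easier to follow when unpacked into steps. The sketch below reproduces the same logic for the 2023 directory naming scheme (`..._subj_<id>_and_<id>`); `parse_subjects` is a hypothetical helper name, and directory names that follow a different scheme would need their own handling:

```python
def parse_subjects(recording_dir: str) -> tuple:
    """Extract a sorted tuple of subject IDs such as ('1.3', '1.4')."""
    # Hypothetical helper; assumes the name ends in "subj_<id>_and_<id>".
    # Keep only the text after the last "subj", with dashes unified to underscores.
    tail = recording_dir.replace("-", "_").split("subj")[-1].strip("_")
    # Split the two IDs on "and", then turn "1_4" into "1.4".
    ids = [part.strip("_").replace("_", ".") for part in tail.split("and")]
    return tuple(sorted(ids))


subjects = parse_subjects(
    "20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3"
)  # ('1.3', '1.4')
```

Splitting on "subj" first matters: the word "standard" earlier in the name also contains "and" and would otherwise be split as well.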


In [38]:
TONE_TIMESTAMP_DF[ALL_SUBJECTS_COL].unique()

array([('1.3', '1.4'), ('1.1', '1.2'), ('1.1', '1.4'), ('1.2', '1.4'),
       ('1.1vs2.2', '1.2vs2.1')], dtype=object)

In [39]:
TONE_TIMESTAMP_DF[CURRENT_SUBJECT_COL] = TONE_TIMESTAMP_DF[SUBJECT_INFO_COL].apply(lambda x: ".".join(x.replace("-","_").split("_")[:2])).astype(str)
TONE_TIMESTAMP_DF[CURRENT_SUBJECT_COL] = TONE_TIMESTAMP_DF[CURRENT_SUBJECT_COL].apply(lambda x: x.split("vs")[0]).astype(str)





In [40]:
TONE_TIMESTAMP_DF[CURRENT_SUBJECT_COL].unique()

array(['1.3', '1.4', '1.2', '1.1'], dtype=object)

- Converting the trial label to win or lose based on who won the trial

In [41]:
TONE_TIMESTAMP_DF[ORIGINAL_TRIAL_COL] = TONE_TIMESTAMP_DF[ORIGINAL_TRIAL_COL].astype(str)

In [42]:
TONE_TIMESTAMP_DF[CURRENT_SUBJECT_COL] = TONE_TIMESTAMP_DF[CURRENT_SUBJECT_COL].astype(str)

In [43]:
# Compute the set of known subject IDs once instead of once per row
known_subjects = set(TONE_TIMESTAMP_DF[CURRENT_SUBJECT_COL].unique())
TONE_TIMESTAMP_DF[TRIAL_OUTCOME_COL] = TONE_TIMESTAMP_DF.apply(
    lambda x: "win" if str(x[ORIGINAL_TRIAL_COL]).strip() == str(x[CURRENT_SUBJECT_COL]) 
             else ("lose" if str(x[ORIGINAL_TRIAL_COL]).strip() in known_subjects 
                   else x[ORIGINAL_TRIAL_COL]), axis=1)
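
The labeling rule in this cell can be read as three cases: the condition names the current subject (win), it names another known subject (lose), or it is some other label that passes through. A minimal sketch with a hypothetical `label_outcome` helper; unlike the cell above, it also strips whitespace from pass-through labels, which is an assumption made here for simplicity:

```python
def label_outcome(condition, current_subject: str, known_subjects: set) -> str:
    """Translate a winner ID into 'win'/'lose' from the current subject's view."""
    # Hypothetical helper; known_subjects is the set of all subject IDs in the data.
    winner = str(condition).strip()
    if winner == current_subject:
        return "win"
    if winner in known_subjects:
        return "lose"
    return winner  # e.g. 'rewarded' or 'omission'


known = {"1.1", "1.2", "1.3", "1.4"}
outcomes = [
    label_outcome("1.3", "1.3", known),       # 'win'
    label_outcome("1.4", "1.3", known),       # 'lose'
    label_outcome("rewarded", "1.3", known),  # 'rewarded'
]
```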

In [44]:
TONE_TIMESTAMP_DF[TRIAL_OUTCOME_COL].unique()

array(['lose', 'win', 'rewarded', 'omission'], dtype=object)

- Adding the competition closeness as a column

In [45]:
competition_closeness_map = {k: "non_comp" if "only" in str(k).lower() else "comp" if type(k) is str else np.nan for k in TONE_TIMESTAMP_DF[COMPETITION_CLOSENESS_COL].unique()}

In [46]:
competition_closeness_map

{'Subj 2 Only': 'non_comp',
 'Subj 1 Only': 'non_comp',
 'Subj 1 blocking Subj 2': 'comp',
 'Subj 2 blocking Subj 1': 'comp',
 'Subj 1 then Subj 2': 'comp',
 nan: nan,
 'Subj 2 then Subj 1': 'comp',
 'Close Call': 'comp'}
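
The dictionary comprehension that builds this map packs two conditions into one expression. Written out as a function it might look like the sketch below, where `closeness_category` is a hypothetical name:

```python
import numpy as np


def closeness_category(label):
    """Map a raw closeness label to 'non_comp', 'comp', or NaN."""
    # Hypothetical helper; non-string values (such as NaN) stay missing.
    if not isinstance(label, str):
        return np.nan
    return "non_comp" if "only" in label.lower() else "comp"


categories = {
    label: closeness_category(label)
    for label in ["Subj 2 Only", "Subj 1 blocking Subj 2", "Close Call"]
}
```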

In [47]:
TONE_TIMESTAMP_DF[COMPETITION_CLOSENESS_COL] = TONE_TIMESTAMP_DF[COMPETITION_CLOSENESS_COL].map(competition_closeness_map)

In [48]:
TONE_TIMESTAMP_DF[COMPETITION_CLOSENESS_COL] = TONE_TIMESTAMP_DF.apply(lambda x: "_".join([str(x[TRIAL_OUTCOME_COL]), str(x[COMPETITION_CLOSENESS_COL])]).strip("nan").strip("_"), axis=1)
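
The cell above joins the outcome and closeness labels and then relies on `strip("nan")` to clean up rows without a closeness label. Because `str.strip` removes the characters "n" and "a" from both ends rather than the literal text "nan", an explicit check may be more robust. A sketch under that assumption, with `combine_labels` as a hypothetical helper name:

```python
def combine_labels(outcome, closeness) -> str:
    """Join outcome and closeness, or return the outcome alone if closeness is missing."""
    # Hypothetical helper; a missing closeness is any non-string value (e.g. NaN).
    if isinstance(closeness, str):
        return f"{outcome}_{closeness}"
    return str(outcome)


combined = [
    combine_labels("win", "comp"),             # 'win_comp'
    combine_labels("lose", "non_comp"),        # 'lose_non_comp'
    combine_labels("rewarded", float("nan")),  # 'rewarded'
]
```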

In [49]:
TONE_TIMESTAMP_DF[COMPETITION_CLOSENESS_COL].unique()

array(['lose_non_comp', 'win_non_comp', 'win_comp', 'lose_comp',
       'rewarded', 'omission'], dtype=object)

- Removing unnecessary columns

In [50]:
TONE_TIMESTAMP_DF = TONE_TIMESTAMP_DF.drop(columns=[STATE_COL, DIN_COL, ORIGINAL_TRIAL_COL], errors="ignore")
TONE_TIMESTAMP_DF = TONE_TIMESTAMP_DF.drop(columns=[col for col in TONE_TIMESTAMP_DF.columns if "unnamed" in col.lower()], errors="ignore")

In [51]:
TONE_TIMESTAMP_DF.groupby([COMPETITION_CLOSENESS_COL]).count()

competition_closeness,time,recording_dir,recording_file,time_stamp_index,video_file,video_frame,reward_frame,video_number,subject_info,lfp_index,video_name,baseline_lfp_index_range,trial_lfp_index_range,baseline_ephys_index_range,trial_ephys_index_range,baseline_videoframe_range,trial_videoframe_range,all_subjects,current_subject,trial_outcome
lose_comp,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168
lose_non_comp,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106
omission,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36
rewarded,246,246,246,246,246,246,246,246,246,246,246,246,246,246,246,246,246,246,246,246
win_comp,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110
win_non_comp,104,104,104,104,104,104,104,104,104,104,104,104,104,104,104,104,104,104,104,104


In [52]:
TONE_TIMESTAMP_DF.head()

Unnamed: 0,time,recording_dir,recording_file,time_stamp_index,video_file,video_frame,reward_frame,video_number,subject_info,competition_closeness,...,video_name,baseline_lfp_index_range,trial_lfp_index_range,baseline_ephys_index_range,trial_ephys_index_range,baseline_videoframe_range,trial_videoframe_range,all_subjects,current_subject,trial_outcome
0,9781115,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,982229,20230612_101430_standard_comp_to_training_D1_s...,980,1060.0,1.0,1-3_t3b3L_box2,lose_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(39111, 49111)","(49111, 59111)","(782229, 982229)","(982229, 1182229)","(760, 980)","(980, 1200)","(1.3, 1.4)",1.3,lose
1,12181113,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,3382227,20230612_101430_standard_comp_to_training_D1_s...,3376,3456.0,1.0,1-3_t3b3L_box2,win_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(159111, 169111)","(169111, 179111)","(3182227, 3382227)","(3382227, 3582227)","(3156, 3376)","(3376, 3596)","(1.3, 1.4)",1.3,win
2,14481111,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,5682225,20230612_101430_standard_comp_to_training_D1_s...,5671,5751.0,1.0,1-3_t3b3L_box2,lose_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(274111, 284111)","(284111, 294111)","(5482225, 5682225)","(5682225, 5882225)","(5451, 5671)","(5671, 5891)","(1.3, 1.4)",1.3,lose
3,16281110,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,7482224,20230612_101430_standard_comp_to_training_D1_s...,7468,7548.0,1.0,1-3_t3b3L_box2,lose_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(364111, 374111)","(374111, 384111)","(7282224, 7482224)","(7482224, 7682224)","(7248, 7468)","(7468, 7688)","(1.3, 1.4)",1.3,lose
4,17381106,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,8582220,20230612_101430_standard_comp_to_training_D1_s...,8566,8646.0,1.0,1-3_t3b3L_box2,lose_non_comp,...,20230612_101430_standard_comp_to_training_D1_s...,"(419111, 429111)","(429111, 439111)","(8382220, 8582220)","(8582220, 8782220)","(8346, 8566)","(8566, 8786)","(1.3, 1.4)",1.3,lose


In [53]:
TONE_TIMESTAMP_DF.to_csv(os.path.join(OUTPUT_DIR, TONE_TIMESTAMPS_CSV))
TONE_TIMESTAMP_DF.to_pickle(os.path.join(OUTPUT_DIR, TONE_TIMESTAMPS_PKL))
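
The two output formats behave differently for the range columns: the pickle preserves the tuples, while a CSV stores them as text. If the CSV is used downstream, the strings can be parsed back into tuples. A minimal sketch on a one-row stand-in DataFrame, using an in-memory buffer instead of the real output file:

```python
import ast
import io

import pandas as pd

# Stand-in for the real DataFrame, with one tuple-valued range column
df = pd.DataFrame({"trial_lfp_index_range": [(49111, 59111)]})

# Round-trip through CSV using an in-memory buffer
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# After reloading, the value is the string "(49111, 59111)"
as_text = reloaded.loc[0, "trial_lfp_index_range"]

# ast.literal_eval safely parses the string back into a tuple
reloaded["trial_lfp_index_range"] = reloaded["trial_lfp_index_range"].apply(ast.literal_eval)
as_tuple = reloaded.loc[0, "trial_lfp_index_range"]
```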