# Time Stamp Extract

Extracts the timestamps from the exported Trodes files of each recording session and gets the times at which the tones played.

TODO: Supplement the description

In [1]:
# Imports of all used packages and libraries
import sys
import os
import git
import glob
from collections import defaultdict

In [2]:
git_repo = git.Repo(".", search_parent_directories=True)
git_root = git_repo.git.rev_parse("--show-toplevel")

In [3]:
git_root

'/nancy/user/riwata/projects/reward_comp_ext'

In [4]:
sys.path.insert(0, os.path.join(git_root, 'src'))

In [5]:
# Imports of the numerical, tabular, and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as sp

In [7]:
import utilities.helper
import trodes.read_exported

# Functions

In [8]:
import re

def extract_floats(s):
    """
    Extracts all numbers (integers and decimals) from a string and returns them
    as a list of strings in float format, e.g. "4" becomes "4.0".

    Parameters:
    - s (str): The string to extract numbers from.

    Returns:
    - list: A list of strings, each representing a number found in the input string.
    """
    # The optional sign applies to both alternatives: decimals and plain integers
    float_pattern = r"[-+]?(?:\d*\.\d+|\d+)"
    return [str(float(num)) for num in re.findall(float_pattern, s)]

## Inputs & Data

- Explanation of each input and where it comes from.

Inputs and required data loading
- Input variable names are in all-caps snake case
- Whenever an input is changed or used for processing, the resulting variables are named in lowercase snake case

In [9]:
# Path of the directory that contains the Spike Gadgets recording and the exported timestamp files
# Exported with this tool https://docs.spikegadgets.com/en/latest/basic/ExportFunctions.html
# Export these files:
    # -raw – Continuous raw band export.
    # -dio – Digital IO channel state change export.
    # -analogio – Continuous analog IO export.
INPUT_DIR = "/scratch/back_up/reward_competition_extention/data/rce_cohort_3"
OUTPUT_DIR = r"./proc" # where data is saved should always be shown in the inputs
TONE_DIN = "dio_ECU_Din1"
TONE_STATE = 1
os.makedirs(OUTPUT_DIR, exist_ok=True)
OUTPUT_PREFIX = "rce_pilot_3_comp_both_rewarded"

In [10]:
COLS_TO_KEEP = ['session_dir', 'recording', 'metadata_dir', 'metadata_file',
                'original_file', 'filename', 'session_path', 'all_subjects',
                'current_subject', 'event_timestamps', 'video_name',
                'video_timestamps', 'event_frames', 'first_item_data']

In [11]:
RAW_COLS_TO_KEEP = ['session_dir',
 'recording',
 'original_file',
 'session_path',
 'current_subject',
 'first_item_data',
 'first_timestamp',
 'all_subjects']

In [12]:
STATE_COLS_TO_KEEP = ['session_dir',
 'metadata_file',
 'event_timestamps',
 'video_name',
 'video_timestamps',
 'event_frames',]

In [13]:
same_columns = ['session_dir', 'video_name']
different_columns = ['metadata_file', 'event_frames', 'event_timestamps']

In [14]:
# TODO: Find way not to hard code this
# ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/standard/2023_06_*/*.rec")
ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/rce_cohort_3/both_rewarded/*.rec")



In [15]:
ALL_SESSION_DIR

['/scratch/back_up/reward_competition_extention/data/rce_cohort_3/both_rewarded/20240330_141834_comp_both_subj_3-3_and_3-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/both_rewarded/20240329_160356_comp_both_subj_4_2_and_4-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/both_rewarded/20240327_165151_comp_both_subj_4-2_and_4-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/both_rewarded/20240331_164629_comp_both_subj_4-3_and_4-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/both_rewarded/20240331_152930_comp_both_subj_5-2_and_5-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/both_rewarded/20240327_153306_comp_both_subj_3-1_and_3-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/both_rewarded/20240329_144111_comp_both_subj_3-1_and_3-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/both_rewarded/20240327_141822_c

## Outputs

Describe each output that the notebook creates. 

- Is it a plot or is it data?

- How valuable is the output and why is it valuable or useful?

## Other documentation

raw directory
- raw_group0.dat
    - voltage_value: Array with voltage measurement for each channel at each timestamp
- timestamps.dat
    - voltage_time_stamp: The time stamp of each voltage measurement

parent directory
- 1.videoTimeStamps.cameraHWSync
    - frame_number: Calculated by getting the index of each video time stamp tuple
    - PosTimestamp: The time stamp of each video frame
    - HWframeCount: Unknown value. Starts at 30742 and increases by 1 for each tuple
    - HWTimestamp: Unknown value. All zeroes
    - video_time: Calculated by dividing the frame number by the fps (frames per second)
    - video_seconds: video_time, rounded to seconds
    - Filled-in versions of the above columns, where each empty cell takes the value from the most recent previous cell:
        - filled_PosTimestamp
        - filledHWframeCount
        - filled_frame_number
        - filled_video_time
        - filled_video_seconds
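The derived video columns listed above can be sketched with pandas. This is a minimal sketch under stated assumptions and does not reproduce the notebook's own code. `video_frame_table` and `fill_from_previous` are hypothetical helpers. `fps` is not recorded in the metadata shown here, so the caller must supply it. The sketch also assumes the empty cells that get forward-filled come from interleaving the frames with other timestamps (for example, DIO event times).

```python
import numpy as np
import pandas as pd


def video_frame_table(pos_timestamps, fps):
    """Builds one row per video frame from the PosTimestamp values of one camera."""
    frames = pd.DataFrame({"PosTimestamp": np.asarray(pos_timestamps, dtype=np.int64)})
    # frame_number: the index of each video time stamp tuple
    frames["frame_number"] = np.arange(len(frames))
    # video_time: the frame number divided by the frames per second
    frames["video_time"] = frames["frame_number"] / fps
    # video_seconds: video_time rounded to whole seconds
    frames["video_seconds"] = frames["video_time"].round()
    return frames


def fill_from_previous(frames, other_timestamps):
    """Interleaves other timestamps with the frames and forward-fills the frame columns."""
    others = pd.DataFrame({"PosTimestamp": np.asarray(other_timestamps, dtype=np.int64)})
    merged = pd.concat([frames, others], ignore_index=True)
    # A stable sort keeps a frame before another timestamp with the same value
    merged = merged.sort_values("PosTimestamp", kind="mergesort").reset_index(drop=True)
    for col in ["frame_number", "video_time", "video_seconds"]:
        # Each empty cell takes the value from the most recent previous cell
        merged["filled_" + col] = merged[col].ffill()
    return merged
```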

DIO directory
- dio_ECU_Din1.dat
    - time: The time stamp that corresponds to the DIN input
    - state: Binary state of whether there is input from the DIN or not
    - trial_number: Calculated by adding 1 every time there is a DIN input
    - Filled-in versions of the above columns, where each empty cell takes the value from the most recent previous cell:
        - filled_state
        - filled_trial_number
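Below is a minimal sketch of getting the tone onset times and the running `trial_number` from one DIO channel (`TONE_DIN = "dio_ECU_Din1"` and `TONE_STATE = 1` in the inputs). It assumes an onset is any entry whose `state` equals `TONE_STATE`. `tone_onsets` is a hypothetical helper, not part of `trodes.read_exported`.

```python
import numpy as np


def tone_onsets(dio_data, tone_state=1):
    """Returns the onset times and the running trial number of one DIO channel.

    dio_data is assumed to be a structured array with a "time" and a "state"
    field, in the same layout as the exported dio_ECU_Din1 data.
    """
    times = dio_data["time"]
    # An onset is a state change into tone_state (TONE_STATE in the inputs)
    is_onset = dio_data["state"] == tone_state
    # trial_number: increases by 1 at every onset, so each row keeps the
    # number of the most recent trial
    trial_number = np.cumsum(is_onset)
    return times[is_onset], trial_number
```

In the DIN1 data printed further down, the first entry `(1829348, 1)` coincides with the session's first timestamp. It may therefore reflect the channel's initial state rather than a tone. This sketch does not filter it out.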

ss_output directory (Spike sorting with Spike interface)
- firings.npz
    - unit_id: All the units that had a spike train for the given timestamp 	
    - number_of_units: Calculated by counting the number of units that had a spike train
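The `unit_id` and `number_of_units` columns can be sketched with a pandas group-by. The inputs `spike_times` and `unit_ids` are hypothetical parallel arrays, one entry per spike. They are not the actual keys of `firings.npz`, which are not documented here.

```python
import pandas as pd


def units_per_timestamp(spike_times, unit_ids):
    """Lists the units that spiked at each timestamp and counts them."""
    spikes = pd.DataFrame({"timestamp": spike_times, "unit_id": unit_ids})
    grouped = spikes.groupby("timestamp")["unit_id"]
    return pd.DataFrame({
        # unit_id: all the units that had a spike at the given timestamp
        "unit_id": grouped.apply(lambda units: sorted(set(units))),
        # number_of_units: how many distinct units had a spike
        "number_of_units": grouped.nunique(),
    })
```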

## Functions

- function names are short and in snake case all lowercase
- a function name should be unique but does not have to describe the function
- doc strings describe functions not function names

## Processing

Describe what is done to the data here and how inputs are manipulated to generate outputs. 

In [16]:
# As much code and as many cells as required
# includes EDA and playing with data
# GO HAM!

# LOOP 1: Extracting all the Trodes data

- Getting all the data from all the exported Trodes files and saving it to `session_to_trodes_data`
    - Creates a nested dictionary with the structure:
        - `{session_dir: {recording: {metadata_dir: {metadata_file: metadata}}}}`
        - e.g. `session_to_trodes_data[session_dir][recording]["DIO"]["dio_ECU_Din1"]` holds the metadata and data of one DIO channel
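`utilities.helper.create_recursive_dict` is not shown in this notebook. The printed `session_to_trodes_data` is a `defaultdict` whose default factory is `create_recursive_dict` itself, which suggests a self-referencing `defaultdict`. The sketch below assumes exactly that and is not the helper's actual source. The session and recording names are placeholders.

```python
from collections import defaultdict


def create_recursive_dict():
    """Returns a defaultdict that creates a new nested defaultdict for any missing key."""
    return defaultdict(create_recursive_dict)


session_to_trodes_data = create_recursive_dict()
# Intermediate levels are created on first access, so no setup is needed
session_to_trodes_data["session_a"]["recording_1"]["DIO"]["dio_ECU_Din1"] = {"clockrate": "20000"}
```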

In [17]:
# Saving the trodes data for each session
# Each key is a session name
# Each value is a dictionary of every recording file in that session
session_to_trodes_data = utilities.helper.create_recursive_dict()


# Saving the path of the session recording
session_to_path = {}

# Going through each session recording
# Which includes all the recordings from all the miniloggers and cameras
for session_path in ALL_SESSION_DIR:   
    try:
        # Getting the name of the session from the path
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Reading the trodes data for every recording file in the session directory
        session_to_trodes_data[session_basename] = trodes.read_exported.organize_all_trodes_export(session_path)
        
        session_to_path[session_basename] = session_path
    except Exception as e: 
        print(e)


Current Session: 20240330_141834_comp_both_subj_3-3_and_3-4
Skipping file 20240330_141834_comp_both_subj_3-3_t5b5_merged.timestampoffset.txt due to error: Settings format not supported


  return np.dtype(dtype_spec)


Skipping file 20240330_141834_comp_both_subj_3-4_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240329_160356_comp_both_subj_4_2_and_4-3
Skipping file 20240329_160356_comp_both_subj_4-2_t4b4_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240329_160356_comp_both_subj_4-3_t3b3_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240327_165151_comp_both_subj_4-2_and_4-4
Skipping file 20240327_165151_comp_both_subj_4-4_t4b4_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240327_165151_comp_both_subj_4-2_t3b3_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240331_164629_comp_both_subj_4-3_and_4-4
Skipping file 20240331_164629_comp_both_subj_4-4_t3b3_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240331_164629_comp_both_subj_4-3_t4b4_merged.timestamp

In [18]:
session_to_trodes_data

defaultdict(<function utilities.helper.create_recursive_dict()>,
            {'20240330_141834_comp_both_subj_3-3_and_3-4': defaultdict(dict,
                         {'20240330_141834_comp_both_subj_3-4_t6b6_merged': {'DIO': {'dio_ECU_Din1': {'description': 'State change data for one digital channel. Display_order is 1-based',
                             'byte_order': 'little endian',
                             'original_file': '20240330_141834_comp_both_subj_3-4_t6b6_merged.rec',
                             'clockrate': '20000',
                             'trodes_version': '2.3.4',
                             'compile_date': 'Nov 28 2022',
                             'compile_time': '15:10:45',
                             'qt_version': '6.2.2',
                             'commit_tag': 'heads/Release_2.3.4-0-gd5a58cd9-dirty',
                             'controller_firmware': '3.17',
                             'headstage_firmware': '2.4',
                             'co

- Adding the video timestamps
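The loop below stores each camera's timestamps under a key taken from the video file name. `split(".")[-3]` picks the camera number in names like `..._3-3_and_3-4.1.videoTimeStamps.cameraHWSync`. Here is a minimal sketch of that step as a hypothetical helper. It assumes the session part of the file name contains no dots.

```python
import os


def camera_key(video_timestamps_path):
    """Returns the camera number from a '<session>.<camera>.videoTimeStamps.cameraHWSync' path."""
    video_basename = os.path.basename(video_timestamps_path)
    # Splitting on "." gives [session, camera, "videoTimeStamps", "cameraHWSync"]
    return video_basename.split(".")[-3]
```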

In [19]:
for session_path in ALL_SESSION_DIR:   
    try:
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        file_to_video_timestamps = {}
        for video_timestamps in glob.glob(os.path.join(session_path, "*cameraHWSync")):
            video_basename = os.path.basename(video_timestamps)
            print("Current Video Name: {}".format(video_basename))
            timestamp_array = trodes.read_exported.read_trodes_extracted_data_file(video_timestamps)
            if "video_timestamps" not in session_to_trodes_data[session_basename][session_basename]:
                session_to_trodes_data[session_basename][session_basename]["video_timestamps"] = defaultdict(dict)
            session_to_trodes_data[session_basename][session_basename]["video_timestamps"][video_basename.split(".")[-3]] = timestamp_array
    
    
    except Exception as e: 
        print(e)

Current Session: 20240330_141834_comp_both_subj_3-3_and_3-4
Current Video Name: 20240330_141834_comp_both_subj_3-3_and_3-4.1.videoTimeStamps.cameraHWSync
Current Video Name: 20240330_141834_comp_both_subj_3-3_and_3-4.2.videoTimeStamps.cameraHWSync
Current Session: 20240329_160356_comp_both_subj_4_2_and_4-3
Current Video Name: 20240329_160356_comp_both_subj_4-2_and_4-3.1.videoTimeStamps.cameraHWSync
Current Video Name: 20240329_160356_comp_both_subj_4-2_and_4-3.2.videoTimeStamps.cameraHWSync
Current Session: 20240327_165151_comp_both_subj_4-2_and_4-4
Current Video Name: 20240327_165151_comp_both_subj_4-2_and_4-4.2.videoTimeStamps.cameraHWSync
Current Video Name: 20240327_165151_comp_both_subj_4-2_and_4-4.1.videoTimeStamps.cameraHWSync
Current Session: 20240331_164629_comp_both_subj_4-3_and_4-4
Current Video Name: 20240331_164629_comp_both_subj_4-3_and_4-4.1.videoTimeStamps.cameraHWSync
Current Video Name: 20240331_164629_comp_both_subj_4-3_and_4-4.2.videoTimeStamps.cameraHWSync
Current 

In [20]:
session_to_trodes_data[session_basename][session_basename]["video_timestamps"]

defaultdict(dict,
            {'1': {'clock rate': '20000',
              'camera_name': 'HD USB Camera (\\\\?\\usb#vid_32e4&pid_9230&mi_00#6&bec0719&2&0000#{e5323777-f976-4f5b-9b55-b94699c46e44}\\global)',
              'fields': '<PosTimestamp uint32><HWframeCount uint32><HWTimestamp uint64>',
              'data': array([( 2279754, 0, 0), ( 2279754, 0, 0), ( 2281140, 0, 0), ...,
                     (66200216, 0, 0), (66201602, 0, 0), (66201602, 0, 0)],
                    dtype=[('PosTimestamp', '<u4'), ('HWframeCount', '<u4'), ('HWTimestamp', '<u8')]),
              'filename': '20240330_153409_comp_both_subj_4-3_and_4-4.1.videoTimeStamps.cameraHWSync'},
             '2': {'clock rate': '20000',
              'camera_name': 'HD USB Camera (\\\\?\\usb#vid_32e4&pid_9230&mi_00#6&315f6863&1&0000#{e5323777-f976-4f5b-9b55-b94699c46e44}\\global)',
              'fields': '<PosTimestamp uint32><HWframeCount uint32><HWTimestamp uint64>',
              'data': array([( 2279754, 0, 0), ( 227

- Creating a dataframe from the nested dictionary, with a column for:
  - Session directory
  - Recording name
  - Metadata directory
  - Metadata file
  - Each metadata field
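The next cell uses `from_dict` and then renames `level_0` to `level_3`. An alternative is to build one record per metadata file, which names the four key columns directly. Here is a sketch, with `flatten_trodes_data` as a hypothetical helper. Metadata fields missing from a file become NaN, as in the `from_dict` version.

```python
import pandas as pd


def flatten_trodes_data(session_to_trodes_data):
    """Builds one row per metadata file from the 4-level nested dictionary."""
    rows = []
    for session_dir, recordings in session_to_trodes_data.items():
        for recording, metadata_dirs in recordings.items():
            for metadata_dir, metadata_files in metadata_dirs.items():
                for metadata_file, metadata in metadata_files.items():
                    rows.append({
                        "session_dir": session_dir,
                        "recording": recording,
                        "metadata_dir": metadata_dir,
                        "metadata_file": metadata_file,
                        # One column per metadata field
                        **metadata,
                    })
    return pd.DataFrame(rows)
```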

In [21]:
# Creating a dataframe from the nested dictionary
trodes_metadata_df = pd.DataFrame.from_dict({(i,j,k,l): session_to_trodes_data[i][j][k][l] 
                           for i in session_to_trodes_data.keys() 
                           for j in session_to_trodes_data[i].keys()
                           for k in session_to_trodes_data[i][j].keys()
                           for l in session_to_trodes_data[i][j][k].keys()},
                           orient='index')

# Resetting the index and renaming the columns
trodes_metadata_df = trodes_metadata_df.reset_index()
trodes_metadata_df = trodes_metadata_df.rename(columns={'level_0': 'session_dir', 'level_1': 'recording', 'level_2': 'metadata_dir', 'level_3': 'metadata_file'}, errors="ignore")

# Adding the session path to the dataframe
trodes_metadata_df["session_path"] = trodes_metadata_df["session_dir"].map(session_to_path)

In [22]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,direction,id,display_order,fields,data,filename,decimation,clock rate,camera_name,session_path
0,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,input,ECU_Din1,7,<time uint32><state uint8>,"[[1829348, 1], [1906254, 0], [3106667, 1], [33...",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...
1,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,input,ECU_Din4,9,<time uint32><state uint8>,"[[1829348, 0]]",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...
2,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Dout1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,output,ECU_Dout1,2,<time uint32><state uint8>,"[[1829348, 0]]",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...
3,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,output,ECU_Dout4,5,<time uint32><state uint8>,"[[1829348, 0]]",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...
4,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,input,ECU_Din2,6,<time uint32><state uint8>,"[[1829348, 1], [1906254, 0], [30612400, 1], [3...",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...


In [23]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,direction,id,display_order,fields,data,filename,decimation,clock rate,camera_name,session_path
211,20240329_144111_comp_both_subj_3-1_and_3-4,20240329_144111_comp_both_subj_3-1_and_3-4,video_timestamps,2,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[1808558, 0, 0], [1808558, 0, 0], [1809944, 0...",20240329_144111_comp_both_subj_3-1_and_3-4.2.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
212,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[1948531, 0, 0], [1949917, 0, 0], [1949917, 0...",20240327_141822_comp_both_subj_5-2_and_5-3.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
213,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3,video_timestamps,2,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[1948531, 0, 0], [1949917, 0, 0], [1951303, 0...",20240327_141822_comp_both_subj_5-2_and_5-3.2.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
214,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-3_and_4-4,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[2279754, 0, 0], [2279754, 0, 0], [2281140, 0...",20240330_153409_comp_both_subj_4-3_and_4-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
215,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-3_and_4-4,video_timestamps,2,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[2279754, 0, 0], [2279754, 0, 0], [2281140, 0...",20240330_153409_comp_both_subj_4-3_and_4-4.2.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...


- Getting the first field of each structured array in the `data` column (e.g. `time` for DIO files, `PosTimestamp` for video timestamps)
  - This first field is usually the timestamp
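Each `data` entry is a NumPy structured array, so indexing it by a field name returns that field for every tuple at once. Here is a small example using the first few DIO values printed below:

```python
import numpy as np

# Same layout as the exported DIO data: <time uint32><state uint8>
dio = np.array(
    [(1829348, 1), (1906254, 0), (3106667, 1)],
    dtype=[("time", "<u4"), ("state", "u1")],
)

first_name = dio.dtype.names[0]   # name of the first field
first_values = dio[first_name]    # that field's value in every tuple
last_name = dio.dtype.names[-1]   # name of the last field
last_values = dio[last_name]
```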

In [24]:
trodes_metadata_df["data"].iloc[0]

array([( 1829348, 1), ( 1906254, 0), ( 3106667, 1), ( 3306666, 0),
       ( 5106691, 1), ( 5306693, 0), ( 6306705, 1), ( 6506708, 0),
       ( 7306717, 1), ( 7506720, 0), ( 8806733, 1), ( 9006735, 0),
       (10206753, 1), (10406753, 0), (11406767, 1), (11606770, 0),
       (13106788, 1), (13306790, 0), (14406804, 1), (14606806, 0),
       (15706819, 1), (15906822, 0), (16706829, 1), (16906833, 0),
       (17906846, 1), (18106848, 0), (18906860, 1), (19106858, 0),
       (20006871, 1), (20206874, 0), (22006893, 1), (22206898, 0),
       (23406912, 1), (23606915, 0), (24506926, 1), (24706930, 0),
       (26006944, 1), (26206946, 0), (27306957, 1), (27506962, 0),
       (28406973, 1), (28606975, 0), (30106993, 1), (30306998, 0),
       (36110466, 1), (36310466, 0), (37110475, 1), (37310478, 0),
       (38210491, 1), (38410491, 0), (39210504, 1), (39410505, 0),
       (40710524, 1), (40910521, 0), (42410542, 1), (42610542, 0),
       (44710570, 1), (44910572, 0), (45710582, 1), (45910584,

In [25]:
# Getting the name of the first field of each structured numpy array
trodes_metadata_df["first_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[0])
# Getting the values of the first field of each structured numpy array
trodes_metadata_df["first_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[0]])


In [26]:
# Same as above but for the last column
trodes_metadata_df["last_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[-1])
trodes_metadata_df["last_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[-1]])

In [27]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,data,filename,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
0,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[[1829348, 1], [1906254, 0], [3106667, 1], [33...",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 3106667, 3306666, 5106691, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ..."
1,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[[1829348, 0]]",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,[1829348],state,[0]
2,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Dout1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[[1829348, 0]]",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,[1829348],state,[0]
3,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[[1829348, 0]]",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,[1829348],state,[0]
4,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[[1829348, 1], [1906254, 0], [30612400, 1], [3...",20240330_141834_comp_both_subj_3-4_t6b6_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 30612400, 30647402, 3167401...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ..."


In [28]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,data,filename,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
211,20240329_144111_comp_both_subj_3-1_and_3-4,20240329_144111_comp_both_subj_3-1_and_3-4,video_timestamps,2,,,,,,,...,"[[1808558, 0, 0], [1808558, 0, 0], [1809944, 0...",20240329_144111_comp_both_subj_3-1_and_3-4.2.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1808558, 1808558, 1809944, 1811330, 1811330, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
212,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3,video_timestamps,1,,,,,,,...,"[[1948531, 0, 0], [1949917, 0, 0], [1949917, 0...",20240327_141822_comp_both_subj_5-2_and_5-3.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1948531, 1949917, 1949917, 1951303, 1952689, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
213,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3,video_timestamps,2,,,,,,,...,"[[1948531, 0, 0], [1949917, 0, 0], [1951303, 0...",20240327_141822_comp_both_subj_5-2_and_5-3.2.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1948531, 1949917, 1951303, 1951303, 1952689, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
214,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-3_and_4-4,video_timestamps,1,,,,,,,...,"[[2279754, 0, 0], [2279754, 0, 0], [2281140, 0...",20240330_153409_comp_both_subj_4-3_and_4-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2279754, 2279754, 2281140, 2282526, 2283912, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
215,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-3_and_4-4,video_timestamps,2,,,,,,,...,"[[2279754, 0, 0], [2279754, 0, 0], [2281140, 0...",20240330_153409_comp_both_subj_4-3_and_4-4.2.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2279754, 2279754, 2281140, 2282526, 2283912, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [29]:
trodes_metadata_df["recording"].unique()

array(['20240330_141834_comp_both_subj_3-4_t6b6_merged',
       '20240330_141834_comp_both_subj_3-3_t5b5_merged',
       '20240329_160356_comp_both_subj_4-3_t3b3_merged',
       '20240329_160356_comp_both_subj_4-2_t4b4_merged',
       '20240327_165151_comp_both_subj_4-4_t4b4_merged',
       '20240327_165151_comp_both_subj_4-2_t3b3_merged',
       '20240331_164629_comp_both_subj_4-4_t3b3_merged',
       '20240331_164629_comp_both_subj_4-3_t4b4_merged',
       '20240331_152930_comp_both_subj_5-2_t6b6_merged',
       '20240331_152930_comp_both_subj_5-3_t5b5_merged',
       '20240327_153306_comp_both_subj_3-3_t2b2_merged',
       '20240327_153306_comp_both_subj_3-1_t1b1_merged',
       '20240329_144111_comp_both_subj_3-1_t1b1_merged',
       '20240329_144111_comp_both_subj_3-4_t2b2_merged',
       '20240327_141822_comp_both_subj_5-3_t6b6_merged',
       '20240327_141822_comp_both_subj_5-2_t5b5_merged',
       '20240330_153409_comp_both_subj_4-3_t4b4_merged',
       '20240330_153409_comp_bo

## Getting the subject information from the session and recording names

In [30]:
def split_by_multiple_delimiters(s, delimiters):
    """
    Splits a string by multiple delimiters.

    Parameters:
    - s (str): The string to split.
    - delimiters (list): A list of delimiters to split the string by.

    Returns:
    - list: A list of substrings.
    """
    return re.split('|'.join(map(re.escape, delimiters)), s)


In [31]:
trodes_metadata_df["all_subjects"] = trodes_metadata_df["session_dir"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", "."))#.split("t")[0].strip("_").replace("_",".").split(".and."))
trodes_metadata_df["all_subjects"] = trodes_metadata_df["all_subjects"].apply(lambda x: sorted(extract_floats(x)))

In [32]:
trodes_metadata_df["session_dir"].iloc[0]

'20240330_141834_comp_both_subj_3-3_and_3-4'

In [33]:
trodes_metadata_df["all_subjects"].apply(lambda x: tuple(x)).unique()

array([('3.3', '3.4'), ('2.0', '4.0', '4.3'), ('4.2', '4.4'),
       ('4.3', '4.4'), ('5.2', '5.3'), ('3.1', '3.3'), ('3.1', '3.4')],
      dtype=object)
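The tuple `('2.0', '4.0', '4.3')` above comes from the session `20240329_160356_comp_both_subj_4_2_and_4-3`. Its first subject is written with an underscore (`4_2`), which the `replace("-", ".")` step does not convert, so `extract_floats` reads `4`, `2`, and `4.3` separately. Below is a sketch of a stricter parser. It assumes every subject ID is two integers joined by `-` or `_`. `parse_subjects` is a hypothetical helper, not part of `utilities`.

```python
import re


def parse_subjects(name):
    """Returns the subject IDs ("<cohort>.<animal>") found after "subj" in a name."""
    # Only look after "subj" so the date and time at the start are ignored
    after_subj = name.split("subj")[-1]
    # A subject ID is assumed to be two integers joined by "-" or "_"
    pairs = re.findall(r"(\d+)[-_](\d+)", after_subj)
    return [f"{cohort}.{animal}" for cohort, animal in pairs]
```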

In [34]:
trodes_metadata_df["current_subject"] = trodes_metadata_df["recording"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", ".").replace("_", "."))#.split("t")[0].strip("_").replace("_",".").split(".and."))
trodes_metadata_df["current_subject"] = trodes_metadata_df["current_subject"].apply(lambda x: str(extract_floats(x)[0]).strip())


In [35]:
trodes_metadata_df["current_subject"].unique()

array(['3.4', '3.3', '4.3', '4.2', '4.4', '5.2', '5.3', '3.1'],
      dtype=object)

## Dropping all the rows with unneeded metadata

In [36]:
trodes_metadata_df["metadata_dir"].unique()

array(['DIO', 'time', 'raw', 'video_timestamps'], dtype=object)

In [37]:
METADATA_TO_KEEP = ['raw', 'DIO', 'video_timestamps']

In [38]:
trodes_metadata_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(METADATA_TO_KEEP)]

In [39]:
# Dropping the digital output channels (e.g. dio_ECU_Dout1) and the coordinate files
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("out")]
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("coordinates")]


In [40]:
trodes_metadata_df = trodes_metadata_df.reset_index(drop=True)

# Getting the first time stamp of each recording

In [41]:
trodes_raw_df = trodes_metadata_df[(trodes_metadata_df["metadata_dir"] == "raw") & (trodes_metadata_df["metadata_file"] == "timestamps")].copy()


In [42]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
4,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,raw,timestamps,Raw timestamps,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1829349, 1829350, 1829351, 1829352, ...",time,"[1829348, 1829349, 1829350, 1829351, 1829352, ...","[3.3, 3.4]",3.4
5,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-3_t5b5_merged,raw,timestamps,Raw timestamps,little endian,20240330_141834_comp_both_subj_3-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1829349, 1829350, 1829351, 1829352, ...",time,"[1829348, 1829349, 1829350, 1829351, 1829352, ...","[3.3, 3.4]",3.3
14,20240329_160356_comp_both_subj_4_2_and_4-3,20240329_160356_comp_both_subj_4-3_t3b3_merged,raw,timestamps,Raw timestamps,little endian,20240329_160356_comp_both_subj_4-3_t3b3_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[3116821, 3116822, 3116823, 3116824, 3116825, ...",time,"[3116821, 3116822, 3116823, 3116824, 3116825, ...","[2.0, 4.0, 4.3]",4.3
19,20240329_160356_comp_both_subj_4_2_and_4-3,20240329_160356_comp_both_subj_4-2_t4b4_merged,raw,timestamps,Raw timestamps,little endian,20240329_160356_comp_both_subj_4-2_t4b4_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[3116821, 3116822, 3116823, 3116824, 3116825, ...",time,"[3116821, 3116822, 3116823, 3116824, 3116825, ...","[2.0, 4.0, 4.3]",4.2
20,20240327_165151_comp_both_subj_4-2_and_4-4,20240327_165151_comp_both_subj_4-4_t4b4_merged,raw,timestamps,Raw timestamps,little endian,20240327_165151_comp_both_subj_4-4_t4b4_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2346278, 2346279, 2346280, 2346281, 2346282, ...",time,"[2346278, 2346279, 2346280, 2346281, 2346282, ...","[4.2, 4.4]",4.4


In [43]:
trodes_raw_df["first_timestamp"] = trodes_raw_df["first_item_data"].apply(lambda x: x[0])

In [44]:
trodes_raw_df["recording"].iloc[0]

'20240330_141834_comp_both_subj_3-4_t6b6_merged'

In [45]:
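# Keyed by session directory: if the recordings of a session have different first timestamps, only the last one is kept
recording_to_first_timestamp = trodes_raw_df.set_index('session_dir')['first_timestamp'].to_dict()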

In [46]:
recording_to_first_timestamp

{'20240330_141834_comp_both_subj_3-3_and_3-4': 1829348,
 '20240329_160356_comp_both_subj_4_2_and_4-3': 3116821,
 '20240327_165151_comp_both_subj_4-2_and_4-4': 2346278,
 '20240331_164629_comp_both_subj_4-3_and_4-4': 1602066,
 '20240331_152930_comp_both_subj_5-2_and_5-3': 1575734,
 '20240327_153306_comp_both_subj_3-1_and_3-3': 1639486,
 '20240329_144111_comp_both_subj_3-1_and_3-4': 1807174,
 '20240327_141822_comp_both_subj_5-2_and_5-3': 1948533,
 '20240330_153409_comp_both_subj_4-3_and_4-4': 2278370}

In [47]:
trodes_metadata_df["first_timestamp"] = trodes_metadata_df["session_dir"].map(recording_to_first_timestamp)

In [48]:
trodes_metadata_df["first_timestamp"]

0      1829348
1      1829348
2      1829348
3      1829348
4      1829348
        ...   
103    1807174
104    1948533
105    1948533
106    2278370
107    2278370
Name: first_timestamp, Length: 108, dtype: int64

# Getting the event timestamps

In [49]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 3106667, 3306666, 5106691, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4
1,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,[1829348],state,[0],"[3.3, 3.4]",3.4
2,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 30612400, 30647402, 3167401...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4
3,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 1928052, 1929652, 1930052, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4
4,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,raw,timestamps,Raw timestamps,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1829349, 1829350, 1829351, 1829352, ...",time,"[1829348, 1829349, 1829350, 1829351, 1829352, ...","[3.3, 3.4]",3.4


In [50]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
103,20240329_144111_comp_both_subj_3-1_and_3-4,20240329_144111_comp_both_subj_3-1_and_3-4,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1808558, 1808558, 1809944, 1811330, 1811330, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.4]",3.1
104,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1948531, 1949917, 1949917, 1951303, 1952689, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[5.2, 5.3]",5.2
105,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1948531, 1949917, 1951303, 1951303, 1952689, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[5.2, 5.3]",5.2
106,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-3_and_4-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2279754, 2279754, 2281140, 2282526, 2283912, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.3, 4.4]",4.3
107,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-3_and_4-4,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2279754, 2279754, 2281140, 2282526, 2283912, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.3, 4.4]",4.3


In [51]:
# trodes_state_df = trodes_metadata_df[trodes_metadata_df["last_dtype_name"] == "state"].copy()

# Filtering for digital IO channels
trodes_state_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(["DIO"])].copy()
# Filtering the DIO rows for the tone and port entry channels
trodes_state_df = trodes_state_df[trodes_state_df["id"].isin(["ECU_Din1", "ECU_Din2", "ECU_Din3"])].copy()


In [52]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 3106667, 3306666, 5106691, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4
2,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 30612400, 30647402, 3167401...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4
3,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 1928052, 1929652, 1930052, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4
6,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-3_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 3106667, 3306666, 5106691, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.3
7,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-3_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1829348, 1906254, 1928052, 1929652, 1930052, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.3


In [53]:
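# Pair each "on" state change (state == 1) with the state change that follows it ("off") as [on_index, off_index]
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: np.column_stack([np.where(x["last_item_data"] == 1)[0], np.where(x["last_item_data"] == 1)[0]+1]), axis=1)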

In [54]:
# Drop a trailing "on" that has no following state change (its off index would be out of bounds)
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: x["event_indexes"][x["event_indexes"][:, 1] <= x["first_item_data"].shape[0] - 1], axis=1)

In [55]:
# Convert the index pairs into [on_timestamp, off_timestamp] pairs
trodes_state_df["event_timestamps"] = trodes_state_df.apply(lambda x: x["first_item_data"][x["event_indexes"]], axis=1)
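The three cells above turn each DIO channel's state-change export into on/off pairs. A minimal standalone sketch of the same logic, assuming (as the cells do) that every `1` state is closed by the state change at the next index; `pair_on_off` is a hypothetical name, not a function from this repo:

```python
import numpy as np

def pair_on_off(times, states):
    """Return an (n, 2) array of [on_time, off_time] pairs from one DIO channel.

    Assumes each state of 1 ("on") is ended by the state change at the next index,
    and drops a trailing "on" that has no following state change.
    """
    times = np.asarray(times)
    states = np.asarray(states)
    on_idx = np.where(states == 1)[0]
    # [on_index, off_index] pairs
    pairs = np.column_stack([on_idx, on_idx + 1])
    # Keep only pairs whose off index exists in the timestamp array
    pairs = pairs[pairs[:, 1] <= times.shape[0] - 1]
    # Fancy indexing turns the index pairs into timestamp pairs
    return times[pairs]

print(pair_on_off([100, 150, 300, 350, 500], [1, 0, 1, 0, 1]))
```

The final `1` at timestamp 500 has no matching "off" change, so only two pairs are returned.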

## Updating the video timestamps

## Syncing up the video frame data

In [56]:
# Getting the rows that are the metadata for the video timestamps
trodes_video_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"] == "video_timestamps"].copy().reset_index(drop=True)



In [57]:
# Filtering for the first video only
# This only applies to this pilot data, where we are only looking at the competition data
# trodes_video_df = trodes_video_df[trodes_video_df["metadata_file"] == "1"].copy()

In [58]:
trodes_video_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-3_and_3-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1830732, 1830732, 1832118, 1833504, 1834890, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.3, 3.4]",3.3
1,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-3_and_3-4,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1830732, 1830732, 1832118, 1833504, 1833504, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.3, 3.4]",3.3
2,20240329_160356_comp_both_subj_4_2_and_4-3,20240329_160356_comp_both_subj_4_2_and_4-3,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3118205, 3118205, 3119591, 3119591, 3120977, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[2.0, 4.0, 4.3]",4.2
3,20240329_160356_comp_both_subj_4_2_and_4-3,20240329_160356_comp_both_subj_4_2_and_4-3,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3116819, 3118205, 3119591, 3119591, 3120977, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[2.0, 4.0, 4.3]",4.2
4,20240327_165151_comp_both_subj_4-2_and_4-4,20240327_165151_comp_both_subj_4-2_and_4-4,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2347662, 2347662, 2349047, 2350433, 2350433, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.4]",4.2


In [59]:
# Storing the video frame timestamps (PosTimestamp) in their own column
trodes_video_df["video_timestamps"] = trodes_video_df["first_item_data"]

In [60]:
# Removing the columns that are no longer needed
trodes_video_df = trodes_video_df[["filename", "video_timestamps", "session_dir"]].copy()

In [61]:
# Renaming `filename` to `video_name` so it does not clash with the `filename` column of the state dataframe when merging
trodes_video_df = trodes_video_df.rename(columns={"filename": "video_name"})

In [62]:
trodes_video_df.head()

Unnamed: 0,video_name,video_timestamps,session_dir
0,20240330_141834_comp_both_subj_3-3_and_3-4.1.v...,"[1830732, 1830732, 1832118, 1833504, 1834890, ...",20240330_141834_comp_both_subj_3-3_and_3-4
1,20240330_141834_comp_both_subj_3-3_and_3-4.2.v...,"[1830732, 1830732, 1832118, 1833504, 1833504, ...",20240330_141834_comp_both_subj_3-3_and_3-4
2,20240329_160356_comp_both_subj_4-2_and_4-3.1.v...,"[3118205, 3118205, 3119591, 3119591, 3120977, ...",20240329_160356_comp_both_subj_4_2_and_4-3
3,20240329_160356_comp_both_subj_4-2_and_4-3.2.v...,"[3116819, 3118205, 3119591, 3119591, 3120977, ...",20240329_160356_comp_both_subj_4_2_and_4-3
4,20240327_165151_comp_both_subj_4-2_and_4-4.2.v...,"[2347662, 2347662, 2349047, 2350433, 2350433, ...",20240327_165151_comp_both_subj_4-2_and_4-4


- Merging on `session_dir` so that each DIO channel row is paired with every video from the same session (one row per channel and video)

In [63]:
trodes_state_df = pd.merge(trodes_state_df, trodes_video_df, on=["session_dir"], how="inner")
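Because both dataframes can hold several rows per `session_dir`, this inner merge acts like a per-session cross product. A toy illustration with made-up values (three channels and two videos for one hypothetical session `s1`):

```python
import pandas as pd

# Toy state rows: one row per DIO channel in session "s1"
states = pd.DataFrame({
    "session_dir": ["s1", "s1", "s1"],
    "id": ["ECU_Din1", "ECU_Din2", "ECU_Din3"],
})
# Toy video rows: two cameras recorded during the same session
videos = pd.DataFrame({
    "session_dir": ["s1", "s1"],
    "video_name": ["video_1", "video_2"],
})

# Every channel row is matched with every video row that shares session_dir
merged = pd.merge(states, videos, on=["session_dir"], how="inner")
print(len(merged))
```

Three channel rows times two videos gives six rows, which is why the state dataframe grows after this cell.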

In [64]:
trodes_state_df.columns

Index(['session_dir', 'recording', 'metadata_dir', 'metadata_file',
       'description', 'byte_order', 'original_file', 'clockrate',
       'trodes_version', 'compile_date', 'compile_time', 'qt_version',
       'commit_tag', 'controller_firmware', 'headstage_firmware',
       'controller_serialnum', 'headstage_serialnum', 'autosettle', 'smartref',
       'gyro', 'accelerometer', 'magnetometer', 'time_offset',
       'system_time_at_creation', 'timestamp_at_creation', 'first_timestamp',
       'direction', 'id', 'display_order', 'fields', 'data', 'filename',
       'decimation', 'clock rate', 'camera_name', 'session_path',
       'first_dtype_name', 'first_item_data', 'last_dtype_name',
       'last_item_data', 'all_subjects', 'current_subject', 'event_indexes',
       'event_timestamps', 'video_name', 'video_timestamps'],
      dtype='object')

## Finding the closest frame to each event

In [65]:
trodes_state_df["event_timestamps"].iloc[1]

array([[ 1829348,  1906254],
       [ 3106667,  3306666],
       [ 5106691,  5306693],
       [ 6306705,  6506708],
       [ 7306717,  7506720],
       [ 8806733,  9006735],
       [10206753, 10406753],
       [11406767, 11606770],
       [13106788, 13306790],
       [14406804, 14606806],
       [15706819, 15906822],
       [16706829, 16906833],
       [17906846, 18106848],
       [18906860, 19106858],
       [20006871, 20206874],
       [22006893, 22206898],
       [23406912, 23606915],
       [24506926, 24706930],
       [26006944, 26206946],
       [27306957, 27506962],
       [28406973, 28606975],
       [30106993, 30306998],
       [36110466, 36310466],
       [37110475, 37310478],
       [38210491, 38410491],
       [39210504, 39410505],
       [40710524, 40910521],
       [42410542, 42610542],
       [44710570, 44910572],
       [45710582, 45910584],
       [46710594, 46910596],
       [48010610, 48210612],
       [49210624, 49410627],
       [51410648, 51610650],
       [527106

In [66]:
trodes_state_df["event_frames"] = trodes_state_df.apply(lambda x: utilities.helper.find_nearest_indices(x["event_timestamps"], x["video_timestamps"]), axis=1)
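The implementation of `utilities.helper.find_nearest_indices` is not shown in this notebook. As a rough sketch of a nearest-frame lookup, assuming the video timestamps are sorted in non-decreasing order, `np.searchsorted` plus a comparison against the left neighbour works on target arrays of any shape, including the (n, 2) on/off pairs. `nearest_indices` is a hypothetical name, and its tie-breaking may differ from the repo's helper:

```python
import numpy as np

def nearest_indices(targets, reference):
    """For each value in `targets`, return the index of the closest value in `reference`.

    Assumes `reference` is sorted in non-decreasing order.
    `targets` may have any shape, e.g. an (n, 2) array of [on, off] timestamps.
    """
    targets = np.asarray(targets)
    reference = np.asarray(reference)
    # Index of the first reference value >= each target, clipped to a valid index
    right = np.clip(np.searchsorted(reference, targets, side="left"), 0, len(reference) - 1)
    left = np.clip(right - 1, 0, len(reference) - 1)
    # Choose whichever neighbour is closer (ties go to the earlier frame)
    use_left = np.abs(targets - reference[left]) <= np.abs(reference[right] - targets)
    return np.where(use_left, left, right)

print(nearest_indices([[5, 21], [26, 40]], [10, 20, 30]))
```

Targets before the first frame map to frame 0 and targets after the last frame map to the last frame, similar to the `[0, 76]` pairs in the output below for events that start before the video does.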

In [67]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 3106667, 3306666, 5106691, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [3106667, 3306666], [5106...",20240330_141834_comp_both_subj_3-3_and_3-4.1.v...,"[1830732, 1830732, 1832118, 1833504, 1834890, ...","[[0, 76], [1317, 1516], [3313, 3513], [4511, 4..."
1,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 3106667, 3306666, 5106691, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [3106667, 3306666], [5106...",20240330_141834_comp_both_subj_3-3_and_3-4.2.v...,"[1830732, 1830732, 1832118, 1833504, 1833504, ...","[[0, 76], [1322, 1521], [3318, 3518], [4515, 4..."
2,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 30612400, 30647402, 3167401...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [30612400, 30647402], [31...",20240330_141834_comp_both_subj_3-3_and_3-4.1.v...,"[1830732, 1830732, 1832118, 1833504, 1834890, ...","[[0, 76], [28789, 28843], [30373, 30396], [311..."
3,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 30612400, 30647402, 3167401...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [30612400, 30647402], [31...",20240330_141834_comp_both_subj_3-3_and_3-4.2.v...,"[1830732, 1830732, 1832118, 1833504, 1833504, ...","[[0, 76], [28775, 28811], [29835, 29850], [304..."
4,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 1928052, 1929652, 1930052, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [1928052, 1929652], [1930...",20240330_141834_comp_both_subj_3-3_and_3-4.1.v...,"[1830732, 1830732, 1832118, 1833504, 1834890, ...","[[0, 76], [99, 100], [100, 103], [210, 216], [..."


## Combine raw and state dataframes

In [68]:
trodes_state_df

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 3106667, 3306666, 5106691, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [3106667, 3306666], [5106...",20240330_141834_comp_both_subj_3-3_and_3-4.1.v...,"[1830732, 1830732, 1832118, 1833504, 1834890, ...","[[0, 76], [1317, 1516], [3313, 3513], [4511, 4..."
1,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 3106667, 3306666, 5106691, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [3106667, 3306666], [5106...",20240330_141834_comp_both_subj_3-3_and_3-4.2.v...,"[1830732, 1830732, 1832118, 1833504, 1833504, ...","[[0, 76], [1322, 1521], [3318, 3518], [4515, 4..."
2,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 30612400, 30647402, 3167401...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [30612400, 30647402], [31...",20240330_141834_comp_both_subj_3-3_and_3-4.1.v...,"[1830732, 1830732, 1832118, 1833504, 1834890, ...","[[0, 76], [28789, 28843], [30373, 30396], [311..."
3,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 30612400, 30647402, 3167401...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [30612400, 30647402], [31...",20240330_141834_comp_both_subj_3-3_and_3-4.2.v...,"[1830732, 1830732, 1832118, 1833504, 1833504, ...","[[0, 76], [28775, 28811], [29835, 29850], [304..."
4,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240330_141834_comp_both_subj_3-4_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[1829348, 1906254, 1928052, 1929652, 1930052, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4]",3.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1829348, 1906254], [1928052, 1929652], [1930...",20240330_141834_comp_both_subj_3-3_and_3-4.1.v...,"[1830732, 1830732, 1832118, 1833504, 1834890, ...","[[0, 76], [99, 100], [100, 103], [210, 216], [..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
103,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-4_t3b3_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240330_153409_comp_both_subj_4-4_t3b3_merged...,20000,2.3.4,Nov 28 2022,...,"[2278370, 2823555, 2903756, 2907356, 2908356, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.3, 4.4]",4.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2278370, 2823555], [2903756, 2907356], [2908...",20240330_153409_comp_both_subj_4-3_and_4-4.2.v...,"[2279754, 2279754, 2281140, 2282526, 2283912, ...","[[0, 544], [624, 627], [629, 673], [756, 783],..."
104,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-4_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_153409_comp_both_subj_4-4_t3b3_merged...,20000,2.3.4,Nov 28 2022,...,"[2278370, 2823555, 32644515, 32663718, 3539735...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.3, 4.4]",4.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2278370, 2823555], [32644515, 32663718], [35...",20240330_153409_comp_both_subj_4-3_and_4-4.1.v...,"[2279754, 2279754, 2281140, 2282526, 2283912, ...","[[0, 544], [30320, 30344], [34391, 34395], [34..."
105,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-4_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240330_153409_comp_both_subj_4-4_t3b3_merged...,20000,2.3.4,Nov 28 2022,...,"[2278370, 2823555, 32644515, 32663718, 3539735...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.3, 4.4]",4.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2278370, 2823555], [32644515, 32663718], [35...",20240330_153409_comp_both_subj_4-3_and_4-4.2.v...,"[2279754, 2279754, 2281140, 2282526, 2283912, ...","[[0, 544], [0, 0], [0, 0], [0, 0], [0, 0], [0,..."
106,20240330_153409_comp_both_subj_4-3_and_4-4,20240330_153409_comp_both_subj_4-4_t3b3_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240330_153409_comp_both_subj_4-4_t3b3_merged...,20000,2.3.4,Nov 28 2022,...,"[2278370, 2823555, 4023970, 4223972, 6023994, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.3, 4.4]",4.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2278370, 2823555], [4023970, 4223972], [6023...",20240330_153409_comp_both_subj_4-3_and_4-4.1.v...,"[2279754, 2279754, 2281140, 2282526, 2283912, ...","[[0, 544], [1742, 1941], [3738, 3938], [4936, ..."


In [69]:
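# Keep only the needed columns and drop rows that repeat the same DIO channel for the same video
# (each subject's recording in a session carries identical DIO data)
trodes_state_df = (
    trodes_state_df[STATE_COLS_TO_KEEP]
    .drop_duplicates(subset=["session_dir", "video_name", "metadata_file"])
    .sort_values(["session_dir", "video_name", "metadata_file"])
    .reset_index(drop=True)
    .copy()
)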

In [70]:
trodes_state_df.head()

Unnamed: 0,session_dir,metadata_file,event_timestamps,video_name,video_timestamps,event_frames
0,20240327_141822_comp_both_subj_5-2_and_5-3,dio_ECU_Din1,"[[1948533, 1974154], [3174570, 3374574], [5174...",20240327_141822_comp_both_subj_5-2_and_5-3.1.v...,"[1948531, 1949917, 1949917, 1951303, 1952689, ...","[[1, 26], [1224, 1423], [3220, 3421], [4418, 4..."
1,20240327_141822_comp_both_subj_5-2_and_5-3,dio_ECU_Din2,"[[1948533, 1974154], [31381122, 31400124], [31...",20240327_141822_comp_both_subj_5-2_and_5-3.1.v...,"[1948531, 1949917, 1949917, 1951303, 1952689, ...","[[1, 26], [29421, 29445], [30032, 30049], [301..."
2,20240327_141822_comp_both_subj_5-2_and_5-3,dio_ECU_Din3,"[[1948533, 1974154], [1980154, 1982957], [1983...",20240327_141822_comp_both_subj_5-2_and_5-3.1.v...,"[1948531, 1949917, 1949917, 1951303, 1952689, ...","[[1, 26], [32, 35], [36, 39], [87, 89], [90, 1..."
3,20240327_141822_comp_both_subj_5-2_and_5-3,dio_ECU_Din1,"[[1948533, 1974154], [3174570, 3374574], [5174...",20240327_141822_comp_both_subj_5-2_and_5-3.2.v...,"[1948531, 1949917, 1951303, 1951303, 1952689, ...","[[1, 26], [1224, 1423], [3220, 3420], [4418, 4..."
4,20240327_141822_comp_both_subj_5-2_and_5-3,dio_ECU_Din2,"[[1948533, 1974154], [31381122, 31400124], [31...",20240327_141822_comp_both_subj_5-2_and_5-3.2.v...,"[1948531, 1949917, 1951303, 1951303, 1952689, ...","[[1, 26], [29378, 29397], [29870, 29884], [299..."


In [71]:
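# Collapse the per-channel rows into one row per video: shared columns keep their first value,
# and per-channel columns become lists in the sorted channel order (Din1, Din2, Din3)
trodes_state_df = trodes_state_df.groupby(same_columns).agg({
    **{col: "first" for col in trodes_state_df.columns if col not in same_columns + different_columns},
    **{col: lambda x: x.tolist() for col in different_columns},
}).reset_index()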
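`same_columns` and `different_columns` are defined outside this section. Judging from the output below, they appear to be the grouping keys (`session_dir`, `video_name`) and the per-channel columns (`metadata_file`, `event_frames`, `event_timestamps`). A reduced toy version of the same aggregation, with those column lists assumed and a single made-up video:

```python
import pandas as pd

# Toy long-format table: one row per (video, DIO channel)
long_df = pd.DataFrame({
    "session_dir": ["s1"] * 3,
    "video_name": ["video_1"] * 3,
    "video_timestamps": [[0, 10, 20]] * 3,
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din3"],
    "event_frames": [[0, 1], [1, 2], [0, 2]],
})

# Assumed column groupings (inferred from the notebook output, not copied from its inputs)
same_columns = ["session_dir", "video_name"]
different_columns = ["metadata_file", "event_frames"]

wide_df = long_df.groupby(same_columns).agg({
    # Columns identical across channels: keep the first value
    **{col: "first" for col in long_df.columns if col not in same_columns + different_columns},
    # Columns that differ per channel: collect them into a list, one entry per channel
    **{col: lambda x: x.tolist() for col in different_columns},
}).reset_index()
print(wide_df)
```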

In [72]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,metadata_file,event_frames,event_timestamps
0,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3.1.v...,"[1948531, 1949917, 1949917, 1951303, 1952689, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 26], [1224, 1423], [3220, 3421], [4418, ...","[[[1948533, 1974154], [3174570, 3374574], [517..."
1,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3.2.v...,"[1948531, 1949917, 1951303, 1951303, 1952689, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 26], [1224, 1423], [3220, 3420], [4418, ...","[[[1948533, 1974154], [3174570, 3374574], [517..."
2,20240327_153306_comp_both_subj_3-1_and_3-3,20240327_153306_comp_both_subj_3-1_and_3-3.1.v...,"[1639484, 1640870, 1642256, 1642256, 1643642, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 1090], [2288, 2487], [4284, 4483], [5482...","[[[1639486, 2730369], [3930784, 4130787], [593..."
3,20240327_153306_comp_both_subj_3-1_and_3-3,20240327_153306_comp_both_subj_3-1_and_3-3.2.v...,"[1640697, 1640870, 1642256, 1642256, 1643642, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 1323], [2521, 2720], [4517, 4717], [5715...","[[[1639486, 2730369], [3930784, 4130787], [593..."
4,20240327_165151_comp_both_subj_4-2_and_4-4,20240327_165151_comp_both_subj_4-2_and_4-4.1.v...,"[2347662, 2349047, 2350433, 2350433, 2351819, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 73], [1272, 1472], [3269, 3468], [4466, ...","[[[2346278, 2420890], [3621305, 3821308], [562..."


In [73]:
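# Split the per-channel lists by position; this relies on the Din1, Din2, Din3 sort order above
# (Din1 = tone, Din2 = box 1 port entry, Din3 = box 2 port entry)
trodes_state_df["tone_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[0])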
trodes_state_df["box_1_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[2])

trodes_state_df["tone_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[0])
trodes_state_df["box_1_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[2])
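The cell above picks the tone and port entry columns by list position, so it silently depends on the Din1, Din2, Din3 sort order. A sketch of a name-based alternative, assuming the channel-to-event assignment above is correct; `CHANNEL_TO_EVENT` and `split_by_channel` are hypothetical names:

```python
import pandas as pd

# Hypothetical channel-to-event mapping, taken from the positional assignments above
CHANNEL_TO_EVENT = {
    "dio_ECU_Din1": "tone",
    "dio_ECU_Din2": "box_1_port_entry",
    "dio_ECU_Din3": "box_2_port_entry",
}

def split_by_channel(channels, values, suffix):
    """Map per-channel values to named columns using the channel name, not list position."""
    by_channel = dict(zip(channels, values))
    # .get() leaves a missing channel as None instead of shifting the other columns
    return {f"{event}_{suffix}": by_channel.get(channel)
            for channel, event in CHANNEL_TO_EVENT.items()}

# Toy row whose channels are deliberately out of order
toy_df = pd.DataFrame({
    "metadata_file": [["dio_ECU_Din3", "dio_ECU_Din1", "dio_ECU_Din2"]],
    "event_timestamps": [["box2_events", "tone_events", "box1_events"]],
})
expanded = toy_df.apply(
    lambda row: pd.Series(split_by_channel(row["metadata_file"], row["event_timestamps"], "timestamps")),
    axis=1,
)
print(expanded.loc[0, "tone_timestamps"])
```

The same call with `"event_frames"` and the suffix `"frames"` would produce the frame columns.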


In [74]:
trodes_state_df = trodes_state_df.drop(columns=["event_timestamps", "event_frames", "metadata_file"], errors="ignore")

In [75]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames
0,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3.1.v...,"[1948531, 1949917, 1949917, 1951303, 1952689, ...","[[1948533, 1974154], [3174570, 3374574], [5174...","[[1948533, 1974154], [31381122, 31400124], [31...","[[1948533, 1974154], [1980154, 1982957], [1983...","[[1, 26], [1224, 1423], [3220, 3421], [4418, 4...","[[1, 26], [29421, 29445], [30032, 30049], [301...","[[1, 26], [32, 35], [36, 39], [87, 89], [90, 1..."
1,20240327_141822_comp_both_subj_5-2_and_5-3,20240327_141822_comp_both_subj_5-2_and_5-3.2.v...,"[1948531, 1949917, 1951303, 1951303, 1952689, ...","[[1948533, 1974154], [3174570, 3374574], [5174...","[[1948533, 1974154], [31381122, 31400124], [31...","[[1948533, 1974154], [1980154, 1982957], [1983...","[[1, 26], [1224, 1423], [3220, 3420], [4418, 4...","[[1, 26], [29378, 29397], [29870, 29884], [299...","[[1, 26], [32, 34], [36, 39], [87, 88], [90, 1..."
2,20240327_153306_comp_both_subj_3-1_and_3-3,20240327_153306_comp_both_subj_3-1_and_3-3.1.v...,"[1639484, 1640870, 1642256, 1642256, 1643642, ...","[[1639486, 2730369], [3930784, 4130787], [5930...","[[1639486, 2730369], [31431726, 31444523], [31...","[[1639486, 2730369], [2859171, 2889573], [2921...","[[1, 1090], [2288, 2487], [4284, 4483], [5482,...","[[1, 1090], [29869, 29885], [29888, 29907], [3...","[[1, 1090], [1218, 1249], [1281, 1291], [1373,..."
3,20240327_153306_comp_both_subj_3-1_and_3-3,20240327_153306_comp_both_subj_3-1_and_3-3.2.v...,"[1640697, 1640870, 1642256, 1642256, 1643642, ...","[[1639486, 2730369], [3930784, 4130787], [5930...","[[1639486, 2730369], [31431726, 31444523], [31...","[[1639486, 2730369], [2859171, 2889573], [2921...","[[0, 1323], [2521, 2720], [4517, 4717], [5715,...","[[0, 1323], [29971, 29983], [29986, 30001], [3...","[[0, 1323], [1452, 1482], [1514, 1523], [1607,..."
4,20240327_165151_comp_both_subj_4-2_and_4-4,20240327_165151_comp_both_subj_4-2_and_4-4.1.v...,"[2347662, 2349047, 2350433, 2350433, 2351819, ...","[[2346278, 2420890], [3621305, 3821308], [5621...","[[2346278, 2420890], [32366261, 32377060], [32...","[[2346278, 2420890], [2421290, 2422488], [2751...","[[0, 73], [1272, 1472], [3269, 3468], [4466, 4...","[[0, 73], [30042, 30056], [30531, 30540], [305...","[[0, 73], [75, 75], [404, 431], [615, 622], [6..."


In [76]:
trodes_raw_df = trodes_raw_df[RAW_COLS_TO_KEEP].reset_index(drop=True).copy()

In [77]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,original_file,session_path,current_subject,first_item_data,first_timestamp,all_subjects
0,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-4_t6b6_merged,20240330_141834_comp_both_subj_3-4_t6b6_merged...,/scratch/back_up/reward_competition_extention/...,3.4,"[1829348, 1829349, 1829350, 1829351, 1829352, ...",1829348,"[3.3, 3.4]"
1,20240330_141834_comp_both_subj_3-3_and_3-4,20240330_141834_comp_both_subj_3-3_t5b5_merged,20240330_141834_comp_both_subj_3-3_t5b5_merged...,/scratch/back_up/reward_competition_extention/...,3.3,"[1829348, 1829349, 1829350, 1829351, 1829352, ...",1829348,"[3.3, 3.4]"
2,20240329_160356_comp_both_subj_4_2_and_4-3,20240329_160356_comp_both_subj_4-3_t3b3_merged,20240329_160356_comp_both_subj_4-3_t3b3_merged...,/scratch/back_up/reward_competition_extention/...,4.3,"[3116821, 3116822, 3116823, 3116824, 3116825, ...",3116821,"[2.0, 4.0, 4.3]"
3,20240329_160356_comp_both_subj_4_2_and_4-3,20240329_160356_comp_both_subj_4-2_t4b4_merged,20240329_160356_comp_both_subj_4-2_t4b4_merged...,/scratch/back_up/reward_competition_extention/...,4.2,"[3116821, 3116822, 3116823, 3116824, 3116825, ...",3116821,"[2.0, 4.0, 4.3]"
4,20240327_165151_comp_both_subj_4-2_and_4-4,20240327_165151_comp_both_subj_4-4_t4b4_merged,20240327_165151_comp_both_subj_4-4_t4b4_merged...,/scratch/back_up/reward_competition_extention/...,4.4,"[2346278, 2346279, 2346280, 2346281, 2346282, ...",2346278,"[4.2, 4.4]"


In [78]:
trodes_final_df = pd.merge(trodes_raw_df, trodes_state_df, on=["session_dir"], how="inner")

In [79]:
trodes_final_df.shape

(36, 16)
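The row count grows because the inner merge on session_dir pairs every recording of a session with every video of that session. The 36 rows are consistent with the 9 sessions listed further down, each having two recordings (one per subject) and two videos (.1 and .2). The sketch below shows the same effect on small, made-up frames; raw_df and state_df are hypothetical stand-ins for trodes_raw_df and trodes_state_df.

```python
import pandas as pd

# Hypothetical toy frames: one row per recording and one row per video,
# both keyed by the session directory.
raw_df = pd.DataFrame({
    "session_dir": ["session_a", "session_a", "session_b"],
    "recording": ["a_subj1", "a_subj2", "b_subj1"],
})
state_df = pd.DataFrame({
    "session_dir": ["session_a", "session_a", "session_b", "session_b"],
    "video_name": ["a.1", "a.2", "b.1", "b.2"],
})

# Inner merge: every recording is paired with every video of the same session.
merged = pd.merge(raw_df, state_df, on=["session_dir"], how="inner")

# Expected row count = sum over sessions of (#recordings x #videos).
expected_rows = int(
    raw_df["session_dir"].value_counts()
    .mul(state_df["session_dir"].value_counts(), fill_value=0)
    .sum()
)
print(merged.shape, expected_rows)  # (6, 3) 6
```

Comparing the merged shape against such an expected count is one way to catch sessions whose directory names differ between the two tables (they would silently drop out of an inner merge).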

In [80]:
trodes_final_df = trodes_final_df.rename(columns={"first_item_data": "raw_timestamps"})
trodes_final_df = trodes_final_df.drop(columns=["metadata_file"], errors="ignore")
trodes_final_df = trodes_final_df.sort_values(["session_dir", "recording"]).reset_index(drop=True).copy()
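Passing errors="ignore" to drop (rename already ignores missing keys by default) makes this cell safe to re-run: on a second pass the metadata_file column is gone, and it is skipped instead of raising a KeyError. A minimal sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame with the same column names as the notebook.
df = pd.DataFrame({
    "session_dir": ["s2", "s1"],
    "recording": ["r2", "r1"],
    "first_item_data": [[3, 4], [1, 2]],
    "metadata_file": ["m2", "m1"],
})

def tidy(frame):
    # Same three steps as the cell above: rename, drop, then sort.
    frame = frame.rename(columns={"first_item_data": "raw_timestamps"})
    frame = frame.drop(columns=["metadata_file"], errors="ignore")
    return frame.sort_values(["session_dir", "recording"]).reset_index(drop=True)

once = tidy(df)
twice = tidy(once)  # no KeyError: the already-dropped column is skipped
print(list(twice.columns))  # ['session_dir', 'recording', 'raw_timestamps']
```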

## Making the timestamps zero-indexed

In [81]:
trodes_final_df[[col for col in trodes_final_df.columns if "timestamps" in col]].head()

Unnamed: 0,raw_timestamps,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,"[1948533, 1948534, 1948535, 1948536, 1948537, ...","[1948531, 1949917, 1949917, 1951303, 1952689, ...","[[1948533, 1974154], [3174570, 3374574], [5174...","[[1948533, 1974154], [31381122, 31400124], [31...","[[1948533, 1974154], [1980154, 1982957], [1983..."
1,"[1948533, 1948534, 1948535, 1948536, 1948537, ...","[1948531, 1949917, 1951303, 1951303, 1952689, ...","[[1948533, 1974154], [3174570, 3374574], [5174...","[[1948533, 1974154], [31381122, 31400124], [31...","[[1948533, 1974154], [1980154, 1982957], [1983..."
2,"[1948533, 1948534, 1948535, 1948536, 1948537, ...","[1948531, 1949917, 1949917, 1951303, 1952689, ...","[[1948533, 1974154], [3174570, 3374574], [5174...","[[1948533, 1974154], [31381122, 31400124], [31...","[[1948533, 1974154], [1980154, 1982957], [1983..."
3,"[1948533, 1948534, 1948535, 1948536, 1948537, ...","[1948531, 1949917, 1951303, 1951303, 1952689, ...","[[1948533, 1974154], [3174570, 3374574], [5174...","[[1948533, 1974154], [31381122, 31400124], [31...","[[1948533, 1974154], [1980154, 1982957], [1983..."
4,"[1639486, 1639487, 1639488, 1639489, 1639490, ...","[1639484, 1640870, 1642256, 1642256, 1643642, ...","[[1639486, 2730369], [3930784, 4130787], [5930...","[[1639486, 2730369], [31431726, 31444523], [31...","[[1639486, 2730369], [2859171, 2889573], [2921..."


In [82]:
trodes_final_df["last_timestamp"] = trodes_final_df["raw_timestamps"].apply(lambda x: x[-1])

- Dropping the raw timestamp arrays (and original_file) to reduce memory usage; only the last raw timestamp is kept, in last_timestamp

In [83]:
trodes_final_df = trodes_final_df.drop(columns=["raw_timestamps", "original_file"], errors="ignore")
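Each raw timestamp array spans tens of millions of samples (for example 1948533 to 66385922 in the output below), so keeping them for every merged row is expensive. The sketch below, on hypothetical small arrays, shows the pattern used here: extract the one value needed downstream, measure what the arrays occupy via ndarray.nbytes, then drop the column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw_timestamps column: one int64 array per row.
df = pd.DataFrame({
    "raw_timestamps": [
        np.arange(1_000, 1_000 + 500_000, dtype=np.int64),
        np.arange(2_000, 2_000 + 300_000, dtype=np.int64),
    ]
})

# Keep only what is needed later (the last sample) before discarding the arrays.
df["last_timestamp"] = df["raw_timestamps"].apply(lambda x: x[-1])

# Bytes held by the array buffers themselves: (500000 + 300000) * 8 bytes.
raw_bytes = int(sum(arr.nbytes for arr in df["raw_timestamps"]))
print(f"{raw_bytes / 1e6:.1f} MB")  # 6.4 MB

df = df.drop(columns=["raw_timestamps"], errors="ignore")
```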

In [84]:
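# Snapshot of the table before zero-indexing (note the parentheses: .copy without them only binds the method)
copy_trodes_final_df = trodes_final_df.copy()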
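The difference between df.copy and df.copy() is easy to miss in a notebook because neither raises an error. A small illustration on a toy frame: without parentheses the name refers to the bound method, so no snapshot is taken, and calling it later copies whatever state the frame has at that moment.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

not_a_copy = df.copy   # binds the method; no data is copied yet
snapshot = df.copy()   # an independent DataFrame

df.loc[0, "a"] = 100   # mutate the original afterwards

print(callable(not_a_copy))      # True
print(snapshot.loc[0, "a"])      # 1: the snapshot is unaffected
print(not_a_copy().loc[0, "a"])  # 100: calling it now copies the mutated state
```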

In [85]:
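# Shift every timestamp column so that the first raw sample of each recording becomes 0.
# Values can be slightly negative (e.g. video_timestamps starting at -2) when an event precedes the first raw sample.
# int32 assumes every timestamp stays below 2**31 - 1, which holds for the values in this dataset.
for col in [col for col in trodes_final_df.columns if "timestamps" in col]:
    trodes_final_df[col] = trodes_final_df.apply(lambda x: x[col].astype(np.int32) - np.int32(x["first_timestamp"]), axis=1)

# Frame indices are not shifted, only downcast to int32.
for col in [col for col in trodes_final_df.columns if "frames" in col]:
    trodes_final_df[col] = trodes_final_df[col].apply(lambda x: x.astype(np.int32))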
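The cell above casts to int32 before subtracting, and astype(np.int32) wraps around silently if a value does not fit. A hedged alternative, written as a hypothetical helper zero_index (not part of the notebook or of any library), subtracts in int64 first and checks the range before downcasting. The toy inputs are the first tone and video values visible in the output for session 20240327_141822, so the results can be checked against the table below.

```python
import numpy as np

def zero_index(timestamps, first_timestamp):
    """Hypothetical helper: shift timestamps (any shape) so first_timestamp maps to 0.

    The subtraction is done in int64, and the result is range-checked
    before downcasting to int32, so overflow raises instead of wrapping.
    """
    shifted = np.asarray(timestamps, dtype=np.int64) - int(first_timestamp)
    info = np.iinfo(np.int32)
    if shifted.size and (shifted.min() < info.min or shifted.max() > info.max):
        raise OverflowError("zero-indexed timestamps do not fit in int32")
    return shifted.astype(np.int32)

first = 1948533
tone = [[1948533, 1974154], [3174570, 3374574]]  # [start, stop] pairs
video = [1948531, 1949917]

print(zero_index(tone, first).tolist())   # [[0, 25621], [1226037, 1426041]]
print(zero_index(video, first).tolist())  # [-2, 1384]
```

Usage inside the notebook would be trodes_final_df.apply(lambda x: zero_index(x[col], x["first_timestamp"]), axis=1), mirroring the existing loop.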

In [86]:
# Group columns by their last underscore-separated token (e.g. "dir", "frames", "timestamps")
sorted_columns = sorted(trodes_final_df.columns, key=lambda x: x.split("_")[-1])
trodes_final_df = trodes_final_df[sorted_columns].copy()
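The sort key only looks at the suffix after the last underscore, and Python's sort is stable, so columns sharing a suffix keep their previous relative order. Note that "timestamp" sorts before "timestamps" because it is a prefix of it, which is why first_timestamp and last_timestamp land just before the *_timestamps columns in the output below. A quick check on a subset of column names:

```python
columns = ["session_dir", "video_timestamps", "tone_frames", "recording",
           "tone_timestamps", "first_timestamp", "video_name"]

# Same key as the notebook cell: the last underscore-separated token.
sorted_columns = sorted(columns, key=lambda x: x.split("_")[-1])
print(sorted_columns)
# ['session_dir', 'tone_frames', 'video_name', 'recording',
#  'first_timestamp', 'video_timestamps', 'tone_timestamps']
```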

## Saving to a file

In [87]:
trodes_final_df.to_pickle(os.path.join(OUTPUT_DIR, "{}_00_trodes_metadata.pkl".format(OUTPUT_PREFIX)))
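Pickle is a reasonable choice here because many cells hold NumPy arrays; to_pickle/read_pickle round-trips them with their int32 dtype, whereas a CSV export would write them out as strings. A minimal round-trip sketch, assuming a temporary directory in place of OUTPUT_DIR and a hypothetical value for OUTPUT_PREFIX:

```python
import os
import tempfile

import numpy as np
import pandas as pd

OUTPUT_PREFIX = "example"  # hypothetical; the notebook defines its own OUTPUT_PREFIX

df = pd.DataFrame({
    "session_dir": ["s1"],
    "tone_timestamps": [np.array([[0, 25621], [1226037, 1426041]], dtype=np.int32)],
})

with tempfile.TemporaryDirectory() as output_dir:
    path = os.path.join(output_dir, "{}_00_trodes_metadata.pkl".format(OUTPUT_PREFIX))
    df.to_pickle(path)
    loaded = pd.read_pickle(path)  # fully loaded into memory before the directory is removed

arr = loaded.loc[0, "tone_timestamps"]
print(type(arr).__name__, arr.dtype)  # ndarray int32
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.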

In [88]:
trodes_final_df.head()

Unnamed: 0,session_dir,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames,video_name,session_path,recording,current_subject,all_subjects,first_timestamp,last_timestamp,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,20240327_141822_comp_both_subj_5-2_and_5-3,"[[1, 26], [1224, 1423], [3220, 3421], [4418, 4...","[[1, 26], [29421, 29445], [30032, 30049], [301...","[[1, 26], [32, 35], [36, 39], [87, 89], [90, 1...",20240327_141822_comp_both_subj_5-2_and_5-3.1.v...,/scratch/back_up/reward_competition_extention/...,20240327_141822_comp_both_subj_5-2_t5b5_merged,5.2,"[5.2, 5.3]",1948533,66385922,"[-2, 1384, 1384, 2770, 4156, 4156, 5542, 6927,...","[[0, 25621], [1226037, 1426041], [3226062, 342...","[[0, 25621], [29432589, 29451591], [29926394, ...","[[0, 25621], [31621, 34424], [34821, 37424], [..."
1,20240327_141822_comp_both_subj_5-2_and_5-3,"[[1, 26], [1224, 1423], [3220, 3420], [4418, 4...","[[1, 26], [29378, 29397], [29870, 29884], [299...","[[1, 26], [32, 34], [36, 39], [87, 88], [90, 1...",20240327_141822_comp_both_subj_5-2_and_5-3.2.v...,/scratch/back_up/reward_competition_extention/...,20240327_141822_comp_both_subj_5-2_t5b5_merged,5.2,"[5.2, 5.3]",1948533,66385922,"[-2, 1384, 2770, 2770, 4156, 5542, 5542, 6927,...","[[0, 25621], [1226037, 1426041], [3226062, 342...","[[0, 25621], [29432589, 29451591], [29926394, ...","[[0, 25621], [31621, 34424], [34821, 37424], [..."
2,20240327_141822_comp_both_subj_5-2_and_5-3,"[[1, 26], [1224, 1423], [3220, 3421], [4418, 4...","[[1, 26], [29421, 29445], [30032, 30049], [301...","[[1, 26], [32, 35], [36, 39], [87, 89], [90, 1...",20240327_141822_comp_both_subj_5-2_and_5-3.1.v...,/scratch/back_up/reward_competition_extention/...,20240327_141822_comp_both_subj_5-3_t6b6_merged,5.3,"[5.2, 5.3]",1948533,66385922,"[-2, 1384, 1384, 2770, 4156, 4156, 5542, 6927,...","[[0, 25621], [1226037, 1426041], [3226062, 342...","[[0, 25621], [29432589, 29451591], [29926394, ...","[[0, 25621], [31621, 34424], [34821, 37424], [..."
3,20240327_141822_comp_both_subj_5-2_and_5-3,"[[1, 26], [1224, 1423], [3220, 3420], [4418, 4...","[[1, 26], [29378, 29397], [29870, 29884], [299...","[[1, 26], [32, 34], [36, 39], [87, 88], [90, 1...",20240327_141822_comp_both_subj_5-2_and_5-3.2.v...,/scratch/back_up/reward_competition_extention/...,20240327_141822_comp_both_subj_5-3_t6b6_merged,5.3,"[5.2, 5.3]",1948533,66385922,"[-2, 1384, 2770, 2770, 4156, 5542, 5542, 6927,...","[[0, 25621], [1226037, 1426041], [3226062, 342...","[[0, 25621], [29432589, 29451591], [29926394, ...","[[0, 25621], [31621, 34424], [34821, 37424], [..."
4,20240327_153306_comp_both_subj_3-1_and_3-3,"[[1, 1090], [2288, 2487], [4284, 4483], [5482,...","[[1, 1090], [29869, 29885], [29888, 29907], [3...","[[1, 1090], [1218, 1249], [1281, 1291], [1373,...",20240327_153306_comp_both_subj_3-1_and_3-3.1.v...,/scratch/back_up/reward_competition_extention/...,20240327_153306_comp_both_subj_3-1_t1b1_merged,3.1,"[3.1, 3.3]",1639486,66890381,"[-2, 1384, 2770, 2770, 4156, 5542, 5542, 6928,...","[[0, 1090883], [2291298, 2491301], [4291323, 4...","[[0, 1090883], [29792240, 29805037], [29808437...","[[0, 1090883], [1219685, 1250087], [1282283, 1..."


In [89]:
trodes_final_df["session_dir"].unique()

array(['20240327_141822_comp_both_subj_5-2_and_5-3',
       '20240327_153306_comp_both_subj_3-1_and_3-3',
       '20240327_165151_comp_both_subj_4-2_and_4-4',
       '20240329_144111_comp_both_subj_3-1_and_3-4',
       '20240329_160356_comp_both_subj_4_2_and_4-3',
       '20240330_141834_comp_both_subj_3-3_and_3-4',
       '20240330_153409_comp_both_subj_4-3_and_4-4',
       '20240331_152930_comp_both_subj_5-2_and_5-3',
       '20240331_164629_comp_both_subj_4-3_and_4-4'], dtype=object)

In [90]:
trodes_final_df["video_name"].unique()

array(['20240327_141822_comp_both_subj_5-2_and_5-3.1.videoTimeStamps.cameraHWSync',
       '20240327_141822_comp_both_subj_5-2_and_5-3.2.videoTimeStamps.cameraHWSync',
       '20240327_153306_comp_both_subj_3-1_and_3-3.1.videoTimeStamps.cameraHWSync',
       '20240327_153306_comp_both_subj_3-1_and_3-3.2.videoTimeStamps.cameraHWSync',
       '20240327_165151_comp_both_subj_4-2_and_4-4.1.videoTimeStamps.cameraHWSync',
       '20240327_165151_comp_both_subj_4-2_and_4-4.2.videoTimeStamps.cameraHWSync',
       '20240329_144111_comp_both_subj_3-1_and_3-4.1.videoTimeStamps.cameraHWSync',
       '20240329_144111_comp_both_subj_3-1_and_3-4.2.videoTimeStamps.cameraHWSync',
       '20240329_160356_comp_both_subj_4-2_and_4-3.1.videoTimeStamps.cameraHWSync',
       '20240329_160356_comp_both_subj_4-2_and_4-3.2.videoTimeStamps.cameraHWSync',
       '20240330_141834_comp_both_subj_3-3_and_3-4.1.videoTimeStamps.cameraHWSync',
       '20240330_141834_comp_both_subj_3-3_and_3-4.2.videoTimeStamps.cameraH