# Time Stamp Extract

This notebook extracts the timestamps from exported Trodes (SpikeGadgets) recording files and gets the times at which tones played.

TODO: Supplement the description

In [1]:
# Imports of all used packages and libraries
import sys
import os
import git
import glob
from collections import defaultdict

In [2]:
git_repo = git.Repo(".", search_parent_directories=True)
git_root = git_repo.git.rev_parse("--show-toplevel")

In [3]:
git_root

'/nancy/user/riwata/projects/reward_comp_ext'

In [4]:
sys.path.insert(0, os.path.join(git_root, 'src'))

In [5]:
# Imports for numerical work, dataframes, and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as sp

In [7]:
import utilities.helper
import trodes.read_exported

# Functions

In [8]:
import re

def extract_floats(s):
    """
    Extracts all numbers (decimals and unsigned integers) from a string and
    returns them as a list of float-formatted strings (e.g. "5" becomes "5.0").

    Parameters:
    - s (str): The string to extract numbers from.

    Returns:
    - list: A list of strings, each representing a number found in the input string.
    """
    float_pattern = r"[-+]?\d*\.\d+|\d+"
    return [str(float(num)) for num in re.findall(float_pattern, s)]

## Inputs & Data

- Explanation of each input and where it comes from.

Inputs and required data loading
- Input variable names are in all-caps snake case
- Whenever an input is changed or used for processing, the resulting variable names are in lowercase snake case

In [9]:
# Path of the directory that contains the SpikeGadgets recording and the exported timestamp files
# Exported with this tool: https://docs.spikegadgets.com/en/latest/basic/ExportFunctions.html
# Export these files:
#   -raw – Continuous raw band export.
#   -dio – Digital IO channel state change export.
#   -analogio – Continuous analog IO export.
INPUT_DIR = "/scratch/back_up/reward_competition_extention/data/rce_cohort_3"
OUTPUT_DIR = r"./proc" # where data is saved should always be shown in the inputs
TONE_DIN = "dio_ECU_Din1"
TONE_STATE = 1
os.makedirs(OUTPUT_DIR, exist_ok=True)
OUTPUT_PREFIX = "rce_pilot_3_long_comp"
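
As a rough illustration of how `TONE_DIN` and `TONE_STATE` might be used downstream, the sketch below picks out the timestamps at which a digital input switches to the tone state. The `(time, state)` pairs mirror the layout of the exported `dio_ECU_Din1` data. The helper name `tone_onsets` and the sample values are assumptions made for this example and are not part of the notebook.

```python
# Hypothetical sketch: select the times at which a digital input enters TONE_STATE.
TONE_STATE = 1

# (time, state) pairs in the layout of an exported DIO file; values are illustrative
dio_events = [(4286663, 1), (4486666, 0), (6286688, 1), (6486693, 0)]


def tone_onsets(events, tone_state=TONE_STATE):
    """Return the timestamps whose state equals tone_state."""
    return [time for time, state in events if state == tone_state]


onsets = tone_onsets(dio_events)
print(onsets)
```

Keeping the state as a parameter means the same helper could also return the offsets by passing `tone_state=0`.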

In [10]:
COLS_TO_KEEP = ['session_dir', 'recording', 'metadata_dir', 'metadata_file',
'original_file', 'filename', 'session_path', 'all_subjects',
       'current_subject', 'event_timestamps', 'video_name',
       'video_timestamps', 'event_frames', 'first_item_data']

In [11]:
RAW_COLS_TO_KEEP = ['session_dir',
 'recording',
 'original_file',
 'session_path',
 'current_subject',
 'first_item_data',
 'first_timestamp',
 'all_subjects']

In [12]:
STATE_COLS_TO_KEEP = ['session_dir',
 'metadata_file',
 'event_timestamps',
 'video_name',
 'video_timestamps',
 'event_frames',]

In [13]:
same_columns = ['session_dir', 'video_name']
different_columns = ['metadata_file', 'event_frames', 'event_timestamps']

In [14]:
# TODO: Find way not to hard code this
# ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/standard/2023_06_*/*.rec")
ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/rce_cohort_3/*long*/*.rec")



In [15]:
ALL_SESSION_DIR

['/scratch/back_up/reward_competition_extention/data/rce_cohort_3/long_comp/20240321_114851_long_comp_subj_5-2_and_5-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/long_comp/20240317_172017_long_comp_subj_4-2_and_4-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/long_comp/20240317_151922_long_comp_subj_3-1_and_3-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/long_comp/20240319_160457_long_comp_subj_4-2_and_4-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/long_comp/20240319_134914_long_comp_subj_3-1_and_3-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/long_comp/20240318_143819_long_comp_subj_3-3_and_3-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/long_comp/20240320_114629_long_comp_subj_5-3_and_5-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/long_comp/20240318_170933_long_comp_subj_4-3_and_4-4.rec']

## Outputs

Describe each output that the notebook creates. 

- Is it a plot or is it data?

- How valuable is the output and why is it valuable or useful?

## Other documentation

raw directory
- raw_group0.dat
    - voltage_value: Array with voltage measurement for each channel at each timestamp
- timestamps.dat
    - voltage_time_stamp: The time stamp of each voltage measurement

parent directory
- 1.videoTimeStamps.cameraHWSync
    - frame_number: Calculated by getting the index of each video time stamp tuple
    - PosTimestamp: The time stamp of each video frame
    - HWframeCount: Unknown value. Starts at 30742 and increases by 1 for each tuple
    - HWTimestamp: Unknown value. All zeroes
    - video_time: Calculated by dividing the frame number by the fps (frames per second)
    - video_seconds: video_time, but rounded to seconds
    - These are filled-in versions of the above columns, using the value from the most recent previous cell
        - filled_PosTimestamp
        - filledHWframeCount
        - filled_frame_number
        - filled_video_time
        - filled_video_seconds
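
The derived video columns listed above could be computed along these lines. This is only a sketch: the frame rate is not stated in this notebook, so `FPS` is a placeholder, and the rounding used for `video_seconds` is assumed to be Python's built-in `round`.

```python
# Hypothetical sketch of the derived video columns; FPS is an assumed placeholder.
FPS = 30.0

# PosTimestamp values for consecutive video frames (illustrative)
pos_timestamps = [992281, 993667, 995053, 996439]

# frame_number is the index of each video time stamp tuple
frame_numbers = list(range(len(pos_timestamps)))

# video_time is the frame number divided by the frames per second
video_times = [frame / FPS for frame in frame_numbers]

# video_seconds is video_time rounded to whole seconds (rounding mode assumed)
video_seconds = [round(t) for t in video_times]
```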

DIO directory
- dio_ECU_Din1.dat
    - time: The time stamp that corresponds to the DIN input
    - state: Binary state of whether there is input from DIN or not
    - trial_number: Calculated by adding 1 every time there is a DIN input
    - These are filled-in versions of the above columns, using the value from the most recent previous cell
        - filled_state
        - filled_trial_number
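
The `trial_number` and `filled_trial_number` columns described above can be sketched in plain Python. The assumption here is that only onset rows (`state == 1`) receive a new trial number and that every other row is left empty until it is forward-filled. The function names and sample values are hypothetical.

```python
# Hypothetical sketch of trial_number and its forward-filled version.

# (time, state) pairs in the layout of dio_ECU_Din1; values are illustrative
dio_events = [(4286663, 1), (4486666, 0), (6286688, 1), (6486693, 0)]


def number_trials(events):
    """Give each onset (state == 1) the next trial number; other rows get None."""
    numbers, count = [], 0
    for _, state in events:
        if state == 1:
            count += 1
            numbers.append(count)
        else:
            numbers.append(None)
    return numbers


def forward_fill(values):
    """Replace each None with the most recent non-None value before it."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


trial_number = number_trials(dio_events)
filled_trial_number = forward_fill(trial_number)
```

Values that precede the first non-empty entry stay `None`, since there is no earlier value to carry forward.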

ss_output directory (Spike sorting with Spike interface)
- firings.npz
    - unit_id: All the units that had a spike train for the given timestamp 	
    - number_of_units: Calculated by counting the number of units that had a spike train

## Functions

- function names are short and in snake case all lowercase
- a function name should be unique but does not have to describe the function
- doc strings describe functions not function names

## Processing

Describe what is done to the data here and how inputs are manipulated to generate outputs. 

In [16]:
# As much code and as many cells as required
# includes EDA and playing with data
# GO HAM!

# LOOP 1: Extracting all the Trodes data

- Getting all the data from all the exported Trodes files and saving it to `session_to_trodes_data`
    - Creates a nested dictionary with the structure of:
        - `{dir_name: {file_name: metadata, file_name_2: metadata_2}, dir_name_2: {file_name_3: metadata_3, file_name_4: metadata_4}}`
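
A nested dictionary of this kind is convenient to build with a recursive `defaultdict`. The sketch below is an assumption about how a helper such as `utilities.helper.create_recursive_dict` could work, since its source is not shown in this notebook. The keys and the value are illustrative.

```python
from collections import defaultdict


def create_recursive_dict():
    """Return a defaultdict whose missing keys are themselves recursive dicts."""
    return defaultdict(create_recursive_dict)


session_to_data = create_recursive_dict()

# Intermediate levels are created automatically on first access
session_to_data["dir_name"]["file_name"]["clockrate"] = "20000"
```

One side effect to keep in mind is that merely reading a missing key also creates it, so membership checks with `in` are safer than indexing when probing for a key.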

In [17]:
# Saving the trodes data for each session
# Each key is a session name
# Each value is a dictionary of every recording file in that session
session_to_trodes_data = utilities.helper.create_recursive_dict()


# Saving the path of the session recording
session_to_path = {}

# Going through each session recording
# Which includes all the recordings from all the miniloggers and cameras
for session_path in ALL_SESSION_DIR:   
    try:
        # Getting the name of the session from the path
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Reading the trodes data for every recording file in the session directory
        session_to_trodes_data[session_basename] = trodes.read_exported.organize_all_trodes_export(session_path)
        
        session_to_path[session_basename] = session_path
    except Exception as e: 
        print(e)


Current Session: 20240321_114851_long_comp_subj_5-2_and_5-3
Skipping file 20240321_114851_long_comp_subj_5-3_t5b5_merged.timestampoffset.txt due to error: Settings format not supported


  return np.dtype(dtype_spec)


Skipping file 20240321_114851_long_comp_subj_5-2_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240317_172017_long_comp_subj_4-2_and_4-3
Skipping file logger_raw.dat due to error: 'ascii' codec can't decode byte 0xa0 in position 14: ordinal not in range(128)
Skipping file logger_raw.dat due to error: 'ascii' codec can't decode byte 0xe0 in position 18: ordinal not in range(128)
Skipping file 20240317_172017_long_comp_subj_4-2_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240317_172017_long_comp_subj_4-3_t5b5_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240317_151922_long_comp_subj_3-1_and_3-3
Skipping file 20240317_151922_long_comp_subj_3-1_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240317_151922_long_comp_subj_3-3_t5b5_merged.timestampoffset.txt due to error: Settings format not supported
Current Ses

In [18]:
session_to_trodes_data

defaultdict(<function utilities.helper.create_recursive_dict()>,
            {'20240321_114851_long_comp_subj_5-2_and_5-3': defaultdict(dict,
                         {'20240321_114851_long_comp_subj_5-3_t5b5_merged': {'timestampoffset': {},
                           'DIO': {'dio_ECU_Din1': {'description': 'State change data for one digital channel. Display_order is 1-based',
                             'byte_order': 'little endian',
                             'original_file': '20240321_114851_long_comp_subj_5-3_t5b5_merged.rec',
                             'clockrate': '20000',
                             'trodes_version': '2.3.4',
                             'compile_date': 'Nov 28 2022',
                             'compile_time': '15:10:45',
                             'qt_version': '6.2.2',
                             'commit_tag': 'heads/Release_2.3.4-0-gd5a58cd9-dirty',
                             'controller_firmware': '3.17',
                             'headstage_

- Adding the video timestamps

In [19]:
for session_path in ALL_SESSION_DIR:
    try:
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Each camera has its own "<session>.<camera number>.videoTimeStamps.cameraHWSync" file
        for video_timestamps in glob.glob(os.path.join(session_path, "*cameraHWSync")):
            video_basename = os.path.basename(video_timestamps)
            print("Current Video Name: {}".format(video_basename))
            timestamp_array = trodes.read_exported.read_trodes_extracted_data_file(video_timestamps)
            if "video_timestamps" not in session_to_trodes_data[session_basename][session_basename]:
                session_to_trodes_data[session_basename][session_basename]["video_timestamps"] = defaultdict(dict)
            # The third-from-last dot-separated field of the file name is the camera number (e.g. "1")
            session_to_trodes_data[session_basename][session_basename]["video_timestamps"][video_basename.split(".")[-3]] = timestamp_array
    except Exception as e:
        print(e)

Current Session: 20240321_114851_long_comp_subj_5-2_and_5-3
Current Video Name: 20240321_114851_long_comp_subj_5-2_and_5-3.1.videoTimeStamps.cameraHWSync
Current Session: 20240317_172017_long_comp_subj_4-2_and_4-3
Current Video Name: 20240317_172017_long_comp_subj_4-2_and_4-3.2.videoTimeStamps.cameraHWSync
Current Video Name: 20240317_172017_long_comp_subj_4-2_and_4-3.1.videoTimeStamps.cameraHWSync
Current Session: 20240317_151922_long_comp_subj_3-1_and_3-3
Current Video Name: 20240317_151922_long_comp_subj_3-1_and_3-3.1.videoTimeStamps.cameraHWSync
Current Video Name: 20240317_151922_long_comp_subj_3-1_and_3-3.2.videoTimeStamps.cameraHWSync
Current Session: 20240319_160457_long_comp_subj_4-2_and_4-4
Current Video Name: 20240319_160457_long_comp_subj_4-2_and_4-4.1.videoTimeStamps.cameraHWSync
Current Session: 20240319_134914_long_comp_subj_3-1_and_3-4
Current Video Name: 20240319_134914_long_comp_subj_3-1_and_3-4.1.videoTimeStamps.cameraHWSync
Current Session: 20240318_143819_long_comp

In [20]:
session_to_trodes_data[session_basename][session_basename]["video_timestamps"]

defaultdict(dict,
            {'1': {'clock rate': '20000',
              'camera_name': 'HD USB Camera (\\\\?\\usb#vid_32e4&pid_9230&mi_00#6&bec0719&2&0000#{e5323777-f976-4f5b-9b55-b94699c46e44}\\global)',
              'fields': '<PosTimestamp uint32><HWframeCount uint32><HWTimestamp uint64>',
              'data': array([(  992281, 0, 0), (  993667, 0, 0), (  995053, 0, 0), ...,
                     (59462118, 0, 0), (59463504, 0, 0), (59464890, 0, 0)],
                    dtype=[('PosTimestamp', '<u4'), ('HWframeCount', '<u4'), ('HWTimestamp', '<u8')]),
              'filename': '20240318_170933_long_comp_subj_4-3_and_4-4.1.videoTimeStamps.cameraHWSync'}})
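
The keys of this dictionary (such as `'1'`) come from the video timestamp file names. A minimal sketch of that parsing step follows. The function name `camera_number` is hypothetical, and the example path reuses a file name printed above.

```python
import os


def camera_number(path):
    """Return the camera number field from a '*.videoTimeStamps.cameraHWSync' path."""
    # File names look like "<session>.<camera number>.videoTimeStamps.cameraHWSync"
    return os.path.basename(path).split(".")[-3]


example = "20240318_170933_long_comp_subj_4-3_and_4-4.1.videoTimeStamps.cameraHWSync"
print(camera_number(example))
```

Counting fields from the end of the name keeps the result stable even if the session part of the file name itself contains dots.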

- Creating a dataframe from the dictionary with a column for:
  - Session directory
  - Recording name
  - Metadata directory
  - Metadata file
  - And a column for each metadata field

In [21]:
# Creating a dataframe from the nested dictionary
trodes_metadata_df = pd.DataFrame.from_dict({(i,j,k,l): session_to_trodes_data[i][j][k][l] 
                           for i in session_to_trodes_data.keys() 
                           for j in session_to_trodes_data[i].keys()
                           for k in session_to_trodes_data[i][j].keys()
                           for l in session_to_trodes_data[i][j][k].keys()},
                           orient='index')

# Resetting the index and renaming the columns
trodes_metadata_df = trodes_metadata_df.reset_index()
trodes_metadata_df = trodes_metadata_df.rename(columns={'level_0': 'session_dir', 'level_1': 'recording', 'level_2': 'metadata_dir', 'level_3': 'metadata_file'}, errors="ignore")

# Adding the session path to the dataframe
trodes_metadata_df["session_path"] = trodes_metadata_df["session_dir"].map(session_to_path)
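
To make the flattening step easier to follow, here is a plain-Python sketch that produces one row per (session, recording, directory, file) leaf of a four-level nested dictionary. The dictionary contents and the function name `flatten` are made up for illustration. The notebook itself performs the equivalent step with `pd.DataFrame.from_dict`.

```python
# Hypothetical sketch: one row per (session, recording, directory, file) leaf.
nested = {
    "session_a": {
        "recording_1": {
            "DIO": {"dio_ECU_Din1": {"clockrate": "20000", "direction": "input"}},
            "raw": {"timestamps": {"clockrate": "20000"}},
        },
    },
}


def flatten(nested_dict):
    """Return a list of row dicts, each holding the four keys plus the metadata."""
    rows = []
    for session_dir, recordings in nested_dict.items():
        for recording, metadata_dirs in recordings.items():
            for metadata_dir, metadata_files in metadata_dirs.items():
                for metadata_file, metadata in metadata_files.items():
                    row = {
                        "session_dir": session_dir,
                        "recording": recording,
                        "metadata_dir": metadata_dir,
                        "metadata_file": metadata_file,
                    }
                    row.update(metadata)  # one column per metadata field
                    rows.append(row)
    return rows


rows = flatten(nested)
```

Rows built this way carry only the fields their file actually has, which is why a dataframe made from them shows missing values for fields such as `direction` on files that lack them.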

In [22]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,direction,id,display_order,fields,data,filename,decimation,clock rate,camera_name,session_path
0,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,input,ECU_Din1,7,<time uint32><state uint8>,"[[2981006, 1], [3086246, 0], [4286663, 1], [44...",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...
1,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,output,ECU_Dout4,5,<time uint32><state uint8>,"[[2981006, 0]]",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...
2,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,input,ECU_Din3,8,<time uint32><state uint8>,"[[2981006, 1]]",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...
3,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Dout1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,output,ECU_Dout1,2,<time uint32><state uint8>,"[[2981006, 0]]",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...
4,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Dout3,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,output,ECU_Dout3,4,<time uint32><state uint8>,"[[2981006, 0]]",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...


In [23]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,direction,id,display_order,fields,data,filename,decimation,clock rate,camera_name,session_path
181,20240319_160457_long_comp_subj_4-2_and_4-4,20240319_160457_long_comp_subj_4-2_and_4-4,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[2416955, 0, 0], [2418341, 0, 0], [2419727, 0...",20240319_160457_long_comp_subj_4-2_and_4-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
182,20240319_134914_long_comp_subj_3-1_and_3-4,20240319_134914_long_comp_subj_3-1_and_3-4,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[873096, 0, 0], [873096, 0, 0], [874482, 0, 0...",20240319_134914_long_comp_subj_3-1_and_3-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
183,20240318_143819_long_comp_subj_3-3_and_3-4,20240318_143819_long_comp_subj_3-3_and_3-4,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[1938830, 0, 0], [1940216, 0, 0], [1940216, 0...",20240318_143819_long_comp_subj_3-3_and_3-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
184,20240320_114629_long_comp_subj_5-3_and_5-4,20240320_114629_long_comp_subj_5-3_and_5-4,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[1434374, 0, 0], [1434374, 0, 0], [1435759, 0...",20240320_114629_long_comp_subj_5-3_and_5-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
185,20240318_170933_long_comp_subj_4-3_and_4-4,20240318_170933_long_comp_subj_4-3_and_4-4,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[992281, 0, 0], [993667, 0, 0], [995053, 0, 0...",20240318_170933_long_comp_subj_4-3_and_4-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...


- Getting the first item from each tuple in the arrays in the `data` column
  - This first item is usually just the timestamp

In [24]:
trodes_metadata_df["data"].iloc[0]

array([( 2981006, 1), ( 3086246, 0), ( 4286663, 1), ( 4486666, 0),
       ( 6286688, 1), ( 6486693, 0), ( 7486705, 1), ( 7686705, 0),
       ( 8486713, 1), ( 8686715, 0), ( 9986731, 1), (10186734, 0),
       (11386749, 1), (11586751, 0), (12586768, 1), (12786766, 0),
       (14286784, 1), (14486789, 0), (15586803, 1), (15786803, 0),
       (16886817, 1), (17086821, 0), (17886831, 1), (18086834, 0),
       (19086844, 1), (19286848, 0), (20086858, 1), (20286858, 0),
       (21186869, 1), (21386874, 0), (23186896, 1), (23386896, 0),
       (24586910, 1), (24786913, 0), (25686927, 1), (25886927, 0),
       (27186942, 1), (27386948, 0), (28486961, 1), (28686963, 0),
       (29586977, 1), (29786974, 0), (31286992, 1), (31486995, 0),
       (32287005, 1), (32487010, 0), (33387018, 1), (33587021, 0),
       (34387033, 1), (34587035, 0), (35887049, 1), (36087054, 0),
       (37587070, 1), (37787072, 0), (39887101, 1), (40087100, 0),
       (40887111, 1), (41087115, 0), (41887122, 1), (42087125,

In [25]:
# Getting the name of the first field in each structured numpy array
trodes_metadata_df["first_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[0])
# Getting the values of that first field (usually the timestamps)
trodes_metadata_df["first_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[0]])


In [26]:
# Same as above, but for the last field of each structured numpy array
trodes_metadata_df["last_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[-1])
trodes_metadata_df["last_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[-1]])
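
The field access used in the two cells above relies on NumPy structured arrays. A small self-contained sketch is shown below, with illustrative values laid out like the exported DIO data (the dtype is an assumption based on the `<time uint32><state uint8>` field description).

```python
import numpy as np

# Structured array in the layout of an exported DIO file (values are illustrative)
data = np.array(
    [(2981006, 1), (3086246, 0), (4286663, 1)],
    dtype=[("time", "<u4"), ("state", "u1")],
)

# dtype.names lists the field names in order
first_name = data.dtype.names[0]
last_name = data.dtype.names[-1]

# Indexing with a field name returns that field for every record
first_item_data = data[first_name]
last_item_data = data[last_name]
```

For arrays with a single field, such as the raw timestamps, the first and last field are the same, which is why both columns hold identical values for those rows.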

In [27]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,data,filename,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
0,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[[2981006, 1], [3086246, 0], [4286663, 1], [44...",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 3086246, 4286663, 4486666, 6286688, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ..."
1,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[[2981006, 0]]",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,[2981006],state,[0]
2,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[[2981006, 1]]",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,[2981006],state,[1]
3,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Dout1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[[2981006, 0]]",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,[2981006],state,[0]
4,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Dout3,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[[2981006, 0]]",20240321_114851_long_comp_subj_5-3_t5b5_merged...,,,,/scratch/back_up/reward_competition_extention/...,time,[2981006],state,[0]


In [28]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,data,filename,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
181,20240319_160457_long_comp_subj_4-2_and_4-4,20240319_160457_long_comp_subj_4-2_and_4-4,video_timestamps,1,,,,,,,...,"[[2416955, 0, 0], [2418341, 0, 0], [2419727, 0...",20240319_160457_long_comp_subj_4-2_and_4-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2416955, 2418341, 2419727, 2419727, 2421113, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
182,20240319_134914_long_comp_subj_3-1_and_3-4,20240319_134914_long_comp_subj_3-1_and_3-4,video_timestamps,1,,,,,,,...,"[[873096, 0, 0], [873096, 0, 0], [874482, 0, 0...",20240319_134914_long_comp_subj_3-1_and_3-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[873096, 873096, 874482, 875868, 875868, 87725...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
183,20240318_143819_long_comp_subj_3-3_and_3-4,20240318_143819_long_comp_subj_3-3_and_3-4,video_timestamps,1,,,,,,,...,"[[1938830, 0, 0], [1940216, 0, 0], [1940216, 0...",20240318_143819_long_comp_subj_3-3_and_3-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1938830, 1940216, 1940216, 1941602, 1941602, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
184,20240320_114629_long_comp_subj_5-3_and_5-4,20240320_114629_long_comp_subj_5-3_and_5-4,video_timestamps,1,,,,,,,...,"[[1434374, 0, 0], [1434374, 0, 0], [1435759, 0...",20240320_114629_long_comp_subj_5-3_and_5-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1434374, 1434374, 1435759, 1437145, 1437715, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
185,20240318_170933_long_comp_subj_4-3_and_4-4,20240318_170933_long_comp_subj_4-3_and_4-4,video_timestamps,1,,,,,,,...,"[[992281, 0, 0], [993667, 0, 0], [995053, 0, 0...",20240318_170933_long_comp_subj_4-3_and_4-4.1.v...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[992281, 993667, 995053, 995053, 996439, 99782...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [29]:
trodes_metadata_df["recording"].unique()

array(['20240321_114851_long_comp_subj_5-3_t5b5_merged',
       '20240321_114851_long_comp_subj_5-2_t6b6_merged',
       '20240317_172017_long_comp_subj_4-3_t5b5_merged',
       '20240317_172017_long_comp_subj_4-2_t6b6_merged',
       '20240317_151922_long_comp_subj_3-1_t6b6_merged',
       '20240317_151922_long_comp_subj_3-3_t5b5_merged',
       '20240319_160457_long_comp_subj_4-4_t6b6_merged',
       '20240319_160457_long_comp_subj_4-2_t5b5_merged',
       '20240319_134914_long_comp_subj_3-4_t6b6_merged',
       '20240319_134914_long_comp_subj_3-1_t5b5_merged',
       '20240318_143819_long_comp_subj_3-3_t6b6_merged',
       '20240318_143819_long_comp_subj_3-4_t5b5_merged',
       '20240320_114629_long_comp_subj_5-4_t5b5_merged',
       '20240320_114629_long_comp_subj_5-3_t6b6_merged',
       '20240318_170933_long_comp_subj_4-4_t5b5_merged',
       '20240318_170933_long_comp_subj_4-3_t6b6_merged',
       '20240321_114851_long_comp_subj_5-2_and_5-3',
       '20240317_172017_long_comp_s

## Getting the subject information from the metadata

In [30]:
def split_by_multiple_delimiters(s, delimiters):
    """
    Splits a string by multiple delimiters.

    Parameters:
    - s (str): The string to split.
    - delimiters (list): A list of delimiters to split the string by.

    Returns:
    - list: A list of substrings.
    """
    return re.split('|'.join(map(re.escape, delimiters)), s)


In [31]:
# Taking the part of the session name after "subj" and turning IDs like "5-2" into "5.2"
trodes_metadata_df["all_subjects"] = trodes_metadata_df["session_dir"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", "."))
# Collecting every subject ID in the session as a sorted list of strings
trodes_metadata_df["all_subjects"] = trodes_metadata_df["all_subjects"].apply(lambda x: sorted(extract_floats(x)))

In [32]:
trodes_metadata_df["session_dir"].iloc[0]

'20240321_114851_long_comp_subj_5-2_and_5-3'

In [33]:
trodes_metadata_df["all_subjects"].apply(lambda x: tuple(x)).unique()

array([('5.2', '5.3'), ('4.2', '4.3'), ('3.1', '3.3'), ('4.2', '4.4'),
       ('3.1', '3.4'), ('3.3', '3.4'), ('5.3', '5.4'), ('4.3', '4.4')],
      dtype=object)

In [34]:
# Same parsing for the recording name; underscores are also replaced with dots
trodes_metadata_df["current_subject"] = trodes_metadata_df["recording"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", ".").replace("_", "."))
# The first number found is the subject of this recording
trodes_metadata_df["current_subject"] = trodes_metadata_df["current_subject"].apply(lambda x: extract_floats(x)[0])


In [35]:
trodes_metadata_df["current_subject"].unique()

array(['5.3', '5.2', '4.3', '4.2', '3.1', '3.3', '4.4', '3.4', '5.4'],
      dtype=object)
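
The subject parsing above can be traced on single names with the following sketch. `extract_floats` is copied from the Functions section. The wrappers `subjects_in_session` and `subject_of_recording` are hypothetical names introduced only for this example.

```python
import re


def extract_floats(s):
    """Return every number in s as a float-formatted string."""
    float_pattern = r"[-+]?\d*\.\d+|\d+"
    return [str(float(num)) for num in re.findall(float_pattern, s)]


def subjects_in_session(session_dir):
    """Return the sorted subject IDs named after 'subj' in a session name."""
    tail = session_dir.split("subj")[-1].strip("_").replace("-", ".")
    return sorted(extract_floats(tail))


def subject_of_recording(recording):
    """Return the first subject ID named after 'subj' in a recording name."""
    tail = recording.split("subj")[-1].strip("_").replace("-", ".").replace("_", ".")
    return extract_floats(tail)[0]


session = "20240321_114851_long_comp_subj_5-2_and_5-3"
recording = "20240321_114851_long_comp_subj_5-3_t5b5_merged"
print(subjects_in_session(session), subject_of_recording(recording))
```

Because the IDs pass through `float`, this approach would collapse an ID such as `5-10` to `5.1`, so it fits naming schemes like the ones shown here where the second part is a single digit.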

## Dropping all the rows with unneeded metadata

In [36]:
trodes_metadata_df["metadata_dir"].unique()

array(['DIO', 'raw', 'time', 'video_timestamps'], dtype=object)

In [37]:
METADATA_TO_KEEP = ['raw', 'DIO', 'video_timestamps']

In [38]:
trodes_metadata_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(METADATA_TO_KEEP)]

In [39]:
# Dropping the files whose names contain "out" (e.g. dio_ECU_Dout4) or "coordinates"
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("out")]
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("coordinates")]


In [40]:
trodes_metadata_df = trodes_metadata_df.reset_index(drop=True)

# Getting the first time stamp of each recording

In [41]:
trodes_raw_df = trodes_metadata_df[(trodes_metadata_df["metadata_dir"] == "raw") & (trodes_metadata_df["metadata_file"] == "timestamps")].copy()


In [42]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
4,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,raw,timestamps,Raw timestamps,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 2981007, 2981008, 2981009, 2981010, ...",time,"[2981006, 2981007, 2981008, 2981009, 2981010, ...","[5.2, 5.3]",5.3
9,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_t6b6_merged,raw,timestamps,Raw timestamps,little endian,20240321_114851_long_comp_subj_5-2_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 2981007, 2981008, 2981009, 2981010, ...",time,"[2981006, 2981007, 2981008, 2981009, 2981010, ...","[5.2, 5.3]",5.2
14,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-3_t5b5_merged,raw,timestamps,Raw timestamps,little endian,20240317_172017_long_comp_subj_4-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2020598, 2020599, 2020600, 2020601, 2020602, ...",time,"[2020598, 2020599, 2020600, 2020601, 2020602, ...","[4.2, 4.3]",4.3
15,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-2_t6b6_merged,raw,timestamps,Raw timestamps,little endian,20240317_172017_long_comp_subj_4-2_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2020598, 2020599, 2020600, 2020601, 2020602, ...",time,"[2020598, 2020599, 2020600, 2020601, 2020602, ...","[4.2, 4.3]",4.2
24,20240317_151922_long_comp_subj_3-1_and_3-3,20240317_151922_long_comp_subj_3-1_t6b6_merged,raw,timestamps,Raw timestamps,little endian,20240317_151922_long_comp_subj_3-1_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2415571, 2415572, 2415573, 2415574, 2415575, ...",time,"[2415571, 2415572, 2415573, 2415574, 2415575, ...","[3.1, 3.3]",3.1


In [43]:
trodes_raw_df["first_timestamp"] = trodes_raw_df["first_item_data"].apply(lambda x: x[0])

In [44]:
trodes_raw_df["recording"].iloc[0]

'20240321_114851_long_comp_subj_5-3_t5b5_merged'

In [45]:
recording_to_first_timestamp = trodes_raw_df.set_index('session_dir')['first_timestamp'].to_dict()

In [46]:
recording_to_first_timestamp

{'20240321_114851_long_comp_subj_5-2_and_5-3': 2981006,
 '20240317_172017_long_comp_subj_4-2_and_4-3': 2020598,
 '20240317_151922_long_comp_subj_3-1_and_3-3': 2415571,
 '20240319_160457_long_comp_subj_4-2_and_4-4': 2416957,
 '20240319_134914_long_comp_subj_3-1_and_3-4': 871712,
 '20240318_143819_long_comp_subj_3-3_and_3-4': 1938832,
 '20240320_114629_long_comp_subj_5-3_and_5-4': 1432990,
 '20240318_170933_long_comp_subj_4-3_and_4-4': 990897}

In [47]:
trodes_metadata_df["first_timestamp"] = trodes_metadata_df["session_dir"].map(recording_to_first_timestamp)

In [48]:
trodes_metadata_df["first_timestamp"]

0     2981006
1     2981006
2     2981006
3     2981006
4     2981006
       ...   
85    2416957
86     871712
87    1938832
88    1432990
89     990897
Name: first_timestamp, Length: 90, dtype: int64
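
The mapping step above can be pictured with plain dictionaries. In this sketch the rows, session names, and timestamps are invented for illustration. It builds a session-to-first-timestamp lookup and then attaches the value to every row of the same session.

```python
# Hypothetical rows standing in for the raw timestamp entries of each recording
rows = [
    {"session_dir": "session_a", "recording": "session_a_subj_1", "first_item_data": [2981006, 2981007]},
    {"session_dir": "session_a", "recording": "session_a_subj_2", "first_item_data": [2981006, 2981007]},
    {"session_dir": "session_b", "recording": "session_b_subj_3", "first_item_data": [2020598, 2020599]},
]

# One entry per session; a later recording of the same session overwrites an earlier one
session_to_first_timestamp = {
    row["session_dir"]: row["first_item_data"][0] for row in rows
}

# Attaching the session's first timestamp to every row
for row in rows:
    row["first_timestamp"] = session_to_first_timestamp[row["session_dir"]]
```

Keying by session rather than by recording only gives a well-defined result when all recordings of a session start at the same timestamp, as they do in the output shown above.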

# Getting the event timestamps

In [49]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 3086246, 4286663, 4486666, 6286688, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.3
1,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,[2981006],state,[1],"[5.2, 5.3]",5.3
2,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 3086246, 3129449, 3133649, 3389450, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.3
3,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,[2981006],state,[0],"[5.2, 5.3]",5.3
4,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,raw,timestamps,Raw timestamps,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 2981007, 2981008, 2981009, 2981010, ...",time,"[2981006, 2981007, 2981008, 2981009, 2981010, ...","[5.2, 5.3]",5.3


In [50]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
85,20240319_160457_long_comp_subj_4-2_and_4-4,20240319_160457_long_comp_subj_4-2_and_4-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2416955, 2418341, 2419727, 2419727, 2421113, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.4]",4.2
86,20240319_134914_long_comp_subj_3-1_and_3-4,20240319_134914_long_comp_subj_3-1_and_3-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[873096, 873096, 874482, 875868, 875868, 87725...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.4]",3.1
87,20240318_143819_long_comp_subj_3-3_and_3-4,20240318_143819_long_comp_subj_3-3_and_3-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1938830, 1940216, 1940216, 1941602, 1941602, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.3, 3.4]",3.3
88,20240320_114629_long_comp_subj_5-3_and_5-4,20240320_114629_long_comp_subj_5-3_and_5-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1434374, 1434374, 1435759, 1437145, 1437715, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[5.3, 5.4]",5.3
89,20240318_170933_long_comp_subj_4-3_and_4-4,20240318_170933_long_comp_subj_4-3_and_4-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[992281, 993667, 995053, 995053, 996439, 99782...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.3, 4.4]",4.3


In [51]:
# trodes_state_df = trodes_metadata_df[trodes_metadata_df["last_dtype_name"] == "state"].copy()

# Filtering for digital IO channels
trodes_state_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(["DIO"])].copy()
# Filtering for tone and port entry related channels
# (chained on trodes_state_df so that the DIO filter above is not discarded)
trodes_state_df = trodes_state_df[trodes_state_df["id"].isin(["ECU_Din1", "ECU_Din2", "ECU_Din3"])].copy()


In [52]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 3086246, 4286663, 4486666, 6286688, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.3
1,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,[2981006],state,[1],"[5.2, 5.3]",5.3
2,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 3086246, 3129449, 3133649, 3389450, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.3
5,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-2_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 3086246, 4286663, 4486666, 6286688, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.2
7,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-2_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2981006, 3086246, 3129449, 3133649, 3389450, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.2


In [53]:
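# Pairing the index of each "on" sample (state == 1) with the index of the next state change,
# which gives one [on_index, off_index] row per event
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: np.column_stack([np.where(x["last_item_data"] == 1)[0], np.where(x["last_item_data"] == 1)[0]+1]), axis=1)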

In [54]:
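# Dropping pairs whose off_index runs past the last recorded state change
# (the channel was still on when the recording ended)
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: x["event_indexes"][x["event_indexes"][:, 1] <= x["first_item_data"].shape[0] - 1], axis=1)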

In [55]:
trodes_state_df["event_timestamps"] = trodes_state_df.apply(lambda x: x["first_item_data"][x["event_indexes"]], axis=1)
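
The three cells above can be illustrated on a toy channel. This is only a sketch: `pair_on_off` is a hypothetical helper name and the timestamps are made up. It assumes, as in the DIO exports shown above, that each entry is a state change and that `1` means the channel switched on.

```python
import numpy as np

def pair_on_off(times, states):
    """Return an (n_events, 2) array of [on_time, off_time] rows (hypothetical helper)."""
    on_idx = np.where(states == 1)[0]
    # The state change right after each "on" sample is taken as the matching "off"
    pairs = np.column_stack([on_idx, on_idx + 1])
    # Drop a trailing "on" that has no following state change
    pairs = pairs[pairs[:, 1] <= times.shape[0] - 1]
    return times[pairs]

# Made-up state-change timestamps and the state after each change
times = np.array([100, 250, 400, 900, 1200], dtype=np.uint32)
states = np.array([1, 0, 1, 0, 1], dtype=np.uint8)
events = pair_on_off(times, states)

# A channel with a single sample yields an empty (0, 2) array
single = pair_on_off(np.array([100], dtype=np.uint32), np.array([1], dtype=np.uint8))
```

The single-sample case mirrors the `dio_ECU_Din3` rows, whose only entry is the initial state at the first timestamp.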

## Updating the video timestamps

## Syncing up the video frame data

In [56]:
# Getting the rows that are the metadata for the video timestamps
trodes_video_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"] == "video_timestamps"].copy().reset_index(drop=True)



In [57]:
# Filtering for the first video only
# This only applies to this pilot data where we are only looking at the competition data
# trodes_video_df = trodes_video_df[trodes_video_df["metadata_file"] == "1"].copy()

In [58]:
trodes_video_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_and_5-3,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2982390, 2982390, 2983776, 2985162, 2985162, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[5.2, 5.3]",5.2
1,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-2_and_4-3,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2020596, 2020596, 2021982, 2021982, 2021982, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.3]",4.2
2,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-2_and_4-3,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2020596, 2021982, 2023368, 2023368, 2024754, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.3]",4.2
3,20240317_151922_long_comp_subj_3-1_and_3-3,20240317_151922_long_comp_subj_3-1_and_3-3,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2415569, 2416955, 2416955, 2418341, 2419727, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.3]",3.1
4,20240317_151922_long_comp_subj_3-1_and_3-3,20240317_151922_long_comp_subj_3-1_and_3-3,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2415569, 2415569, 2416955, 2416955, 2418341, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.3]",3.1


In [59]:
# Copying the video timestamps into their own column (they are used as-is, no resampling is applied here)
trodes_video_df["video_timestamps"] = trodes_video_df["first_item_data"]

In [60]:
# Removing the columns that are no longer needed
trodes_video_df = trodes_video_df[["filename", "video_timestamps", "session_dir"]].copy()

In [61]:
# Renaming the filename so that we can merge with other dataframes with the same column name
trodes_video_df = trodes_video_df.rename(columns={"filename": "video_name"})

In [62]:
trodes_video_df.head()

Unnamed: 0,video_name,video_timestamps,session_dir
0,20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...",20240321_114851_long_comp_subj_5-2_and_5-3
1,20240317_172017_long_comp_subj_4-2_and_4-3.2.v...,"[2020596, 2020596, 2021982, 2021982, 2021982, ...",20240317_172017_long_comp_subj_4-2_and_4-3
2,20240317_172017_long_comp_subj_4-2_and_4-3.1.v...,"[2020596, 2021982, 2023368, 2023368, 2024754, ...",20240317_172017_long_comp_subj_4-2_and_4-3
3,20240317_151922_long_comp_subj_3-1_and_3-3.1.v...,"[2415569, 2416955, 2416955, 2418341, 2419727, ...",20240317_151922_long_comp_subj_3-1_and_3-3
4,20240317_151922_long_comp_subj_3-1_and_3-3.2.v...,"[2415569, 2415569, 2416955, 2416955, 2418341, ...",20240317_151922_long_comp_subj_3-1_and_3-3


- Adding each video as a row to each state row

In [63]:
trodes_state_df = pd.merge(trodes_state_df, trodes_video_df, on=["session_dir"], how="inner")
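
Because `session_dir` is not unique on either side, this merge produces one row per (state row, video) combination within a session. A minimal sketch with made-up frames (the names `state`, `video` and the values are illustrative only):

```python
import pandas as pd

# Two DIO rows for session s1, one for session s2
state = pd.DataFrame({
    "session_dir": ["s1", "s1", "s2"],
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din1"],
})
# Two videos for session s1, one for session s2
video = pd.DataFrame({
    "session_dir": ["s1", "s1", "s2"],
    "video_name": ["s1.1", "s1.2", "s2.1"],
})

merged = pd.merge(state, video, on=["session_dir"], how="inner")
# s1 contributes 2 x 2 = 4 rows, s2 contributes 1 x 1 = 1 row
n_rows = len(merged)
```

This is why sessions with two videos appear twice per channel in the merged dataframe.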

In [64]:
trodes_state_df.columns

Index(['session_dir', 'recording', 'metadata_dir', 'metadata_file',
       'description', 'byte_order', 'original_file', 'clockrate',
       'trodes_version', 'compile_date', 'compile_time', 'qt_version',
       'commit_tag', 'controller_firmware', 'headstage_firmware',
       'controller_serialnum', 'headstage_serialnum', 'autosettle', 'smartref',
       'gyro', 'accelerometer', 'magnetometer', 'time_offset',
       'system_time_at_creation', 'timestamp_at_creation', 'first_timestamp',
       'direction', 'id', 'display_order', 'fields', 'data', 'filename',
       'decimation', 'clock rate', 'camera_name', 'session_path',
       'first_dtype_name', 'first_item_data', 'last_dtype_name',
       'last_item_data', 'all_subjects', 'current_subject', 'event_indexes',
       'event_timestamps', 'video_name', 'video_timestamps'],
      dtype='object')

## Finding the closest frame to each event

In [65]:
trodes_state_df["event_timestamps"].iloc[1]

array([], shape=(0, 2), dtype=uint32)

In [66]:
trodes_state_df["event_frames"] = trodes_state_df.apply(lambda x: utilities.helper.find_nearest_indices(x["event_timestamps"], x["video_timestamps"]), axis=1)
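
The implementation of `utilities.helper.find_nearest_indices` is not shown in this notebook. As a rough sketch of one way such a lookup could work, the hypothetical function below uses `np.searchsorted` on a non-decreasing reference array. It assumes the reference has at least two entries, and its tie-breaking may differ from the project helper.

```python
import numpy as np

def nearest_indices(values, sorted_ref):
    """Index of the closest entry in sorted_ref for every element of values (hypothetical)."""
    values = np.asarray(values, dtype=np.int64)
    sorted_ref = np.asarray(sorted_ref, dtype=np.int64)
    # Position of the first reference entry that is >= each value
    right = np.searchsorted(sorted_ref, values, side="left")
    # Keep both neighbours inside the array bounds
    right = np.clip(right, 1, len(sorted_ref) - 1)
    left = right - 1
    # Pick whichever neighbour is closer; ties go to the earlier frame
    choose_left = (values - sorted_ref[left]) <= (sorted_ref[right] - values)
    return np.where(choose_left, left, right)

# Made-up frame timestamps (duplicates allowed) and [on, off] event timestamps
frame_times = np.array([10, 10, 20, 30])
event_times = np.array([[8, 21], [26, 40]])
frames = nearest_indices(event_times, frame_times)
```

The output keeps the shape of the input, so an `(n_events, 2)` array of timestamps maps to an `(n_events, 2)` array of frame indices.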

In [67]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[2981006, 3086246, 4286663, 4486666, 6286688, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2981006, 3086246], [4286663, 4486666], [6286...",20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...","[[0, 104], [1304, 1503], [3300, 3499], [4498, ..."
1,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,[2981006],state,[1],"[5.2, 5.3]",5.3,[],[],20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...",[]
2,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[2981006, 3086246, 3129449, 3133649, 3389450, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2981006, 3086246], [3129449, 3133649], [3389...",20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...","[[0, 104], [149, 153], [407, 413], [702, 705],..."
3,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-2_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[2981006, 3086246, 4286663, 4486666, 6286688, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.2,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2981006, 3086246], [4286663, 4486666], [6286...",20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...","[[0, 104], [1304, 1503], [3300, 3499], [4498, ..."
4,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-2_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[2981006, 3086246, 3129449, 3133649, 3389450, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.2,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2981006, 3086246], [3129449, 3133649], [3389...",20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...","[[0, 104], [149, 153], [407, 413], [702, 705],..."


## Combine raw and state dataframes

In [68]:
trodes_state_df

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[2981006, 3086246, 4286663, 4486666, 6286688, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2981006, 3086246], [4286663, 4486666], [6286...",20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...","[[0, 104], [1304, 1503], [3300, 3499], [4498, ..."
1,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,[2981006],state,[1],"[5.2, 5.3]",5.3,[],[],20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...",[]
2,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[2981006, 3086246, 3129449, 3133649, 3389450, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2981006, 3086246], [3129449, 3133649], [3389...",20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...","[[0, 104], [149, 153], [407, 413], [702, 705],..."
3,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-2_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[2981006, 3086246, 4286663, 4486666, 6286688, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.2,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2981006, 3086246], [4286663, 4486666], [6286...",20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...","[[0, 104], [1304, 1503], [3300, 3499], [4498, ..."
4,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-2_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,"[2981006, 3086246, 3129449, 3133649, 3389450, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[5.2, 5.3]",5.2,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2981006, 3086246], [3129449, 3133649], [3389...",20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...","[[0, 104], [149, 153], [407, 413], [702, 705],..."
5,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_t6b6_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240321_114851_long_comp_subj_5-2_t6b6_merged...,20000,2.3.4,Nov 28 2022,...,[2981006],state,[1],"[5.2, 5.3]",5.2,[],[],20240321_114851_long_comp_subj_5-2_and_5-3.1.v...,"[2982390, 2982390, 2983776, 2985162, 2985162, ...",[]
6,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-3_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240317_172017_long_comp_subj_4-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,[2020598],state,[1],"[4.2, 4.3]",4.3,[],[],20240317_172017_long_comp_subj_4-2_and_4-3.2.v...,"[2020596, 2020596, 2021982, 2021982, 2021982, ...",[]
7,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-3_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240317_172017_long_comp_subj_4-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,[2020598],state,[1],"[4.2, 4.3]",4.3,[],[],20240317_172017_long_comp_subj_4-2_and_4-3.1.v...,"[2020596, 2021982, 2023368, 2023368, 2024754, ...",[]
8,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-3_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240317_172017_long_comp_subj_4-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[2020598, 2122818, 2135215, 2173816, 2211619, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.2, 4.3]",4.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2020598, 2122818], [2135215, 2173816], [2211...",20240317_172017_long_comp_subj_4-2_and_4-3.2.v...,"[2020596, 2020596, 2021982, 2021982, 2021982, ...","[[2, 153], [172, 230], [286, 366], [391, 393],..."
9,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-3_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240317_172017_long_comp_subj_4-3_t5b5_merged...,20000,2.3.4,Nov 28 2022,...,"[2020598, 2122818, 2135215, 2173816, 2211619, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.2, 4.3]",4.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2020598, 2122818], [2135215, 2173816], [2211...",20240317_172017_long_comp_subj_4-2_and_4-3.1.v...,"[2020596, 2021982, 2023368, 2023368, 2024754, ...","[[1, 102], [115, 153], [191, 245], [261, 263],..."


In [69]:
trodes_state_df = trodes_state_df[STATE_COLS_TO_KEEP].drop_duplicates(subset=["session_dir", "video_name", "metadata_file"]).sort_values(["session_dir", "video_name", "metadata_file"]).reset_index(drop=True).copy()

In [70]:
trodes_state_df.head()

Unnamed: 0,session_dir,metadata_file,event_timestamps,video_name,video_timestamps,event_frames
0,20240317_151922_long_comp_subj_3-1_and_3-3,dio_ECU_Din1,"[[2415571, 2503111], [3703526, 3903529], [5703...",20240317_151922_long_comp_subj_3-1_and_3-3.1.v...,"[2415569, 2416955, 2416955, 2418341, 2419727, ...","[[1, 88], [1286, 1486], [3282, 3482], [4480, 4..."
1,20240317_151922_long_comp_subj_3-1_and_3-3,dio_ECU_Din2,"[[2415571, 2503111], [2674713, 2693311], [2725...",20240317_151922_long_comp_subj_3-1_and_3-3.1.v...,"[2415569, 2416955, 2416955, 2418341, 2419727, ...","[[1, 88], [259, 278], [310, 365], [366, 382], ..."
2,20240317_151922_long_comp_subj_3-1_and_3-3,dio_ECU_Din3,[],20240317_151922_long_comp_subj_3-1_and_3-3.1.v...,"[2415569, 2416955, 2416955, 2418341, 2419727, ...",[]
3,20240317_151922_long_comp_subj_3-1_and_3-3,dio_ECU_Din1,"[[2415571, 2503111], [3703526, 3903529], [5703...",20240317_151922_long_comp_subj_3-1_and_3-3.2.v...,"[2415569, 2415569, 2416955, 2416955, 2418341, ...","[[2, 132], [1922, 2220], [4906, 5203], [6696, ..."
4,20240317_151922_long_comp_subj_3-1_and_3-3,dio_ECU_Din2,"[[2415571, 2503111], [2674713, 2693311], [2725...",20240317_151922_long_comp_subj_3-1_and_3-3.2.v...,"[2415569, 2415569, 2416955, 2416955, 2418341, ...","[[2, 132], [387, 415], [463, 546], [548, 571],..."


In [71]:
trodes_state_df = trodes_state_df.groupby(same_columns).agg({**{col: 'first' for col in trodes_state_df.columns if col not in same_columns + different_columns}, **{col: lambda x: x.tolist() for col in different_columns}}).reset_index()
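
`same_columns` and `different_columns` are defined elsewhere in the notebook. Judging from the output, the grouping keys are the session and video, and the per-channel columns are collapsed into lists. The sketch below reproduces that pattern on a toy frame; the column lists and values are assumptions made for illustration.

```python
import pandas as pd

same_columns = ["session_dir", "video_name"]            # assumed grouping keys
different_columns = ["metadata_file", "event_frames"]   # assumed per-channel columns

toy = pd.DataFrame({
    "session_dir": ["s1", "s1", "s1"],
    "video_name": ["s1.1", "s1.1", "s1.1"],
    "video_timestamps": [(0, 5), (0, 5), (0, 5)],  # identical within a group
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din3"],
    "event_frames": [[[1, 2]], [[3, 4]], []],
})

# Shared columns keep their first value; per-channel columns become lists
agg_spec = {col: "first" for col in toy.columns if col not in same_columns + different_columns}
agg_spec.update({col: (lambda x: x.tolist()) for col in different_columns})
collapsed = toy.groupby(same_columns).agg(agg_spec).reset_index()
```

Each group ends up as a single row whose list columns follow the row order within the group, which is why the earlier sort on `metadata_file` matters.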

In [72]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,metadata_file,event_frames,event_timestamps
0,20240317_151922_long_comp_subj_3-1_and_3-3,20240317_151922_long_comp_subj_3-1_and_3-3.1.v...,"[2415569, 2416955, 2416955, 2418341, 2419727, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 88], [1286, 1486], [3282, 3482], [4480, ...","[[[2415571, 2503111], [3703526, 3903529], [570..."
1,20240317_151922_long_comp_subj_3-1_and_3-3,20240317_151922_long_comp_subj_3-1_and_3-3.2.v...,"[2415569, 2415569, 2416955, 2416955, 2418341, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[2, 132], [1922, 2220], [4906, 5203], [6696,...","[[[2415571, 2503111], [3703526, 3903529], [570..."
2,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-2_and_4-3.1.v...,"[2020596, 2021982, 2023368, 2023368, 2024754, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 102], [1300, 1501], [3297, 3497], [4494,...","[[[2020598, 2122818], [3323233, 3523234], [532..."
3,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-2_and_4-3.2.v...,"[2020596, 2020596, 2021982, 2021982, 2021982, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[2, 153], [1943, 2243], [4929, 5227], [6717,...","[[[2020598, 2122818], [3323233, 3523234], [532..."
4,20240318_143819_long_comp_subj_3-3_and_3-4,20240318_143819_long_comp_subj_3-3_and_3-4.1.v...,"[1938830, 1940216, 1940216, 1941602, 1941602, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 125], [1334, 1533], [3330, 3529], [4528,...","[[[1938832, 2038912], [3239330, 3439330], [523..."


In [73]:
# Each list is ordered as [dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3] (see the metadata_file column above),
# so index 0 is the tone channel and indexes 1 and 2 are the port entries for box 1 and box 2
trodes_state_df["tone_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[0])
trodes_state_df["box_1_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[2])

trodes_state_df["tone_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[0])
trodes_state_df["box_1_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[2])


In [74]:
trodes_state_df = trodes_state_df.drop(columns=["event_timestamps", "event_frames", "metadata_file"], errors="ignore")

In [75]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames
0,20240317_151922_long_comp_subj_3-1_and_3-3,20240317_151922_long_comp_subj_3-1_and_3-3.1.v...,"[2415569, 2416955, 2416955, 2418341, 2419727, ...","[[2415571, 2503111], [3703526, 3903529], [5703...","[[2415571, 2503111], [2674713, 2693311], [2725...",[],"[[1, 88], [1286, 1486], [3282, 3482], [4480, 4...","[[1, 88], [259, 278], [310, 365], [366, 382], ...",[]
1,20240317_151922_long_comp_subj_3-1_and_3-3,20240317_151922_long_comp_subj_3-1_and_3-3.2.v...,"[2415569, 2415569, 2416955, 2416955, 2418341, ...","[[2415571, 2503111], [3703526, 3903529], [5703...","[[2415571, 2503111], [2674713, 2693311], [2725...",[],"[[2, 132], [1922, 2220], [4906, 5203], [6696, ...","[[2, 132], [387, 415], [463, 546], [548, 571],...",[]
2,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-2_and_4-3.1.v...,"[2020596, 2021982, 2023368, 2023368, 2024754, ...","[[2020598, 2122818], [3323233, 3523234], [5323...","[[2020598, 2122818], [2135215, 2173816], [2211...",[],"[[1, 102], [1300, 1501], [3297, 3497], [4494, ...","[[1, 102], [115, 153], [191, 245], [261, 263],...",[]
3,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-2_and_4-3.2.v...,"[2020596, 2020596, 2021982, 2021982, 2021982, ...","[[2020598, 2122818], [3323233, 3523234], [5323...","[[2020598, 2122818], [2135215, 2173816], [2211...",[],"[[2, 153], [1943, 2243], [4929, 5227], [6717, ...","[[2, 153], [172, 230], [286, 366], [391, 393],...",[]
4,20240318_143819_long_comp_subj_3-3_and_3-4,20240318_143819_long_comp_subj_3-3_and_3-4.1.v...,"[1938830, 1940216, 1940216, 1941602, 1941602, ...","[[1938832, 2038912], [3239330, 3439330], [5239...","[[1938832, 2038912], [2039312, 2085313], [2272...",[],"[[1, 125], [1334, 1533], [3330, 3529], [4528, ...","[[1, 125], [125, 181], [369, 401], [403, 446],...",[]


In [76]:
trodes_raw_df = trodes_raw_df[RAW_COLS_TO_KEEP].reset_index(drop=True).copy()

In [77]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,original_file,session_path,current_subject,first_item_data,first_timestamp,all_subjects
0,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-3_t5b5_merged,20240321_114851_long_comp_subj_5-3_t5b5_merged...,/scratch/back_up/reward_competition_extention/...,5.3,"[2981006, 2981007, 2981008, 2981009, 2981010, ...",2981006,"[5.2, 5.3]"
1,20240321_114851_long_comp_subj_5-2_and_5-3,20240321_114851_long_comp_subj_5-2_t6b6_merged,20240321_114851_long_comp_subj_5-2_t6b6_merged...,/scratch/back_up/reward_competition_extention/...,5.2,"[2981006, 2981007, 2981008, 2981009, 2981010, ...",2981006,"[5.2, 5.3]"
2,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-3_t5b5_merged,20240317_172017_long_comp_subj_4-3_t5b5_merged...,/scratch/back_up/reward_competition_extention/...,4.3,"[2020598, 2020599, 2020600, 2020601, 2020602, ...",2020598,"[4.2, 4.3]"
3,20240317_172017_long_comp_subj_4-2_and_4-3,20240317_172017_long_comp_subj_4-2_t6b6_merged,20240317_172017_long_comp_subj_4-2_t6b6_merged...,/scratch/back_up/reward_competition_extention/...,4.2,"[2020598, 2020599, 2020600, 2020601, 2020602, ...",2020598,"[4.2, 4.3]"
4,20240317_151922_long_comp_subj_3-1_and_3-3,20240317_151922_long_comp_subj_3-1_t6b6_merged,20240317_151922_long_comp_subj_3-1_t6b6_merged...,/scratch/back_up/reward_competition_extention/...,3.1,"[2415571, 2415572, 2415573, 2415574, 2415575, ...",2415571,"[3.1, 3.3]"


In [78]:
trodes_final_df = pd.merge(trodes_raw_df, trodes_state_df, on=["session_dir"], how="inner")

In [79]:
trodes_final_df.shape

(20, 16)

In [80]:
trodes_final_df = trodes_final_df.rename(columns={"first_item_data": "raw_timestamps"})
trodes_final_df = trodes_final_df.drop(columns=["metadata_file"], errors="ignore")
trodes_final_df = trodes_final_df.sort_values(["session_dir", "recording"]).reset_index(drop=True).copy()

## Making the timestamps 0 indexed

In [81]:
trodes_final_df[[col for col in trodes_final_df.columns if "timestamps" in col]].head()

Unnamed: 0,raw_timestamps,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,"[2415571, 2415572, 2415573, 2415574, 2415575, ...","[2415569, 2416955, 2416955, 2418341, 2419727, ...","[[2415571, 2503111], [3703526, 3903529], [5703...","[[2415571, 2503111], [2674713, 2693311], [2725...",[]
1,"[2415571, 2415572, 2415573, 2415574, 2415575, ...","[2415569, 2415569, 2416955, 2416955, 2418341, ...","[[2415571, 2503111], [3703526, 3903529], [5703...","[[2415571, 2503111], [2674713, 2693311], [2725...",[]
2,"[2415571, 2415572, 2415573, 2415574, 2415575, ...","[2415569, 2416955, 2416955, 2418341, 2419727, ...","[[2415571, 2503111], [3703526, 3903529], [5703...","[[2415571, 2503111], [2674713, 2693311], [2725...",[]
3,"[2415571, 2415572, 2415573, 2415574, 2415575, ...","[2415569, 2415569, 2416955, 2416955, 2418341, ...","[[2415571, 2503111], [3703526, 3903529], [5703...","[[2415571, 2503111], [2674713, 2693311], [2725...",[]
4,"[2020598, 2020599, 2020600, 2020601, 2020602, ...","[2020596, 2021982, 2023368, 2023368, 2024754, ...","[[2020598, 2122818], [3323233, 3523234], [5323...","[[2020598, 2122818], [2135215, 2173816], [2211...",[]


In [82]:
trodes_final_df["last_timestamp"] = trodes_final_df["raw_timestamps"].apply(lambda x: x[-1])

- Dropping raw timestamps because of memory issues

In [83]:
trodes_final_df = trodes_final_df.drop(columns=["raw_timestamps", "original_file"], errors="ignore")

In [84]:
# Keeping an unmodified copy before the timestamps are shifted
copy_trodes_final_df = trodes_final_df.copy()

In [85]:
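# Shift every "*_timestamps" column so that each recording starts at 0.
# "first_timestamp" and "last_timestamp" do not contain "timestamps" and keep their raw values.
for col in [col for col in trodes_final_df.columns if "timestamps" in col]:
    trodes_final_df[col] = trodes_final_df.apply(lambda x: x[col].astype(np.int32) - np.int32(x["first_timestamp"]), axis=1)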

# Cast the frame-index arrays to 32-bit integers
for col in [col for col in trodes_final_df.columns if "frames" in col]:
    trodes_final_df[col] = trodes_final_df[col].apply(lambda x: x.astype(np.int32))
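
The zero-indexing step above can be checked on a small example. This is a minimal sketch, assuming each `*_timestamps` cell holds a NumPy array and `first_timestamp` is a scalar per row; `toy_df`, `zero_index`, and the values in it are hypothetical and not taken from the recordings.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for trodes_final_df; the values are illustrative only.
toy_df = pd.DataFrame({
    "first_timestamp": [100, 200],
    "tone_timestamps": [np.array([[100, 150], [300, 400]]), np.array([[200, 260]])],
    "box_2_port_entry_timestamps": [np.array([]), np.array([])],
})


def zero_index(timestamps, first_timestamp):
    """Shift timestamps so that the first sample of the recording is 0."""
    return np.asarray(timestamps).astype(np.int32) - np.int32(first_timestamp)


# Same column filter as the notebook: "first_timestamp" is not matched.
timestamp_cols = [col for col in toy_df.columns if "timestamps" in col]
for col in timestamp_cols:
    toy_df[col] = [
        zero_index(ts, first)
        for ts, first in zip(toy_df[col], toy_df["first_timestamp"])
    ]

print(toy_df["tone_timestamps"].iloc[0])
```

Empty arrays (such as the unused box 2 port entries) pass through unchanged apart from the dtype. Note that `np.int32` only holds values up to 2,147,483,647, so this cast assumes timestamps stay below that limit.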

In [86]:
# Order columns by the text after the last underscore so related columns sit together
sorted_columns = sorted(trodes_final_df.columns, key=lambda x: x.split("_")[-1])
trodes_final_df = trodes_final_df[sorted_columns].copy()
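
The ordering above keys on the text after the last underscore, so columns group by suffix (for example all `*_frames` columns, then all `*_timestamps` columns). A small sketch with a hypothetical subset of column names illustrates the result; because `sorted` is stable, columns that share a suffix keep their original relative order.

```python
# Hypothetical subset of column names, listed in an arbitrary order.
columns = [
    "tone_timestamps",
    "session_dir",
    "video_timestamps",
    "tone_frames",
    "first_timestamp",
]

# Sort by the last underscore-separated token: "dir", "frames", "timestamp", "timestamps".
sorted_columns = sorted(columns, key=lambda name: name.split("_")[-1])

print(sorted_columns)
# ['session_dir', 'tone_frames', 'first_timestamp', 'tone_timestamps', 'video_timestamps']
```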

## Saving to a file

In [87]:
trodes_final_df.to_pickle(os.path.join(OUTPUT_DIR, "{}_00_trodes_metadata.pkl".format(OUTPUT_PREFIX)))
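
Pickle keeps the NumPy arrays stored inside each cell, which a CSV export would flatten to strings. The sketch below is a hedged illustration of a save-and-reload round trip on toy data; the temporary directory, the `example` prefix, and `toy_df` are assumptions for the example. It also creates the output directory first, since `to_pickle` does not create missing directories.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical one-row dataframe with an array-valued cell.
toy_df = pd.DataFrame({
    "session_dir": ["session_a"],
    "tone_timestamps": [np.array([[0, 87540]], dtype=np.int32)],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    output_dir = os.path.join(tmp_dir, "proc")
    # Create the output directory if it does not exist yet.
    os.makedirs(output_dir, exist_ok=True)

    pickle_path = os.path.join(output_dir, "example_00_trodes_metadata.pkl")
    toy_df.to_pickle(pickle_path)

    # Reload and confirm the array cell kept its dtype and values.
    reloaded_df = pd.read_pickle(pickle_path)

print(reloaded_df["tone_timestamps"].iloc[0].dtype)
```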

In [88]:
trodes_final_df.head()

Unnamed: 0,session_dir,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames,video_name,session_path,recording,current_subject,all_subjects,first_timestamp,last_timestamp,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,20240317_151922_long_comp_subj_3-1_and_3-3,"[[1, 88], [1286, 1486], [3282, 3482], [4480, 4...","[[1, 88], [259, 278], [310, 365], [366, 382], ...",[],20240317_151922_long_comp_subj_3-1_and_3-3.1.v...,/scratch/back_up/reward_competition_extention/...,20240317_151922_long_comp_subj_3-1_t6b6_merged,3.1,"[3.1, 3.3]",2415571,48236575,"[-2, 1384, 1384, 2770, 4156, 5542, 5542, 6928,...","[[0, 87540], [1287955, 1487958], [3287980, 348...","[[0, 87540], [259142, 277740], [309743, 364544...",[]
1,20240317_151922_long_comp_subj_3-1_and_3-3,"[[2, 132], [1922, 2220], [4906, 5203], [6696, ...","[[2, 132], [387, 415], [463, 546], [548, 571],...",[],20240317_151922_long_comp_subj_3-1_and_3-3.2.v...,/scratch/back_up/reward_competition_extention/...,20240317_151922_long_comp_subj_3-1_t6b6_merged,3.1,"[3.1, 3.3]",2415571,48236575,"[-2, -2, 1384, 1384, 2770, 2770, 4156, 4156, 5...","[[0, 87540], [1287955, 1487958], [3287980, 348...","[[0, 87540], [259142, 277740], [309743, 364544...",[]
2,20240317_151922_long_comp_subj_3-1_and_3-3,"[[1, 88], [1286, 1486], [3282, 3482], [4480, 4...","[[1, 88], [259, 278], [310, 365], [366, 382], ...",[],20240317_151922_long_comp_subj_3-1_and_3-3.1.v...,/scratch/back_up/reward_competition_extention/...,20240317_151922_long_comp_subj_3-3_t5b5_merged,3.3,"[3.1, 3.3]",2415571,48236575,"[-2, 1384, 1384, 2770, 4156, 5542, 5542, 6928,...","[[0, 87540], [1287955, 1487958], [3287980, 348...","[[0, 87540], [259142, 277740], [309743, 364544...",[]
3,20240317_151922_long_comp_subj_3-1_and_3-3,"[[2, 132], [1922, 2220], [4906, 5203], [6696, ...","[[2, 132], [387, 415], [463, 546], [548, 571],...",[],20240317_151922_long_comp_subj_3-1_and_3-3.2.v...,/scratch/back_up/reward_competition_extention/...,20240317_151922_long_comp_subj_3-3_t5b5_merged,3.3,"[3.1, 3.3]",2415571,48236575,"[-2, -2, 1384, 1384, 2770, 2770, 4156, 4156, 5...","[[0, 87540], [1287955, 1487958], [3287980, 348...","[[0, 87540], [259142, 277740], [309743, 364544...",[]
4,20240317_172017_long_comp_subj_4-2_and_4-3,"[[1, 102], [1300, 1501], [3297, 3497], [4494, ...","[[1, 102], [115, 153], [191, 245], [261, 263],...",[],20240317_172017_long_comp_subj_4-2_and_4-3.1.v...,/scratch/back_up/reward_competition_extention/...,20240317_172017_long_comp_subj_4-2_t6b6_merged,4.2,"[4.2, 4.3]",2020598,61007363,"[-2, 1384, 2770, 2770, 4156, 5542, 5542, 6928,...","[[0, 102220], [1302635, 1502636], [3302656, 35...","[[0, 102220], [114617, 153218], [191021, 24402...",[]


In [89]:
trodes_final_df["session_dir"].unique()

array(['20240317_151922_long_comp_subj_3-1_and_3-3',
       '20240317_172017_long_comp_subj_4-2_and_4-3',
       '20240318_143819_long_comp_subj_3-3_and_3-4',
       '20240318_170933_long_comp_subj_4-3_and_4-4',
       '20240319_134914_long_comp_subj_3-1_and_3-4',
       '20240319_160457_long_comp_subj_4-2_and_4-4',
       '20240320_114629_long_comp_subj_5-3_and_5-4',
       '20240321_114851_long_comp_subj_5-2_and_5-3'], dtype=object)

In [90]:
trodes_final_df["video_name"].unique()

array(['20240317_151922_long_comp_subj_3-1_and_3-3.1.videoTimeStamps.cameraHWSync',
       '20240317_151922_long_comp_subj_3-1_and_3-3.2.videoTimeStamps.cameraHWSync',
       '20240317_172017_long_comp_subj_4-2_and_4-3.1.videoTimeStamps.cameraHWSync',
       '20240317_172017_long_comp_subj_4-2_and_4-3.2.videoTimeStamps.cameraHWSync',
       '20240318_143819_long_comp_subj_3-3_and_3-4.1.videoTimeStamps.cameraHWSync',
       '20240318_170933_long_comp_subj_4-3_and_4-4.1.videoTimeStamps.cameraHWSync',
       '20240319_134914_long_comp_subj_3-1_and_3-4.1.videoTimeStamps.cameraHWSync',
       '20240319_160457_long_comp_subj_4-2_and_4-4.1.videoTimeStamps.cameraHWSync',
       '20240320_114629_long_comp_subj_5-3_and_5-4.1.videoTimeStamps.cameraHWSync',
       '20240321_114851_long_comp_subj_5-2_and_5-3.1.videoTimeStamps.cameraHWSync'],
      dtype=object)