# Time Stamp Extract

This notebook reads the files that Trodes exports from the SpikeGadgets recordings and collects the raw, DIO, and video timestamps for every session. These timestamps are then used to get the times at which the tones played.

In [1]:
# Imports of all used packages and libraries
import sys
import os
import git
import glob
from collections import defaultdict

In [2]:
git_repo = git.Repo(".", search_parent_directories=True)
git_root = git_repo.git.rev_parse("--show-toplevel")

In [3]:
git_root

'/nancy/user/riwata/projects/reward_comp_ext'

In [4]:
sys.path.insert(0, os.path.join(git_root, 'src'))

In [5]:
# Numerical, tabular, and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as sp

In [7]:
import utilities.helper
import trodes.read_exported

# Functions

In [8]:
import re

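def extract_floats(s):
    """
    Extracts all numbers from a string and returns them as float-formatted strings.

    Integers are normalized to their float form (e.g. "3" becomes "3.0").

    Parameters:
    - s (str): The string to extract numbers from.

    Returns:
    - list: A list of strings, each the float representation of a number found in the input string.
    """
    # The alternation is grouped so that the optional sign applies to integers as well as decimals
    float_pattern = r"[-+]?(?:\d*\.\d+|\d+)"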
    return [str(float(num)) for num in re.findall(float_pattern, s)]
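
A quick check of what `extract_floats` returns may help when reading the later cells. The snippet below is a minimal sketch: the helper is restated (with the sign grouped over both alternatives) only so that the example runs on its own, and the input strings are illustrative.

```python
import re

def extract_floats(s):
    # Restated so the snippet is self-contained; the sign is grouped
    # so that it applies to integers as well as decimals.
    float_pattern = r"[-+]?(?:\d*\.\d+|\d+)"
    return [str(float(num)) for num in re.findall(float_pattern, s)]

subject_ids = extract_floats("3.1_and_3.4_and_4.2_and_4.4")
print(subject_ids)  # ['3.1', '3.4', '4.2', '4.4']

# Integers are normalized to their float form
numbers_in_tag = extract_floats("t3b3")
print(numbers_in_tag)  # ['3.0', '3.0']

# Caveat: trailing zeros are lost in the float round trip
trailing_zero = extract_floats("3.10")
print(trailing_zero)  # ['3.1']
```

The last case is worth keeping in mind: an ID written as `3-10` would be indistinguishable from `3-1` after this conversion.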

## Inputs & Data

- Explanation of each input and where it comes from.

Inputs and required data loading
- Input variable names are in all-caps snake case
- Whenever an input is changed or used for processing, the resulting variables are in lowercase snake case

In [9]:
# Path of the directory that contains the Spike Gadgets recording and the exported timestamp files
# Exported with this tool https://docs.spikegadgets.com/en/latest/basic/ExportFunctions.html
# Export these files:
    # -raw – Continuous raw band export.
    # -dio – Digital IO channel state change export.
    # -analogio – Continuous analog IO export.
INPUT_DIR = "/scratch/back_up/reward_competition_extention/data/rce_cohort_3"
OUTPUT_DIR = r"./proc" # where data is saved should always be shown in the inputs
TONE_DIN = "dio_ECU_Din1"
TONE_STATE = 1
os.makedirs(OUTPUT_DIR, exist_ok=True)
OUTPUT_PREFIX = "rce_pilot_3_novel"

In [10]:
COLS_TO_KEEP = ['session_dir', 'recording', 'metadata_dir', 'metadata_file',
'original_file', 'filename', 'session_path', 'all_subjects',
       'current_subject', 'event_timestamps', 'video_name',
       'video_timestamps', 'event_frames', 'first_item_data']

In [11]:
RAW_COLS_TO_KEEP = ['session_dir',
 'recording',
 'original_file',
 'session_path',
 'current_subject',
 'first_item_data',
 'first_timestamp',
 'all_subjects']

In [12]:
STATE_COLS_TO_KEEP = ['session_dir',
 'metadata_file',
 'event_timestamps',
 'video_name',
 'video_timestamps',
 'event_frames',]

In [13]:
same_columns = ['session_dir', 'video_name']
different_columns = ['metadata_file', 'event_frames', 'event_timestamps']

In [14]:
# TODO: Find way not to hard code this
# ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/standard/2023_06_*/*.rec")
ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/rce_cohort_3/novel/*.rec")



In [15]:
ALL_SESSION_DIR

['/scratch/back_up/reward_competition_extention/data/rce_cohort_3/novel/20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/novel/20240412_161135_comp_novel_subj_4-2_and_4-4_and_5-2_and_5-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/novel/20240406_150017_comp_novel_subj_4-3_and_4-4_and_5-2_and_5-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/novel/20240408_155308_comp_novel_subj_3-1_and_3-4_and_5-2_and_5-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/novel/20240401_151442_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/novel/20240402_145808_comp_novel_subj_3-3_and_3-4_and_5-2_and_5-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/novel/20240407_152222_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-3.rec',
 '/scratch/back_up/reward_competit
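
One possible way to address the hard-coding TODO above is to build the glob pattern from `INPUT_DIR`. This is only a sketch: `find_session_dirs` and its `sub_dir` and `pattern` parameters are hypothetical names, and it assumes the `.rec` session directories sit one level below the input directory, as in the paths listed above. The demonstration uses a throwaway directory tree with made-up names.

```python
import glob
import os
import tempfile

def find_session_dirs(input_dir, sub_dir="novel", pattern="*.rec"):
    """Returns the sorted paths under input_dir/sub_dir that match pattern."""
    return sorted(glob.glob(os.path.join(input_dir, sub_dir, pattern)))

# Demonstration on a temporary directory tree (names are made up)
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ["b_session.rec", "a_session.rec", "notes.txt"]:
        os.makedirs(os.path.join(tmp_dir, "novel", name))
    found = [os.path.basename(p) for p in find_session_dirs(tmp_dir)]

print(found)  # ['a_session.rec', 'b_session.rec']
```

Sorting the result also makes the session order reproducible, since `glob.glob` does not guarantee any particular order.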

## Outputs

Describe each output that the notebook creates. 

- Is it a plot or is it data?

- How valuable is the output and why is it valuable or useful?

## Other documentation

raw directory
- raw_group0.dat
    - voltage_value: Array with voltage measurement for each channel at each timestamp
- timestamps.dat
    - voltage_time_stamp: The time stamp of each voltage measurement

parent directory
- 1.videoTimeStamps.cameraHWSync
    - frame_number: Calculated by getting the index of each video time stamp tuple 
    - PosTimestamp: The time stamp of each video frame
    - HWframeCount: Unknown value. Starts at 30742 and increases by 1 for each tuple  
    - HWTimestamp: Unknown value. All zeroes
    - video_time: Calculated by dividing the frame number by the fps (frames per second)
    - video_seconds: video_time, but rounded to seconds
    - These are filled-in versions of the above columns, where each empty cell takes the value from the most recent previous cell
        - filled_PosTimestamp 	
        - filledHWframeCount 	
        - filled_frame_number 	
        - filled_video_time 	
        - filled_video_seconds 	
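
The derived video columns described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the notebook's own code: `FPS` is a hypothetical frame rate, the timestamps are made up, and the "filled" columns are assumed to come from aligning the frames to a denser time axis and forward filling.

```python
import pandas as pd

FPS = 15  # hypothetical frame rate; use the camera's real value

# Made-up video frame timestamps (in samples), one row per frame
video_df = pd.DataFrame({"PosTimestamp": [100, 103, 106]})
video_df["frame_number"] = video_df.index
video_df["video_time"] = video_df["frame_number"] / FPS

# A denser time axis, such as the voltage timestamps
time_df = pd.DataFrame({"time": list(range(100, 108))})
merged_df = time_df.merge(video_df, how="left", left_on="time", right_on="PosTimestamp")

# Forward fill: each empty cell takes the most recent previous value
for col in ["PosTimestamp", "frame_number", "video_time"]:
    merged_df["filled_" + col] = merged_df[col].ffill()

print(merged_df[["time", "frame_number", "filled_frame_number"]])
```

After the fill, every row of the dense time axis carries the number of the most recent video frame, which is what makes it possible to look up a frame for any timestamp.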

DIO directory
- dio_ECU_Din1.dat
    - time: The time stamp that corresponds to the DIN input
    - state: Binary state of whether there is input from DIN or not
    - trial_number: Calculated by adding 1 every time there is a DIN input
    - These are filled-in versions of the above columns, where each empty cell takes the value from the most recent previous cell
        - filled_state 	
        - filled_trial_number
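
As a rough sketch of the `trial_number` column described above, a running count over the state changes is enough. The values below are made up, and counting only the rows where the state is 1 is one reading of "every time there is a DIN input", not a confirmed detail of the original processing.

```python
import pandas as pd

# Made-up state changes of one digital input: (time in samples, state)
dio_df = pd.DataFrame({
    "time": [1000, 1200, 3000, 3200, 5000],
    "state": [1, 0, 1, 0, 1],
})

# Count the rows where the input switches on
dio_df["trial_number"] = (dio_df["state"] == 1).cumsum()

print(dio_df["trial_number"].tolist())  # [1, 1, 2, 2, 3]
```

With this definition, the row where the input switches off keeps the number of the trial that just ended.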

ss_output directory (Spike sorting with SpikeInterface)
- firings.npz
    - unit_id: All the units that had a spike train for the given timestamp 	
    - number_of_units: Calculated by counting the number of units that had a spike train

## Functions

- function names are short and in snake case all lowercase
- a function name should be unique but does not have to describe the function
- doc strings describe functions not function names

## Processing

Describe what is done to the data here and how inputs are manipulated to generate outputs. 

In [16]:
# As much code and as many cells as required
# includes EDA and playing with data
# GO HAM!

# LOOP 1: Extracting all the Trodes

- Getting all the data from all the exported Trodes files and saving it to `session_to_trodes_data`
    - Creates a dictionary with the structure of:
        - `{dir_name: {file_name: metadata, file_name_2: metadata_2}, dir_name_2: {file_name_3: metadata_3, file_name_4: metadata_4}}`
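
The nested structure described above can be pictured with a self-referencing `defaultdict`. The function below is a hypothetical stand-in for `utilities.helper.create_recursive_dict`, inferred only from how the dictionary prints later in this notebook; the real helper may differ. The keys and values are illustrative.

```python
from collections import defaultdict

def create_recursive_dict():
    """Hypothetical stand-in for utilities.helper.create_recursive_dict."""
    return defaultdict(create_recursive_dict)

session_to_data = create_recursive_dict()

# Intermediate levels are created on first access
session_to_data["session_1"]["recording_a"]["DIO"]["dio_ECU_Din1"] = {"clockrate": "20000"}
session_to_data["session_1"]["recording_a"]["raw"]["timestamps"] = {"clockrate": "20000"}

metadata_dirs = sorted(session_to_data["session_1"]["recording_a"].keys())
print(metadata_dirs)  # ['DIO', 'raw']
```

One side effect of this design is that merely reading a missing key creates an empty entry, so membership should be checked with `in` rather than by indexing.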

In [17]:
# Saving the trodes data for each session
# Each key is a session name
# Each value is a dictionary of every recording file in that session
session_to_trodes_data = utilities.helper.create_recursive_dict()


# Saving the path of the session recording
session_to_path = {}

# Going through each session recording
# Which includes all the recordings from all the miniloggers and cameras
for session_path in ALL_SESSION_DIR:   
    try:
        # Getting the name of the session from the path
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Reading the trodes data for every recording file in the session directory
        session_to_trodes_data[session_basename] = trodes.read_exported.organize_all_trodes_export(session_path)
        
        session_to_path[session_basename] = session_path
    except Exception as e: 
        print(e)


Current Session: 20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4


  return np.dtype(dtype_spec)


Skipping file 20240411_155157_comp_novel_subj_3-1_t3b3_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240411_155157_comp_novel_subj_4-2_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240411_155157_comp_novel_subj_3-4_t4b4_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240412_161135_comp_novel_subj_4-2_and_4-4_and_5-2_and_5-3
Skipping file 20240412_161135_comp_novel_subj_5-3_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240412_161135_comp_novel_subj_4-4_t4b4_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240412_161135_comp_novel_subj_5-2_t5b5_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240412_161135_comp_novel_subj_4-2_t3b3_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240406_150017_comp_nov

In [18]:
session_to_trodes_data

defaultdict(<function utilities.helper.create_recursive_dict()>,
            {'20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4': defaultdict(dict,
                         {'20240411_155157_comp_novel_subj_3-1_t3b3_merged': {'time': {'timestamps': {'description': 'Timestamps',
                             'byte_order': 'little endian',
                             'original_file': '20240411_155157_comp_novel_subj_3-1_t3b3_merged.rec',
                             'clockrate': '20000',
                             'trodes_version': '2.3.4',
                             'compile_date': 'Nov 28 2022',
                             'compile_time': '15:10:45',
                             'qt_version': '6.2.2',
                             'commit_tag': 'heads/Release_2.3.4-0-gd5a58cd9-dirty',
                             'controller_firmware': '3.17',
                             'headstage_firmware': '2.2',
                             'controller_serialnum': '00104 00176',
   

- Adding the video timestamps

In [19]:
for session_path in ALL_SESSION_DIR:   
    try:
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        file_to_video_timestamps = {}
        for video_timestamps in glob.glob(os.path.join(session_path, "*cameraHWSync")):
            video_basename = os.path.basename(video_timestamps)
            print("Current Video Name: {}".format(video_basename))
            timestamp_array = trodes.read_exported.read_trodes_extracted_data_file(video_timestamps)
            if "video_timestamps" not in session_to_trodes_data[session_basename][session_basename]:
                session_to_trodes_data[session_basename][session_basename]["video_timestamps"] = defaultdict(dict)
            session_to_trodes_data[session_basename][session_basename]["video_timestamps"][video_basename.split(".")[-3]] = timestamp_array
    
    
    except Exception as e: 
        print(e)

Current Session: 20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4
Current Video Name: 20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4.2.videoTimeStamps.cameraHWSync
Current Video Name: 20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4.1.videoTimeStamps.cameraHWSync
Current Session: 20240412_161135_comp_novel_subj_4-2_and_4-4_and_5-2_and_5-3
Current Video Name: 20240412_161135_comp_novel_subj_4-2_and_4-4_and_5-2_and_5-3.2.videoTimeStamps.cameraHWSync
Current Video Name: 20240412_161135_comp_novel_subj_4-2_and_4-4_and_5-2_and_5-3.1.videoTimeStamps.cameraHWSync
Current Session: 20240406_150017_comp_novel_subj_4-3_and_4-4_and_5-2_and_5-3
Current Video Name: 20240406_150017_comp_novel_subj_4-3_and_4-4_and_5-2_and_5-3.2.videoTimeStamps.cameraHWSync
Current Video Name: 20240406_150017_comp_novel_subj_4-3_and_4-4_and_5-2_and_5-3.1.videoTimeStamps.cameraHWSync
Current Session: 20240408_155308_comp_novel_subj_3-1_and_3-4_and_5-2_and_5-3
Current Video Name: 202404
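
The key used for each camera in the loop above comes from `video_basename.split(".")[-3]`. A small worked example on one of the file names printed above shows why the third item from the end is the camera number.

```python
video_basename = (
    "20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4"
    ".1.videoTimeStamps.cameraHWSync"
)

# Splitting on "." gives [..., "1", "videoTimeStamps", "cameraHWSync"],
# so the third item from the end is the camera number
parts = video_basename.split(".")
camera_key = parts[-3]

print(parts[-2:])   # ['videoTimeStamps', 'cameraHWSync']
print(camera_key)   # 1
```

Because the index is counted from the end, the key stays correct even if the session name itself were to contain a period.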

In [20]:
session_to_trodes_data[session_basename][session_basename]["video_timestamps"]

defaultdict(dict,
            {'1': {'clock rate': '20000',
              'camera_name': 'HD USB Camera (\\\\?\\usb#vid_32e4&pid_9230&mi_00#6&bec0719&2&0000#{e5323777-f976-4f5b-9b55-b94699c46e44}\\global)',
              'fields': '<PosTimestamp uint32><HWframeCount uint32><HWTimestamp uint64>',
              'data': array([( 2992091, 0, 0), ( 2993477, 0, 0), ( 2994863, 0, 0), ...,
                     (67254863, 0, 0), (67256249, 0, 0), (67256249, 0, 0)],
                    dtype=[('PosTimestamp', '<u4'), ('HWframeCount', '<u4'), ('HWTimestamp', '<u8')]),
              'filename': '20240409_142051_comp_novel_subj_3-3_and_3-4_and_4-3_and_4-4.1.videoTimeStamps.cameraHWSync'},
             '2': {'clock rate': '20000',
              'camera_name': 'HD USB Camera (\\\\?\\usb#vid_32e4&pid_9230&mi_00#6&1ea5b7be&1&0000#{e5323777-f976-4f5b-9b55-b94699c46e44}\\global)',
              'fields': '<PosTimestamp uint32><HWframeCount uint32><HWTimestamp uint64>',
              'data': array([( 2992

- Creating a dataframe from the dictionary with a column for:
  - Session directory
  - Recording name
  - Metadata directory
  - Metadata file
  - And a column for each metadata field

In [21]:
# Creating a dataframe from the nested dictionary
trodes_metadata_df = pd.DataFrame.from_dict({(i,j,k,l): session_to_trodes_data[i][j][k][l] 
                           for i in session_to_trodes_data.keys() 
                           for j in session_to_trodes_data[i].keys()
                           for k in session_to_trodes_data[i][j].keys()
                           for l in session_to_trodes_data[i][j][k].keys()},
                           orient='index')

# Resetting the index and renaming the columns
trodes_metadata_df = trodes_metadata_df.reset_index()
trodes_metadata_df = trodes_metadata_df.rename(columns={'level_0': 'session_dir', 'level_1': 'recording', 'level_2': 'metadata_dir', 'level_3': 'metadata_file'}, errors="ignore")

# Adding the session path to the dataframe
trodes_metadata_df["session_path"] = trodes_metadata_df["session_dir"].map(session_to_path)

In [22]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,fields,data,filename,direction,id,display_order,clock rate,camera_name,session_path
0,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,time,timestamps,Timestamps,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,20.0,<time uint32><systime int64>,"[[5287093, 1712865163592530500], [5287094, 171...",20240411_155157_comp_novel_subj_3-1_t3b3_merge...,,,,,,/scratch/back_up/reward_competition_extention/...
1,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,raw,timestamps,Raw timestamps,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,,<time uint32>,"[[5287093], [5287094], [5287095], [5287096], [...",20240411_155157_comp_novel_subj_3-1_t3b3_merge...,,,,,,/scratch/back_up/reward_competition_extention/...
2,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,raw,coordinates,Pad locations in microns,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,,<ml int32><dv int32><ap int32>,"[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [...",20240411_155157_comp_novel_subj_3-1_t3b3_merge...,,,,,,/scratch/back_up/reward_competition_extention/...
3,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,,<time uint32><state uint8>,"[[5287093, 0]]",20240411_155157_comp_novel_subj_3-1_t3b3_merge...,output,ECU_Dout4,5.0,,,/scratch/back_up/reward_competition_extention/...
4,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,,<time uint32><state uint8>,"[[5287093, 1], [5443830, 0], [6650246, 1], [66...",20240411_155157_comp_novel_subj_3-1_t3b3_merge...,input,ECU_Din2,6.0,,,/scratch/back_up/reward_competition_extention/...


In [23]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,fields,data,filename,direction,id,display_order,clock rate,camera_name,session_path
433,20240410_152009_comp_novel_subj_4-2_and_4-3_an...,20240410_152009_comp_novel_subj_4-2_and_4-3_an...,video_timestamps,1,,,,,,,...,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[3119591, 0, 0], [3120977, 0, 0], [3122363, 0...",20240410_152009_comp_novel_subj_4-2_and_4-3_an...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
434,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,video_timestamps,2,,,,,,,...,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[5123558, 0, 0], [5124944, 0, 0], [5124944, 0...",20240405_162313_comp_novel_subj_3-1_and_3-4_an...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
435,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,video_timestamps,1,,,,,,,...,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[5123558, 0, 0], [5124944, 0, 0], [5124944, 0...",20240405_162313_comp_novel_subj_3-1_and_3-4_an...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
436,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,video_timestamps,1,,,,,,,...,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[2992091, 0, 0], [2993477, 0, 0], [2994863, 0...",20240409_142051_comp_novel_subj_3-3_and_3-4_an...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
437,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,video_timestamps,2,,,,,,,...,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[2992091, 0, 0], [2993477, 0, 0], [2993477, 0...",20240409_142051_comp_novel_subj_3-3_and_3-4_an...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...


- Getting the first item from each tuple in the arrays in the `data` column
  - This first item is usually just the timestamp
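
The `data` arrays are NumPy structured arrays, so a whole field can be pulled out by name instead of looping over the tuples. The sketch below uses a small made-up array with the same field layout as the DIO files (`time` and `state`).

```python
import numpy as np

# Made-up DIO records with the same field layout as the exported files
dio = np.array(
    [(5287093, 1), (5443830, 0), (6644245, 1)],
    dtype=[("time", "<u4"), ("state", "u1")],
)

first_name = dio.dtype.names[0]   # name of the first field
last_name = dio.dtype.names[-1]   # name of the last field

first_values = dio[first_name]    # all values of the first field
last_values = dio[last_name]      # all values of the last field

print(first_name, first_values.tolist())  # time [5287093, 5443830, 6644245]
print(last_name, last_values.tolist())    # state [1, 0, 1]
```

Indexing by field name returns a view on the original array, so no data is copied.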

In [24]:
trodes_metadata_df["data"].iloc[0]

array([( 5287093, 1712865163592530500), ( 5287094, 1712865163592532200),
       ( 5287095, 1712865163592533700), ...,
       (69116083, 1712868354902757700), (69116084, 1712868354902759200),
       (69116085, 1712868354902760700)],
      dtype=[('time', '<u4'), ('systime', '<i8')])

In [25]:
# Getting the name of the first field of each structured numpy array
trodes_metadata_df["first_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[0])
# Getting all the values of that first field (usually the timestamps)
trodes_metadata_df["first_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[0]])


In [26]:
# Same as above but for the last field of each structured array
trodes_metadata_df["last_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[-1])
trodes_metadata_df["last_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[-1]])

In [27]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,direction,id,display_order,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
0,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,time,timestamps,Timestamps,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,,,,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...",systime,"[1712865163592530500, 1712865163592532200, 171..."
1,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,raw,timestamps,Raw timestamps,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,,,,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...",time,"[5287093, 5287094, 5287095, 5287096, 5287097, ..."
2,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,raw,coordinates,Pad locations in microns,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,,,,,,/scratch/back_up/reward_competition_extention/...,ml,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",ap,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
3,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,output,ECU_Dout4,5.0,,,/scratch/back_up/reward_competition_extention/...,time,[5287093],state,[0]
4,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,input,ECU_Din2,6.0,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5443830, 6650246, 6662443, 6769647, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ..."


In [28]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,direction,id,display_order,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
433,20240410_152009_comp_novel_subj_4-2_and_4-3_an...,20240410_152009_comp_novel_subj_4-2_and_4-3_an...,video_timestamps,1,,,,,,,...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3119591, 3120977, 3122363, 3123749, 3123749, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
434,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,video_timestamps,2,,,,,,,...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[5123558, 5124944, 5124944, 5126330, 5127716, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
435,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,video_timestamps,1,,,,,,,...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[5123558, 5124944, 5124944, 5126330, 5127716, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
436,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,video_timestamps,1,,,,,,,...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2992091, 2993477, 2994863, 2994863, 2996249, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
437,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,video_timestamps,2,,,,,,,...,,,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2992091, 2993477, 2993477, 2994863, 2996249, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [29]:
trodes_metadata_df["recording"].unique()

array(['20240411_155157_comp_novel_subj_3-1_t3b3_merged',
       '20240411_155157_comp_novel_subj_3-4_t4b4_merged',
       '20240411_155157_comp_novel_subj_4-2_t6b6_merged',
       '20240412_161135_comp_novel_subj_5-3_t6b6_merged',
       '20240412_161135_comp_novel_subj_5-2_t5b5_merged',
       '20240412_161135_comp_novel_subj_4-2_t3b3_merged',
       '20240412_161135_comp_novel_subj_4-4_t4b4_merged',
       '20240406_150017_comp_novel_subj_4-4_t6b6_merged',
       '20240406_150017_comp_novel_subj_5-3_t3b3_merged',
       '20240406_150017_comp_novel_subj_5-2_t4b4_merged',
       '20240406_150017_comp_novel_subj_4-3_t5b5_merged',
       '20240408_155308_comp_novel_subj_3-1_t5b5_merged',
       '20240408_155308_comp_novel_subj_5-2_t4b4_merged',
       '20240408_155308_comp_novel_subj_3-4_t6b6_merged',
       '20240408_155308_comp_novel_subj_5-3_t3b3_merged',
       '20240401_151442_comp_novel_subj_4-4_t4b4_merged',
       '20240401_151442_comp_novel_subj_3-3_t6b6_merged',
       '202404

## Getting the subject information from the metadata

In [30]:
def split_by_multiple_delimiters(s, delimiters):
    """
    Splits a string by multiple delimiters.

    Parameters:
    - s (str): The string to split.
    - delimiters (list): A list of delimiters to split the string by.

    Returns:
    - list: A list of substrings.
    """
    return re.split('|'.join(map(re.escape, delimiters)), s)


In [31]:
# Taking the part of the session name after "subj" and turning IDs such as "3-1" into "3.1"
trodes_metadata_df["all_subjects"] = trodes_metadata_df["session_dir"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", "."))
# Extracting every subject ID as a string and sorting them
trodes_metadata_df["all_subjects"] = trodes_metadata_df["all_subjects"].apply(lambda x: sorted(extract_floats(x)))

In [32]:
trodes_metadata_df["session_dir"].iloc[0]

'20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4'

In [33]:
trodes_metadata_df["all_subjects"].apply(lambda x: tuple(x)).unique()

array([('3.1', '3.4', '4.2', '4.4'), ('4.2', '4.4', '5.2', '5.3'),
       ('4.3', '4.4', '5.2', '5.3'), ('3.1', '3.4', '5.2', '5.3'),
       ('3.1', '3.3', '4.2', '4.4'), ('3.3', '3.4', '5.2', '5.3'),
       ('3.1', '3.3', '4.2', '4.3'), ('4.2', '4.3', '5.2', '5.3'),
       ('3.1', '3.4', '4.2', '4.3'), ('3.3', '3.4', '4.3', '4.4')],
      dtype=object)

In [34]:
# The first number found after "subj" in the recording name is the current subject's ID
trodes_metadata_df["current_subject"] = trodes_metadata_df["recording"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", ".").replace("_", "."))
trodes_metadata_df["current_subject"] = trodes_metadata_df["current_subject"].apply(lambda x: extract_floats(x)[0])


In [35]:
trodes_metadata_df["current_subject"].unique()

array(['3.1', '3.4', '4.2', '5.3', '5.2', '4.4', '4.3', '3.3'],
      dtype=object)
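
The subject parsing above goes through a float conversion, which would turn an ID such as `3-10` into `3.1`. The sketch below is an alternative, not the notebook's method: it matches the `digits-digits` pairs directly with a regular expression. `SUBJECT_PATTERN` and `parse_subjects` are hypothetical names, and it assumes every subject ID has that form.

```python
import re

# Hypothetical pattern: a subject ID is two groups of digits joined by "-"
SUBJECT_PATTERN = re.compile(r"(\d+)-(\d+)")

def parse_subjects(name):
    """Returns the sorted subject IDs found after 'subj' in a file name."""
    tail = name.split("subj")[-1]
    return sorted(f"{a}.{b}" for a, b in SUBJECT_PATTERN.findall(tail))

session_dir = "20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4"
recording = "20240411_155157_comp_novel_subj_3-1_t3b3_merged"

all_subjects = parse_subjects(session_dir)
current_subject = parse_subjects(recording)[0]

print(all_subjects)     # ['3.1', '3.4', '4.2', '4.4']
print(current_subject)  # 3.1
```

For the names in this cohort both approaches give the same IDs; the difference only matters if an ID ever ends in a zero.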

## Dropping all the rows with unneeded metadata

In [36]:
trodes_metadata_df["metadata_dir"].unique()

array(['time', 'raw', 'DIO', 'video_timestamps'], dtype=object)

In [37]:
METADATA_TO_KEEP = ['raw', 'DIO', 'video_timestamps']

In [38]:
trodes_metadata_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(METADATA_TO_KEEP)]

In [39]:
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("out")]
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("coordinates")]


In [40]:
trodes_metadata_df = trodes_metadata_df.reset_index(drop=True)

# Getting the first time stamp of each recording

In [41]:
trodes_raw_df = trodes_metadata_df[(trodes_metadata_df["metadata_dir"] == "raw") & (trodes_metadata_df["metadata_file"] == "timestamps")].copy()


In [42]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,display_order,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,raw,timestamps,Raw timestamps,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...",time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...","[3.1, 3.4, 4.2, 4.4]",3.1
9,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-4_t4b4_merged,raw,timestamps,Raw timestamps,little endian,20240411_155157_comp_novel_subj_3-4_t4b4_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...",time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...","[3.1, 3.4, 4.2, 4.4]",3.4
14,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_4-2_t6b6_merged,raw,timestamps,Raw timestamps,little endian,20240411_155157_comp_novel_subj_4-2_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...",time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...","[3.1, 3.4, 4.2, 4.4]",4.2
15,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,20240412_161135_comp_novel_subj_5-3_t6b6_merged,raw,timestamps,Raw timestamps,little endian,20240412_161135_comp_novel_subj_5-3_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[4542881, 4542882, 4542883, 4542884, 4542885, ...",time,"[4542881, 4542882, 4542883, 4542884, 4542885, ...","[4.2, 4.4, 5.2, 5.3]",5.3
20,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,20240412_161135_comp_novel_subj_5-2_t5b5_merged,raw,timestamps,Raw timestamps,little endian,20240412_161135_comp_novel_subj_5-2_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[4542881, 4542882, 4542883, 4542884, 4542885, ...",time,"[4542881, 4542882, 4542883, 4542884, 4542885, ...","[4.2, 4.4, 5.2, 5.3]",5.2


In [43]:
trodes_raw_df["first_timestamp"] = trodes_raw_df["first_item_data"].apply(lambda x: x[0])

In [44]:
trodes_raw_df["recording"].iloc[0]

'20240411_155157_comp_novel_subj_3-1_t3b3_merged'

In [45]:
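# Keyed by session_dir (not by recording); if a session has several recordings, to_dict() keeps the value of the last one
recording_to_first_timestamp = trodes_raw_df.set_index('session_dir')['first_timestamp'].to_dict()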

In [46]:
recording_to_first_timestamp

{'20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4': 5287093,
 '20240412_161135_comp_novel_subj_4-2_and_4-4_and_5-2_and_5-3': 4542881,
 '20240406_150017_comp_novel_subj_4-3_and_4-4_and_5-2_and_5-3': 2684430,
 '20240408_155308_comp_novel_subj_3-1_and_3-4_and_5-2_and_5-3': 3500707,
 '20240401_151442_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-4': 2932501,
 '20240402_145808_comp_novel_subj_3-3_and_3-4_and_5-2_and_5-3': 4558126,
 '20240407_152222_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-3': 4030109,
 '20240410_152009_comp_novel_subj_4-2_and_4-3_and_5-2_and_5-3': 3119593,
 '20240405_162313_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-3': 5123560,
 '20240409_142051_comp_novel_subj_3-3_and_3-4_and_4-3_and_4-4': 2992093}

In [47]:
trodes_metadata_df["first_timestamp"] = trodes_metadata_df["session_dir"].map(recording_to_first_timestamp)

In [48]:
trodes_metadata_df["first_timestamp"]

0      5287093
1      5287093
2      5287093
3      5287093
4      5287093
        ...   
205    3119593
206    5123560
207    5123560
208    2992093
209    2992093
Name: first_timestamp, Length: 210, dtype: int64
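
Since several recordings share one session, the session-to-timestamp lookup can also be built with an explicit aggregation, so that it does not depend on row order. The sketch below uses made-up data, and taking the minimum per session is an assumption about which value is wanted; `session_to_first_timestamp` is a hypothetical name.

```python
import pandas as pd

# Made-up raw timestamp rows: two recordings in session_1, one in session_2
raw_df = pd.DataFrame({
    "session_dir": ["session_1", "session_1", "session_2"],
    "recording": ["session_1_a", "session_1_b", "session_2_a"],
    "first_item_data": [[100, 101, 102], [100, 101, 102], [250, 251, 252]],
})
raw_df["first_timestamp"] = raw_df["first_item_data"].apply(lambda x: x[0])

# One value per session, chosen explicitly instead of "whichever row came last"
session_to_first_timestamp = (
    raw_df.groupby("session_dir")["first_timestamp"].min().to_dict()
)

# Mapping the lookup onto another table that has a session_dir column
events_df = pd.DataFrame({"session_dir": ["session_2", "session_1"]})
events_df["first_timestamp"] = events_df["session_dir"].map(session_to_first_timestamp)

print(session_to_first_timestamp)  # {'session_1': 100, 'session_2': 250}
```

If the recordings of a session ever start at different timestamps, the aggregation makes that choice visible in the code instead of leaving it to the order of the rows.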

# Getting the event timestamps

In [49]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,display_order,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,raw,timestamps,Raw timestamps,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...",time,"[5287093, 5287094, 5287095, 5287096, 5287097, ...","[3.1, 3.4, 4.2, 4.4]",3.1
1,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,6.0,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5443830, 6650246, 6662443, 6769647, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1
2,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,8.0,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5443830, 5444430, 5458830, 5471830, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1
3,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,7.0,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5443830, 6644245, 6844248, 8644271, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1
4,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,9.0,,,/scratch/back_up/reward_competition_extention/...,time,[5287093],state,[0],"[3.1, 3.4, 4.2, 4.4]",3.1


In [50]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,display_order,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
205,20240410_152009_comp_novel_subj_4-2_and_4-3_an...,20240410_152009_comp_novel_subj_4-2_and_4-3_an...,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3119591, 3120977, 3122363, 3123749, 3123749, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.3, 5.2, 5.3]",4.2
206,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[5123558, 5124944, 5124944, 5126330, 5127716, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.4, 4.2, 4.3]",3.1
207,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[5123558, 5124944, 5124944, 5126330, 5127716, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.4, 4.2, 4.3]",3.1
208,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2992091, 2993477, 2994863, 2994863, 2996249, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.3, 3.4, 4.3, 4.4]",3.3
209,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2992091, 2993477, 2993477, 2994863, 2996249, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.3, 3.4, 4.3, 4.4]",3.3


In [51]:
# trodes_state_df = trodes_metadata_df[trodes_metadata_df["last_dtype_name"] == "state"].copy()

# Filtering for digital IO channels
trodes_state_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(["DIO"])].copy()
# Filtering the DIO rows for tone and port entry related channels
trodes_state_df = trodes_state_df[trodes_state_df["id"].isin(["ECU_Din1", "ECU_Din2", "ECU_Din3"])].copy()


In [52]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,display_order,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
1,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,6,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5443830, 6650246, 6662443, 6769647, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1
2,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,8,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5443830, 5444430, 5458830, 5471830, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1
3,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,7,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5443830, 6644245, 6844248, 8644271, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1
5,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-4_t4b4_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-4_t4b4_merge...,20000,2.3.4,Nov 28 2022,...,8,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5443830, 5444430, 5458830, 5471830, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.4
6,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-4_t4b4_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-4_t4b4_merge...,20000,2.3.4,Nov 28 2022,...,6,,,/scratch/back_up/reward_competition_extention/...,time,"[5287093, 5443830, 6650246, 6662443, 6769647, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.4


In [53]:
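# Pairing the index of each change to state 1 (on) with the index of the following state change (off)
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: np.column_stack([np.where(x["last_item_data"] == 1)[0], np.where(x["last_item_data"] == 1)[0]+1]), axis=1)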

In [54]:
# Dropping pairs whose off index falls past the last recorded state change
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: x["event_indexes"][x["event_indexes"][:, 1] <= x["first_item_data"].shape[0] - 1], axis=1)

In [55]:
# Looking up the [on, off] timestamps for each pair of indexes
trodes_state_df["event_timestamps"] = trodes_state_df.apply(lambda x: x["first_item_data"][x["event_indexes"]], axis=1)
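
The three cells above turn a list of state changes into `[on, off]` timestamp pairs. A minimal sketch of the same steps on a single toy channel (the arrays are made up for illustration) may make the index arithmetic easier to follow:

```python
import numpy as np

# Toy DIO channel: timestamp of each state change and the state after the change
timestamps = np.array([100, 150, 300, 320, 500], dtype=np.uint32)
states = np.array([1, 0, 1, 0, 1])  # the last event turns on and never turns off

# Index of each change to state 1, paired with the index of the following change
on_idx = np.where(states == 1)[0]
event_indexes = np.column_stack([on_idx, on_idx + 1])

# Drop pairs whose "off" index runs past the end of the channel
event_indexes = event_indexes[event_indexes[:, 1] <= timestamps.shape[0] - 1]

# Indexing with an (n, 2) index array returns an (n, 2) array of [on, off] timestamps
event_timestamps = timestamps[event_indexes]
```

Here the final "on" at timestamp 500 has no matching "off", so it is dropped and two complete events remain.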

## Updating the video timestamps

## Syncing up the video frame data

In [56]:
# Getting the rows that are the metadata for the video timestamps
trodes_video_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"] == "video_timestamps"].copy().reset_index(drop=True)



In [57]:
# Filtering for the first video only
# This only applies to this pilot data, where we are only looking at the competition data
# trodes_video_df = trodes_video_df[trodes_video_df["metadata_file"] == "1"].copy()

In [58]:
trodes_video_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,display_order,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[5287091, 5288477, 5289863, 5291249, 5291249, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.4, 4.2, 4.4]",3.1
1,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[5287091, 5288477, 5288477, 5289863, 5291249, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.4, 4.2, 4.4]",3.1
2,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[4542879, 4544265, 4544265, 4545651, 4547037, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.4, 5.2, 5.3]",4.2
3,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[4542879, 4544265, 4545651, 4546683, 4547037, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.4, 5.2, 5.3]",4.2
4,20240406_150017_comp_novel_subj_4-3_and_4-4_an...,20240406_150017_comp_novel_subj_4-3_and_4-4_an...,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2684428, 2685814, 2685814, 2687200, 2688585, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.3, 4.4, 5.2, 5.3]",4.3


In [59]:
# Storing the video frame timestamps (PosTimestamp) in their own column
trodes_video_df["video_timestamps"] = trodes_video_df["first_item_data"]

In [60]:
# Removing the columns that are no longer needed
trodes_video_df = trodes_video_df[["filename", "video_timestamps", "session_dir"]].copy()

In [61]:
# Renaming the filename so that we can merge with other dataframes with the same column name
trodes_video_df = trodes_video_df.rename(columns={"filename": "video_name"})

In [62]:
trodes_video_df.head()

Unnamed: 0,video_name,video_timestamps,session_dir
0,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5289863, 5291249, 5291249, ...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...
1,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5288477, 5289863, 5291249, ...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...
2,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,"[4542879, 4544265, 4544265, 4545651, 4547037, ...",20240412_161135_comp_novel_subj_4-2_and_4-4_an...
3,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,"[4542879, 4544265, 4545651, 4546683, 4547037, ...",20240412_161135_comp_novel_subj_4-2_and_4-4_an...
4,20240406_150017_comp_novel_subj_4-3_and_4-4_an...,"[2684428, 2685814, 2685814, 2687200, 2688585, ...",20240406_150017_comp_novel_subj_4-3_and_4-4_an...


- Pairing each state row with every video from the same session, so each channel gets one row per video

In [63]:
trodes_state_df = pd.merge(trodes_state_df, trodes_video_df, on=["session_dir"], how="inner")

In [64]:
trodes_state_df.columns

Index(['session_dir', 'recording', 'metadata_dir', 'metadata_file',
       'description', 'byte_order', 'original_file', 'clockrate',
       'trodes_version', 'compile_date', 'compile_time', 'qt_version',
       'commit_tag', 'controller_firmware', 'headstage_firmware',
       'controller_serialnum', 'headstage_serialnum', 'autosettle', 'smartref',
       'gyro', 'accelerometer', 'magnetometer', 'time_offset',
       'system_time_at_creation', 'timestamp_at_creation', 'first_timestamp',
       'decimation', 'fields', 'data', 'filename', 'direction', 'id',
       'display_order', 'clock rate', 'camera_name', 'session_path',
       'first_dtype_name', 'first_item_data', 'last_dtype_name',
       'last_item_data', 'all_subjects', 'current_subject', 'event_indexes',
       'event_timestamps', 'video_name', 'video_timestamps'],
      dtype='object')
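
Because `session_dir` repeats in both dataframes, the merge above is many-to-many: each channel row is repeated once per video in its session. A minimal sketch on toy data (session and video names are hypothetical) illustrates the resulting row count:

```python
import pandas as pd

# Three DIO channels recorded in one session
state_df = pd.DataFrame({
    "session_dir": ["session_a"] * 3,
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din3"],
})
# Two videos recorded in the same session
video_df = pd.DataFrame({
    "session_dir": ["session_a", "session_a"],
    "video_name": ["session_a.1", "session_a.2"],
})

# 3 channels x 2 videos gives 6 rows for this session
merged = pd.merge(state_df, video_df, on=["session_dir"], how="inner")
```

Comparing the row count before and after the merge is a quick way to confirm that no session was dropped by the inner join.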

## Finding the closest frame to each event

In [65]:
trodes_state_df["event_timestamps"].iloc[1]

array([[ 5287093,  5443830],
       [ 6650246,  6662443],
       [ 6769647,  6866048],
       ...,
       [69100446, 69101246],
       [69102246, 69103447],
       [69105846, 69106646]], dtype=uint32)

In [66]:
trodes_state_df["event_frames"] = trodes_state_df.apply(lambda x: utilities.helper.find_nearest_indices(x["event_timestamps"], x["video_timestamps"]), axis=1)
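
The implementation of `utilities.helper.find_nearest_indices` is not shown in this notebook. As a rough idea of what a nearest-frame lookup can look like, here is a minimal sketch built on `np.searchsorted`. The function name `nearest_frame_indices` and the toy arrays are made up for illustration, and the real helper may handle ties and edge cases differently.

```python
import numpy as np

def nearest_frame_indices(event_timestamps, frame_timestamps):
    """Hypothetical stand-in: index of the frame closest in time to each event timestamp."""
    # Signed 64-bit values avoid unsigned wrap-around when subtracting uint32 timestamps
    events = np.asarray(event_timestamps, dtype=np.int64)
    frames = np.asarray(frame_timestamps, dtype=np.int64)  # assumed sorted, at least 2 frames
    # Position of the first frame at or after each event, clipped to valid neighbours
    right = np.clip(np.searchsorted(frames, events), 1, len(frames) - 1)
    left = right - 1
    # Pick whichever neighbour is closer; ties go to the earlier frame
    use_left = (events - frames[left]) <= (frames[right] - events)
    return np.where(use_left, left, right)

frames = np.array([10, 20, 30, 40])
events = np.array([[11, 26], [35, 100]])  # rows are [on, off] pairs
event_frames = nearest_frame_indices(events, frames)
```

The output keeps the `(n, 2)` shape of the event array, so each row holds the frame indices for an event's start and stop.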

In [67]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 6650246, 6662443, 6769647, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [6650246, 6662443], [6769...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5289863, 5291249, 5291249, ...","[[1, 157], [1361, 1373], [1479, 1576], [1579, ..."
1,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 6650246, 6662443, 6769647, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [6650246, 6662443], [6769...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5288477, 5289863, 5291249, ...","[[1, 158], [1362, 1375], [1481, 1578], [1581, ..."
2,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 5444430, 5458830, 5471830, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [5444430, 5458830], [5471...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5289863, 5291249, 5291249, ...","[[1, 157], [157, 171], [185, 187], [192, 193],..."
3,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 5444430, 5458830, 5471830, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [5444430, 5458830], [5471...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5288477, 5289863, 5291249, ...","[[1, 158], [158, 171], [185, 188], [192, 194],..."
4,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 6644245, 6844248, 8644271, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [6644245, 6844248], [8644...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5289863, 5291249, 5291249, ...","[[1, 157], [1355, 1554], [3351, 3550], [4549, ..."


## Combine raw and state dataframes

In [68]:
trodes_state_df

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 6650246, 6662443, 6769647, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [6650246, 6662443], [6769...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5289863, 5291249, 5291249, ...","[[1, 157], [1361, 1373], [1479, 1576], [1579, ..."
1,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 6650246, 6662443, 6769647, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [6650246, 6662443], [6769...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5288477, 5289863, 5291249, ...","[[1, 158], [1362, 1375], [1481, 1578], [1581, ..."
2,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 5444430, 5458830, 5471830, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [5444430, 5458830], [5471...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5289863, 5291249, 5291249, ...","[[1, 157], [157, 171], [185, 187], [192, 193],..."
3,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 5444430, 5458830, 5471830, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [5444430, 5458830], [5471...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5288477, 5289863, 5291249, ...","[[1, 158], [158, 171], [185, 188], [192, 194],..."
4,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[5287093, 5443830, 6644245, 6844248, 8644271, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4, 4.2, 4.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[5287093, 5443830], [6644245, 6844248], [8644...",20240411_155157_comp_novel_subj_3-1_and_3-4_an...,"[5287091, 5288477, 5289863, 5291249, 5291249, ...","[[1, 157], [1355, 1554], [3351, 3550], [4549, ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
223,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_t3b3_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240409_142051_comp_novel_subj_3-3_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[2992093, 3115563, 4315981, 4515981, 6316009, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4, 4.3, 4.4]",3.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2992093, 3115563], [4315981, 4515981], [6316...",20240409_142051_comp_novel_subj_3-3_and_3-4_an...,"[2992091, 2993477, 2993477, 2994863, 2996249, ...","[[1, 124], [1322, 1521], [3318, 3517], [4516, ..."
224,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240409_142051_comp_novel_subj_3-3_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[2992093, 3115563, 3248365, 3265165, 3317366, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4, 4.3, 4.4]",3.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2992093, 3115563], [3248365, 3265165], [3317...",20240409_142051_comp_novel_subj_3-3_and_3-4_an...,"[2992091, 2993477, 2994863, 2994863, 2996249, ...","[[1, 124], [255, 274], [325, 331], [338, 375],..."
225,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_t3b3_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240409_142051_comp_novel_subj_3-3_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[2992093, 3115563, 3248365, 3265165, 3317366, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4, 4.3, 4.4]",3.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2992093, 3115563], [3248365, 3265165], [3317...",20240409_142051_comp_novel_subj_3-3_and_3-4_an...,"[2992091, 2993477, 2993477, 2994863, 2996249, ...","[[1, 124], [256, 273], [325, 332], [339, 374],..."
226,20240409_142051_comp_novel_subj_3-3_and_3-4_an...,20240409_142051_comp_novel_subj_3-3_t3b3_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240409_142051_comp_novel_subj_3-3_t3b3_merge...,20000,2.3.4,Nov 28 2022,...,"[2992093, 3115563, 3181166, 3183567, 3184367, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.3, 3.4, 4.3, 4.4]",3.3,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2992093, 3115563], [3181166, 3183567], [3184...",20240409_142051_comp_novel_subj_3-3_and_3-4_an...,"[2992091, 2993477, 2994863, 2994863, 2996249, ...","[[1, 124], [189, 192], [192, 196], [196, 199],..."


In [69]:
trodes_state_df = trodes_state_df[STATE_COLS_TO_KEEP].drop_duplicates(subset=["session_dir", "video_name", "metadata_file"]).sort_values(["session_dir", "video_name", "metadata_file"]).reset_index(drop=True).copy()

In [70]:
trodes_state_df.head()

Unnamed: 0,session_dir,metadata_file,event_timestamps,video_name,video_timestamps,event_frames
0,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,dio_ECU_Din1,"[[2932501, 3016731], [4217146, 4417150], [6217...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[2933884, 2935270, 2935270, 2936915, 2938042, ...","[[0, 83], [1280, 1481], [3278, 3477], [4476, 4..."
1,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,dio_ECU_Din2,"[[2932501, 3016731], [3017128, 3051931], [3090...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[2933884, 2935270, 2935270, 2936915, 2938042, ...","[[0, 83], [84, 118], [158, 205], [275, 277], [..."
2,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,dio_ECU_Din3,"[[2932501, 3016731], [3297734, 3305534], [3408...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[2933884, 2935270, 2935270, 2936915, 2938042, ...","[[0, 83], [364, 372], [474, 481], [625, 628], ..."
3,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,dio_ECU_Din1,"[[2932501, 3016731], [4217146, 4417150], [6217...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[2932499, 2933884, 2935270, 2935270, 2938042, ...","[[1, 84], [1281, 1482], [3279, 3478], [4477, 4..."
4,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,dio_ECU_Din2,"[[2932501, 3016731], [3017128, 3051931], [3090...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[2932499, 2933884, 2935270, 2935270, 2938042, ...","[[1, 84], [85, 119], [158, 205], [276, 277], [..."


In [71]:
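# Collapsing the per-channel rows into one row per group: shared columns keep their first value, per-channel columns are collected into lists
trodes_state_df = trodes_state_df.groupby(same_columns).agg({**{col: 'first' for col in trodes_state_df.columns if col not in same_columns + different_columns}, **{col: lambda x: x.tolist() for col in different_columns}}).reset_index()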
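
The aggregation above is dense, so here is a minimal sketch of the same pattern on toy data. The values of `same_columns` and `different_columns` below are assumptions chosen for illustration; the notebook defines the real lists elsewhere.

```python
import pandas as pd

# Assumed groupings for this sketch only
same_columns = ["session_dir", "video_name"]
different_columns = ["metadata_file", "event_frames"]

df = pd.DataFrame({
    "session_dir": ["s1", "s1", "s1", "s1"],
    "video_name": ["s1.1", "s1.1", "s1.2", "s1.2"],
    "video_timestamps": [[0, 10], [0, 10], [1, 11], [1, 11]],
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din1", "dio_ECU_Din2"],
    "event_frames": [[[0, 1]], [[2, 3]], [[0, 1]], [[2, 4]]],
})

agg_spec = {
    # Columns shared within a group: keep the first value
    **{col: "first" for col in df.columns if col not in same_columns + different_columns},
    # Per-channel columns: collect the values into a list, in row order
    **{col: (lambda x: x.tolist()) for col in different_columns},
}
grouped = df.groupby(same_columns).agg(agg_spec).reset_index()
```

Each group ends up as one row whose list columns are aligned by position, which is why the sort order established before the `groupby` matters for the later unpacking.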

In [72]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,metadata_file,event_frames,event_timestamps
0,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[2933884, 2935270, 2935270, 2936915, 2938042, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 83], [1280, 1481], [3278, 3477], [4476, ...","[[[2932501, 3016731], [4217146, 4417150], [621..."
1,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[2932499, 2933884, 2935270, 2935270, 2938042, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 84], [1281, 1482], [3279, 3478], [4477, ...","[[[2932501, 3016731], [4217146, 4417150], [621..."
2,20240402_145808_comp_novel_subj_3-3_and_3-4_an...,20240402_145808_comp_novel_subj_3-3_and_3-4_an...,"[4558124, 4559509, 4560210, 4560895, 4562281, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 27], [1225, 1425], [3221, 3422], [4419, ...","[[[4558126, 4585077], [5785494, 5985496], [778..."
3,20240402_145808_comp_novel_subj_3-3_and_3-4_an...,20240402_145808_comp_novel_subj_3-3_and_3-4_an...,"[4558124, 4559509, 4560895, 4560895, 4562281, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 27], [1225, 1424], [3221, 3422], [4419, ...","[[[4558126, 4585077], [5785494, 5985496], [778..."
4,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,"[5123558, 5124944, 5124944, 5126330, 5127716, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1, 80], [1279, 1478], [3276, 3475], [4473, ...","[[[5123560, 5203811], [6404228, 6604230], [840..."


In [73]:
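# Unpacking by position: this relies on metadata_file being ordered [dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3] in every row
trodes_state_df["tone_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[0])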
trodes_state_df["box_1_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[2])

trodes_state_df["tone_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[0])
trodes_state_df["box_1_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[2])
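
The positional unpacking above works as long as every row lists the channels in the same order. As a hedged alternative, the sketch below looks each channel up by name instead. The `CHANNEL_TO_EVENT` dictionary is inferred from the positional assignments in this notebook, and the toy dataframe is made up for illustration.

```python
import pandas as pd

# Channel-to-event mapping implied by the positional unpacking (an assumption of this sketch)
CHANNEL_TO_EVENT = {
    "dio_ECU_Din1": "tone",
    "dio_ECU_Din2": "box_1_port_entry",
    "dio_ECU_Din3": "box_2_port_entry",
}

df = pd.DataFrame({
    # Channel order deliberately shuffled to show that the lookup does not depend on it
    "metadata_file": [["dio_ECU_Din3", "dio_ECU_Din1", "dio_ECU_Din2"]],
    "event_timestamps": [[[[30, 31]], [[10, 11]], [[20, 21]]]],
})

# One {channel name: events} dictionary per row
lookups = [
    dict(zip(names, events))
    for names, events in zip(df["metadata_file"], df["event_timestamps"])
]
for channel, event in CHANNEL_TO_EVENT.items():
    df[f"{event}_timestamps"] = pd.Series(
        [lookup[channel] for lookup in lookups], index=df.index
    )
```

A missing channel raises a `KeyError` here instead of silently shifting the columns, which can make ordering problems easier to spot.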


In [74]:
trodes_state_df = trodes_state_df.drop(columns=["event_timestamps", "event_frames", "metadata_file"], errors="ignore")

In [75]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames
0,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[2933884, 2935270, 2935270, 2936915, 2938042, ...","[[2932501, 3016731], [4217146, 4417150], [6217...","[[2932501, 3016731], [3017128, 3051931], [3090...","[[2932501, 3016731], [3297734, 3305534], [3408...","[[0, 83], [1280, 1481], [3278, 3477], [4476, 4...","[[0, 83], [84, 118], [158, 205], [275, 277], [...","[[0, 83], [364, 372], [474, 481], [625, 628], ..."
1,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[2932499, 2933884, 2935270, 2935270, 2938042, ...","[[2932501, 3016731], [4217146, 4417150], [6217...","[[2932501, 3016731], [3017128, 3051931], [3090...","[[2932501, 3016731], [3297734, 3305534], [3408...","[[1, 84], [1281, 1482], [3279, 3478], [4477, 4...","[[1, 84], [85, 119], [158, 205], [276, 277], [...","[[1, 84], [364, 373], [475, 482], [626, 629], ..."
2,20240402_145808_comp_novel_subj_3-3_and_3-4_an...,20240402_145808_comp_novel_subj_3-3_and_3-4_an...,"[4558124, 4559509, 4560210, 4560895, 4562281, ...","[[4558126, 4585077], [5785494, 5985496], [7785...","[[4558126, 4585077], [4679877, 4683080], [4743...","[[4558126, 4585077], [4615479, 4619676], [4628...","[[1, 27], [1225, 1425], [3221, 3422], [4419, 4...","[[1, 27], [122, 126], [185, 206], [207, 229], ...","[[1, 27], [58, 62], [70, 117], [134, 135], [14..."
3,20240402_145808_comp_novel_subj_3-3_and_3-4_an...,20240402_145808_comp_novel_subj_3-3_and_3-4_an...,"[4558124, 4559509, 4560895, 4560895, 4562281, ...","[[4558126, 4585077], [5785494, 5985496], [7785...","[[4558126, 4585077], [4679877, 4683080], [4743...","[[4558126, 4585077], [4615479, 4619676], [4628...","[[1, 27], [1225, 1424], [3221, 3422], [4419, 4...","[[1, 27], [121, 125], [185, 206], [207, 229], ...","[[1, 27], [58, 62], [70, 117], [134, 135], [14..."
4,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,20240405_162313_comp_novel_subj_3-1_and_3-4_an...,"[5123558, 5124944, 5124944, 5126330, 5127716, ...","[[5123560, 5203811], [6404228, 6604230], [8404...","[[5123560, 5203811], [11358291, 11360289], [11...","[[5123560, 5203811], [5299214, 5322611], [5327...","[[1, 80], [1279, 1478], [3276, 3475], [4473, 4...","[[1, 80], [6223, 6226], [6231, 6295], [6295, 6...","[[1, 80], [175, 199], [203, 206], [246, 260], ..."


In [76]:
trodes_raw_df = trodes_raw_df[RAW_COLS_TO_KEEP].reset_index(drop=True).copy()

In [77]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,original_file,session_path,current_subject,first_item_data,first_timestamp,all_subjects
0,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-1_t3b3_merged,20240411_155157_comp_novel_subj_3-1_t3b3_merge...,/scratch/back_up/reward_competition_extention/...,3.1,"[5287093, 5287094, 5287095, 5287096, 5287097, ...",5287093,"[3.1, 3.4, 4.2, 4.4]"
1,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_3-4_t4b4_merged,20240411_155157_comp_novel_subj_3-4_t4b4_merge...,/scratch/back_up/reward_competition_extention/...,3.4,"[5287093, 5287094, 5287095, 5287096, 5287097, ...",5287093,"[3.1, 3.4, 4.2, 4.4]"
2,20240411_155157_comp_novel_subj_3-1_and_3-4_an...,20240411_155157_comp_novel_subj_4-2_t6b6_merged,20240411_155157_comp_novel_subj_4-2_t6b6_merge...,/scratch/back_up/reward_competition_extention/...,4.2,"[5287093, 5287094, 5287095, 5287096, 5287097, ...",5287093,"[3.1, 3.4, 4.2, 4.4]"
3,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,20240412_161135_comp_novel_subj_5-3_t6b6_merged,20240412_161135_comp_novel_subj_5-3_t6b6_merge...,/scratch/back_up/reward_competition_extention/...,5.3,"[4542881, 4542882, 4542883, 4542884, 4542885, ...",4542881,"[4.2, 4.4, 5.2, 5.3]"
4,20240412_161135_comp_novel_subj_4-2_and_4-4_an...,20240412_161135_comp_novel_subj_5-2_t5b5_merged,20240412_161135_comp_novel_subj_5-2_t5b5_merge...,/scratch/back_up/reward_competition_extention/...,5.2,"[4542881, 4542882, 4542883, 4542884, 4542885, ...",4542881,"[4.2, 4.4, 5.2, 5.3]"


In [78]:
trodes_final_df = pd.merge(trodes_raw_df, trodes_state_df, on=["session_dir"], how="inner")

In [79]:
trodes_final_df.shape

(76, 16)

In [80]:
trodes_final_df = trodes_final_df.rename(columns={"first_item_data": "raw_timestamps"})
trodes_final_df = trodes_final_df.drop(columns=["metadata_file"], errors="ignore")
trodes_final_df = trodes_final_df.sort_values(["session_dir", "recording"]).reset_index(drop=True).copy()

## Making the timestamps zero-indexed

In [81]:
trodes_final_df[[col for col in trodes_final_df.columns if "timestamps" in col]].head()

Unnamed: 0,raw_timestamps,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,"[2932501, 2932502, 2932503, 2932504, 2932505, ...","[2933884, 2935270, 2935270, 2936915, 2938042, ...","[[2932501, 3016731], [4217146, 4417150], [6217...","[[2932501, 3016731], [3017128, 3051931], [3090...","[[2932501, 3016731], [3297734, 3305534], [3408..."
1,"[2932501, 2932502, 2932503, 2932504, 2932505, ...","[2932499, 2933884, 2935270, 2935270, 2938042, ...","[[2932501, 3016731], [4217146, 4417150], [6217...","[[2932501, 3016731], [3017128, 3051931], [3090...","[[2932501, 3016731], [3297734, 3305534], [3408..."
2,"[2932501, 2932502, 2932503, 2932504, 2932505, ...","[2933884, 2935270, 2935270, 2936915, 2938042, ...","[[2932501, 3016731], [4217146, 4417150], [6217...","[[2932501, 3016731], [3017128, 3051931], [3090...","[[2932501, 3016731], [3297734, 3305534], [3408..."
3,"[2932501, 2932502, 2932503, 2932504, 2932505, ...","[2932499, 2933884, 2935270, 2935270, 2938042, ...","[[2932501, 3016731], [4217146, 4417150], [6217...","[[2932501, 3016731], [3017128, 3051931], [3090...","[[2932501, 3016731], [3297734, 3305534], [3408..."
4,"[2932501, 2932502, 2932503, 2932504, 2932505, ...","[2933884, 2935270, 2935270, 2936915, 2938042, ...","[[2932501, 3016731], [4217146, 4417150], [6217...","[[2932501, 3016731], [3017128, 3051931], [3090...","[[2932501, 3016731], [3297734, 3305534], [3408..."


In [82]:
trodes_final_df["last_timestamp"] = trodes_final_df["raw_timestamps"].apply(lambda x: x[-1])

- Dropping the raw timestamps to reduce memory use; only the first and last timestamp of each recording are kept

In [83]:
trodes_final_df = trodes_final_df.drop(columns=["raw_timestamps", "original_file"], errors="ignore")

In [84]:
copy_trodes_final_df = trodes_final_df.copy()

In [85]:
# Shift every timestamp column so that each recording starts at 0
for col in [col for col in trodes_final_df.columns if "timestamps" in col]:
    trodes_final_df[col] = trodes_final_df.apply(lambda x: x[col].astype(np.int32) - np.int32(x["first_timestamp"]), axis=1)

# Frame indices are not shifted, only cast to a compact integer type
for col in [col for col in trodes_final_df.columns if "frames" in col]:
    trodes_final_df[col] = trodes_final_df[col].apply(lambda x: x.astype(np.int32))
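The zero-indexing step can be checked by hand on a small example. The sketch below uses the first tone intervals and `first_timestamp` shown in the outputs for its first row; the second row and the name `toy_df` are invented for illustration. It uses a list comprehension instead of a row-wise `apply`, which is an alternative formulation rather than the notebook's own code.

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({
    "first_timestamp": [2932501, 4542881],
    "tone_timestamps": [
        np.array([[2932501, 3016731], [4217146, 4417150]]),
        np.array([[4542881, 4600000]]),  # invented interval for illustration
    ],
})

# Subtract each row's first timestamp from every entry of that row's array.
toy_df["tone_timestamps"] = [
    ts.astype(np.int32) - np.int32(first)
    for ts, first in zip(toy_df["tone_timestamps"], toy_df["first_timestamp"])
]

print(toy_df["tone_timestamps"][0].tolist())
print(toy_df["tone_timestamps"][1].tolist())
```

Note that `np.int32` holds values up to 2,147,483,647, so the cast is only safe while the raw timestamps stay below that limit.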

In [86]:
# Group columns by the last word of their name (e.g. all "*_frames", then all "*_timestamps")
sorted_columns = sorted(trodes_final_df.columns, key=lambda x: x.split("_")[-1])
trodes_final_df = trodes_final_df[sorted_columns].copy()
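Sorting on the last underscore-separated word groups columns of the same kind together, and because Python's `sorted` is stable, columns that share a suffix keep their original relative order. A small sketch with a hypothetical subset of the column names:

```python
# Hypothetical subset of the column names, in an arbitrary starting order.
toy_columns = [
    "session_dir",
    "first_timestamp",
    "tone_frames",
    "video_name",
    "tone_timestamps",
    "recording",
    "last_timestamp",
]

# Sort key is the text after the final underscore ("dir", "timestamp", "frames", ...).
sorted_toy = sorted(toy_columns, key=lambda name: name.split("_")[-1])
print(sorted_toy)
```

The key `"timestamp"` sorts before `"timestamps"`, so `first_timestamp` and `last_timestamp` land just ahead of the array-valued `*_timestamps` columns.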

## Saving to a file

In [87]:
trodes_final_df.to_pickle(os.path.join(OUTPUT_DIR, "{}_00_trodes_metadata.pkl".format(OUTPUT_PREFIX)))
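Pickle keeps the NumPy arrays stored in each cell intact, which a CSV export would not. The sketch below round-trips a toy frame through a temporary directory to show this; `toy_prefix`, `toy_df`, and the temporary path are placeholders rather than the notebook's `OUTPUT_PREFIX` and `OUTPUT_DIR`. It also creates the output directory first, on the assumption that it may not exist yet.

```python
import os
import tempfile

import numpy as np
import pandas as pd

toy_df = pd.DataFrame({
    "session_dir": ["sess_a"],
    "tone_timestamps": [np.array([[0, 84230]], dtype=np.int32)],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = os.path.join(tmp_dir, "proc")
    os.makedirs(out_dir, exist_ok=True)  # to_pickle does not create missing folders
    out_path = os.path.join(out_dir, "{}_00_trodes_metadata.pkl".format("toy_prefix"))

    toy_df.to_pickle(out_path)
    reloaded = pd.read_pickle(out_path)

# The array and its dtype survive the round trip.
print(reloaded["tone_timestamps"][0], reloaded["tone_timestamps"][0].dtype)
```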

In [88]:
trodes_final_df.head()

Unnamed: 0,session_dir,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames,video_name,session_path,recording,current_subject,all_subjects,first_timestamp,last_timestamp,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[[0, 83], [1280, 1481], [3278, 3477], [4476, 4...","[[0, 83], [84, 118], [158, 205], [275, 277], [...","[[0, 83], [364, 372], [474, 481], [625, 628], ...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,/scratch/back_up/reward_competition_extention/...,20240401_151442_comp_novel_subj_3-1_t5b5_merged,3.1,"[3.1, 3.3, 4.2, 4.4]",2932501,66530053,"[1383, 2769, 2769, 4414, 5541, 6927, 6927, 831...","[[0, 84230], [1284645, 1484649], [3284672, 348...","[[0, 84230], [84627, 119430], [158430, 206033]...","[[0, 84230], [365233, 373033], [476236, 483234..."
1,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[[1, 84], [1281, 1482], [3279, 3478], [4477, 4...","[[1, 84], [85, 119], [158, 205], [276, 277], [...","[[1, 84], [364, 373], [475, 482], [626, 629], ...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,/scratch/back_up/reward_competition_extention/...,20240401_151442_comp_novel_subj_3-1_t5b5_merged,3.1,"[3.1, 3.3, 4.2, 4.4]",2932501,66530053,"[-2, 1383, 2769, 2769, 5541, 5541, 6927, 6927,...","[[0, 84230], [1284645, 1484649], [3284672, 348...","[[0, 84230], [84627, 119430], [158430, 206033]...","[[0, 84230], [365233, 373033], [476236, 483234..."
2,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[[0, 83], [1280, 1481], [3278, 3477], [4476, 4...","[[0, 83], [84, 118], [158, 205], [275, 277], [...","[[0, 83], [364, 372], [474, 481], [625, 628], ...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,/scratch/back_up/reward_competition_extention/...,20240401_151442_comp_novel_subj_3-3_t6b6_merged,3.3,"[3.1, 3.3, 4.2, 4.4]",2932501,66530053,"[1383, 2769, 2769, 4414, 5541, 6927, 6927, 831...","[[0, 84230], [1284645, 1484649], [3284672, 348...","[[0, 84230], [84627, 119430], [158430, 206033]...","[[0, 84230], [365233, 373033], [476236, 483234..."
3,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[[1, 84], [1281, 1482], [3279, 3478], [4477, 4...","[[1, 84], [85, 119], [158, 205], [276, 277], [...","[[1, 84], [364, 373], [475, 482], [626, 629], ...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,/scratch/back_up/reward_competition_extention/...,20240401_151442_comp_novel_subj_3-3_t6b6_merged,3.3,"[3.1, 3.3, 4.2, 4.4]",2932501,66530053,"[-2, 1383, 2769, 2769, 5541, 5541, 6927, 6927,...","[[0, 84230], [1284645, 1484649], [3284672, 348...","[[0, 84230], [84627, 119430], [158430, 206033]...","[[0, 84230], [365233, 373033], [476236, 483234..."
4,20240401_151442_comp_novel_subj_3-1_and_3-3_an...,"[[0, 83], [1280, 1481], [3278, 3477], [4476, 4...","[[0, 83], [84, 118], [158, 205], [275, 277], [...","[[0, 83], [364, 372], [474, 481], [625, 628], ...",20240401_151442_comp_novel_subj_3-1_and_3-3_an...,/scratch/back_up/reward_competition_extention/...,20240401_151442_comp_novel_subj_4-2_t3b3_merged,4.2,"[3.1, 3.3, 4.2, 4.4]",2932501,66530053,"[1383, 2769, 2769, 4414, 5541, 6927, 6927, 831...","[[0, 84230], [1284645, 1484649], [3284672, 348...","[[0, 84230], [84627, 119430], [158430, 206033]...","[[0, 84230], [365233, 373033], [476236, 483234..."


In [89]:
trodes_final_df["session_dir"].unique()

array(['20240401_151442_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-4',
       '20240402_145808_comp_novel_subj_3-3_and_3-4_and_5-2_and_5-3',
       '20240405_162313_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-3',
       '20240406_150017_comp_novel_subj_4-3_and_4-4_and_5-2_and_5-3',
       '20240407_152222_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-3',
       '20240408_155308_comp_novel_subj_3-1_and_3-4_and_5-2_and_5-3',
       '20240409_142051_comp_novel_subj_3-3_and_3-4_and_4-3_and_4-4',
       '20240410_152009_comp_novel_subj_4-2_and_4-3_and_5-2_and_5-3',
       '20240411_155157_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-4',
       '20240412_161135_comp_novel_subj_4-2_and_4-4_and_5-2_and_5-3'],
      dtype=object)

In [90]:
trodes_final_df["video_name"].unique()

array(['20240401_151442_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-4.1.videoTimeStamps.cameraHWSync',
       '20240401_151442_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-4.2.videoTimeStamps.cameraHWSync',
       '20240402_145808_comp_novel_subj_3-3_and_3-4_and_5-2_and_5-3.1.videoTimeStamps.cameraHWSync',
       '20240402_145808_comp_novel_subj_3-3_and_3-4_and_5-2_and_5-3.2.videoTimeStamps.cameraHWSync',
       '20240405_162313_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-3.1.videoTimeStamps.cameraHWSync',
       '20240405_162313_comp_novel_subj_3-1_and_3-4_and_4-2_and_4-3.2.videoTimeStamps.cameraHWSync',
       '20240406_150017_comp_novel_subj_4-3_and_4-4_and_5-2_and_5-3.1.videoTimeStamps.cameraHWSync',
       '20240406_150017_comp_novel_subj_4-3_and_4-4_and_5-2_and_5-3.2.videoTimeStamps.cameraHWSync',
       '20240407_152222_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-3.1.videoTimeStamps.cameraHWSync',
       '20240407_152222_comp_novel_subj_3-1_and_3-3_and_4-2_and_4-3.2.videoTimeStamps.camer