# Time Stamp Extract

This notebook extracts the timestamps from the exported Trodes (SpikeGadgets) recording files and gets the times at which the tones played.

In [1]:
# Imports of all used packages and libraries
import sys
import os
import git
import glob
from collections import defaultdict

In [2]:
git_repo = git.Repo(".", search_parent_directories=True)
git_root = git_repo.git.rev_parse("--show-toplevel")

In [3]:
git_root

'/nancy/user/riwata/projects/reward_comp_ext'

In [4]:
sys.path.insert(0, os.path.join(git_root, 'src'))

In [5]:
# Imports of the data handling and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as sp

In [7]:
import utilities.helper
import trodes.read_exported

# Functions

In [8]:
import re

def extract_floats(s):
    """
    Extracts all numbers from a string and returns them as a list of float-formatted strings.

    Parameters:
    - s (str): The string to extract numbers from.

    Returns:
    - list: A list of strings, each representing a float found in the input string.
    """
    # Grouping the alternation so that the optional sign applies to integers as well as decimals
    float_pattern = r"[-+]?(?:\d*\.\d+|\d+)"
    return [str(float(num)) for num in re.findall(float_pattern, s)]

## Inputs & Data

- Explanation of each input and where it comes from.

Inputs and required data loading
- Input variable names are in all-caps snake case
- Whenever an input is changed or used for processing, the resulting variables are in lowercase snake case

In [9]:
# Path of the directory that contains the Spike Gadgets recording and the exported timestamp files
# Exported with this tool https://docs.spikegadgets.com/en/latest/basic/ExportFunctions.html
# Export these files:
    # -raw – Continuous raw band export.
    # -dio – Digital IO channel state change export.
    # -analogio – Continuous analog IO export.
INPUT_DIR = "/scratch/back_up/reward_competition_extention/data/rce_cohort_3"
OUTPUT_DIR = r"./proc" # where data is saved should always be shown in the inputs
TONE_DIN = "dio_ECU_Din1"
TONE_STATE = 1
os.makedirs(OUTPUT_DIR, exist_ok=True)
OUTPUT_PREFIX = "rce_pilot_3_alone_comp"

In [10]:
COLS_TO_KEEP = ['session_dir', 'recording', 'metadata_dir', 'metadata_file',
                'original_file', 'filename', 'session_path', 'all_subjects',
                'current_subject', 'event_timestamps', 'video_name',
                'video_timestamps', 'event_frames', 'first_item_data']

In [11]:
RAW_COLS_TO_KEEP = ['session_dir',
 'recording',
 'original_file',
 'session_path',
 'current_subject',
 'first_item_data',
 'first_timestamp',
 'all_subjects']

In [12]:
STATE_COLS_TO_KEEP = ['session_dir',
 'metadata_file',
 'event_timestamps',
 'video_name',
 'video_timestamps',
 'event_frames',]

In [13]:
same_columns = ['session_dir', 'video_name']
different_columns = ['metadata_file', 'event_frames', 'event_timestamps']

In [14]:
# TODO: Find way not to hard code this
# ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/standard/2023_06_*/*.rec")
ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/rce_cohort_3/alone_comp/*.rec")



In [15]:
ALL_SESSION_DIR

['/scratch/back_up/reward_competition_extention/data/rce_cohort_3/alone_comp/20240323_144517_alone_comp_subj_3-1_and_3-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/alone_comp/20240323_122227_alone_comp_subj_5-2_and_5-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/alone_comp/20240320_142408_alone_comp_subj_3-1_and_3-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/alone_comp/20240322_160946_alone_comp_subj_4-3_and_4-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/alone_comp/20240322_120625_alone_comp_subj_3-3_and_3-4.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/alone_comp/20240320_171038_alone_comp_subj_4-2_and_4-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_3/alone_comp/20240323_165815_alone_comp_subj_4-2_and_4-4.rec']

## Outputs

Describe each output that the notebook creates. 

- Is it a plot or is it data?

- How valuable is the output and why is it valuable or useful?

## Other documentation

raw directory
- raw_group0.dat
    - voltage_value: Array with voltage measurement for each channel at each timestamp
- timestamps.dat
    - voltage_time_stamp: The time stamp of each voltage measurement

parent directory
- 1.videoTimeStamps.cameraHWSync
    - frame_number: Calculated by getting the index of each video time stamp tuple
    - PosTimestamp: The time stamp of each video frame
    - HWframeCount: Unknown value. Starts at 30742 and increases by 1 for each tuple
    - HWTimestamp: Unknown value. All zeroes
    - video_time: Calculated by dividing the frame number by the fps (frames per second)
    - video_seconds: video_time, but rounded to seconds
    - These are filled-in versions of the above columns, with the value from the most recent previous cell
        - filled_PosTimestamp
        - filledHWframeCount
        - filled_frame_number
        - filled_video_time
        - filled_video_seconds
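
The derived video columns listed above can be sketched roughly as follows. This is only an illustration, not the notebook's actual code: the `FPS` value, the toy timestamps, and the `dense_index` are assumptions, and forward filling is one reading of "the value from the most recent previous cell".

```python
import numpy as np
import pandas as pd

FPS = 30  # hypothetical frame rate; replace with the camera's real value

# Toy PosTimestamp values standing in for a cameraHWSync export
pos_timestamp = np.array([2203531, 2204917, 2206303, 2207689], dtype=np.int64)

video_df = pd.DataFrame({"PosTimestamp": pos_timestamp})
# frame_number is the index of each video timestamp tuple
video_df["frame_number"] = np.arange(len(video_df))
# video_time is the frame number divided by the frame rate
video_df["video_time"] = video_df["frame_number"] / FPS
# video_seconds is video_time rounded to whole seconds
video_df["video_seconds"] = video_df["video_time"].round().astype(int)

# Align to a denser (hypothetical) time index and forward fill,
# so every row carries the most recent previous value
dense_index = pd.Index(np.arange(2203531, 2207690, 693), name="time")
filled = (
    video_df.set_index("PosTimestamp")
    .reindex(dense_index)
    .ffill()
    .add_prefix("filled_")
)
print(filled)
```

Rows of `dense_index` that fall between two frames inherit the values of the earlier frame.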

DIO directory
- dio_ECU_Din1.dat
    - time: The time stamp that corresponds to the DIN input
    - state: Binary state of whether there is input from DIN or not
    - trial_number: Calculated by adding 1 every time there is a DIN input
    - These are filled-in versions of the above columns, with the value from the most recent previous cell
        - filled_state
        - filled_trial_number
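
As a rough sketch of the DIO columns described above, the trial number and the onset times can be derived from the `(time, state)` pairs as shown here. The toy array is made up for illustration, and treating `state == 1` as "input present" follows the `TONE_STATE` input; it is an assumption that the same convention holds for every channel.

```python
import numpy as np

TONE_STATE = 1  # same convention as the notebook's TONE_STATE input

# Toy DIO export: one (time, state) row per state change
dio = np.array(
    [(1293017, 1), (2249095, 0), (3449510, 1), (3649512, 0), (5449536, 1)],
    dtype=[("time", "<u4"), ("state", "u1")],
)

# True for the rows where the input switched on
is_on = dio["state"] == TONE_STATE
# trial_number goes up by one each time the input switches on
trial_number = np.cumsum(is_on)
# Time stamps at which the input switched on
onset_times = dio["time"][is_on]

print(trial_number)
print(onset_times)
```

In the exports shown in this notebook, the first row of each DIO file carries the same time stamp as the start of the recording, so it looks like an initial state rather than a real event. Whether to drop that row is left as a judgment call.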

ss_output directory (Spike sorting with Spike interface)
- firings.npz
    - unit_id: All the units that had a spike train for the given timestamp 	
    - number_of_units: Calculated by counting the number of units that had a spike train

## Functions

- function names are short and in snake case all lowercase
- a function name should be unique but does not have to describe the function
- doc strings describe functions not function names

## Processing

Describe what is done to the data here and how inputs are manipulated to generate outputs. 

In [16]:
# As much code and as many cells as required
# includes EDA and playing with data
# GO HAM!

# LOOP 1: Extracting all the Trodes

- Getting all the data from all the exported Trodes files and saving it to `session_to_trodes_data`
    - Creates a dictionary with the structure of:
        - `{dir_name: {file_name: metadata, file_name_2: metadata_2}, dir_name_2: {file_name_3: metadata_3, file_name_4: metadata_4}}`
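
A nested dictionary like this can be built without creating each level by hand. The sketch below assumes that `utilities.helper.create_recursive_dict` behaves like a self-referencing `defaultdict`; the local function is a hypothetical stand-in, and the keys and metadata are toy values.

```python
from collections import defaultdict


def create_recursive_dict():
    """Hypothetical stand-in for utilities.helper.create_recursive_dict."""
    return defaultdict(create_recursive_dict)


session_to_data = create_recursive_dict()

# Intermediate levels are created on first access, so no setup is needed
session_to_data["session_1"]["recording_a"]["DIO"]["dio_ECU_Din1"] = {"clockrate": "20000"}
session_to_data["session_1"]["recording_a"]["raw"]["timestamps"] = {"clockrate": "20000"}

print(list(session_to_data["session_1"]["recording_a"].keys()))
```

One side effect to keep in mind: reading a missing key also creates it, so membership should be checked with `in` rather than by indexing.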

In [17]:
# Saving the trodes data for each session
# Each key is a session name
# Each value is a dictionary of every recording file in that session
session_to_trodes_data = utilities.helper.create_recursive_dict()


# Saving the path of the session recording
session_to_path = {}

# Going through each session recording
# Which includes all the recordings from all the miniloggers and cameras
for session_path in ALL_SESSION_DIR:   
    try:
        # Getting the name of the session from the path
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Reading the trodes data for every recording file in the session directory
        session_to_trodes_data[session_basename] = trodes.read_exported.organize_all_trodes_export(session_path)
        
        session_to_path[session_basename] = session_path
    except Exception as e: 
        print(e)


Current Session: 20240323_144517_alone_comp_subj_3-1_and_3-4
Skipping file 20240323_144517_alone_comp_subj_3-1_t5b5_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240323_144517_alone_comp_subj_3-4_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240323_122227_alone_comp_subj_5-2_and_5-3
Skipping file 20240323_122227_alone_comp_subj_5-3_t5b5_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240323_122227_alone_comp_subj_5-2_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240320_142408_alone_comp_subj_3-1_and_3-3
Skipping file 20240320_142408_alone_comp_subj_3-1_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240320_142408_alone_comp_subj_3-3_t5b5_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240322_160946_alone_comp_subj_4-3_and_4-4
Skipping file 20240322_160946_alone_comp_subj_4-3_t6b6_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20240322_160946_alone_comp_subj_4-4_t5b5_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20240322_120625_alone_comp_subj_3-3_and_3-4


In [18]:
session_to_trodes_data

defaultdict(<function utilities.helper.create_recursive_dict()>,
            {'20240323_144517_alone_comp_subj_3-1_and_3-4': defaultdict(dict,
                         {'20240323_144517_alone_comp_subj_3-1_t5b5_merged': {'timestampoffset': {},
                           'DIO': {'dio_ECU_Din3': {'description': 'State change data for one digital channel. Display_order is 1-based',
                             'byte_order': 'little endian',
                             'original_file': '20240323_144517_alone_comp_subj_3-1_t5b5_merged.rec',
                             'clockrate': '20000',
                             'trodes_version': '2.3.4',
                             'compile_date': 'Nov 28 2022',
                             'compile_time': '15:10:45',
                             'qt_version': '6.2.2',
                             'commit_tag': 'heads/Release_2.3.4-0-gd5a58cd9-dirty',
                             'controller_firmware': '3.17',
                             'headsta

- Adding the video timestamps

In [19]:
for session_path in ALL_SESSION_DIR:   
    try:
        # Getting the name of the session from the path
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Reading every video timestamp file in the session directory
        for video_timestamps in glob.glob(os.path.join(session_path, "*cameraHWSync")):
            video_basename = os.path.basename(video_timestamps)
            print("Current Video Name: {}".format(video_basename))
            timestamp_array = trodes.read_exported.read_trodes_extracted_data_file(video_timestamps)
            if "video_timestamps" not in session_to_trodes_data[session_basename][session_basename]:
                session_to_trodes_data[session_basename][session_basename]["video_timestamps"] = defaultdict(dict)
            # The camera number is the third-from-last dot-separated part of the file name
            session_to_trodes_data[session_basename][session_basename]["video_timestamps"][video_basename.split(".")[-3]] = timestamp_array
    except Exception as e: 
        print(e)

Current Session: 20240323_144517_alone_comp_subj_3-1_and_3-4
Current Video Name: 20240323_144517_alone_comp_subj_3-1_and_3-4.1.videoTimeStamps.cameraHWSync
Current Video Name: 20240323_144517_alone_comp_subj_3-1_and_3-4.2.videoTimeStamps.cameraHWSync
Current Session: 20240323_122227_alone_comp_subj_5-2_and_5-3
Current Video Name: 20240323_122227_alone_comp_subj_5-2_and_5-3.1.videoTimeStamps.cameraHWSync
Current Video Name: 20240323_122227_alone_comp_subj_5-2_and_5-3.2.videoTimeStamps.cameraHWSync
Current Session: 20240320_142408_alone_comp_subj_3-1_and_3-3
Current Video Name: 20240320_142408_alone_comp_subj_3-1_and_3-3.2.videoTimeStamps.cameraHWSync
Current Video Name: 20240320_142408_alone_comp_subj_3-1_and_3-3.1.videoTimeStamps.cameraHWSync
Current Session: 20240322_160946_alone_comp_subj_4-3_and_4-4
Current Video Name: 20240322_160946_alone_comp_subj_4-3_and_4-4.2.videoTimeStamps.cameraHWSync
Current Video Name: 20240322_160946_alone_comp_subj_4-3_and_4-4.1.videoTimeStamps.cameraHWS
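
The two name-parsing steps used in the loops above can be checked in isolation. In this sketch the `/data/...` parent directory is a made-up placeholder; only the file naming pattern is taken from the printed output.

```python
import os

# Hypothetical parent directory; the file names follow the pattern printed above
session_path = "/data/alone_comp/20240323_144517_alone_comp_subj_3-1_and_3-4.rec"
video_path = os.path.join(
    session_path,
    "20240323_144517_alone_comp_subj_3-1_and_3-4.1.videoTimeStamps.cameraHWSync",
)

# Session name: the directory name without the ".rec" extension
session_basename = os.path.splitext(os.path.basename(session_path))[0]

# Camera number: third-from-last dot-separated part of the video file name
video_basename = os.path.basename(video_path)
camera_id = video_basename.split(".")[-3]

print(session_basename)
print(camera_id)
```

Indexing from the end keeps the camera number stable even if the session name itself were to contain a dot.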

In [20]:
session_to_trodes_data[session_basename][session_basename]["video_timestamps"]

defaultdict(dict,
            {'2': {'clock rate': '20000',
              'camera_name': 'HD USB Camera (\\\\?\\usb#vid_32e4&pid_9230&mi_00#6&315f6863&1&0000#{e5323777-f976-4f5b-9b55-b94699c46e44}\\global)',
              'fields': '<PosTimestamp uint32><HWframeCount uint32><HWTimestamp uint64>',
              'data': array([( 2203531, 0, 0), ( 2204917, 0, 0), ( 2206303, 0, 0), ...,
                     (66200216, 0, 0), (66201602, 0, 0), (66201602, 0, 0)],
                    dtype=[('PosTimestamp', '<u4'), ('HWframeCount', '<u4'), ('HWTimestamp', '<u8')]),
              'filename': '20240323_165815_alone_comp_subj_4-2_and_4-4.2.videoTimeStamps.cameraHWSync'},
             '1': {'clock rate': '20000',
              'camera_name': 'HD USB Camera (\\\\?\\usb#vid_32e4&pid_9230&mi_00#6&bec0719&2&0000#{e5323777-f976-4f5b-9b55-b94699c46e44}\\global)',
              'fields': '<PosTimestamp uint32><HWframeCount uint32><HWTimestamp uint64>',
              'data': array([( 2204917, 0, 0), ( 22

- Creating a dataframe from the dictionary with a column for:
  - Session directory
  - Recording name
  - Metadata directory
  - Metadata file
  - And a column for each metadata field

In [21]:
# Creating a dataframe from the nested dictionary
trodes_metadata_df = pd.DataFrame.from_dict({(i,j,k,l): session_to_trodes_data[i][j][k][l] 
                           for i in session_to_trodes_data.keys() 
                           for j in session_to_trodes_data[i].keys()
                           for k in session_to_trodes_data[i][j].keys()
                           for l in session_to_trodes_data[i][j][k].keys()},
                           orient='index')

# Resetting the index and renaming the columns
trodes_metadata_df = trodes_metadata_df.reset_index()
trodes_metadata_df = trodes_metadata_df.rename(columns={'level_0': 'session_dir', 'level_1': 'recording', 'level_2': 'metadata_dir', 'level_3': 'metadata_file'}, errors="ignore")

# Adding the session path to the dataframe
trodes_metadata_df["session_path"] = trodes_metadata_df["session_dir"].map(session_to_path)
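
The flattening step can be tried on a small dictionary first. The sketch below uses toy keys and metadata (all assumptions) and the same idea as the cell above: tuple keys become a four-level index, which `reset_index` turns into the columns `level_0` to `level_3`.

```python
import pandas as pd

# Toy nested dictionary: session -> recording -> directory -> file -> metadata
nested = {
    "session_1": {
        "recording_a": {
            "DIO": {"dio_ECU_Din1": {"clockrate": "20000", "direction": "input"}},
            "raw": {"timestamps": {"clockrate": "20000"}},
        }
    }
}

# One entry per file, keyed by the full path through the dictionary
flat = {
    (session, recording, directory, file): metadata
    for session, recordings in nested.items()
    for recording, directories in recordings.items()
    for directory, files in directories.items()
    for file, metadata in files.items()
}

df = pd.DataFrame.from_dict(flat, orient="index").reset_index()
df = df.rename(
    columns={
        "level_0": "session_dir",
        "level_1": "recording",
        "level_2": "metadata_dir",
        "level_3": "metadata_file",
    }
)
print(df)
```

Metadata fields that exist for only some files (here `direction`) are filled with `NaN` in the other rows, which matches the sparse columns seen in the real dataframe.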

In [22]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,direction,id,display_order,fields,data,filename,decimation,clock rate,camera_name,session_path
0,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,input,ECU_Din3,8,<time uint32><state uint8>,"[[1293017, 1], [2249095, 0], [2249492, 1], [22...",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...
1,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,output,ECU_Dout4,5,<time uint32><state uint8>,"[[1293017, 0]]",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...
2,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Dout3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,output,ECU_Dout3,4,<time uint32><state uint8>,"[[1293017, 0]]",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...
3,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,input,ECU_Din1,7,<time uint32><state uint8>,"[[1293017, 1], [2249095, 0], [3449510, 1], [36...",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...
4,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,input,ECU_Din2,6,<time uint32><state uint8>,"[[1293017, 1], [2249095, 0], [2324495, 1], [23...",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...


In [23]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,direction,id,display_order,fields,data,filename,decimation,clock rate,camera_name,session_path
163,20240322_120625_alone_comp_subj_3-3_and_3-4,20240322_120625_alone_comp_subj_3-3_and_3-4,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[3619890, 0, 0], [3620489, 0, 0], [3621276, 0...",20240322_120625_alone_comp_subj_3-3_and_3-4.1....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
164,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3,video_timestamps,2,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[2069102, 0, 0], [2070488, 0, 0], [2070488, 0...",20240320_171038_alone_comp_subj_4-2_and_4-3.2....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
165,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[2069102, 0, 0], [2070488, 0, 0], [2071874, 0...",20240320_171038_alone_comp_subj_4-2_and_4-3.1....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
166,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-2_and_4-4,video_timestamps,2,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[2203531, 0, 0], [2204917, 0, 0], [2206303, 0...",20240323_165815_alone_comp_subj_4-2_and_4-4.2....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...
167,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-2_and_4-4,video_timestamps,1,,,,,,,...,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[2204917, 0, 0], [2206048, 0, 0], [2206303, 0...",20240323_165815_alone_comp_subj_4-2_and_4-4.1....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...


- Getting the first item from each tuple in the arrays in the `data` column
  - This first item is usually just the timestamp

In [24]:
trodes_metadata_df["data"].iloc[0]

array([( 1293017, 1), ( 2249095, 0), ( 2249492, 1), ..., (33530476, 0),
       (33532081, 1), (33533079, 0)],
      dtype=[('time', '<u4'), ('state', 'u1')])

In [25]:
# Getting the name of the first field of each structured numpy array
trodes_metadata_df["first_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[0])
# Getting all the values of that first field (usually the timestamps)
trodes_metadata_df["first_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[0]])


In [26]:
# Same as above but for the last field of each structured array
trodes_metadata_df["last_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[-1])
trodes_metadata_df["last_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[-1]])
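
The field lookups above rely on NumPy structured arrays. A minimal sketch with a toy array shaped like the DIO data shows what `dtype.names` and field indexing return.

```python
import numpy as np

# Toy structured array with the same fields as a DIO export
data = np.array(
    [(1293017, 1), (2249095, 0), (2249492, 1)],
    dtype=[("time", "<u4"), ("state", "u1")],
)

# dtype.names lists the field names in order
first_name = data.dtype.names[0]
last_name = data.dtype.names[-1]

# Indexing by field name returns every value of that field as a plain array
first_values = data[first_name]
last_values = data[last_name]

print(first_name, first_values)
print(last_name, last_values)
```

For an array with a single field, such as the raw timestamps, the first and last field are the same, which is why both columns show `time` for those rows.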

In [27]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,data,filename,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
0,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[[1293017, 1], [2249095, 0], [2249492, 1], [22...",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 2249492, 2257492, 2257894, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ..."
1,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[[1293017, 0]]",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...,time,[1293017],state,[0]
2,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Dout3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[[1293017, 0]]",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...,time,[1293017],state,[0]
3,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[[1293017, 1], [2249095, 0], [3449510, 1], [36...",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 3449510, 3649512, 5449536, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ..."
4,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[[1293017, 1], [2249095, 0], [2324495, 1], [23...",20240323_144517_alone_comp_subj_3-1_t5b5_merge...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 2324495, 2340498, 2342696, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ..."


In [28]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,data,filename,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
163,20240322_120625_alone_comp_subj_3-3_and_3-4,20240322_120625_alone_comp_subj_3-3_and_3-4,video_timestamps,1,,,,,,,...,"[[3619890, 0, 0], [3620489, 0, 0], [3621276, 0...",20240322_120625_alone_comp_subj_3-3_and_3-4.1....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3619890, 3620489, 3621276, 3622662, 3624047, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
164,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3,video_timestamps,2,,,,,,,...,"[[2069102, 0, 0], [2070488, 0, 0], [2070488, 0...",20240320_171038_alone_comp_subj_4-2_and_4-3.2....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2069102, 2070488, 2070488, 2071874, 2073259, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
165,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3,video_timestamps,1,,,,,,,...,"[[2069102, 0, 0], [2070488, 0, 0], [2071874, 0...",20240320_171038_alone_comp_subj_4-2_and_4-3.1....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2069102, 2070488, 2071874, 2071874, 2073259, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
166,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-2_and_4-4,video_timestamps,2,,,,,,,...,"[[2203531, 0, 0], [2204917, 0, 0], [2206303, 0...",20240323_165815_alone_comp_subj_4-2_and_4-4.2....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2203531, 2204917, 2206303, 2206303, 2207689, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
167,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-2_and_4-4,video_timestamps,1,,,,,,,...,"[[2204917, 0, 0], [2206048, 0, 0], [2206303, 0...",20240323_165815_alone_comp_subj_4-2_and_4-4.1....,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2204917, 2206048, 2206303, 2207689, 2209075, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [29]:
trodes_metadata_df["recording"].unique()

array(['20240323_144517_alone_comp_subj_3-1_t5b5_merged',
       '20240323_144517_alone_comp_subj_3-4_t6b6_merged',
       '20240323_122227_alone_comp_subj_5-2_t6b6_merged',
       '20240323_122227_alone_comp_subj_5-3_t5b5_merged',
       '20240320_142408_alone_comp_subj_3-3_t5b5_merged',
       '20240320_142408_alone_comp_subj_3-1_t6b6_merged',
       '20240322_160946_alone_comp_subj_4-4_t5b5_merged',
       '20240322_160946_alone_comp_subj_4-3_t6b6_merged',
       '20240322_120625_alone_comp_subj_3-3_t6b6_merged',
       '20240322_120625_alone_comp_subj_3-4_t5b5_merged',
       '20240320_171038_alone_comp_subj_4-2_t6b6_merged',
       '20240320_171038_alone_comp_subj_4-3_t5b5_merged',
       '20240323_165815_alone_comp_subj_4-2_t5b5_merged',
       '20240323_165815_alone_comp_subj_4-4_t6b6_merged',
       '20240323_144517_alone_comp_subj_3-1_and_3-4',
       '20240323_122227_alone_comp_subj_5-2_and_5-3',
       '20240320_142408_alone_comp_subj_3-1_and_3-3',
       '20240322_160946_al

## Getting the subject information from the metadata

In [30]:
def split_by_multiple_delimiters(s, delimiters):
    """
    Splits a string by multiple delimiters.

    Parameters:
    - s (str): The string to split.
    - delimiters (list): A list of delimiters to split the string by.

    Returns:
    - list: A list of substrings.
    """
    return re.split('|'.join(map(re.escape, delimiters)), s)


In [31]:
# Taking the text after "subj" and turning subject IDs such as 3-1 into 3.1
trodes_metadata_df["all_subjects"] = trodes_metadata_df["session_dir"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", "."))
trodes_metadata_df["all_subjects"] = trodes_metadata_df["all_subjects"].apply(lambda x: sorted(extract_floats(x)))

In [32]:
trodes_metadata_df["session_dir"].iloc[0]

'20240323_144517_alone_comp_subj_3-1_and_3-4'

In [33]:
trodes_metadata_df["all_subjects"].apply(lambda x: tuple(x)).unique()

array([('3.1', '3.4'), ('5.2', '5.3'), ('3.1', '3.3'), ('4.3', '4.4'),
       ('3.3', '3.4'), ('4.2', '4.3'), ('4.2', '4.4')], dtype=object)

In [34]:
# Taking the text after "subj" in the recording name and keeping the first number as the subject ID
trodes_metadata_df["current_subject"] = trodes_metadata_df["recording"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", ".").replace("_", "."))
trodes_metadata_df["current_subject"] = trodes_metadata_df["current_subject"].apply(lambda x: str(extract_floats(x)[0]).strip())


In [35]:
trodes_metadata_df["current_subject"].unique()

array(['3.1', '3.4', '5.2', '5.3', '3.3', '4.4', '4.3', '4.2'],
      dtype=object)
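
The subject parsing can be traced on a single session and recording name. The sketch below carries its own copy of a number-extracting helper so that it runs on its own; the names are taken from the output above.

```python
import re


def extract_floats(s):
    """Local helper: find numbers in a string, return them as float-formatted strings."""
    return [str(float(num)) for num in re.findall(r"[-+]?(?:\d*\.\d+|\d+)", s)]


session_dir = "20240323_144517_alone_comp_subj_3-1_and_3-4"
recording = "20240323_144517_alone_comp_subj_3-1_t5b5_merged"

# "3-1_and_3-4" becomes "3.1_and_3.4", then both numbers are pulled out
subject_part = session_dir.split("subj")[-1].strip("_").replace("-", ".")
all_subjects = sorted(extract_floats(subject_part))

# "3-1_t5b5_merged" becomes "3.1.t5b5.merged"; the first number is the subject
recording_part = recording.split("subj")[-1].strip("_").replace("-", ".").replace("_", ".")
current_subject = extract_floats(recording_part)[0]

print(all_subjects)
print(current_subject)
```

Because the IDs pass through `float`, an ID such as `3-10` would come out as `3.1` and collide with `3-1`. That does not happen with the subjects in this cohort, but it is worth checking before reusing the approach.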

## Dropping all the rows with unneeded metadata

In [36]:
trodes_metadata_df["metadata_dir"].unique()

array(['DIO', 'raw', 'time', 'video_timestamps'], dtype=object)

In [37]:
METADATA_TO_KEEP = ['raw', 'DIO', 'video_timestamps']

In [38]:
trodes_metadata_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(METADATA_TO_KEEP)]

In [39]:
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("out")]
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("coordinates")]


In [40]:
trodes_metadata_df = trodes_metadata_df.reset_index(drop=True)
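
The same row filtering can be sketched on a toy dataframe. The file names below are illustrative assumptions; the two `str.contains` calls from the cells above are combined into one pattern here, which selects the same rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "metadata_dir": ["DIO", "DIO", "raw", "time", "video_timestamps"],
        "metadata_file": ["dio_ECU_Din1", "dio_ECU_Dout3", "timestamps", "time", "1"],
    }
)

METADATA_TO_KEEP = ["raw", "DIO", "video_timestamps"]

# Keep only the wanted metadata directories
keep_dir = df["metadata_dir"].isin(METADATA_TO_KEEP)
# Drop digital outputs and coordinate files by substring match
drop_file = df["metadata_file"].str.contains("out|coordinates")

filtered = df[keep_dir & ~drop_file].reset_index(drop=True)
print(filtered)
```

The substring match on `"out"` removes any file whose name contains those letters anywhere, so it is worth confirming that no needed file is caught by it.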

# Getting the first time stamp of each recording

In [41]:
trodes_raw_df = trodes_metadata_df[(trodes_metadata_df["metadata_dir"] == "raw") & (trodes_metadata_df["metadata_file"] == "timestamps")].copy()


In [42]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
4,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,raw,timestamps,Raw timestamps,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 1293018, 1293019, 1293020, 1293021, ...",time,"[1293017, 1293018, 1293019, 1293020, 1293021, ...","[3.1, 3.4]",3.1
9,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-4_t6b6_merged,raw,timestamps,Raw timestamps,little endian,20240323_144517_alone_comp_subj_3-4_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 1293018, 1293019, 1293020, 1293021, ...",time,"[1293017, 1293018, 1293019, 1293020, 1293021, ...","[3.1, 3.4]",3.4
10,20240323_122227_alone_comp_subj_5-2_and_5-3,20240323_122227_alone_comp_subj_5-2_t6b6_merged,raw,timestamps,Raw timestamps,little endian,20240323_122227_along_comp_subj_5-2_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2058017, 2058018, 2058019, 2058020, 2058021, ...",time,"[2058017, 2058018, 2058019, 2058020, 2058021, ...","[5.2, 5.3]",5.2
19,20240323_122227_alone_comp_subj_5-2_and_5-3,20240323_122227_alone_comp_subj_5-3_t5b5_merged,raw,timestamps,Raw timestamps,little endian,20240323_122227_along_comp_subj_5-3_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[2058017, 2058018, 2058019, 2058020, 2058021, ...",time,"[2058017, 2058018, 2058019, 2058020, 2058021, ...","[5.2, 5.3]",5.3
20,20240320_142408_alone_comp_subj_3-1_and_3-3,20240320_142408_alone_comp_subj_3-3_t5b5_merged,raw,timestamps,Raw timestamps,little endian,20240320_142408_alone_comp_subj_3-3_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1830734, 1830735, 1830736, 1830737, 1830738, ...",time,"[1830734, 1830735, 1830736, 1830737, 1830738, ...","[3.1, 3.3]",3.3


In [43]:
trodes_raw_df["first_timestamp"] = trodes_raw_df["first_item_data"].apply(lambda x: x[0])

In [44]:
trodes_raw_df["recording"].iloc[0]

'20240323_144517_alone_comp_subj_3-1_t5b5_merged'

In [45]:
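# Despite the variable name, the keys are session directories; recordings that share a session collapse into one entry (the last one wins)
recording_to_first_timestamp = trodes_raw_df.set_index('session_dir')['first_timestamp'].to_dict()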

In [46]:
recording_to_first_timestamp

{'20240323_144517_alone_comp_subj_3-1_and_3-4': 1293017,
 '20240323_122227_alone_comp_subj_5-2_and_5-3': 2058017,
 '20240320_142408_alone_comp_subj_3-1_and_3-3': 1830734,
 '20240322_160946_alone_comp_subj_4-3_and_4-4': 5331441,
 '20240322_120625_alone_comp_subj_3-3_and_3-4': 3618506,
 '20240320_171038_alone_comp_subj_4-2_and_4-3': 2067718,
 '20240323_165815_alone_comp_subj_4-2_and_4-4': 2203533}

In [47]:
trodes_metadata_df["first_timestamp"] = trodes_metadata_df["session_dir"].map(recording_to_first_timestamp)

In [48]:
trodes_metadata_df["first_timestamp"]

0     1293017
1     1293017
2     1293017
3     1293017
4     1293017
       ...   
79    3618506
80    2067718
81    2067718
82    2203533
83    2203533
Name: first_timestamp, Length: 84, dtype: int64
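
The lookup-and-map pattern used for the first timestamps can be sketched on toy data. The session names and values below are made up, and the explicit `drop_duplicates` is an addition for clarity rather than something the notebook does.

```python
import pandas as pd

# Toy stand-in for the raw timestamp rows: two recordings in s1, one in s2
raw_df = pd.DataFrame(
    {
        "session_dir": ["s1", "s1", "s2"],
        "recording": ["s1_a", "s1_b", "s2_a"],
        "first_item_data": [[100, 101, 102], [100, 101, 102], [250, 251]],
    }
)
raw_df["first_timestamp"] = raw_df["first_item_data"].apply(lambda x: x[0])

# One entry per session; dropping duplicates makes the choice explicit
session_to_first = (
    raw_df.drop_duplicates("session_dir")
    .set_index("session_dir")["first_timestamp"]
    .to_dict()
)

# Mapping the lookup onto every row of another dataframe by session
meta_df = pd.DataFrame({"session_dir": ["s1", "s1", "s2", "s3"]})
meta_df["first_timestamp"] = meta_df["session_dir"].map(session_to_first)
print(meta_df)
```

Sessions missing from the lookup (here `s3`) come back as `NaN`, which is a quick way to spot recordings without a raw timestamp file.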

# Getting the event timestamps

In [49]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 2249492, 2257492, 2257894, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1
1,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 3449510, 3649512, 5449536, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1
2,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 2324495, 2340498, 2342696, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1
3,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,[1293017],state,[0],"[3.1, 3.4]",3.1
4,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,raw,timestamps,Raw timestamps,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 1293018, 1293019, 1293020, 1293021, ...",time,"[1293017, 1293018, 1293019, 1293020, 1293021, ...","[3.1, 3.4]",3.1


In [50]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
79,20240322_120625_alone_comp_subj_3-3_and_3-4,20240322_120625_alone_comp_subj_3-3_and_3-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3619890, 3620489, 3621276, 3622662, 3624047, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.3, 3.4]",3.3
80,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2069102, 2070488, 2070488, 2071874, 2073259, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.3]",4.2
81,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2069102, 2070488, 2071874, 2071874, 2073259, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.3]",4.2
82,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-2_and_4-4,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2203531, 2204917, 2206303, 2206303, 2207689, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.4]",4.2
83,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-2_and_4-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2204917, 2206048, 2206303, 2207689, 2209075, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[4.2, 4.4]",4.2


In [51]:
# trodes_state_df = trodes_metadata_df[trodes_metadata_df["last_dtype_name"] == "state"].copy()

# Filtering for digital IO channels
trodes_state_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(["DIO"])].copy()
# Filtering the DIO rows for tone and port entry related channels
trodes_state_df = trodes_state_df[trodes_state_df["id"].isin(["ECU_Din1", "ECU_Din2", "ECU_Din3"])].copy()


In [52]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 2249492, 2257492, 2257894, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1
1,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 3449510, 3649512, 5449536, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1
2,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 2324495, 2340498, 2342696, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1
5,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-4_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 2324495, 2340498, 2342696, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.4
6,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-4_t6b6_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-4_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[1293017, 2249095, 2249492, 2257492, 2257894, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.4


In [53]:
# Pairing the index of each change to state 1 with the index of the state change that follows it
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: np.column_stack([np.where(x["last_item_data"] == 1)[0], np.where(x["last_item_data"] == 1)[0]+1]), axis=1)

In [54]:
# Dropping pairs whose second index falls past the end of the timestamp array
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: x["event_indexes"][x["event_indexes"][:, 1] <= x["first_item_data"].shape[0] - 1], axis=1)

In [55]:
trodes_state_df["event_timestamps"] = trodes_state_df.apply(lambda x: x["first_item_data"][x["event_indexes"]], axis=1)
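
The three cells above can be traced on a single toy channel. This is a minimal sketch with made-up timestamps and states; only the indexing logic mirrors the notebook.

```python
import numpy as np

# Toy DIO channel: time of each state change and the state entered at that time
times = np.array([100, 250, 400, 900, 1200], dtype=np.uint32)
states = np.array([1, 0, 1, 0, 1])

# Index of every change to state 1, paired with the index of the next change
on_idx = np.where(states == 1)[0]
event_indexes = np.column_stack([on_idx, on_idx + 1])

# The last "on" has no following change, so its pair is dropped
event_indexes = event_indexes[event_indexes[:, 1] <= times.shape[0] - 1]

# Indexing with an (n, 2) integer array returns (n, 2) start/stop timestamps
event_timestamps = times[event_indexes]
print(event_timestamps)
```

Each row of `event_timestamps` is a `[start, stop]` pair, which matches the shape of the arrays shown later in the notebook.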

## Updating the video timestamps

## Syncing up the video frame data

In [56]:
# Getting the rows that are the metadata for the video timestamps
trodes_video_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"] == "video_timestamps"].copy().reset_index(drop=True)



In [57]:
# Filtering for the first video only
# This only applies to the pilot data, where we are only looking at the competition data
# trodes_video_df = trodes_video_df[trodes_video_df["metadata_file"] == "1"].copy()

In [58]:
trodes_video_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,decimation,clock rate,camera_name,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_and_3-4,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1294401, 1294401, 1295787, 1295787, 1297172, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.4]",3.1
1,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_and_3-4,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1293015, 1294401, 1295787, 1295787, 1297172, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.4]",3.1
2,20240323_122227_alone_comp_subj_5-2_and_5-3,20240323_122227_alone_comp_subj_5-2_and_5-3,video_timestamps,1,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2058015, 2058015, 2059401, 2060787, 2061424, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[5.2, 5.3]",5.2
3,20240323_122227_alone_comp_subj_5-2_and_5-3,20240323_122227_alone_comp_subj_5-2_and_5-3,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[2058015, 2058015, 2059401, 2060787, 2062172, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[5.2, 5.3]",5.2
4,20240320_142408_alone_comp_subj_3-1_and_3-3,20240320_142408_alone_comp_subj_3-1_and_3-3,video_timestamps,2,,,,,,,...,,20000,HD USB Camera (\\?\usb#vid_32e4&pid_9230&mi_00...,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[1832066, 1832118, 1833504, 1834890, 1836001, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[3.1, 3.3]",3.1


In [59]:
# Copying the video frame timestamps into a dedicated column
trodes_video_df["video_timestamps"] = trodes_video_df["first_item_data"]

In [60]:
# Removing the columns that are no longer needed
trodes_video_df = trodes_video_df[["filename", "video_timestamps", "session_dir"]].copy()

In [61]:
# Renaming the filename so that we can merge with other dataframes with the same column name
trodes_video_df = trodes_video_df.rename(columns={"filename": "video_name"})

In [62]:
trodes_video_df.head()

Unnamed: 0,video_name,video_timestamps,session_dir
0,20240323_144517_alone_comp_subj_3-1_and_3-4.1....,"[1294401, 1294401, 1295787, 1295787, 1297172, ...",20240323_144517_alone_comp_subj_3-1_and_3-4
1,20240323_144517_alone_comp_subj_3-1_and_3-4.2....,"[1293015, 1294401, 1295787, 1295787, 1297172, ...",20240323_144517_alone_comp_subj_3-1_and_3-4
2,20240323_122227_alone_comp_subj_5-2_and_5-3.1....,"[2058015, 2058015, 2059401, 2060787, 2061424, ...",20240323_122227_alone_comp_subj_5-2_and_5-3
3,20240323_122227_alone_comp_subj_5-2_and_5-3.2....,"[2058015, 2058015, 2059401, 2060787, 2062172, ...",20240323_122227_alone_comp_subj_5-2_and_5-3
4,20240320_142408_alone_comp_subj_3-1_and_3-3.2....,"[1832066, 1832118, 1833504, 1834890, 1836001, ...",20240320_142408_alone_comp_subj_3-1_and_3-3


- Pairing each state row with every video from the same session (one row per state and video combination)

In [63]:
trodes_state_df = pd.merge(trodes_state_df, trodes_video_df, on=["session_dir"], how="inner")
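
Because `session_dir` repeats on both sides, this merge multiplies rows. A minimal sketch with made-up session and video names (`s1`, `s1.1`, and so on are placeholders, not files from this dataset):

```python
import pandas as pd

# Hypothetical state rows: two channels for session s1, one for s2
state_df = pd.DataFrame({
    "session_dir": ["s1", "s1", "s2"],
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din1"],
})
# Hypothetical video rows: two videos for s1, one for s2
video_df = pd.DataFrame({
    "session_dir": ["s1", "s1", "s2"],
    "video_name": ["s1.1", "s1.2", "s2.1"],
})

# Every state row is paired with every video of the same session
merged = pd.merge(state_df, video_df, on=["session_dir"], how="inner")
print(merged)
```

Session `s1` contributes 2 x 2 rows and `s2` contributes 1 x 1, so the merged frame has five rows.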

In [64]:
trodes_state_df.columns

Index(['session_dir', 'recording', 'metadata_dir', 'metadata_file',
       'description', 'byte_order', 'original_file', 'clockrate',
       'trodes_version', 'compile_date', 'compile_time', 'qt_version',
       'commit_tag', 'controller_firmware', 'headstage_firmware',
       'controller_serialnum', 'headstage_serialnum', 'autosettle', 'smartref',
       'gyro', 'accelerometer', 'magnetometer', 'time_offset',
       'system_time_at_creation', 'timestamp_at_creation', 'first_timestamp',
       'direction', 'id', 'display_order', 'fields', 'data', 'filename',
       'decimation', 'clock rate', 'camera_name', 'session_path',
       'first_dtype_name', 'first_item_data', 'last_dtype_name',
       'last_item_data', 'all_subjects', 'current_subject', 'event_indexes',
       'event_timestamps', 'video_name', 'video_timestamps'],
      dtype='object')

## Finding the closest frame to each event

In [65]:
trodes_state_df["event_timestamps"].iloc[1]

array([[ 1293017,  2249095],
       [ 2249492,  2257492],
       [ 2257894,  2260895],
       ...,
       [33527276, 33528276],
       [33529479, 33530476],
       [33532081, 33533079]], dtype=uint32)

In [66]:
trodes_state_df["event_frames"] = trodes_state_df.apply(lambda x: utilities.helper.find_nearest_indices(x["event_timestamps"], x["video_timestamps"]), axis=1)
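
The implementation of `utilities.helper.find_nearest_indices` is not shown in this notebook, so the following is only a sketch of one way a nearest-frame lookup could be written. The function name `nearest_frame_indices` is hypothetical, and the sketch assumes the frame timestamps are sorted and contain at least two entries. The real helper may behave differently at boundaries: in the output above, event time 1293017 maps to frame 1 for the second video even though frame 0 (1293015) is closer, so it may not be a strict nearest-neighbour search.

```python
import numpy as np

def nearest_frame_indices(event_timestamps, frame_timestamps):
    """Hypothetical stand-in: index of the frame closest in time to each event timestamp."""
    frames = np.asarray(frame_timestamps, dtype=np.int64)
    events = np.asarray(event_timestamps, dtype=np.int64)
    # Insertion point of each event in the sorted frame times, kept inside [1, len - 1]
    right = np.clip(np.searchsorted(frames, events), 1, len(frames) - 1)
    left = right - 1
    # Choose the closer neighbour; ties go to the earlier frame
    use_right = (frames[right] - events) < (events - frames[left])
    return np.where(use_right, right, left)

# Toy data: four frames and two [start, stop] events
frames = np.array([1000, 2400, 3800, 5200], dtype=np.uint32)
events = np.array([[900, 2500], [3700, 6000]], dtype=np.uint32)
event_frames = nearest_frame_indices(events, frames)
print(event_frames)
```

The output keeps the `(n, 2)` shape of the input, so each row is a `[start_frame, stop_frame]` pair.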

In [67]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 2249492, 2257492, 2257894, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [2249492, 2257492], [2257...",20240323_144517_alone_comp_subj_3-1_and_3-4.1....,"[1294401, 1294401, 1295787, 1295787, 1297172, ...","[[0, 473], [475, 481], [483, 486], [486, 487],..."
1,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 2249492, 2257492, 2257894, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [2249492, 2257492], [2257...",20240323_144517_alone_comp_subj_3-1_and_3-4.2....,"[1293015, 1294401, 1295787, 1295787, 1297172, ...","[[1, 473], [475, 481], [483, 486], [486, 487],..."
2,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 3449510, 3649512, 5449536, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [3449510, 3649512], [5449...",20240323_144517_alone_comp_subj_3-1_and_3-4.1....,"[1294401, 1294401, 1295787, 1295787, 1297172, ...","[[0, 473], [1673, 1872], [3669, 3868], [4867, ..."
3,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 3449510, 3649512, 5449536, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [3449510, 3649512], [5449...",20240323_144517_alone_comp_subj_3-1_and_3-4.2....,"[1293015, 1294401, 1295787, 1295787, 1297172, ...","[[1, 473], [1673, 1872], [3669, 3868], [4867, ..."
4,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 2324495, 2340498, 2342696, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [2324495, 2340498], [2342...",20240323_144517_alone_comp_subj_3-1_and_3-4.1....,"[1294401, 1294401, 1295787, 1295787, 1297172, ...","[[0, 473], [549, 565], [568, 580], [814, 816],..."


## Combine raw and state dataframes

In [68]:
trodes_state_df

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 2249492, 2257492, 2257894, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [2249492, 2257492], [2257...",20240323_144517_alone_comp_subj_3-1_and_3-4.1....,"[1294401, 1294401, 1295787, 1295787, 1297172, ...","[[0, 473], [475, 481], [483, 486], [486, 487],..."
1,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 2249492, 2257492, 2257894, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [2249492, 2257492], [2257...",20240323_144517_alone_comp_subj_3-1_and_3-4.2....,"[1293015, 1294401, 1295787, 1295787, 1297172, ...","[[1, 473], [475, 481], [483, 486], [486, 487],..."
2,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 3449510, 3649512, 5449536, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [3449510, 3649512], [5449...",20240323_144517_alone_comp_subj_3-1_and_3-4.1....,"[1294401, 1294401, 1295787, 1295787, 1297172, ...","[[0, 473], [1673, 1872], [3669, 3868], [4867, ..."
3,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 3449510, 3649512, 5449536, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [3449510, 3649512], [5449...",20240323_144517_alone_comp_subj_3-1_and_3-4.2....,"[1293015, 1294401, 1295787, 1295787, 1297172, ...","[[1, 473], [1673, 1872], [3669, 3868], [4867, ..."
4,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,20000,2.3.4,Nov 28 2022,...,"[1293017, 2249095, 2324495, 2340498, 2342696, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[3.1, 3.4]",3.1,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[1293017, 2249095], [2324495, 2340498], [2342...",20240323_144517_alone_comp_subj_3-1_and_3-4.1....,"[1294401, 1294401, 1295787, 1295787, 1297172, ...","[[0, 473], [549, 565], [568, 580], [814, 816],..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-4_t6b6_merged,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20240323_165815_alone_comp_subj_4-4_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,"[2203533, 2267319, 3467734, 3667734, 5467759, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.2, 4.4]",4.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2203533, 2267319], [3467734, 3667734], [5467...",20240323_165815_alone_comp_subj_4-2_and_4-4.1....,"[2204917, 2206048, 2206303, 2207689, 2209075, ...","[[0, 64], [1262, 1461], [3258, 3457], [4456, 4..."
80,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-4_t6b6_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_165815_alone_comp_subj_4-4_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,"[2203533, 2267319, 2268119, 2268919, 2618523, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.2, 4.4]",4.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2203533, 2267319], [2268119, 2268919], [2618...",20240323_165815_alone_comp_subj_4-2_and_4-4.2....,"[2203531, 2204917, 2206303, 2206303, 2207689, ...","[[1, 65], [65, 66], [414, 433], [434, 437], [4..."
81,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-4_t6b6_merged,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20240323_165815_alone_comp_subj_4-4_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,"[2203533, 2267319, 2268119, 2268919, 2618523, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.2, 4.4]",4.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2203533, 2267319], [2268119, 2268919], [2618...",20240323_165815_alone_comp_subj_4-2_and_4-4.1....,"[2204917, 2206048, 2206303, 2207689, 2209075, ...","[[0, 64], [64, 65], [414, 432], [433, 436], [4..."
82,20240323_165815_alone_comp_subj_4-2_and_4-4,20240323_165815_alone_comp_subj_4-4_t6b6_merged,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20240323_165815_alone_comp_subj_4-4_t6b6_merge...,20000,2.3.4,Nov 28 2022,...,"[2203533, 2267319, 2359520, 2361320, 2361920, ...",state,"[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...","[4.2, 4.4]",4.4,"[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, ...","[[2203533, 2267319], [2359520, 2361320], [2361...",20240323_165815_alone_comp_subj_4-2_and_4-4.2....,"[2203531, 2204917, 2206303, 2206303, 2207689, ...","[[1, 65], [156, 157], [159, 174], [174, 182], ..."


In [69]:
trodes_state_df = trodes_state_df[STATE_COLS_TO_KEEP].drop_duplicates(subset=["session_dir", "video_name", "metadata_file"]).sort_values(["session_dir", "video_name", "metadata_file"]).reset_index(drop=True).copy()

In [70]:
trodes_state_df.head()

Unnamed: 0,session_dir,metadata_file,event_timestamps,video_name,video_timestamps,event_frames
0,20240320_142408_alone_comp_subj_3-1_and_3-3,dio_ECU_Din1,"[[1830734, 1906208], [3106623, 3306625], [5106...",20240320_142408_alone_comp_subj_3-1_and_3-3.1....,"[1832118, 1833504, 1834890, 1834890, 1836276, ...","[[0, 74], [1272, 1471], [3268, 3469], [4466, 4..."
1,20240320_142408_alone_comp_subj_3-1_and_3-3,dio_ECU_Din2,"[[1830734, 1906208], [1983809, 1989608], [2083...",20240320_142408_alone_comp_subj_3-1_and_3-3.1....,"[1832118, 1833504, 1834890, 1834890, 1836276, ...","[[0, 74], [152, 157], [252, 305], [305, 323], ..."
2,20240320_142408_alone_comp_subj_3-1_and_3-3,dio_ECU_Din3,"[[1830734, 1906208], [1992808, 2056807], [2085...",20240320_142408_alone_comp_subj_3-1_and_3-3.1....,"[1832118, 1833504, 1834890, 1834890, 1836276, ...","[[0, 74], [160, 225], [254, 257], [257, 279], ..."
3,20240320_142408_alone_comp_subj_3-1_and_3-3,dio_ECU_Din1,"[[1830734, 1906208], [3106623, 3306625], [5106...",20240320_142408_alone_comp_subj_3-1_and_3-3.2....,"[1832066, 1832118, 1833504, 1834890, 1836001, ...","[[0, 75], [1273, 1472], [3269, 3470], [4467, 4..."
4,20240320_142408_alone_comp_subj_3-1_and_3-3,dio_ECU_Din2,"[[1830734, 1906208], [1983809, 1989608], [2083...",20240320_142408_alone_comp_subj_3-1_and_3-3.2....,"[1832066, 1832118, 1833504, 1834890, 1836001, ...","[[0, 75], [153, 158], [252, 306], [306, 324], ..."


In [71]:
trodes_state_df = trodes_state_df.groupby(same_columns).agg({**{col: 'first' for col in trodes_state_df.columns if col not in same_columns + different_columns}, **{col: lambda x: x.tolist() for col in different_columns}}).reset_index()
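
`same_columns` and `different_columns` are defined outside this section. The sketch below assumes `same_columns` holds the grouping keys and `different_columns` holds the per-channel columns that should be collected into lists; the toy dataframe and its values are made up for illustration.

```python
import pandas as pd

# Toy frame: one video, three DIO channels
df = pd.DataFrame({
    "session_dir": ["s1", "s1", "s1"],
    "video_name": ["s1.1", "s1.1", "s1.1"],
    "video_timestamps": [[10, 20], [10, 20], [10, 20]],
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din3"],
    "event_frames": [[[0, 1]], [[2, 3]], [[4, 5]]],
})

# Assumed values; the notebook defines its own lists elsewhere
same_columns = ["session_dir", "video_name"]
different_columns = ["metadata_file", "event_frames"]

agg_spec = {
    # Columns that are identical within a group: keep the first value
    **{col: "first" for col in df.columns if col not in same_columns + different_columns},
    # Columns that differ per channel: collect them into one list per group
    **{col: (lambda x: x.tolist()) for col in different_columns},
}
grouped = df.groupby(same_columns).agg(agg_spec).reset_index()
print(grouped)
```

The three channel rows collapse into one row per video, with the channel-specific columns stored as lists in the original row order.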

In [72]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,metadata_file,event_frames,event_timestamps
0,20240320_142408_alone_comp_subj_3-1_and_3-3,20240320_142408_alone_comp_subj_3-1_and_3-3.1....,"[1832118, 1833504, 1834890, 1834890, 1836276, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 74], [1272, 1471], [3268, 3469], [4466, ...","[[[1830734, 1906208], [3106623, 3306625], [510..."
1,20240320_142408_alone_comp_subj_3-1_and_3-3,20240320_142408_alone_comp_subj_3-1_and_3-3.2....,"[1832066, 1832118, 1833504, 1834890, 1836001, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 75], [1273, 1472], [3269, 3470], [4467, ...","[[[1830734, 1906208], [3106623, 3306625], [510..."
2,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3.1....,"[2069102, 2070488, 2071874, 2071874, 2073259, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 79], [1276, 1477], [3272, 3473], [4470, ...","[[[2067718, 2147462], [3347876, 3547879], [534..."
3,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3.2....,"[2069102, 2070488, 2070488, 2071874, 2073259, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 79], [1276, 1477], [3273, 3473], [4471, ...","[[[2067718, 2147462], [3347876, 3547879], [534..."
4,20240322_120625_alone_comp_subj_3-3_and_3-4,20240322_120625_alone_comp_subj_3-3_and_3-4.1....,"[3619890, 3620489, 3621276, 3622662, 3624047, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 101], [1299, 1500], [3297, 3496], [4494,...","[[[3618506, 3720760], [4921177, 5121179], [692..."


In [73]:
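# The per-channel lists follow the metadata_file order: dio_ECU_Din1 (tone), dio_ECU_Din2 (box 1 port entry), dio_ECU_Din3 (box 2 port entry)
trodes_state_df["tone_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[0])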
trodes_state_df["box_1_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[2])

trodes_state_df["tone_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[0])
trodes_state_df["box_1_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[2])


In [74]:
trodes_state_df = trodes_state_df.drop(columns=["event_timestamps", "event_frames", "metadata_file"], errors="ignore")

In [75]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames
0,20240320_142408_alone_comp_subj_3-1_and_3-3,20240320_142408_alone_comp_subj_3-1_and_3-3.1....,"[1832118, 1833504, 1834890, 1834890, 1836276, ...","[[1830734, 1906208], [3106623, 3306625], [5106...","[[1830734, 1906208], [1983809, 1989608], [2083...","[[1830734, 1906208], [1992808, 2056807], [2085...","[[0, 74], [1272, 1471], [3268, 3469], [4466, 4...","[[0, 74], [152, 157], [252, 305], [305, 323], ...","[[0, 74], [160, 225], [254, 257], [257, 279], ..."
1,20240320_142408_alone_comp_subj_3-1_and_3-3,20240320_142408_alone_comp_subj_3-1_and_3-3.2....,"[1832066, 1832118, 1833504, 1834890, 1836001, ...","[[1830734, 1906208], [3106623, 3306625], [5106...","[[1830734, 1906208], [1983809, 1989608], [2083...","[[1830734, 1906208], [1992808, 2056807], [2085...","[[0, 75], [1273, 1472], [3269, 3470], [4467, 4...","[[0, 75], [153, 158], [252, 306], [306, 324], ...","[[0, 75], [161, 226], [255, 258], [258, 280], ..."
2,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3.1....,"[2069102, 2070488, 2071874, 2071874, 2073259, ...","[[2067718, 2147462], [3347876, 3547879], [5347...","[[2067718, 2147462], [2153459, 2160659], [2161...","[[2067718, 2147462], [2147859, 2255860], [2256...","[[0, 79], [1276, 1477], [3272, 3473], [4470, 4...","[[0, 79], [84, 92], [92, 132], [145, 175], [25...","[[0, 79], [79, 186], [188, 191], [191, 268], [..."
3,20240320_171038_alone_comp_subj_4-2_and_4-3,20240320_171038_alone_comp_subj_4-2_and_4-3.2....,"[2069102, 2070488, 2070488, 2071874, 2073259, ...","[[2067718, 2147462], [3347876, 3547879], [5347...","[[2067718, 2147462], [2153459, 2160659], [2161...","[[2067718, 2147462], [2147859, 2255860], [2256...","[[0, 79], [1276, 1477], [3273, 3473], [4471, 4...","[[0, 79], [84, 93], [93, 133], [145, 175], [25...","[[0, 79], [79, 187], [188, 191], [191, 268], [..."
4,20240322_120625_alone_comp_subj_3-3_and_3-4,20240322_120625_alone_comp_subj_3-3_and_3-4.1....,"[3619890, 3620489, 3621276, 3622662, 3624047, ...","[[3618506, 3720760], [4921177, 5121179], [6921...","[[3618506, 3720760], [3721162, 3724162], [3800...","[[3618506, 3720760], [3841964, 3867564], [3906...","[[0, 101], [1299, 1500], [3297, 3496], [4494, ...","[[0, 101], [102, 105], [181, 226], [226, 260],...","[[0, 101], [223, 248], [286, 292], [293, 302],..."


In [76]:
trodes_raw_df = trodes_raw_df[RAW_COLS_TO_KEEP].reset_index(drop=True).copy()

In [77]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,original_file,session_path,current_subject,first_item_data,first_timestamp,all_subjects
0,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-1_t5b5_merged,20240323_144517_alone_comp_subj_3-1_t5b5_merge...,/scratch/back_up/reward_competition_extention/...,3.1,"[1293017, 1293018, 1293019, 1293020, 1293021, ...",1293017,"[3.1, 3.4]"
1,20240323_144517_alone_comp_subj_3-1_and_3-4,20240323_144517_alone_comp_subj_3-4_t6b6_merged,20240323_144517_alone_comp_subj_3-4_t6b6_merge...,/scratch/back_up/reward_competition_extention/...,3.4,"[1293017, 1293018, 1293019, 1293020, 1293021, ...",1293017,"[3.1, 3.4]"
2,20240323_122227_alone_comp_subj_5-2_and_5-3,20240323_122227_alone_comp_subj_5-2_t6b6_merged,20240323_122227_along_comp_subj_5-2_t6b6_merge...,/scratch/back_up/reward_competition_extention/...,5.2,"[2058017, 2058018, 2058019, 2058020, 2058021, ...",2058017,"[5.2, 5.3]"
3,20240323_122227_alone_comp_subj_5-2_and_5-3,20240323_122227_alone_comp_subj_5-3_t5b5_merged,20240323_122227_along_comp_subj_5-3_t5b5_merge...,/scratch/back_up/reward_competition_extention/...,5.3,"[2058017, 2058018, 2058019, 2058020, 2058021, ...",2058017,"[5.2, 5.3]"
4,20240320_142408_alone_comp_subj_3-1_and_3-3,20240320_142408_alone_comp_subj_3-3_t5b5_merged,20240320_142408_alone_comp_subj_3-3_t5b5_merge...,/scratch/back_up/reward_competition_extention/...,3.3,"[1830734, 1830735, 1830736, 1830737, 1830738, ...",1830734,"[3.1, 3.3]"


In [78]:
trodes_final_df = pd.merge(trodes_raw_df, trodes_state_df, on=["session_dir"], how="inner")

In [79]:
trodes_final_df.shape

(28, 16)

In [80]:
trodes_final_df = trodes_final_df.rename(columns={"first_item_data": "raw_timestamps"})
trodes_final_df = trodes_final_df.drop(columns=["metadata_file"], errors="ignore")
trodes_final_df = trodes_final_df.sort_values(["session_dir", "recording"]).reset_index(drop=True).copy()

## Making the timestamps 0 indexed

In [81]:
trodes_final_df[[col for col in trodes_final_df.columns if "timestamps" in col]].head()

Unnamed: 0,raw_timestamps,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,"[1830734, 1830735, 1830736, 1830737, 1830738, ...","[1832118, 1833504, 1834890, 1834890, 1836276, ...","[[1830734, 1906208], [3106623, 3306625], [5106...","[[1830734, 1906208], [1983809, 1989608], [2083...","[[1830734, 1906208], [1992808, 2056807], [2085..."
1,"[1830734, 1830735, 1830736, 1830737, 1830738, ...","[1832066, 1832118, 1833504, 1834890, 1836001, ...","[[1830734, 1906208], [3106623, 3306625], [5106...","[[1830734, 1906208], [1983809, 1989608], [2083...","[[1830734, 1906208], [1992808, 2056807], [2085..."
2,"[1830734, 1830735, 1830736, 1830737, 1830738, ...","[1832118, 1833504, 1834890, 1834890, 1836276, ...","[[1830734, 1906208], [3106623, 3306625], [5106...","[[1830734, 1906208], [1983809, 1989608], [2083...","[[1830734, 1906208], [1992808, 2056807], [2085..."
3,"[1830734, 1830735, 1830736, 1830737, 1830738, ...","[1832066, 1832118, 1833504, 1834890, 1836001, ...","[[1830734, 1906208], [3106623, 3306625], [5106...","[[1830734, 1906208], [1983809, 1989608], [2083...","[[1830734, 1906208], [1992808, 2056807], [2085..."
4,"[2067718, 2067719, 2067720, 2067721, 2067722, ...","[2069102, 2070488, 2071874, 2071874, 2073259, ...","[[2067718, 2147462], [3347876, 3547879], [5347...","[[2067718, 2147462], [2153459, 2160659], [2161...","[[2067718, 2147462], [2147859, 2255860], [2256..."


In [82]:
trodes_final_df["last_timestamp"] = trodes_final_df["raw_timestamps"].apply(lambda x: x[-1])

- Dropping the raw timestamps to save memory; their first and last values are kept in `first_timestamp` and `last_timestamp`

In [83]:
trodes_final_df = trodes_final_df.drop(columns=["raw_timestamps", "original_file"], errors="ignore")

In [84]:
copy_trodes_final_df = trodes_final_df.copy()

In [85]:
# Subtract each row's first timestamp so that every timestamp column starts at 0
for col in [col for col in trodes_final_df.columns if "timestamps" in col]:
    trodes_final_df[col] = trodes_final_df.apply(lambda x: x[col].astype(np.int32) - np.int32(x["first_timestamp"]), axis=1)

# Store the frame indices as 32-bit integers
for col in [col for col in trodes_final_df.columns if "frames" in col]:
    trodes_final_df[col] = trodes_final_df[col].apply(lambda x: x.astype(np.int32))
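
The zero-indexing step can be checked on a small example. The sketch below is an illustration, not the notebook's own code: it builds a two-row frame from timestamp values visible in the outputs above, subtracts `first_timestamp` from each array, and uses `np.int64` as a conservative assumption (the notebook uses `np.int32`).

```python
import numpy as np
import pandas as pd

# Two rows with values copied from the outputs above; everything else is a toy setup.
df = pd.DataFrame({
    "first_timestamp": [1830734, 2067718],
    "tone_timestamps": [
        np.array([[1830734, 1906208], [3106623, 3306625]]),
        np.array([[2067718, 2147462], [3347876, 3547879]]),
    ],
})

# Select the array-valued columns only; "first_timestamp" does not end in "_timestamps".
timestamp_cols = [col for col in df.columns if col.endswith("_timestamps")]

for col in timestamp_cols:
    # Shift each row's array so the recording start becomes 0.
    df[col] = [
        arr.astype(np.int64) - np.int64(first)
        for arr, first in zip(df[col], df["first_timestamp"])
    ]

print(df["tone_timestamps"][0])
```

The shifted values for the first row, `[[0, 75474], [1275889, 1475891]]`, agree with the `tone_timestamps` column shown in the final table.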

In [86]:
sorted_columns = sorted(trodes_final_df.columns, key=lambda x: x.split("_")[-1])
trodes_final_df = trodes_final_df[sorted_columns].copy()
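
Sorting by the text after the last underscore groups columns that share a suffix (`_frames`, `_timestamps`, and so on). A small sketch on a plain list of column names taken from the table below; because Python's sort is stable, names with the same suffix keep their original relative order:

```python
# A subset of the notebook's column names, in an arbitrary starting order.
columns = [
    "session_dir",
    "tone_frames",
    "first_timestamp",
    "tone_timestamps",
    "video_name",
    "all_subjects",
]

# The sort key is the last underscore-separated token: dir, frames, timestamp, ...
sorted_columns = sorted(columns, key=lambda name: name.split("_")[-1])
print(sorted_columns)
```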

## Saving to a file

In [87]:
trodes_final_df.to_pickle(os.path.join(OUTPUT_DIR, "{}_00_trodes_metadata.pkl".format(OUTPUT_PREFIX)))
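
Pickle is a reasonable fit here because the cells hold NumPy arrays, which a flat text format would not round-trip cleanly. A minimal round-trip sketch, assuming a temporary directory and a hypothetical file name (`example_00_trodes_metadata.pkl`) in place of `OUTPUT_DIR` and `OUTPUT_PREFIX`:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy frame with one array-valued cell, mimicking the structure of the saved table.
df = pd.DataFrame({
    "session_dir": ["s1"],
    "tone_timestamps": [np.array([[0, 75474], [1275889, 1475891]])],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the notebook builds its own from OUTPUT_PREFIX.
    path = os.path.join(tmp_dir, "example_00_trodes_metadata.pkl")
    df.to_pickle(path)
    reloaded = pd.read_pickle(path)

# The array cell comes back as a NumPy array with the same contents.
print(type(reloaded["tone_timestamps"][0]))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.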

In [88]:
trodes_final_df.head()

Unnamed: 0,session_dir,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames,video_name,session_path,recording,current_subject,all_subjects,first_timestamp,last_timestamp,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,20240320_142408_alone_comp_subj_3-1_and_3-3,"[[0, 74], [1272, 1471], [3268, 3469], [4466, 4...","[[0, 74], [152, 157], [252, 305], [305, 323], ...","[[0, 74], [160, 225], [254, 257], [257, 279], ...",20240320_142408_alone_comp_subj_3-1_and_3-3.1....,/scratch/back_up/reward_competition_extention/...,20240320_142408_alone_comp_subj_3-1_t6b6_merged,3.1,"[3.1, 3.3]",1830734,65425515,"[1384, 2770, 4156, 4156, 5542, 6928, 6928, 831...","[[0, 75474], [1275889, 1475891], [3275911, 347...","[[0, 75474], [153075, 158874], [252873, 306276...","[[0, 75474], [162074, 226073], [255076, 258076..."
1,20240320_142408_alone_comp_subj_3-1_and_3-3,"[[0, 75], [1273, 1472], [3269, 3470], [4467, 4...","[[0, 75], [153, 158], [252, 306], [306, 324], ...","[[0, 75], [161, 226], [255, 258], [258, 280], ...",20240320_142408_alone_comp_subj_3-1_and_3-3.2....,/scratch/back_up/reward_competition_extention/...,20240320_142408_alone_comp_subj_3-1_t6b6_merged,3.1,"[3.1, 3.3]",1830734,65425515,"[1332, 1384, 2770, 4156, 5267, 5542, 6928, 831...","[[0, 75474], [1275889, 1475891], [3275911, 347...","[[0, 75474], [153075, 158874], [252873, 306276...","[[0, 75474], [162074, 226073], [255076, 258076..."
2,20240320_142408_alone_comp_subj_3-1_and_3-3,"[[0, 74], [1272, 1471], [3268, 3469], [4466, 4...","[[0, 74], [152, 157], [252, 305], [305, 323], ...","[[0, 74], [160, 225], [254, 257], [257, 279], ...",20240320_142408_alone_comp_subj_3-1_and_3-3.1....,/scratch/back_up/reward_competition_extention/...,20240320_142408_alone_comp_subj_3-3_t5b5_merged,3.3,"[3.1, 3.3]",1830734,65425515,"[1384, 2770, 4156, 4156, 5542, 6928, 6928, 831...","[[0, 75474], [1275889, 1475891], [3275911, 347...","[[0, 75474], [153075, 158874], [252873, 306276...","[[0, 75474], [162074, 226073], [255076, 258076..."
3,20240320_142408_alone_comp_subj_3-1_and_3-3,"[[0, 75], [1273, 1472], [3269, 3470], [4467, 4...","[[0, 75], [153, 158], [252, 306], [306, 324], ...","[[0, 75], [161, 226], [255, 258], [258, 280], ...",20240320_142408_alone_comp_subj_3-1_and_3-3.2....,/scratch/back_up/reward_competition_extention/...,20240320_142408_alone_comp_subj_3-3_t5b5_merged,3.3,"[3.1, 3.3]",1830734,65425515,"[1332, 1384, 2770, 4156, 5267, 5542, 6928, 831...","[[0, 75474], [1275889, 1475891], [3275911, 347...","[[0, 75474], [153075, 158874], [252873, 306276...","[[0, 75474], [162074, 226073], [255076, 258076..."
4,20240320_171038_alone_comp_subj_4-2_and_4-3,"[[0, 79], [1276, 1477], [3272, 3473], [4470, 4...","[[0, 79], [84, 92], [92, 132], [145, 175], [25...","[[0, 79], [79, 186], [188, 191], [191, 268], [...",20240320_171038_alone_comp_subj_4-2_and_4-3.1....,/scratch/back_up/reward_competition_extention/...,20240320_171038_alone_comp_subj_4-2_t6b6_merged,4.2,"[4.2, 4.3]",2067718,66452444,"[1384, 2770, 4156, 4156, 5541, 6927, 6927, 831...","[[0, 79744], [1280158, 1480161], [3280184, 348...","[[0, 79744], [85741, 92941], [93544, 134344], ...","[[0, 79744], [80141, 188142], [188942, 191545]..."


In [89]:
trodes_final_df["session_dir"].unique()

array(['20240320_142408_alone_comp_subj_3-1_and_3-3',
       '20240320_171038_alone_comp_subj_4-2_and_4-3',
       '20240322_120625_alone_comp_subj_3-3_and_3-4',
       '20240322_160946_alone_comp_subj_4-3_and_4-4',
       '20240323_122227_alone_comp_subj_5-2_and_5-3',
       '20240323_144517_alone_comp_subj_3-1_and_3-4',
       '20240323_165815_alone_comp_subj_4-2_and_4-4'], dtype=object)

In [90]:
trodes_final_df["video_name"].unique()

array(['20240320_142408_alone_comp_subj_3-1_and_3-3.1.videoTimeStamps.cameraHWSync',
       '20240320_142408_alone_comp_subj_3-1_and_3-3.2.videoTimeStamps.cameraHWSync',
       '20240320_171038_alone_comp_subj_4-2_and_4-3.1.videoTimeStamps.cameraHWSync',
       '20240320_171038_alone_comp_subj_4-2_and_4-3.2.videoTimeStamps.cameraHWSync',
       '20240322_120625_alone_comp_subj_3-3_and_3-4.1.videoTimeStamps.cameraHWSync',
       '20240322_120625_alone_comp_subj_3-3_and_3-4.2.videoTimeStamps.cameraHWSync',
       '20240322_160946_alone_comp_subj_4-3_and_4-4.1.videoTimeStamps.cameraHWSync',
       '20240322_160946_alone_comp_subj_4-3_and_4-4.2.videoTimeStamps.cameraHWSync',
       '20240323_122227_alone_comp_subj_5-2_and_5-3.1.videoTimeStamps.cameraHWSync',
       '20240323_122227_alone_comp_subj_5-2_and_5-3.2.videoTimeStamps.cameraHWSync',
       '20240323_144517_alone_comp_subj_3-1_and_3-4.1.videoTimeStamps.cameraHWSync',
       '20240323_144517_alone_comp_subj_3-1_and_3-4.2.videoTimeSt