# Time Stamp Extract

This notebook reads the files exported from each SpikeGadgets (Trodes) recording session, collects their metadata and timestamps into a single dataframe, and extracts the times at which tones were played.

In [1]:
# Imports of all used packages and libraries
import sys
import os
import git
import glob
from collections import defaultdict

In [2]:
git_repo = git.Repo(".", search_parent_directories=True)
git_root = git_repo.git.rev_parse("--show-toplevel")

In [3]:
git_root

'/nancy/user/riwata/projects/reward_comp_ext'

In [4]:
sys.path.insert(0, os.path.join(git_root, 'src'))

In [5]:
# Third-party packages for arrays, dataframes, and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as sp

In [7]:
import utilities.helper
import trodes.read_exported

# Functions

In [8]:
import re

def extract_floats(s):
    """
    Extracts all floats from a string and returns them as a list of strings.

    Parameters:
    - s (str): The string to extract floats from.

    Returns:
    - list: A list of strings, each representing a float found in the input string.
    """
    float_pattern = r"[-+]?\d*\.\d+|\d+"
    return [str(float(num)) for num in re.findall(float_pattern, s)]
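
As a quick check of `extract_floats`, the sketch below repeats the function so that it runs on its own and applies it to two made-up strings. Because every match passes through `float`, bare integers come back with a trailing `.0`.

```python
import re


def extract_floats(s):
    """Same logic as the function above, repeated so the example is self-contained."""
    float_pattern = r"[-+]?\d*\.\d+|\d+"
    return [str(float(num)) for num in re.findall(float_pattern, s)]


# Hypothetical inputs, for illustration only
subjects = extract_floats("1.4_and_1.2")
mixed = extract_floats("box 2, weight 30.5")

print(subjects)
print(mixed)
```

If integer IDs should stay as `'2'` rather than `'2.0'`, the conversion through `float` would need to be dropped; that is a design choice, not something this notebook requires.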

## Inputs & Data

- Explanation of each input and where it comes from.

Inputs and required data loading
- Input variable names are in all-caps snake case
- Whenever an input is changed or used for processing, the resulting variable is in lowercase snake case

In [9]:
# Path of the directory that contains the Spike Gadgets recording and the exported timestamp files
# Exported with this tool https://docs.spikegadgets.com/en/latest/basic/ExportFunctions.html
# Export these files:
    # -raw – Continuous raw band export.
    # -dio – Digital IO channel state change export.
    # -analogio – Continuous analog IO export.
INPUT_DIR = "/scratch/back_up/reward_competition_extention/data/rce_cohort_2"
OUTPUT_DIR = r"./proc" # where data is saved should always be shown in the inputs
TONE_DIN = "dio_ECU_Din1"
TONE_STATE = 1
os.makedirs(OUTPUT_DIR, exist_ok=True)
OUTPUT_PREFIX = "rce_pilot_2_standard"

In [10]:
COLS_TO_KEEP = ['session_dir', 'recording', 'metadata_dir', 'metadata_file',
                'original_file', 'filename', 'session_path', 'all_subjects',
                'current_subject', 'event_timestamps', 'video_name',
                'video_timestamps', 'event_frames', 'first_item_data']

In [11]:
RAW_COLS_TO_KEEP = ['session_dir',
 'recording',
 'original_file',
 'session_path',
 'current_subject',
 'first_item_data',
 'first_timestamp',
 'all_subjects']

In [12]:
STATE_COLS_TO_KEEP = ['session_dir',
 'metadata_file',
 'event_timestamps',
 'video_name',
 'video_timestamps',
 'event_frames',]

In [13]:
same_columns = ['session_dir', 'video_name']
different_columns = ['metadata_file', 'event_frames', 'event_timestamps']

In [14]:
# TODO: Find a way not to hard-code this
# ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/standard/2023_06_*/*.rec")
ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/rce_cohort_2/standard/*.rec")



In [15]:
ALL_SESSION_DIR

['/scratch/back_up/reward_competition_extention/data/rce_cohort_2/standard/20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_2/standard/20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_2/standard/20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_2/standard/20230612_112630_standard_comp_to_training_D1_subj_1-2_and_1-1.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_2/standard/20230613_105657_standard_comp_to_training_D2_subj_1-1_and_1-4.rec']

## Outputs

Describe each output that the notebook creates. 

- Is it a plot or is it data?

- How valuable is the output and why is it valuable or useful?

## Other documentation

raw directory
- raw_group0.dat
    - voltage_value: Array with voltage measurement for each channel at each timestamp
- timestamps.dat
    - voltage_time_stamp: The time stamp of each voltage measurement

parent directory
- 1.videoTimeStamps.cameraHWSync
    - frame_number: Calculated by getting the index of each video time stamp tuple 
    - PosTimestamp: The time stamp of each video frame
    - HWframeCount: Unknown value. Starts at 30742 and increases by 1 for each tuple  
    - HWTimestamp: Unknown value. All zeroes
    - video_time: Calculated by dividing the frame number by the fps (frames per second)
    - video_seconds: video_time, rounded to whole seconds
    - Filled-in versions of the columns above, where each empty cell takes the value of the most recent previous cell:
        - filled_PosTimestamp
        - filledHWframeCount
        - filled_frame_number
        - filled_video_time
        - filled_video_seconds
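
The `video_time` and `filled_*` columns described above can be sketched with pandas. This is a minimal illustration, not the notebook's own code: the frame rate `FPS`, the table layout (one row per voltage timestamp, with a frame value only where a new frame starts), and all values are assumptions made for the example.

```python
import pandas as pd

FPS = 30  # hypothetical frame rate, for illustration only

# Assumed layout: one row per voltage timestamp; frame_number is empty
# except on the rows where a new video frame starts
df = pd.DataFrame({
    "voltage_time_stamp": [100, 101, 102, 103, 104, 105],
    "frame_number": [0, None, None, 1, None, 2],
})

# video_time is the frame number divided by the frame rate
df["video_time"] = df["frame_number"] / FPS

# The filled_* columns carry the most recent previous value forward
df["filled_frame_number"] = df["frame_number"].ffill()
df["filled_video_time"] = df["video_time"].ffill()

print(df)
```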

DIO directory
- dio_ECU_Din1.dat
    - time: The time stamp that corresponds to the DIN input
    - state: Binary state of whether there is input from DIN or not
    - trial_number: Calculated by adding 1 every time there is a DIN input
    - Filled-in versions of the columns above, where each empty cell takes the value of the most recent previous cell:
        - filled_state
        - filled_trial_number
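
As a rough sketch of how the `time`, `state`, and `trial_number` fields relate, the example below builds a small structured array shaped like a DIO export and pulls out the onsets. Two things are assumptions rather than facts from the export format: that the first record only reports the initial state, and that the clock runs at 20000 samples per second (the `clockrate` value reported in the exported metadata).

```python
import numpy as np

TONE_STATE = 1
CLOCKRATE = 20000  # assumed samples per second, taken from the exported metadata

# Sample state-change records laid out as <time uint32><state uint8>
dio = np.array(
    [(307664, 0), (395679, 1), (595679, 0), (2795675, 1), (2995676, 0)],
    dtype=[("time", "<u4"), ("state", "u1")],
)

# Cast to a signed type so that differences cannot wrap around
times = dio["time"].astype(np.int64)

# Onsets are the records where the channel switches to TONE_STATE
onsets = times[dio["state"] == TONE_STATE]

# Assumption: the first record is only the initial state, so it is skipped
offsets = times[1:][dio["state"][1:] != TONE_STATE]

durations_seconds = (offsets - onsets) / CLOCKRATE

# trial_number increases by 1 at every onset
trial_number = np.cumsum(dio["state"] == TONE_STATE)

print(onsets, durations_seconds, trial_number)
```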

ss_output directory (Spike sorting with Spike interface)
- firings.npz
    - unit_id: All the units that had a spike train for the given timestamp 	
    - number_of_units: Calculated by counting the number of units that had a spike train

## Functions

- function names are short and in snake case all lowercase
- a function name should be unique but does not have to describe the function
- doc strings describe functions not function names

## Processing

Describe what is done to the data here and how inputs are manipulated to generate outputs. 

In [16]:
# As much code and as many cells as required
# includes EDA and playing with data
# GO HAM!

# LOOP 1: Extracting all the Trodes data

- Getting all the data from all the exported Trodes files and saving it to `session_to_trodes_data`
    - Creates a nested dictionary with the structure:
        - `{dir_name: {file_name: metadata, file_name_2: metadata_2}, dir_name_2: {file_name_3: metadata_3, file_name_4: metadata_4}}`
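
A nested dictionary of this shape can be built without creating each level by hand. The sketch below uses a self-referencing `defaultdict`; the function `recursive_dict` is a hypothetical stand-in for `utilities.helper.create_recursive_dict`, whose actual implementation is not shown in this notebook, and the keys are made up.

```python
from collections import defaultdict


def recursive_dict():
    """Hypothetical stand-in for utilities.helper.create_recursive_dict."""
    return defaultdict(recursive_dict)


session_to_data = recursive_dict()

# Intermediate levels are created on first access
session_to_data["session_a"]["recording_1"]["DIO"]["dio_ECU_Din1"] = {"clockrate": "20000"}
session_to_data["session_a"]["recording_1"]["raw"]["timestamps"] = {"clockrate": "20000"}

levels = sorted(session_to_data["session_a"]["recording_1"].keys())
print(levels)
```

One caveat of this design: looking up a missing key with `[]` silently creates it, so membership should be tested with `in` rather than by indexing.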

In [17]:
# Saving the trodes data for each session
# Each key is a session name
# Each value is a dictionary of every recording file in that session
session_to_trodes_data = utilities.helper.create_recursive_dict()


# Saving the path of the session recording
session_to_path = {}

# Going through each session recording
# Which includes all the recordings from all the miniloggers and cameras
for session_path in ALL_SESSION_DIR:   
    try:
        # Getting the name of the session from the path
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Reading the trodes data for every recording file in the session directory
        session_to_trodes_data[session_basename] = trodes.read_exported.organize_all_trodes_export(session_path)
        
        session_to_path[session_basename] = session_path
    except Exception as e: 
        print(e)


Current Session: 20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2
Skipping file 20230616_111904_standard_comp_to_training_D4_subj_1-4_t4b3L_box1_merged.timestampoffset.txt due to error: Settings format not supported


Skipping file 20230616_111904_standard_comp_to_training_D4_subj_1-2_t2b2L_box2_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3
Skipping file 20230612_101430_standard_comp_to_training_D1_subj_1-3_t3b3L_box2_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20230612_101430_standard_comp_to_training_D1_subj_1-4_t4b2L_box1_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2
Skipping file 20230614_114041_standard_comp_to_training_D3_subj_1-2_t2b2L_box2_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20230614_114041_standard_comp_to_training_D3_subj_1-1_t1b3L_box1_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20230612_112630_standard_comp_to_training_D1_subj_1-2_and_1-1
Skipping file

In [18]:
session_to_trodes_data

defaultdict(<function utilities.helper.create_recursive_dict()>,
            {'20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2': defaultdict(dict,
                         {'20230616_111904_standard_comp_to_training_D4_subj_1-4_t4b3L_box1_merged': {'timestampoffset': {},
                           'time': {'timestamps': {'description': 'Timestamps',
                             'byte_order': 'little endian',
                             'original_file': '20230616_111904_standard_comp_to_training_D4_subj_1-4_t4b3L_box1_merged.rec',
                             'clockrate': '20000',
                             'trodes_version': '2.4.0',
                             'compile_date': 'May 24 2023',
                             'compile_time': '10:11:04',
                             'qt_version': '6.2.2',
                             'commit_tag': 'heads/Release_2.4.0-0-g1eecf3b7',
                             'controller_firmware': '3.17',
                             'heads

- Adding the video timestamps
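
The camera number used as the dictionary key comes from the file name itself. As a small illustration, the sketch below splits one of the `cameraHWSync` file names on dots and counts from the end, so dots elsewhere in the session name would not shift the result. The directory in the path is a hypothetical placeholder.

```python
import os

# The directory is a hypothetical placeholder; the file name follows the
# <session>.<camera number>.videoTimeStamps.cameraHWSync pattern
video_path = os.path.join(
    "/path/to/session.rec",
    "20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2.1.videoTimeStamps.cameraHWSync",
)

video_basename = os.path.basename(video_path)

# Parts from the end: [-1] is "cameraHWSync", [-2] is "videoTimeStamps", [-3] is the camera number
camera_number = video_basename.split(".")[-3]
print(camera_number)
```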

In [19]:
for session_path in ALL_SESSION_DIR:
    try:
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Each camera writes its own *.videoTimeStamps.cameraHWSync file
        for video_timestamps in glob.glob(os.path.join(session_path, "*cameraHWSync")):
            video_basename = os.path.basename(video_timestamps)
            print("Current Video Name: {}".format(video_basename))
            timestamp_array = trodes.read_exported.read_trodes_extracted_data_file(video_timestamps)
            if "video_timestamps" not in session_to_trodes_data[session_basename][session_basename]:
                session_to_trodes_data[session_basename][session_basename]["video_timestamps"] = defaultdict(dict)
            # The camera number is the third-from-last dot-separated part of the file name
            session_to_trodes_data[session_basename][session_basename]["video_timestamps"][video_basename.split(".")[-3]] = timestamp_array
    except Exception as e:
        print(e)

Current Session: 20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2
Current Video Name: 20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2.1.videoTimeStamps.cameraHWSync
Current Video Name: 20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2.2.videoTimeStamps.cameraHWSync
Current Session: 20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3
Current Video Name: 20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3.2.videoTimeStamps.cameraHWSync
Current Video Name: 20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3.1.videoTimeStamps.cameraHWSync
Current Session: 20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2
Current Video Name: 20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2.3.videoTimeStamps.cameraHWSync
Current Video Name: 20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2.2.videoTimeStamps.cameraHWSync
Current Video Name: 20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2.1.v

In [20]:
session_to_trodes_data[session_basename][session_basename]["video_timestamps"]

defaultdict(dict,
            {'1': {'clock rate': '30000',
              'fields': '<PosTimestamp uint32><HWframeCount uint32><HWTimestamp uint64>',
              'data': array([( 7455977, 0, 0), ( 7455977, 0, 0), ( 7457363, 0, 0), ...,
                     (75874971, 0, 0), (75874971, 0, 0), (75876357, 0, 0)],
                    dtype=[('PosTimestamp', '<u4'), ('HWframeCount', '<u4'), ('HWTimestamp', '<u8')]),
              'filename': '20230613_105657_standard_comp_to_training_D2_subj_1-1_and_1-4.1.videoTimeStamps.cameraHWSync'},
             '2': {'clock rate': '30000',
              'fields': '<PosTimestamp uint32><HWframeCount uint32><HWTimestamp uint64>',
              'data': array([( 7454591, 0, 0), ( 7455977, 0, 0), ( 7457363, 0, 0), ...,
                     (75873585, 0, 0), (75874971, 0, 0), (75876357, 0, 0)],
                    dtype=[('PosTimestamp', '<u4'), ('HWframeCount', '<u4'), ('HWTimestamp', '<u8')]),
              'filename': '20230613_105657_standard_comp_to_t

- Creating a dataframe from the dictionary with a column for:
  - Session directory
  - Recording name
  - Metadata directory
  - Metadata file
  - Each metadata field
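
The flattening step can be tried on a miniature dictionary first. The sketch below is an illustration under assumptions: the dictionary `nested` and its keys are made up, and only the four-level layout (session, recording, directory, file) mirrors the description above. Tuple keys become a four-level index, which `reset_index` turns into the columns `level_0` to `level_3`.

```python
import pandas as pd

# Hypothetical miniature: session -> recording -> directory -> file -> metadata fields
nested = {
    "session_a": {
        "recording_1": {
            "raw": {"timestamps": {"clockrate": "20000", "first_timestamp": "307664"}},
            "DIO": {"dio_ECU_Din1": {"clockrate": "20000", "first_timestamp": "307664"}},
        }
    }
}

# One entry per file, keyed by the full path of keys that leads to it
flat = {
    (session, recording, directory, file_name): fields
    for session, recordings in nested.items()
    for recording, directories in recordings.items()
    for directory, files in directories.items()
    for file_name, fields in files.items()
}

df = pd.DataFrame.from_dict(flat, orient="index").reset_index()
df = df.rename(columns={
    "level_0": "session_dir",
    "level_1": "recording",
    "level_2": "metadata_dir",
    "level_3": "metadata_file",
})
print(df.columns.tolist())
```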

In [21]:
# Creating a dataframe from the nested dictionary
trodes_metadata_df = pd.DataFrame.from_dict({(i,j,k,l): session_to_trodes_data[i][j][k][l] 
                           for i in session_to_trodes_data.keys() 
                           for j in session_to_trodes_data[i].keys()
                           for k in session_to_trodes_data[i][j].keys()
                           for l in session_to_trodes_data[i][j][k].keys()},
                           orient='index')

# Resetting the index and renaming the columns
trodes_metadata_df = trodes_metadata_df.reset_index()
trodes_metadata_df = trodes_metadata_df.rename(columns={'level_0': 'session_dir', 'level_1': 'recording', 'level_2': 'metadata_dir', 'level_3': 'metadata_file'}, errors="ignore")

# Adding the session path to the dataframe
trodes_metadata_df["session_path"] = trodes_metadata_df["session_dir"].map(session_to_path)

In [22]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_timestamp,decimation,fields,data,filename,direction,id,display_order,clock rate,session_path
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,time,timestamps,Timestamps,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,307664,20.0,<time uint32><systime int64>,"[[307664, 1686928758076131800], [307665, 16869...",20230616_111904_standard_comp_to_training_D4_s...,,,,,/scratch/back_up/reward_competition_extention/...
1,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,raw,timestamps,Raw timestamps,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,307664,,<time uint32>,"[[307664], [307665], [307666], [307667], [3076...",20230616_111904_standard_comp_to_training_D4_s...,,,,,/scratch/back_up/reward_competition_extention/...
2,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,raw,coordinates,Pad locations in microns,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,307664,,<ml int32><dv int32><ap int32>,"[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [...",20230616_111904_standard_comp_to_training_D4_s...,,,,,/scratch/back_up/reward_competition_extention/...
3,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,analog,timestamps,Analog IO timestamps,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,307664,,<time uint32>,"[[307664], [307665], [307666], [307667], [3076...",20230616_111904_standard_comp_to_training_D4_s...,,,,,/scratch/back_up/reward_competition_extention/...
4,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Dout3,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,307664,,<time uint32><state uint8>,"[[307664, 0]]",20230616_111904_standard_comp_to_training_D4_s...,output,ECU_Dout3,4.0,,/scratch/back_up/reward_competition_extention/...


In [23]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_timestamp,decimation,fields,data,filename,direction,id,display_order,clock rate,session_path
126,20230614_114041_standard_comp_to_training_D3_s...,20230614_114041_standard_comp_to_training_D3_s...,video_timestamps,1,,,,,,,...,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[34373721, 0, 0], [34375107, 0, 0], [34376493...",20230614_114041_standard_comp_to_training_D3_s...,,,,30000,/scratch/back_up/reward_competition_extention/...
127,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,video_timestamps,1,,,,,,,...,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[7978450, 0, 0], [7979510, 0, 0], [7979835, 0...",20230612_112630_standard_comp_to_training_D1_s...,,,,30000,/scratch/back_up/reward_competition_extention/...
128,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,video_timestamps,2,,,,,,,...,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[7978450, 0, 0], [7979597, 0, 0], [7979835, 0...",20230612_112630_standard_comp_to_training_D1_s...,,,,30000,/scratch/back_up/reward_competition_extention/...
129,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,video_timestamps,1,,,,,,,...,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[7455977, 0, 0], [7455977, 0, 0], [7457363, 0...",20230613_105657_standard_comp_to_training_D2_s...,,,,30000,/scratch/back_up/reward_competition_extention/...
130,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,video_timestamps,2,,,,,,,...,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[7454591, 0, 0], [7455977, 0, 0], [7457363, 0...",20230613_105657_standard_comp_to_training_D2_s...,,,,30000,/scratch/back_up/reward_competition_extention/...


- Getting the first item from each tuple in the arrays in the `data` column
  - This first item is usually just the timestamp
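
Each `data` entry is a NumPy structured array, so a whole field can be selected by name instead of looping over the tuples. A minimal sketch with a made-up array laid out as `<time uint32><state uint8>`:

```python
import numpy as np

# Small structured array with two named fields
data = np.array(
    [(307664, 0), (395679, 1), (595679, 0)],
    dtype=[("time", "<u4"), ("state", "u1")],
)

# dtype.names lists the field names in order
first_name = data.dtype.names[0]

# Indexing by field name returns every value of that field
first_values = data[first_name]
last_values = data[data.dtype.names[-1]]

print(first_name, first_values.tolist(), last_values.tolist())
```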

In [24]:
trodes_metadata_df["data"].iloc[0]

array([(  307664, 1686928758076131800), (  307665, 1686928758076134800),
       (  307666, 1686928758076136500), ...,
       (67788420, 1686932131985293600), (67788421, 1686932131985298400),
       (67788422, 1686932131985303100)],
      dtype=[('time', '<u4'), ('systime', '<i8')])

In [25]:
# Getting the name of the first field in each structured numpy array
trodes_metadata_df["first_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[0])
# Getting all the values of that first field
trodes_metadata_df["first_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[0]])


In [26]:
# Same as above but for the last column
trodes_metadata_df["last_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[-1])
trodes_metadata_df["last_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[-1]])

In [27]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,filename,direction,id,display_order,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,time,timestamps,Timestamps,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,20230616_111904_standard_comp_to_training_D4_s...,,,,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 307665, 307666, 307667, 307668, 30766...",systime,"[1686928758076131800, 1686928758076134800, 168..."
1,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,raw,timestamps,Raw timestamps,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,20230616_111904_standard_comp_to_training_D4_s...,,,,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 307665, 307666, 307667, 307668, 30766...",time,"[307664, 307665, 307666, 307667, 307668, 30766..."
2,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,raw,coordinates,Pad locations in microns,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,20230616_111904_standard_comp_to_training_D4_s...,,,,,/scratch/back_up/reward_competition_extention/...,ml,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",ap,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
3,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,analog,timestamps,Analog IO timestamps,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,20230616_111904_standard_comp_to_training_D4_s...,,,,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 307665, 307666, 307667, 307668, 30766...",time,"[307664, 307665, 307666, 307667, 307668, 30766..."
4,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Dout3,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,20230616_111904_standard_comp_to_training_D4_s...,output,ECU_Dout3,4.0,,/scratch/back_up/reward_competition_extention/...,time,[307664],state,[0]


In [28]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,filename,direction,id,display_order,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
126,20230614_114041_standard_comp_to_training_D3_s...,20230614_114041_standard_comp_to_training_D3_s...,video_timestamps,1,,,,,,,...,20230614_114041_standard_comp_to_training_D3_s...,,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[34373721, 34375107, 34376493, 34377879, 34377...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
127,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,video_timestamps,1,,,,,,,...,20230612_112630_standard_comp_to_training_D1_s...,,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[7978450, 7979510, 7979835, 7981221, 7982607, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
128,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,video_timestamps,2,,,,,,,...,20230612_112630_standard_comp_to_training_D1_s...,,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[7978450, 7979597, 7979835, 7981221, 7982607, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
129,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,video_timestamps,1,,,,,,,...,20230613_105657_standard_comp_to_training_D2_s...,,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[7455977, 7455977, 7457363, 7458749, 7458749, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
130,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,video_timestamps,2,,,,,,,...,20230613_105657_standard_comp_to_training_D2_s...,,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[7454591, 7455977, 7457363, 7458749, 7458749, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [29]:
trodes_metadata_df["recording"].unique()

array(['20230616_111904_standard_comp_to_training_D4_subj_1-4_t4b3L_box1_merged',
       '20230616_111904_standard_comp_to_training_D4_subj_1-2_t2b2L_box2_merged',
       '20230612_101430_standard_comp_to_training_D1_subj_1-3_t3b3L_box2_merged',
       '20230612_101430_standard_comp_to_training_D1_subj_1-4_t4b2L_box1_merged',
       '20230614_114041_standard_comp_to_training_D3_subj_1-2_t2b2L_box2_merged',
       '20230614_114041_standard_comp_to_training_D3_subj_1-1_t1b3L_box1_merged',
       '20230612_112630_standard_comp_to_training_D1_subj_1-2_t2b2L_box1_merged',
       '20230612_112630_standard_comp_to_training_D1_subj_1-1_t1b3L_box2_merged',
       '20230613_105657_standard_comp_to_training_D2_subj_1-4_t4b3L_box2_merged',
       '20230613_105657_standard_comp_to_training_D2_subj_1-1_t1b2L_box1_merged',
       '20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2',
       '20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3',
       '20230614_114041_standard_com

## Getting the subject information from the metadata

In [30]:
def split_by_multiple_delimiters(s, delimiters):
    """
    Splits a string by multiple delimiters.

    Parameters:
    - s (str): The string to split.
    - delimiters (list): A list of delimiters to split the string by.

    Returns:
    - list: A list of substrings.
    """
    return re.split('|'.join(map(re.escape, delimiters)), s)
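
A short usage sketch for `split_by_multiple_delimiters`, with the function repeated so the example runs on its own. The delimiters are joined into one regular-expression alternation, which tries them from left to right, so a longer delimiter should be listed before a shorter one that it contains.

```python
import re


def split_by_multiple_delimiters(s, delimiters):
    """Same logic as the function above, repeated so the example is self-contained."""
    return re.split('|'.join(map(re.escape, delimiters)), s)


subject_part = "1-4_and_1-2"

# Longer delimiter first: "_and_" is consumed as a whole
longest_first = split_by_multiple_delimiters(subject_part, ["_and_", "_"])

# Shorter delimiter first: "_" matches before "_and_" gets a chance
shortest_first = split_by_multiple_delimiters(subject_part, ["_", "_and_"])

print(longest_first)
print(shortest_first)
```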


In [31]:
# Taking the text after "subj" in the session name and writing the subject IDs as floats (1-4 becomes 1.4)
trodes_metadata_df["all_subjects"] = trodes_metadata_df["session_dir"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", "."))
# Extracting every subject ID and sorting them
trodes_metadata_df["all_subjects"] = trodes_metadata_df["all_subjects"].apply(lambda x: sorted(extract_floats(x)))

In [32]:
trodes_metadata_df["session_dir"].iloc[0]

'20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2'

In [33]:
trodes_metadata_df["all_subjects"].apply(lambda x: tuple(x)).unique()

array([('1.2', '1.4'), ('1.3', '1.4'), ('1.1', '1.2'), ('1.1', '1.4')],
      dtype=object)

In [34]:
# Taking the text after "subj" in the recording name; the first number found is the subject of that recording
trodes_metadata_df["current_subject"] = trodes_metadata_df["recording"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", ".").replace("_", "."))
trodes_metadata_df["current_subject"] = trodes_metadata_df["current_subject"].apply(lambda x: extract_floats(x)[0])


In [35]:
trodes_metadata_df["current_subject"].unique()

array(['1.4', '1.2', '1.3', '1.1'], dtype=object)

## Dropping all the rows with unneeded metadata

In [36]:
trodes_metadata_df["metadata_dir"].unique()

array(['time', 'raw', 'analog', 'DIO', 'video_timestamps'], dtype=object)

In [37]:
METADATA_TO_KEEP = ['raw', 'DIO', 'video_timestamps']

In [38]:
trodes_metadata_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(METADATA_TO_KEEP)]

In [39]:
# Dropping the files whose names contain "out" (the digital output channels) and the pad coordinates
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("out")]
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("coordinates")]


In [40]:
trodes_metadata_df = trodes_metadata_df.reset_index(drop=True)
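
The row filtering in this section can be checked on a toy dataframe. The sketch below is an illustration with made-up rows; it combines the two `str.contains` filters into a single pattern, which is a variation and not the notebook's own code. Note that `str.contains` matches substrings, so any file name containing "out" would be dropped, not only the `Dout` channels.

```python
import pandas as pd

# Made-up rows that mimic the metadata_dir / metadata_file columns
df = pd.DataFrame({
    "metadata_dir": ["time", "raw", "raw", "DIO", "DIO", "video_timestamps"],
    "metadata_file": ["timestamps", "timestamps", "coordinates",
                      "dio_ECU_Dout3", "dio_ECU_Din1", "1"],
})

METADATA_TO_KEEP = ["raw", "DIO", "video_timestamps"]

# Keep only the wanted directories
kept = df[df["metadata_dir"].isin(METADATA_TO_KEEP)]

# Drop output channels and coordinates with one pattern
kept = kept[~kept["metadata_file"].str.contains("out|coordinates")]

# Renumber the rows from 0 after filtering
kept = kept.reset_index(drop=True)
print(kept)
```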

# Getting the first time stamp of each recording

In [41]:
trodes_raw_df = trodes_metadata_df[(trodes_metadata_df["metadata_dir"] == "raw") & (trodes_metadata_df["metadata_file"] == "timestamps")].copy()


In [42]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,id,display_order,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,raw,timestamps,Raw timestamps,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 307665, 307666, 307667, 307668, 30766...",time,"[307664, 307665, 307666, 307667, 307668, 30766...","[1.2, 1.4]",1.4
9,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,raw,timestamps,Raw timestamps,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 307665, 307666, 307667, 307668, 30766...",time,"[307664, 307665, 307666, 307667, 307668, 30766...","[1.2, 1.4]",1.2
14,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,raw,timestamps,Raw timestamps,little endian,20230612_101430_standard_comp_to_training_D1_s...,20000,2.4.0,May 24 2023,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[8798886, 8798887, 8798888, 8798889, 8798890, ...",time,"[8798886, 8798887, 8798888, 8798889, 8798890, ...","[1.3, 1.4]",1.3
15,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,raw,timestamps,Raw timestamps,little endian,20230612_101430_standard_comp_to_training_D1_s...,20000,2.4.0,May 24 2023,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[8798886, 8798887, 8798888, 8798889, 8798890, ...",time,"[8798886, 8798887, 8798888, 8798889, 8798890, ...","[1.3, 1.4]",1.4
24,20230614_114041_standard_comp_to_training_D3_s...,20230614_114041_standard_comp_to_training_D3_s...,raw,timestamps,Raw timestamps,little endian,20230614_114041_standard_comp_to_training_D3_s...,20000,2.4.0,May 24 2023,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[34373723, 34373724, 34373725, 34373726, 34373...",time,"[34373723, 34373724, 34373725, 34373726, 34373...","[1.1, 1.2]",1.2


In [43]:
trodes_raw_df["first_timestamp"] = trodes_raw_df["first_item_data"].apply(lambda x: x[0])

In [44]:
trodes_raw_df["recording"].iloc[0]

'20230616_111904_standard_comp_to_training_D4_subj_1-4_t4b3L_box1_merged'

In [45]:
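# Note: the keys are session directories, not recording names.
# If several recordings share a session, to_dict() keeps the value of the last one.
recording_to_first_timestamp = trodes_raw_df.set_index('session_dir')['first_timestamp'].to_dict()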

In [46]:
recording_to_first_timestamp

{'20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2': 307664,
 '20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3': 8798886,
 '20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2': 34373723,
 '20230612_112630_standard_comp_to_training_D1_subj_1-2_and_1-1': 7977066,
 '20230613_105657_standard_comp_to_training_D2_subj_1-1_and_1-4': 7454593}

In [47]:
trodes_metadata_df["first_timestamp"] = trodes_metadata_df["session_dir"].map(recording_to_first_timestamp)

In [48]:
trodes_metadata_df["first_timestamp"]

0       307664
1       307664
2       307664
3       307664
4       307664
        ...   
56    34373723
57     7977066
58     7977066
59     7454593
60     7454593
Name: first_timestamp, Length: 61, dtype: int64
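
The lookup built in this section can be reproduced on a toy example. This is a sketch with made-up session names and values: each raw timestamp array contributes its first element, the result is turned into a session-to-timestamp dictionary, and that dictionary is mapped onto another dataframe. When two recordings share a session, the dictionary keeps the value of the last one.

```python
import numpy as np
import pandas as pd

# Made-up raw timestamp rows: two recordings in session_a, one in session_b
raw_df = pd.DataFrame({
    "session_dir": ["session_a", "session_a", "session_b"],
    "recording": ["rec_1", "rec_2", "rec_3"],
    "first_item_data": [
        np.array([307664, 307665]),
        np.array([307664, 307665]),
        np.array([8798886, 8798887]),
    ],
})

# First element of each timestamp array
raw_df["first_timestamp"] = raw_df["first_item_data"].apply(lambda x: x[0])

# Duplicate keys collapse to one entry per session
session_to_first = raw_df.set_index("session_dir")["first_timestamp"].to_dict()

# Broadcasting the value back onto every row of another dataframe
all_df = pd.DataFrame({"session_dir": ["session_a", "session_b", "session_a"]})
all_df["first_timestamp"] = all_df["session_dir"].map(session_to_first)
print(all_df)
```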

# Getting the event timestamps

In [49]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,id,display_order,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,raw,timestamps,Raw timestamps,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,,,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 307665, 307666, 307667, 307668, 30766...",time,"[307664, 307665, 307666, 307667, 307668, 30766...","[1.2, 1.4]",1.4
1,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,ECU_Din3,8.0,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 32080654, 32081454, 32082454, 3208385...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4
2,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,ECU_Din2,6.0,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 646678, 647879, 648676, 693876, 69427...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4
3,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,ECU_Din1,7.0,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 395679, 595679, 2795675, 2995676, 509...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4
4,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,ECU_Din4,9.0,,/scratch/back_up/reward_competition_extention/...,time,[307664],state,[0],"[1.2, 1.4]",1.4


In [50]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,id,display_order,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
56,20230614_114041_standard_comp_to_training_D3_s...,20230614_114041_standard_comp_to_training_D3_s...,video_timestamps,1,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[34373721, 34375107, 34376493, 34377879, 34377...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.2]",1.1
57,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,video_timestamps,1,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[7978450, 7979510, 7979835, 7981221, 7982607, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.2]",1.2
58,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,video_timestamps,2,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[7978450, 7979597, 7979835, 7981221, 7982607, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.2]",1.2
59,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,video_timestamps,1,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[7455977, 7455977, 7457363, 7458749, 7458749, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.4]",1.1
60,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,video_timestamps,2,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[7454591, 7455977, 7457363, 7458749, 7458749, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.4]",1.1


In [51]:
# trodes_state_df = trodes_metadata_df[trodes_metadata_df["last_dtype_name"] == "state"].copy()

# Filtering for digital IO channels
trodes_state_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(["DIO"])].copy()
# Filtering the DIO rows for tone and port entry related channels
trodes_state_df = trodes_state_df[trodes_state_df["id"].isin(["ECU_Din1", "ECU_Din2", "ECU_Din3"])].copy()


In [52]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,id,display_order,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
1,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,ECU_Din3,8,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 32080654, 32081454, 32082454, 3208385...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4
2,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,ECU_Din2,6,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 646678, 647879, 648676, 693876, 69427...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4
3,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,ECU_Din1,7,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 395679, 595679, 2795675, 2995676, 509...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4
6,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,ECU_Din3,8,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 32080654, 32081454, 32082454, 3208385...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.2
7,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,ECU_Din1,7,,/scratch/back_up/reward_competition_extention/...,time,"[307664, 395679, 595679, 2795675, 2995676, 509...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.2


In [53]:
# Pairing the index of each "on" state (1) with the index of the next state change
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: np.column_stack([np.where(x["last_item_data"] == 1)[0], np.where(x["last_item_data"] == 1)[0]+1]), axis=1)

In [54]:
# Dropping pairs whose second index would fall past the end of the timestamp array
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: x["event_indexes"][x["event_indexes"][:, 1] <= x["first_item_data"].shape[0] - 1], axis=1)

In [55]:
# Looking up the (on, off) timestamps for each index pair
trodes_state_df["event_timestamps"] = trodes_state_df.apply(lambda x: x["first_item_data"][x["event_indexes"]], axis=1)
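
The three cells above turn a channel's state-change record into (on, off) timestamp pairs. A minimal sketch of the same steps on toy arrays, assuming `last_item_data` and `first_item_data` are NumPy arrays of equal length; the names `states` and `times` and all values are made up for illustration:

```python
import numpy as np

# Toy DIO channel: one timestamp per state change (values are illustrative)
times = np.array([100, 250, 300, 420, 480, 600])
states = np.array([0, 1, 0, 1, 0, 1])  # the recording ends while the channel is still "on"

# Index of every "on" sample, paired with the index of the sample right after it
on_idx = np.where(states == 1)[0]
event_indexes = np.column_stack([on_idx, on_idx + 1])

# Drop pairs whose second index would run past the end of the array
event_indexes = event_indexes[event_indexes[:, 1] <= times.shape[0] - 1]

# Indexing with an (n, 2) integer array returns an (n, 2) array of timestamps
event_timestamps = times[event_indexes]
print(event_timestamps)
```

The final "on" sample at index 5 has no following state change, so its pair is removed before the timestamp lookup.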

## Updating the video timestamps

## Syncing up the video frame data

In [56]:
# Getting the rows that are the metadata for the video timestamps
trodes_video_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"] == "video_timestamps"].copy().reset_index(drop=True)



In [57]:
# Filtering for the first video only
# This only applies to this pilot data where we are only looking at the competition data
# trodes_video_df = trodes_video_df[trodes_video_df["metadata_file"] == "1"].copy()

In [58]:
trodes_video_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,id,display_order,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,video_timestamps,1,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[309047, 309047, 310433, 311819, 313205, 31320...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.2, 1.4]",1.4
1,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,video_timestamps,2,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[309047, 309047, 310433, 310433, 311819, 31320...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.2, 1.4]",1.4
2,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,video_timestamps,2,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[8798884, 8800270, 8801656, 8803042, 8803042, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.3, 1.4]",1.4
3,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,video_timestamps,1,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[8798884, 8800270, 8801656, 8803042, 8803042, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.3, 1.4]",1.4
4,20230614_114041_standard_comp_to_training_D3_s...,20230614_114041_standard_comp_to_training_D3_s...,video_timestamps,3,,,,,,,...,,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[78832417, 78832417, 78833803, 78835189, 78835...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.2]",1.1


In [59]:
# Copying the video frame timestamps into their own column (no resampling is applied here)
trodes_video_df["video_timestamps"] = trodes_video_df["first_item_data"]

In [60]:
# Removing the columns that are no longer needed
trodes_video_df = trodes_video_df[["filename", "video_timestamps", "session_dir"]].copy()

In [61]:
# Renaming the filename so that we can merge with other dataframes with the same column name
trodes_video_df = trodes_video_df.rename(columns={"filename": "video_name"})

In [62]:
trodes_video_df.head()

Unnamed: 0,video_name,video_timestamps,session_dir
0,20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 311819, 313205, 31320...",20230616_111904_standard_comp_to_training_D4_s...
1,20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 310433, 311819, 31320...",20230616_111904_standard_comp_to_training_D4_s...
2,20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...",20230612_101430_standard_comp_to_training_D1_s...
3,20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...",20230612_101430_standard_comp_to_training_D1_s...
4,20230614_114041_standard_comp_to_training_D3_s...,"[78832417, 78832417, 78833803, 78835189, 78835...",20230614_114041_standard_comp_to_training_D3_s...


- Pairing each state row with every video from the same session (one output row per state-video pair)

In [63]:
trodes_state_df = pd.merge(trodes_state_df, trodes_video_df, on=["session_dir"], how="inner")
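
Because the merge key `session_dir` repeats on both sides, the inner merge yields one row for every state-video combination within a session. A small sketch with made-up session and video names, only to show how the row count grows:

```python
import pandas as pd

# Two DIO channels for session "s1", one for session "s2" (illustrative values)
state_df = pd.DataFrame({
    "session_dir": ["s1", "s1", "s2"],
    "id": ["ECU_Din1", "ECU_Din2", "ECU_Din1"],
})

# Two videos for session "s1", one for session "s2"
video_df = pd.DataFrame({
    "session_dir": ["s1", "s1", "s2"],
    "video_name": ["s1.1", "s1.2", "s2.1"],
})

# Many-to-many merge: 2 channels x 2 videos for "s1", plus 1 x 1 for "s2"
merged = pd.merge(state_df, video_df, on=["session_dir"], how="inner")
print(merged)
```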

In [64]:
trodes_state_df.columns

Index(['session_dir', 'recording', 'metadata_dir', 'metadata_file',
       'description', 'byte_order', 'original_file', 'clockrate',
       'trodes_version', 'compile_date', 'compile_time', 'qt_version',
       'commit_tag', 'controller_firmware', 'headstage_firmware',
       'controller_serialnum', 'headstage_serialnum', 'autosettle', 'smartref',
       'gyro', 'accelerometer', 'magnetometer', 'time_offset',
       'system_time_at_creation', 'timestamp_at_creation', 'first_timestamp',
       'decimation', 'fields', 'data', 'filename', 'direction', 'id',
       'display_order', 'clock rate', 'session_path', 'first_dtype_name',
       'first_item_data', 'last_dtype_name', 'last_item_data', 'all_subjects',
       'current_subject', 'event_indexes', 'event_timestamps', 'video_name',
       'video_timestamps'],
      dtype='object')

## Finding the closest frame to each event

In [65]:
trodes_state_df["event_timestamps"].iloc[1]

array([[32080654, 32081454],
       [32082454, 32083854],
       [32084654, 32085655],
       [32096854, 32109657],
       [32143856, 32152457],
       [32153457, 32154457],
       [32157057, 32165457],
       [32165857, 32167057],
       [32171054, 32201456],
       [32475257, 32476057],
       [32476857, 32480254],
       [32480654, 32503457],
       [32505257, 32511654],
       [32516854, 32517654],
       [32518654, 32642654],
       [32644056, 32669854],
       [32834059, 32837054],
       [32839654, 32873856],
       [33199256, 33240056],
       [33252653, 33254656],
       [33255456, 33277256],
       [33583654, 33590456],
       [33818255, 33831455],
       [34187853, 34228653],
       [34396253, 34397256],
       [34487460, 34583055],
       [34583853, 34592652],
       [34626053, 34690855],
       [34691653, 34703253],
       [34956455, 34976853],
       [34991253, 34994453],
       [34995855, 35003252],
       [35037055, 35122455],
       [35125453, 35146252],
       [351474

In [66]:
trodes_state_df["event_frames"] = trodes_state_df.apply(lambda x: utilities.helper.find_nearest_indices(x["event_timestamps"], x["video_timestamps"]), axis=1)
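
The implementation of `utilities.helper.find_nearest_indices` is not shown in this notebook. The sketch below is a hypothetical stand-in (`nearest_indices` is not the project's function) showing one way to map each event timestamp to the index of the closest video frame. It assumes the frame timestamps are sorted in non-decreasing order and contain at least two entries:

```python
import numpy as np

def nearest_indices(values, reference):
    """Hypothetical helper: index of the closest `reference` entry for each value.

    Assumes `reference` is sorted in non-decreasing order with at least two entries.
    """
    values = np.asarray(values)
    reference = np.asarray(reference)
    # Insertion point of each value, clipped so both neighbours exist
    right = np.clip(np.searchsorted(reference, values), 1, len(reference) - 1)
    left = right - 1
    # Pick whichever neighbour is closer (ties go to the left neighbour)
    use_left = (values - reference[left]) <= (reference[right] - values)
    return np.where(use_left, left, right)

# Illustrative frame timestamps and (on, off) event timestamps
video_timestamps = np.array([0, 1386, 2772, 4158])
event_timestamps = np.array([[1400, 2700], [2800, 4100]])

event_frames = nearest_indices(event_timestamps, video_timestamps)
print(event_frames)
```

The result keeps the shape of `event_timestamps`, so each (on, off) timestamp pair becomes an (on, off) frame pair.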

In [67]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 32080654, 32081454, 32082454, 3208385...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[32080654, 32081454], [32082454, 32083854], [...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 311819, 313205, 31320...","[[31729, 31729], [31731, 31732], [31733, 31735..."
1,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 32080654, 32081454, 32082454, 3208385...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[32080654, 32081454], [32082454, 32083854], [...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 310433, 311819, 31320...","[[36005, 36005], [36006, 36008], [36009, 36010..."
2,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 646678, 647879, 648676, 693876, 69427...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[646678, 647879], [648676, 693876], [694279, ...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 311819, 313205, 31320...","[[338, 339], [341, 385], [385, 482], [696, 703..."
3,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 646678, 647879, 648676, 693876, 69427...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[646678, 647879], [648676, 693876], [694279, ...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 310433, 311819, 31320...","[[422, 423], [425, 480], [480, 601], [869, 877..."
4,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 395679, 595679, 2795675, 2995676, 509...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[395679, 595679], [2795675, 2995676], [509567...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 311819, 313205, 31320...","[[87, 287], [2483, 2683], [4778, 4979], [6575,..."


## Combine raw and state dataframes

In [68]:
trodes_state_df

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 32080654, 32081454, 32082454, 3208385...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[32080654, 32081454], [32082454, 32083854], [...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 311819, 313205, 31320...","[[31729, 31729], [31731, 31732], [31733, 31735..."
1,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 32080654, 32081454, 32082454, 3208385...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[32080654, 32081454], [32082454, 32083854], [...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 310433, 311819, 31320...","[[36005, 36005], [36006, 36008], [36009, 36010..."
2,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 646678, 647879, 648676, 693876, 69427...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[646678, 647879], [648676, 693876], [694279, ...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 311819, 313205, 31320...","[[338, 339], [341, 385], [385, 482], [696, 703..."
3,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 646678, 647879, 648676, 693876, 69427...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[646678, 647879], [648676, 693876], [694279, ...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 310433, 311819, 31320...","[[422, 423], [425, 480], [480, 601], [869, 877..."
4,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230616_111904_standard_comp_to_training_D4_s...,20000,2.4.0,May 24 2023,...,"[307664, 395679, 595679, 2795675, 2995676, 509...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.2, 1.4]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[395679, 595679], [2795675, 2995676], [509567...",20230616_111904_standard_comp_to_training_D4_s...,"[309047, 309047, 310433, 311819, 313205, 31320...","[[87, 287], [2483, 2683], [4778, 4979], [6575,..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230613_105657_standard_comp_to_training_D2_s...,20000,2.4.0,May 24 2023,...,"[7454593, 41078926, 41277526, 41587525, 416451...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[41078926, 41277526], [41587525, 41645128], [...",20230613_105657_standard_comp_to_training_D2_s...,"[7454591, 7455977, 7457363, 7458749, 7458749, ...","[[33561, 33759], [34069, 34126], [34127, 34139..."
62,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230613_105657_standard_comp_to_training_D2_s...,20000,2.4.0,May 24 2023,...,"[7454593, 7503951, 7522151, 7524748, 7533948, ...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[7503951, 7522151], [7524748, 7533948], [7815...",20230613_105657_standard_comp_to_training_D2_s...,"[7455977, 7455977, 7457363, 7458749, 7458749, ...","[[49, 67], [70, 79], [360, 366], [460, 469], [..."
63,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230613_105657_standard_comp_to_training_D2_s...,20000,2.4.0,May 24 2023,...,"[7454593, 7503951, 7522151, 7524748, 7533948, ...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[7503951, 7522151], [7524748, 7533948], [7815...",20230613_105657_standard_comp_to_training_D2_s...,"[7454591, 7455977, 7457363, 7458749, 7458749, ...","[[49, 67], [70, 79], [360, 366], [460, 469], [..."
64,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230613_105657_standard_comp_to_training_D2_s...,20000,2.4.0,May 24 2023,...,"[7454593, 8373348, 8573351, 10773348, 10973350...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[8373348, 8573351], [10773348, 10973350], [13...",20230613_105657_standard_comp_to_training_D2_s...,"[7455977, 7455977, 7457363, 7458749, 7458749, ...","[[916, 1117], [3312, 3513], [5608, 5808], [740..."


In [69]:
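# Keeping one row per (session, video, DIO channel), sorted so the channels always appear in Din1, Din2, Din3 order
trodes_state_df = trodes_state_df[STATE_COLS_TO_KEEP].drop_duplicates(subset=["session_dir", "video_name", "metadata_file"]).sort_values(["session_dir", "video_name", "metadata_file"]).reset_index(drop=True).copy()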

In [70]:
trodes_state_df.head()

Unnamed: 0,session_dir,metadata_file,event_timestamps,video_name,video_timestamps,event_frames
0,20230612_101430_standard_comp_to_training_D1_s...,dio_ECU_Din1,"[[9781115, 9981112], [12181113, 12381110], [14...",20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[980, 1181], [3376, 3575], [5672, 5871], [746..."
1,20230612_101430_standard_comp_to_training_D1_s...,dio_ECU_Din2,"[[9289915, 9314113], [9318312, 9357515], [9358...",20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[490, 514], [518, 558], [558, 637], [638, 640..."
2,20230612_101430_standard_comp_to_training_D1_s...,dio_ECU_Din3,"[[41881086, 41888889], [42363889, 42365886], [...",20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[33137, 33147], [33665, 33666], [33668, 33669..."
3,20230612_101430_standard_comp_to_training_D1_s...,dio_ECU_Din1,"[[9781115, 9981112], [12181113, 12381110], [14...",20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[980, 1180], [3376, 3575], [5672, 5871], [746..."
4,20230612_101430_standard_comp_to_training_D1_s...,dio_ECU_Din2,"[[9289915, 9314113], [9318312, 9357515], [9358...",20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[490, 514], [518, 558], [558, 637], [638, 640..."


In [71]:
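# Collapsing the DIO channels of each group into one row: columns in different_columns become lists, all others keep their first value
trodes_state_df = trodes_state_df.groupby(same_columns).agg({**{col: 'first' for col in trodes_state_df.columns if col not in same_columns + different_columns}, **{col: lambda x: x.tolist() for col in different_columns}}).reset_index()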
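
The definitions of `same_columns` and `different_columns` fall outside this section, so the sketch below picks its own values for them. It uses toy data to show how the aggregation collapses several channel rows into one row whose cells hold lists:

```python
import pandas as pd

df = pd.DataFrame({
    "session_dir": ["s1", "s1", "s1", "s2"],
    "video_name": ["v1", "v1", "v1", "v1"],
    "clockrate": [20000, 20000, 20000, 20000],
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din3", "dio_ECU_Din1"],
    "n_events": [3, 5, 2, 4],
})

# Assumed groupings for this example only
same_columns = ["session_dir", "video_name"]        # group keys
different_columns = ["metadata_file", "n_events"]   # collected into lists

agg_spec = {
    # Columns that are constant within a group keep their first value
    **{col: "first" for col in df.columns if col not in same_columns + different_columns},
    # Columns that differ per channel are gathered into a list per group
    **{col: lambda x: x.tolist() for col in different_columns},
}

grouped = df.groupby(same_columns).agg(agg_spec).reset_index()
print(grouped)
```

Since the rows were sorted by channel before grouping, position 0, 1 and 2 of each list correspond to the same channel in every row.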

In [72]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,metadata_file,event_frames,event_timestamps
0,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[980, 1181], [3376, 3575], [5672, 5871], [74...","[[[9781115, 9981112], [12181113, 12381110], [1..."
1,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[980, 1180], [3376, 3575], [5672, 5871], [74...","[[[9781115, 9981112], [12181113, 12381110], [1..."
2,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,"[7978450, 7979510, 7979835, 7981221, 7982607, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1125, 1324], [3519, 3720], [5815, 6014], [7...","[[[9103808, 9303807], [11503806, 11703806], [1..."
3,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,"[7978450, 7979597, 7979835, 7981221, 7982607, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1125, 1324], [3519, 3720], [5815, 6014], [7...","[[[9103808, 9303807], [11503806, 11703806], [1..."
4,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,"[7455977, 7455977, 7457363, 7458749, 7458749, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[916, 1117], [3312, 3513], [5608, 5808], [74...","[[[8373348, 8573351], [10773348, 10973350], [1..."


In [73]:
# The per-channel lists are ordered as [dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]:
# index 0 holds the tone events, index 1 the box 1 port entries, index 2 the box 2 port entries
trodes_state_df["tone_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[0])
trodes_state_df["box_1_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[2])

trodes_state_df["tone_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[0])
trodes_state_df["box_1_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[2])


In [74]:
trodes_state_df = trodes_state_df.drop(columns=["event_timestamps", "event_frames", "metadata_file"], errors="ignore")

In [75]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames
0,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[9781115, 9981112], [12181113, 12381110], [14...","[[9289915, 9314113], [9318312, 9357515], [9358...","[[41881086, 41888889], [42363889, 42365886], [...","[[980, 1181], [3376, 3575], [5672, 5871], [746...","[[490, 514], [518, 558], [558, 637], [638, 640...","[[33137, 33147], [33665, 33666], [33668, 33669..."
1,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,"[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[9781115, 9981112], [12181113, 12381110], [14...","[[9289915, 9314113], [9318312, 9357515], [9358...","[[41881086, 41888889], [42363889, 42365886], [...","[[980, 1180], [3376, 3575], [5672, 5871], [746...","[[490, 514], [518, 558], [558, 637], [638, 640...","[[33021, 33027], [33502, 33503], [33504, 33506..."
2,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,"[7978450, 7979510, 7979835, 7981221, 7982607, ...","[[9103808, 9303807], [11503806, 11703806], [13...","[[8169811, 8226416], [8366813, 8384208], [8894...","[[41014777, 41015772], [41241974, 41247379], [...","[[1125, 1324], [3519, 3720], [5815, 6014], [76...","[[192, 248], [389, 405], [916, 929], [929, 948...","[[33019, 33020], [33246, 33251], [33253, 33255..."
3,20230612_112630_standard_comp_to_training_D1_s...,20230612_112630_standard_comp_to_training_D1_s...,"[7978450, 7979597, 7979835, 7981221, 7982607, ...","[[9103808, 9303807], [11503806, 11703806], [13...","[[8169811, 8226416], [8366813, 8384208], [8894...","[[41014777, 41015772], [41241974, 41247379], [...","[[1125, 1324], [3519, 3720], [5815, 6014], [76...","[[192, 248], [389, 405], [916, 930], [930, 948...","[[32974, 32976], [33201, 33207], [33208, 33211..."
4,20230613_105657_standard_comp_to_training_D2_s...,20230613_105657_standard_comp_to_training_D2_s...,"[7455977, 7455977, 7457363, 7458749, 7458749, ...","[[8373348, 8573351], [10773348, 10973350], [13...","[[7503951, 7522151], [7524748, 7533948], [7815...","[[41078926, 41277526], [41587525, 41645128], [...","[[916, 1117], [3312, 3513], [5608, 5808], [740...","[[49, 67], [70, 79], [360, 366], [460, 469], [...","[[33601, 33798], [34108, 34165], [34166, 34179..."


In [76]:
trodes_raw_df = trodes_raw_df[RAW_COLS_TO_KEEP].reset_index(drop=True).copy()

In [77]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,original_file,session_path,current_subject,first_item_data,first_timestamp,all_subjects
0,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,/scratch/back_up/reward_competition_extention/...,1.4,"[307664, 307665, 307666, 307667, 307668, 30766...",307664,"[1.2, 1.4]"
1,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,20230616_111904_standard_comp_to_training_D4_s...,/scratch/back_up/reward_competition_extention/...,1.2,"[307664, 307665, 307666, 307667, 307668, 30766...",307664,"[1.2, 1.4]"
2,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,/scratch/back_up/reward_competition_extention/...,1.3,"[8798886, 8798887, 8798888, 8798889, 8798890, ...",8798886,"[1.3, 1.4]"
3,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,20230612_101430_standard_comp_to_training_D1_s...,/scratch/back_up/reward_competition_extention/...,1.4,"[8798886, 8798887, 8798888, 8798889, 8798890, ...",8798886,"[1.3, 1.4]"
4,20230614_114041_standard_comp_to_training_D3_s...,20230614_114041_standard_comp_to_training_D3_s...,20230614_114041_standard_comp_to_training_D3_s...,/scratch/back_up/reward_competition_extention/...,1.2,"[34373723, 34373724, 34373725, 34373726, 34373...",34373723,"[1.1, 1.2]"


In [78]:
trodes_final_df = pd.merge(trodes_raw_df, trodes_state_df, on=["session_dir"], how="inner")

In [79]:
trodes_final_df.shape

(22, 16)

In [80]:
trodes_final_df = trodes_final_df.rename(columns={"first_item_data": "raw_timestamps"})
trodes_final_df = trodes_final_df.drop(columns=["metadata_file"], errors="ignore")
trodes_final_df = trodes_final_df.sort_values(["session_dir", "recording"]).reset_index(drop=True).copy()

## Making the timestamps 0 indexed

In [81]:
trodes_final_df[[col for col in trodes_final_df.columns if "timestamps" in col]].head()

Unnamed: 0,raw_timestamps,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,"[8798886, 8798887, 8798888, 8798889, 8798890, ...","[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[9781115, 9981112], [12181113, 12381110], [14...","[[9289915, 9314113], [9318312, 9357515], [9358...","[[41881086, 41888889], [42363889, 42365886], [..."
1,"[8798886, 8798887, 8798888, 8798889, 8798890, ...","[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[9781115, 9981112], [12181113, 12381110], [14...","[[9289915, 9314113], [9318312, 9357515], [9358...","[[41881086, 41888889], [42363889, 42365886], [..."
2,"[8798886, 8798887, 8798888, 8798889, 8798890, ...","[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[9781115, 9981112], [12181113, 12381110], [14...","[[9289915, 9314113], [9318312, 9357515], [9358...","[[41881086, 41888889], [42363889, 42365886], [..."
3,"[8798886, 8798887, 8798888, 8798889, 8798890, ...","[8798884, 8800270, 8801656, 8803042, 8803042, ...","[[9781115, 9981112], [12181113, 12381110], [14...","[[9289915, 9314113], [9318312, 9357515], [9358...","[[41881086, 41888889], [42363889, 42365886], [..."
4,"[7977066, 7977067, 7977068, 7977069, 7977070, ...","[7978450, 7979510, 7979835, 7981221, 7982607, ...","[[9103808, 9303807], [11503806, 11703806], [13...","[[8169811, 8226416], [8366813, 8384208], [8894...","[[41014777, 41015772], [41241974, 41247379], [..."


In [82]:
trodes_final_df["last_timestamp"] = trodes_final_df["raw_timestamps"].apply(lambda x: x[-1])

- Dropping the raw timestamps to reduce memory use; only the first and last timestamps of each recording are kept

In [83]:
trodes_final_df = trodes_final_df.drop(columns=["raw_timestamps", "original_file"], errors="ignore")

In [84]:
# Call .copy() (with parentheses) so that a new DataFrame is stored, not the bound method
copy_trodes_final_df = trodes_final_df.copy()

In [85]:
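# Shift every "*timestamps" column so that each recording starts at 0.
# "first_timestamp" and "last_timestamp" do not contain "timestamps", so they are left unshifted.
for col in [col for col in trodes_final_df.columns if "timestamps" in col]:
    trodes_final_df[col] = trodes_final_df.apply(lambda x: x[col].astype(np.int32) - np.int32(x["first_timestamp"]), axis=1)

# Store the frame indices as 32-bit integers to save memory
for col in [col for col in trodes_final_df.columns if "frames" in col]:
    trodes_final_df[col] = trodes_final_df[col].apply(lambda x: x.astype(np.int32))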
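
The zero-indexing step can be illustrated on plain NumPy arrays. The sketch below is a variant of the cell above, not the notebook's exact code: the helper `zero_index` is a hypothetical name, and it subtracts in 64-bit before downcasting so that it can check that the shifted values fit in `int32`. The sample values are taken from the first session shown in the outputs.

```python
import numpy as np


def zero_index(timestamps, first_timestamp):
    """Shift timestamps so that the first sample of the recording is 0.

    Hypothetical helper: subtracts in int64, checks the int32 range,
    then downcasts to int32 to save memory.
    """
    shifted = np.asarray(timestamps, dtype=np.int64) - np.int64(first_timestamp)
    limits = np.iinfo(np.int32)
    if shifted.size and (shifted.max() > limits.max or shifted.min() < limits.min):
        raise OverflowError("shifted timestamps do not fit in int32")
    return shifted.astype(np.int32)


first_timestamp = 8798886
tone_timestamps = np.array([[9781115, 9981112], [12181113, 12381110]])
video_timestamps = np.array([8798884, 8800270, 8801656])

tone_zeroed = zero_index(tone_timestamps, first_timestamp)
video_zeroed = zero_index(video_timestamps, first_timestamp)

print(tone_zeroed.tolist())   # [[982229, 1182226], [3382227, 3582224]]
print(video_zeroed.tolist())  # [-2, 1384, 2770]
```

The first video timestamp comes out as -2 because that frame was stamped two samples before the first raw sample, which matches the `video_timestamps` column in the saved DataFrame.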

In [86]:
# Order the columns by the last "_"-separated token of their name (e.g. "frames", "timestamps")
sorted_columns = sorted(trodes_final_df.columns, key=lambda x: x.split("_")[-1])
trodes_final_df = trodes_final_df[sorted_columns].copy()
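
To see what this ordering does, here is a small sketch on a plain list of column names (a subset of the real columns, chosen for illustration). Because Python's `sorted` is stable, columns that share the same suffix keep their original relative order.

```python
columns = [
    "session_dir",
    "tone_frames",
    "video_name",
    "first_timestamp",
    "tone_timestamps",
    "box_1_port_entry_frames",
]

# Sort by the text after the last underscore: "dir", "frames", "name", ...
sorted_columns = sorted(columns, key=lambda name: name.split("_")[-1])

print(sorted_columns)
```

This groups the `*_frames` columns together and places the `*_timestamps` columns last, which is the layout visible in the saved DataFrame.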

## Saving to a file

In [87]:
trodes_final_df.to_pickle(os.path.join(OUTPUT_DIR, "{}_00_trodes_metadata.pkl".format(OUTPUT_PREFIX)))
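
`to_pickle` raises an error if the target directory does not exist. A minimal sketch of a more defensive save followed by a round-trip check is shown below. It writes to a temporary directory, and the prefix `"toy"` and the toy DataFrame are placeholders, not the notebook's actual `OUTPUT_PREFIX` or data.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for trodes_final_df (hypothetical values).
toy_df = pd.DataFrame(
    {"session_dir": ["s1", "s2"], "first_timestamp": [8798886, 7977066]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    output_dir = os.path.join(tmp_dir, "proc")
    # Create the output directory if it is missing.
    os.makedirs(output_dir, exist_ok=True)

    output_path = os.path.join(output_dir, "{}_00_trodes_metadata.pkl".format("toy"))
    toy_df.to_pickle(output_path)

    # Read the file back to confirm that it was written correctly.
    reloaded_df = pd.read_pickle(output_path)

round_trip_ok = reloaded_df.equals(toy_df)
print(round_trip_ok)
```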

In [88]:
trodes_final_df.head()

Unnamed: 0,session_dir,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames,video_name,session_path,recording,current_subject,all_subjects,first_timestamp,last_timestamp,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,20230612_101430_standard_comp_to_training_D1_s...,"[[980, 1181], [3376, 3575], [5672, 5871], [746...","[[490, 514], [518, 558], [558, 637], [638, 640...","[[33137, 33147], [33665, 33666], [33668, 33669...",20230612_101430_standard_comp_to_training_D1_s...,/scratch/back_up/reward_competition_extention/...,20230612_101430_standard_comp_to_training_D1_s...,1.3,"[1.3, 1.4]",8798886,77093151,"[-2, 1384, 2770, 4156, 4156, 5542, 6928, 6928,...","[[982229, 1182226], [3382227, 3582224], [56822...","[[491029, 515227], [519426, 558629], [559427, ...","[[33082200, 33090003], [33565003, 33567000], [..."
1,20230612_101430_standard_comp_to_training_D1_s...,"[[980, 1180], [3376, 3575], [5672, 5871], [746...","[[490, 514], [518, 558], [558, 637], [638, 640...","[[33021, 33027], [33502, 33503], [33504, 33506...",20230612_101430_standard_comp_to_training_D1_s...,/scratch/back_up/reward_competition_extention/...,20230612_101430_standard_comp_to_training_D1_s...,1.3,"[1.3, 1.4]",8798886,77093151,"[-2, 1384, 2770, 4156, 4156, 5542, 6928, 6928,...","[[982229, 1182226], [3382227, 3582224], [56822...","[[491029, 515227], [519426, 558629], [559427, ...","[[33082200, 33090003], [33565003, 33567000], [..."
2,20230612_101430_standard_comp_to_training_D1_s...,"[[980, 1181], [3376, 3575], [5672, 5871], [746...","[[490, 514], [518, 558], [558, 637], [638, 640...","[[33137, 33147], [33665, 33666], [33668, 33669...",20230612_101430_standard_comp_to_training_D1_s...,/scratch/back_up/reward_competition_extention/...,20230612_101430_standard_comp_to_training_D1_s...,1.4,"[1.3, 1.4]",8798886,77093151,"[-2, 1384, 2770, 4156, 4156, 5542, 6928, 6928,...","[[982229, 1182226], [3382227, 3582224], [56822...","[[491029, 515227], [519426, 558629], [559427, ...","[[33082200, 33090003], [33565003, 33567000], [..."
3,20230612_101430_standard_comp_to_training_D1_s...,"[[980, 1180], [3376, 3575], [5672, 5871], [746...","[[490, 514], [518, 558], [558, 637], [638, 640...","[[33021, 33027], [33502, 33503], [33504, 33506...",20230612_101430_standard_comp_to_training_D1_s...,/scratch/back_up/reward_competition_extention/...,20230612_101430_standard_comp_to_training_D1_s...,1.4,"[1.3, 1.4]",8798886,77093151,"[-2, 1384, 2770, 4156, 4156, 5542, 6928, 6928,...","[[982229, 1182226], [3382227, 3582224], [56822...","[[491029, 515227], [519426, 558629], [559427, ...","[[33082200, 33090003], [33565003, 33567000], [..."
4,20230612_112630_standard_comp_to_training_D1_s...,"[[1125, 1324], [3519, 3720], [5815, 6014], [76...","[[192, 248], [389, 405], [916, 929], [929, 948...","[[33019, 33020], [33246, 33251], [33253, 33255...",20230612_112630_standard_comp_to_training_D1_s...,/scratch/back_up/reward_competition_extention/...,20230612_112630_standard_comp_to_training_D1_s...,1.1,"[1.1, 1.2]",7977066,76318450,"[1384, 2444, 2769, 4155, 5541, 6708, 6927, 831...","[[1126742, 1326741], [3526740, 3726740], [5826...","[[192745, 249350], [389747, 407142], [917544, ...","[[33037711, 33038706], [33264908, 33270313], [..."


In [89]:
trodes_final_df["session_dir"].unique()

array(['20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3',
       '20230612_112630_standard_comp_to_training_D1_subj_1-2_and_1-1',
       '20230613_105657_standard_comp_to_training_D2_subj_1-1_and_1-4',
       '20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2',
       '20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2'],
      dtype=object)

In [90]:
trodes_final_df["video_name"].unique()

array(['20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3.1.videoTimeStamps.cameraHWSync',
       '20230612_101430_standard_comp_to_training_D1_subj_1-4_and_1-3.2.videoTimeStamps.cameraHWSync',
       '20230612_112630_standard_comp_to_training_D1_subj_1-2_and_1-1.1.videoTimeStamps.cameraHWSync',
       '20230612_112630_standard_comp_to_training_D1_subj_1-2_and_1-1.2.videoTimeStamps.cameraHWSync',
       '20230613_105657_standard_comp_to_training_D2_subj_1-1_and_1-4.1.videoTimeStamps.cameraHWSync',
       '20230613_105657_standard_comp_to_training_D2_subj_1-1_and_1-4.2.videoTimeStamps.cameraHWSync',
       '20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2.1.videoTimeStamps.cameraHWSync',
       '20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2.2.videoTimeStamps.cameraHWSync',
       '20230614_114041_standard_comp_to_training_D3_subj_1-1_and_1-2.3.videoTimeStamps.cameraHWSync',
       '20230616_111904_standard_comp_to_training_D4_subj_1-4_and_1-2.1.v