# Time Stamp Extract

This notebook extracts the timestamps from exported SpikeGadgets (Trodes) recordings and gets the times at which the tones played.

TODO: Supplement the description

In [1]:
# Imports of all used packages and libraries
import sys
import os
import git
import glob
from collections import defaultdict

In [2]:
git_repo = git.Repo(".", search_parent_directories=True)
git_root = git_repo.git.rev_parse("--show-toplevel")

In [3]:
git_root

'/nancy/user/riwata/projects/reward_comp_ext'

In [4]:
sys.path.insert(0, os.path.join(git_root, 'src'))

In [5]:
# Numerical, tabular, and plotting libraries (glob is already imported above)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
import spikeinterface.extractors as se
import spikeinterface.preprocessing as sp

In [7]:
import utilities.helper
import trodes.read_exported

# Functions

In [8]:
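import re

def extract_floats(s):
    """
    Extracts all numbers (decimals and integers) from a string and returns them as a list of strings.

    Parameters:
    - s (str): The string to extract numbers from.

    Returns:
    - list: A list of strings, each the float representation of a number found
      in the input string (an integer such as "2" is returned as "2.0").
    """
    # Matches decimals such as "1.4" (with an optional sign) or plain runs of digits;
    # the optional sign only applies to the decimal alternative
    float_pattern = r"[-+]?\d*\.\d+|\d+"
    return [str(float(num)) for num in re.findall(float_pattern, s)]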

## Inputs & Data

- Explanation of each input and where it comes from.

Inputs and required data loading
- Input variable names are in all-caps snake case.
- Whenever an input is changed or used for processing, the resulting variables are in lowercase snake case.

In [9]:
# Path of the directory that contains the Spike Gadgets recording and the exported timestamp files
# Exported with this tool https://docs.spikegadgets.com/en/latest/basic/ExportFunctions.html
# Export these files:
    # -raw – Continuous raw band export.
    # -dio – Digital IO channel state change export.
    # -analogio – Continuous analog IO export.
INPUT_DIR = "/scratch/back_up/reward_competition_extention/data/rce_cohort_2"
OUTPUT_DIR = r"./proc" # where data is saved should always be shown in the inputs
TONE_DIN = "dio_ECU_Din1"
TONE_STATE = 1
os.makedirs(OUTPUT_DIR, exist_ok=True)
OUTPUT_PREFIX = "rce_pilot_2_comp_novel"

In [10]:
COLS_TO_KEEP = ['session_dir', 'recording', 'metadata_dir', 'metadata_file',
                'original_file', 'filename', 'session_path', 'all_subjects',
                'current_subject', 'event_timestamps', 'video_name',
                'video_timestamps', 'event_frames', 'first_item_data']

In [11]:
RAW_COLS_TO_KEEP = ['session_dir',
 'recording',
 'original_file',
 'session_path',
 'current_subject',
 'first_item_data',
 'first_timestamp',
 'all_subjects']

In [12]:
STATE_COLS_TO_KEEP = ['session_dir',
 'metadata_file',
 'event_timestamps',
 'video_name',
 'video_timestamps',
 'event_frames',]

In [13]:
same_columns = ['session_dir', 'video_name']
different_columns = ['metadata_file', 'event_frames', 'event_timestamps']

In [14]:
# TODO: Find way not to hard code this
# ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/standard/2023_06_*/*.rec")
ALL_SESSION_DIR = glob.glob("/scratch/back_up/reward_competition_extention/data/rce_cohort_2/novel/*.rec")



In [15]:
ALL_SESSION_DIR

['/scratch/back_up/reward_competition_extention/data/rce_cohort_2/novel/20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_2/novel/20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2.rec',
 '/scratch/back_up/reward_competition_extention/data/rce_cohort_2/novel/20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1.rec']

## Outputs

Describe each output that the notebook creates. 

- Is it a plot or is it data?

- How valuable is the output and why is it valuable or useful?

## Other documentation

raw directory
- raw_group0.dat
    - voltage_value: Array with voltage measurement for each channel at each timestamp
- timestamps.dat
    - voltage_time_stamp: The time stamp of each voltage measurement

parent directory
- 1.videoTimeStamps.cameraHWSync
    - frame_number: Calculated by getting the index of each video time stamp tuple
    - PosTimestamp: The time stamp of each video frame
    - HWframeCount: Unknown value. Starts at 30742 and increases by 1 for each tuple
    - HWTimestamp: Unknown value. All zeroes
    - video_time: Calculated by dividing the frame number by the fps (frames per second)
    - video_seconds: video_time, but rounded to seconds
    - These are filled-in versions of the above columns, using the value from the most recent previous cell
        - filled_PosTimestamp
        - filledHWframeCount
        - filled_frame_number
        - filled_video_time
        - filled_video_seconds
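
The derived video columns listed above can be sketched as follows. This is only a minimal illustration on toy data: the frame rate (`FPS = 30`), the timestamp values, and the dense `timeline` used for alignment are assumptions and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

FPS = 30  # assumed frame rate; the notebook does not state it

# Toy video time stamps (hypothetical values)
video_df = pd.DataFrame({"PosTimestamp": [100, 103, 106, 109]})
video_df["frame_number"] = np.arange(len(video_df))      # index of each tuple
video_df["video_time"] = video_df["frame_number"] / FPS  # seconds into the video
video_df["video_seconds"] = video_df["video_time"].round().astype(int)

# Align the frames onto a denser timeline; rows without a frame become NaN
timeline = pd.DataFrame({"time": np.arange(100, 110)})
merged = timeline.merge(video_df, how="left", left_on="time", right_on="PosTimestamp")

# "Filled" columns carry the most recent previous value forward
for col in ["PosTimestamp", "frame_number", "video_time", "video_seconds"]:
    merged["filled_" + col] = merged[col].ffill()
```

Forward filling means every row on the dense timeline can be matched to the frame that was on screen at that moment.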

DIO directory
- dio_ECU_Din1.dat
    - time: The time stamp that corresponds to the DIN input
    - state: Binary state of whether there is input from DIN or not
    - trial_number: Calculated by adding 1 every time there is a DIN input
    - These are filled-in versions of the above columns, using the value from the most recent previous cell
        - filled_state
        - filled_trial_number
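
As a rough sketch of the DIO columns described above, the snippet below builds a toy state-change array with the same `<time uint32><state uint8>` layout and derives the onset times and a running trial number. The time values are made up, and `tone_onsets` and `trial_number` are illustrative names rather than variables from this notebook.

```python
import numpy as np

TONE_STATE = 1  # same value as the TONE_STATE input of this notebook

# Toy DIO state-change array (hypothetical time stamps)
dio = np.array(
    [(1000, 0), (1200, 1), (1400, 0), (2200, 1), (2400, 0)],
    dtype=[("time", "<u4"), ("state", "u1")],
)

# Time stamps at which the channel switched into the tone state
tone_onsets = dio["time"][dio["state"] == TONE_STATE]

# trial_number increases by 1 at every onset and stays constant in between
trial_number = np.cumsum(dio["state"] == TONE_STATE)
```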

ss_output directory (Spike sorting with Spike interface)
- firings.npz
    - unit_id: All the units that had a spike train for the given timestamp 	
    - number_of_units: Calculated by counting the number of units that had a spike train

## Functions

- Function names are short and in all-lowercase snake case
- A function name should be unique but does not have to describe the function
- Docstrings, not function names, describe what functions do

## Processing

Describe what is done to the data here and how inputs are manipulated to generate outputs. 

In [16]:
# As much code and as many cells as required
# includes EDA and playing with data
# GO HAM!

# LOOP 1: Extracting all the Trodes data

- Getting all the data from all the exported Trodes files and saving it to `session_to_trodes_data`
    - Creates a dictionary with the structure of:
        - `{dir_name: {file_name: metadata, file_name_2: metadata_2}, dir_name_2: {file_name_3: metadata_3, file_name_4: metadata_4}}`
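
The nested structure described above can be built with a self-referencing `defaultdict`. The function below is a hypothetical stand-in for `utilities.helper.create_recursive_dict`, whose actual implementation is not shown in this notebook, and the keys and metadata values are placeholders.

```python
from collections import defaultdict


def create_recursive_dict():
    """Hypothetical stand-in: a dict whose missing keys create nested dicts."""
    return defaultdict(create_recursive_dict)


session_to_data = create_recursive_dict()

# Intermediate levels are created on first access, so no setup is needed
session_to_data["dir_name"]["file_name"] = {"clockrate": "20000"}
session_to_data["dir_name"]["file_name_2"] = {"clockrate": "20000"}
session_to_data["dir_name_2"]["file_name_3"] = {"clockrate": "30000"}
```

One consequence of this design is that reading a missing key silently creates an empty entry, so membership should be tested with `in` rather than by indexing.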

In [17]:
# Saving the trodes data for each session
# Each key is a session name
# Each value is a dictionary of every recording file in that session
session_to_trodes_data = utilities.helper.create_recursive_dict()


# Saving the path of the session recording
session_to_path = {}

# Going through each session recording
# Which includes all the recordings from all the miniloggers and cameras
for session_path in ALL_SESSION_DIR:   
    try:
        # Getting the name of the session from the path
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Reading the trodes data for every recording file in the session directory
        session_to_trodes_data[session_basename] = trodes.read_exported.organize_all_trodes_export(session_path)
        
        session_to_path[session_basename] = session_path
    except Exception as e: 
        print(e)


Current Session: 20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2


Skipping file 20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1v1-4and2-1_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20230629_111937_standard_comp_to_novel_agent_D2_subj_1-4vs1-1and2-2_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2
Skipping file 20230630_115506_standard_comp_to_novel_agent_D3_subj_1-2vs1-4and2-2_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs1-2and2-1_merged.timestampoffset.txt due to error: Settings format not supported
Current Session: 20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1
Skipping file 20230628_111202_standard_comp_to_novel_agent_D1_subj_1-2vs1-1and2-1_merged.timestampoffset.txt due to error: Settings format not supported
Skipping file 20230628_111202_standard_comp_to_novel_age

In [18]:
session_to_trodes_data

defaultdict(<function utilities.helper.create_recursive_dict()>,
            {'20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2': defaultdict(dict,
                         {'20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1v1-4and2-1_merged': {'DIO': {'dio_ECU_Dout4': {'description': 'State change data for one digital channel. Display_order is 1-based',
                             'byte_order': 'little endian',
                             'original_file': '20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1v1-4and2-1_merged.rec',
                             'clockrate': '20000',
                             'trodes_version': '2.4.0',
                             'compile_date': 'May 24 2023',
                             'compile_time': '10:11:04',
                             'qt_version': '6.2.2',
                             'commit_tag': 'heads/Release_2.4.0-0-g1eecf3b7',
                             'controller_firmware': '3.17',
           

- Adding the video timestamps

In [19]:
for session_path in ALL_SESSION_DIR:   
    try:
        session_basename = os.path.splitext(os.path.basename(session_path))[0]
        print("Current Session: {}".format(session_basename))
        # Each camera has its own "*cameraHWSync" file in the session directory
        for video_timestamps in glob.glob(os.path.join(session_path, "*cameraHWSync")):
            video_basename = os.path.basename(video_timestamps)
            print("Current Video Name: {}".format(video_basename))
            timestamp_array = trodes.read_exported.read_trodes_extracted_data_file(video_timestamps)
            if "video_timestamps" not in session_to_trodes_data[session_basename][session_basename]:
                session_to_trodes_data[session_basename][session_basename]["video_timestamps"] = defaultdict(dict)
            # The third-from-last dot-separated part of the file name is the camera number
            camera_number = video_basename.split(".")[-3]
            session_to_trodes_data[session_basename][session_basename]["video_timestamps"][camera_number] = timestamp_array
    except Exception as e: 
        print(e)

Current Session: 20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2
Current Video Name: 20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2.2.videoTimeStamps.cameraHWSync
Current Video Name: 20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2.1.videoTimeStamps.cameraHWSync
Current Session: 20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2
Current Video Name: 20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2.2.videoTimeStamps.cameraHWSync
Current Video Name: 20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2.1.videoTimeStamps.cameraHWSync
Current Session: 20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1
Current Video Name: 20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1.1.videoTimeStamps.cameraHWSync
Current Video Name: 20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1.3.videoTimeStamps

- Creating a dataframe from the dictionary with a column for:
  - Session directory
  - Recording name
  - Metadata directory
  - Metadata file
  - And a column for each metadata field
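
The flattening step can be illustrated on a small toy dictionary. The session, recording, and metadata values below are placeholders; the approach keys each innermost metadata dictionary by a 4-tuple so that pandas builds a four-level index, which `reset_index` then turns into `level_0` to `level_3` columns.

```python
import pandas as pd

# Toy nested dictionary: session -> recording -> directory -> file -> metadata
nested = {
    "session_a": {
        "recording_1": {
            "DIO": {"dio_ECU_Din1": {"clockrate": "20000", "direction": "input"}},
            "raw": {"timestamps": {"clockrate": "20000"}},
        }
    }
}

# One entry per metadata file, keyed by the full path through the nesting
flat = {
    (session, recording, directory, file_name): metadata
    for session, recordings in nested.items()
    for recording, directories in recordings.items()
    for directory, files in directories.items()
    for file_name, metadata in files.items()
}

df = pd.DataFrame.from_dict(flat, orient="index").reset_index()
df = df.rename(columns={"level_0": "session_dir", "level_1": "recording",
                        "level_2": "metadata_dir", "level_3": "metadata_file"})
```

Metadata fields that a file does not have (here `direction` for the raw timestamps) end up as missing values in that row.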

In [20]:
# Creating a dataframe from the nested dictionary
trodes_metadata_df = pd.DataFrame.from_dict({(i,j,k,l): session_to_trodes_data[i][j][k][l] 
                           for i in session_to_trodes_data.keys() 
                           for j in session_to_trodes_data[i].keys()
                           for k in session_to_trodes_data[i][j].keys()
                           for l in session_to_trodes_data[i][j][k].keys()},
                           orient='index')

# Resetting the index and renaming the columns
trodes_metadata_df = trodes_metadata_df.reset_index()
trodes_metadata_df = trodes_metadata_df.rename(columns={'level_0': 'session_dir', 'level_1': 'recording', 'level_2': 'metadata_dir', 'level_3': 'metadata_file'}, errors="ignore")

# Adding the session path to the dataframe
trodes_metadata_df["session_path"] = trodes_metadata_df["session_dir"].map(session_to_path)

In [21]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_timestamp,direction,id,display_order,fields,data,filename,decimation,clock rate,session_path
0,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,10693370,output,ECU_Dout4,5,<time uint32><state uint8>,"[[10693370, 0], [27239475, 1], [27239478, 0]]",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...
1,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,10693370,input,ECU_Din4,9,<time uint32><state uint8>,"[[10693370, 0]]",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...
2,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,10693370,input,ECU_Din2,6,<time uint32><state uint8>,"[[10693370, 0], [10925203, 1], [10929603, 0], ...",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...
3,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,10693370,input,ECU_Din3,8,<time uint32><state uint8>,"[[10693370, 0], [10794401, 1], [10795801, 0], ...",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...
4,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Dout2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,10693370,output,ECU_Dout2,3,<time uint32><state uint8>,"[[10693370, 0], [27239475, 1], [27239478, 0]]",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...


In [22]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_timestamp,direction,id,display_order,fields,data,filename,decimation,clock rate,session_path
75,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,video_timestamps,1,,,,,,,...,,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[10971928, 0, 0], [10973314, 0, 0], [10974700...",20230630_115506_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...
76,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,1,,,,,,,...,,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[3412009, 0, 0], [3413395, 0, 0], [3413395, 0...",20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...
77,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,3,,,,,,,...,,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[5691765, 0, 0], [5691765, 0, 0], [5693151, 0...",20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...
78,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,4,,,,,,,...,,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[70274672, 0, 0], [70276058, 0, 0], [70277444...",20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...
79,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,2,,,,,,,...,,,,,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[3412009, 0, 0], [3413395, 0, 0], [3414781, 0...",20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...


- Getting the first item from each tuple in the arrays in the `data` column
  - This first item is usually just the timestamp
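
A short sketch of what "first item" means for these arrays: each `data` entry is a NumPy structured array, so indexing it with a field name returns that field for every row. The array below reuses the values of the first `data` entry printed in this notebook.

```python
import numpy as np

data = np.array(
    [(10693370, 0), (27239475, 1), (27239478, 0)],
    dtype=[("time", "<u4"), ("state", "u1")],
)

# Name and values of the first field (usually the time stamp)
first_name = data.dtype.names[0]
first_item = data[first_name]

# Name and values of the last field
last_name = data.dtype.names[-1]
last_item = data[last_name]
```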

In [23]:
trodes_metadata_df["data"].iloc[0]

array([(10693370, 0), (27239475, 1), (27239478, 0)],
      dtype=[('time', '<u4'), ('state', 'u1')])

In [24]:
# Getting the name of the first field in each structured numpy array
trodes_metadata_df["first_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[0])
# Getting the values of the first field in each structured numpy array
trodes_metadata_df["first_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[0]])


In [25]:
# Same as above but for the last field
trodes_metadata_df["last_dtype_name"] = trodes_metadata_df["data"].apply(lambda x: x.dtype.names[-1])
trodes_metadata_df["last_item_data"] = trodes_metadata_df["data"].apply(lambda x: x[x.dtype.names[-1]])

In [26]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,fields,data,filename,decimation,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
0,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Dout4,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,<time uint32><state uint8>,"[[10693370, 0], [27239475, 1], [27239478, 0]]",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 27239475, 27239478]",state,"[0, 1, 0]"
1,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,<time uint32><state uint8>,"[[10693370, 0]]",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,[10693370],state,[0]
2,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,<time uint32><state uint8>,"[[10693370, 0], [10925203, 1], [10929603, 0], ...",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10925203, 10929603, 10938409, 10941...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ..."
3,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,<time uint32><state uint8>,"[[10693370, 0], [10794401, 1], [10795801, 0], ...",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ..."
4,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Dout2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,<time uint32><state uint8>,"[[10693370, 0], [27239475, 1], [27239478, 0]]",20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 27239475, 27239478]",state,"[0, 1, 0]"


In [27]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,fields,data,filename,decimation,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data
75,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,video_timestamps,1,,,,,,,...,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[10971928, 0, 0], [10973314, 0, 0], [10974700...",20230630_115506_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[10971928, 10973314, 10974700, 10976085, 10976...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
76,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,1,,,,,,,...,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[3412009, 0, 0], [3413395, 0, 0], [3413395, 0...",20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3412009, 3413395, 3413395, 3414781, 3416167, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
77,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,3,,,,,,,...,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[5691765, 0, 0], [5691765, 0, 0], [5693151, 0...",20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[5691765, 5691765, 5693151, 5694537, 5695922, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
78,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,4,,,,,,,...,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[70274672, 0, 0], [70276058, 0, 0], [70277444...",20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[70274672, 70276058, 70277444, 70277444, 70278...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
79,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,2,,,,,,,...,<PosTimestamp uint32><HWframeCount uint32><HWT...,"[[3412009, 0, 0], [3413395, 0, 0], [3414781, 0...",20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3412009, 3413395, 3414781, 3414781, 3416167, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [28]:
trodes_metadata_df["recording"].unique()

array(['20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1v1-4and2-1_merged',
       '20230629_111937_standard_comp_to_novel_agent_D2_subj_1-4vs1-1and2-2_merged',
       '20230630_115506_standard_comp_to_novel_agent_D3_subj_1-2vs1-4and2-2_merged',
       '20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs1-2and2-1_merged',
       '20230628_111202_standard_comp_to_novel_agent_D1_subj_1-2vs1-1and2-1_merged',
       '20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs1-2and2-2_merged',
       '20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2',
       '20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2',
       '20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1'],
      dtype=object)

## Getting the subject information from the metadata

In [29]:
def split_by_multiple_delimiters(s, delimiters):
    """
    Splits a string by multiple delimiters.

    Parameters:
    - s (str): The string to split.
    - delimiters (list): A list of delimiters to split the string by.

    Returns:
    - list: A list of substrings.
    """
    return re.split('|'.join(map(re.escape, delimiters)), s)


In [30]:
# Taking the part of the session name after "subj" and turning IDs like "1-1" into "1.1"
trodes_metadata_df["all_subjects"] = trodes_metadata_df["session_dir"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", "."))
# Extracting every subject ID and sorting them
trodes_metadata_df["all_subjects"] = trodes_metadata_df["all_subjects"].apply(lambda x: sorted(extract_floats(x)))

In [31]:
trodes_metadata_df["session_dir"].iloc[0]

'20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2'

In [32]:
trodes_metadata_df["all_subjects"].apply(lambda x: tuple(x)).unique()

array([('1.1', '1.4', '2.1', '2.2'), ('1.2', '1.4', '2.1', '2.2'),
       ('1.1', '1.2', '2.1', '2.2')], dtype=object)

In [33]:
# Taking the first subject ID in the recording name as the current subject
trodes_metadata_df["current_subject"] = trodes_metadata_df["recording"].apply(lambda x: x.split("subj")[-1].strip("_").replace("-", ".").replace("_", "."))
trodes_metadata_df["current_subject"] = trodes_metadata_df["current_subject"].apply(lambda x: extract_floats(x)[0])


In [34]:
trodes_metadata_df["current_subject"].unique()

array(['1.1', '1.4', '1.2'], dtype=object)
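
The subject parsing above can be traced on two names taken from this notebook's output. The helper `subject_ids` is a hypothetical, simplified variant of the cells above: it uses the unsigned part of the regular expression directly and does not round-trip the matches through `float`.

```python
import re

NUMBER_PATTERN = r"\d*\.\d+|\d+"


def subject_ids(name):
    """Return the subject IDs found after 'subj' in a session or recording name."""
    tail = name.split("subj")[-1].strip("_").replace("-", ".").replace("_", ".")
    return re.findall(NUMBER_PATTERN, tail)


session = "20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2"
recording = "20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1v1-4and2-1_merged"

all_subjects = sorted(subject_ids(session))   # every subject in the session
current_subject = subject_ids(recording)[0]   # first ID in the recording name
```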

## Dropping all the rows with unneeded metadata

In [35]:
trodes_metadata_df["metadata_dir"].unique()

array(['DIO', 'analog', 'raw', 'time', 'video_timestamps'], dtype=object)

In [36]:
METADATA_TO_KEEP = ['raw', 'DIO', 'video_timestamps']

In [37]:
trodes_metadata_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(METADATA_TO_KEEP)]

In [38]:
# Dropping the digital output channels (file names containing "out") and the coordinates files
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("out")]
trodes_metadata_df = trodes_metadata_df[~trodes_metadata_df["metadata_file"].str.contains("coordinates")]


In [39]:
trodes_metadata_df = trodes_metadata_df.reset_index(drop=True)
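
The row filtering above can be tried on a toy frame. The file names here are placeholders, and combining the two `str.contains` filters into one regular expression is just a compact alternative to the separate cells, not a change the notebook makes.

```python
import pandas as pd

df = pd.DataFrame({
    "metadata_dir": ["DIO", "DIO", "analog", "raw", "raw", "time"],
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Dout1", "analog_x",
                      "timestamps", "coordinates", "time"],
})

METADATA_TO_KEEP = ["raw", "DIO", "video_timestamps"]

# Keep only the wanted directories, then drop output channels and coordinates
keep = (
    df["metadata_dir"].isin(METADATA_TO_KEEP)
    & ~df["metadata_file"].str.contains("out|coordinates")
)
filtered = df[keep].reset_index(drop=True)
```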

# Getting the first time stamp of each recording

In [40]:
trodes_raw_df = trodes_metadata_df[(trodes_metadata_df["metadata_dir"] == "raw") & (trodes_metadata_df["metadata_file"] == "timestamps")].copy()


In [41]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,filename,decimation,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
4,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,raw,timestamps,Raw timestamps,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10693371, 10693372, 10693373, 10693...",time,"[10693370, 10693371, 10693372, 10693373, 10693...","[1.1, 1.4, 2.1, 2.2]",1.1
9,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,raw,timestamps,Raw timestamps,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10693371, 10693372, 10693373, 10693...",time,"[10693370, 10693371, 10693372, 10693373, 10693...","[1.1, 1.4, 2.1, 2.2]",1.4
10,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,raw,timestamps,Raw timestamps,little endian,20230630_115506_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230630_115506_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10971930, 10971931, 10971932, 10971933, 10971...",time,"[10971930, 10971931, 10971932, 10971933, 10971...","[1.2, 1.4, 2.1, 2.2]",1.2
19,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,raw,timestamps,Raw timestamps,little endian,20230630_115506_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230630_115506_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10971931, 10971932, 10971933, 10971934, 10971...",time,"[10971931, 10971932, 10971933, 10971934, 10971...","[1.2, 1.4, 2.1, 2.2]",1.4
24,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,raw,timestamps,Raw timestamps,little endian,20230628_111202_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230628_111202_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[3412011, 3412012, 3412013, 3412014, 3412015, ...",time,"[3412011, 3412012, 3412013, 3412014, 3412015, ...","[1.1, 1.2, 2.1, 2.2]",1.2


In [42]:
trodes_raw_df["first_timestamp"] = trodes_raw_df["first_item_data"].apply(lambda x: x[0])

In [43]:
trodes_raw_df["recording"].iloc[0]

'20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1v1-4and2-1_merged'

In [44]:
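# Keyed by session_dir; when a session has several recordings, the last row's first timestamp is kept
recording_to_first_timestamp = trodes_raw_df.set_index('session_dir')['first_timestamp'].to_dict()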

In [45]:
recording_to_first_timestamp

{'20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2': 10693370,
 '20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2': 10971931,
 '20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1': 3412011}
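
Because several recordings share one `session_dir`, converting the indexed series to a dictionary keeps only the last value per session. The toy example below shows that behavior next to one possible alternative, taking the earliest first timestamp per session. Which of the two is intended is not stated in this notebook, so the `groupby` variant is only an assumption about a reasonable choice.

```python
import pandas as pd

raw_df = pd.DataFrame({
    "session_dir": ["session_a", "session_a", "session_b"],
    "recording": ["rec_1", "rec_2", "rec_3"],
    "first_timestamp": [10971930, 10971931, 3412011],
})

# Duplicate keys: the value from the last row of each session wins
last_wins = raw_df.set_index("session_dir")["first_timestamp"].to_dict()

# Alternative: make the choice explicit by taking the minimum per session
earliest = raw_df.groupby("session_dir")["first_timestamp"].min().to_dict()
```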

In [46]:
trodes_metadata_df["first_timestamp"] = trodes_metadata_df["session_dir"].map(recording_to_first_timestamp)

In [47]:
trodes_metadata_df["first_timestamp"]

0     10693370
1     10693370
2     10693370
3     10693370
4     10693370
5     10693370
6     10693370
7     10693370
8     10693370
9     10693370
10    10971931
11    10971931
12    10971931
13    10971931
14    10971931
15    10971931
16    10971931
17    10971931
18    10971931
19    10971931
20     3412011
21     3412011
22     3412011
23     3412011
24     3412011
25     3412011
26     3412011
27     3412011
28     3412011
29     3412011
30    10693370
31    10693370
32    10971931
33    10971931
34     3412011
35     3412011
36     3412011
37     3412011
Name: first_timestamp, dtype: int64

# Getting the event timestamps

In [48]:
trodes_metadata_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,filename,decimation,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din4,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,[10693370],state,[0],"[1.1, 1.4, 2.1, 2.2]",1.1
1,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10925203, 10929603, 10938409, 10941...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1
2,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1
3,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 11812814, 12012817, 14212844, 14412...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1
4,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,raw,timestamps,Raw timestamps,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10693371, 10693372, 10693373, 10693...",time,"[10693370, 10693371, 10693372, 10693373, 10693...","[1.1, 1.4, 2.1, 2.2]",1.1


In [49]:
trodes_metadata_df.tail()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,filename,decimation,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
33,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,video_timestamps,1,,,,,,,...,20230630_115506_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[10971928, 10973314, 10974700, 10976085, 10976...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.2, 1.4, 2.1, 2.2]",1.4
34,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,1,,,,,,,...,20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3412009, 3413395, 3413395, 3414781, 3416167, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.2, 2.1, 2.2]",1.1
35,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,3,,,,,,,...,20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[5691765, 5691765, 5693151, 5694537, 5695922, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.2, 2.1, 2.2]",1.1
36,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,4,,,,,,,...,20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[70274672, 70276058, 70277444, 70277444, 70278...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.2, 2.1, 2.2]",1.1
37,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,2,,,,,,,...,20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3412009, 3413395, 3414781, 3414781, 3416167, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.2, 2.1, 2.2]",1.1


In [50]:
# trodes_state_df = trodes_metadata_df[trodes_metadata_df["last_dtype_name"] == "state"].copy()

# Filtering for digital IO channels
trodes_state_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"].isin(["DIO"])].copy()
# Filtering for tone and port entry related channels
trodes_state_df = trodes_state_df[trodes_state_df["id"].isin(["ECU_Din1", "ECU_Din2", "ECU_Din3"])].copy()


In [51]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,filename,decimation,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
1,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10925203, 10929603, 10938409, 10941...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1
2,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1
3,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 11812814, 12012817, 14212844, 14412...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1
5,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.4
6,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,20230629_111937_standard_comp_to_novel_agent_D...,,,/scratch/back_up/reward_competition_extention/...,time,"[10693370, 11812814, 12012817, 14212844, 14412...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.4


In [52]:
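# Pairing the index of each "on" state (1) with the index of the state change that follows it
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: np.column_stack([np.where(x["last_item_data"] == 1)[0], np.where(x["last_item_data"] == 1)[0]+1]), axis=1)

In [53]:
# Dropping pairs whose second index falls past the end of the timestamp array
trodes_state_df["event_indexes"] = trodes_state_df.apply(lambda x: x["event_indexes"][x["event_indexes"][:, 1] <= x["first_item_data"].shape[0] - 1], axis=1)

In [54]:
# Converting each index pair into a (start, stop) timestamp pair
trodes_state_df["event_timestamps"] = trodes_state_df.apply(lambda x: x["first_item_data"][x["event_indexes"]], axis=1)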
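The three cells above pair each "on" state change with the state change that follows it. A minimal sketch on toy arrays, assuming `times` and `states` stand in for `first_item_data` and `last_item_data` of one row (the values are made up for illustration):

```python
import numpy as np

# Toy stand-ins for one DIO row (hypothetical values)
times = np.array([100, 250, 300, 420, 500, 610])   # plays the role of first_item_data
states = np.array([0, 1, 0, 1, 0, 1])              # plays the role of last_item_data

# Index of every "on" state change, paired with the index right after it
on_idx = np.where(states == 1)[0]
event_indexes = np.column_stack([on_idx, on_idx + 1])

# Drop a trailing "on" that has no following state change in the recording
event_indexes = event_indexes[event_indexes[:, 1] <= times.shape[0] - 1]

# Indexing with the (n, 2) index array gives (start, stop) timestamp pairs
event_timestamps = times[event_indexes]
print(event_timestamps)
```

The last "on" at index 5 has no matching "off", so it is dropped by the bounds filter instead of raising an index error.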

## Updating the video timestamps

## Syncing up the video frame data

In [55]:
# Getting the rows that are the metadata for the video timestamps
trodes_video_df = trodes_metadata_df[trodes_metadata_df["metadata_dir"] == "video_timestamps"].copy().reset_index(drop=True)



In [56]:
# Filtering for the first video only
# This only applies to this pilot data, where we are only looking at the competition data
# trodes_video_df = trodes_video_df[trodes_video_df["metadata_file"] == "1"].copy()

In [57]:
trodes_video_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,filename,decimation,clock rate,session_path,first_dtype_name,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject
0,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,video_timestamps,2,,,,,,,...,20230629_111937_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[10693368, 10694754, 10694754, 10696140, 10697...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1
1,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,video_timestamps,1,,,,,,,...,20230629_111937_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[10693368, 10694754, 10694754, 10696140, 10697...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1
2,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,video_timestamps,2,,,,,,,...,20230630_115506_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[10971928, 10971928, 10973314, 10974700, 10976...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.2, 1.4, 2.1, 2.2]",1.4
3,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,video_timestamps,1,,,,,,,...,20230630_115506_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[10971928, 10973314, 10974700, 10976085, 10976...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.2, 1.4, 2.1, 2.2]",1.4
4,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,video_timestamps,1,,,,,,,...,20230628_111202_standard_comp_to_novel_agent_D...,,30000,/scratch/back_up/reward_competition_extention/...,PosTimestamp,"[3412009, 3413395, 3413395, 3414781, 3416167, ...",HWTimestamp,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1.1, 1.2, 2.1, 2.2]",1.1


In [58]:
# Using the position timestamps (first_item_data) as the video timestamps
trodes_video_df["video_timestamps"] = trodes_video_df["first_item_data"]

In [59]:
# Removing the columns that are no longer needed
trodes_video_df = trodes_video_df[["filename", "video_timestamps", "session_dir"]].copy()

In [60]:
# Renaming the filename so that we can merge with other dataframes with the same column name
trodes_video_df = trodes_video_df.rename(columns={"filename": "video_name"})

In [61]:
trodes_video_df.head()

Unnamed: 0,video_name,video_timestamps,session_dir
0,20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...",20230629_111937_standard_comp_to_novel_agent_D...
1,20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...",20230629_111937_standard_comp_to_novel_agent_D...
2,20230630_115506_standard_comp_to_novel_agent_D...,"[10971928, 10971928, 10973314, 10974700, 10976...",20230630_115506_standard_comp_to_novel_agent_D...
3,20230630_115506_standard_comp_to_novel_agent_D...,"[10971928, 10973314, 10974700, 10976085, 10976...",20230630_115506_standard_comp_to_novel_agent_D...
4,20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3413395, 3414781, 3416167, ...",20230628_111202_standard_comp_to_novel_agent_D...


- Pairing every state row with every video from the same session (one output row per state/video pair)

In [62]:
trodes_state_df = pd.merge(trodes_state_df, trodes_video_df, on=["session_dir"], how="inner")
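Because both dataframes can hold several rows per `session_dir`, the inner merge produces one row for every state/video combination within a session. A small sketch with made-up session and video names shows the resulting row count:

```python
import pandas as pd

# Hypothetical toy data: two channels and two videos in s1, one of each in s2
state = pd.DataFrame({
    "session_dir": ["s1", "s1", "s2"],
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din1"],
})
video = pd.DataFrame({
    "session_dir": ["s1", "s1", "s2"],
    "video_name": ["s1.1", "s1.2", "s2.1"],
})

# s1 contributes 2 x 2 rows, s2 contributes 1 x 1 row
merged = pd.merge(state, video, on=["session_dir"], how="inner")
print(merged)
```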

In [63]:
trodes_state_df.columns

Index(['session_dir', 'recording', 'metadata_dir', 'metadata_file',
       'description', 'byte_order', 'original_file', 'clockrate',
       'trodes_version', 'compile_date', 'compile_time', 'qt_version',
       'commit_tag', 'controller_firmware', 'headstage_firmware',
       'controller_serialnum', 'headstage_serialnum', 'autosettle', 'smartref',
       'gyro', 'accelerometer', 'magnetometer', 'time_offset',
       'system_time_at_creation', 'timestamp_at_creation', 'first_timestamp',
       'direction', 'id', 'display_order', 'fields', 'data', 'filename',
       'decimation', 'clock rate', 'session_path', 'first_dtype_name',
       'first_item_data', 'last_dtype_name', 'last_item_data', 'all_subjects',
       'current_subject', 'event_indexes', 'event_timestamps', 'video_name',
       'video_timestamps'],
      dtype='object')

## Finding the closest frame to each event

In [64]:
trodes_state_df["event_timestamps"].iloc[1]

array([[10925203, 10929603],
       [10938409, 10941403],
       [10966006, 10975006],
       ...,
       [78460043, 78553044],
       [78565447, 78792847],
       [78932251, 79027255]], dtype=uint32)

In [65]:
trodes_state_df["event_frames"] = trodes_state_df.apply(lambda x: utilities.helper.find_nearest_indices(x["event_timestamps"], x["video_timestamps"]), axis=1)
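The body of `utilities.helper.find_nearest_indices` is not shown in this notebook. As a rough illustration of what a nearest-frame lookup could look like, the hypothetical `nearest_indices_sketch` below uses `np.searchsorted` and assumes the reference timestamps are sorted and contain at least two entries. It is a sketch, not the project's implementation:

```python
import numpy as np

def nearest_indices_sketch(values, sorted_ref):
    """Hypothetical helper: index of the closest entry in sorted_ref for each value."""
    # Cast to signed integers so differences cannot wrap around with uint32 input
    values = np.asarray(values, dtype=np.int64)
    sorted_ref = np.asarray(sorted_ref, dtype=np.int64)
    # Insertion point of each value, clipped so a left and right neighbour always exist
    right = np.clip(np.searchsorted(sorted_ref, values), 1, len(sorted_ref) - 1)
    left = right - 1
    # Keep whichever neighbour is closer (ties go to the left neighbour)
    use_left = (values - sorted_ref[left]) <= (sorted_ref[right] - values)
    return np.where(use_left, left, right)

video_ts = np.array([0, 10, 20, 30, 40])   # made-up frame timestamps
events = np.array([[9, 21], [26, 100]])    # made-up (start, stop) pairs
print(nearest_indices_sketch(events, video_ts))
```

Values outside the range of the reference array map to the first or last frame in this sketch.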

In [66]:
trodes_state_df.head()

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10925203, 10929603, 10938409, 10941...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10925203, 10929603], [10938409, 10941403], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[232, 236], [245, 248], [272, 282], [1237, 13..."
1,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10925203, 10929603, 10938409, 10941...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10925203, 10929603], [10938409, 10941403], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[232, 236], [245, 248], [272, 282], [1236, 13..."
2,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10794401, 10795801], [10799402, 10801204], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[101, 103], [106, 108], [989, 989], [990, 992..."
3,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10794401, 10795801], [10799402, 10801204], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[101, 102], [106, 108], [989, 989], [990, 991..."
4,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 11812814, 12012817, 14212844, 14412...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[11812814, 12012817], [14212844, 14412846], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[1118, 1318], [3513, 3713], [5810, 6009], [76..."


## Combine raw and state dataframes

In [67]:
trodes_state_df

Unnamed: 0,session_dir,recording,metadata_dir,metadata_file,description,byte_order,original_file,clockrate,trodes_version,compile_date,...,first_item_data,last_dtype_name,last_item_data,all_subjects,current_subject,event_indexes,event_timestamps,video_name,video_timestamps,event_frames
0,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10925203, 10929603, 10938409, 10941...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10925203, 10929603], [10938409, 10941403], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[232, 236], [245, 248], [272, 282], [1237, 13..."
1,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din2,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10925203, 10929603, 10938409, 10941...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10925203, 10929603], [10938409, 10941403], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[232, 236], [245, 248], [272, 282], [1236, 13..."
2,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10794401, 10795801], [10799402, 10801204], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[101, 103], [106, 108], [989, 989], [990, 992..."
3,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10794401, 10795801], [10799402, 10801204], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[101, 102], [106, 108], [989, 989], [990, 991..."
4,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 11812814, 12012817, 14212844, 14412...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[11812814, 12012817], [14212844, 14412846], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[1118, 1318], [3513, 3713], [5810, 6009], [76..."
5,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 11812814, 12012817, 14212844, 14412...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.1,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[11812814, 12012817], [14212844, 14412846], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[1118, 1318], [3513, 3712], [5810, 6009], [76..."
6,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10794401, 10795801], [10799402, 10801204], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[101, 103], [106, 108], [989, 989], [990, 992..."
7,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din3,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 10794401, 10795801, 10799402, 10801...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[10794401, 10795801], [10799402, 10801204], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[101, 102], [106, 108], [989, 989], [990, 991..."
8,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 11812814, 12012817, 14212844, 14412...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[11812814, 12012817], [14212844, 14412846], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[1118, 1318], [3513, 3713], [5810, 6009], [76..."
9,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,DIO,dio_ECU_Din1,State change data for one digital channel. Dis...,little endian,20230629_111937_standard_comp_to_novel_agent_D...,20000,2.4.0,May 24 2023,...,"[10693370, 11812814, 12012817, 14212844, 14412...",state,"[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, ...","[1.1, 1.4, 2.1, 2.2]",1.4,"[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11,...","[[11812814, 12012817], [14212844, 14412846], [...",20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[1118, 1318], [3513, 3712], [5810, 6009], [76..."


In [68]:
trodes_state_df = (
    trodes_state_df[STATE_COLS_TO_KEEP]
    .drop_duplicates(subset=["session_dir", "video_name", "metadata_file"])
    .sort_values(["session_dir", "video_name", "metadata_file"])
    .reset_index(drop=True)
    .copy()
)

In [69]:
trodes_state_df.head()

Unnamed: 0,session_dir,metadata_file,event_timestamps,video_name,video_timestamps,event_frames
0,20230628_111202_standard_comp_to_novel_agent_D...,dio_ECU_Din1,"[[6920369, 7120370], [9220396, 9420401], [1102...",20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3413395, 3414781, 3416167, ...","[[3392, 3591], [5689, 5888], [7484, 7685], [85..."
1,20230628_111202_standard_comp_to_novel_agent_D...,dio_ECU_Din2,"[[3412011, 3493126], [3522924, 3547524], [3548...",20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3413395, 3414781, 3416167, ...","[[1, 82], [112, 136], [137, 153], [155, 166], ..."
2,20230628_111202_standard_comp_to_novel_agent_D...,dio_ECU_Din3,"[[3507126, 3511924], [4563937, 4565740], [4569...",20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3413395, 3414781, 3416167, ...","[[95, 101], [1151, 1152], [1155, 1163], [1165,..."
3,20230628_111202_standard_comp_to_novel_agent_D...,dio_ECU_Din1,"[[6920369, 7120370], [9220396, 9420401], [1102...",20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3414781, 3414781, 3416167, ...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0..."
4,20230628_111202_standard_comp_to_novel_agent_D...,dio_ECU_Din2,"[[3412011, 3493126], [3522924, 3547524], [3548...",20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3414781, 3414781, 3416167, ...","[[1, 81], [112, 135], [136, 153], [154, 166], ..."


In [70]:
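# Collapsing the per-channel rows into one row per group:
# shared columns keep their first value, per-channel columns become lists
trodes_state_df = trodes_state_df.groupby(same_columns).agg({
    **{col: "first" for col in trodes_state_df.columns if col not in same_columns + different_columns},
    **{col: lambda x: x.tolist() for col in different_columns},
}).reset_index()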
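The definitions of `same_columns` and `different_columns` are not shown in this part of the notebook. The sketch below assumes hypothetical values for both and uses made-up data to illustrate how the aggregation turns one row per channel into one row per video, with the per-channel columns gathered into lists:

```python
import pandas as pd

# Made-up rows: two channels for each of two videos
df = pd.DataFrame({
    "session_dir": ["s1", "s1", "s2", "s2"],
    "video_name": ["v1", "v1", "v2", "v2"],
    "n_frames": [100, 100, 200, 200],
    "metadata_file": ["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din1", "dio_ECU_Din2"],
    "event_frames": [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]],
})

# Assumed groupings for this illustration only
same_columns = ["session_dir", "video_name"]
different_columns = ["metadata_file", "event_frames"]

collapsed = df.groupby(same_columns).agg({
    # Columns that are identical within a group keep their first value
    **{col: "first" for col in df.columns if col not in same_columns + different_columns},
    # Columns that differ per channel are collected into a list, in row order
    **{col: lambda x: x.tolist() for col in different_columns},
}).reset_index()
print(collapsed)
```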

In [71]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,metadata_file,event_frames,event_timestamps
0,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3413395, 3414781, 3416167, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[3392, 3591], [5689, 5888], [7484, 7685], [8...","[[[6920369, 7120370], [9220396, 9420401], [110..."
1,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3414781, 3414781, 3416167, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, ...","[[[6920369, 7120370], [9220396, 9420401], [110..."
2,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,"[5691765, 5691765, 5693151, 5694537, 5695922, ...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1227, 1426], [3523, 3723], [5319, 5520], [6...","[[[6920369, 7120370], [9220396, 9420401], [110..."
3,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,"[70274672, 70276058, 70277444, 70277444, 70278...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, ...","[[[6920369, 7120370], [9220396, 9420401], [110..."
4,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[dio_ECU_Din1, dio_ECU_Din2, dio_ECU_Din3]","[[[1118, 1318], [3513, 3712], [5810, 6009], [7...","[[[11812814, 12012817], [14212844, 14412846], ..."


In [72]:
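# The lists follow the sorted metadata_file order: index 0 is dio_ECU_Din1 (tone),
# index 1 is dio_ECU_Din2 (box 1 port entry), index 2 is dio_ECU_Din3 (box 2 port entry)
trodes_state_df["tone_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[0])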
trodes_state_df["box_1_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_timestamps"] = trodes_state_df["event_timestamps"].apply(lambda x: x[2])

trodes_state_df["tone_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[0])
trodes_state_df["box_1_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[1])
trodes_state_df["box_2_port_entry_frames"] = trodes_state_df["event_frames"].apply(lambda x: x[2])
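The positional indexing above depends on every row listing the channels in the same order. As a sketch of a lookup by channel name instead, the hypothetical `CHANNEL_TO_PREFIX` mapping below mirrors the positional assignments in the cell above and is applied to a made-up row:

```python
import pandas as pd

# Toy row shaped like the grouped dataframe (values are made up)
df = pd.DataFrame({
    "metadata_file": [["dio_ECU_Din1", "dio_ECU_Din2", "dio_ECU_Din3"]],
    "event_timestamps": [[[[10, 20]], [[30, 40]], [[50, 60]]]],
})

# Hypothetical mapping from channel file name to output column prefix
CHANNEL_TO_PREFIX = {
    "dio_ECU_Din1": "tone",
    "dio_ECU_Din2": "box_1_port_entry",
    "dio_ECU_Din3": "box_2_port_entry",
}

for channel, prefix in CHANNEL_TO_PREFIX.items():
    # Look the channel up by name in each row; a missing channel yields None
    df[f"{prefix}_timestamps"] = [
        dict(zip(names, events)).get(channel)
        for names, events in zip(df["metadata_file"], df["event_timestamps"])
    ]
print(df.filter(like="_timestamps").iloc[0])
```

With this approach a session that is missing one channel gets `None` in that column rather than a shifted or out-of-range index.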


In [73]:
trodes_state_df = trodes_state_df.drop(columns=["event_timestamps", "event_frames", "metadata_file"], errors="ignore")

In [74]:
trodes_state_df.head()

Unnamed: 0,session_dir,video_name,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames
0,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3413395, 3414781, 3416167, ...","[[6920369, 7120370], [9220396, 9420401], [1102...","[[3412011, 3493126], [3522924, 3547524], [3548...","[[3507126, 3511924], [4563937, 4565740], [4569...","[[3392, 3591], [5689, 5888], [7484, 7685], [85...","[[1, 82], [112, 136], [137, 153], [155, 166], ...","[[95, 101], [1151, 1152], [1155, 1163], [1165,..."
1,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,"[3412009, 3413395, 3414781, 3414781, 3416167, ...","[[6920369, 7120370], [9220396, 9420401], [1102...","[[3412011, 3493126], [3522924, 3547524], [3548...","[[3507126, 3511924], [4563937, 4565740], [4569...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...","[[1, 81], [112, 135], [136, 153], [154, 166], ...","[[95, 100], [1151, 1152], [1155, 1163], [1164,..."
2,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,"[5691765, 5691765, 5693151, 5694537, 5695922, ...","[[6920369, 7120370], [9220396, 9420401], [1102...","[[3412011, 3493126], [3522924, 3547524], [3548...","[[3507126, 3511924], [4563937, 4565740], [4569...","[[1227, 1426], [3523, 3723], [5319, 5520], [64...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0..."
3,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,"[70274672, 70276058, 70277444, 70277444, 70278...","[[6920369, 7120370], [9220396, 9420401], [1102...","[[3412011, 3493126], [3522924, 3547524], [3548...","[[3507126, 3511924], [4563937, 4565740], [4569...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0..."
4,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,"[10693368, 10694754, 10694754, 10696140, 10697...","[[11812814, 12012817], [14212844, 14412846], [...","[[10925203, 10929603], [10938409, 10941403], [...","[[10794401, 10795801], [10799402, 10801204], [...","[[1118, 1318], [3513, 3712], [5810, 6009], [76...","[[232, 236], [245, 248], [272, 282], [1236, 13...","[[101, 102], [106, 108], [989, 989], [990, 991..."


In [75]:
trodes_raw_df = trodes_raw_df[RAW_COLS_TO_KEEP].reset_index(drop=True).copy()

In [76]:
trodes_raw_df.head()

Unnamed: 0,session_dir,recording,original_file,session_path,current_subject,first_item_data,first_timestamp,all_subjects
0,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,1.1,"[10693370, 10693371, 10693372, 10693373, 10693...",10693370,"[1.1, 1.4, 2.1, 2.2]"
1,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,20230629_111937_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,1.4,"[10693370, 10693371, 10693372, 10693373, 10693...",10693370,"[1.1, 1.4, 2.1, 2.2]"
2,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,1.2,"[10971930, 10971931, 10971932, 10971933, 10971...",10971930,"[1.2, 1.4, 2.1, 2.2]"
3,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,20230630_115506_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,1.4,"[10971931, 10971932, 10971933, 10971934, 10971...",10971931,"[1.2, 1.4, 2.1, 2.2]"
4,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,20230628_111202_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,1.2,"[3412011, 3412012, 3412013, 3412014, 3412015, ...",3412011,"[1.1, 1.2, 2.1, 2.2]"


In [77]:
trodes_final_df = pd.merge(trodes_raw_df, trodes_state_df, on=["session_dir"], how="inner")

In [78]:
trodes_final_df.shape

(16, 16)

In [79]:
trodes_final_df = trodes_final_df.rename(columns={"first_item_data": "raw_timestamps"})
trodes_final_df = trodes_final_df.drop(columns=["metadata_file"], errors="ignore")
trodes_final_df = trodes_final_df.sort_values(["session_dir", "recording"]).reset_index(drop=True).copy()

## Making the timestamps 0-indexed

In [80]:
trodes_final_df[[col for col in trodes_final_df.columns if "timestamps" in col]].head()

Unnamed: 0,raw_timestamps,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,"[3412011, 3412012, 3412013, 3412014, 3412015, ...","[3412009, 3413395, 3413395, 3414781, 3416167, ...","[[6920369, 7120370], [9220396, 9420401], [1102...","[[3412011, 3493126], [3522924, 3547524], [3548...","[[3507126, 3511924], [4563937, 4565740], [4569..."
1,"[3412011, 3412012, 3412013, 3412014, 3412015, ...","[3412009, 3413395, 3414781, 3414781, 3416167, ...","[[6920369, 7120370], [9220396, 9420401], [1102...","[[3412011, 3493126], [3522924, 3547524], [3548...","[[3507126, 3511924], [4563937, 4565740], [4569..."
2,"[3412011, 3412012, 3412013, 3412014, 3412015, ...","[5691765, 5691765, 5693151, 5694537, 5695922, ...","[[6920369, 7120370], [9220396, 9420401], [1102...","[[3412011, 3493126], [3522924, 3547524], [3548...","[[3507126, 3511924], [4563937, 4565740], [4569..."
3,"[3412011, 3412012, 3412013, 3412014, 3412015, ...","[70274672, 70276058, 70277444, 70277444, 70278...","[[6920369, 7120370], [9220396, 9420401], [1102...","[[3412011, 3493126], [3522924, 3547524], [3548...","[[3507126, 3511924], [4563937, 4565740], [4569..."
4,"[3412011, 3412012, 3412013, 3412014, 3412015, ...","[3412009, 3413395, 3413395, 3414781, 3416167, ...","[[6920369, 7120370], [9220396, 9420401], [1102...","[[3412011, 3493126], [3522924, 3547524], [3548...","[[3507126, 3511924], [4563937, 4565740], [4569..."


In [81]:
trodes_final_df["last_timestamp"] = trodes_final_df["raw_timestamps"].apply(lambda x: x[-1])

- Dropping the raw timestamps to reduce memory use; only the first and last timestamp of each recording are kept

In [82]:
trodes_final_df = trodes_final_df.drop(columns=["raw_timestamps", "original_file"], errors="ignore")

In [83]:
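# Call .copy() so this holds an independent DataFrame rather than a reference to the bound method
copy_trodes_final_df = trodes_final_df.copy()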

In [84]:
for col in [col for col in trodes_final_df.columns if "timestamps" in col]:
    trodes_final_df[col] = trodes_final_df.apply(lambda x: x[col].astype(np.int32) - np.int32(x["first_timestamp"]), axis=1)

for col in [col for col in trodes_final_df.columns if "frames" in col]:
    trodes_final_df[col] = trodes_final_df[col].apply(lambda x: x.astype(np.int32))
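
As a sanity check on the zero-indexing step, here is a small sketch built around a hypothetical helper, `zero_index`, which is not part of the notebook. It assumes the timestamps are integer sample counts, does the subtraction in `int64`, and only then casts to `int32` after confirming the values fit. This differs slightly from the cell above, which casts to `int32` before subtracting. The input values are taken from the tone timestamps and `first_timestamp` shown in the outputs of this notebook.

```python
import numpy as np


def zero_index(timestamps, first_timestamp, dtype=np.int32):
    """Hypothetical helper: shift timestamps so the recording starts at 0."""
    timestamps = np.asarray(timestamps, dtype=np.int64)
    shifted = timestamps - np.int64(first_timestamp)
    # Guard against silent wrap-around when casting to a narrower integer type
    info = np.iinfo(dtype)
    if shifted.size and (shifted.min() < info.min or shifted.max() > info.max):
        raise OverflowError(
            "shifted timestamps do not fit in {}".format(np.dtype(dtype).name)
        )
    return shifted.astype(dtype)


# First two tone on/off pairs and the first timestamp of the 20230628 session
tone_timestamps = np.array([[6920369, 7120370], [9220396, 9420401]])
zeroed = zero_index(tone_timestamps, first_timestamp=3412011)
print(zeroed)
```

The result matches the first two zero-indexed tone pairs in the final table, `[3508358, 3708359]` and `[5808385, 6008390]`.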

In [85]:
sorted_columns = sorted(trodes_final_df.columns, key=lambda x: x.split("_")[-1])
trodes_final_df = trodes_final_df[sorted_columns].copy()
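
The sort key above is the last underscore-separated token of each column name, so columns are grouped by suffix (`dir`, `frames`, `name`, ..., `timestamp`, `timestamps`). A short sketch on a hypothetical subset of the column names illustrates the ordering; because Python's `sorted` is stable, columns that share a suffix keep their original relative order:

```python
# Hypothetical subset of the DataFrame's column names, in an arbitrary order
columns = [
    "session_dir",
    "first_timestamp",
    "tone_frames",
    "tone_timestamps",
    "video_name",
    "last_timestamp",
]

# Sort by the last underscore-separated token of each name
sorted_columns = sorted(columns, key=lambda x: x.split("_")[-1])
print(sorted_columns)
```

Here `first_timestamp` stays ahead of `last_timestamp` because both share the suffix `timestamp`, and both come before `tone_timestamps` because `"timestamp"` sorts before `"timestamps"`.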

## Saving to a file

In [86]:
trodes_final_df.to_pickle(os.path.join(OUTPUT_DIR, "{}_00_trodes_metadata.pkl".format(OUTPUT_PREFIX)))
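
Pickle is a reasonable choice here because the cells hold NumPy arrays, which a CSV export would turn into strings. The sketch below, using a toy frame and a temporary directory (both hypothetical stand-ins for `trodes_final_df` and `OUTPUT_DIR`), shows a round trip through `to_pickle` and `pd.read_pickle` and checks that the array cells come back unchanged:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy frame with array-valued cells (values are made up)
toy_df = pd.DataFrame({
    "session_dir": ["s1", "s2"],
    "tone_timestamps": [np.array([0, 10, 20]), np.array([5, 15])],
})

with tempfile.TemporaryDirectory() as output_dir:
    # Creating the directory first avoids a failure if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "{}_00_trodes_metadata.pkl".format("toy"))
    toy_df.to_pickle(path)
    loaded_df = pd.read_pickle(path)

# Compare array cells one by one, since element-wise == on object columns is ambiguous
arrays_match = all(
    np.array_equal(a, b)
    for a, b in zip(toy_df["tone_timestamps"], loaded_df["tone_timestamps"])
)
print(arrays_match)
```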

In [87]:
trodes_final_df.head()

Unnamed: 0,session_dir,tone_frames,box_1_port_entry_frames,box_2_port_entry_frames,video_name,session_path,recording,current_subject,all_subjects,first_timestamp,last_timestamp,video_timestamps,tone_timestamps,box_1_port_entry_timestamps,box_2_port_entry_timestamps
0,20230628_111202_standard_comp_to_novel_agent_D...,"[[3392, 3591], [5689, 5888], [7484, 7685], [85...","[[1, 82], [112, 136], [137, 153], [155, 166], ...","[[95, 101], [1151, 1152], [1155, 1163], [1165,...",20230628_111202_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,20230628_111202_standard_comp_to_novel_agent_D...,1.1,"[1.1, 1.2, 2.1, 2.2]",3412011,71742308,"[-2, 1384, 1384, 2770, 4156, 4156, 5542, 6928,...","[[3508358, 3708359], [5808385, 6008390], [7608...","[[0, 81115], [110913, 135513], [136916, 153319...","[[95115, 99913], [1151926, 1153729], [1157126,..."
1,20230628_111202_standard_comp_to_novel_agent_D...,"[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...","[[1, 81], [112, 135], [136, 153], [154, 166], ...","[[95, 100], [1151, 1152], [1155, 1163], [1164,...",20230628_111202_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,20230628_111202_standard_comp_to_novel_agent_D...,1.1,"[1.1, 1.2, 2.1, 2.2]",3412011,71742308,"[-2, 1384, 2770, 2770, 4156, 5542, 6928, 6928,...","[[3508358, 3708359], [5808385, 6008390], [7608...","[[0, 81115], [110913, 135513], [136916, 153319...","[[95115, 99913], [1151926, 1153729], [1157126,..."
2,20230628_111202_standard_comp_to_novel_agent_D...,"[[1227, 1426], [3523, 3723], [5319, 5520], [64...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...",20230628_111202_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,20230628_111202_standard_comp_to_novel_agent_D...,1.1,"[1.1, 1.2, 2.1, 2.2]",3412011,71742308,"[2279754, 2279754, 2281140, 2282526, 2283911, ...","[[3508358, 3708359], [5808385, 6008390], [7608...","[[0, 81115], [110913, 135513], [136916, 153319...","[[95115, 99913], [1151926, 1153729], [1157126,..."
3,20230628_111202_standard_comp_to_novel_agent_D...,"[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...","[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0...",20230628_111202_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,20230628_111202_standard_comp_to_novel_agent_D...,1.1,"[1.1, 1.2, 2.1, 2.2]",3412011,71742308,"[66862661, 66864047, 66865433, 66865433, 66866...","[[3508358, 3708359], [5808385, 6008390], [7608...","[[0, 81115], [110913, 135513], [136916, 153319...","[[95115, 99913], [1151926, 1153729], [1157126,..."
4,20230628_111202_standard_comp_to_novel_agent_D...,"[[3392, 3591], [5689, 5888], [7484, 7685], [85...","[[1, 82], [112, 136], [137, 153], [155, 166], ...","[[95, 101], [1151, 1152], [1155, 1163], [1165,...",20230628_111202_standard_comp_to_novel_agent_D...,/scratch/back_up/reward_competition_extention/...,20230628_111202_standard_comp_to_novel_agent_D...,1.2,"[1.1, 1.2, 2.1, 2.2]",3412011,71742308,"[-2, 1384, 1384, 2770, 4156, 4156, 5542, 6928,...","[[3508358, 3708359], [5808385, 6008390], [7608...","[[0, 81115], [110913, 135513], [136916, 153319...","[[95115, 99913], [1151926, 1153729], [1157126,..."


In [88]:
trodes_final_df["session_dir"].unique()

array(['20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1',
       '20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2',
       '20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2'],
      dtype=object)

In [89]:
trodes_final_df["video_name"].unique()

array(['20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1.1.videoTimeStamps.cameraHWSync',
       '20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1.2.videoTimeStamps.cameraHWSync',
       '20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1.3.videoTimeStamps.cameraHWSync',
       '20230628_111202_standard_comp_to_novel_agent_D1_subj_1-1vs2-2and1-2vs2-1.4.videoTimeStamps.cameraHWSync',
       '20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2.1.videoTimeStamps.cameraHWSync',
       '20230629_111937_standard_comp_to_novel_agent_D2_subj_1-1vs2-1and1-4vs2-2.2.videoTimeStamps.cameraHWSync',
       '20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2.1.videoTimeStamps.cameraHWSync',
       '20230630_115506_standard_comp_to_novel_agent_D3_subj_1-4vs2-1and1-2vs2-2.2.videoTimeStamps.cameraHWSync'],
      dtype=object)