# Notebook 1: Extracting the MED-PC Recording Data and Metadata

# Importing the Python Libraries

In [1]:
import sys
import glob
import collections
from collections import defaultdict
import os
from datetime import datetime
import pathlib

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from medpc2excel.medpc_read import medpc_read
from moviepy.editor import *

In [3]:
# setting path
sys.path.append('../../src')

In [4]:
# All the libraries that were created for this repository
import extract.dataframe
import processing.tone
import extract.metadata

# Getting the Metadata from all the MED-PC Recording Files

- Every MED-PC recording file has metadata about the time, date, subject, group, experiment name, script used, and MED-PC box number at the top of the file. We will extract that information first.
    - Metadata is background information or context about data (or files), as opposed to the data that was actually intended to be recorded.  
        For more information on metadata: https://en.wikipedia.org/wiki/Metadata

- The path of the directory that this notebook is in. Relative paths will be resolved from here.

In [5]:
current_working_directory = os.getcwd()

In [6]:
current_working_directory

'/root/projects/behavioral_dataframe_processing/results/2022_07_20_repeated_id_fix'

- All the other files in this directory. If you want to use a folder in here such as the `data` folder, you'd type `./data/{name_of_folder}` where you replace `{name_of_folder}` with the name of the folder without the `{}`. You will do this in the cells following the one below
    - The `./` means the path will reference the current directory that the command is being used from

In [7]:
os.listdir(current_working_directory)

['data',
 'scripts',
 '03_calculating_port_entry_precision.ipynb',
 '.ipynb_checkpoints',
 'README.MD',
 'proc',
 '02_calculating_latencies.ipynb',
 '04_syncing_with_video.ipynb',
 '01_extracting_recording_data_and_metadata.ipynb']

## NOTE: If you are using your own data, the following path must be changed to the directory where your MED-PC recording files are located (if they are not in the specified folder). It is recommended to create a subfolder in the `data` folder (the one in the same folder as this notebook) for each group of sessions you want to process and put the recording files there

- Use the cell below instead if you're using your own data. To make it runnable, click on the cell, press the `esc` key, and then press the `Y` key. To switch it back, do the same but press the `R` key instead. `Y` turns the cell into one that is run as code; `R` turns it back into a cell that is treated as plain text
    - **NOTE: If you are using a different folder, then change the path in the cell below**
    - The asterisk is called a wildcard, which tells the computer to find all files/folders that match this pattern. The `*` can stand in for any sequence of characters of any length. The `**` tells the computer to also look in the folders inside the specified folder; note that Python's `glob.glob` only does this when `recursive=True` is passed, otherwise `**` behaves like `*`. For more information: https://linuxhint.com/bash_wildcard_tutorial/
    - For more information on finding the path of your folder that contains the recording files: https://www.computerhope.com/issues/ch001708.htm
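
As a rough illustration of the two wildcards, the sketch below builds a small throwaway folder tree (the folder and file names are made up for this example) and compares a `*` search with a recursive `**` search:

```python
import glob
import os
import tempfile

# Build a throwaway folder tree so the example does not depend on real data
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "session_1", "med_pc_text_files")
    os.makedirs(nested)
    for path in (os.path.join(root, "top.txt"), os.path.join(nested, "deep.txt")):
        open(path, "w").close()

    # `*` only matches names directly inside the folder that is given
    top_level = glob.glob(os.path.join(root, "*.txt"))
    # `**` only descends into subfolders when recursive=True is passed
    every_level = glob.glob(os.path.join(root, "**", "*.txt"), recursive=True)

    top_level_names = sorted(os.path.basename(p) for p in top_level)
    every_level_names = sorted(os.path.basename(p) for p in every_level)

print(top_level_names)
print(every_level_names)
```

The first search only finds the file at the top of the folder, while the recursive search also finds the file two folders down.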

In [8]:
all_med_pc_file = glob.glob("./data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/*.txt")

In [9]:
all_med_pc_file[:10]

['./data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-22_12h26m_Subject 1.txt',
 './data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-23_14h55m_Subject 4.txt',
 './data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-23_10h06m_Subject 2.txt',
 './data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-22_10h11m_Subject 2.txt',
 './data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-22_09h00m_Subject 3.txt',
 './data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-22_11h19m_Subject 3.txt',
 './data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-23_14h55m_Subject 3.txt',
 './data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-19_13h38m_Subject 1.txt',
 './data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-18_15h10m_Subject 4.txt',
 './data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-20_09h35m_Subject 4.txt']

- Example of what the MED-PC Recording file looks like

In [10]:
with open(all_med_pc_file[0]) as f:
    lines = f.readlines()
    for line in lines[:20]:
        print(line)

File: C:\MED-PC\Data\2022-07-22_12h26m_Subject 1.txt







Start Date: 07/22/22

End Date: 07/22/22

Subject: 1

Experiment: C57_vs_CD1_comparison 

Group: Cage4

Box: 1

Start Time: 12:26:57

End Time: 13:36:44

MSN: CD1_reward_training

A:    4399.000

D:    9000.000

E:       0.000

L:       0.000

M:       0.000

O:       0.000

T:    3660.000



- We will be extracting the first 10 or so lines that look like:

```
File: C:\MED-PC\Data\2022-05-06_12h59m_Subject 3.4 (2).txt

Start Date: 05/06/22

End Date: 05/06/22

Subject: 3.4 (2)

Experiment: Pilot of Pilot

Group: Cage 4

Box: 1

Start Time: 12:59:58

End Time: 14:02:38

MSN: levelNP_CS_reward_laserepochON1st_noshock
```

- We will find all the lines that start with `"File", "Start Date", "End Date", "Subject", "Experiment", "Group", "Box", "Start Time", "End Time", or "MSN"` and get the metadata from those lines, stopping once all the metadata types have been collected
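
The steps above can be sketched in a few lines of plain Python. `parse_med_pc_header` is a hypothetical helper written for this illustration; it is not the repository's `extract.metadata` implementation, which may differ in its details:

```python
# Hypothetical helper for illustration; not the repository's implementation
METADATA_KEYS = ("File", "Start Date", "End Date", "Subject", "Experiment",
                 "Group", "Box", "Start Time", "End Time", "MSN")


def parse_med_pc_header(lines):
    """Collect the metadata lines of a MED-PC file into a dictionary."""
    metadata = {}
    for line in lines:
        # Split on the first colon only, so values such as "12:26:57" stay whole
        key, separator, value = line.partition(":")
        key = key.strip()
        if separator and key in METADATA_KEYS and key not in metadata:
            metadata[key] = value.strip()
        # Stop as soon as every metadata type has been found
        if len(metadata) == len(METADATA_KEYS):
            break
    return metadata


example_lines = [
    "File: C:\\MED-PC\\Data\\2022-07-22_12h26m_Subject 1.txt\n",
    "\n",
    "Start Date: 07/22/22\n",
    "End Date: 07/22/22\n",
    "Subject: 1\n",
    "Experiment: C57_vs_CD1_comparison \n",
    "Group: Cage4\n",
    "Box: 1\n",
    "Start Time: 12:26:57\n",
    "End Time: 13:36:44\n",
    "MSN: CD1_reward_training\n",
    "A:    4399.000\n",
]
header = parse_med_pc_header(example_lines)
print(header)
```

Splitting on the first colon matters here because both the Windows file path and the clock times contain colons of their own.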

In [11]:
# This makes a nested dictionary of file paths to each individual metadata type
file_path_to_meta_data = extract.metadata.get_all_med_pc_meta_data_from_files(list_of_files=all_med_pc_file)

In [12]:
# The metadata for the first file
for key, value in file_path_to_meta_data.items():
    print("File path: {}".format(key))
    print("Metadata types and associated values: {}".format(value))
    break

File path: ./data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-22_12h26m_Subject 1.txt
Metadata types and associated values: {'File': 'C:\\MED-PC\\Data\\2022-07-22_12h26m_Subject 1.txt', 'Start Date': '07/22/22', 'End Date': '07/22/22', 'Subject': '1', 'Experiment': 'C57_vs_CD1_comparison', 'Group': 'Cage4', 'Box': '1', 'Start Time': '12:26:57', 'End Time': '13:36:44', 'MSN': 'CD1_reward_training'}


## Making a Dataframe out of the Metadata

- A Dataframe is essentially a "programmable" spreadsheet. But instead of clicking on cells, you will have to tell Python how you want to interact with the spreadsheet
    - For more information: https://realpython.com/pandas-dataframe/

In [13]:
# Turning the dictionary into a Pandas Dataframe
metadata_df = pd.DataFrame.from_dict(file_path_to_meta_data, orient="index")
# Resetting the index because currently the file path is the index 
metadata_df = metadata_df.reset_index()

In [14]:
metadata_df.head()

Unnamed: 0,index,File,Start Date,End Date,Subject,Experiment,Group,Box,Start Time,End Time,MSN
0,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_12h26m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_comparison,Cage4,1,12:26:57,13:36:44,CD1_reward_training
1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_14h55m_Subject 4.txt,07/23/22,07/23/22,4,C57_vs_CD1_Comparison,Cage 6,1,14:55:47,16:05:03,CD1_reward_training
2,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_10h06m_Subject 2.txt,07/23/22,07/23/22,2,C57_vs_CD1_comparison,Cage2,3,10:06:22,11:09:48,CD1_reward_training
3,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_10h11m_Subject 2.txt,07/22/22,07/22/22,2,C57_vs_CD1_comparison,Cage2,2,10:11:03,11:14:37,CD1_reward_training
4,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_09h00m_Subject 3.txt,07/22/22,07/22/22,3,C57_vs_CD1_comparison,Cage1,3,09:00:32,10:05:02,CD1_reward_training


In [15]:
metadata_df.tail()

Unnamed: 0,index,File,Start Date,End Date,Subject,Experiment,Group,Box,Start Time,End Time,MSN
164,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_10h48m_Subject 4.txt,07/20/22,07/20/22,4,C57_vs_CD1_Comparison,Cage3,2,10:48:32,11:53:40,C57_reward_training
165,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_10h17m_Subject 2.txt,07/24/22,07/24/22,2,C57_vs_CD1_Comparison,Cage2,4,10:17:01,11:22:34,C57_reward_training
166,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_10h11m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_comparison,Cage2,1,10:11:03,11:14:37,C57_reward_training
167,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_10h48m_Subject 3.txt,07/20/22,07/20/22,3,C57_vs_CD1_Comparison,Cage3,1,10:48:32,11:53:40,C57_reward_training
168,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_14h53m_Subject 3.txt,07/24/22,07/24/22,3,C57_vs_CD1_comparison,Cage 6,1,14:53:31,15:56:03,CD1_reward_training


- Make a column that is just the file name

In [16]:
# The "File" value is a Windows path, so parse it as one (this works on any operating system)
# and keep just the file name
metadata_df["file_name"] = metadata_df["File"].apply(lambda x: pathlib.PureWindowsPath(x).name.strip())

In [17]:
metadata_df

Unnamed: 0,index,File,Start Date,End Date,Subject,Experiment,Group,Box,Start Time,End Time,MSN,file_name
0,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_12h26m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_comparison,Cage4,1,12:26:57,13:36:44,CD1_reward_training,2022-07-22_12h26m_Subject 1.txt
1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_14h55m_Subject 4.txt,07/23/22,07/23/22,4,C57_vs_CD1_Comparison,Cage 6,1,14:55:47,16:05:03,CD1_reward_training,2022-07-23_14h55m_Subject 4.txt
2,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_10h06m_Subject 2.txt,07/23/22,07/23/22,2,C57_vs_CD1_comparison,Cage2,3,10:06:22,11:09:48,CD1_reward_training,2022-07-23_10h06m_Subject 2.txt
3,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_10h11m_Subject 2.txt,07/22/22,07/22/22,2,C57_vs_CD1_comparison,Cage2,2,10:11:03,11:14:37,CD1_reward_training,2022-07-22_10h11m_Subject 2.txt
4,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_09h00m_Subject 3.txt,07/22/22,07/22/22,3,C57_vs_CD1_comparison,Cage1,3,09:00:32,10:05:02,CD1_reward_training,2022-07-22_09h00m_Subject 3.txt
...,...,...,...,...,...,...,...,...,...,...,...,...
164,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_10h48m_Subject 4.txt,07/20/22,07/20/22,4,C57_vs_CD1_Comparison,Cage3,2,10:48:32,11:53:40,C57_reward_training,2022-07-20_10h48m_Subject 4.txt
165,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_10h17m_Subject 2.txt,07/24/22,07/24/22,2,C57_vs_CD1_Comparison,Cage2,4,10:17:01,11:22:34,C57_reward_training,2022-07-24_10h17m_Subject 2.txt
166,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_10h11m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_comparison,Cage2,1,10:11:03,11:14:37,C57_reward_training,2022-07-22_10h11m_Subject 1.txt
167,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_10h48m_Subject 3.txt,07/20/22,07/20/22,3,C57_vs_CD1_Comparison,Cage3,1,10:48:32,11:53:40,C57_reward_training,2022-07-20_10h48m_Subject 3.txt


- Getting the numbers out of the column that contains the cage information
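
One way to pull the number out of labels such as `Cage4` and `Cage 6` is a regular expression. This is only a sketch: `extract_cage_number` is a hypothetical helper, and it assumes that every label contains a single run of digits.

```python
import re


def extract_cage_number(group_label):
    """Hypothetical helper: return the first run of digits in a Group label, or None."""
    match = re.search(r"\d+", group_label)
    return match.group(0) if match else None


labels = ["Cage4", "Cage 6", "cage1", "0"]
cage_numbers = [extract_cage_number(label) for label in labels]
print(cage_numbers)
```

The numbers are kept as text, which matches the `object` dtype of the `cage` column.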

In [18]:
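# Remove the "cage" prefix (not just the characters c, a, g, e) and spaces, e.g. "Cage 6" -> "6"; needs Python 3.9+
metadata_df["cage"] = metadata_df["Group"].apply(lambda x: x.lower().strip().removeprefix("cage").strip())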

In [19]:
metadata_df["cage"].head()

0    4
1    6
2    2
3    2
4    1
Name: cage, dtype: object

- Adding a new column that combines the cage and subject ID into a new ID

In [20]:
metadata_df["cage_and_subject"] = metadata_df.apply(lambda x: "cage_{}_subject_{}".format(x["cage"], x["Subject"]), axis=1)

In [21]:
metadata_df

Unnamed: 0,index,File,Start Date,End Date,Subject,Experiment,Group,Box,Start Time,End Time,MSN,file_name,cage,cage_and_subject
0,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_12h26m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_comparison,Cage4,1,12:26:57,13:36:44,CD1_reward_training,2022-07-22_12h26m_Subject 1.txt,4,cage_4_subject_1
1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_14h55m_Subject 4.txt,07/23/22,07/23/22,4,C57_vs_CD1_Comparison,Cage 6,1,14:55:47,16:05:03,CD1_reward_training,2022-07-23_14h55m_Subject 4.txt,6,cage_6_subject_4
2,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_10h06m_Subject 2.txt,07/23/22,07/23/22,2,C57_vs_CD1_comparison,Cage2,3,10:06:22,11:09:48,CD1_reward_training,2022-07-23_10h06m_Subject 2.txt,2,cage_2_subject_2
3,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_10h11m_Subject 2.txt,07/22/22,07/22/22,2,C57_vs_CD1_comparison,Cage2,2,10:11:03,11:14:37,CD1_reward_training,2022-07-22_10h11m_Subject 2.txt,2,cage_2_subject_2
4,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_09h00m_Subject 3.txt,07/22/22,07/22/22,3,C57_vs_CD1_comparison,Cage1,3,09:00:32,10:05:02,CD1_reward_training,2022-07-22_09h00m_Subject 3.txt,1,cage_1_subject_3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
164,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_10h48m_Subject 4.txt,07/20/22,07/20/22,4,C57_vs_CD1_Comparison,Cage3,2,10:48:32,11:53:40,C57_reward_training,2022-07-20_10h48m_Subject 4.txt,3,cage_3_subject_4
165,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_10h17m_Subject 2.txt,07/24/22,07/24/22,2,C57_vs_CD1_Comparison,Cage2,4,10:17:01,11:22:34,C57_reward_training,2022-07-24_10h17m_Subject 2.txt,2,cage_2_subject_2
166,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_10h11m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_comparison,Cage2,1,10:11:03,11:14:37,C57_reward_training,2022-07-22_10h11m_Subject 1.txt,2,cage_2_subject_1
167,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_10h48m_Subject 3.txt,07/20/22,07/20/22,3,C57_vs_CD1_Comparison,Cage3,1,10:48:32,11:53:40,C57_reward_training,2022-07-20_10h48m_Subject 3.txt,3,cage_3_subject_3


- Making new columns so that we can use the Start date and End date in other steps
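
For reference, here is a small sketch of the conversion on a few date strings. `med_pc_date_to_int` is a hypothetical helper name. In the format string, `%m`, `%d`, and `%y` stand for the month, the day, and the two-digit year.

```python
from datetime import datetime


def med_pc_date_to_int(date_text):
    """Hypothetical helper: turn a MED-PC date such as '07/22/22' into 20220722."""
    parsed = datetime.strptime(date_text, "%m/%d/%y")
    return int(parsed.strftime("%Y%m%d"))


dates = ["07/22/22", "07/18/22", "07/19/22"]
as_ints = [med_pc_date_to_int(d) for d in dates]
# Year-month-day integers sort in calendar order
print(sorted(as_ints))
```

Putting the year first is what makes the integer form easy to sort and compare.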

In [22]:
metadata_df["start_date_datetime"] = metadata_df["Start Date"].apply(lambda x: datetime.strptime(x, '%m/%d/%y'))
metadata_df["start_date_int"] = metadata_df["start_date_datetime"].apply(lambda x: int(x.strftime('%Y%m%d')))

metadata_df["end_date_datetime"] = metadata_df["End Date"].apply(lambda x: datetime.strptime(x, '%m/%d/%y'))
metadata_df["end_date_int"] = metadata_df["end_date_datetime"].apply(lambda x: int(x.strftime('%Y%m%d')))

In [23]:
metadata_df

Unnamed: 0,index,File,Start Date,End Date,Subject,Experiment,Group,Box,Start Time,End Time,MSN,file_name,cage,cage_and_subject,start_date_datetime,start_date_int,end_date_datetime,end_date_int
0,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_12h26m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_comparison,Cage4,1,12:26:57,13:36:44,CD1_reward_training,2022-07-22_12h26m_Subject 1.txt,4,cage_4_subject_1,2022-07-22,20220722,2022-07-22,20220722
1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_14h55m_Subject 4.txt,07/23/22,07/23/22,4,C57_vs_CD1_Comparison,Cage 6,1,14:55:47,16:05:03,CD1_reward_training,2022-07-23_14h55m_Subject 4.txt,6,cage_6_subject_4,2022-07-23,20220723,2022-07-23,20220723
2,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_10h06m_Subject 2.txt,07/23/22,07/23/22,2,C57_vs_CD1_comparison,Cage2,3,10:06:22,11:09:48,CD1_reward_training,2022-07-23_10h06m_Subject 2.txt,2,cage_2_subject_2,2022-07-23,20220723,2022-07-23,20220723
3,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_10h11m_Subject 2.txt,07/22/22,07/22/22,2,C57_vs_CD1_comparison,Cage2,2,10:11:03,11:14:37,CD1_reward_training,2022-07-22_10h11m_Subject 2.txt,2,cage_2_subject_2,2022-07-22,20220722,2022-07-22,20220722
4,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_09h00m_Subject 3.txt,07/22/22,07/22/22,3,C57_vs_CD1_comparison,Cage1,3,09:00:32,10:05:02,CD1_reward_training,2022-07-22_09h00m_Subject 3.txt,1,cage_1_subject_3,2022-07-22,20220722,2022-07-22,20220722
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
164,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_10h48m_Subject 4.txt,07/20/22,07/20/22,4,C57_vs_CD1_Comparison,Cage3,2,10:48:32,11:53:40,C57_reward_training,2022-07-20_10h48m_Subject 4.txt,3,cage_3_subject_4,2022-07-20,20220720,2022-07-20,20220720
165,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_10h17m_Subject 2.txt,07/24/22,07/24/22,2,C57_vs_CD1_Comparison,Cage2,4,10:17:01,11:22:34,C57_reward_training,2022-07-24_10h17m_Subject 2.txt,2,cage_2_subject_2,2022-07-24,20220724,2022-07-24,20220724
166,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_10h11m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_comparison,Cage2,1,10:11:03,11:14:37,C57_reward_training,2022-07-22_10h11m_Subject 1.txt,2,cage_2_subject_1,2022-07-22,20220722,2022-07-22,20220722
167,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_10h48m_Subject 3.txt,07/20/22,07/20/22,3,C57_vs_CD1_Comparison,Cage3,1,10:48:32,11:53:40,C57_reward_training,2022-07-20_10h48m_Subject 3.txt,3,cage_3_subject_3,2022-07-20,20220720,2022-07-20,20220720


- Making a column that is the trial number for each subject
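
The same numbering can also be sketched with a pandas `groupby` instead of a loop. The example below uses a small made-up table. It relies on the file names starting with the date and time, so sorting them alphabetically also sorts them chronologically.

```python
import pandas as pd

# Small made-up table with the two columns that the numbering depends on
example_df = pd.DataFrame({
    "cage_and_subject": ["cage_1_subject_2", "cage_1_subject_1",
                         "cage_1_subject_1", "cage_1_subject_2"],
    "file_name": ["2022-07-18_08h25m_Subject 2.txt", "2022-07-20_08h09m_Subject 1.txt",
                  "2022-07-18_08h24m_Subject 1.txt", "2022-07-18_16h39m_Subject 2.txt"],
})

# Sort by subject, then by file name (earliest session first)
example_df = example_df.sort_values(["cage_and_subject", "file_name"]).reset_index(drop=True)
# cumcount() numbers the rows within each subject starting at 0, so add 1
example_df["trial_number"] = example_df.groupby("cage_and_subject").cumcount() + 1
print(example_df)
```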

In [24]:
subject_to_metadata_df = {}
for subject in metadata_df["cage_and_subject"].unique():
    subject_df = metadata_df[metadata_df["cage_and_subject"] == subject]
    subject_df = subject_df.sort_values("file_name", ascending=True)
    subject_df.insert(0, 'trial_number', range(1, 1 + len(subject_df)))
    subject_to_metadata_df[subject] = subject_df

In [25]:
subject_to_metadata_df = collections.OrderedDict(sorted(subject_to_metadata_df.items()))

In [26]:
all_subject_to_metadata_df = [v for k,v in subject_to_metadata_df.items()]

In [27]:
metadata_df = pd.concat(all_subject_to_metadata_df).reset_index(drop=True)

In [28]:
metadata_df.head(n=25)

Unnamed: 0,trial_number,index,File,Start Date,End Date,Subject,Experiment,Group,Box,Start Time,End Time,MSN,file_name,cage,cage_and_subject,start_date_datetime,start_date_int,end_date_datetime,end_date_int
0,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-21_12h24m_Subject 0.txt,07/21/22,07/21/22,0,0,0,1,12:24:06,12:35:30,pumptest,2022-07-21_12h24m_Subject 0.txt,0,cage_0_subject_0,2022-07-21,20220721,2022-07-21,20220721
1,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_08h24m_Subject 1.txt,07/18/22,07/18/22,1,C57_vs_CD1_Comparison,Cage1,1,08:24:13,09:50:24,C57_reward_training,2022-07-18_08h24m_Subject 1.txt,1,cage_1_subject_1,2022-07-18,20220718,2022-07-18,20220718
2,2,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_16h39m_Subject 1.txt,07/18/22,07/19/22,1,C57_vs_CD1_Comparison,Cage 1,2,16:39:59,09:52:50,C57_reward_training,2022-07-18_16h39m_Subject 1.txt,1,cage_1_subject_1,2022-07-18,20220718,2022-07-19,20220719
3,3,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_08h09m_Subject 1.txt,07/20/22,07/20/22,1,C57_vs_CD1_comparison,Cage 1,3,08:09:22,09:27:22,C57_reward_training,2022-07-20_08h09m_Subject 1.txt,1,cage_1_subject_1,2022-07-20,20220720,2022-07-20,20220720
4,4,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-21_08h50m_Subject 1.txt,07/21/22,07/21/22,1,C57_vs_CD1_comparison,Cage 1,4,08:50:56,09:59:08,C57_reward_training,2022-07-21_08h50m_Subject 1.txt,1,cage_1_subject_1,2022-07-21,20220721,2022-07-21,20220721
5,5,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_09h00m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_comparison,Cage1,1,09:00:32,10:05:02,C57_reward_training,2022-07-22_09h00m_Subject 1.txt,1,cage_1_subject_1,2022-07-22,20220722,2022-07-22,20220722
6,6,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_08h43m_Subject 1.txt,07/23/22,07/23/22,1,C57_vs_CD1_Comparison,Cage1,2,08:43:28,09:57:37,CD1_reward_training,2022-07-23_08h43m_Subject 1.txt,1,cage_1_subject_1,2022-07-23,20220723,2022-07-23,20220723
7,7,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_08h28m_Subject 1.txt,07/24/22,07/24/22,1,C57_vs_CD1_Comparison,Cage1,3,08:28:38,10:13:15,C57_reward_training,2022-07-24_08h28m_Subject 1.txt,1,cage_1_subject_1,2022-07-24,20220724,2022-07-24,20220724
8,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_08h25m_Subject 2.txt,07/18/22,07/18/22,2,C57_vs_CD1_Comparison,Cage1,2,08:25:48,09:50:24,C57_reward_training,2022-07-18_08h25m_Subject 2.txt,1,cage_1_subject_2,2022-07-18,20220718,2022-07-18,20220718
9,2,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_16h39m_Subject 2.txt,07/18/22,07/19/22,2,C57_vs_CD1_Comparison,Cage 1,3,16:39:59,09:52:50,C57_reward_training,2022-07-18_16h39m_Subject 2.txt,1,cage_1_subject_2,2022-07-18,20220718,2022-07-19,20220719


In [29]:
metadata_df.tail(n=25)

Unnamed: 0,trial_number,index,File,Start Date,End Date,Subject,Experiment,Group,Box,Start Time,End Time,MSN,file_name,cage,cage_and_subject,start_date_datetime,start_date_int,end_date_datetime,end_date_int
144,4,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-21_14h53m_Subject 1.txt,07/21/22,07/21/22,1,C57_vs_CD1_Comparison,Cage 6,4,14:53:27,16:02:17,CD1_reward_training,2022-07-21_14h53m_Subject 1.txt,6,cage_6_subject_1,2022-07-21,20220721,2022-07-21,20220721
145,5,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_15h03m_Subject 1.txt,07/22/22,07/22/22,1,C57_vs_CD1_Comparison,Cage 6,1,15:03:41,16:06:51,CD1_reward_training,2022-07-22_15h03m_Subject 1.txt,6,cage_6_subject_1,2022-07-22,20220722,2022-07-22,20220722
146,6,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_14h55m_Subject 1.txt,07/23/22,07/23/22,1,C57_vs_CD1_Comparison,Cage 6,2,14:55:47,16:05:03,CD1_reward_training,2022-07-23_14h55m_Subject 1.txt,6,cage_6_subject_1,2022-07-23,20220723,2022-07-23,20220723
147,7,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_14h53m_Subject 1.txt,07/24/22,07/24/22,1,C57_vs_CD1_comparison,Cage 6,3,14:53:31,15:56:03,CD1_reward_training,2022-07-24_14h53m_Subject 1.txt,6,cage_6_subject_1,2022-07-24,20220724,2022-07-24,20220724
148,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_15h10m_Subject 2.txt,07/18/22,07/18/22,2,C57_vs_CD1_Comparison,Cage 6,2,15:10:30,16:20:28,CD1_reward_training,2022-07-18_15h10m_Subject 2.txt,6,cage_6_subject_2,2022-07-18,20220718,2022-07-18,20220718
149,2,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_12h10m_Subject 2.txt,07/20/22,07/20/22,2,C57_vs_CD1_comparison,Cage 6,3,12:10:37,13:16:30,CD1_reward_training,2022-07-20_12h10m_Subject 2.txt,6,cage_6_subject_2,2022-07-20,20220720,2022-07-20,20220720
150,3,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-20_16h01m_Subject 2.txt,07/20/22,07/20/22,2,C57_vs_CD1_Comparison,Cage6,4,16:01:17,17:06:16,CD1_reward_training,2022-07-20_16h01m_Subject 2.txt,6,cage_6_subject_2,2022-07-20,20220720,2022-07-20,20220720
151,4,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-21_14h53m_Subject 2.txt,07/21/22,07/21/22,2,C57_vs_CD1_Comparison,Cage 6,1,14:53:27,16:02:17,CD1_reward_training,2022-07-21_14h53m_Subject 2.txt,6,cage_6_subject_2,2022-07-21,20220721,2022-07-21,20220721
152,5,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-22_15h03m_Subject 2.txt,07/22/22,07/22/22,2,C57_vs_CD1_Comparison,Cage 6,2,15:03:41,16:06:51,CD1_reward_training,2022-07-22_15h03m_Subject 2.txt,6,cage_6_subject_2,2022-07-22,20220722,2022-07-22,20220722
153,6,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-23_14h55m_Subject 2.txt,07/23/22,07/23/22,2,C57_vs_CD1_Comparison,Cage 6,3,14:55:47,16:05:03,CD1_reward_training,2022-07-23_14h55m_Subject 2.txt,6,cage_6_subject_2,2022-07-23,20220723,2022-07-23,20220723


- Getting the number of files that are associated with each subject
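
If only one number per subject is needed, `groupby(...).size()` is a more compact option, because `count()` repeats the count for every column. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up table: one row per recording file
example_df = pd.DataFrame({
    "cage_and_subject": ["cage_1_subject_1", "cage_1_subject_1", "cage_1_subject_2"],
    "file_name": ["a.txt", "b.txt", "c.txt"],
})

# One number per subject: how many rows (files) belong to it
files_per_subject = example_df.groupby("cage_and_subject").size()
print(files_per_subject)
```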

In [30]:
# How many files there are for each subject
metadata_df.groupby("cage_and_subject").count()

cage_and_subject,trial_number,index,File,Start Date,End Date,Subject,Experiment,Group,Box,Start Time,End Time,MSN,file_name,cage,start_date_datetime,start_date_int,end_date_datetime,end_date_int
cage_0_subject_0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
cage_1_subject_1,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
cage_1_subject_2,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
cage_1_subject_3,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
cage_1_subject_4,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
cage_2_subject_1,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
cage_2_subject_2,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
cage_2_subject_3,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
cage_2_subject_4,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
cage_3_subject_1,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7


## Looking over the MED-PC scripts

- MED-PC has scripts (lines of code that act as the instructions for how to operate the MED-PC boxes) that contain descriptions of what each value in the recordings is. We will extract the descriptions from the MED-PC scripts so that we can label the data points in the MED-PC recordings
- Below is a list of all the MED-PC scripts. We will only be looking at the first one

In [31]:
all_medpc_scripts = glob.glob("./data/**/*.MPC")

In [32]:
all_medpc_scripts

['./data/textfiles/C57_reward_competition.MPC',
 './data/textfiles/CD1_reward_training.MPC',
 './data/textfiles/CD1_reward_competition.MPC',
 './data/textfiles/C57_reward_training.MPC']

- Example of what the MED-PC script that was run when recording the behaviors looks like

In [33]:
with open(all_medpc_scripts[0]) as f:
    lines = f.readlines()
    for line in lines[:100]:
        print(line)

\v3 stop tone with poke

\v3.2 monitor port entries AND exits



\INPUTS

^port = 8



\OUTPUTS

^fan = 16

^houselight = 11

^tone1 = 2

^tone2 = 3

^tone3 = 4

^tone4 = 5

^pump = 9

^whitenoise = 1

^csout = 5

^peout = 15

^cs1out = 17

^cs2out = 13

^cs3out = 14



\EXP SETTINGS

^ncsNoShock = 0

^initCS1trials = 3



\ARRAYS

DIM P = 20000 \Port entry time stamp array

DIM Q = 2500 \US delivery time stamp array (absolute)

DIM R = 2500 \US time stamp array (relative to last CS)

DIM W = 2500 \ITI values used for CS

DIM S = 2500 \CS presentation values (absolute - every time light turns on)

DIM N = 20000 \Port exit time stamp array

DIM K = 2500 \CS type

DIM G = 2500 \controlled_stimulus_seconds computer clock time (seconds on clock every time light turns on)

DIM H = 2500 \controlled_stimulus_minutes computer clock time (minutes on clock every time light turns on)

DIM I = 2500 \controlled_stimulus_hours computer clock time (hours every time light turns on)

DIM B = 2500 \port

- We will be using the comments in the MED-PC script (everything after the `\` on each line) to create a name for the variables. By default, MED-PC uses a single letter as the name of each variable (a programming object that holds some information).
    - This will use the medpc2excel library found at https://github.com/cyf203/medpc2excel
- Example of the variable names and their comments in the MED-PC script that we will get the descriptive names from:
    - The first few words of each comment are squished together to create the name for every variable (for example, `(P)Portentry`)

```
DIM P = 20000 \Port entry time stamp array

DIM Q = 2500 \US delivery time stamp array (absolute)

DIM R = 2500 \US time stamp array (relative to last CS)

DIM W = 2500 \ITI values used for CS

DIM S = 2500 \CS presentation values (absolute - every time light turns on)

DIM N = 20000 \Port exit time stamp array

DIM K = 2500 \CS type

DIM B = 2500 \shock intensity
```
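
To make the naming idea concrete, here is a minimal sketch that turns a `DIM` line into a descriptive name. It is not the medpc2excel implementation. `describe_dim_line` is a hypothetical helper, and joining the first two words is an assumption based on the column names that appear in the output later in this notebook.

```python
import re

# Matches lines such as: DIM P = 20000 \Port entry time stamp array
DIM_PATTERN = re.compile(r"^DIM\s+(?P<letter>[A-Z])\s*=\s*\d+\s*\\(?P<comment>.*)$")


def describe_dim_line(line, n_words=2):
    """Hypothetical helper: build a name like '(P)Portentry' from a DIM line."""
    match = DIM_PATTERN.match(line.strip())
    if match is None:
        return None  # Not an array declaration with a comment
    words = match.group("comment").split()
    # Assumption: the first `n_words` words of the comment are joined together
    return "({}){}".format(match.group("letter"), "".join(words[:n_words]))


script_lines = [
    r"DIM P = 20000 \Port entry time stamp array",
    r"DIM K = 2500 \CS type",
    r"DIM G = 2500 \controlled_stimulus_seconds computer clock time",
    "^port = 8",
]
names = [describe_dim_line(line) for line in script_lines]
print(names)
```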

- In the MED-PC recording files, there are values that are labelled with letters. What each letter stands for is described in the MED-PC script file that we just looked at.

In [34]:
with open(all_med_pc_file[0]) as f:
    lines = f.readlines()
    for line in lines[:20]:
        print(line)

File: C:\MED-PC\Data\2022-07-22_12h26m_Subject 1.txt







Start Date: 07/22/22

End Date: 07/22/22

Subject: 1

Experiment: C57_vs_CD1_comparison 

Group: Cage4

Box: 1

Start Time: 12:26:57

End Time: 13:36:44

MSN: CD1_reward_training

A:    4399.000

D:    9000.000

E:       0.000

L:       0.000

M:       0.000

O:       0.000

T:    3660.000



## **NOTE: Please make sure that the corresponding `.mpc` file (aka the MED-PC script) that was run to create the log file is in the same folder as the recording files. This notebook will fail to extract the data from a recording file if its script is missing**

- Spreadsheet of the data from all the MED-PC recording files combined into one.
- **NOTE: The values in a row do not belong to the same event. Each row holds the "n"-th data point of every category for a given file. For example, the first row has the first time the subject entered the port and the first time the tone was played, which are two separate things. The second row has the second data point for all the categories, the third row the third, and so on. This repeats for all the data points in a given file, and then it starts over with the next file, where the first row of that file is the first data point for all of its categories.**
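
The layout described in the note can be reproduced with a small made-up example. When columns of different lengths are placed side by side, pandas lines them up by row position and fills the shorter ones with `NaN`. The values below are illustrative only.

```python
import pandas as pd

# Four port entries but only two tone onsets (illustrative values)
port_entries = pd.Series([6.22, 6.28, 6.38, 6.44])
tone_onsets = pd.Series([60.01, 140.01])

# Columns are aligned by row position; missing rows become NaN
example_df = pd.DataFrame({"(P)Portentry": port_entries,
                           "(S)CSpresentation": tone_onsets})
print(example_df)

# To work with one category on its own, select its column and drop the padding
tone_only = example_df["(S)CSpresentation"].dropna()
print(tone_only.tolist())
```

Row 0 here pairs the first port entry with the first tone onset only because both happen to be first in their own lists, not because they occurred together.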

In [35]:
concatted_medpc_df = extract.dataframe.get_medpc_dataframe_from_list_of_files(medpc_files=all_med_pc_file)

Traceback (most recent call last):
  File "/root/projects/behavioral_dataframe_processing/results/2022_07_20_repeated_id_fix/../../src/extract/dataframe.py", line 71, in get_medpc_dataframe_from_list_of_files
    ts_df, medpc_log = medpc_read(file=file_path, override=True, replace=False)
  File "/root/projects/behavioral_dataframe_processing/behavioral_processing_env/lib/python3.9/site-packages/medpc2excel/medpc_read.py", line 134, in medpc_read
    for var, nm in TS_var_name_maps[program_nm].items():
KeyError: 'pumptest'

Invalid Formatting for file: ./data/pilot_c57_vs_cd1/reward_training/med_pc_text_files/2022-07-21_12h24m_Subject 0.txt


In [36]:
concatted_medpc_df.head()

Unnamed: 0,(P)Portentry,(Q)USdelivery,(R)UStime,(W)ITIvalues,(S)CSpresentation,(N)Portexit,(K)CStype,(G)controlled_stimulus_secondscomputer,(H)controlled_stimulus_minutescomputer,(I)controlled_stimulus_hourscomputer,(B)port_entry_secondscomputer,(F)port_entry_minutescomputer,(J)port_entry_hourscomputer,date,subject,file_path
0,6.22,64.0,399.0,0.0,60.01,6.26,1.0,44.0,36.0,12.0,50.0,35.0,12.0,20220722,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...
1,6.28,144.0,399.0,0.0,140.01,6.36,1.0,4.0,38.0,12.0,50.0,35.0,12.0,20220722,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...
2,6.38,234.0,399.0,0.0,230.01,6.41,1.0,34.0,39.0,12.0,50.0,35.0,12.0,20220722,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...
3,6.44,314.0,399.0,0.0,310.01,6.52,1.0,54.0,40.0,12.0,51.0,35.0,12.0,20220722,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...
4,6.64,389.0,399.0,0.0,385.01,6.69,1.0,9.0,42.0,12.0,51.0,35.0,12.0,20220722,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...


In [37]:
concatted_medpc_df.tail()

Unnamed: 0,(P)Portentry,(Q)USdelivery,(R)UStime,(W)ITIvalues,(S)CSpresentation,(N)Portexit,(K)CStype,(G)controlled_stimulus_secondscomputer,(H)controlled_stimulus_minutescomputer,(I)controlled_stimulus_hourscomputer,(B)port_entry_secondscomputer,(F)port_entry_minutescomputer,(J)port_entry_hourscomputer,date,subject,file_path
2805,3651.72,,,,,3651.74,,,,,,,,20220724,3,./data/pilot_c57_vs_cd1/reward_training/med_pc...
2806,3651.81,,,,,3651.88,,,,,,,,20220724,3,./data/pilot_c57_vs_cd1/reward_training/med_pc...
2807,3655.18,,,,,3655.24,,,,,,,,20220724,3,./data/pilot_c57_vs_cd1/reward_training/med_pc...
2808,3655.31,,,,,3655.36,,,,,,,,20220724,3,./data/pilot_c57_vs_cd1/reward_training/med_pc...
2809,3655.42,,,,,3655.46,,,,,,,,20220724,3,./data/pilot_c57_vs_cd1/reward_training/med_pc...


- Combining the recording and the metadata into one dataframe

In [38]:
recording_and_metadata_df = metadata_df.merge(concatted_medpc_df, right_on='file_path', left_on='index')
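
- A minimal sketch of what this kind of merge does, using made-up toy dataframes rather than the real `metadata_df` and `concatted_medpc_df`. It assumes one metadata row per file and many recording rows per file. The `validate` argument is an optional pandas check added here for illustration; it is not used in the cell above.

```python
import pandas as pd

# Toy stand-ins for metadata_df and concatted_medpc_df (all values are made up)
toy_metadata_df = pd.DataFrame({
    "index": ["./a.txt", "./b.txt"],
    "Subject": ["1", "2"],
})
toy_recording_df = pd.DataFrame({
    "file_path": ["./a.txt", "./a.txt", "./b.txt"],
    "(P)Portentry": [6.22, 6.28, 3.10],
})

# Each metadata row is repeated once for every recording row from the same file.
# validate="one_to_many" raises a MergeError if a file appears twice in the metadata.
toy_merged_df = toy_metadata_df.merge(
    toy_recording_df,
    left_on="index",
    right_on="file_path",
    how="inner",
    validate="one_to_many",
)

print(toy_merged_df.shape)
```

- The merged dataframe has one row per recording row, which is why the real output has far more rows than there are files.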

In [39]:
recording_and_metadata_df

Unnamed: 0,trial_number,index,File,Start Date,End Date,Subject,Experiment,Group,Box,Start Time,...,(K)CStype,(G)controlled_stimulus_secondscomputer,(H)controlled_stimulus_minutescomputer,(I)controlled_stimulus_hourscomputer,(B)port_entry_secondscomputer,(F)port_entry_minutescomputer,(J)port_entry_hourscomputer,date,subject,file_path
0,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_08h24m_Subject 1.txt,07/18/22,07/18/22,1,C57_vs_CD1_Comparison,Cage1,1,08:24:13,...,1.0,24.0,50.0,8.0,47.0,50.0,8.0,20220718,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...
1,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_08h24m_Subject 1.txt,07/18/22,07/18/22,1,C57_vs_CD1_Comparison,Cage1,1,08:24:13,...,1.0,44.0,51.0,8.0,48.0,50.0,8.0,20220718,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...
2,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_08h24m_Subject 1.txt,07/18/22,07/18/22,1,C57_vs_CD1_Comparison,Cage1,1,08:24:13,...,1.0,14.0,53.0,8.0,11.0,53.0,8.0,20220718,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...
3,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_08h24m_Subject 1.txt,07/18/22,07/18/22,1,C57_vs_CD1_Comparison,Cage1,1,08:24:13,...,1.0,34.0,54.0,8.0,26.0,57.0,8.0,20220718,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...
4,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-18_08h24m_Subject 1.txt,07/18/22,07/18/22,1,C57_vs_CD1_Comparison,Cage1,1,08:24:13,...,1.0,49.0,55.0,8.0,27.0,57.0,8.0,20220718,1,./data/pilot_c57_vs_cd1/reward_training/med_pc...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
426071,7,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_14h53m_Subject 4.txt,07/24/22,07/24/22,4,C57_vs_CD1_comparison,Cage 6,2,14:53:31,...,,,,,0.0,,,20220724,4,./data/pilot_c57_vs_cd1/reward_training/med_pc...
426072,7,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_14h53m_Subject 4.txt,07/24/22,07/24/22,4,C57_vs_CD1_comparison,Cage 6,2,14:53:31,...,,,,,0.0,,,20220724,4,./data/pilot_c57_vs_cd1/reward_training/med_pc...
426073,7,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_14h53m_Subject 4.txt,07/24/22,07/24/22,4,C57_vs_CD1_comparison,Cage 6,2,14:53:31,...,,,,,40.0,,,20220724,4,./data/pilot_c57_vs_cd1/reward_training/med_pc...
426074,7,./data/pilot_c57_vs_cd1/reward_training/med_pc...,C:\MED-PC\Data\2022-07-24_14h53m_Subject 4.txt,07/24/22,07/24/22,4,C57_vs_CD1_comparison,Cage 6,2,14:53:31,...,,,,,0.0,,,20220724,4,./data/pilot_c57_vs_cd1/reward_training/med_pc...


## Getting the cage numbers and the dates so that we can include them in the file names

- We will get the metadata from the recording files and use it to name the files we will create. This will help organize the files and make it easy to know where each file came from 

- Getting the group numbers (original cage names)

In [41]:
# removing blank spaces
group_numbers = ["_".join(number.split()) for number in recording_and_metadata_df["Group"].unique() if number]
# sorting numbers
group_numbers = sorted(group_numbers)
group_numbers_for_title = "_".join(group_numbers)

In [42]:
group_numbers_for_title

'Cage1_Cage2_Cage3_Cage4_Cage5_Cage6_Cage_1_Cage_2_Cage_4_Cage_5_Cage_6_cage5'
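
- The output above mixes spellings such as `Cage1`, `Cage_1`, and `cage5`. If these refer to the same cages (an assumption that should be checked against the lab records), a sketch like the following could collapse them into one spelling before building the title. The helper name `normalize_group_label` is hypothetical and not part of this repository.

```python
import re

def normalize_group_label(label):
    """Map variants such as 'Cage1', 'Cage 1' and 'cage1' to 'Cage_1'."""
    match = re.fullmatch(r"\s*cage[\s_]*(\d+)\s*", label, flags=re.IGNORECASE)
    if match is None:
        # Unexpected labels are only cleaned of whitespace, never guessed at
        return "_".join(label.split())
    return "Cage_{}".format(int(match.group(1)))

# Example labels in the spellings seen in the Group column
raw_labels = ["Cage1", "Cage 1", "cage5", "Cage 6", "Cage5"]
normalized_labels = sorted({normalize_group_label(label) for label in raw_labels})

print("_".join(normalized_labels))
# Cage_1_Cage_5_Cage_6
```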

- Getting the cage numbers

In [43]:
# removing blank spaces
cage_numbers = ["_".join(number.split()) for number in recording_and_metadata_df["cage"].unique() if number]
# sorting numbers
cage_numbers = sorted(cage_numbers)
cage_numbers_for_title = "_".join(cage_numbers)

In [44]:
cage_numbers_for_title

'1_2_3_4_5_6'

- Getting the dates

In [45]:
# Getting the first and last recording date to get a range
earliest_date = recording_and_metadata_df["end_date_int"].min()
latest_date = recording_and_metadata_df["end_date_int"].max()

In [46]:
earliest_date

20220718

In [47]:
latest_date

20220724
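
- The cell that creates `end_date_int` is not shown in this section. As a sketch only, and assuming the `End Date` strings are in `MM/DD/YY` format as in the outputs above, an integer date column of this kind could be derived as follows. The toy dataframe is made up.

```python
import pandas as pd

# Toy data in the same MM/DD/YY format as the "End Date" column
toy_dates_df = pd.DataFrame({"End Date": ["07/18/22", "07/24/22", "07/21/22"]})

# Parse the strings, then write them back out as sortable YYYYMMDD integers
parsed_dates = pd.to_datetime(toy_dates_df["End Date"], format="%m/%d/%y")
toy_dates_df["end_date_int"] = parsed_dates.dt.strftime("%Y%m%d").astype(int)

toy_earliest_date = toy_dates_df["end_date_int"].min()
toy_latest_date = toy_dates_df["end_date_int"].max()

print(toy_earliest_date, toy_latest_date)
# 20220718 20220724
```

- Storing the date as `YYYYMMDD` means that the smallest and largest integers are also the earliest and latest dates.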

- Getting the subject names

In [48]:
recording_and_metadata_df["Subject"].unique()

array(['1', '2', '3', '4'], dtype=object)

In [49]:
# adding the "subject-" prefix to each subject number
subject_numbers = ["subject-" + number for number in recording_and_metadata_df["Subject"].unique() if number]
# sorting numbers
subject_numbers = sorted(subject_numbers)
subject_numbers_for_title = "_".join(subject_numbers)

In [50]:
subject_numbers_for_title

'subject-1_subject-2_subject-3_subject-4'

- Getting the experiment name

In [51]:
experiment_names = ["_".join(name.upper().split()) for name in recording_and_metadata_df["Experiment"].unique() if name]
# removing the duplicates that upper-casing can create, then sorting so the order is reproducible
experiment_names = sorted(set(experiment_names))
experiment_names_for_title = "AND".join(experiment_names)

In [52]:
experiment_names_for_title

'C57_VS_CD1_COMPARISOMANDC57_VS_CD1_COMPARISON'

- Getting the box numbers

In [53]:
# removing blank spaces
box_numbers = ["_".join(number.split()) for number in recording_and_metadata_df["Box"].unique() if number]
# sorting numbers
box_numbers = sorted(box_numbers)
box_numbers_for_title = "_".join(box_numbers)

In [54]:
box_numbers_for_title

'1_2_3_4'

- Getting the script names

In [55]:
# removing blank spaces
script_names = ["_".join(name.split()) for name in recording_and_metadata_df["MSN"].unique() if name]
# sorting names
script_names = sorted(script_names)
script_names_for_title = "_".join(script_names)

In [56]:
script_names_for_title

'C57_reward_training_CD1_reward_training'
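
- The cells above repeat the same steps: take the unique values, replace whitespace with underscores, sort, and join. A sketch of a small helper that wraps those steps is shown below. The function name `join_unique_for_title` is hypothetical and not part of this repository, and it assumes the values are strings.

```python
def join_unique_for_title(values, joiner="_"):
    """Return the sorted unique values joined into one string for a file name."""
    # Replace any whitespace inside a value with underscores and skip empty values
    cleaned_values = {"_".join(str(value).split()) for value in values if value}
    # Sorting makes the result independent of the order the values arrived in
    return joiner.join(sorted(cleaned_values))

print(join_unique_for_title(["Cage 2", "Cage 1", "", "Cage 1"]))
# Cage_1_Cage_2
```

- With a helper like this, each title could be built in one line, for example `join_unique_for_title(recording_and_metadata_df["Box"].unique())`.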

## Saving the dataframes (spreadsheets) to files with the metadata as part of the name

- Making necessary directories
    - If you want to use any of the other metadata as part of the name, swap out the variables passed to `format()` and change the folder name to match. The variable names are the words in front of the `=` on the last line of each cell above. Each `{}` is where a metadata variable will be inserted into the file name. For more information on formatting strings: https://www.w3schools.com/python/ref_string_format.asp
    - You can also rename the files manually by replacing everything inside the `""` and removing the `.format()` part
- **NOTE: You may get an error that the file does not exist. If so, the file name may be too long (an issue that can happen when using Jupyter Notebooks on Windows).**
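
- A sketch of how `format()` fills in the `{}` placeholders, together with a simple length check for the issue in the note above. The values `EXAMPLE_EXPERIMENT` and `1_2` are made up, and the limit of 260 characters is the classic Windows `MAX_PATH` value; whether it applies depends on how your system is configured.

```python
import os

# Made-up example values standing in for the metadata variables
example_experiment = "EXAMPLE_EXPERIMENT"
example_cages = "1_2"
example_first_date, example_last_date = 20220718, 20220724

# Each {} is replaced, in order, by the matching argument of format()
directory_template = "./proc/extracted_recording_data_and_metadata/experiment_{}_cage_{}_date_{}_{}"
example_directory = directory_template.format(
    example_experiment, example_cages, example_first_date, example_last_date
)
example_file = os.path.join(
    example_directory,
    "metadata_cage_{}_date_{}_{}.csv".format(example_cages, example_first_date, example_last_date),
)

# Classic Windows limit on the length of a full path
WINDOWS_MAX_PATH = 260
full_path_length = len(os.path.abspath(example_file))
if full_path_length >= WINDOWS_MAX_PATH:
    print("Path may be too long for Windows:", full_path_length, "characters")
else:
    print("Path length looks fine:", full_path_length, "characters")
```

- If the path is too long, using fewer metadata variables in the name is the simplest fix.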

In [57]:
output_directory = "./proc/extracted_recording_data_and_metadata/experiment_{}_cage_{}_date_{}_{}".format(experiment_names_for_title, cage_numbers_for_title, earliest_date, latest_date)

In [58]:
output_directory

'./proc/extracted_recording_data_and_metadata/experiment_C57_VS_CD1_COMPARISOMANDC57_VS_CD1_COMPARISON_cage_1_2_3_4_5_6_date_20220718_20220724'

In [59]:
os.makedirs(output_directory, exist_ok=True)

In [60]:
metadata_df.to_csv(os.path.join(output_directory, "metadata_cage_{}_date_{}_{}.csv".format(cage_numbers_for_title, earliest_date, latest_date)))
# metadata_df.to_excel(os.path.join(output_directory, "metadata_cage_{}_date_{}_{}.xlsx".format(cage_numbers_for_title, earliest_date, latest_date)))

In [61]:
concatted_medpc_df.to_csv(os.path.join(output_directory, "MEDPC_recording_cage_{}_date_{}_{}.csv".format(cage_numbers_for_title, earliest_date, latest_date)))
# concatted_medpc_df.to_excel(os.path.join(output_directory, "MEDPC_recording_cage_{}_date_{}_{}.xlsx".format(cage_numbers_for_title, earliest_date, latest_date)))

In [62]:
recording_and_metadata_df.to_csv(os.path.join(output_directory, "recording_metadata_cage_{}_date_{}_{}.csv".format(cage_numbers_for_title, earliest_date, latest_date)))
# recording_and_metadata_df.to_excel(os.path.join(output_directory, "recording_metadata_cage_{}_date_{}_{}.xlsx".format(cage_numbers_for_title, earliest_date, latest_date)))
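
- `to_csv()` writes the dataframe's row index as an unnamed first column by default. The sketch below uses a made-up toy dataframe and a temporary folder to show how that column appears as `Unnamed: 0` when the file is read back, and how `index_col=0` restores it as the index.

```python
import os
import tempfile

import pandas as pd

# Toy dataframe standing in for the saved dataframes (values are made up)
toy_saved_df = pd.DataFrame({"subject": ["1", "2"], "date": [20220718, 20220724]})

with tempfile.TemporaryDirectory() as temporary_directory:
    csv_path = os.path.join(temporary_directory, "toy.csv")
    # Same call style as the cells above: the row index is written as the first column
    toy_saved_df.to_csv(csv_path)
    # Reading back without arguments turns the saved index into a data column
    naive_df = pd.read_csv(csv_path)
    # index_col=0 tells pandas that the first column is the row index
    restored_df = pd.read_csv(csv_path, index_col=0)

print(list(naive_df.columns))
# ['Unnamed: 0', 'subject', 'date']
print(list(restored_df.columns))
# ['subject', 'date']
```

- Note that text values that look like numbers, such as subject IDs, are read back as integers unless a `dtype` is passed to `read_csv()`.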