# Notebook n01b: Transform raw DeepLabCut data into Euclidean distance and speed

Jose Oliveira da Cruz, PhD  | LeDoux Lab  
jose.cruz@nyu.edu  

<img src="https://misophoniainternational.com/wp-content/uploads/2016/04/LedouxLab.jpg" style="width: 200.464px; height: 200px; margin: 0px;">

This notebook takes raw data files (x, y coordinates) from DeepLabCut and transforms the data in two steps:

A) Prepare the ``_individual_preprocessing_dlc.csv`` file:

1. Load the DeepLabCut data
2. Fetch the information for each individual animal
3. Calculate the Euclidean distance (cm) between consecutive sets of coordinates for each bodypart
    - The data are interpolated to correct for variable frame acquisition.
4. Calculate the speed (cm/sec) from each distance calculated in step 3.
5. Save the data

B) Visualize the Euclidean distance and speed

In [None]:
# Import dependencies
#import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter, correlate
import os
import re  # used below to extract the video key from the filename
import sys


# Import my code
sys.path.append(r'D:\GoogleDrive\work\postdoc_nyu\scientific_projects\individual_differences\src')
from tools.utils.organization import *
from analysis.motion_analysis import *
from analysis.freezing_analysis import *
from visualization import *
from visualization.plot_events import *


## Specify where to save the data for each step
### Step A: directory `individual_preprocessing_dlc`

In [None]:
step_a_save_dir = r'D:\GoogleDrive\work\postdoc_nyu\scientific_projects\individual_differences\data\interim\EXP004\individual_preprocessing_dlc'

print(f'Does the directory exist? \n a: {os.path.isdir(step_a_save_dir)}') 

# Step A: ``_individual_preprocessing_dlc.csv``

## 1) Load **individual** raw data from DeepLabCut

In [None]:
# Open dataframe

dpath = r'D:\GoogleDrive\work\postdoc_nyu\scientific_projects\individual_differences\data\interim\EXP004\deeplabcut_pose_extraction'
fpath = 'JC_EXP004_20200110_TES01_R_286600_T00DLC_resnet50_threat_conditioningMay18shuffle1_300000.h5'

print(f'File exists?\n- {os.path.isfile(os.path.join(dpath, fpath))}')

In [None]:
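# Load data
# DeepLabCut stores a three-level column index (scorer, bodypart, coords).
# `pd.read_hdf` restores it from the file, so no `header` argument is needed.
df = pd.read_hdf(os.path.join(dpath, fpath))

# Inspect dataframe
df.head(5)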

## 2) Fetch information about a specific rat

The code below reads the video key (e.g. `JC_EXP005_20200124_TES01_R_287073_T00`) from the filename and looks up the complete record for that animal.  
It then creates an instance of the `Animal` class holding that information.
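
As a small illustration of the key extraction, the sketch below applies a regular expression to a DeepLabCut output filename. The pattern is a compact rewrite (using quantifiers) of the explicit pattern used in section 2.2. The helper name `extract_video_key` is hypothetical and not part of the project code.

```python
import re

# Same structure as the explicit pattern in section 2.2, written with quantifiers.
VIDEO_KEY_PATTERN = re.compile(
    r'\w{2}_\w{3}\d{3}_\d{8}_\w{3}\d{2}_\w_\d{6}_\w\d{2}'
)


def extract_video_key(filename):
    """Return the video key embedded in a filename (hypothetical helper)."""
    match = VIDEO_KEY_PATTERN.search(filename)
    if match is None:
        # Fail loudly instead of raising AttributeError on `None.group()`.
        raise ValueError(f'No video key found in: {filename}')
    return match.group()


example = ('JC_EXP004_20200110_TES01_R_286600_T00DLC_resnet50_'
           'threat_conditioningMay18shuffle1_300000.h5')
print(extract_video_key(example))  # JC_EXP004_20200110_TES01_R_286600_T00
```

Raising a `ValueError` when nothing matches makes a misnamed file easier to diagnose than the `AttributeError` that a bare `re.search(...).group()` would produce.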

### 2.1) Load Global Animal Record and Experiment information

In [None]:
# Where is the main record?
main_record_directory = r'D:\GoogleDrive\work\postdoc_nyu\scientific_projects\individual_differences\animal_record\main_record'
main_record_basename = 'main_record_20200325_151527.csv'

main_record = os.path.join(main_record_directory,
                           main_record_basename)


# Where is the information about the experiment?
experiment_info_directory = r'D:\GoogleDrive\work\postdoc_nyu\scientific_projects\individual_differences\data\interim\EXP004\bonsai_extraction_led_epochs_frame_rate'
experiment_info_basename = 'JC_EXP004_20200110_TES01_cs_index_plus_frame_rate.csv'
experiment_info = os.path.join(experiment_info_directory,
                               experiment_info_basename)

### 2.2) Fetch animal information

In [None]:
# Extract the video key from the filename with a regular expression
pattern = r'(\w\w_\w\w\w\d\d\d_\d\d\d\d\d\d\d\d_\w\w\w\d\d_\w_\d\d\d\d\d\d_\w\d\d)'
video_key = re.search(pattern, fpath).group()

# Fetch information
rat = fetch_animal_info(
    video_key, 
    main_record, 
    experiment_info,
)

## 3) Generate a dataframe with the Euclidean distance for each bodypart
**[May take a while]**

The first step is to create a dataframe with the information fetched in 2) and to calculate the Euclidean distance from the raw data points provided by DeepLabCut.
This step also corrects for variable frame acquisition by resampling the data to 30 fps with a NumPy interpolation function.
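
The internals of `calculate_euclidean_distance_dataframe` are not shown in this notebook. As a rough sketch of the idea only, the example below resamples coordinates onto a regular time grid with `np.interp` and then computes the distance travelled between consecutive frames. The function names, the per-frame `timestamps` input, and the `cm_per_pixel` scale are all assumptions made for illustration.

```python
import numpy as np


def resample_to_fixed_rate(timestamps, values, frame_rate=30):
    """Linearly interpolate `values` onto a regular grid at `frame_rate` (Hz).

    `timestamps` are assumed to be in seconds and strictly increasing.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    duration = timestamps[-1] - timestamps[0]
    n_samples = int(np.floor(duration * frame_rate)) + 1
    grid = timestamps[0] + np.arange(n_samples) / frame_rate
    return grid, np.interp(grid, timestamps, values)


def euclidean_step_distance(x, y, cm_per_pixel=1.0):
    """Distance travelled between consecutive frames for one bodypart.

    The first frame has no predecessor, so its distance is set to 0.
    `cm_per_pixel` is a hypothetical calibration factor.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.diff(x, prepend=x[0])
    dy = np.diff(y, prepend=y[0])
    return np.hypot(dx, dy) * cm_per_pixel


# Toy data: three frames, one move of 3 px in x and 4 px in y.
step = euclidean_step_distance([0, 3, 3], [0, 4, 4])
grid, resampled = resample_to_fixed_rate([0, 1, 2], [0, 10, 20], frame_rate=2)
```

Interpolating the coordinates first and computing distances afterwards keeps each distance tied to a fixed time step, which is what a later speed calculation at a constant frame rate would rely on.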

In [None]:
dataframe, ed_dict = calculate_euclidean_distance_dataframe(df, rat)
dataframe.head()

## 4) Calculate the speed for each body part

Building on the results of step 3), this section calculates the speed of each bodypart.
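
`calculate_speed_dataframe` lives in the project's own code and is not shown here. A minimal sketch of the underlying arithmetic, assuming speed is the per-frame distance divided by the frame duration (`1 / frame_rate`), could look like this; the function name is hypothetical.

```python
import numpy as np


def speed_from_step_distance(step_distance_cm, frame_rate=30):
    """Convert per-frame distances (cm) into speed (cm/sec).

    Assumes a constant frame rate, so each frame lasts 1 / frame_rate seconds
    and speed = distance / (1 / frame_rate) = distance * frame_rate.
    """
    step = np.asarray(step_distance_cm, dtype=float)
    return step * frame_rate


toy_speed_values = speed_from_step_distance([0.0, 0.5, 1.0], frame_rate=30)
```

This only holds once the data have been resampled to a constant frame rate, which is why the interpolation in step 3) comes first.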

In [None]:
# Extract the bodyparts
idx = pd.IndexSlice

# Sort the unique bodypart names so the column order is reproducible between runs
bodyparts_list = sorted({bodypart for scorer, bodypart, coord in df.columns})

dataframe = calculate_speed_dataframe(dataframe, bodyparts_list, frame_rate=30)
dataframe.head(3)

## 5) Save the data as ``_individual_preprocessing_dlc.csv``

In [None]:
saving_basename = f'{rat.video_basename}_individual_preprocessing_dlc.csv'.lower()

dataframe.to_csv(os.path.join(step_a_save_dir, saving_basename))

# Step B: Visualization of Euclidean distance and speed

Quick visualization of the cumulative Euclidean distance and the speed. The selection can currently be any bodypart labelled by DeepLabCut, any `cs_id`, and any of the `pre_cs`, `peri_cs`, or `post_cs` epochs.
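
`extract_speed_distance_from_dataframe` is project code and its implementation is not shown. To illustrate the kind of selection involved, the sketch below filters a toy table by CS and epoch and accumulates the distance with `cumsum`. The column names (`cs_id`, `cs_epoch`, `head_distance`, `head_speed`) and the helper `select_epoch` are hypothetical placeholders, not the actual layout of the preprocessing file.

```python
import pandas as pd

# Toy table; the column names are hypothetical placeholders.
toy = pd.DataFrame({
    'cs_id': ['cs_03'] * 4 + ['cs_04'] * 2,
    'cs_epoch': ['pre_cs', 'peri_cs', 'peri_cs', 'post_cs', 'peri_cs', 'peri_cs'],
    'head_distance': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    'head_speed': [3.0, 6.0, 9.0, 12.0, 15.0, 18.0],
})


def select_epoch(table, bodypart, cs, epoch):
    """Return (speed, cumulative distance) for one bodypart, CS, and epoch."""
    mask = (table['cs_id'] == cs) & (table['cs_epoch'] == epoch)
    subset = table.loc[mask]
    speed = subset[f'{bodypart}_speed'].to_numpy()
    cumulative_distance = subset[f'{bodypart}_distance'].cumsum().to_numpy()
    return speed, cumulative_distance


toy_speed, toy_distance = select_epoch(toy, 'head', 'cs_03', 'peri_cs')
```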

In [None]:
bodypart = 'head'
cs = 'cs_03'
epoch = 'peri_cs'

# Extract the arrays to be used for plotting
speed, distance = extract_speed_distance_from_dataframe(dataframe, bodypart, cs, epoch)

In [None]:
# Plot the epoch selected above (`epoch`)
fig, ax = plt.subplots()

plot_distance_speed(
    ax,
    speed,
    distance,
    'speed (cm/sec)',
    'distance (cumsum, cm)',
    epoch,
    cs,
    (0, 300),
)
plt.show()