RAINSTORM

## Prepare positions

Welcome!

This is the first notebook of the Rainstorm project. Here you'll find the initial steps to prepare the data for analysis. The pose estimation data obtained with DeepLabCut is stored in HDF files, which will be our starting point.

- This notebook will take HDF files of mouse tracking data (obtained with DeepLabCut) and prepare the position files to be analyzed.
- It filters out low-likelihood positions, then interpolates and smooths the data.
- The positions are also scaled from pixels to centimeters for better generalization.

#### Requirements:
- A folder with HDF files (from DeepLabCut):
    - The files should contain the position of the desired bodyparts (and objects) on the video.
    - The filenames should contain the trial name, which will be used to organize the files into subfolders.

- If you don't have your position files with you, don't worry!
    - You can demo the pipeline by working with the example data provided in the downloaded repository. It contains pose estimation files from mice on a Novel Object Recognition (NOR) task.


---
#### Load the necessary modules

In [1]:
import os
from glob import glob
import rainstorm.prepare_positions as rst

---
#### 1. State your project path & thresholds
`base` : The path to the downloaded repository. If you are using a Windows path with backslashes, place an `r` in front of the string (e.g. r'C:\Users\dhers\Rainstorm'). This makes it a raw string, so the backslashes are not read as escape characters and do not cause an error.

`trials` : If your experiment has multiple trials, specify the trial names here.

`confidence` : State how many standard deviations below the mean likelihood a position can fall without being erased (it is similar to asking "how good is your tracking?").

`tolerance` : If the mean likelihood of a point is below this value, the whole point will be erased (because it is probably not there).

`median_filter` : State how many frames to use for the median filter. It must be an odd number.

`bodyparts` : List the bodyparts that you tracked with DLC.

`objects` : List the stationary objects that you tracked with DLC.

`measured_points` : Measure the distance between two bodyparts to scale the video from pixels to cm. I use the left and right ears.

`measured_dist` : State the distance between the measured points, in cm. The distance between the ears is 1.8 cm in my C57 mice.

`video_fps` : State the frames per second of the videos.

In [2]:
# State your path:
base = r'C:\Users\dhers\Desktop\Rainstorm' # For the downloaded repository
folder_path = os.path.join(base, r'docs\examples\NOR_example')

trials  = ["Hab", "TR", "TS"]

confidence = 2
tolerance = 0.8
median_filter = 3 

bodyparts = ['nose', 'L_ear', 'R_ear', 'head', 'neck', 'body', 'tail_1', 'tail_2', 'tail_3']
objects = ['obj_1', 'obj_2']

measured_points = ['L_ear', 'R_ear']
measured_dist = 1.8 # in cm

video_fps = 25
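
The note on Windows paths can be checked with a short sketch. The path below is made up for illustration (it is not part of the repository) and was chosen because it contains `\n` and `\t`, which Python reads as a newline and a tab unless the string is raw.

```python
# Hypothetical path, used only to show the effect of the r prefix
plain = 'C:\new_data\table.h5'   # \n and \t are read as newline and tab
raw = r'C:\new_data\table.h5'    # every backslash is kept literally

print(repr(plain))
print(repr(raw))
```

With a path such as `'C:\Users\...'` and no `r` prefix, Python 3 refuses to parse the string at all, because `\U` starts a Unicode escape.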

---
#### 2. Rename the files to end in '_position.h5'
To ease the analysis, we should start by editing the filenames. We are looking for the following:
- Position files must end with '_position' (right before the '.h5' extension).
- Files that come straight from DeepLabCut end with something like 'DLC_{Network_used + name + date + snapshot}.h5', so that ending is the part we need to replace.
- (Optional) If the files belong to different trials of an experiment, they should contain the name of the trial in the filename.

The cell below offers an easy way to rename the files.

In [3]:
# Let's first make a copy of the example position files (so that we have a backup in case things go south)
rst.backup_folder(folder_path)

# Change the file name as needed
before = 'DLC_resnet50_shuffle2_200000.h5'
after = '_position.h5'

rst.rename_files(folder_path, before, after)

Copied folder to 'C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example_backup'.
Renamed: C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\NOR_Hab_C1_ADLC_resnet50_shuffle2_200000.h5 to C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\NOR_Hab_C1_A_position.h5
Renamed: C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\NOR_Hab_C1_BDLC_resnet50_shuffle2_200000.h5 to C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\NOR_Hab_C1_B_position.h5
Renamed: C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\NOR_Hab_C1_CDLC_resnet50_shuffle2_200000.h5 to C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\NOR_Hab_C1_C_position.h5
Renamed: C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\NOR_Hab_C1_DDLC_resnet50_shuffle2_200000.h5 to C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\NOR_Hab_C1_D_position.h5
Renamed: C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\NOR_Hab_C2_ADLC_resnet50_shuffle2_200000.h5 to C:\
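
For readers curious about what a suffix rename involves, here is a minimal sketch using only the standard library. The helper `rename_suffix` is hypothetical and is not the implementation of `rst.rename_files`; it runs on throwaway files in a temporary folder so no real data is touched.

```python
import tempfile
from pathlib import Path

def rename_suffix(folder, before, after):
    """Hypothetical helper: swap a trailing `before` for `after` in each filename."""
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the loop
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name.endswith(before):
            new_path = path.with_name(path.name[: -len(before)] + after)
            path.rename(new_path)
            renamed.append(new_path.name)
    return renamed

# Try it on empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("NOR_Hab_C1_ADLC_resnet50_shuffle2_200000.h5", "notes.txt"):
        (Path(tmp) / name).touch()
    renamed = rename_suffix(tmp, "DLC_resnet50_shuffle2_200000.h5", "_position.h5")
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(renamed)
```

Only filenames that end with `before` are touched, so unrelated files in the folder are left alone.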

---
#### 3. Open an example file and see what is inside

In [5]:
all_h5_files = glob(os.path.join(folder_path,"*position.h5"))

# Choose a random example file to plot:
example_path = rst.choose_example(all_h5_files, filter_word = 'TS') # You can use the 'filter_word' argument to specify the file you want to use (e.g. 'TS_C1_A').

Plotting coordinates from NOR_TS_C5_C_position.h5


In [6]:
# Open the example file:
df_raw = rst.open_h5_file(example_path, print_data=True, num_sd=confidence)

Positions obtained by model: DLC_resnet50_SauronSep30shuffle1_200000
Points in df: ['L_ear', 'R_ear', 'body', 'head', 'neck', 'nose', 'obj_1', 'obj_2', 'tail_1', 'tail_2', 'tail_3']
L_ear 	 median: 1.00 	 mean: 0.98 	 std_dev: 0.14 	 tolerance: 0.69
R_ear 	 median: 1.00 	 mean: 0.98 	 std_dev: 0.14 	 tolerance: 0.70
body 	 median: 1.00 	 mean: 0.98 	 std_dev: 0.14 	 tolerance: 0.70
head 	 median: 1.00 	 mean: 0.98 	 std_dev: 0.14 	 tolerance: 0.70
neck 	 median: 1.00 	 mean: 0.98 	 std_dev: 0.14 	 tolerance: 0.70
nose 	 median: 1.00 	 mean: 0.96 	 std_dev: 0.17 	 tolerance: 0.63
obj_1 	 median: 1.00 	 mean: 0.97 	 std_dev: 0.15 	 tolerance: 0.67
obj_2 	 median: 1.00 	 mean: 1.00 	 std_dev: 0.01 	 tolerance: 0.99
tail_1 	 median: 1.00 	 mean: 0.98 	 std_dev: 0.14 	 tolerance: 0.70
tail_2 	 median: 1.00 	 mean: 0.98 	 std_dev: 0.13 	 tolerance: 0.71
tail_3 	 median: 1.00 	 mean: 0.97 	 std_dev: 0.14 	 tolerance: 0.69


Notice that, if the DLC model is working properly, the median likelihood of an existing point is very close to 1. However, some points have lower mean likelihoods and higher standard deviations. This is because those points are harder to find (e.g. the nose tends to disappear during grooming). We will adjust our tolerance for each point, and erase only the positions that are below it.
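
The two thresholds can be illustrated with a small sketch. The values printed above are consistent with a per-point cut-off of `mean - num_sd * std_dev`, but that formula, the helper names and the toy numbers below are assumptions made for illustration; the library's internals may differ (for example, in how the standard deviation is computed).

```python
from statistics import mean, pstdev

def likelihood_threshold(likelihoods, num_sd=2):
    """Assumed per-point cut-off: mean likelihood minus num_sd standard deviations."""
    return mean(likelihoods) - num_sd * pstdev(likelihoods)

def mask_low_likelihood(xs, likelihoods, num_sd=2, drop_below=0.8):
    """Return xs with unreliable frames replaced by None."""
    if mean(likelihoods) < drop_below:
        # The point is probably absent from the video: erase every frame
        return [None] * len(xs)
    cutoff = likelihood_threshold(likelihoods, num_sd)
    return [x if p >= cutoff else None for x, p in zip(xs, likelihoods)]

# Toy track: frame 2 is a badly tracked jump with low likelihood
xs = [10.0, 10.5, 50.0, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0]
likelihoods = [1.0, 1.0, 0.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
masked = mask_low_likelihood(xs, likelihoods)

print(masked)
```

In this toy track only the low-likelihood frame is erased, while a point whose mean likelihood falls below `drop_below` would be erased entirely.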

---
#### 4. Now that we have our file, let's test our processing parameters in an example video

In [7]:
df_smooth = rst.filter_and_smooth_df(df_raw, bodyparts, objects, med_filt_window = median_filter, drop_below = tolerance, num_sd = confidence)

In [8]:
rst.plot_raw_vs_smoothed(df_raw, df_smooth, bodypart = 'nose', num_sd = confidence)
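
As a rough picture of what interpolating and smoothing do to a single coordinate, here is a plain-Python sketch. The helpers `interpolate_gaps` and `sliding_median` are hypothetical, linear interpolation is an assumption, and none of this is the code behind `rst.filter_and_smooth_df`.

```python
from statistics import median

def interpolate_gaps(values):
    """Linearly fill None gaps that have valid neighbours on both sides."""
    filled = list(values)
    valid = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(valid, valid[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

def sliding_median(values, window=3):
    """Median over a centred window; the window must be odd to have a centre frame."""
    if window % 2 == 0:
        raise ValueError("window must be an odd number")
    half = window // 2
    out = list(values)  # frames too close to the edges are left unchanged
    for i in range(half, len(values) - half):
        out[i] = median(values[i - half: i + half + 1])
    return out

# Toy track: one erased frame (None) and one single-frame jump (30.0).
# Assumes the track starts and ends with valid values.
track = [10.0, None, 12.0, 30.0, 14.0, 15.0]
filled = interpolate_gaps(track)
smooth = sliding_median(filled, window=3)

print(filled)
print(smooth)
```

The erased frame is filled from its neighbours and the single-frame jump is removed by the median, which is why a small odd window such as 3 is often enough for brief tracking errors.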

---
#### 5. (Optional, but recommended) If we want to scale our data from pixels to cm, we can use a measured distance between two bodyparts (e.g. the distance between both ears).

In [9]:
# plot scale
scale = rst.find_scale_factor(df_smooth, measured_dist, measured_points, print_results=True)

median distance is 35.331677926915084, mean distance is 35.11362401770559. Scale factor is 0.0509 (1 cm = 19.63 px).
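
The numbers above are consistent with dividing the measured distance by the median pixel distance: 1.8 cm / 35.33 px ≈ 0.0509 cm per px, and its inverse is ≈ 19.63 px per cm. A minimal sketch of that arithmetic follows; the helper `scale_factor` and the coordinates are made up for illustration and are not the library's code.

```python
from math import hypot
from statistics import median

def scale_factor(point_a, point_b, measured_dist_cm):
    """cm-per-pixel factor from the median distance between two tracked points."""
    distances = [
        hypot(ax - bx, ay - by)
        for (ax, ay), (bx, by) in zip(point_a, point_b)
    ]
    return measured_dist_cm / median(distances)

# Three frames of (x, y) pixel coordinates for each ear (made-up numbers)
l_ear = [(100.0, 100.0), (110.0, 105.0), (120.0, 110.0)]
r_ear = [(130.0, 140.0), (140.0, 145.0), (150.0, 150.0)]
factor = scale_factor(l_ear, r_ear, measured_dist_cm=1.8)

# Converting a pixel coordinate to centimeters is then a multiplication
x_cm = 250.0 * factor

print(factor, 1 / factor)
```

Using the median rather than the mean keeps the factor robust to frames where one ear is briefly mistracked.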


---
#### 6. Now that we know what we are doing, we can apply all previous steps to all the files in our folder and store the results into csv files (let's face it, they are less scary).

In [10]:
# Process every file in the folder
rst.process_position_files(all_h5_files, bodyparts, objects, measured_dist, measured_points, scale = True, fps = video_fps, med_filt_window = median_filter, drop_below = tolerance, num_sd = confidence)

NOR_Hab_C1_A_position.h5 has 18 columns. The mouse took 2.32 sec to enter. Scale factor is 0.0524 (1 cm = 19.09 px).
NOR_Hab_C1_B_position.h5 has 18 columns. The mouse took 3.36 sec to enter. Scale factor is 0.0528 (1 cm = 18.95 px).
NOR_Hab_C1_C_position.h5 has 18 columns. The mouse took 2.24 sec to enter. Scale factor is 0.0515 (1 cm = 19.42 px).
NOR_Hab_C1_D_position.h5 has 18 columns. The mouse took 4.32 sec to enter. Scale factor is 0.0511 (1 cm = 19.58 px).
NOR_Hab_C2_A_position.h5 has 18 columns. The mouse took 2.12 sec to enter. Scale factor is 0.0500 (1 cm = 19.99 px).
NOR_Hab_C2_B_position.h5 has 18 columns. The mouse took 1.16 sec to enter. Scale factor is 0.0529 (1 cm = 18.91 px).
NOR_Hab_C2_C_position.h5 has 18 columns. The mouse took 7.52 sec to enter. Scale factor is 0.0521 (1 cm = 19.19 px).
NOR_Hab_C2_D_position.h5 has 18 columns. The mouse took 9.12 sec to enter. Scale factor is 0.0499 (1 cm = 20.03 px).
NOR_Hab_C3_A_position.h5 has 18 columns. The mouse took 1.60 sec

---
#### 7. Finally, we can organize the files into subfolders corresponding to different trials of the experiment.

In [11]:
# Clean the folder
rst.filter_and_move_files(folder_path, trials)

Files filtered and moved successfully.
All .H5 files are stored away
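
Sorting files by the trial name in their filename can be sketched as follows. The helper `sort_into_trials` and the resulting layout are assumptions for illustration; `rst.filter_and_move_files` may arrange the folders differently. The sketch works on empty placeholder files in a temporary folder.

```python
import shutil
import tempfile
from pathlib import Path

def sort_into_trials(folder, trials, suffix=".csv"):
    """Hypothetical helper: move each file into a subfolder named after its trial."""
    folder = Path(folder)
    moved = {}
    # sorted() builds the full list first, so moving files does not disturb the loop
    for path in sorted(folder.glob(f"*{suffix}")):
        for trial in trials:
            if trial in path.name:
                target = folder / trial
                target.mkdir(exist_ok=True)
                shutil.move(str(path), str(target / path.name))
                moved[path.name] = trial
                break  # the first matching trial wins
    return moved

with tempfile.TemporaryDirectory() as tmp:
    names = (
        "NOR_Hab_C1_A_position.csv",
        "NOR_TR_C1_A_position.csv",
        "NOR_TS_C1_A_position.csv",
    )
    for name in names:
        (Path(tmp) / name).touch()
    moved = sort_into_trials(tmp, ["Hab", "TR", "TS"])
    layout = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.csv"))

print(layout)
```

Because the match is a plain substring test, trial names should be chosen so that they do not appear elsewhere in the filenames.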


---
---
#### Our experiment folder now has one subfolder per trial, each containing csv files with the mice positions.
We can move on to the next notebook, 2-Geometric_analysis.ipynb

---
RAINSTORM - Created on Aug 27, 2023 - @author: Santiago D'hers
