RAINSTORM - Created on Aug 27, 2023 - @author: Santiago D'hers

## Prepare positions
- This notebook takes H5 files of mouse tracking data (obtained with DeepLabCut) and prepares the position.csv files for analysis.
- It filters out low-likelihood positions, then interpolates and smooths the data.
- The positions are also scaled from pixels to cm, so that results from different videos are comparable.

#### Requirements:
- A folder of .H5 files (from DeepLabCut) that:
    - contain the positions of the desired bodyparts (and objects) in the video;
    - include the trial name in their filenames, which is used to organize the files into subfolders.
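The exact matching rule RAINSTORM uses is not shown here. The following is a minimal sketch that checks a filename carries exactly one trial name. It assumes underscore-separated names like the example files later in this notebook. `find_trial` is a hypothetical helper, not part of `rainstorm`.

```python
from pathlib import Path

def find_trial(filename, trials):
    """Return the single trial name found among the underscore-separated parts of a filename.

    Hypothetical helper: assumes names like '2024-09_NOR_TS_C5_B_position.h5'.
    """
    tokens = Path(filename).stem.split("_")
    matches = [t for t in trials if t in tokens]
    if len(matches) != 1:
        raise ValueError(f"{filename}: expected exactly one trial name, found {matches}")
    return matches[0]

print(find_trial("2024-09_NOR_TS_C5_B_position.h5", ["Hab", "TR", "TS"]))  # TS
```

Matching whole tokens rather than substrings avoids false hits. For example, a short trial name like `TR` could otherwise match inside an unrelated part of the filename.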


---
#### 1. Load the necessary modules

In [1]:
import os
from glob import glob
import rainstorm.prepare_positions as rst

---
#### 2. State your project path & thresholds
`base` : The path to the downloaded repository. If you are using a Windows path with backslashes, put an `r` in front of the string (e.g. r'C:\Users\dhers\Desktop\RAINSTORM') so that Python does not read the backslashes as escape sequences.

`filter_by` : A substring used to select which position file is plotted as an example.

`trials` : If your experiment has multiple trials, specify the trial names here.

`confidence` : How many standard deviations below the mean likelihood a position can fall before it is erased (it is similar to asking "how good is your tracking?").

`tolerance` : If a point's mean likelihood is below this value, the whole point is erased (because it is probably not in the video).

`median_filter` : State how many frames to use for the median filter. It must be an odd number.

`bodyparts` : List the bodyparts that you tracked with DLC.

`example_bodypart` : State which bodypart you'd like to plot as an example.

`objects` : List the stationary objects that you tracked with DLC.

`measured_points` : Two bodyparts whose real distance is known, used to scale the video from pixels to cm. I use the left and right ears.

`measured_dist` : The real distance between the measured points, in cm. The distance between the ears is 1.8 cm in my C57 mice.

`video_fps` : State the frames per second of the videos.
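The `r` prefix matters because Python treats some backslash pairs in ordinary strings as escape sequences. The small example below shows this with a made-up path. It also shows that passing each folder name to `os.path.join` separately avoids backslashes in string literals altogether.

```python
import os

# Without the r prefix, "\n" in this made-up path becomes a single newline character.
plain = "C:\new_data"
raw = r"C:\new_data"
print(len(plain), len(raw))  # 10 11

# Joining components lets os.path.join insert the right separator for the OS.
sub = os.path.join("docs", "examples", "position_files_copy")
print(sub)  # docs\examples\position_files_copy on Windows, docs/examples/position_files_copy elsewhere
```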

In [2]:
# State your path:
base = r'C:\Users\dhers\Desktop\RAINSTORM'
folder_path = os.path.join(base, 'docs', 'examples', 'position_files_copy')

all_h5_files = glob(os.path.join(folder_path,"*position.h5"))
filter_by = 'TS' # Make this more specific to pick a particular file (e.g. 'TS_C1_A').

trials = ["Hab", "TR", "TS"]

confidence = 2
tolerance = 0.5
median_filter = 3 

bodyparts = ['nose', 'L_ear', 'R_ear', 'head', 'neck', 'body', 'tail_1', 'tail_2', 'tail_3']
example_bodypart = 'nose'
objects = ['obj_1', 'obj_2']

measured_points = ['L_ear', 'R_ear']
measured_dist = 1.8 # in cm

video_fps = 25

---
#### 3. We can open an example file and see what is inside

In [3]:
# Choose an example file to plot:
example_path = rst.choose_example(all_h5_files, filter_by)

Plotting coordinates from 2024-09_NOR_TS_C5_B_position.h5
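How `rst.choose_example` picks its file is not documented here. A plausible behaviour is sketched below: select one file at random among those whose name contains `filter_by`. This is an assumption, not the library's code, and `choose_example_sketch` is a hypothetical name.

```python
import random
from pathlib import Path

def choose_example_sketch(files, filter_by, seed=None):
    """Pick one file whose name contains filter_by (hypothetical stand-in for rst.choose_example)."""
    matches = [f for f in files if filter_by in Path(f).name]
    if not matches:
        raise FileNotFoundError(f"No file name contains {filter_by!r}")
    # A seeded generator makes the choice reproducible when seed is given.
    return random.Random(seed).choice(matches)
```

A more specific `filter_by` such as `'TS_C5'` narrows the candidates to a single file, so the choice becomes deterministic.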


In [4]:
# Open the example file:
df_raw = rst.open_h5_file(example_path, print_data=True, num_sd=confidence)

Positions obtained by model: DLC_resnet50_VaderDec1shuffle1_200000
Points in df: ['L_ear', 'R_ear', 'body', 'head', 'neck', 'nose', 'obj_1', 'obj_2', 'tail_1', 'tail_2', 'tail_3']
L_ear 	 median: 1.00 	 mean: 1.00 	 std_dev: 0.03 	 tolerance: 0.93
R_ear 	 median: 1.00 	 mean: 0.99 	 std_dev: 0.03 	 tolerance: 0.93
body 	 median: 1.00 	 mean: 1.00 	 std_dev: 0.00 	 tolerance: 0.99
head 	 median: 1.00 	 mean: 0.99 	 std_dev: 0.07 	 tolerance: 0.84
neck 	 median: 1.00 	 mean: 1.00 	 std_dev: 0.01 	 tolerance: 0.97
nose 	 median: 1.00 	 mean: 0.95 	 std_dev: 0.19 	 tolerance: 0.57
obj_1 	 median: 1.00 	 mean: 1.00 	 std_dev: 0.00 	 tolerance: 1.00
obj_2 	 median: 1.00 	 mean: 1.00 	 std_dev: 0.02 	 tolerance: 0.95
tail_1 	 median: 1.00 	 mean: 0.93 	 std_dev: 0.23 	 tolerance: 0.48
tail_2 	 median: 0.99 	 mean: 0.89 	 std_dev: 0.26 	 tolerance: 0.38
tail_3 	 median: 1.00 	 mean: 0.90 	 std_dev: 0.26 	 tolerance: 0.39


Notice that, if the DLC model is working properly, the median likelihood of a point that is present is very close to 1. However, some points have lower mean likelihoods and higher standard deviations because they are harder to find (e.g. the nose tends to disappear during grooming). Instead of using one cutoff for every point, the tolerance is adjusted for each point as mean - confidence × std_dev (for the nose, 0.95 - 2 × 0.19 = 0.57), and only the positions below it are erased.
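The per-point rule above can be sketched in plain Python. This sketch assumes the cutoff is mean minus `num_sd` population standard deviations, which agrees with the printed values up to rounding. It also applies the `tolerance` rule: if a point's mean likelihood is below `drop_below`, the whole point is erased. The function names are hypothetical; `rst.filter_and_smooth_df` works on DataFrames and may differ in detail.

```python
from statistics import mean, pstdev

def likelihood_threshold(likelihoods, num_sd):
    """Per-point cutoff: mean likelihood minus num_sd standard deviations."""
    return mean(likelihoods) - num_sd * pstdev(likelihoods)

def mask_unreliable(coords, likelihoods, num_sd, drop_below):
    """Replace low-likelihood coordinates with None.

    If the point's mean likelihood is below drop_below, every frame is erased.
    """
    if mean(likelihoods) < drop_below:
        return [None] * len(coords)
    cutoff = likelihood_threshold(likelihoods, num_sd)
    return [c if p >= cutoff else None for c, p in zip(coords, likelihoods)]

# Worked check with the nose values printed above (mean 0.95, std_dev 0.19, confidence 2):
print(round(0.95 - 2 * 0.19, 2))  # 0.57
```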

---
#### 4. Now that we have our file, let's test our processing parameters on an example video

In [5]:
df_smooth = rst.filter_and_smooth_df(df_raw, bodyparts, objects, med_filt_window = median_filter, drop_below = tolerance, num_sd = confidence)

In [6]:
rst.plot_raw_vs_smoothed(df_raw, df_smooth, bodypart = example_bodypart, num_sd = confidence)
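After erasing positions, the gaps are filled and the trace is smoothed. The sketch below shows both steps on a single coordinate using plain lists. It uses linear interpolation between known frames and a centered running median with an odd window. The helpers are hypothetical. RAINSTORM's own order of operations and edge handling may differ. Here the window simply shrinks at the edges, and the gap search is quadratic, which is fine for a demonstration.

```python
from statistics import median

def interpolate_gaps(values):
    """Linearly fill None gaps; leading/trailing gaps copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out

def running_median(values, window):
    """Centered running median; the window must be odd and shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]

series = [10.0, None, 14.0, 40.0, 16.0, 18.0]  # None = erased frame, 40.0 = tracking jump
filled = interpolate_gaps(series)   # [10.0, 12.0, 14.0, 40.0, 16.0, 18.0]
smooth = running_median(filled, 3)  # [11.0, 12.0, 14.0, 16.0, 18.0, 17.0]
```

An odd window keeps the median centered on the current frame, which is why `median_filter` must be odd. The single-frame jump to 40.0 is removed entirely, while a mean filter would only spread it over neighbouring frames.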

---
#### 5. (Optional, but recommended) If we want to scale our data from pixels to cm, we can use a measured distance between two bodyparts (e.g. the distance between both ears).

In [7]:
# plot scale
scale = rst.find_scale_factor(df_smooth, measured_dist, measured_points, print_results=True)

median distance is 33.87199400008042, mean distance is 34.22021291734502. Scale factor is 0.0531 (1 cm = 18.82 px).
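The printed factor equals `measured_dist` divided by the median pixel distance between the two measured points: 1.8 / 33.872 ≈ 0.0531 cm per pixel, or about 18.82 px per cm. A minimal sketch of that calculation follows. `scale_factor` is a hypothetical helper, and the point lists stand in for per-frame (x, y) positions.

```python
import math
from statistics import median

def scale_factor(left_pts, right_pts, real_dist_cm):
    """cm per pixel: known distance divided by the median per-frame pixel distance."""
    dists = [math.dist(a, b) for a, b in zip(left_pts, right_pts)]
    return real_dist_cm / median(dists)

example_scale = 1.8 / 33.87199400008042  # values printed above
print(f"{example_scale:.4f} cm/px, 1 cm = {1 / example_scale:.2f} px")  # 0.0531 cm/px, 1 cm = 18.82 px
```

The median is less sensitive than the mean to frames where an ear is briefly mistracked. This is consistent with the printed mean (34.22 px) being slightly higher than the median.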


---
#### 6. Now that we know what we are doing, we can apply all the previous steps to every file in the folder and store the results as csv files (let's face it, they are less scary).
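As an illustration of the kind of table these csv files hold, the following sketch writes one row per frame with an x and a y column per bodypart, leaving erased positions empty. The `<bodypart>_x` / `<bodypart>_y` column naming and the `write_positions_csv` helper are assumptions for illustration only. Open one of the generated files to see the actual layout.

```python
import csv

def write_positions_csv(path, frames, bodyparts):
    """Write one row per frame; frames is a list of {bodypart: (x, y) or None} dicts.

    Column naming (<bodypart>_x, <bodypart>_y) is an assumption for illustration.
    """
    fieldnames = [f"{bp}_{axis}" for bp in bodyparts for axis in ("x", "y")]
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for frame in frames:
            row = {}
            for bp in bodyparts:
                # Missing or erased positions are written as empty cells.
                x, y = frame.get(bp) or ("", "")
                row[f"{bp}_x"], row[f"{bp}_y"] = x, y
            writer.writerow(row)
```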

In [8]:
# Process every file in the folder
rst.process_position_files(all_h5_files, bodyparts, objects, measured_dist, measured_points, scale = True, fps = video_fps, med_filt_window = median_filter, drop_below = tolerance, num_sd = confidence)

2024-09_NOR_Hab_C1_A_position.h5 has 18 columns. The mouse took 0.32 sec to enter. Scale factor is 0.0483 (1 cm = 20.72 px).
2024-09_NOR_Hab_C1_B_position.h5 has 18 columns. The mouse took 0.68 sec to enter. Scale factor is 0.0475 (1 cm = 21.05 px).
2024-09_NOR_Hab_C1_C_position.h5 has 18 columns. The mouse took 1.32 sec to enter. Scale factor is 0.0483 (1 cm = 20.71 px).
2024-09_NOR_Hab_C2_A_position.h5 has 18 columns. The mouse took 0.00 sec to enter. Scale factor is 0.0513 (1 cm = 19.50 px).
2024-09_NOR_Hab_C2_B_position.h5 has 18 columns. The mouse took 1.36 sec to enter. Scale factor is 0.0529 (1 cm = 18.91 px).
2024-09_NOR_Hab_C2_C_position.h5 has 18 columns. The mouse took 0.00 sec to enter. Scale factor is 0.0502 (1 cm = 19.90 px).
2024-09_NOR_Hab_C3_A_position.h5 has 18 columns. The mouse took 1.68 sec to enter. Scale factor is 0.0474 (1 cm = 21.09 px).
2024-09_NOR_Hab_C3_B_position.h5 has 18 columns. The mouse took 0.00 sec to enter. Scale factor is 0.0499 (1 cm = 20.05 px).
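The entry times above are whole frame counts divided by `video_fps`; for example, 0.32 sec at 25 fps is 8 frames. The sketch below assumes "entering" means the first frame in which the tracked bodypart has a valid position. That definition and the `entry_time` helper are hypothetical, not taken from `rst.process_position_files`.

```python
def entry_time(positions, fps):
    """Seconds until the first frame with a valid (non-None) position; None if never detected."""
    for frame, pos in enumerate(positions):
        if pos is not None:
            return frame / fps
    return None

print(entry_time([None] * 8 + [(120.0, 85.0)], fps=25))  # 0.32
```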


---
#### 7. Finally, we can organize the files into subfolders corresponding to different trials of the experiment.

In [9]:
# Organize the files into trial subfolders
rst.filter_and_move_files(folder_path, trials)

Files filtered and moved successfully.
All .H5 files are stored away
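The sorting step can be sketched with `pathlib`. It first builds a plan that maps each position csv to a subfolder named after its trial, so the plan can be checked before anything is moved, and then applies it. This is a hypothetical sketch. It does not reproduce `rst.filter_and_move_files`, which also stores the .H5 files away.

```python
from pathlib import Path

def plan_moves(folder, trials):
    """Map each *position.csv in folder to folder/<trial>/<name>; no files are touched.

    Files whose name does not contain exactly one trial token are left out of the plan.
    """
    folder = Path(folder)
    plan = {}
    for f in sorted(folder.glob("*position.csv")):
        tokens = f.stem.split("_")
        matches = [t for t in trials if t in tokens]
        if len(matches) == 1:
            plan[f] = folder / matches[0] / f.name
    return plan

def apply_moves(plan):
    """Create the trial subfolders as needed and move each file."""
    for src, dst in plan.items():
        dst.parent.mkdir(parents=True, exist_ok=True)
        src.rename(dst)
```

Separating planning from moving makes it easy to print the plan as a dry run first.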
