RAIN - Real & Artificial Intelligence for Neuroscience

## Prepare positions

Welcome!

This is the first official notebook of the Rainstorm project. Here you'll find the initial steps to prepare your data for analysis.

The position data obtained with pose estimation software (e.g. DeepLabCut or SLEAP) is usually stored in HDF files, with extension '.h5'.
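DeepLabCut typically stores its predictions as a pandas DataFrame whose columns form a three-level MultiIndex (scorer, bodyparts, coords), with `x`, `y` and `likelihood` for each bodypart. SLEAP exports are organized differently. The sketch below builds a tiny in-memory DataFrame in the DLC-style layout and flattens it into names such as `nose_likelihood`. The scorer name, the values and the flattened naming scheme are made up for illustration. A real file would be loaded with `pd.read_hdf(path)`, which needs the optional PyTables package.

```python
import numpy as np
import pandas as pd

# Hypothetical scorer name and bodyparts, mimicking DeepLabCut's column layout
scorer = "DLC_example_scorer"
bodyparts = ["nose", "body"]
coords = ["x", "y", "likelihood"]
columns = pd.MultiIndex.from_product(
    [[scorer], bodyparts, coords], names=["scorer", "bodyparts", "coords"]
)

# Three frames of made-up tracking data (x, y, likelihood per bodypart)
data = np.array([
    [10.0, 20.0, 0.99, 30.0, 40.0, 0.98],
    [11.0, 21.0, 0.10, 31.0, 41.0, 0.97],
    [12.0, 22.0, 0.95, 32.0, 42.0, 0.99],
])
df = pd.DataFrame(data, columns=columns)

# Drop the scorer level and flatten to names like 'nose_x', 'nose_likelihood'
flat = df.droplevel("scorer", axis=1)
flat.columns = [f"{bp}_{c}" for bp, c in flat.columns]

print(flat.columns.tolist())
print(flat["nose_likelihood"].tolist())  # → [0.99, 0.1, 0.95]
```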

This notebook will:

- Read HDF files of rodent tracking data.
- Filter out low-likelihood positions, then interpolate and smooth the data.
- Prepare the position files to be analyzed.

#### Requirements:
- A folder with:
    - HDF files containing:
        - The position of the mouse bodyparts on the video.
        - The position of the **exploration targets** (optional, since they can be added from the ROIs.json file).
    - A JSON file containing the ROIs of the exploration targets (optional, since the targets can be taken from the HDF files instead).

If you don't have your position files at hand, don't worry! You can demo the pipeline using the example data provided in the Rainstorm repository. It contains:
- A **Novel Object Recognition** (NOR) task, with the positions of each **mouse bodypart and two objects**, analyzed using **DeepLabCut**.
- A **Social Preference** (SP) task, with the position of each **mouse bodypart**, analyzed using **SLEAP**. The locations of the **exploration targets** were added using points selected with the `0-Video_handling` notebook.

---
#### Load the necessary modules

In [1]:
import os
import rainstorm.prepare_positions as rst

---
#### 1. State your project path
`base` : The path to the downloaded repository. If you are using a Windows path with backslashes, put an `r` in front of the string (e.g. r'C:\Users\dhers\Rainstorm') so that Python reads it as a raw string and does not treat the backslashes as escape characters.

`folder_path` : The path to the folder containing the pose estimation files you want to use.

`ROIs_path` : The path to the JSON file generated with the `draw_rois` function in the `0-Video_handling` notebook.

In [2]:
# State your path:
base = r'C:\Users\dhers\Desktop\Rainstorm' # For the downloaded repository
folder_path = os.path.join(base, r'docs\examples\NOR_example') # For the folder containing the pose estimation files you want to use
ROIs_path = os.path.join(folder_path, 'ROIs.json') # For the ROIs.json file (optional)

---
#### 2. Rename the files to end in '_position.h5'
To ease the analysis, we start by editing the filenames. We are looking for the following:
- Position files must end with '_position'.
- Depending on the software used, the raw filenames end with something like '{DLC or SLEAP}_{Network_used + name + date + snapshot}.h5', which we replace with '_position.h5'.
- (Optional) If the files belong to different trials of an experiment, the filename should contain the trial name.

We can find an easy way to rename files below.

In [3]:
# Let's first make a backup copy of the example position files, in case things go south
rst.backup_folder(folder_path)

The folder 'C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example_backup' already exists.
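The output above suggests that the backup is written next to the original folder with a `_backup` suffix and is skipped if it already exists. The sketch below reproduces that behaviour with `shutil.copytree`. `backup_folder_sketch` is a hypothetical helper written for illustration, not Rainstorm's own implementation, and it runs on a throwaway temporary folder.

```python
import shutil
import tempfile
from pathlib import Path


def backup_folder_sketch(folder_path):
    """Copy folder_path to a sibling '<name>_backup' folder unless it already exists.

    Hypothetical re-implementation for illustration; not Rainstorm's own code.
    """
    src = Path(folder_path)
    dst = src.with_name(src.name + "_backup")
    if dst.exists():
        print(f"The folder '{dst}' already exists.")
        return dst
    shutil.copytree(src, dst)
    print(f"Backup created at '{dst}'.")
    return dst


# Demo on a throwaway folder so no real data is touched
demo_root = Path(tempfile.mkdtemp())
demo_folder = demo_root / "NOR_example"
demo_folder.mkdir()
(demo_folder / "example.h5").write_text("placeholder")

first = backup_folder_sketch(demo_folder)   # creates NOR_example_backup
second = backup_folder_sketch(demo_folder)  # reports that the backup already exists
```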


In [4]:
# Change the filenames as needed
before = 'DLC_resnet50_shuffle2_200000.h5'  # For the NOR example
# before = 'DLC_Resnet50_rainstormFeb17shuffle4_snapshot_200.h5'  # For the SP example
after = '_position.h5'

rst.rename_files(folder_path, before, after)
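Conceptually, the renaming step swaps the software-specific ending (`before`) for `_position.h5` (`after`). Here is a minimal sketch of that idea using `pathlib`. `rename_files_sketch` is a hypothetical stand-in, not the `rst.rename_files` source, and the demo filename is invented to look like a DeepLabCut output.

```python
import tempfile
from pathlib import Path


def rename_files_sketch(folder_path, before, after):
    """Replace the filename ending `before` with `after` for every matching file.

    Hypothetical helper for illustration; returns the list of new filenames.
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(folder_path).iterdir()):
        if before and path.is_file() and path.name.endswith(before):
            new_name = path.name[: -len(before)] + after
            path.rename(path.with_name(new_name))
            renamed.append(new_name)
    return renamed


# Demo on a throwaway folder with a DLC-style filename
demo_folder = Path(tempfile.mkdtemp())
(demo_folder / "NOR_Hab_C1_ADLC_resnet50_shuffle2_200000.h5").write_text("")
(demo_folder / "notes.txt").write_text("")

new_names = rename_files_sketch(
    demo_folder, "DLC_resnet50_shuffle2_200000.h5", "_position.h5"
)
print(new_names)  # → ['NOR_Hab_C1_A_position.h5']
```

Files that do not end with `before` (such as `notes.txt` here) are left untouched, which makes the operation safe to re-run.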

---
#### 3. Create the params.yaml file

The params.yaml file is a configuration file that contains all the parameters needed to run the analysis. It is saved in the experiment folder and contains the following parameters:

`path` : Path to the experiment folder containing the pose estimation files.

`filenames` : List of the pose estimation filenames.

`software` : State the software used to generate the tracking files ('DLC' or 'SLEAP').

`bodyparts` : List the tracked bodyparts.

`targets` : List the exploratory targets.

`trials` : If your experiment has multiple trials, specify the trial names here.

`filtering & smoothing` : Parameters for processing positions:
- confidence : How many standard deviations away from the mean likelihood a position can be without being erased (roughly, it asks "how good is your tracking?").
- tolerance : If a point's mean likelihood is below this value, the whole point is erased (it is probably not present in the video).
- median_filter : State how many frames to use for the median filter. It must be an odd number.

`video_fps` : State the frames per second of the videos.

`roi_data` : Information about the ROIs. It is a dictionary with the following keys:
- frame_shape: Shape of the video frames.
- scale: Scale of the video in px/cm.
- areas: Defined ROIs (areas) in the frame.
- points: Key points within the frame.

`geometric analysis` : Parameters for defining exploration and freezing behavior:
- distance : State the maximum nose-object distance to consider exploration.
- angle : State the maximum head-object orientation angle to consider exploration.
- freezing_threshold : State the movement threshold for freezing, computed as the mean standard deviation of all body part positions over 1 second.
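Some of these parameters have constraints stated above: the software must be 'DLC' or 'SLEAP', the median filter window must be odd, and the tolerance is compared against likelihoods, so it lies between 0 and 1. The sketch below checks those constraints on a plain dictionary before running the pipeline. The nested key names (`filtering_smoothing`) and the example values are hypothetical placeholders; check your own params.yaml for the real layout.

```python
def check_params_sketch(params):
    """Return a list of problems found in a params dictionary.

    The nested key 'filtering_smoothing' is a hypothetical stand-in for the
    'filtering & smoothing' section; adapt it to your params.yaml.
    """
    problems = []
    if params.get("software") not in ("DLC", "SLEAP"):
        problems.append("software must be 'DLC' or 'SLEAP'")

    filt = params.get("filtering_smoothing", {})
    window = filt.get("median_filter")
    if not isinstance(window, int) or window < 1 or window % 2 == 0:
        problems.append("median_filter must be a positive odd integer")

    tol = filt.get("tolerance")
    if not isinstance(tol, (int, float)) or not 0 <= tol <= 1:
        problems.append("tolerance must be a likelihood between 0 and 1")

    fps = params.get("video_fps")
    if not isinstance(fps, (int, float)) or fps <= 0:
        problems.append("video_fps must be positive")
    return problems


# Placeholder values for illustration only
example = {
    "software": "DLC",
    "video_fps": 30,
    "filtering_smoothing": {"confidence": 2, "tolerance": 0.8, "median_filter": 3},
}
even_window = dict(
    example,
    filtering_smoothing={"confidence": 2, "tolerance": 0.8, "median_filter": 4},
)

print(check_params_sketch(example))      # → []
print(check_params_sketch(even_window))  # → ['median_filter must be a positive odd integer']
```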

In [6]:
# Create the YAML file
params = rst.create_params(folder_path, ROIs_path)

Error loading ROI data: ROIs_path 'C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\ROIs.json' does not exist.
Edit the params.yaml file manually to add ROIs.
Parameters saved to C:\Users\dhers\Desktop\Rainstorm\docs\examples\NOR_example\params.yaml


---
#### 3. Open an example file and see what is inside

In [None]:
example_path = rst.choose_example_h5(params, look_for='TS')  # Use 'look_for' to pick a specific file (e.g. 'TS_C1_A').

# Open the example file:
df_raw = rst.open_h5_file(params, example_path, print_data=True)

Positions obtained by: DLC_resnet50_SauronSep30shuffle1_200000
Points in df: ['L_ear', 'R_ear', 'body', 'head', 'neck', 'nose', 'obj_1', 'obj_2', 'tail_1', 'tail_2', 'tail_3']
Frame count: 7500
L_ear 	 median: 1.00 	 mean: 0.97 	 std_dev: 0.16 	 tolerance: 0.64
R_ear 	 median: 1.00 	 mean: 0.97 	 std_dev: 0.16 	 tolerance: 0.65
body 	 median: 1.00 	 mean: 0.97 	 std_dev: 0.15 	 tolerance: 0.67
head 	 median: 1.00 	 mean: 0.97 	 std_dev: 0.17 	 tolerance: 0.64
neck 	 median: 1.00 	 mean: 0.97 	 std_dev: 0.16 	 tolerance: 0.65
nose 	 median: 1.00 	 mean: 0.95 	 std_dev: 0.20 	 tolerance: 0.55
obj_1 	 median: 1.00 	 mean: 0.99 	 std_dev: 0.10 	 tolerance: 0.80
obj_2 	 median: 1.00 	 mean: 0.98 	 std_dev: 0.15 	 tolerance: 0.68
tail_1 	 median: 1.00 	 mean: 0.96 	 std_dev: 0.17 	 tolerance: 0.62
tail_2 	 median: 1.00 	 mean: 0.97 	 std_dev: 0.16 	 tolerance: 0.65
tail_3 	 median: 1.00 	 mean: 0.96 	 std_dev: 0.17 	 tolerance: 0.62


Notice that, if the model is working properly, the mean likelihood of a point that is present is very close to 1. However, some points have lower mean likelihoods and higher standard deviations because they are harder to detect (e.g. the nose tends to disappear during grooming). For that reason, we adjust the tolerance for each point and erase only the positions whose likelihood falls below it.
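The tolerances printed above appear consistent, to within rounding, with the rule mean - confidence × std using confidence = 2 (e.g. nose: 0.95 - 2 × 0.20 = 0.55). That rule is inferred from the rounded numbers, not taken from Rainstorm's source. The sketch below applies it to one bodypart and blanks the x/y positions whose likelihood falls below the threshold. It assumes flat column names such as `nose_x` and `nose_likelihood`, and uses pandas' default sample standard deviation (ddof=1).

```python
import numpy as np
import pandas as pd


def per_point_tolerance(likelihood, confidence=2.0):
    """Tolerance = mean - confidence * std of one point's likelihood (inferred rule)."""
    return likelihood.mean() - confidence * likelihood.std()


def erase_low_likelihood(df, bodypart, confidence=2.0):
    """Return a copy of df with x/y set to NaN where likelihood < tolerance.

    Assumes flat columns '<bodypart>_x', '<bodypart>_y', '<bodypart>_likelihood'.
    """
    out = df.copy()
    lik = out[f"{bodypart}_likelihood"]
    low = lik < per_point_tolerance(lik, confidence)
    out.loc[low, [f"{bodypart}_x", f"{bodypart}_y"]] = np.nan
    return out


# Made-up data: nine confident frames and one frame where the nose is lost
demo = pd.DataFrame({
    "nose_x": np.arange(10.0),
    "nose_y": np.arange(10.0),
    "nose_likelihood": [1.0] * 9 + [0.0],
})
tol = per_point_tolerance(demo["nose_likelihood"])
cleaned = erase_low_likelihood(demo, "nose")
print(round(tol, 3))
print(cleaned["nose_x"].isna().sum())  # → 1
```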

---
#### 4. Add the position of stationary exploration targets to the DataFrame
As mentioned in the introduction, the exploration targets may or may not be tracked by the same software we use to track the animals.

If our pose estimation model doesn't track the exploration targets, we can add them to the DataFrame using the following `add_targets` function.

The `add_targets` function will use the points from the `ROIs.json` file if they are named in the `targets` list from the params.yaml file.
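Because the targets are stationary, adding them amounts to repeating one fixed (x, y) coordinate on every frame. The sketch below shows that idea. The `points` structure (a list of dicts with `name` and `center`) and the column naming are hypothetical stand-ins for the contents of ROIs.json and Rainstorm's own DataFrame layout. As in the description above, only points whose names appear in `targets` are added.

```python
import pandas as pd


def add_stationary_targets(df, points, targets):
    """Add constant x/y columns for each point whose name is in `targets`.

    `points` mimics a list of ROI points such as {"name": "obj_1", "center": [x, y]};
    this structure is a hypothetical stand-in for the ROIs.json contents.
    """
    out = df.copy()
    for point in points:
        name = point["name"]
        if name in targets:
            x, y = point["center"]
            # A scalar assignment broadcasts the same value to every frame
            out[f"{name}_x"] = float(x)
            out[f"{name}_y"] = float(y)
    return out


frames = pd.DataFrame({"nose_x": [1.0, 2.0, 3.0], "nose_y": [4.0, 5.0, 6.0]})
roi_points = [
    {"name": "obj_1", "center": [100, 200]},
    {"name": "door", "center": [5, 5]},
]
with_targets = add_stationary_targets(frames, roi_points, targets=["obj_1"])
print(with_targets.columns.tolist())  # → ['nose_x', 'nose_y', 'obj_1_x', 'obj_1_y']
```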

In [8]:
df_raw = rst.add_targets(params, df_raw)

---
#### 5. Now that we have our file, let's test our processing parameters on the example file

In [9]:
df_smooth = rst.filter_and_smooth_df(params, df_raw)

rst.plot_raw_vs_smooth(params, df_raw, df_smooth, bodypart='nose')
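The processing described in the introduction (interpolate the erased positions, then smooth with a median filter of odd length) can be sketched for a single coordinate as follows. This is a simplified stand-in, not the `filter_and_smooth_df` source; Rainstorm may add further smoothing or gap limits. Here `limit_area="inside"` fills only gaps surrounded by valid values, so frames before the animal appears stay empty.

```python
import numpy as np
import pandas as pd


def interpolate_and_median(series, median_filter=3):
    """Linearly fill interior NaN gaps, then apply a centered median filter.

    Simplified sketch; the window length must be odd so it centers on a frame.
    """
    if median_filter % 2 == 0:
        raise ValueError("median_filter must be an odd number")
    filled = series.interpolate(method="linear", limit_area="inside")
    return filled.rolling(window=median_filter, center=True, min_periods=1).median()


# Made-up x coordinate with one erased frame (NaN) and one tracking spike (100)
x = pd.Series([0.0, 1.0, np.nan, 3.0, 100.0, 5.0, 6.0])
smoothed = interpolate_and_median(x, median_filter=3)
print(smoothed.tolist())  # → [0.5, 1.0, 2.0, 3.0, 5.0, 6.0, 5.5]
```

A median filter is used rather than a moving average because a single-frame spike (the 100 above) is discarded entirely instead of being spread over its neighbours.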

---
#### 6. Now that we know what we are doing, we can apply all the previous steps to every file in the folder and store the results as CSV files (let's face it, they are less scary).

In [10]:
# Process every file in the folder
rst.process_position_files(params)

NOR_Hab_C1_A_position.h5 has 18 columns. Mouse entered after 1.93 sec.
NOR_Hab_C1_B_position.h5 has 18 columns. Mouse entered after 2.80 sec.
NOR_Hab_C1_C_position.h5 has 18 columns. Mouse entered after 1.87 sec.
NOR_Hab_C1_D_position.h5 has 18 columns. Mouse entered after 3.60 sec.
NOR_Hab_C2_A_position.h5 has 18 columns. Mouse entered after 1.77 sec.
NOR_Hab_C2_B_position.h5 has 18 columns. Mouse entered after 0.97 sec.
NOR_Hab_C2_C_position.h5 has 18 columns. Mouse entered after 6.27 sec.
NOR_Hab_C2_D_position.h5 has 18 columns. Mouse entered after 7.60 sec.
NOR_Hab_C3_A_position.h5 has 18 columns. Mouse entered after 1.33 sec.
NOR_Hab_C3_B_position.h5 has 18 columns. Mouse entered after 4.50 sec.
NOR_Hab_C3_C_position.h5 has 18 columns. Mouse entered after 0.10 sec.
NOR_Hab_C3_D_position.h5 has 18 columns. Mouse entered after 1.87 sec.
NOR_Hab_C4_A_position.h5 has 18 columns. Mouse entered after 2.87 sec.
NOR_Hab_C4_B_position.h5 has 18 columns. Mouse entered after 0.97 sec.
NOR_Ha
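Each line of the log reports when the mouse entered the arena. One plausible way to obtain such a number is to find the first frame where a bodypart has a valid position and divide by the frame rate. The sketch below does exactly that. It is a hypothetical reconstruction, not Rainstorm's method, and the 30 fps value is an assumed placeholder for `video_fps`.

```python
import numpy as np
import pandas as pd


def entry_time_seconds(df, bodypart, fps):
    """Seconds until `bodypart` first has a valid (non-NaN) x position.

    Hypothetical reconstruction; returns None if the bodypart is never detected.
    """
    valid = df[f"{bodypart}_x"].notna().to_numpy()
    if not valid.any():
        return None
    first_frame = int(np.argmax(valid))  # index of the first True
    return first_frame / fps


# Made-up data: the nose is missing for the first 58 frames
demo = pd.DataFrame({"nose_x": [np.nan] * 58 + [1.0] * 10})
t = entry_time_seconds(demo, "nose", fps=30)  # 30 fps is an assumed placeholder
print(f"Mouse entered after {t:.2f} sec.")  # → Mouse entered after 1.93 sec.
```

For this to be meaningful, the entry time has to be computed before any step that fills leading gaps, otherwise the empty frames at the start of the video would no longer be empty.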

---
#### 7. Finally, we can organize the files into subfolders corresponding to different trials of the experiment.

In [11]:
# Clean the folder
rst.filter_and_move_files(params)

Files filtered and moved successfully.
All .H5 files are stored away
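Sorting by trial relies on the trial name being part of the filename, as requested in step 2. The sketch below moves each matching CSV into a subfolder named after its trial, matching trials against underscore-separated tokens so that, for example, 'TS' does not accidentally match inside a longer word. `sort_into_trials` and its filename pattern are hypothetical; `rst.filter_and_move_files` may behave differently.

```python
import shutil
import tempfile
from pathlib import Path


def sort_into_trials(folder_path, trials, pattern="*_position.csv"):
    """Move each matching file into a subfolder named after the trial in its filename.

    Hypothetical helper; trials are matched against underscore-separated tokens.
    """
    folder = Path(folder_path)
    moved = {}
    # sorted() consumes the glob before any file is moved
    for path in sorted(folder.glob(pattern)):
        tokens = path.stem.split("_")
        for trial in trials:
            if trial in tokens:
                target_dir = folder / trial
                target_dir.mkdir(exist_ok=True)
                shutil.move(str(path), str(target_dir / path.name))
                moved[path.name] = trial
                break
    return moved


# Demo on a throwaway folder
demo_folder = Path(tempfile.mkdtemp())
for name in ["NOR_Hab_C1_A_position.csv", "NOR_TS_C1_A_position.csv", "notes.txt"]:
    (demo_folder / name).write_text("")

moved = sort_into_trials(demo_folder, trials=["Hab", "TS"])
print(moved)
```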


---
#### Our experiment folder now has one subfolder per trial, each containing CSV files with the mouse positions.
We can move on to the next notebook, 2-Geometric_analysis.ipynb.

---
RAINSTORM - Created on Aug 27, 2023 - @author: Santiago D'hers
