RAIN - Real & Artificial Intelligence for Neuroscience

## Prepare positions

Welcome!

This is the first official notebook of the Rainstorm project. Here you'll find the initial steps to prepare your data for analysis.

The position data obtained with pose estimation software (e.g. DeepLabCut or SLEAP) is usually stored in HDF files, with extension '.h5'.
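For reference, a single-animal DeepLabCut HDF file loads as a pandas DataFrame with three column levels (scorer, bodyparts and coords), where the coords are `x`, `y` and `likelihood`. SLEAP exports use a different layout. The sketch below builds a small table with the DeepLabCut layout in memory, using a made-up scorer name and random values, and flattens it into `bodypart_coord` column names. The flattening scheme is only an illustration, not necessarily what Rainstorm does internally.

```python
import numpy as np
import pandas as pd

# Hypothetical scorer and bodyparts, mimicking DeepLabCut's column layout
scorer = "DLC_Resnet50_example"
bodyparts = ["nose", "head"]
columns = pd.MultiIndex.from_product(
    [[scorer], bodyparts, ["x", "y", "likelihood"]],
    names=["scorer", "bodyparts", "coords"],
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, len(columns))), columns=columns)

# With a real file you would instead call: df = pd.read_hdf("my_video_positions.h5")
# (reading HDF5 with pandas requires the optional PyTables package, 'tables')

# Drop the scorer level and flatten the remaining levels into 'bodypart_coord' names
df_flat = df.droplevel("scorer", axis=1)
df_flat.columns = [f"{bp}_{coord}" for bp, coord in df_flat.columns]
print(df_flat.columns.tolist())
# ['nose_x', 'nose_y', 'nose_likelihood', 'head_x', 'head_y', 'head_likelihood']
```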

This notebook will:

- Read HDF files of rodent tracking data.
- Filter out low-likelihood positions, then interpolate and smooth the data.
- Prepare the position files to be analyzed.

#### Requirements:
- A folder with:
    - HDF files containing:
        - The position of the mouse bodyparts on the video.
        - The position of the **exploration targets** (Optional, since they can be added from the ROIs.json).
    - A JSON file containing the ROIs of the exploration targets (Optional, since they can be added from the HDF files).

If you don't have your own position files at hand, don't worry! You can demo the pipeline with the example data provided in the Rainstorm repository. It contains:
- A **Novel Object Recognition** (NOR) task, with positions of each **mouse bodypart** analyzed using **DLC**. Locations of the **exploration targets** are added using points selected with the `0-Video_handling` notebook.

---
#### Load the necessary modules

In [1]:
import os
import rainstorm.prepare_positions as rst

---
#### 1. State your project path
`base` : The path to the downloaded repository. If you are using a Windows path with backslashes, place an 'r' in front of the string (e.g. r'C:\Users\dhers\Rainstorm') so that Python treats it as a raw string and does not interpret the backslashes as escape sequences.

`folder_path` : The path to the folder containing the pose estimation files you want to use.

`ROIs_path` : The path to the file with the Regions of Interest (ROIs). The ROIs.json file can be created using the `draw_rois` function in the `0-Video_handling` notebook.

In [2]:
base = r'C:\Users\dhers\Desktop\Rainstorm' # For the downloaded repository

# folder_path = os.path.join(base, r'docs\examples\PD')
folder_path = r'C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD'
ROIs_path = os.path.join(folder_path, 'ROIs.json') # For the ROIs.json file (optional)
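Why the raw-string prefix matters: in a normal string literal, a backslash starts an escape sequence, so `'\n'` becomes a newline, and a path such as `'C:\Users'` is even a `SyntaxError` because `\U` starts a Unicode escape. The sketch below uses a hypothetical folder name to compare the alternatives, and `ntpath` (Python's Windows path module, usable on any OS) to join paths the way `os.path.join` does on Windows.

```python
import ntpath  # Windows path rules, usable on any OS

raw = r'C:\Users\me\new_videos'           # raw string: backslashes kept as-is
doubled = 'C:\\Users\\me\\new_videos'     # same path, each backslash escaped
broken = 'C:\\Users\\me' + '\new_videos'  # '\n' here is read as a newline character

print(raw == doubled)                 # True
print('\n' in broken)                 # True
print(ntpath.join(raw, 'ROIs.json'))  # C:\Users\me\new_videos\ROIs.json
```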

---
#### 2. Rename the files to end in '_positions.h5'
To ease the analysis, we should start by editing the filenames. We are looking for the following:
- Position files must end with '_positions'.
- Pose estimation software appends its own identifier to each filename, so names usually end with something like '{Software_used + Network + name + date + snapshot}.h5'. That ending is what we replace.
- (Optional) If the files belong to different trials of an experiment, they should contain the name of the trial in the filename.

We can find an easy way to rename files below.

In [3]:
# Let's first back up the position files (in case things go south)
rst.backup_folder(folder_path)

INFO:rainstorm.prepare_positions:Backup created at 'C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD_backup'


'C:\\Users\\dhers\\OneDrive\\Doctorado\\Experimentos\\PD\\PD_backup'
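Judging by the log above, `rst.backup_folder` copies the folder to a sibling directory with a `_backup` suffix. If you want the same safety net outside Rainstorm, a minimal standard-library sketch could look like this; `backup_copy` is a hypothetical helper, not part of Rainstorm, and it runs on a throwaway temporary folder.

```python
import shutil
import tempfile
from pathlib import Path

def backup_copy(folder):
    """Hypothetical helper: copy `folder` to a sibling '<folder>_backup' directory."""
    src = Path(folder)
    dst = src.with_name(src.name + "_backup")
    shutil.copytree(src, dst)  # raises FileExistsError if the backup already exists
    return dst

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "PD"
    data.mkdir()
    (data / "mouse1.h5").touch()
    backup = backup_copy(data)
    backed_up = sorted(p.name for p in backup.iterdir())
    backup_name = backup.name

print(backup_name, backed_up)  # PD_backup ['mouse1.h5']
```

Failing loudly when a backup already exists avoids silently overwriting an earlier, possibly cleaner copy.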

In [5]:
# Change the filenames as needed
before = 'DLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5'  # use 'DLC_Resnet50_rainstormFeb17shuffle4_snapshot_200.h5' for the NOR example
after = '_positions.h5'

rst.rename_files(folder_path, before, after)

INFO:root:Renamed: C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R13_C6nDLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5 → C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R13_C6n_positions.h5
INFO:root:Renamed: C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R14_C6iDLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5 → C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R14_C6i_positions.h5
INFO:root:Renamed: C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R15_C6dDLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5 → C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R15_C6d_positions.h5
INFO:root:Renamed: C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R16_C6aDLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5 → C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R16_C6a_positions.h5
INFO:root:Renamed: C:\Users\

INFO:root:Renamed: C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R19_C7dDLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5 → C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R19_C7d_positions.h5
INFO:root:Renamed: C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R21_C8nDLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5 → C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R21_C8n_positions.h5
INFO:root:Renamed: C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R22_C8iDLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5 → C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R22_C8i_positions.h5
INFO:root:Renamed: C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R23_C8dDLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5 → C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\2024_05-PD_45-Hab-R23_C8d_positions.h5
INFO:root:Renamed: C:\Users\
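The log shows each filename losing the software suffix stored in `before` and gaining `after`. As an illustration of that idea only (not Rainstorm's actual `rename_files`), here is a sketch with a hypothetical `rename_suffix` helper working on a temporary folder; files that do not end with `before` are left alone.

```python
import tempfile
from pathlib import Path

def rename_suffix(folder, before, after):
    """Hypothetical helper: replace the `before` ending of each filename with `after`."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):  # list is built before any rename
        if path.is_file() and path.name.endswith(before):
            new_path = path.with_name(path.name[: -len(before)] + after)
            path.rename(new_path)
            renamed.append(new_path.name)
    return renamed

with tempfile.TemporaryDirectory() as tmp:
    suffix = "DLC_Resnet50_rainstormFeb17shuffle1_snapshot_200.h5"
    (Path(tmp) / f"2024_05-PD_45-Hab-R13_C6n{suffix}").touch()
    (Path(tmp) / "notes.txt").touch()  # unrelated file, should be skipped
    result = rename_suffix(tmp, suffix, "_positions.h5")

print(result)  # ['2024_05-PD_45-Hab-R13_C6n_positions.h5']
```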

---
#### 3. Create the params.yaml file

The params.yaml file is a configuration file that contains all the parameters needed to run the analysis. It will be located in the experiment folder.


In [6]:
params = rst.create_params(folder_path, ROIs_path)

INFO:rainstorm.prepare_positions:Parameters saved to C:\Users\dhers\OneDrive\Doctorado\Experimentos\PD\PD\params.yaml



#### Open the params.yaml file and modify the following parameters:

`path` : Path to the experiment folder containing the pose estimation files

`filenames` : Pose estimation filenames

`software` : Software used to generate the pose estimation files ('DLC' or 'SLEAP')

`fps` : Video frames per second

`bodyparts` : Tracked bodyparts

`targets` : Exploration targets

`prepare_positions` : Parameters for processing positions:
- confidence : How many standard deviations below the mean a point's likelihood can fall before that position is erased (it is similar to asking 'how good is your tracking?')
- median_filter : Number of frames to use for the median filter (it must be an odd number)

`geometric_analysis` : Parameters for geometric analysis:
- roi_data : Loaded from ROIs.json
  - frame_shape: Shape of the video frames ([width, height])
  - scale: Scale of the video in px/cm
  - areas: Defined ROIs (areas) in the frame
  - points: Key points within the frame
- distance : Maximum nose-target distance to consider exploration
- orientation: Set up orientation analysis
  - degree: Maximum head-target orientation angle to consider exploration (in degrees)
  - front: Ending bodypart of the orientation line
  - pivot: Starting bodypart of the orientation line
- freezing_threshold : Movement threshold to consider freezing, computed as the mean std of all body parts over 1 second

`automatic_analysis` : Parameters for automatic analysis:
- model_path : Path to the model file
- model_bodyparts : Bodyparts used to train the model
- rescaling : Whether to rescale the data
- reshaping : Whether to reshape the data (set to True for RNN)
- RNN_width : Defines the shape of the RNN
  - past : Number of past frames to include
  - future : Number of future frames to include
  - broad : Broaden the window by skipping some frames as we stray further from the present

`seize_labels` : Parameters for the analysis of the experiment results:
- groups : Experimental groups you want to compare
- trials : If your experiment has multiple trials, list the trial names here
- target_roles : Role/novelty of each target in the experiment
- label_type : Type of labels used to measure exploration (geolabels, autolabels, labels, etc.)
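The `geometric_analysis` parameters are used in the next notebook. As a rough illustration only, a frame could count as exploration when the `front` bodypart is within `distance` of a target and the angle between the pivot-to-front line and the pivot-to-target line is at most `degree`. The function name, the choice of measuring the target angle from the pivot, and the assumption that positions and `distance` share the same units are all hypothetical, not Rainstorm's code.

```python
import numpy as np

def exploring(front, pivot, target, max_distance, max_angle_deg):
    """Hypothetical per-frame exploration test from distance and orientation.

    front, pivot: arrays of shape (n_frames, 2), e.g. nose and head positions.
    target: array of shape (2,) for a stationary target.
    """
    front = np.asarray(front, dtype=float)
    pivot = np.asarray(pivot, dtype=float)
    target = np.asarray(target, dtype=float)

    # Distance from the front bodypart (e.g. the nose) to the target
    distance = np.linalg.norm(target - front, axis=1)

    # Angle between the head direction (pivot -> front) and the pivot -> target vector
    # (undefined if front and pivot coincide in a frame)
    head_vec = front - pivot
    target_vec = target - pivot
    cos = np.sum(head_vec * target_vec, axis=1) / (
        np.linalg.norm(head_vec, axis=1) * np.linalg.norm(target_vec, axis=1)
    )
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    return (distance <= max_distance) & (angle <= max_angle_deg)

# Three toy frames, target at (3, 0):
front_xy = [[1, 0], [1.5, 0], [-9, 0]]  # close & facing, close & facing away, far away
pivot_xy = [[0, 0], [2, 0], [-10, 0]]
result = exploring(front_xy, pivot_xy, [3, 0], max_distance=2.5, max_angle_deg=45)
print(result.tolist())  # [True, False, False]
```

The second frame shows why orientation matters: the nose is close enough, but the head points away from the target.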

---
#### 4. Open an example file and see what is inside

In [7]:
# Select an example file
example_path = rst.choose_example_h5(params, look_for='TS')  # Use 'look_for' to pick a specific file (e.g. 'TS_C1_A').

# Open the example file:
df_raw = rst.open_h5_file(params, example_path, print_data=True)

INFO:rainstorm.prepare_positions:Found 23 filtered file(s). Using: 2025_04-PD_45-TS-R04_C1a_positions.h5
INFO:rainstorm.prepare_positions:Positions obtained by: DLC_Resnet50_rainstormFeb17shuffle1_snapshot_200
INFO:rainstorm.prepare_positions:Tracked points: ['body', 'head', 'left_ear', 'left_hip', 'left_midside', 'left_shoulder', 'neck', 'nose', 'right_ear', 'right_hip', 'right_midside', 'right_shoulder', 'tail_base', 'tail_end', 'tail_mid']
INFO:rainstorm.prepare_positions:Total frames: 18000
INFO:rainstorm.prepare_positions:body	 mean_likelihood: 0.81	 std_dev: 0.11	 tolerance: 0.60
INFO:rainstorm.prepare_positions:head	 mean_likelihood: 0.77	 std_dev: 0.11	 tolerance: 0.55
INFO:rainstorm.prepare_positions:left_ear	 mean_likelihood: 0.75	 std_dev: 0.13	 tolerance: 0.50
INFO:rainstorm.prepare_positions:left_hip	 mean_likelihood: 0.74	 std_dev: 0.12	 tolerance: 0.51
INFO:rainstorm.prepare_positions:left_midside	 mean_likelihood: 0.75	 std_dev: 0.12	 tolerance: 0.52
INFO:rainstorm.prep

If the model is working properly, the mean likelihood of a visible point should be very close to 1.

However, some points have lower mean likelihoods and higher standard deviations, usually because they are harder to detect (e.g. the nose tends to disappear during grooming).

We therefore set a separate tolerance for each point and erase only the positions whose likelihood falls below it.
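The logged tolerances are consistent, up to rounding of the displayed values, with tolerance = mean - 2 × std_dev (e.g. body: 0.81 - 2 × 0.11 = 0.59 vs. the logged 0.60), which suggests `confidence` was 2 in this run. That rule is an inference from the log, not documented behaviour. Under that assumption, a minimal pandas sketch of the masking step, with hypothetical `bodypart_x`, `bodypart_y` and `bodypart_likelihood` column names, could be:

```python
import numpy as np
import pandas as pd

def mask_low_likelihood(df, bodyparts, confidence=2.0):
    """Sketch: blank x/y wherever likelihood < mean - confidence * std (assumed rule)."""
    out = df.copy()
    tolerances = {}
    for bp in bodyparts:
        likelihood = out[f"{bp}_likelihood"]
        tolerance = likelihood.mean() - confidence * likelihood.std()
        tolerances[bp] = tolerance
        # Erase only the coordinates; the likelihood column is kept for reference
        out.loc[likelihood < tolerance, [f"{bp}_x", f"{bp}_y"]] = np.nan
    return out, tolerances

# Toy data: the nose is tracked well except in the last frame
df = pd.DataFrame({
    "nose_x": np.arange(10, dtype=float),
    "nose_y": np.zeros(10),
    "nose_likelihood": [0.95] * 9 + [0.1],
})
masked, tol = mask_low_likelihood(df, ["nose"])
print(round(tol["nose"], 2))             # 0.33
print(masked["nose_x"].isna().tolist())  # only the last frame is True
```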

---
#### 5. Add the position of stationary exploration targets
As mentioned in the introduction, the positions of the exploration targets can either be tracked with the same software we use to track the animals, or added here.

If our pose estimation model doesn't track the exploration targets, we can add them to the DataFrame using the following `add_targets` function.

The `add_targets` function will add the points from `roi_data` in the params file **only** if they are also named in the `targets` list.

In [8]:
df_raw = rst.add_targets(params, df_raw, verbose=True)

INFO:rainstorm.prepare_positions:Added target columns for: obj_1
INFO:rainstorm.prepare_positions:Added target columns for: obj_2
INFO:rainstorm.prepare_positions:2 target(s) added to DataFrame.
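For a stationary target, adding it amounts to appending constant x/y columns. The sketch below mirrors the rule stated above (a point is added only if it is also named in `targets`), but the `points` dictionary format and the `add_static_targets` helper are hypothetical; the actual structure of `roi_data` may differ.

```python
import pandas as pd

def add_static_targets(df, points, targets):
    """Hypothetical: add constant x/y columns for each target that also has a ROI point."""
    out = df.copy()
    for name in targets:
        if name in points:  # only targets named in both places are added
            x, y = points[name]
            out[f"{name}_x"] = float(x)  # scalar is broadcast to every frame
            out[f"{name}_y"] = float(y)
    return out

df = pd.DataFrame({"nose_x": [10.0, 11.0], "nose_y": [20.0, 21.0]})
points = {"obj_1": (100, 150), "obj_2": (300, 150), "door": (0, 0)}  # hypothetical ROI points
with_targets = add_static_targets(df, points, targets=["obj_1", "obj_2"])
print(with_targets.columns.tolist())
# ['nose_x', 'nose_y', 'obj_1_x', 'obj_1_y', 'obj_2_x', 'obj_2_y']
```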


---
#### 6. Now that we have our file, let's test our processing parameters on it

In [9]:
# Create the smooth position data
df_smooth = rst.filter_and_smooth_df(params, df_raw)

# Plot the original vs the smooth positions
rst.plot_raw_vs_smooth(params, df_raw, df_smooth, bodypart='nose')
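Conceptually, this step fills the gaps left by the erased positions and then applies the median filter from `prepare_positions`. A minimal pandas sketch, assuming linear interpolation followed by a centred rolling median (the exact filters inside `filter_and_smooth_df` may differ), looks like this. The odd window keeps the filter centred on the current frame, so the smoothed trace is not shifted in time.

```python
import numpy as np
import pandas as pd

def interpolate_and_median(series, window):
    """Sketch: fill gaps linearly, then apply a centred rolling median (odd window)."""
    if window % 2 == 0:
        raise ValueError("median filter window must be an odd number of frames")
    filled = series.interpolate(method="linear", limit_direction="both")
    # min_periods=1 keeps the edges instead of returning NaN there
    return filled.rolling(window, center=True, min_periods=1).median()

nose_x = pd.Series([0.0, np.nan, 2.0, 3.0, 100.0, 5.0, 6.0])  # one gap and one outlier
smooth = interpolate_and_median(nose_x, window=3)
print(smooth.tolist())  # [0.5, 1.0, 2.0, 3.0, 5.0, 6.0, 5.5]
```

Note how the single-frame jump to 100 disappears, while the first and last values are pulled slightly by the shorter edge windows.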

---
#### 7. Batch process all position files
Now that we know what we are doing, we can apply all the previous steps to every file in our folder and store the results as CSV files (let's face it, they are less scary).

In [10]:
rst.process_position_files(params, targetless_trials=['Hab'])

INFO:rainstorm.prepare_positions:Processed 2024_05-PD_45-Hab-R13_C6n_positions.h5 → 2024_05-PD_45-Hab-R13_C6n_positions.csv: 30 cols, mouse enters at 1.60s
INFO:rainstorm.prepare_positions:Processed 2024_05-PD_45-Hab-R14_C6i_positions.h5 → 2024_05-PD_45-Hab-R14_C6i_positions.csv: 30 cols, mouse enters at 0.27s
INFO:rainstorm.prepare_positions:Processed 2024_05-PD_45-Hab-R15_C6d_positions.h5 → 2024_05-PD_45-Hab-R15_C6d_positions.csv: 30 cols, mouse enters at 3.70s
INFO:rainstorm.prepare_positions:Processed 2024_05-PD_45-Hab-R16_C6a_positions.h5 → 2024_05-PD_45-Hab-R16_C6a_positions.csv: 30 cols, mouse enters at 3.87s
INFO:rainstorm.prepare_positions:Processed 2024_05-PD_45-Hab-R17_C7n_positions.h5 → 2024_05-PD_45-Hab-R17_C7n_positions.csv: 30 cols, mouse enters at 2.10s
INFO:rainstorm.prepare_positions:Processed 2024_05-PD_45-Hab-R18_C7i_positions.h5 → 2024_05-PD_45-Hab-R18_C7i_positions.csv: 30 cols, mouse enters at 1.37s
INFO:rainstorm.prepare_positions:Processed 2024_05-PD_45-Hab-R19
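Two details of this log are worth noting. The 30 columns per habituation file are consistent with the 15 tracked points × (x, y), presumably because `targetless_trials=['Hab']` marks trials without exploration targets. The log also reports when the mouse enters the arena. How Rainstorm detects entry is not described here; as one hypothetical criterion, the sketch below takes the first frame in which every column is tracked and divides its index by the frame rate. At an assumed 30 fps, entry at frame 48 gives 1.60 s. `entry_time` is illustrative only.

```python
import numpy as np
import pandas as pd

def entry_time(df, fps):
    """Hypothetical: seconds until the first frame in which every column is tracked."""
    tracked = df.notna().all(axis=1).to_numpy()
    if not tracked.any():
        return None  # the animal never appears
    return int(np.argmax(tracked)) / fps  # argmax returns the index of the first True

# Toy recording at an assumed 30 fps: the arena is empty for the first 48 frames
frames = 100
df = pd.DataFrame({
    "nose_x": np.r_[np.full(48, np.nan), np.ones(frames - 48)],
    "nose_y": np.r_[np.full(48, np.nan), np.ones(frames - 48)],
})
print(entry_time(df, fps=30))  # 1.6
```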

---
#### 8. Finally, we can organize the files into subfolders corresponding to different trials of the experiment.

In [11]:
rst.filter_and_move_files(params)

INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R13_C6n_positions.csv → Hab/positions/
INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R14_C6i_positions.csv → Hab/positions/
INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R15_C6d_positions.csv → Hab/positions/
INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R16_C6a_positions.csv → Hab/positions/
INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R17_C7n_positions.csv → Hab/positions/
INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R18_C7i_positions.csv → Hab/positions/
INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R19_C7d_positions.csv → Hab/positions/
INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R21_C8n_positions.csv → Hab/positions/
INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R22_C8i_positions.csv → Hab/positions/
INFO:rainstorm.prepare_positions:Moved CSV 2024_05-PD_45-Hab-R23_C8d_positions.csv → Hab/positions/
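The log shows each CSV moving into a `<trial>/positions/` subfolder. As a sketch of that idea (not Rainstorm's `filter_and_move_files`), the hypothetical `move_by_trial` helper below matches trial names surrounded by hyphens, an assumption based on filenames like `...-Hab-R13_...`. The hyphens avoid accidental matches on short trial names that happen to appear elsewhere in a filename.

```python
import shutil
import tempfile
from pathlib import Path

def move_by_trial(folder, trials):
    """Hypothetical: move each CSV into <folder>/<trial>/positions/ based on its filename."""
    folder = Path(folder)
    moved = {}
    for csv in sorted(folder.glob("*_positions.csv")):
        for trial in trials:
            if f"-{trial}-" in csv.name:  # assumes names like '...-Hab-R13_...'
                dest = folder / trial / "positions"
                dest.mkdir(parents=True, exist_ok=True)
                shutil.move(str(csv), str(dest / csv.name))
                moved[csv.name] = f"{trial}/positions/"
                break
    return moved

with tempfile.TemporaryDirectory() as tmp:
    for name in ["2024_05-PD_45-Hab-R13_C6n_positions.csv",
                 "2025_04-PD_45-TS-R04_C1a_positions.csv"]:
        (Path(tmp) / name).touch()
    result = move_by_trial(tmp, trials=["Hab", "TS"])
    hab_files = sorted(p.name for p in (Path(tmp) / "Hab" / "positions").iterdir())

print(result)
```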


---
#### Our experiment folder now has one subfolder per trial, each containing CSV files with the mouse positions.
We can move on to the next notebook, `2b-Geometric_analysis.ipynb`.

---
RAINSTORM - Created on Aug 27, 2023 - @author: Santiago D'hers
