RAIN - Real & Artificial Intelligence for Neuroscience

## Prepare positions

Welcome!

This is the first official notebook of the Rainstorm project. Here you'll find the initial steps to prepare your data for analysis.

The position data obtained with pose estimation software is usually stored in HDF files (those ending in '.h5'), which will be our starting point.
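
To make that starting point concrete, here is a minimal sketch of the column layout DeepLabCut uses in its '.h5' output: a pandas DataFrame whose columns form a three-level MultiIndex (scorer, bodyparts, coords), holding `x`, `y` and `likelihood` for each bodypart. The values and the scorer name are synthetic placeholders, and the flattening step is only an illustration of how such a table can be turned into `nose_x`-style columns; it is not necessarily what Rainstorm does internally. SLEAP files are organized differently.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a DeepLabCut output table (values are made up).
scorer = "DLC_example_scorer"  # hypothetical scorer name
bodyparts = ["nose", "left_ear"]
columns = pd.MultiIndex.from_product(
    [[scorer], bodyparts, ["x", "y", "likelihood"]],
    names=["scorer", "bodyparts", "coords"],
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, len(columns))), columns=columns)

# Drop the scorer level and flatten to single-level names such as 'nose_x'.
flat = df.droplevel("scorer", axis=1)
flat.columns = [f"{bodypart}_{coord}" for bodypart, coord in flat.columns]

print(flat.columns.tolist())
```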

- This notebook will take HDF files of mouse tracking data (obtained for example with DeepLabCut or SLEAP) and prepare the position files to be analyzed.
- It filters out low-likelihood positions, then interpolates and smooths the data.
- The positions can also be scaled from pixels to centimeters for better generalization.

#### Requirements:
- A folder with HDF files containing:
    - The position of the mouse bodyparts on the video.
    - The position of the **exploration targets** (optional but recommended; if they are missing, they can be added manually later).

If you don't have your position files with you, don't worry! You can demo the pipeline by working on the example data provided in the downloaded repository. It contains:
- A **Novel Object Recognition** (NOR) task, with positions of the **mouse bodyparts and two objects**, analyzed using **DeepLabCut**.
- A **Social Preference** (SP) task, with positions of the **mouse bodyparts**, analyzed using **SLEAP**. Locations of the **exploration targets** are added using points selected with the Draw_ROIs notebook.


---
#### Load the necessary modules

In [1]:
import os
from glob import glob
import rainstorm.prepare_positions as rst

---
#### 1. State your project path
`base` : The path to the downloaded repository. If you are using a Windows path with backslashes, place an `r` in front of the path string to avoid an error (e.g. r'C:\Users\dhers\Rainstorm').

`folder_path` : The path to the folder containing the pose estimation files you want to use.

`ROIs_path` : The path to the json file that was generated using the Draw_ROIs.ipynb notebook.
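
The reason for the `r` prefix can be shown in a few lines. In a normal Python string a backslash starts an escape sequence, so `'\t'` becomes a tab, and `'C:\Users'` is even a syntax error because `\U` begins a Unicode escape. The sketch below uses only the standard library; the `\temp` strings are made up for illustration.

```python
from pathlib import PureWindowsPath

# A raw string keeps every backslash literally.
base = r'C:\Users\dhers\Rainstorm'

# Without the r prefix, each backslash must be doubled to get the same text.
escaped = 'C:\\Users\\dhers\\Rainstorm'
assert base == escaped

# In a normal string, sequences such as '\t' turn into control characters.
assert len('\temp') == 4    # tab + 'emp'
assert len(r'\temp') == 5   # backslash + 'temp'

# PureWindowsPath parses Windows paths on any operating system.
path = PureWindowsPath(base)
print(path.parts)
```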

In [2]:
# State your path:
base = r'C:\Users\dhers\Desktop\Rainstorm' # For the downloaded repository
folder_path = os.path.join(base, r'docs\examples\NOR_example') # For the folder containing the pose estimation files you want to use

# Override with your own data folder (comment out the next line to keep the NOR example)
folder_path = r'D:\sdhers\NOR full videos\3xTg_B2 (TORM)\2025_02-Y_Maze_12_m\modified\Y_Maze'
ROIs_path = os.path.join(folder_path, 'ROIs.json') # For the ROIs.json file (optional)

---
#### 2. Rename the files to end in '_position.h5'
To ease the analysis, we should start by editing the filenames. We are looking for the following:
- Position files must end with the word '_position'.
- Pose estimation software appends its own suffix, so filenames usually end with something like '{DLC or SLEAP}_{Network_used + name + date + snapshot}.h5'. That suffix is what we will replace.
- (Optional) If the files belong to different trials of an experiment, they should contain the name of the trial in the filename.

We can find an easy way to rename files below.

In [3]:
# Let's first make a copy of the example position files (so that we have a backup in case things go south)
rst.backup_folder(folder_path)

The folder 'D:\sdhers\NOR full videos\3xTg_B2 (TORM)\2025_02-Y_Maze_12_m\modified\Y_Maze_backup' already exists.


In [4]:
# Change the file name as needed

before = '_position_position.h5'
# e.g. 'DLC_resnet50_shuffle2_200000.h5' for the NOR example
# or 'DLC_Resnet50_rainstormFeb17shuffle4_snapshot_200.h5' for the SP example
after = '_position.h5'

rst.rename_files(folder_path, before, after)
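
For readers curious about what a suffix rename involves, the following is a minimal standard-library sketch. The function `rename_suffix` and the filenames are hypothetical and only illustrate the idea; this is not the implementation of `rst.rename_files`.

```python
import os
import tempfile

def rename_suffix(folder, before, after):
    """Replace a trailing `before` with `after` in every matching filename."""
    renamed = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(before):
            new_name = name[: len(name) - len(before)] + after
            os.rename(os.path.join(folder, name), os.path.join(folder, new_name))
            renamed.append(new_name)
    return renamed

# Demo on a throwaway folder with made-up filenames.
with tempfile.TemporaryDirectory() as folder:
    for name in ["mouse1_TSDLC_example.h5", "mouse2_TSDLC_example.h5", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()
    renamed = rename_suffix(folder, "DLC_example.h5", "_position.h5")
    remaining = sorted(os.listdir(folder))

print(renamed)
```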

#### 3.  Create the params.yaml file

The params.yaml file is a configuration file that contains all the parameters needed to run the analysis. It is located in the experiment folder and contains the following parameters:

`path` : Path to the experiment folder containing the pose estimation files.

`filenames` : List of the pose estimation filenames.

`software` : State the software used to generate the tracking files ('DLC' or 'SLEAP').

`bodyparts` : List the tracked bodyparts.

`targets` : List the exploratory targets.

`trials` : If your experiment has multiple trials, specify the trial names here.

`filtering & smoothing` : Parameters for processing positions:
- confidence : State how many std_dev away from the mean the points can be without being erased (it is similar to asking "how good is your tracking?").
- tolerance : If the mean likelihood is below this value, the whole point will be erased (because it is probably not there).
- median_filter : State how many frames to use for the median filter. It must be an odd number.

`scaling` : Measure the distance between two bodyparts to scale the video from pixels to cm.
- measured_points : State the two points that will be used to measure the distance between them.
- measured_dist : State the real distance between the measured points, in cm (e.g. the distance between the ears is 1.8 cm in my C57 mice).

`video_fps` : State the frames per second of the videos.

`roi_data` : Information about the ROIs. It is a dictionary with the following keys:
- frame_shape: Shape of the video frames.
- areas: Defined ROIs (areas) in the frame.
- points: Key points within the frame.

`geometric analysis` : Parameters for defining exploration and freezing behavior:
- distance : State the maximum nose-object distance to consider exploration.
- angle : State the maximum head-object orientation angle to consider exploration.
- freezing_threshold : State the movement threshold for freezing, computed as mean std of all body parts over 1 second.
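
The `distance` and `angle` criteria can be pictured with a small geometric sketch. It assumes the orientation angle is measured between the head-to-nose vector and the head-to-target vector, which is one reasonable reading of the description above and not necessarily the exact definition Rainstorm uses. The function name, coordinates and thresholds are all illustrative.

```python
import numpy as np

def is_exploring(nose, head, target, max_dist, max_angle):
    """Return True when the nose is near the target and the head points at it."""
    nose, head, target = (np.asarray(p, dtype=float) for p in (nose, head, target))
    distance = np.linalg.norm(target - nose)
    heading = nose - head        # direction the head is pointing
    to_target = target - head    # direction from the head to the target
    cos_angle = np.dot(heading, to_target) / (
        np.linalg.norm(heading) * np.linalg.norm(to_target)
    )
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(distance <= max_dist and angle <= max_angle)

# Illustrative thresholds: 2.5 (same units as the positions) and 45 degrees.
facing_and_close = is_exploring((1, 0), (0, 0), (3, 0), max_dist=2.5, max_angle=45)
close_but_looking_away = is_exploring((1, 0), (0, 0), (1, 2), max_dist=2.5, max_angle=45)
print(facing_and_close, close_but_looking_away)  # True False
```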

In [5]:
# Create the YAML file
params = rst.create_params(folder_path, ROIs_path)

params.yaml already exists in D:\sdhers\NOR full videos\3xTg_B2 (TORM)\2025_02-Y_Maze_12_m\modified\Y_Maze. Skipping creation.


---
#### 3. Open an example file and see what is inside

In [6]:
# Choose a random example file to plot:
all_h5_files = glob(os.path.join(folder_path,"*position.h5"))
example_path = rst.choose_example(all_h5_files, look_for = 'TS') # You can use the 'look_for' variable to specify the file you want to use (e.g. 'TS_C1_A').

No files found with the specified word
Plotting coordinates from 2025_03-Y_Maze-R17_C6d_position.h5


In [7]:
# Open the example file:
df_raw = rst.open_h5_file(params, example_path, print_data=True)

Positions obtained by: DLC_Resnet50_rainstormFeb17shuffle4_snapshot_200
Points in df: ['body', 'head', 'left_ear', 'left_hip', 'left_midside', 'left_shoulder', 'neck', 'nose', 'right_ear', 'right_hip', 'right_midside', 'right_shoulder', 'tail_base', 'tail_end', 'tail_mid']
Frame count: 13975
body 	 median: 0.95 	 mean: 0.93 	 std_dev: 0.07 	 tolerance: 0.80
head 	 median: 0.87 	 mean: 0.84 	 std_dev: 0.12 	 tolerance: 0.60
left_ear 	 median: 0.84 	 mean: 0.80 	 std_dev: 0.12 	 tolerance: 0.57
left_hip 	 median: 0.90 	 mean: 0.88 	 std_dev: 0.10 	 tolerance: 0.67
left_midside 	 median: 0.89 	 mean: 0.87 	 std_dev: 0.11 	 tolerance: 0.64
left_shoulder 	 median: 0.85 	 mean: 0.82 	 std_dev: 0.11 	 tolerance: 0.60
neck 	 median: 0.85 	 mean: 0.83 	 std_dev: 0.10 	 tolerance: 0.64
nose 	 median: 0.81 	 mean: 0.78 	 std_dev: 0.12 	 tolerance: 0.54
right_ear 	 median: 0.85 	 mean: 0.83 	 std_dev: 0.10 	 tolerance: 0.63
right_hip 	 median: 0.87 	 mean: 0.86 	 std_dev: 0.10 	 tolerance: 0.66
ri

Notice that, if the model is working properly, the mean likelihood of an existing point is very close to 1. However, some points have lower mean likelihoods and higher standard deviations. This is because those points are harder to find (e.g. the nose tends to disappear during grooming). We will adjust our tolerance for each point, and erase only the positions that are below it.
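
The 'tolerance' values printed above are consistent with a cutoff of mean minus two standard deviations (for 'head': 0.84 - 2 × 0.12 = 0.60). The sketch below applies that rule, with the caveat that the factor of 2 is inferred from the printed numbers and that the function names and likelihood values are made up for illustration.

```python
import numpy as np

def likelihood_threshold(likelihood, confidence=2.0):
    """Per-point cutoff: mean likelihood minus `confidence` standard deviations."""
    likelihood = np.asarray(likelihood, dtype=float)
    return likelihood.mean() - confidence * likelihood.std()

def mask_low_likelihood(x, y, likelihood, confidence=2.0):
    """Replace positions whose likelihood is below the cutoff with NaN."""
    likelihood = np.asarray(likelihood, dtype=float)
    keep = likelihood >= likelihood_threshold(likelihood, confidence)
    x = np.where(keep, np.asarray(x, dtype=float), np.nan)
    y = np.where(keep, np.asarray(y, dtype=float), np.nan)
    return x, y

# Made-up track in which frame 3 was detected with very low likelihood.
likelihood = [0.90, 0.95, 0.90, 0.10, 0.92, 0.93]
x, y = mask_low_likelihood(range(6), range(6), likelihood)

# Arithmetic for 'head' from the printed summary (assumes confidence = 2).
head_cutoff = round(0.84 - 2 * 0.12, 2)
print(head_cutoff)  # 0.6
```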

---
#### 4. Add the position of stationary exploration targets to the DataFrame
As mentioned in the introduction, the exploration targets may or may not be tracked by the same software we use to track our animals.

If our pose estimation model does not track the exploration targets, we can add them to the DataFrame using the `add_targets` function.

The `add_targets` function uses the points selected with the Draw_ROIs tool. All we need to do is add the ROIs and the point names to the `targets` list in the params.yaml file.

In [8]:
df_raw = rst.add_targets(params, df_raw)
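
Conceptually, adding a stationary target means appending columns that hold the same coordinates on every frame. The sketch below shows that idea with a hypothetical helper, `add_static_points`, and made-up target names and coordinates. The real `add_targets` reads its points from the params file and may add other columns as well.

```python
import pandas as pd

def add_static_points(df, points):
    """Append constant '<name>_x' and '<name>_y' columns for each fixed point."""
    out = df.copy()
    for name, (x, y) in points.items():
        out[f"{name}_x"] = float(x)
        out[f"{name}_y"] = float(y)
    return out

# Made-up tracking data and target coordinates, in pixels.
df = pd.DataFrame({"nose_x": [10.0, 11.0, 12.0], "nose_y": [5.0, 5.5, 6.0]})
targets = {"obj_1": (100, 50), "obj_2": (300, 50)}  # hypothetical names

df_with_targets = add_static_points(df, targets)
print(df_with_targets.columns.tolist())
```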

---
#### 5. Now that we have our file, let's test our processing parameters in an example video

In [None]:
df_smooth = rst.filter_and_smooth_df(params, df_raw)

rst.plot_raw_vs_smooth(params, df_raw, df_smooth, bodypart='nose')
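
To give a feel for what interpolation and median filtering do to a trace, here is a minimal NumPy sketch on a made-up signal with one missing frame and one tracking jump. It is an illustration of the two techniques under simple assumptions (linear interpolation, edge padding), not the code behind `filter_and_smooth_df`. It also shows why the median filter window must be odd: the window is centered on the current frame.

```python
import numpy as np

def interpolate_nans(values):
    """Fill NaN gaps by linear interpolation between valid neighbours."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(len(values))
    valid = ~np.isnan(values)
    return np.interp(idx, idx[valid], values[valid])

def median_filter(values, window=3):
    """Sliding median with an odd window; edges are padded by repetition."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = np.pad(np.asarray(values, dtype=float), half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)

raw = [1.0, 2.0, np.nan, 4.0, 50.0, 6.0, 7.0]  # NaN = erased frame, 50 = jump
filled = interpolate_nans(raw)
smooth = median_filter(filled, window=3)

print(filled.tolist())  # [1.0, 2.0, 3.0, 4.0, 50.0, 6.0, 7.0]
print(smooth.tolist())  # [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 7.0]
```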

---
#### 6. Measure ROI activity.


In [None]:
import pandas as pd
import yaml

def load_yaml(params_path: str) -> dict:
    """Loads a YAML file."""
    with open(params_path, "r") as file:
        return yaml.safe_load(file)

def is_inside_area(x, y, area):
    """Return True if (x, y) falls inside a rectangular area.

    `area` is a dict with 'center' (cx, cy), 'width' and 'height', in pixels.
    """
    cx, cy = area['center']
    half_w, half_h = area['width'] / 2, area['height'] / 2  # true division keeps odd sizes exact
    return (cx - half_w <= x <= cx + half_w) and (cy - half_h <= y <= cy + half_h)

def assign_areas(params_path, df):
    """Label every frame of each bodypart with the ROI that contains it."""
    params = load_yaml(params_path)
    bodyparts = params.get("bodyparts", [])
    roi_data = params.get("roi_data", {})
    areas = roi_data.get("areas", [])

    ROI_activity = pd.DataFrame(index=df.index)

    for bodypart in bodyparts:
        x_col, y_col = f'{bodypart}_x', f'{bodypart}_y'
        labels = []
        for _, row in df.iterrows():
            x, y = row[x_col], row[y_col]
            # First matching area wins; NaN positions fall through to 'outside'
            area_name = next((area['name'] for area in areas if is_inside_area(x, y, area)), 'outside')
            labels.append(area_name)
        ROI_activity[bodypart] = labels

    return ROI_activity
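
As a possible usage example, the sketch below labels a few made-up nose positions with a vectorized version of the same rectangle test and converts frame counts into seconds. The areas, coordinates and frame rate are hypothetical, and `label_areas` is an illustrative alternative to the row-by-row loop, which can be slow on long recordings.

```python
import numpy as np

def label_areas(x, y, areas):
    """Label each (x, y) sample with the first area that contains it."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.full(x.shape, "outside", dtype=object)
    # Iterate in reverse so that earlier areas take priority on overlaps.
    for area in reversed(areas):
        cx, cy = area["center"]
        half_w, half_h = area["width"] / 2, area["height"] / 2
        inside = (np.abs(x - cx) <= half_w) & (np.abs(y - cy) <= half_h)
        labels[inside] = area["name"]
    return labels

areas = [  # made-up rectangles, in pixels
    {"name": "left", "center": (50, 50), "width": 100, "height": 100},
    {"name": "right", "center": (150, 50), "width": 100, "height": 100},
]
nose_x = [10, 60, 120, 400, np.nan]  # NaN = frame with no valid position
nose_y = [10, 40, 90, 50, 20]

labels = label_areas(nose_x, nose_y, areas)

video_fps = 30  # hypothetical frame rate
names, counts = np.unique(labels.astype(str), return_counts=True)
seconds_per_area = dict(zip(names.tolist(), (counts / video_fps).tolist()))

print(labels.tolist())  # ['left', 'left', 'right', 'outside', 'outside']
```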

---
#### (Optional, but recommended) If we want to scale our data from pixels to cm, we can use a measured distance between two points.
How do we decide which points to use?
- Best case scenario, we use the same two points that we selected when we aligned the videos (e.g. two corners of the arena).
- If we didn't align the videos, we can use two points that are static throughout the experiment (e.g. two objects used as exploration targets).
- If we don't have two points that are static throughout the experiment, we can use two bodyparts that keep a constant distance from each other (e.g. the distance between both ears).

In [None]:
scale = rst.find_scale_factor(params, df_smooth, print_results=True, plot_results=True)

median distance is 240.95850265139018, mean distance is 240.95850265139018. Scale factor is 0.1868 (1 cm = 5.35 px).
Distance between points is a single value. Skipping plot.
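
The printed numbers can be checked by hand: the scale factor is the real distance in cm divided by the median distance in pixels. A real distance of 45 cm reproduces the output above, but that value is inferred from the printed scale factor and is not stated in the notebook. The sketch below is a minimal illustration with a hypothetical `scale_factor` helper, not the code of `find_scale_factor`.

```python
import numpy as np

def scale_factor(point_a, point_b, measured_dist_cm):
    """Return cm per pixel from two tracked points and their real distance."""
    point_a = np.asarray(point_a, dtype=float)
    point_b = np.asarray(point_b, dtype=float)
    # Per-frame Euclidean distance in pixels; the median resists outliers.
    dist_px = np.linalg.norm(point_a - point_b, axis=-1)
    return measured_dist_cm / np.nanmedian(dist_px)

# Arithmetic check against the printed output (45 cm is an inferred value).
scale = 45 / 240.95850265139018
print(round(scale, 4), round(1 / scale, 2))  # 0.1868 5.35

# Made-up example: two frames where the points are 5 px and 10 px apart.
demo = scale_factor([[0, 0], [0, 0]], [[3, 4], [6, 8]], measured_dist_cm=1.5)
print(demo)  # 0.2
```

Multiplying a position in pixels by the scale factor gives the position in cm.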


---
#### 6. Now that we know what we are doing, we can apply all previous steps to all the files in our folder and store the results in csv files (let's face it, they are less scary).

In [12]:
# Process every file in the folder
rst.process_position_files(params, scale = True)

2025_03-Y_Maze-R01_C1i_position.h5 has 30 columns. Mouse entered after 2.07 sec. Scale factor: 0.1868 (1 cm = 5.35 px). 
2025_03-Y_Maze-R02_C1d_position.h5 has 30 columns. Mouse entered after 2.30 sec. Scale factor: 0.1868 (1 cm = 5.35 px). 
2025_03-Y_Maze-R03_C1a_position.h5 has 30 columns. Mouse entered after 1.57 sec. Scale factor: 0.1868 (1 cm = 5.35 px). 
2025_03-Y_Maze-R04_C2i_position.h5 has 30 columns. Mouse entered after 1.50 sec. Scale factor: 0.1868 (1 cm = 5.35 px). 
2025_03-Y_Maze-R05_C2d_position.h5 has 30 columns. Mouse entered after 2.13 sec. Scale factor: 0.1868 (1 cm = 5.35 px). 
2025_03-Y_Maze-R06_C2a_position.h5 has 30 columns. Mouse entered after 1.77 sec. Scale factor: 0.1868 (1 cm = 5.35 px). 
2025_03-Y_Maze-R07_C3i_position.h5 has 30 columns. Mouse entered after 1.47 sec. Scale factor: 0.1868 (1 cm = 5.35 px). 
2025_03-Y_Maze-R08_C3d_position.h5 has 30 columns. Mouse entered after 0.90 sec. Scale factor: 0.1868 (1 cm = 5.35 px). 
2025_03-Y_Maze-R09_C3n_position.

---
#### 7. Finally, we can organize the files into subfolders corresponding to different trials of the experiment.

In [13]:
# Clean the folder
rst.filter_and_move_files(params)

Files filtered and moved successfully.
All .H5 files are stored away
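
Sorting files by trial boils down to matching the trial name in each filename and moving the file into a subfolder of that name. The sketch below shows the idea with the standard library. The function `sort_by_trial`, the trial names and the filenames are hypothetical, and `rst.filter_and_move_files` may handle more cases (such as storing away the original .h5 files).

```python
import os
import shutil
import tempfile

def sort_by_trial(folder, trials, extension=".csv"):
    """Move each matching file into a subfolder named after its trial."""
    moved = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(extension):
            continue
        for trial in trials:
            if trial in name:
                trial_dir = os.path.join(folder, trial)
                os.makedirs(trial_dir, exist_ok=True)
                shutil.move(os.path.join(folder, name), os.path.join(trial_dir, name))
                moved.setdefault(trial, []).append(name)
                break  # a file belongs to the first trial that matches
    return moved

# Demo on a throwaway folder with made-up filenames.
with tempfile.TemporaryDirectory() as folder:
    for name in ["m1_TR_position.csv", "m1_TS_position.csv", "m2_TS_position.csv"]:
        open(os.path.join(folder, name), "w").close()
    moved = sort_by_trial(folder, trials=["TR", "TS"])
    subfolders = sorted(os.listdir(folder))

print(moved)
```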


---
---
#### Our experiment folder now has one subfolder per trial, each containing csv files with the mouse positions.
We can move on to the next notebook, 2-Geometric_analysis.ipynb

---
RAINSTORM - Created on Aug 27, 2023 - @author: Santiago D'hers
