RAIN - Real & Artificial Intelligence for Neuroscience

## Prepare positions

Welcome!

Here you'll find the initial steps to prepare your data for behavioral analysis.

The position data obtained with pose estimation software (e.g., DeepLabCut or SLEAP) is usually stored in HDF files, with extension '.h5'.
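
To give an idea of what such a file holds, here is a minimal sketch of the table layout that DeepLabCut typically writes: a pandas DataFrame with one row per frame and three column levels (scorer, bodyparts, coords). The scorer name and values below are made up for illustration, and the table is built in memory instead of being read from disk. SLEAP files are organized differently and are not covered by this sketch.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for a DeepLabCut table. A real file would be loaded with
# pd.read_hdf(path), which needs the optional PyTables dependency.
scorer = "DLC_example_scorer"  # hypothetical network name
bodyparts = ["nose", "head"]
columns = pd.MultiIndex.from_product(
    [[scorer], bodyparts, ["x", "y", "likelihood"]],
    names=["scorer", "bodyparts", "coords"],
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, len(columns))), columns=columns)

# Drop the scorer level and flatten the rest into names such as 'nose_x'
flat = df.droplevel("scorer", axis=1)
flat.columns = [f"{bodypart}_{coord}" for bodypart, coord in flat.columns]
print(flat.columns.tolist())
```

Each bodypart contributes an x, a y and a likelihood column, which is the information the rest of this notebook filters and smooths.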

This notebook will:

- Read HDF files of rodent tracking data.
- Filter out low-likelihood positions, then interpolate and smooth the data.
- Prepare the position files to be analyzed.

#### Requirements:
A folder with:
- HDF files containing the position of the mouse **bodyparts** and the **exploration targets** on the video.

Or:
- HDF files containing the position of the mouse **bodyparts**.
- A separate JSON file containing the ROIs of the exploration targets (see [0-Video_handling](0-Video_handling.ipynb)).

If you don't have your position files with you, don't worry! You can demo the pipeline by working on the example data provided in the Rainstorm repository. It contains:
- A **Novel Object Recognition** (NOR) task, with the positions of the **mouse bodyparts** tracked using **DLC**. The locations of the **exploration targets** are added using points selected with the Draw ROIs tool in the [0-Video_handling](0-Video_handling.ipynb) notebook.

---
#### Load the necessary modules

In [None]:
from pathlib import Path
import rainstorm.prepare_positions as rst

---
#### 1. State your project path
`base` : The path to the downloaded repository. If you are using a Windows path with backslashes, place an `r` in front of the path string so that the backslashes are not read as escape sequences (e.g. r'C:\Users\dhers\Rainstorm').

`folder_path` : The path to the folder containing the pose estimation files you want to use.

`ROIs_path` : The path to the file with the Regions of Interest (ROIs). The ROIs.json file can be created using the `draw_rois` function in the [0-Video_handling](0-Video_handling.ipynb) notebook.
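
As a small illustration of why the `r` prefix matters, the sketch below compares a plain string with a raw string. The path `C:\new\table` is a made-up example chosen because `\n` and `\t` are valid escape sequences. A plain string starting with `C:\Users` would not even compile, because `\U` begins a Unicode escape.

```python
from pathlib import PureWindowsPath

# Without the r prefix, '\n' and '\t' are read as a newline and a tab
plain = 'C:\new\table'
# With the r prefix, every backslash is kept as a path separator
raw = r'C:\new\table'
print(len(plain), len(raw))

# PureWindowsPath parses Windows-style paths on any operating system
path = PureWindowsPath(raw)
print(path.parts)

# Forward slashes avoid the escaping problem altogether
print(path == PureWindowsPath('C:/new/table'))
```

Writing the path with forward slashes, or building it with `Path` and the `/` operator as in the cell below, sidesteps the issue on every platform.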

In [None]:
# Define your base path (e.g., to the Rainstorm repository)
base = Path.cwd() # On my end, this is equivalent to Path(r'C:\Users\dhers\Desktop\Rainstorm') 

# Define the path to your experiment folder containing the pose estimation files
folder_path = base / 'examples' / 'NOR' # Demo data; replace with the path to your own experiment folder

# Define the path to your ROIs.json file (optional)
ROIs_path = folder_path / 'ROIs.json' 

---
#### 2. Rename the files to end in '_positions.h5'
Because each pose estimation tool names its output differently, filenames usually end with something like '{Software_used + Network + name + date + snapshot}.h5'

We start by editing the filenames. We are looking for the following:
- Position files **must** end with the suffix '_positions'.
- (Optional) If the files belong to different trials of an experiment, they should contain the name of the trial in the filename.

The cells below offer an easy way to rename the files.

In [None]:
# Let's first make a copy of the example position_files (so that we have a backup in case things go south)
rst.backup_folder(folder_path, overwrite=False) # Set overwrite=True if you want to replace an existing backup

In [None]:
# Change the filenames as needed
before = 'DLC_Resnet50_rainstormFeb17shuffle4_snapshot_200.h5' # Ending of the filenames in the NOR example
after = '_positions.h5'

rst.rename_files(folder_path, before, after)
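
For readers curious about what such a renaming step involves, here is a minimal sketch using only the standard library. `rename_suffix` is a hypothetical helper written for this illustration, not the implementation of `rst.rename_files`, and the filenames are made up. It assumes that `before` is matched only at the end of each filename.

```python
from pathlib import Path
import tempfile

def rename_suffix(folder: Path, before: str, after: str) -> list[str]:
    """Hypothetical helper: replace a trailing `before` with `after` in each filename."""
    if not before:
        raise ValueError("`before` must not be empty")
    renamed = []
    for file in sorted(folder.iterdir()):
        if file.is_file() and file.name.endswith(before):
            new_name = file.name[: -len(before)] + after
            file.rename(file.with_name(new_name))
            renamed.append(new_name)
    return renamed

# Try it in a temporary folder so that no real data is touched
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / 'NOR_TS_01DLC_example.h5').touch()  # made-up filename
    (folder / 'notes.txt').touch()                # left alone: no matching ending
    result = rename_suffix(folder, 'DLC_example.h5', '_positions.h5')
    remaining = sorted(p.name for p in folder.iterdir())

print(result)
print(remaining)
```

Files that do not end with `before` are skipped, so running the step twice does no harm.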

---
#### 3.  Create the params.yaml file

The params.yaml file is a configuration file that contains all the parameters needed to run the analysis. It will be located in the experiment folder.


In [None]:
params = rst.create_params(folder_path, ROIs_path, targets_present=True)


#### Open the params.yaml file and modify the following parameters:

`path` : Path to the experiment folder containing the pose estimation files

`filenames` : Pose estimation filenames

`software` : Software used to generate the pose estimation files ('DLC' or 'SLEAP')

`fps` : Video frames per second

`bodyparts` : Tracked bodyparts

`targets` : Exploration targets

`prepare_positions` : Parameters for processing positions:
- confidence : How many standard deviations below the mean a point's likelihood can fall before the position is erased (it is similar to asking 'how good is your tracking?')
- median_filter : Number of frames to use for the median filter (it must be an odd number)

`geometric_analysis` : Parameters for geometric analysis:
- roi_data : Loaded from ROIs.json
  - frame_shape: Shape of the video frames ([width, height])
  - scale: Scale of the video in px/cm
  - areas: Defined ROIs (areas) in the frame
  - points: Key points within the frame
  - circles: Circular areas in the frame
- target_exploration:
  - distance : Maximum nose-target distance to consider exploration
  - orientation: Set up orientation analysis
    - degree: Maximum head-target orientation angle to consider exploration (in degrees)
    - front: Ending bodypart of the orientation line
    - pivot: Starting bodypart of the orientation line
- freezing_threshold : Movement threshold to consider freezing, computed as the mean std of all body parts over 1 second
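
The `target_exploration` parameters above can be read geometrically. The sketch below computes a nose-target distance and a head-target orientation angle for one frame, assuming `front` is the nose and `pivot` is the head. The function name, coordinates and thresholds are hypothetical, and the actual calculation in Rainstorm may differ.

```python
import numpy as np

def distance_and_angle(nose, head, target):
    """Return the nose-target distance and the head-target orientation angle in degrees."""
    nose, head, target = (np.asarray(p, dtype=float) for p in (nose, head, target))
    distance = np.linalg.norm(target - nose)
    facing = nose - head        # orientation line: pivot -> front
    to_target = target - head   # line from the pivot to the target
    cos_angle = np.dot(facing, to_target) / (
        np.linalg.norm(facing) * np.linalg.norm(to_target)
    )
    # Clip to guard against rounding errors just outside [-1, 1]
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return distance, angle

# Made-up coordinates, in the same units as the tracking data
dist, angle = distance_and_angle(nose=(3, 0), head=(0, 0), target=(3, 4))

max_distance, max_degree = 5.0, 60.0  # hypothetical thresholds
exploring = dist < max_distance and angle < max_degree
print(dist, round(angle, 2), exploring)
```

A frame counts as exploration in this sketch only when both conditions hold: the nose is close enough and the head points roughly toward the target.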

`automatic_analysis` : Parameters for automatic analysis:
- model_path : Path to the model file
- model_bodyparts : Bodyparts used to train the model
- rescaling : Whether to rescale the data
- reshaping : Whether to reshape the data (set to True for RNN)
- RNN_width : Defines the shape of the RNN
  - past : Number of past frames to include
  - future : Number of future frames to include
  - broad : Broaden the window by skipping some frames as we stray further from the present

`seize_labels` : Parameters for the analysis of the experiment results:
- trials : If your experiment has multiple trials, list the trial names here
- target_roles : Role/novelty of each target in the experiment
- label_type : Type of labels used to measure exploration (geolabels, autolabels, labels, etc)

---
#### 4. Open an example file and see what is inside

In [None]:
example_file_path = rst.choose_example_positions(params, look_for='TS_02', suffix='_positions.h5')
df_raw = rst.open_h5_file(params, example_file_path, print_data=True)

Notice that, if the model is working properly, the mean likelihood of an existing point is close to 1.

However, some points have lower mean likelihoods and higher standard deviations. This is because those points are harder to find (e.g. the nose tends to disappear during grooming).

We will adjust our tolerance for each point, and erase only the positions that are below it.
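
One way to picture this tolerance is as a per-bodypart threshold placed `confidence` standard deviations below the mean likelihood. The sketch below assumes that rule; `mask_low_likelihood` is a hypothetical helper and the numbers are made up, so treat it as an illustration and not as the code inside `rst.filter_and_smooth_df`.

```python
import pandas as pd

def mask_low_likelihood(x, likelihood, confidence):
    """Set positions to NaN where likelihood < mean - confidence * std (assumed rule)."""
    threshold = likelihood.mean() - confidence * likelihood.std()
    return x.where(likelihood >= threshold), threshold

# Made-up data: the third frame has a poor detection and an implausible position
likelihood = pd.Series([0.99, 0.98, 0.20, 0.97, 0.99])
x = pd.Series([10.0, 11.0, 250.0, 13.0, 14.0])

masked, threshold = mask_low_likelihood(x, likelihood, confidence=1)
print(round(threshold, 3))
print(masked.tolist())
```

Because the threshold depends on each point's own mean and spread, a bodypart that is harder to track gets a more lenient cutoff than one that is always detected with high likelihood.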

---
#### 5. Add the position of stationary exploration targets
As mentioned in the requirements, the position of the exploration targets can either be tracked using the same software we use to track our animals, or we can add them here.

If our pose estimation model doesn't track the exploration targets, we can add them to the DataFrame using the following `add_targets` function.

The `add_targets` function will add the points from `roi_data` in the params file **only** if they are also named in the `targets` list.

In [None]:
df_raw = rst.add_targets(params, df_raw, verbose=True)

---
#### 6. Now that we have our file, let's test our processing parameters on an example file

In [None]:
# Create the smooth position data
df_smooth = rst.filter_and_smooth_df(params, df_raw)

In [None]:
rst.plot_raw_vs_smooth(params, df_raw, df_smooth, bodypart='nose')
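
To make the smoothing step less of a black box, here is a minimal sketch of gap interpolation followed by a median filter, using pandas only. `interpolate_and_smooth` is a hypothetical helper and the data are made up. `rst.filter_and_smooth_df` may use a different interpolation method or additional filters.

```python
import numpy as np
import pandas as pd

def interpolate_and_smooth(x, median_filter=3):
    """Fill NaN gaps linearly, then apply a centered rolling median."""
    if median_filter % 2 == 0:
        raise ValueError("median_filter must be an odd number")
    filled = x.interpolate(method="linear", limit_direction="both")
    return filled.rolling(median_filter, center=True, min_periods=1).median()

# Made-up trace: one erased position (NaN) and one single-frame jump (40.0)
x = pd.Series([10.0, 11.0, np.nan, 13.0, 40.0, 15.0, 16.0])
smooth = interpolate_and_smooth(x, median_filter=3)
print(smooth.tolist())
```

An odd window keeps the filter centered on the current frame, which is why `median_filter` must be an odd number. The median discards the single-frame jump without blurring the rest of the trace.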

---
#### 7. Batch process all position files
Now that we know what we are doing, we can apply all previous steps to all the files in our folder and store the results in CSV files (let's face it, they are less scary).

In [None]:
rst.process_position_files(params, targetless_trials=['Hab'])

---
#### 8. Finally, we can organize the files into subfolders corresponding to different trials of the experiment.

In [None]:
rst.filter_and_move_files(params)
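
The idea behind this step can be sketched with the standard library: look for a trial name in each filename and move the file into a subfolder with that name. `move_by_trial` is a hypothetical helper, and the filenames and trial names are made up. It is not the implementation of `rst.filter_and_move_files`.

```python
from pathlib import Path
import shutil
import tempfile

def move_by_trial(folder: Path, trials: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: move each CSV into a subfolder named after its trial."""
    moved = {trial: [] for trial in trials}
    for file in sorted(folder.glob('*.csv')):
        for trial in trials:
            if trial in file.name:
                destination = folder / trial
                destination.mkdir(exist_ok=True)
                shutil.move(str(file), str(destination / file.name))
                moved[trial].append(file.name)
                break  # a file belongs to the first trial it matches
    return moved

# Try it in a temporary folder so that no real data is touched
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ('NOR_Hab_01_positions.csv',
                 'NOR_TS_01_positions.csv',
                 'NOR_TS_02_positions.csv'):
        (folder / name).touch()
    moved = move_by_trial(folder, trials=['Hab', 'TS'])
    layout = sorted(p.relative_to(folder).as_posix() for p in folder.rglob('*.csv'))

print(layout)
```

This is why step 2 recommends including the trial name in each filename: without it, there is nothing to sort the files by.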

---
---
#### Our experiment folder now has one subfolder per trial, each containing CSV files with the mouse positions.
We can move on to the next notebook, [2b-Geometric_analysis](2b-Geometric_analysis.ipynb)

---
RAINSTORM - Created on Aug 27, 2023 - @author: Santiago D'hers
