Technical Docs
This page provides detailed instructions on how to use the applications in the repository by explaining how they are currently being used in the context of the DevEx research project, specifically in relation to the first set of experiments. It also includes documentation for the modules and functions that make up libratools and DevExDashboard.
Before using the applications in this repository, a production-ready data collection pipeline should already be in place. As suggested in Getting Started, such a pipeline may involve generating MP4 recordings using motif and tracking each video recording using BioTracker (as DevEx does), which yields individual trajectories. These may be stored locally, in the cloud, or on a network-attached storage (NAS) device. It is also assumed that this repository has been cloned/installed locally (see the separate repo-install-instructions).
For illustration, the output of your data collection pipeline may look as follows:
├──loopbio_data/ # <-- root folder on a NAS
    ├──camSN/ # <-- short for camera serial number
        ├──dateOfRecording_startTimeofRecording.camSN/
            ├──000001.npz # <-- created by Motif
            ├──000001.mp4 # <-- created by Motif
            ├──000002.npz
            ├──000002.mp4
            ├──metadata.yaml # <-- created by Motif
            ├──camSN_dateOfRecording_chunkNumber.csv # <-- generated by BioTracker (e.g.: 23520258_20210408_02.csv)
            ├──camSN_dateOfRecording_processed.csv # <-- generated using libratools (e.g.: 23520258_20210408_processed.csv)
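The CSV naming scheme in the layout above can be illustrated with a small parser. This is a minimal sketch that infers the pattern from the two example file names; `ChunkName` and `parse_csv_name` are hypothetical helpers and are not part of libratools or process.py.

```python
import re
from typing import NamedTuple, Optional


class ChunkName(NamedTuple):
    camera_sn: str         # camera serial number, e.g. "23520258"
    date: str              # recording date as YYYYMMDD
    chunk: Optional[int]   # chunk number, or None for the merged "processed" file


# Pattern inferred from the example names above (an assumption, not a documented spec).
_NAME_RE = re.compile(r"^(?P<sn>\d+)_(?P<date>\d{8})_(?P<suffix>\d+|processed)\.csv$")


def parse_csv_name(filename: str) -> ChunkName:
    """Split a CSV file name into camera serial number, date, and chunk number."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    suffix = match["suffix"]
    chunk = None if suffix == "processed" else int(suffix)
    return ChunkName(match["sn"], match["date"], chunk)
```

For example, `parse_csv_name("23520258_20210408_02.csv")` yields the serial number `"23520258"`, the date `"20210408"`, and chunk number `2`.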
After data collection, a script can be created that relies on the functions provided by libratools to conduct pre-processing and some automated post-processing tasks. In the case of the first DevEx experiment, this script is process.py, which is stored in the Processing/ folder. The script is configured through the config.ini file in the same folder; this file holds paths, default values for free parameters, and other global variables. The most important variable is DATA_DIR, which tells the script where the data is stored (discussed further in Choosing Default Parameter Settings).
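As a rough illustration of how a script might read DATA_DIR from an INI file, the sketch below uses Python's standard configparser module. The section name `PATHS` and the example path are assumptions made for this example; check config.ini in the Processing/ folder for the actual layout.

```python
import configparser
from pathlib import Path

# Hypothetical config contents: the section name and path are illustrative only.
EXAMPLE_INI = """
[PATHS]
DATA_DIR = /mnt/nas/loopbio_data
"""


def read_data_dir(ini_text: str, section: str = "PATHS") -> Path:
    """Return the DATA_DIR entry of the given section as a Path."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # configparser matches option names case-insensitively by default.
    return Path(parser[section]["DATA_DIR"])


data_dir = read_data_dir(EXAMPLE_INI)
```

To read from a file on disk instead of a string, `parser.read("config.ini")` can replace `read_string`.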
In the above example folder structure, the file 23520258_20210408_processed.csv is the final trajectory output by process.py, which can in turn be visualized using DevExDashboard. In the above case, where there are multiple recordings, this file is a merger of the CSV files generated by BioTracker, each of which corresponds to an individual MP4 file with a fixed number of frames, or 'chunk'. The number of frames per chunk is set by the StoreChunkSize variable in motif and stored in the metadata.yaml file (to generate video chunks equivalent to roughly 33 minutes of video at 5 FPS, this would need to be set to 10000).
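The relationship between chunk size, frame rate, and chunk duration is simple arithmetic. The sketch below assumes that StoreChunkSize counts frames per chunk, as the 10000-frame example implies; the function names are hypothetical.

```python
def chunk_duration_minutes(store_chunk_size: int, fps: float) -> float:
    """Duration in minutes of one chunk holding `store_chunk_size` frames."""
    return store_chunk_size / fps / 60


def frames_for_duration(minutes: float, fps: float) -> int:
    """Number of frames needed to cover `minutes` of video at `fps`."""
    return round(minutes * 60 * fps)


# 10000 frames at 5 FPS is 2000 seconds, i.e. a little over 33 minutes.
duration = chunk_duration_minutes(10000, 5)
```

Conversely, `frames_for_duration(33, 5)` gives 9900 frames, so 10000 is a round value slightly above an exact 33 minutes.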
As described in the repo-install-instructions, each of the BioTracker-generated CSV files that needs to be processed can be located by its camera serial number. Hence, alongside process.py and config.ini, there is a camera_ids.yaml file in the Processing/ folder. By default, tracks will be processed for all cameras listed in this file (discussed further in Choosing Default Parameter Settings).
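To make the idea of locating files by camera serial number concrete, here is a minimal sketch that picks out the chunk CSVs for one camera and date from a collection of paths. It relies only on the file naming scheme shown earlier; `select_chunk_csvs` is a hypothetical helper and not the logic used by locate_data() in process.py.

```python
from pathlib import Path
from typing import Iterable, List


def select_chunk_csvs(paths: Iterable[Path], camera_sn: str, date: str) -> List[Path]:
    """Return chunk CSVs for one camera and date, ordered by chunk number."""
    prefix = f"{camera_sn}_{date}_"
    chunks = []
    for path in paths:
        path = Path(path)
        if path.suffix != ".csv" or not path.stem.startswith(prefix):
            continue
        suffix = path.stem[len(prefix):]
        # Keep numbered chunks only; this skips the merged "*_processed.csv" file.
        if suffix.isdigit():
            chunks.append((int(suffix), path))
    return [path for _, path in sorted(chunks)]
```

With a data directory on disk, the candidates could be gathered with `Path(data_dir).rglob("*.csv")` and passed to this function once per camera serial number.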
libratools is itself made up of several modules which each contain specific functions to handle tasks associated with loading, pre-processing, and post-processing trajectories. These can be contained in one or more scripts to handle pre-processing and post-processing tasks. For a detailed overview of each function, see the module's source code linked in the summary table in the below section libratools modules.
If you use a production-ready data pipeline that relies on motif for recording and BioTracker for tracking, process.py is a ready-made script that processes BioTracker-generated CSV files. Once you have configured the script and decided which cameras to process trajectories for (or chosen to rely on the default settings; see the section Choosing Default Parameter Settings for libratools, process.py and DevExDashboard), a single command runs it. First open the terminal and change into the directory developing-exploration-behavior/Processing (i.e., into your clone of the repository; if you are using one of the tracking PCs in the office, use PowerShell). Then run:
$ python process.py -d YYYYMMDD
where YYYYMMDD should be the date for which you want to process trajectory files generated by BioTracker. By default, the merged and processed trajectories will be saved to the same folder where the raw files are stored.
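A command-line interface of this shape can be built with Python's standard argparse module. The sketch below shows one way to accept and validate a `-d YYYYMMDD` argument; it is an illustration under stated assumptions, not the actual get_input_args() implementation, and the helper names `yyyymmdd` and `build_parser` are hypothetical.

```python
import argparse
from datetime import datetime


def yyyymmdd(value: str) -> str:
    """Validate that `value` is a real calendar date written as YYYYMMDD."""
    # strptime alone tolerates single-digit months and days, so check the shape first.
    if len(value) != 8 or not value.isdigit():
        raise argparse.ArgumentTypeError(f"expected a date as YYYYMMDD, got {value!r}")
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a valid calendar date: {value!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Process BioTracker CSV files for one date.")
    parser.add_argument("-d", dest="date", type=yyyymmdd, required=True,
                        help="recording date to process, as YYYYMMDD")
    return parser


# Passing an explicit list stands in for the real command line.
args = build_parser().parse_args(["-d", "20210408"])
```

Validating the date up front means a typo such as `2021048` is rejected with a clear message before any files are searched for.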
Note that process.py is a custom script written specifically for DevEx, so it illustrates just one way of using the methods offered by libratools. It does not contain detailed docstrings, relying instead on the clarity of the code and on the docstrings of each libratools function. The functions implemented in process.py, which should all be self-explanatory, are summarized in the table below:
| All functions | |
|---|---|
| main() | Call date input and run processing pipeline function. |
| get_input_args() | Return camera ids and date for which to process CSV files. |
| run_pipeline() | Run processing pipeline and save processed trajectory to disk. |
| locate_data() | Return file paths to loopbio NPZ files and BioTracker-generated CSV files. |
| load_data() | Return merged trajectory segments as pandas.DataFrame. |
| preprocess_data() | Load the merged trajectory and impute missing values. |
| postprocess_data() | Compute metrics and implement outlier detection. |
The package functions are conveniently documented at the package website: https://vincejstraub.github.io/tools-libratools/.