Technical Docs

Vincent (Vince) J. Straub edited this page Mar 20, 2022 · 3 revisions

Overview

This page provides detailed instructions on how to use the applications in the repository by explaining how they are currently being used in the context of the DevEx research project, specifically in relation to the first set of experiments. It also includes documentation for the modules and functions that make up libratools and DevExDashboard.

Setting up a Data Collection Pipeline

To start using the applications in this repository, it is assumed that a production-ready data collection pipeline has already been set up. As suggested in Getting Started, such a pipeline may involve generating MP4 recordings using motif and tracking each video recording using BioTracker (as DevEx does), so that one is left with individual trajectories. These may be stored locally, in the cloud, or on a network-attached storage (NAS) device. It is also assumed that this repository has been cloned and installed locally (see the separate repo-install-instructions).

For illustration, the output of your data collection pipeline may look as follows:

├──loopbio_data/    # <-- root folder on a NAS
  ├──camSN/    # <-- short for camera serial number
      ├──dateOfRecording_startTimeofRecording.camSN/     
          ├──000001.npz     # <-- created by Motif
          ├──000001.mp4     # <-- created by Motif
          ├──000002.npz 
          ├──000002.mp4
          ├──metadata.yaml      # <-- created by Motif
          ├──camSN_dateOfRecording_chunkNumber.csv    # <-- generated by BioTracker (e.g.: 23520258_20210408_02.csv)
          ├──camSN_dateOfRecording_processed.csv    # <-- generated using libratools (e.g.: 23520258_20210408_processed.csv)
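
As a rough illustration of how files in such a layout could be located, the sketch below collects the BioTracker chunk CSVs for one camera and date. The helper name find_chunk_csvs is hypothetical and is not part of libratools or process.py; it only assumes the folder and file naming shown in the tree above.

```python
from pathlib import Path


def find_chunk_csvs(data_dir, cam_sn, date):
    """Return BioTracker chunk CSVs for one camera and date, sorted by name.

    Hypothetical helper: it assumes the example layout shown above and is
    not part of libratools or process.py.
    """
    cam_dir = Path(data_dir) / cam_sn
    # Recording folders are named <date>_<startTime>.<camSN>
    pattern = f"{date}_*.{cam_sn}/{cam_sn}_{date}_*.csv"
    return sorted(
        p for p in cam_dir.glob(pattern)
        if not p.name.endswith("_processed.csv")  # skip the libratools output
    )
```

Called as find_chunk_csvs("path/to/loopbio_data", "23520258", "20210408"), this would return the chunk files such as 23520258_20210408_02.csv while leaving out the processed file.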

Creating a Data Processing Workflow

After data collection, a script can be created that relies on the functions provided by libratools to conduct pre-processing and some automated post-processing tasks. In the case of the first DevEx experiment, this script is process.py, which is stored in the Processing/ folder. This script can be configured using the config.ini file contained in the same folder which includes paths, default values for free parameters, and other global variables. The most important variable is DATA_DIR, which tells the script where the data is stored (discussed further in Choosing Default Parameter Settings).
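
For readers unfamiliar with INI files, the following is a minimal sketch of how a script might read such a configuration with Python's standard configparser module. Only the variable name DATA_DIR and the section name [DEFAULTS] come from this page; the [PATHS] section name and every value are assumptions made for illustration, and process.py may read its configuration differently.

```python
import configparser

# Hypothetical config.ini contents. The [PATHS] section name and all values
# are made up for illustration; only DATA_DIR, [DEFAULTS] and the parameter
# names are mentioned in this documentation.
EXAMPLE_INI = """
[PATHS]
DATA_DIR = //nas/loopbio_data

[DEFAULTS]
MAX_MISSING_VAL_PER = 10
SKIP_INITIAL = 5
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)  # for a real file: config.read("config.ini")

data_dir = config["PATHS"]["DATA_DIR"]
max_missing = config["DEFAULTS"].getfloat("MAX_MISSING_VAL_PER")
print(data_dir, max_missing)
```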

In the above example folder structure, the file 23520258_20210408_processed.csv is the final trajectory output by process.py, which can in turn be visualized using DevExDashboard. Where there are multiple recordings, as in the above case, this file is a merger of the CSV files generated by BioTracker, each of which corresponds to an individual MP4 file with a fixed number of frames, or 'chunk'. The number of frames per chunk is set by the StoreChunkSize variable in motif and stored in the metadata.yaml file (to generate video chunks equivalent to roughly 33 minutes of video at 5 FPS, this would need to be set to 10000).
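
The arithmetic behind the 33-minute figure can be checked with a short sketch. It assumes that StoreChunkSize counts frames per chunk, which is what the example values imply; both function names are hypothetical.

```python
def chunk_duration_minutes(frames_per_chunk, fps):
    """Length of one video chunk in minutes."""
    return frames_per_chunk / fps / 60


def frames_for_duration(minutes, fps):
    """Number of frames needed for a chunk of the given length."""
    return round(minutes * 60 * fps)


# 10000 frames at 5 FPS is 2000 seconds, i.e. a little over 33 minutes
print(round(chunk_duration_minutes(10000, 5), 1))
```

Going the other way, frames_for_duration(30, 5) gives the chunk size needed for 30-minute chunks at 5 FPS.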

As described in the repo-install-instructions, each of the BioTracker-generated CSV files that needs to be processed can be located by its camera serial number. Hence, alongside process.py and config.ini, there is a camera_ids.yaml file in the Processing/ folder. By default, tracks will be processed for all cameras listed in this file (discussed further in Choosing Default Parameter Settings).

How libratools can be used

libratools is itself made up of several modules, each of which contains functions for specific tasks associated with loading, pre-processing, and post-processing trajectories. These functions can be called from one or more scripts to handle pre-processing and post-processing tasks. For a detailed overview of each function, see each module's source code, linked in the summary table in the section libratools modules below.

How to use process.py

If you are using a production-ready data pipeline that relies on motif for recording and BioTracker for tracking, process.py is a ready-made script that processes BioTracker-generated CSV files. To use it, you only need to run one command after configuring the script and either deciding which cameras to process trajectories for or relying on the default settings (see the section Choosing Default Parameter Settings for libratools, process.py and DevExDashboard). First open the terminal and change into the directory developing-exploration-behavior/Processing (i.e., into your clone of the repository; if you are using one of the tracking PCs in the office, use PowerShell). Then run:

  • $ python process.py -d YYYYMMDD

where YYYYMMDD should be the date for which you want to process trajectory files generated by BioTracker. By default, the merged and processed trajectories will be saved to the same folder where the raw files are stored.
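
To show what a date argument of this form involves, here is a minimal sketch of parsing and validating -d YYYYMMDD with argparse. It is not the actual implementation of get_input_args() in process.py; the function names and the long option --date are assumptions.

```python
import argparse
from datetime import datetime


def yyyymmdd(value):
    """Validate a YYYYMMDD string and return it unchanged."""
    if len(value) != 8 or not value.isdigit():
        raise argparse.ArgumentTypeError(f"expected YYYYMMDD, got {value!r}")
    try:
        datetime.strptime(value, "%Y%m%d")  # rejects e.g. month 13
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a valid calendar date: {value!r}")
    return value


def build_parser():
    """Build a parser accepting -d/--date (the long form is hypothetical)."""
    parser = argparse.ArgumentParser(description="Process BioTracker CSV files.")
    parser.add_argument("-d", "--date", type=yyyymmdd, required=True,
                        help="recording date as YYYYMMDD")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Processing trajectories recorded on {args.date}")
```

Checking the length explicitly matters because strptime on its own also accepts shortened forms such as single-digit months or days.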

Note that process.py is a custom script written specifically for DevEx; hence, it provides just one example of how to make use of the methods offered by libratools. It also does not contain detailed docstrings, relying instead on the clarity of the code and on the docstrings of each libratools function. For an overview of the functions implemented in process.py, which should all be self-explanatory, see the table below:

All functions
main() Call date input and run processing pipeline function.
get_input_args() Return camera ids and date for which to process CSV files.
run_pipeline() Run processing pipeline and save processed trajectory to disk.
locate_data() Return file paths to loopbio NPZ files and BioTracker-generated CSV files.
load_data() Return merged trajectory segments as pandas.DataFrame.
preprocess_data() Load the merged trajectory and impute missing values.
postprocess_data() Compute metrics and implement outlier detection.

How to use DevExDashboard

DevExDashboard relies on a single file, app.py, which contains a set of simple Python functions to load and visualize data, create interactive widgets, and generate outputs. It is built using Streamlit, an open-source Python library that makes it easy to create and share custom web apps.

The dashboard itself has four main sections: a small panel entitled Notifications, which displays informational messages; a section entitled Treatment allocation, which contains a table with metrics to aid in treatment allocation; a section entitled Error Diagnosis, which contains two tables with metrics to aid in diagnosing errors; and a section entitled Individual monitoring, which visualizes each trajectory alongside key metrics. When the dashboard loads, the user is presented with instructions for how to use it, so these are omitted here.

For an overview of the functions implemented in app.py, which show the structure of how the app is built, see the table in the below section DevExDashboard functions.

See here for a video demo of how to load the dashboard, and here for how to use the dashboard.

Choosing Default Parameter Settings for libratools, process.py and DevExDashboard

libratools and DevExDashboard rely on a number of parameters that can be changed in configuration files. This also holds for process.py which, as already mentioned, depends on variables like DATA_DIR to run.

libratools has only one configuration file, config.ini, located in libratools/. It contains just two variables, BIOTRACKER_COLS and LIBRATOOLS_COLS, which list the column names currently used for CSV files by BioTracker and libratools in the context of DevEx. These variables should not need to be changed unless major changes are made to the libratools library or different default parameter settings are used when generating CSV files with BioTracker.

If using the process.py script, there are two configuration files, config.ini and camera_ids.yaml, both located in the Processing/ folder. The former contains file names, file paths, default parameters for data processing, default settings used in loopbio, and default file extensions. The most important variable that needs to be configured is DATA_DIR, which determines where the script should access BioTracker-generated CSV files from. The other variables that may need to be changed are located under the section [DEFAULTS]. These include MISSING_VAL_METHOD, which determines how missing data-points are handled; MAX_MISSING_VAL_PER, which determines the maximum percentage of data-points that can be missing for a track to still be considered usable rather than labelled as corrupt; MAX_STEP_THRESH, which sets the maximum step length currently used in the context of DevEx to detect outlying data-points; and SKIP_INITIAL, which sets the number of seconds at the very start of BioTracker-generated CSV files during which systematic tracking errors are expected to occur. In the context of DevEx, the parameters stored in [DEFAULTS] should only be changed if a substantially different experimental set-up is used. Similarly, all variables located in the section [LOOPBIO] should be self-explanatory and should only be changed if different motif camera settings are used.
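
To make the roles of MAX_MISSING_VAL_PER, MAX_STEP_THRESH and SKIP_INITIAL more concrete, the sketch below applies them to a toy track. It is not how libratools implements these checks: the function names, all parameter values, the frame rate, and the use of None for a missing position are assumptions chosen for illustration.

```python
import math


def missing_percentage(points):
    """Percentage of points whose position is missing (None)."""
    missing = sum(1 for p in points if p is None)
    return 100.0 * missing / len(points)


def flag_long_steps(points, max_step):
    """Return indices of points reached by a step longer than max_step.

    Steps that touch a missing point are skipped.
    """
    flagged = []
    for i in range(1, len(points)):
        prev, curr = points[i - 1], points[i]
        if prev is None or curr is None:
            continue
        if math.dist(prev, curr) > max_step:
            flagged.append(i)
    return flagged


# Made-up values standing in for entries in [DEFAULTS]
MAX_MISSING_VAL_PER = 25.0
MAX_STEP_THRESH = 5.0
SKIP_INITIAL = 1  # seconds
FPS = 5           # frames per second (assumed)

track = [(0, 0)] * 5 + [(1, 0), None, (2, 0), (30, 0), (31, 0)]
usable_part = track[SKIP_INITIAL * FPS:]  # drop the unreliable first second
is_corrupt = missing_percentage(usable_part) > MAX_MISSING_VAL_PER
outliers = flag_long_steps(usable_part, MAX_STEP_THRESH)
print(is_corrupt, outliers)
```

Here one of the five remaining points is missing (20 %), so the track stays below the made-up 25 % limit, and the jump from (2, 0) to (30, 0) is the only step flagged as an outlier.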

DevExDashboard has one configuration file, config.yaml, located in DevExDashboard/. It contains file names, directory paths, default values for the further settings, and variables like LIBRATOOLS_COLS. In general, the only variables that need to be configured after installing the dashboard are DATA_DIR and DASHBOARD_OUTPUTS_DIR, which determine where the dashboard should load CSV files from and where it should save any outputs, i.e. treatment allocation metrics. The other variables should not need to be changed unless major changes are made to the libratools library or the dashboard itself, or different default parameter settings are used when generating CSV files with BioTracker.

N.B. When changing any file paths in any of the configuration files, it is important that you use the forward-slash character (/) instead of the backslash (\). Similarly, if you are using process.py and only wish to process tracks for certain cameras, you will need to edit camera_ids.yaml by commenting out the cameras you don't want to be processed. To do this, add the hash symbol (#) before the camera, its number, ID, colour, and group (see the repo-install-instructions for an example).
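
If a path has been copied from Windows Explorer with backslashes, it can be converted to the forward-slash form with Python's standard pathlib module, as in the sketch below. The drive letter and folder names are made up for illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical location of the data folder, written both ways
forward = PureWindowsPath("D:/loopbio_data/23520258")
backward = PureWindowsPath(r"D:\loopbio_data\23520258")

# Both spellings refer to the same location
print(forward == backward)

# as_posix() gives the forward-slash form to paste into a configuration file
print(backward.as_posix())
```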

libratools modules

All modules
lbt_datasets() Includes utilities to load, manipulate and save trajectory datasets.
lbt_experiment() Contains functions related to various experimental procedures, such as defining and assigning a treatment value.
lbt_impute() Includes methods for detecting and imputing missing values.
lbt_inspect() Includes functions for inspecting a trajectory dataset.
lbt_metrics() Includes performance metrics, pairwise metrics and distance computations to summarize a dataset.
lbt_outlier_detection() Includes methods for detecting point and subsequence outliers.
lbt_utils() Includes various utilities and private functions.

DevExDashboard functions

All functions
main() Load pages.
configure_homepage_display() Configure homepage main display.
configure_homepage_sidebar() Configure primary sidebar options.
configure_homepage_sidebar_further() Configure further settings sidebar for homepage.
configure_table() Stylize and display data and text for section one.
tabulate_metadata_table() Compute treatment times, describe metadata, and tabulate results.
describe_metadata() Return pandas.DataFrame of metadata.
load_data() Check whether to load data via path argument or directly.
load_sample_data() Load sample trajectories.
raise_error() Format page and load text.
visualize_trajectory() Visualize a trajectory as a scatter plot.
visualize_subplots() Visualize each step length, activity rate per time interval, and distribution of turning angles.
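
For readers unfamiliar with the quantities plotted by visualize_subplots(), the sketch below shows one common way to compute step lengths and turning angles from a sequence of (x, y) positions. It is an illustration only, not the code used by DevExDashboard or libratools, and both function names are hypothetical.

```python
import math


def step_lengths(points):
    """Euclidean distance between each pair of consecutive positions."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


def turning_angles(points):
    """Signed angle in radians between consecutive steps.

    Positive values are counter-clockwise turns when the y-axis points up;
    in image coordinates (y-axis pointing down) the sign is reversed.
    """
    angles = []
    for a, b, c in zip(points, points[1:], points[2:]):
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        angles.append(math.atan2(cross, dot))
    return angles


# Three sides of a unit square: every step has length 1 and every turn is 90 degrees
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(step_lengths(square))
print([math.degrees(a) for a in turning_angles(square)])
```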