This repository contains the MuViS codebase: dataset preprocessing, unified time-series I/O, configuration-driven experiment runners, and logging utilities for reproducible benchmarking across datasets. The corresponding paper is available at: <PLACEHOLDER>
Abstract: Virtual sensing infers hard-to-measure quantities from accessible measurements and is central to perception and control in physical systems. Despite rapid progress from first-principles and hybrid models to modern data-driven methods, research remains siloed, leaving no established default approach that transfers across processes, modalities, and sensing configurations. We introduce MuViS, a domain-agnostic benchmarking suite for multimodal virtual sensing that consolidates diverse datasets into a unified interface for standardized preprocessing and evaluation. Using this framework, we benchmark representative approaches spanning gradient-boosted decision trees and deep neural network (NN) architectures, and quantify how close current methods come to a broadly useful default. MuViS is released as an open-source, extensible platform for reproducible comparison and future integration of new datasets and model classes.
Virtual sensing aims to infer hard-to-measure quantities from accessible primary measurements and is central to perceiving and controlling physical systems. Despite rapid progress, research is typically siloed in narrow application domains, limiting insight into how well approaches generalize.
MuViS is a comprehensive, domain-agnostic benchmarking suite for multimodal virtual sensing. It addresses the heterogeneity in file formats, split definitions, and sequence lengths by providing a framework that:
- **Standardizes data preprocessing**: Converts raw datasets from `data/raw/<dataset>/` into a consistent `.ts` format in `data/processed/<dataset>/` with predefined train-test splits (`train.ts`, `test.ts`), uniform sample shapes (X: N×T×C, y: N), and consistent missing-value treatment.
- **Enables reproducible experiments**: Provides config-driven training pipelines to systematically benchmark neural networks and tree-based models across multiple datasets.
Model-specific preprocessing operations (e.g., standardization, sequence flattening) are performed within training scripts to maintain flexibility in model architecture design.
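As a hedged illustration of what such in-script preprocessing could look like (the function names and statistics choices below are hypothetical, not taken from the MuViS codebase), a training script might standardize each channel using train-split statistics and flatten windows for tree-based models:

```python
import numpy as np

def standardize_per_channel(X_train, X_test):
    """Standardize each channel using statistics from the training split only."""
    # X has shape (N, T, C): N windows, T time steps, C channels
    mean = X_train.mean(axis=(0, 1), keepdims=True)
    std = X_train.std(axis=(0, 1), keepdims=True) + 1e-8  # avoid division by zero
    return (X_train - mean) / std, (X_test - mean) / std

def flatten_for_trees(X):
    """Flatten (N, T, C) windows into (N, T*C) feature vectors for tree models."""
    return X.reshape(X.shape[0], -1)

# Example with random data shaped like a MuViS sample (X: N x T x C)
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(100, 24, 9)), rng.normal(size=(20, 24, 9))
X_train_s, X_test_s = standardize_per_channel(X_train, X_test)
X_flat = flatten_for_trees(X_train_s)  # shape (100, 216)
```

Keeping these steps inside the training script (rather than baking them into the `.ts` files) lets each model class choose its own representation of the shared data.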
MuViS aggregates six benchmark datasets spanning environmental monitoring, health sensing, vehicle dynamics, tire thermodynamics, chemical process monitoring, and electrochemical energy systems.
| Dataset | Domain | Target | Inputs | Features (C) | Steps (T) |
|---|---|---|---|---|---|
| Beijing Air Quality | Environmental | PM2.5 / PM10 | Pollutants & Meteorology | 9 | 24 |
| Revs Program | Automotive | Lateral Velocity | Driver inputs, IMU, Wheel speeds | 12 | 20 |
| Tire Temperature | Automotive | Tire Temp | Vehicle motion, Control inputs | 11 | 50 |
| Tennessee Eastman | Industrial | Chemical Conc. | Process vars & Manipulated vars | 33 | 20 |
| Panasonic 18650PF | Energy | State-of-Charge (SoC) | Voltage, Current, Temp | 7 | 120 |
| PPG-DaLiA | Health | Heart Rate (BPM) | BVP, EDA, Temp, Accel | 6 | 512 |
We benchmark representative learning approaches spanning gradient-boosted decision trees and deep neural network (NN) architectures:
- Tree-based: XGBoost, CatBoost
- Neural Networks: MLP, ResNet1D, LSTM, Transformer
- Python >=3.13
- Set up the environment and install MuViS.
```shell
# Create and activate environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies and the package
pip install -r requirements.txt
pip install -e .
```
- Download the datasets and place them in the `data/raw/` folder as described below:
  - BeijingPM10Quality
  - BeijingPM25Quality
  - Panasonic18650PFData (Note: use Panasonic_NCR18650PF_Data_Normalized.zip)
  - PPGDalia
  - REVS/2013_Monterey_Motorsports_Reunion
  - REVS/2013_Targa_Sixty_Six
  - REVS/2014_Targa_Sixty_Six
  - TennesseeEastmanProcess
  - VehicleDynamicsDataset (Note: store the November and October sessions in separate folders.)
- The final directory structure should look like this:
```
MuViS/
├── ...
├── data/
│   ├── raw/
│   │   ├── BeijingPM10Quality/
│   │   │   ├── BeijingPM10Quality_TEST.ts
│   │   │   └── BeijingPM10Quality_TRAIN.ts
│   │   ├── BeijingPM25Quality/
│   │   │   ├── BeijingPM25Quality_TEST.ts
│   │   │   └── BeijingPM25Quality_TRAIN.ts
│   │   ├── Panasonic18650PFData/
│   │   │   ├── Normalization/
│   │   │   ├── Test/
│   │   │   ├── Train/
│   │   │   └── Validation/
│   │   ├── PPGDalia/
│   │   │   ├── PPG_FieldStudy/
│   │   │   ├── data.zip
│   │   │   └── readme.pdf
│   │   ├── REVS/
│   │   │   ├── 2013_Monterey_Motorsports_Reunion/
│   │   │   │   └── *.csv
│   │   │   ├── 2013_Targa_Sixty_Six/
│   │   │   │   └── *.csv
│   │   │   └── 2014_Targa_Sixty_Six/
│   │   │       └── *.csv
│   │   ├── TennesseeEastmanProcess/
│   │   │   ├── TEP_FaultFree_Testing.RData
│   │   │   └── TEP_FaultFree_Training.RData
│   │   ├── VehicleDynamicsDataset/
│   │   │   ├── Nov2023/
│   │   │   │   └── *.csv
│   │   │   └── Oct2023/
│   │   │       └── *.csv
│   │   └── ...
│   └── processed/
└── ...
```
To preprocess the raw datasets into the standardized `.ts` format, run:

```shell
python src/muvis/data_utils/preprocess.py
```

Execute the following command to run a single experiment:

```shell
python main.py single --runconf configs/<DATASET_NAME>/<MODEL_NAME>.yaml
```

To run multiple experiments and save the results to a CSV file, use the command below:
```shell
python main.py batch \
    --configs \
        configs/<DATASET_NAME_1>/<MODEL_NAME_1>.yaml \
        configs/<DATASET_NAME_2>/<MODEL_NAME_2>.yaml \
    --metric test_rmse \
    --output experiment_results.csv
```

Our evaluation demonstrates that while gradient-boosted ensembles remain highly competitive, the landscape is nuanced, with specific NN architectures excelling in distinct domains. No single architecture attains a statistically superior edge across the entire benchmark, underscoring the need for specialized architectures in virtual sensing.
To reproduce the results from the paper, run:

```shell
bash run.sh
```

Each dataset must ultimately produce two files: `train.ts` and `test.ts`.
Copy your raw dataset files into:

```
data/raw/<YourDatasetName>/
```
Note: MuViS does not impose any restrictions on the raw data format.
MuViS handles different raw dataset formats by converting them into a common `.ts` representation using dataset-specific converters. All converters live in:

```
src/muvis/data_utils/converters.py
```
Each dataset is implemented as a subclass of `BaseConverter`. To add a new dataset, create a new class that inherits from `BaseConverter` and implement the `load_raw()` method.
At a minimum, every converter must:

- Read raw files from `data/raw/<YourDataset>/`
- Split data into train and test sequences
- Generate fixed-length sliding windows
- Return data in MuViS's internal case format
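The steps above can be sketched as follows. This is a simplified illustration, not the actual `BaseConverter` interface: the class name, the CSV-per-session layout, the hold-out split, and the array-based return value are all assumptions for the example.

```python
import numpy as np
from pathlib import Path

def make_windows(seq, window_len):
    """Slice one (L, C+1) sequence into fixed-length windows; last column is the target."""
    X = np.array([seq[i:i + window_len, :-1] for i in range(len(seq) - window_len)])
    y = np.array([seq[i + window_len, -1] for i in range(len(seq) - window_len)])
    return X, y

class MyDatasetConverter:  # in the real codebase this would subclass BaseConverter
    window_len = 50  # fixed sliding-window length (T)

    def load_raw(self):
        raw_dir = Path("data/raw/MyDataset")
        # 1) Read raw files (here assumed to be one CSV per recording session)
        sessions = [np.loadtxt(f, delimiter=",", skiprows=1)
                    for f in sorted(raw_dir.glob("*.csv"))]
        # 2) Split into train and test sequences (hold out the last session)
        train_seqs, test_seqs = sessions[:-1], sessions[-1:]
        # 3) Generate fixed-length sliding windows per sequence
        train = [make_windows(s, self.window_len) for s in train_seqs]
        test = [make_windows(s, self.window_len) for s in test_seqs]
        # 4) Return data to be packed into MuViS's internal case format
        return train, test
```

Splitting by session before windowing (rather than windowing first) avoids leaking overlapping windows across the train-test boundary.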
Once the converter is implemented, add it to the command-line interface in `src/muvis/data_utils/preprocess.py` and run:

```shell
python src/muvis/data_utils/preprocess.py --dataset <YourDatasetName>
```

MuViS supports both neural and tree-based models.
Model implementations are located in the following files:
- Neural networks: Define your model architecture in `src/muvis/utils/architectures.py`
- Tree-based models: Add your model to the model dictionary in `src/muvis/train/run_tree_experiments.py`. Any model following the scikit-learn `.fit()` convention is supported.
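As a hedged sketch of what registering such a model could look like (the `MODELS` dictionary name and surrounding code are illustrative, not the actual contents of `run_tree_experiments.py`), any scikit-learn-compatible regressor can be constructed from config parameters and trained on flattened windows:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative model registry; the real dictionary lives in
# src/muvis/train/run_tree_experiments.py and includes XGBoost and CatBoost.
MODELS = {
    "random_forest": lambda params: RandomForestRegressor(**params),
}

# Tree-based models consume flattened windows: (N, T, C) -> (N, T*C)
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 20, 5)).reshape(64, -1)
y = rng.normal(size=64)

model = MODELS["random_forest"]({"n_estimators": 10, "random_state": 0})
model.fit(X, y)
preds = model.predict(X)
```

Because only `.fit()` and `.predict()` are required, swapping in a new estimator is a one-line change to the dictionary.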
Each experiment is controlled via a YAML configuration file. Create `configs/<YourDatasetName>/<YourModelName>.yaml` specifying the model type, hyperparameters, and training settings.
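A minimal sketch of such a config is shown below; the key names and nesting are hypothetical, since the actual schema is defined by the MuViS config loader, so adapt them to the existing files under `configs/`:

```yaml
# Hypothetical config sketch; mirror the structure of an existing
# configs/<DATASET_NAME>/<MODEL_NAME>.yaml file in the repository.
model: lstm
dataset: BeijingPM25Quality
hyperparameters:
  hidden_size: 128
  num_layers: 2
training:
  epochs: 50
  batch_size: 64
  learning_rate: 0.001
```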

