This repository contains the MuViS codebase: dataset preprocessing, unified time-series I/O, configuration-driven experiment runners, and logging utilities for reproducible benchmarking across datasets. The corresponding paper is available at: <PLACEHOLDER>
Abstract: Virtual sensing infers hard-to-measure quantities from accessible measurements and is central to perception and control in physical systems. Despite rapid progress from first-principles and hybrid models to modern data-driven methods, research remains siloed, leaving no established default approach that transfers across processes, modalities, and sensing configurations. We introduce MuViS, a domain-agnostic benchmarking suite for multimodal virtual sensing that consolidates diverse datasets into a unified interface for standardized preprocessing and evaluation. Using this framework, we benchmark representative approaches spanning gradient-boosted decision trees and deep neural network (NN) architectures, and quantify how close current methods come to a broadly useful default. MuViS is released as an open-source, extensible platform for reproducible comparison and future integration of new datasets and model classes.
Virtual sensing aims to infer hard-to-measure quantities from accessible primary measurements and is central to perceiving and controlling physical systems. Despite rapid progress, research is typically siloed in narrow application domains, limiting insight into how well approaches generalize.
MuViS is a comprehensive, domain-agnostic benchmarking suite for multimodal virtual sensing. It addresses the heterogeneity in file formats, split definitions, and sequence lengths by providing a framework that:
- **Standardizes data preprocessing**: Converts raw datasets from `data/raw/<dataset>/` into a consistent `.ts` format in `data/processed/<dataset>/` with predefined train-test splits (`train.ts`, `test.ts`), uniform sample shapes (X: N×T×C, y: N), and consistent missing-value treatment.
- **Enables reproducible experiments**: Provides config-driven training pipelines to systematically benchmark neural networks and tree-based models across multiple datasets.
Model-specific preprocessing operations (e.g., standardization, sequence flattening) are performed within training scripts to maintain flexibility in model architecture design.
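As a hedged illustration of what such in-script preprocessing could look like (the function names and statistics choices below are hypothetical, not taken from the MuViS codebase), a training script might standardize each channel using train-split statistics and flatten windows for tree-based models:

```python
import numpy as np

def standardize_per_channel(X_train, X_test):
    """Standardize each channel using statistics from the training split only."""
    # X has shape (N, T, C): N windows, T time steps, C channels
    mean = X_train.mean(axis=(0, 1), keepdims=True)
    std = X_train.std(axis=(0, 1), keepdims=True) + 1e-8  # avoid division by zero
    return (X_train - mean) / std, (X_test - mean) / std

def flatten_for_trees(X):
    """Flatten (N, T, C) windows into (N, T*C) feature vectors for tree models."""
    return X.reshape(X.shape[0], -1)

# Example with random data shaped like a MuViS sample (X: N x T x C)
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(100, 24, 9)), rng.normal(size=(20, 24, 9))
X_train_s, X_test_s = standardize_per_channel(X_train, X_test)
X_flat = flatten_for_trees(X_train_s)  # shape (100, 216)
```

Keeping these steps inside the training script (rather than baking them into the `.ts` files) lets each model class choose its own representation of the shared data.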
MuViS aggregates six benchmark datasets spanning environmental monitoring, health sensing, vehicle dynamics, tire thermodynamics, chemical process monitoring, and electrochemical energy systems.
| Dataset | Domain | Target | Inputs | Features (C) | Steps (T) |
|---|---|---|---|---|---|
| Beijing Air Quality | Environmental | PM2.5 / PM10 | Pollutants & Meteorology | 9 | 24 |
| Revs Program | Automotive | Lateral Velocity | Driver inputs, IMU, Wheel speeds | 12 | 20 |
| Tire Temperature | Automotive | Tire Temp | Vehicle motion, Control inputs | 11 | 50 |
| Tennessee Eastman | Industrial | Chemical Conc. | Process vars & Manipulated vars | 33 | 20 |
| Panasonic 18650PF | Energy | State-of-Charge (SoC) | Voltage, Current, Temp | 7 | 120 |
| PPG-DaLiA | Health | Heart Rate (BPM) | BVP, EDA, Temp, Accel | 6 | 512 |
We benchmark representative learning approaches spanning gradient-boosted decision trees and deep neural network (NN) architectures:
- Tree-based: XGBoost, CatBoost
- Neural Networks: MLP, ResNet1D, LSTM, Transformer
- Python >=3.13
- Set up the environment and install MuViS.
```shell
# Create and activate environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies and the package
pip install -r requirements.txt
pip install -e .
```
- Download the datasets and place them in the `data/raw/` folder as described below:
  - BeijingPM10Quality
  - BeijingPM25Quality
  - Panasonic18650PFData (Note: use Panasonic_NCR18650PF_Data_Normalized.zip)
  - PPGDalia
  - REVS/2013_Monterey_Motorsports_Reunion
  - REVS/2013_Targa_Sixty_Six
  - REVS/2014_Targa_Sixty_Six
  - TennesseeEastmanProcess
  - VehicleDynamicsDataset (Note: store the November and October sessions in separate folders.)
- The final directory structure should look like this:
```
MuViS/
├── ...
├── data/
│   ├── raw/
│   │   ├── BeijingPM10Quality/
│   │   │   ├── BeijingPM10Quality_TEST.ts
│   │   │   └── BeijingPM10Quality_TRAIN.ts
│   │   ├── BeijingPM25Quality/
│   │   │   ├── BeijingPM25Quality_TEST.ts
│   │   │   └── BeijingPM25Quality_TRAIN.ts
│   │   ├── Panasonic18650PFData/
│   │   │   ├── Normalization/
│   │   │   ├── Test/
│   │   │   ├── Train/
│   │   │   └── Validation/
│   │   ├── PPGDalia/
│   │   │   ├── PPG_FieldStudy/
│   │   │   ├── data.zip
│   │   │   └── readme.pdf
│   │   ├── REVS/
│   │   │   ├── 2013_Monterey_Motorsports_Reunion/
│   │   │   │   └── *.csv
│   │   │   ├── 2013_Targa_Sixty_Six/
│   │   │   │   └── *.csv
│   │   │   └── 2014_Targa_Sixty_Six/
│   │   │       └── *.csv
│   │   ├── TennesseeEastmanProcess/
│   │   │   ├── TEP_FaultFree_Testing.RData
│   │   │   └── TEP_FaultFree_Training.RData
│   │   ├── VehicleDynamicsDataset/
│   │   │   ├── Nov2023/
│   │   │   │   └── *.csv
│   │   │   └── Oct2023/
│   │   │       └── *.csv
│   │   └── ...
│   └── processed/
└── ...
```
To preprocess the raw datasets into the standardized `.ts` format, run:

```shell
python src/muvis/data_utils/preprocess.py
```

Execute the following command to run a single experiment:

```shell
python main.py single --runconf configs/<DATASET_NAME>/<MODEL_NAME>.yaml
```

To run multiple experiments and save the results to a CSV file, use the command below:
```shell
python main.py batch \
    --configs \
        configs/<DATASET_NAME_1>/<MODEL_NAME_1>.yaml \
        configs/<DATASET_NAME_2>/<MODEL_NAME_2>.yaml \
    --metric test_rmse \
    --output experiment_results.csv
```

Our evaluation demonstrates that while gradient-boosted ensembles remain highly competitive, the landscape is nuanced, with specific NN architectures excelling in distinct domains. No single architecture attains a statistically superior edge across the entire benchmark, underscoring the need for specialized architectures in virtual sensing.
To reproduce the results from the paper, run:

```shell
bash run.sh
```

Each dataset must ultimately produce two files: `train.ts` and `test.ts`.
Copy your raw dataset files into:

```
data/raw/<YourDatasetName>/
```
Note: MuViS does not impose any restrictions on the raw data format.
MuViS handles different raw dataset formats by converting them into a common `.ts` representation using dataset-specific converters. All converters live in:

```
src/muvis/data_utils/converters.py
```
Each dataset is implemented as a subclass of `BaseConverter`. To add a new dataset, create a new class that inherits from `BaseConverter` and implement the `load_raw()` method.
At a minimum, every converter must:

- Read raw files from `data/raw/<YourDataset>/`
- Split data into train and test sequences
- Generate fixed-length sliding windows
- Return data in MuViS's internal case format
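The steps above can be sketched as follows. This is a simplified illustration, not the actual `BaseConverter` interface: the class name, the CSV-per-session layout, the hold-out split, and the array-based return value are all assumptions for the example.

```python
import numpy as np
from pathlib import Path

def make_windows(seq, window_len):
    """Slice one (L, C+1) sequence into fixed-length windows; last column is the target."""
    X = np.array([seq[i:i + window_len, :-1] for i in range(len(seq) - window_len)])
    y = np.array([seq[i + window_len, -1] for i in range(len(seq) - window_len)])
    return X, y

class MyDatasetConverter:  # in the real codebase this would subclass BaseConverter
    window_len = 50  # fixed sliding-window length (T)

    def load_raw(self):
        raw_dir = Path("data/raw/MyDataset")
        # 1) Read raw files (here assumed to be one CSV per recording session)
        sessions = [np.loadtxt(f, delimiter=",", skiprows=1)
                    for f in sorted(raw_dir.glob("*.csv"))]
        # 2) Split into train and test sequences (hold out the last session)
        train_seqs, test_seqs = sessions[:-1], sessions[-1:]
        # 3) Generate fixed-length sliding windows per sequence
        train = [make_windows(s, self.window_len) for s in train_seqs]
        test = [make_windows(s, self.window_len) for s in test_seqs]
        # 4) Return data to be packed into MuViS's internal case format
        return train, test
```

Splitting by session before windowing (rather than windowing first) avoids leaking overlapping windows across the train-test boundary.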
Once the converter is implemented, add it to the command-line interface in `src/muvis/data_utils/preprocess.py` and run:

```shell
python src/muvis/data_utils/preprocess.py --dataset <YourDatasetName>
```

MuViS supports both neural and tree-based models.
Model implementations are located in the following files:
- Neural networks: Define your model architecture in `src/muvis/utils/architectures.py`
- Tree-based models: Add your model to the model dictionary in `src/muvis/train/run_tree_experiments.py`. Any model following the scikit-learn `.fit()` convention is supported.
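As a hedged sketch of what registering such a model could look like (the `MODELS` dictionary name and surrounding code are illustrative, not the actual contents of `run_tree_experiments.py`), any scikit-learn-compatible regressor can be constructed from config parameters and trained on flattened windows:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative model registry; the real dictionary lives in
# src/muvis/train/run_tree_experiments.py and includes XGBoost and CatBoost.
MODELS = {
    "random_forest": lambda params: RandomForestRegressor(**params),
}

# Tree-based models consume flattened windows: (N, T, C) -> (N, T*C)
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 20, 5)).reshape(64, -1)
y = rng.normal(size=64)

model = MODELS["random_forest"]({"n_estimators": 10, "random_state": 0})
model.fit(X, y)
preds = model.predict(X)
```

Because only `.fit()` and `.predict()` are required, swapping in a new estimator is a one-line change to the dictionary.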
Each experiment is controlled via a YAML configuration file. Create `configs/<YourDatasetName>/<YourModelName>.yaml` specifying the model type, hyperparameters, and training settings.
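A minimal sketch of such a config is shown below; the key names and nesting are hypothetical, since the actual schema is defined by the MuViS config loader, so adapt them to the existing files under `configs/`:

```yaml
# Hypothetical config sketch; mirror the structure of an existing
# configs/<DATASET_NAME>/<MODEL_NAME>.yaml file in the repository.
model: lstm
dataset: BeijingPM25Quality
hyperparameters:
  hidden_size: 128
  num_layers: 2
training:
  epochs: 50
  batch_size: 64
  learning_rate: 0.001
```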

