This repository contains the code accompanying our paper. It is organised into four stages: data preprocessing, HART pre-training, HART fine-tuning (LOSO evaluation), and baseline benchmarking.
anonymous_github/
├── environment.yml
├── data_preprocessing/
│ ├── whar_preprocessor.py # WHAR dataset preprocessing pipeline
│ └── Resampling.py # Polyphase FIR resampling of all HAR datasets to 30 Hz
├── pretraining_HART/
│ ├── config_clf.json # HART model architecture config
│ └── pretrain.py # Distributed SSL pre-training script
├── finetuning_HART/
│ ├── pretrained_checkpoint/ # Provided best pre-trained HART model
│ │ ├── config.json
│ │ └── model.safetensors
│ ├── finetune.py # LOSO fine-tuning & evaluation script for HART
│ ├── HAR_dataloader.py # HAR dataset windowing & loading utilities
│ └── run_all.sh # Convenience script — runs all datasets in parallel on 2 GPUs
└── benchmarking/ # Baseline models evaluation
├── HAR_models/ # Deep learning HAR baselines (HARNet, TinyHAR)
├── TSFMs/ # Time Series Foundational Models (MOMENT, UniTS, TSPulse)
└── ml_models/ # Scikit-learn based machine learning models
conda env create -f environment.yml
conda activate <env_name>All HAR datasets used in this work are sourced from the WHAR-datasets repository. The datasets are first downloaded and preprocessed as per the WHAR pipeline. Then, they are resampled at the session level to a unified 30 Hz before pre-training or fine-tuning.
python data_preprocessing/whar_preprocessor.pyThis runs the full whar_datasets preprocessing pipeline (segmentation, normalisation, LOSO splits) over all supported datasets. Edit whar_preprocessor.py to restrict processing to specific datasets via WHARDatasetID. Refer to WHAR Repo for all datasets and preprocessing: WHAR-datasets repository
python data_preprocessing/Resampling.py \
--input_dir /path/to/original_datasets \
--output_dir /path/to/output_30hz \
--target_fs 30| Argument | Description |
|---|---|
--input_dir |
Directory containing the original raw CSV datasets |
--output_dir |
Where to save the resampled CSVs |
--target_fs |
Target sampling frequency (default: 30) |
Pre-training uses multi-GPU distributed training via torchrun.
Before running, open pretraining_HART/pretrain.py and set the following paths:
args.data_root_path = "/path/to/your/pretrain_dataset" # folder of 30 Hz CSVs
args.save_dir = "./tspulse_pretrained" # where checkpoints are savedThen launch with:
torchrun --nproc_per_node=<NUM_GPUS> pretraining_HART/pretrain.pyExample (2 GPUs):
torchrun --nproc_per_node=2 pretraining_HART/pretrain.pyThe model architecture is controlled by pretraining_HART/config_clf.json. Key training hyperparameters (context length, patch size, batch size, epochs) are set directly in main() inside pretrain.py.
Fine-tuning evaluates the pre-trained model on each HAR dataset using Leave-One-Subject-Out (LOSO) cross-validation.
Note on Checkpoints: To easily reproduce the results from our paper, we provide our best pre-trained checkpoint in
finetuning_HART/pretrained_checkpoint. By default, the scripts below use this checkpoint so you can skip the computationally expensive pre-training stage. However, if you run the pre-training stage yourself (Stage 3), you can simply replace this path with the directory of your newly generated checkpoint.
python finetuning_HART/finetune.py \
--dataset_name capture24 \
--checkpoint_path finetuning_HART/pretrained_checkpoint \
--device cuda:0 \
--context_length 120 \
--hop_length 30 \
--epochs 100 \
--patience 15 \
--test_set_dir /path/to/test_set \
--output_dir results| Argument | Description |
|---|---|
--dataset_name |
Name of the HAR dataset to evaluate |
--checkpoint_path |
Path to the pre-trained HART checkpoint directory |
--device |
cpu, cuda, or cuda:0 |
--context_length |
Window length fed to the model (default: 512) |
--hop_length |
Stride between windows — 30 gives 75% overlap with context_length=120 |
--epochs |
Max fine-tuning epochs (default: 100) |
--patience |
Early stopping patience (default: 15) |
--test_set_dir |
Folder containing per-dataset HAR CSV files |
--output_dir |
Directory where result CSVs are written |
run_all.sh manages a two-GPU job queue — both GPUs stay busy at all times and a free GPU immediately picks up the next dataset.
bash finetuning_HART/run_all.sh \
finetuning_HART/pretrained_checkpoint \
cuda:0 \
100 \
30 \
<conda_env_name> \
/path/to/test_set \
120Positional arguments (all optional — defaults shown in the script):
| Position | Argument | Default |
|---|---|---|
| 1 | Checkpoint path | pretrained_checkpoint |
| 2 | Device (kept for compatibility) | cuda:0 |
| 3 | Epochs | 100 |
| 4 | Hop length (window stride) | 30 |
| 5 | Conda environment name | tspulse_env |
| 6 | Test set directory | test_set/ next to script |
| 7 | Context length | 120 |
Results are written to results/<checkpoint_name>_ep<N>_ctx<L>/ and per-dataset logs to results/logs_*/. To monitor a running job live:
tail -f results/logs_<checkpoint>_ep100_120/<dataset>.logThe baseline models used for benchmarking against our approach are included in the benchmarking/ folder (formerly named Final_HAR_Code/). The benchmarking setup applies the exact same LOSO validation and evaluation framework as used for HART.
The setup is split into three main categories:
Contains traditional time series classification models via sktime and scikit-learn, including:
-
Dummy Classifier (uniform baseline)
-
Rocket / MiniRocket algorithms
-
Arsenal
-
KNN Time Series Classifier (with DTW distance)
-
Run
all_sktime.shto execute the evaluation across all HAR datasets (usesml_sktime_person.pyorml_sktime.py). -
Uses
har_dataset_loader.pyfor uniform data loading.
Contains established deep learning baselines tailored for HAR:
- HARNet: Run via
all_harnet.sh(which invokesharnet_har.py). - TinyHAR: Included as an interactive Jupyter Notebook (
TinyHAR.ipynb).
Contains generic state-of-the-art foundation models for time series adapted for the HAR task:
- MOMENT: Run via
all_moment.sh. - UniTS: Run via
all_units.sh. - TSPulse: Baseline model implementation.
Each folder includes a har_dataset_loader.py tool adapted for that specific baseline to ensure inputs conform to the expected format.