This repository contains a reproducible, Slurm-based pipeline for time series anomaly detection experiments: synthetic data generation, UCR data preparation, VL model training, in-sample and out-of-sample evaluation, and aggregation into CSV summaries.
Main entry point: `pipeline.sh` (repository root).
Slurm is used because the project was developed and executed on an HPC environment. All intermediate and final outputs are provided so the most expensive stages do not need to be rerun.
The pipeline is organized into the following stages:
(Each stage's parameters can be set in the scripts invoked by the main pipeline.sh.)
- **Synthetic data generation** (`src/data/generate.slurm`)
  Writes datasets under `all_data/synthetic/` (e.g., `freq/`, `noisy-freq/`), each with `data.pkl`, `csv_data/`, and `figs/`.
- **UCR data preparation** (`all_data/UCR_dataset/`)
  A Slurm preparation job is referenced from `pipeline.sh`.
- **Annotation processing** (`src/annotations/`)
  Produces and normalizes annotation artifacts used by training and evaluation (including explanation-related analysis). Some annotation steps require an OpenAI API key (`OPENAI_API_KEY`).
- **VL model training** (`train_VL/`, e.g., `train_clip.slurm`)
  Uses a local Qwen2.5-VL checkpoint configured in `pipeline.sh`.
- **Evaluation** (`eval/scripts/`)
  Writes JSONL outputs to `eval/json_outsample/` (out-of-sample / UCR) and `eval/json_insample/` (in-sample / synthetic).
- **Aggregation** (`eval/results/`)
  - `aggregate_affil.py`: affiliation summaries
  - `citation_summary.py`: explanation/citation-quality and metrics summaries
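Because all intermediate outputs ship with the repository, the evaluation stage's JSONL files can be inspected without rerunning any Slurm jobs. A minimal sketch of reading them (the per-record field names `label` and `prediction` are assumptions for illustration, not guaranteed by the actual eval outputs):

```python
import json
from pathlib import Path

def summarize_jsonl(path: Path) -> dict:
    """Read one JSONL file and report record count and accuracy,
    assuming each record carries hypothetical 'label' and 'prediction'
    fields -- check the actual eval outputs for the real schema."""
    total = correct = 0
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            rec = json.loads(line)
            total += 1
            if rec.get("prediction") == rec.get("label"):
                correct += 1
    return {"records": total,
            "accuracy": correct / total if total else 0.0}

def summarize_dir(json_dir: Path) -> dict:
    """Summarize every *.jsonl file under, e.g., eval/json_outsample/."""
    return {p.name: summarize_jsonl(p)
            for p in sorted(json_dir.glob("*.jsonl"))}
```

For the full metric and citation-quality summaries, use the provided `aggregate_affil.py` and `citation_summary.py` scripts instead; this sketch is only a quick sanity check.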
Reference runs were executed on:
- Slurm: 22.05.10
- OS: RHEL
- GPU: NVIDIA H100
Repository layout:

- `pipeline.sh` — orchestrates Slurm jobs and aggregation
- `src/data/` — synthetic generation
- `src/annotations/` — annotation processing
- `all_data/synthetic/` — generated synthetic datasets
- `all_data/UCR_dataset/` — UCR datasets
- `train_VL/` — training scripts
- `eval/` — evaluation scripts, JSONL outputs, baselines, results
  - `eval/json_insample/`
  - `eval/json_outsample/`
  - `eval/baselines/`
  - `eval/results/`
Edit configuration directly in `pipeline.sh`:

- model paths (e.g., `BASE_MODEL`, `ST_MODEL`)
- project paths (`TRAIN_DIR`, `EVAL_DIR`, etc.)
- output locations
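A hypothetical excerpt of the configuration block in `pipeline.sh` (the variable names `BASE_MODEL`, `ST_MODEL`, `TRAIN_DIR`, and `EVAL_DIR` come from this README; all paths are placeholders to adapt to your environment):

```sh
# Model paths (placeholders -- point these at your local checkpoints)
BASE_MODEL="/path/to/Qwen2.5-VL"
ST_MODEL="/path/to/st_model"

# Project paths (placeholders)
TRAIN_DIR="train_VL"
EVAL_DIR="eval"

# Output locations (placeholder)
RESULTS_DIR="${EVAL_DIR}/results"
```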
If you run annotation processing that requires OpenAI access, set:

```sh
export OPENAI_API_KEY="YOUR_KEY_HERE"
```

Then, from the repository root:

```sh
sh pipeline.sh
```

Install Python dependencies via:
```sh
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```