FinStressTS is a synthetic benchmark for probabilistic financial stress time-series forecasting. It provides synthetic financial panels with controlled stylized facts, a standardized data-processing pipeline, native implementations of probabilistic forecasting models, and a unified evaluator.
The Python package and CLI are named finprobts for compatibility with existing experiment configs.
- Six synthetic financial data-generating processes, each with five difficulty levels, for 30 synthetic datasets total.
- Canonical loading for wide and long CSV/Parquet financial panels.
- Split-safe preprocessing, chronological train/validation/test splits, and rolling-window forecasting tasks.
- A shared probabilistic forecasting interface with samples shaped [num_windows, num_samples, prediction_length, num_assets].
- Native benchmark implementations of naive, deepar, deepvar, tempflow, timegrad, timemcl, ratd, and tsflow.
- Evaluation metrics including point errors, quantile loss, empirical coverage, sample CRPS approximations, and finance-oriented diagnostics.
The deep models are native FinStressTS implementations aligned with the cited papers and public architecture references. They are designed to share one consistent benchmark interface rather than to reproduce each original method exactly. For exact reproduction of an original method, consult the original paper and official implementation listed in each model's finprobts/models/<model>/REFERENCE.md.
git clone <repo-url>
cd FinStressTS
python -m venv .venv
.\.venv\Scripts\activate
python -m pip install --upgrade pip
python -m pip install -e .[dev,torch]
Use .[dev] if you only need the NumPy/Pandas baseline and tests. Use .[parquet] if you want Parquet I/O.
The full synthetic suite is defined in finprobts/synthetic/presets.py and generated by finprobts/synthetic/generator.py. It contains:
| Case | Stylized fact |
|---|---|
| case1_garch | factor/idiosyncratic volatility clustering |
| case2_har | multi-scale HAR volatility memory |
| case3_heavy_tail | heavy tails and rare outlier contamination |
| case4_regime | market-wide block Markov regimes |
| case5_hawkes | self-exciting market-wide jumps |
| case6_zip_panel | zero-inflated Poisson jumps with panel exposure |
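To make the first stylized fact concrete, here is a minimal sketch of volatility clustering using a textbook univariate GARCH(1,1). It is not the generator in finprobts/synthetic/generator.py, which uses a factor/idiosyncratic panel structure; the function name and the parameter values (omega, alpha, beta) are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_garch11(T, omega=1e-6, alpha=0.08, beta=0.9, seed=0):
    """Simulate T returns from a univariate GARCH(1,1) with Gaussian shocks.

    Illustrative only: parameter values are assumptions, not FinStressTS presets.
    """
    rng = np.random.default_rng(seed)
    returns = np.empty(T)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(T):
        returns[t] = np.sqrt(var) * rng.standard_normal()
        # Next-step conditional variance reacts to the latest squared return.
        var = omega + alpha * returns[t] ** 2 + beta * var
    return returns

r = simulate_garch11(5000)

# Volatility clustering typically shows up as positive autocorrelation
# in squared returns, even when the returns themselves are uncorrelated.
sq = r**2 - np.mean(r**2)
acf1 = np.sum(sq[1:] * sq[:-1]) / np.sum(sq**2)
print(f"lag-1 autocorrelation of squared returns: {acf1:.3f}")
```

The same diagnostic (autocorrelation of squared returns) can be applied to any generated dataset as a quick check that the intended clustering is present.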
Generate all six cases and all five levels:
finprobts generate-synthetic `
--case all `
--levels 1,2,3,4,5 `
--out-dir data/simulated `
--base-seed 123 `
--T 20000 `
--n-firms 50 `
--formats csv
This writes one dataset and one metadata file per case/level, plus:
data/simulated/manifest.json
Each .meta.json records the resolved simulator parameters, seed, summary statistics, case, and difficulty level. The effective seed is base_seed + level; if --base-seed is omitted, each case uses its preset default.
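The seed rule can be sketched in a few lines. This is only an illustration of the arithmetic stated above (base_seed + level) applied to the six cases and five levels; the helper name effective_seed and the plan list are hypothetical and not part of the finprobts API.

```python
from itertools import product

CASES = [
    "case1_garch", "case2_har", "case3_heavy_tail",
    "case4_regime", "case5_hawkes", "case6_zip_panel",
]
LEVELS = [1, 2, 3, 4, 5]

def effective_seed(base_seed: int, level: int) -> int:
    """Hypothetical helper mirroring the documented rule: base_seed + level."""
    return base_seed + level

# One (case, level, seed) entry per dataset in the full suite.
plan = [(case, level, effective_seed(123, level)) for case, level in product(CASES, LEVELS)]

print(len(plan))  # 30
print(plan[0])    # ('case1_garch', 1, 124)
```

Under this rule the seed depends on the level and the base seed only, so passing the same --base-seed reproduces the same suite.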
To generate only part of the suite:
finprobts generate-synthetic --case case1_garch,case2_har --levels 1,3,5 --out-dir data/simulated
Generated datasets and outputs are intentionally ignored by git.
Input data is loaded into a canonical FinancialDataset with values shaped [time, asset].
Wide CSV example:
date,asset_a,asset_b,asset_c
2020-01-01,0.001,0.0004,-0.0002
Long CSV example:
date,asset_id,target
2020-01-01,asset_a,0.001
2020-01-01,asset_b,0.0004
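As a rough illustration of how both layouts reduce to the same [time, asset] array, the following sketch uses plain pandas. It is not the FinStressTS loader; the second date row is made-up sample data, and the column names simply follow the two examples above.

```python
import io
import pandas as pd

# Wide layout: one row per date, one column per asset.
wide_csv = """date,asset_a,asset_b,asset_c
2020-01-01,0.001,0.0004,-0.0002
2020-01-02,0.002,-0.0010,0.0003
"""

# Long layout: one row per (date, asset) pair.
long_csv = """date,asset_id,target
2020-01-01,asset_a,0.001
2020-01-01,asset_b,0.0004
2020-01-02,asset_a,0.002
2020-01-02,asset_b,-0.0010
"""

wide = (
    pd.read_csv(io.StringIO(wide_csv), parse_dates=["date"])
    .set_index("date")
    .sort_index()
)

long = pd.read_csv(io.StringIO(long_csv), parse_dates=["date"])
# Pivot the long table so that rows are dates and columns are assets.
pivoted = long.pivot(index="date", columns="asset_id", values="target").sort_index()

wide_values = wide.to_numpy()     # shape [time, asset]
long_values = pivoted.to_numpy()  # shape [time, asset]
print(wide_values.shape, long_values.shape)  # (2, 3) (2, 2)
```

Note that DataFrame.pivot raises an error if a (date, asset_id) pair appears more than once, which is a useful early check for duplicated rows in long-format inputs.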
Experiment configs control processing:
dataset:
  name: custom_csv
  path: data/example/example_returns_wide.csv
  format: wide
  date_column: date
  target_columns: null
preprocessing:
  value_kind: returns
  price_to_log_return: false
  missing_method: ffill
  standardize: true
split:
  train_size: 0.6
  val_size: 0.2
task:
  context_length: 96
  prediction_length: 1
  stride: 1
To only load and preprocess a dataset into an NPZ file:
finprobts prepare-data --config configs/example_crypto_naive.yaml --output outputs/prepared_data.npz
This command is a data-loading sanity check. Full experiments perform train-fitted standardization and rolling-window construction inside finprobts run or finprobts run-synthetic-suite.
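The split-safe steps mentioned here can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the package's pipeline: the three function names are hypothetical, the defaults mirror the example config (train_size 0.6, val_size 0.2, context_length 96, prediction_length 1, stride 1), and windows are built within the test split only, whereas the real pipeline may let the context reach back into earlier data.

```python
import numpy as np

def chronological_split(values, train_size=0.6, val_size=0.2):
    """Split [time, asset] values into contiguous train/val/test blocks."""
    T = values.shape[0]
    train_end = int(T * train_size)
    val_end = train_end + int(T * val_size)
    return values[:train_end], values[train_end:val_end], values[val_end:]

def standardize_with_train_stats(train, *others, eps=1e-8):
    """Scale every split with mean/std computed on the training block only."""
    mean = train.mean(axis=0, keepdims=True)
    std = train.std(axis=0, keepdims=True) + eps
    return tuple((x - mean) / std for x in (train, *others))

def rolling_windows(values, context_length=96, prediction_length=1, stride=1):
    """Return past [W, context, A] and future [W, horizon, A] windows."""
    total = context_length + prediction_length
    starts = range(0, values.shape[0] - total + 1, stride)
    past = np.stack([values[s : s + context_length] for s in starts])
    future = np.stack([values[s + context_length : s + total] for s in starts])
    return past, future

rng = np.random.default_rng(0)
values = rng.normal(size=(1000, 3))  # synthetic stand-in for a returns panel

train, val, test = chronological_split(values)
train, val, test = standardize_with_train_stats(train, val, test)
past, future = rolling_windows(test)
print(past.shape, future.shape)  # (104, 96, 3) (104, 1, 3)
```

Fitting the scaler on the training block alone is what keeps validation and test statistics from leaking into preprocessing.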
For deep models, finprobts/models/torch_utils.py converts each rolling window to tensors such as past_target, future_target, observed-value masks, time features, target-dimension indicators, and window indices.
Models implement the benchmark contract in finprobts/models/base.py:
class BaseProbForecastModel:
    def fit(self, train_data, val_data=None) -> None: ...
    def predict(self, test_data, num_samples: int) -> ForecastResult: ...
    def save(self, path: str) -> None: ...
    @classmethod
    def load(cls, path: str): ...
predict must return ForecastResult with:
samples: [num_windows, num_samples, prediction_length, num_assets]
y_true: [num_windows, prediction_length, num_assets]
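Given arrays with these two shapes, sample-based metrics reduce to operations along the sample axis (axis 1). The sketch below shows one common way to estimate empirical coverage and a sample CRPS; it is not the FinStressTS evaluator, and the function names and the 90% interval level are assumptions for illustration.

```python
import numpy as np

def empirical_coverage(samples, y_true, level=0.9):
    """Fraction of targets inside the central `level` interval of the samples."""
    lo = np.quantile(samples, (1 - level) / 2, axis=1)
    hi = np.quantile(samples, 1 - (1 - level) / 2, axis=1)
    return float(np.mean((y_true >= lo) & (y_true <= hi)))

def sample_crps(samples, y_true):
    """CRPS of the empirical forecast distribution, averaged over all entries.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'| with X, X' drawn
    from the forecast samples.
    """
    term1 = np.abs(samples - y_true[:, None]).mean(axis=1)
    # Pairwise sample differences: [W, S, S, H, A]; memory grows with S**2.
    diffs = np.abs(samples[:, :, None] - samples[:, None, :])
    term2 = 0.5 * diffs.mean(axis=(1, 2))
    return float(np.mean(term1 - term2))

rng = np.random.default_rng(0)
num_windows, num_samples, prediction_length, num_assets = 50, 200, 1, 3
y_true = rng.standard_normal((num_windows, prediction_length, num_assets))
samples = rng.standard_normal((num_windows, num_samples, prediction_length, num_assets))

print("coverage@90%:", empirical_coverage(samples, y_true))
print("sample CRPS:", sample_crps(samples, y_true))
```

The pairwise term is quadratic in num_samples, so for large sample counts a sorted-sample formulation or a subsample of pairs is usually preferable.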
Recommended steps:
- Create finprobts/models/<your_model>/model.py.
- Add finprobts/models/<your_model>/__init__.py.
- Register the model in finprobts/models/registry.py.
- Add a default config under configs/model/<your_model>.yaml.
Minimal config shape:
model:
  name: your_model
  params:
    batch_size: 128
    max_epochs: 10
    learning_rate: 1.0e-3
    device: auto
Keep adapters thin: model-specific code may transform the canonical rolling windows internally, but it should preserve the shared input/output contract so evaluation remains comparable across methods.
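As a rough picture of what a thin model looks like, the sketch below implements fit and predict for a last-value baseline with Gaussian noise. Several things here are assumptions rather than the package's API: the ForecastResult dataclass is a stand-in for the real class, the window data is assumed to be a dict with past_target and future_target arrays, and the class name is hypothetical. A real model would subclass BaseProbForecastModel and also implement save and load, which are omitted for brevity.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ForecastResult:
    """Stand-in for the real ForecastResult (assumed fields only)."""
    samples: np.ndarray  # [num_windows, num_samples, prediction_length, num_assets]
    y_true: np.ndarray   # [num_windows, prediction_length, num_assets]

class LastValueGaussianModel:
    """Hypothetical baseline: last observed value plus Gaussian noise."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.sigma_ = None

    def fit(self, train_data, val_data=None) -> None:
        # Assumed layout: train_data["past_target"] is [W, context, A].
        diffs = np.diff(train_data["past_target"], axis=1)
        self.sigma_ = diffs.std(axis=(0, 1))  # one noise scale per asset, [A]

    def predict(self, test_data, num_samples: int) -> ForecastResult:
        past = test_data["past_target"]      # [W, context, A]
        y_true = test_data["future_target"]  # [W, H, A]
        W, H, A = y_true.shape
        last = past[:, -1, :]                # [W, A]
        noise = self.rng.standard_normal((W, num_samples, H, A)) * self.sigma_
        samples = last[:, None, None, :] + noise
        return ForecastResult(samples=samples, y_true=y_true)

rng = np.random.default_rng(1)
data = {
    "past_target": rng.normal(size=(20, 96, 4)),
    "future_target": rng.normal(size=(20, 1, 4)),
}
model = LastValueGaussianModel()
model.fit(data)
result = model.predict(data, num_samples=100)
print(result.samples.shape, result.y_true.shape)  # (20, 100, 1, 4) (20, 1, 4)
```

Whatever the model does internally, returning arrays in exactly these shapes is what lets the unified evaluator treat every method the same way.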
Run a single experiment:
finprobts run --config configs/example_crypto_deepvar.yaml
The output directory is run.output_dir / run.run_id and contains:
config.yaml
forecast_samples.npz
forecast_metrics.json
Evaluate saved forecasts again:
finprobts evaluate --run-dir outputs/example_crypto_deepvar
Run the full generated synthetic suite for selected models:
finprobts run-synthetic-suite `
--manifest data/simulated/manifest.json `
--models deepvar,tempflow,timegrad `
--output-dir outputs/synthetic_suite `
--context-length 96 `
--prediction-length 1 `
--num-samples 100 `
--device auto
Use --models all to run every registered model. For a quick config-generation smoke test without training:
finprobts run-synthetic-suite --models deepvar --dry-run
Common runtime overrides:
finprobts run-synthetic-suite --models deepvar --max-epochs 1 --batch-size 32 --num-samples 20 --device cpu
Synthetic-suite outputs include resolved configs and per-model result tables:
configs/generated/synthetic_suite/<model>/*.yaml
outputs/synthetic_suite/results_<model>.csv
outputs/synthetic_suite/results_<model>.json
outputs/synthetic_suite/synthetic_suite_summary.json
The paper experiments on the 30 synthetic datasets were run with the original authors' implementations where publicly available, connected to the FinStressTS data and evaluation protocol through thin adapters. Those upstream codebases differ in licenses, dependency stacks, supported Python versions, and experiment runners, so this release does not vendor all of them directly.
This repository instead prioritizes one reproducible benchmark interface. The bundled deep models are native or adapted FinStressTS implementations aligned with the cited papers and public architecture references. They preserve the shared data, training, prediction, and evaluation contract, but may use native PyTorch modules, local data adapters, or dependency-light substitutes. Examples include native DeepAR/DeepVAR-style recurrent probabilistic forecasters, TimeGrad/TempFlow-style generative models, RATD-style retrieval-augmented diffusion, TimeMCL-style WTA hypotheses, and TSFlow with a native S4-style state-space backend.
The exact references and deviations are documented in:
finprobts/models/deepar/REFERENCE.md
finprobts/models/deepvar/REFERENCE.md
finprobts/models/tempflow/REFERENCE.md
finprobts/models/timegrad/REFERENCE.md
finprobts/models/timemcl/REFERENCE.md
finprobts/models/ratd/REFERENCE.md
finprobts/models/tsflow/REFERENCE.md
pytest
For the torch model smoke tests:
pytest tests/test_torch_models.py -q