edstevenson/ThousandWorlds

ThousandWorlds

ThousandWorlds mascot

License: MIT Dataset DOI Python 3.10+

The search for life beyond Earth depends on the molecular signatures that life leaves in the atmosphere of its host planet. Interpreting these signatures correctly requires understanding the climates of potential host planets. ThousandWorlds is a benchmark for emulating exoplanet climates: 1760 simulations from 5 general circulation models (GCMs), spanning 8 planet parameters, with atmospheric variables on a 32 x 64 x 10 latitude-longitude-pressure grid. It includes three nested benchmark subsets, two evaluation protocols, and eight released baseline methods.


ThousandWorlds dataset schematic

Quickstart

pip install -e .

import numpy as np
import thousandworlds as tw

# Fetch the large array files, then load one benchmark subset.
tw.download_dataset(".")
bundle = tw.load("single-complete", data_dir="dataset")

# Train-mean baseline: predict the mean training field for every test case.
pred = np.broadcast_to(bundle.Y_train.mean(axis=0), bundle.Y_test.shape)
scores = tw.evaluate.rmse(pred, bundle.Y_test, bundle.field_mask_test, bundle.field_names)
scores["per_variable"]

See notebooks/quickstart.ipynb for a short walkthrough.
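
To make the scoring step concrete, here is a minimal NumPy sketch of a masked, latitude-weighted RMSE on synthetic data. It is not the implementation behind tw.evaluate.rmse: the array layout (sample, latitude, longitude), the mask convention (True marks a point to score), the cos-latitude weights, the equally spaced latitudes, and the helper name masked_weighted_rmse are all assumptions made for illustration.

```python
import numpy as np

def masked_weighted_rmse(pred, target, mask, lat_deg):
    """RMSE over valid points, weighting each latitude row by cos(latitude).

    pred, target: float arrays of shape (n_samples, n_lat, n_lon).
    mask: boolean array of the same shape; True marks points to score (assumed).
    lat_deg: latitudes in degrees, shape (n_lat,).
    """
    w = np.cos(np.deg2rad(lat_deg))[None, :, None]   # shape (1, n_lat, 1)
    w = np.broadcast_to(w, pred.shape) * mask        # zero weight where masked out
    sq_err = (pred - target) ** 2
    return float(np.sqrt((w * sq_err).sum() / w.sum()))

rng = np.random.default_rng(0)
n_lat, n_lon = 32, 64
# Hypothetical equally spaced cell-centre latitudes; the real grid may differ.
lat_deg = np.linspace(-90, 90, n_lat, endpoint=False) + 90 / n_lat
y_train = rng.normal(size=(20, n_lat, n_lon))
y_test = rng.normal(size=(5, n_lat, n_lon))
mask = np.ones(y_test.shape, dtype=bool)

# Train-mean baseline, as in the Quickstart: one mean field repeated per test case.
pred = np.broadcast_to(y_train.mean(axis=0), y_test.shape)
score = masked_weighted_rmse(pred, y_test, mask, lat_deg)
```

Weighting by cos(latitude) keeps the densely packed polar rows of a latitude-longitude grid from dominating the average; whether the package uses these weights or the precomputed ones in assets/ is not stated here.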

Installation

pip install -e .              # core: data loading + evaluation
pip install -e '.[models]'    # baseline model dependencies
pip install -e '.[notebooks]' # notebook dependencies

Dataset

The benchmark dataset is hosted on Hugging Face. The repository already contains the metadata and directory layout; the following command fills in the large array files:

python -c "import thousandworlds as tw; tw.download_dataset('.')"

Published baseline prediction results are distributed as separate archives:

python -c "import thousandworlds as tw; tw.download_baselines('.')"

Running Baselines

python -m thousandworlds.run_model train_mean single-complete
python -m thousandworlds.run_model --config results/models/multi-partial/pca_mlp/config.json

Runs write predictions, metrics, and the resolved config under results/models/<subset>/<method>/. The checked-in configs can be rerun with --config; GPLFR configs expect CUDA.
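
The output layout described above can be navigated with a few lines of standard-library Python. The sketch below relies only on the results/models/<subset>/<method>/config.json pattern shown in the command; the helper names run_dir and list_configs and the temporary demo tree are hypothetical, and no config keys are assumed.

```python
import json
import tempfile
from pathlib import Path

def run_dir(results_root, subset, method):
    """Directory a run is described as writing to: <root>/models/<subset>/<method>/."""
    return Path(results_root) / "models" / subset / method

def list_configs(results_root):
    """Map (subset, method) to the path of each config.json under <root>/models/."""
    root = Path(results_root) / "models"
    return {
        (p.parent.parent.name, p.parent.name): p
        for p in sorted(root.glob("*/*/config.json"))
    }

# Demo on a temporary tree so the sketch runs without the repository.
with tempfile.TemporaryDirectory() as tmp:
    cfg = run_dir(tmp, "multi-partial", "pca_mlp") / "config.json"
    cfg.parent.mkdir(parents=True)
    cfg.write_text(json.dumps({}))  # empty placeholder; real configs have content
    keys = list(list_configs(tmp))
```

In a checkout, list_configs("results") would give the set of configs that can be passed to --config.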

Repo Structure

thousandworlds/
  data.py               # download + load 
  preprocessing.py      # input/output transforms, normalization
  spectral.py           # spectral coefficients <-> gridded fields
  evaluate.py           # RMSE, ACC, energy score, spread-skill ratio, etc.
  run_model.py          # CLI entry point
  make_model_tables.py  # regenerate result tables
  models/               # baseline implementations
  assets/               # precomputed SHT matrix, latitude weights

dataset/                # inputs.csv, subset CSVs, arrays after download
results/                # configs, metrics, published tables
notebooks/              # quickstart, pca_mlp worked example
tests/                  # test suite
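
evaluate.py is listed as covering RMSE, ACC, energy score, and spread-skill ratio. As a reference point, the sketch below writes out common textbook forms of two of these, the anomaly correlation coefficient and the spread-skill ratio, in NumPy. The function names, the unweighted averaging, and the choice of climatology are assumptions; the package's own definitions may differ, for example by applying latitude weights.

```python
import numpy as np

def anomaly_correlation(pred, target, climatology):
    """Uncentered correlation between predicted and true anomalies."""
    a = (pred - climatology).ravel()
    b = (target - climatology).ravel()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def spread_skill_ratio(ensemble, target):
    """Mean ensemble spread divided by the RMSE of the ensemble mean.

    ensemble: shape (n_members, ...); target: shape (...).
    Values near 1 suggest a well-calibrated ensemble.
    """
    spread = np.sqrt(ensemble.var(axis=0, ddof=1).mean())
    rmse = np.sqrt(((ensemble.mean(axis=0) - target) ** 2).mean())
    return float(spread / rmse)

rng = np.random.default_rng(0)
clim = rng.normal(size=(32, 64))            # synthetic climatology field
truth = clim + rng.normal(size=(32, 64))    # synthetic "true" field
acc_perfect = anomaly_correlation(truth, truth, clim)
acc_flipped = anomaly_correlation(2 * clim - truth, truth, clim)  # sign-flipped anomaly

# Members and truth drawn from the same distribution: ratio should be close to 1.
members = rng.normal(size=(50, 32, 64))
ssr = spread_skill_ratio(members, rng.normal(size=(32, 64)))
```

A ratio well below 1 indicates an overconfident ensemble (too little spread for its error), and a ratio well above 1 an underconfident one.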

Citation

If you use ThousandWorlds, please cite the paper:

@misc{thousandworlds2026,
  title = {ThousandWorlds: A Benchmark for Exoplanet Climate Emulation},
  author = {{ThousandWorlds authors}},
  year = {2026},
  note = {Manuscript in preparation}
}
