DAMBench is a minimalist, reproducible benchmark for evaluating learning-based data assimilation methods on reanalysis-like gridded data (e.g., ERA5). It standardizes data download, storage layout, preprocessing, training, and inference so you can compare models apples-to-apples.

- 📦 Turn-key environment via `environment.yaml`
- ⬇️ Scripted data retrieval with configurable years, spatial resolution, and variable types (single-/multi-level)
- 🗂️ Deterministic on-disk layout for fast I/O
- 🧪 Reference training & inference entry points (`train.py`, `inference.py`)
- 🧰 Drop-in model switch via `--model` flag (e.g., `FNP`)

Create the conda environment from the provided spec:

    conda env create -f environment.yaml
    conda activate da-bench

Use `download.py` to fetch and organize the dataset.

    python download.py

You can customize the download in the script:

- Years (e.g., `1979–1980`, `2000–2020`)
- Resolution (e.g., `240x121`, `1440x721`)
- Data types:
  - Single-level: `t2m`, `u10`, `v10`, `msl`, …
  - Multi-level: `z`, `t`, `q`, `u`, `v`, with pressure levels (e.g., 50, 100, …, 1000 hPa)

Tip: Open `download.py` and edit the config block (years / resolution / variables) to match your use case.

After download, files are organized by year → day → variable → (level) → time step:

    DATA_ROOT/
    └── 2000/
        └── 2000-01-01/
            ├── msl/
            │   ├── T0.npy
            │   ├── T6.npy
            │   ├── T12.npy
            │   └── T18.npy
            └── q/
                ├── 50/
                │   ├── T0.npy
                │   ├── T6.npy
                │   ├── T12.npy
                │   └── T18.npy
                ├── 100/
                │   └── ...
                └── ...

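
The layout above maps directly onto a path-building helper. The following is a minimal sketch, not part of the repository: the function name `sample_path` and its parameters are hypothetical, and it assumes the year → day → variable → (level) → time step convention exactly as shown.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def sample_path(root: Path, day: date, variable: str, hour: int,
                level: Optional[int] = None) -> Path:
    """Build the path of one grid file under the layout shown above.

    Hypothetical helper: the name and signature are illustrative only.
    """
    parts = [f"{day.year:04d}", day.isoformat(), variable]
    if level is not None:
        # Multi-level variables add one directory per pressure level (hPa).
        parts.append(str(level))
    return root.joinpath(*parts, f"T{hour}.npy")


single = sample_path(Path("DATA_ROOT"), date(2000, 1, 1), "msl", 6)
multi = sample_path(Path("DATA_ROOT"), date(2000, 1, 1), "q", 18, level=50)
print(single.as_posix())  # DATA_ROOT/2000/2000-01-01/msl/T6.npy
print(multi.as_posix())   # DATA_ROOT/2000/2000-01-01/q/50/T18.npy
```

Keeping the level optional lets one function serve both single-level and multi-level variables.
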

Train a model (example: FNP):

    python train.py --model FNP

Run inference with the trained checkpoint:

    python inference.py --model FNP

- Variables
  - Single-level: `t2m`, `u10`, `v10`, `msl`
  - Multi-level: `z`, `t`, `q`, `u`, `v` with pressure levels
- Time steps: by default `T0`, `T6`, `T12`, `T18` (6-hourly)
- File format: each `.npy` holds a single 2D grid for that variable/level/time
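
Because each `.npy` file holds one 2D grid, a full day for one variable can be assembled by stacking the default time steps. The sketch below is illustrative only: `load_day` is a hypothetical helper, and the 121 × 240 grid shape used for the synthetic demo files is an assumption rather than something the benchmark guarantees.

```python
import tempfile
from pathlib import Path

import numpy as np

DEFAULT_STEPS = ("T0", "T6", "T12", "T18")  # 6-hourly steps listed above


def load_day(var_dir: Path, steps=DEFAULT_STEPS) -> np.ndarray:
    """Stack the per-step 2D grids in var_dir into one (time, H, W) array."""
    grids = [np.load(var_dir / f"{step}.npy") for step in steps]
    return np.stack(grids, axis=0)


# Demo on synthetic files; the 121 x 240 grid shape is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    var_dir = Path(tmp) / "2000" / "2000-01-01" / "msl"
    var_dir.mkdir(parents=True)
    for i, step in enumerate(DEFAULT_STEPS):
        np.save(var_dir / f"{step}.npy", np.full((121, 240), float(i)))
    day = load_day(var_dir)

print(day.shape)  # (4, 121, 240)
```

For a multi-level variable, the same function can be pointed at a level directory such as `q/50/`.
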

We visualize the global t850 RMSE of the assimilation results from the baseline models. The results show the RMSE between each baseline and the ground truth on 2024-09-16, a day on which a typhoon was located over the East China Sea. They also show how extreme weather events affect the quality of data assimilation: the RMSE at the typhoon's position is clearly higher than in the surrounding area.

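
As a rough illustration of how such an error map and a single global score could be computed, here is a minimal sketch. It assumes arrays of shape `(time, lat, lon)`; the function names and the optional cos-latitude weighting are assumptions, since the benchmark's exact RMSE definition is not stated here.

```python
import numpy as np


def rmse_map(pred: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """Per-grid-point RMSE over the time axis; inputs are (time, lat, lon)."""
    return np.sqrt(np.mean((pred - truth) ** 2, axis=0))


def global_rmse(pred: np.ndarray, truth: np.ndarray, lat_deg=None) -> float:
    """Scalar RMSE over all points, optionally weighted by cos(latitude)."""
    sq_err = (pred - truth) ** 2
    if lat_deg is None:
        return float(np.sqrt(sq_err.mean()))
    weights = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    weights = weights / weights.mean()  # normalize so the weights average to 1
    return float(np.sqrt((sq_err * weights[None, :, None]).mean()))


# Synthetic demo: a uniform error of 2 plus a localized spike at one point.
truth = np.zeros((4, 3, 5))
pred = truth + 2.0
pred[:, 1, 2] += 3.0  # stands in for a localized extreme event

err = rmse_map(pred, truth)
hotspot = np.unravel_index(np.argmax(err), err.shape)
print(hotspot, err[hotspot])  # location and value of the largest error
```

A localized error stands out in `err` even when the scalar returned by `global_rmse` changes only slightly, which is why a map is more informative than a single number for events like typhoons.
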
| Model | SpecDiv ↓ | MSE ↓ | MAE ↓ | z500 RMSE ↓ | t850 RMSE ↓ | t2m RMSE ↓ | u10 RMSE ↓ | v10 RMSE ↓ | u500 RMSE ↓ | v500 RMSE ↓ | q700 RMSE ↓ (×10⁻⁴) | Imp ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Background | 0.153 | 2.88 | 8.61 | 45.455 | 0.7200 | 0.7790 | 0.9336 | 0.9645 | 1.7278 | 1.7535 | 6.7220 | — |
| Adas | 0.116 | 2.31 | 7.65 | 30.100 | 0.6750 | 0.7350 | 0.8400 | 0.8600 | 1.4950 | 1.4900 | 6.5400 | — |
| Multi-Adas | 0.060 | 2.20 | 7.30 | 27.800 | 0.6700 | 0.6900 | 0.7400 | 0.7400 | 1.4000 | 1.4200 | 6.3500 | 4.35% |
| ConvCNP | 0.125 | 2.49 | 7.98 | 31.253 | 0.6944 | 0.7662 | 0.8334 | 0.8553 | 1.5770 | 1.5876 | 6.5717 | — |
| Multi-ConvCNP | 0.123 | 2.44 | 7.82 | 30.628 | 0.6805 | 0.7510 | 0.8170 | 0.8380 | 1.5750 | 1.5560 | 6.5400 | 2.01% |
| FNP | 0.063 | 2.30 | 7.54 | 28.500 | 0.6985 | 0.7100 | 0.7650 | 0.7650 | 1.4350 | 1.4600 | 6.4698 | — |
| Multi-FNP | 0.059 | 2.16 | 7.09 | 26.790 | 0.6566 | 0.6674 | 0.7191 | 0.7191 | 1.3489 | 1.3724 | 6.0800 | 6.09% |
| VAE-VAR | 0.052 | 2.31 | 7.60 | 27.000 | 0.6970 | 0.7050 | 0.7560 | 0.7770 | 1.4500 | 1.4500 | 6.4700 | — |
| Multi-VAE-VAR | 0.048 | 2.13 | 6.99 | 24.840 | 0.6412 | 0.6486 | 0.6955 | 0.7148 | 1.3340 | 1.3340 | 5.9500 | 7.79% |
| SDA | 0.117 | 2.65 | 8.02 | 38.000 | 0.7100 | 0.7500 | 0.8800 | 0.9100 | 1.6500 | 1.7000 | 6.6100 | — |
| SLAM | 0.091 | 2.55 | 7.94 | 32.500 | 0.7020 | 0.7300 | 0.8000 | 0.7800 | 1.5000 | 1.4700 | 6.5000 | 3.77% |
| DBF | 1.48 | 2.79 | 8.42 | 40.25 | 0.713 | 0.765 | 0.923 | 0.943 | 1.632 | 1.645 | 6.654 | |
| Multi-DBF | 1.41 | 2.73 | 8.27 | 39.11 | 0.708 | 0.762 | 0.912 | 0.931 | 1.613 | 1.629 | 6.642 | |
| PhyDA | 0.031 | 2.28 | 7.53 | 26.866 | 0.668 | 0.703 | 0.755 | 0.764 | 1.425 | 1.445 | 6.469 | |
| Multi-PhyDA | 0.031 | 2.27 | 7.51 | 26.801 | 0.668 | 0.702 | 0.756 | 0.763 | 1.426 | 1.444 | 6.468 | |
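
The table does not define the `Imp` column. For several rows (e.g., FNP → Multi-FNP) the reported value is consistent with the relative MSE reduction of a model over the base model listed directly above it. Treating that reading as an assumption, a minimal sketch of the calculation follows; the function name is hypothetical.

```python
def relative_improvement(baseline_mse: float, improved_mse: float) -> float:
    """Relative MSE reduction in percent (assumed meaning of the Imp column)."""
    if baseline_mse <= 0:
        raise ValueError("baseline_mse must be positive")
    return 100.0 * (baseline_mse - improved_mse) / baseline_mse


# MSE values taken from the FNP and Multi-FNP rows of the table.
imp_fnp = relative_improvement(2.30, 2.16)
print(f"{imp_fnp:.2f}%")  # 6.09%
```
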
