Spinup-Evaluation provides a command-line tool and Python API for benchmarking the spin-up and restart performance of NEMO/DINO ocean models and machine learning emulators. It supports both single-run and comparison (reference) evaluation, and outputs detailed metrics and difference statistics.
Full documentation is available on Read the Docs.
- Flexible CLI: Evaluate restart and/or output files, with or without a reference simulation.
- Configurable: Uses a YAML config file (e.g., configs/DINO-setup.yaml) to map variables to files.
- Comparison Mode: Computes diffs, MAE, and RMSE between a simulation and a reference.
- Modern Output: Results are written as CSV files (one for restart, one for output).
- Test Suite: Integration and regression tests using real and subsampled NetCDF data.
- Extensible: Add new metrics by editing src/spinup_evaluation/metrics.py.
Requires Python ≥ 3.9
- Clone the repository

      git clone https://github.com/m2lines/nemo-spinup-evaluation.git
      cd nemo-spinup-evaluation

- Create and activate a virtual environment

      python3 -m venv .venv
      source .venv/bin/activate

- Install the package and dependencies

      pip install .

For developer installs, include development dependencies and enable pre-commit hooks:

    pip install -e .[dev]
    pre-commit install

Use the test suite to check integrity with real subsampled DINO data. Download the dataset using the script:

    ./tests/get-data.sh

Run all tests:

    pytest tests/

Spinup-Evaluation is designed to assess the quality and stability of ocean model spin-up and restart states, as well as time-averaged outputs. The evaluation workflow is flexible: you can analyse a single simulation, or compare a simulation against a reference (e.g., a previous spin-up, a control run, or a forecast). The tool supports both instantaneous (restart) and time-averaged (output) evaluation modes.
The diagram below (Figure 1) illustrates the typical evaluation procedure. Model output files (restart and/or time-averaged NetCDFs) are loaded and standardized according to the YAML config. Metrics are computed and, if a reference is provided, differences, MAE, and RMSE are calculated.
Spinup-Evaluation is often used alongside spinup-forecast, which automates the generation of machine learned spin-up states for NEMO/DINO models. Together, these tools provide a robust workflow for accelerating ocean spin-up.
Fig 1. Evaluation flow diagram showing the coupling to spinup-forecast. In principle, spinup-evaluation can be used to evaluate any ocean model, whether ML-based, numerical, or otherwise.
├── pyproject.toml                 Project metadata, dependencies, and build system
├── README.md                      Main project documentation (this file)
├── configs/                       Configuration files for variable/file mapping
│   └── DINO-setup.yaml            Example YAML config for DINO/NEMO variables
├── src/
│   └── spinup_evaluation/         Main Python package
│       ├── cli.py                 Command-line interface (CLI) entry point
│       ├── loader.py              Data loading and preprocessing utilities
│       ├── metrics_io.py          Output helpers (CSV writing, formatting)
│       ├── metrics.py             Metric calculation functions
│       ├── standardise_inputs.py  Input standardization helpers
│       └── utils.py               General utilities
├── tests/                         Test suite, test data, and data download scripts
│   └── get-data.sh                Script to fetch test data from THREDDS
└── results/                       Default output directory for metrics CSVs

The main entry point is src/spinup_evaluation/cli.py (or the installed spinup-eval script):

    python -m spinup_evaluation.cli \
      --sim-path <simulation_dir>              # Required: path to simulation directory
      [--ref-sim-path <reference_sim_dir>]     # Optional: path to reference simulation
      [--config configs/DINO-setup.yaml]       # Optional: YAML config file (default shown)
      [--results-dir results]                  # Optional: output directory (default shown)
      [--result-file-prefix metrics_results]   # Optional: output file prefix (default shown)
      [--mode output|restart|both]             # Optional: which metric suite(s) to run

| Argument | Description | Default |
|---|---|---|
| --sim-path | Path to the simulation directory | |
| --ref-sim-path | Path to a reference simulation directory to enable comparison | |
| --config | Path to the YAML config file | configs/DINO-setup.yaml |
| --results-dir | Directory to save output CSVs | results |
| --result-file-prefix | Prefix for output files | metrics_results |
| --mode | Which metric suite(s) to run: output, restart, or both | both |

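The argument table above can be mirrored with a small `argparse` parser. This is only a sketch for illustration: `build_parser` is a hypothetical name, and the real parser in cli.py may be organised differently. The flags and defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the actual cli.py)."""
    parser = argparse.ArgumentParser(prog="spinup-eval")
    parser.add_argument("--sim-path", required=True,
                        help="Path to the simulation directory")
    parser.add_argument("--ref-sim-path", default=None,
                        help="Path to a reference simulation directory")
    parser.add_argument("--config", default="configs/DINO-setup.yaml",
                        help="Path to the YAML config file")
    parser.add_argument("--results-dir", default="results",
                        help="Directory to save output CSVs")
    parser.add_argument("--result-file-prefix", default="metrics_results",
                        help="Prefix for output files")
    parser.add_argument("--mode", choices=["output", "restart", "both"],
                        default="both", help="Which metric suite(s) to run")
    return parser


# Only --sim-path is required; every other flag falls back to its default.
args = build_parser().parse_args(["--sim-path", "runs/my_sim"])
print(args.mode, args.results_dir, args.ref_sim_path)
```

Leaving `--ref-sim-path` unset in this sketch yields `None`, which is a natural way to represent single-run evaluation without a reference.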
Spinup-Evaluation supports three modes, controlled by the --mode argument:

restart
- Purpose: Evaluate a single model state (snapshot) from a NEMO/DINO restart.nc file.
- Input: restart.nc (and mesh_mask.nc)
- Use case: Assess the physical realism or convergence of a single model state, e.g., after a spin-up or forecast.
- Output: results/metrics_results_restart.csv (or your chosen prefix)
- Reference: If --ref-sim-path is provided, computes diffs/stats vs. a reference restart file.

output
- Purpose: Evaluate time-averaged or multi-time-step model output, typically from files like grid_T_3D.nc, grid_U_3D.nc, grid_V_3D.nc, grid_T_2D.nc.
- Input: Grid files as mapped in the config YAML (see below).
- Use case: Assess the mean state or variability over a period, or compare time-averaged fields between runs.
- Output: results/metrics_results_grid.csv (or your chosen prefix)
- Reference: If --ref-sim-path is provided, computes diffs/stats vs. a reference output set.

both
- Purpose: Run both restart and output metric suites in one command.
- Output: Both CSVs as above.

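The relationship between the mode and the CSV files it produces can be sketched as below. The file-name pattern (prefix plus `_restart.csv` or `_grid.csv`) comes from the mode descriptions above; the function `expected_csvs` and the `SUFFIX_BY_SUITE` mapping are hypothetical helpers written for this example, not part of the package.

```python
from pathlib import Path

# Suffixes as documented above: the "output" suite writes a *_grid.csv file.
SUFFIX_BY_SUITE = {"restart": "restart", "output": "grid"}


def expected_csvs(results_dir="results", prefix="metrics_results", mode="both"):
    """Hypothetical helper: list the CSV paths a given --mode should produce."""
    if mode not in ("restart", "output", "both"):
        raise ValueError(f"unknown mode: {mode!r}")
    suites = ["restart", "output"] if mode == "both" else [mode]
    return [Path(results_dir) / f"{prefix}_{SUFFIX_BY_SUITE[s]}.csv" for s in suites]


for path in expected_csvs(mode="both"):
    print(path.as_posix())
```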
The YAML config (e.g., configs/DINO-setup.yaml) maps variable names to NetCDF files. You can specify variables in two ways:

Simple form:

    output_variables:
      temperature: grid_T_3D.nc
      salinity: grid_T_3D.nc
      # ...

Behavior: The loader will try to infer the correct variable name (e.g., toce for temperature) from a list of likely candidates for each field.

Rich form:

    output_variables:
      temperature:
        file: grid_T_3D.nc
        var: toce
        time_from: density   # (optional) use time axis from another variable
      # ...

Behavior: You can explicitly specify the file, the variable name within the file, and optionally a time_from field to use the time axis from another variable.

You can mix and match simple and rich forms in the same config. The loader will handle both.
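One way a loader could treat the simple and rich forms uniformly is to normalise every entry to the rich shape first. The sketch below assumes the config has already been parsed into plain Python dicts; `normalise_entry` is a hypothetical function, not the package's actual loader, and a `var` of `None` here stands for "infer the variable name later".

```python
def normalise_entry(entry):
    """Hypothetical helper: convert a simple or rich config entry to the rich shape."""
    if isinstance(entry, str):
        # Simple form: only the file is given; the variable name is left to inference.
        return {"file": entry, "var": None, "time_from": None}
    if isinstance(entry, dict) and "file" in entry:
        # Rich form: keep explicit values, default the optional keys to None.
        return {
            "file": entry["file"],
            "var": entry.get("var"),
            "time_from": entry.get("time_from"),
        }
    raise ValueError(f"unsupported config entry: {entry!r}")


# A mixed config, as it might look after YAML parsing.
output_variables = {
    "temperature": "grid_T_3D.nc",
    "salinity": {"file": "grid_T_3D.nc", "var": "soce"},
}
normalised = {name: normalise_entry(e) for name, e in output_variables.items()}
print(normalised["temperature"])
print(normalised["salinity"])
```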
Note: Support for specifying temporal granularities and resampling (e.g., daily, monthly, seasonal means) is under active development and will be available in a future release.
Example config:

    mesh_mask: mesh_mask.nc
    restart_files: 'restart'
    output_variables:
      temperature: grid_T_3D.nc
      salinity:
        file: grid_T_3D.nc
        var: soce
      density: grid_T_3D.nc
      ssh: grid_T_2D.nc
      velocity_u: grid_U_3D.nc
      velocity_v: grid_V_3D.nc

Results are written as CSV files in the results directory, e.g.:
- results/metrics_results_restart.csv
- results/metrics_results_grid.csv

Each file contains metric values, and if a reference is provided, also includes:
- Reference metric values (prefixed with ref_)
- Differences (diff_*)
A separate file with MAE and RMSE statistics is also generated if a reference directory is provided.
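For reference, MAE and RMSE between a simulated and a reference series follow the standard definitions sketched below. This is a plain-Python illustration with made-up numbers; the function names and the sample values are assumptions, and the package itself may compute these statistics differently (for example per variable or over time).

```python
import math


def mae(sim, ref):
    """Mean absolute error between two equal-length sequences."""
    if len(sim) != len(ref) or not sim:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(s - r) for s, r in zip(sim, ref)) / len(sim)


def rmse(sim, ref):
    """Root mean square error between two equal-length sequences."""
    if len(sim) != len(ref) or not sim:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sim, ref)) / len(sim))


# Illustrative values only, not real model output.
sim_values = [1.0, 2.0, 4.0]
ref_values = [1.0, 3.0, 2.0]
diffs = [s - r for s, r in zip(sim_values, ref_values)]
print(diffs, mae(sim_values, ref_values), rmse(sim_values, ref_values))
```

RMSE weights large deviations more heavily than MAE, so reporting both helps distinguish a uniform small offset from a few large outliers.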
Add new metric functions to src/spinup_evaluation/metrics.py and update the metric function lists in cli.py as needed.
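As a rough illustration of that workflow, the sketch below defines a new metric and appends it to a list of metric functions. Everything here is hypothetical: the function signature, the `masked_mean` metric, and the `METRIC_FUNCTIONS` list are stand-ins, and the real signatures in metrics.py and the lists in cli.py should be checked before adapting it.

```python
def masked_mean(values, mask):
    """Hypothetical metric: mean of values at points where the mask is non-zero."""
    kept = [v for v, m in zip(values, mask) if m]
    if not kept:
        raise ValueError("mask selects no points")
    return sum(kept) / len(kept)


# Hypothetical stand-in for a metric function list such as those kept in cli.py.
METRIC_FUNCTIONS = []
METRIC_FUNCTIONS.append(masked_mean)

# Run every registered metric on illustrative data (1 = ocean, 0 = land).
values = [10.0, 20.0, 30.0]
mask = [1, 0, 1]
results = {fn.__name__: fn(values, mask) for fn in METRIC_FUNCTIONS}
print(results)
```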
This work builds on significant contributions by Etienne Meunier, whose efforts on the Metrics-Ocean repository laid the foundation for several components used here.
