Skip to content

Address PR97 review comments + RCMIP3 wire-in#102

Merged
benmsanderson merged 23 commits into
feat/fair2-ciceroscmpy2-adapters-and-runmode-nonforkfrom
address/pr97-comments
Jun 5, 2026
Merged

Address PR97 review comments + RCMIP3 wire-in#102
benmsanderson merged 23 commits into
feat/fair2-ciceroscmpy2-adapters-and-runmode-nonforkfrom
address/pr97-comments

Conversation

@benmsanderson
Copy link
Copy Markdown
Collaborator

Draft. Stacked on top of #97 to address @znicholls's review comments, plus a Track A canonical-RCMIP3 wire-in that came out of the back-and-forth on the file-format question. Discussion history (with @maritsandstad) lived on the fork at benmsanderson/openscm-runner#15 — that PR description has the long-form context.

Opening as a draft so reviewers can see the shape before we settle the remaining maintainership wording.


How the commits map to #97 review comments

Initial cleanup pass (commit 9ad17c3) — closes 13 of 21

Coverage fail_under dropped to 0 with a comment per @znicholls's followup.

Naming hygiene (commit 7e2cc7b)

CICERO_TO_FAIR2_SPECIESRCMIP_TO_FAIR2_SPECIES; _read_cicero_conc_file_read_rcmip_conc_file; docstring fixes throughout _concentrations_translator.py and adjacent files. Pure rename — addresses @maritsandstad's point that the adapter doesn't read CICERO-internal format anywhere; the files it consumes are RCMIP-shaped tabular files and the code was mislabelled.

CI matrix marker (commit fe2bd70)

test_modern_adapters.py marked faircicero2_only so it routes through env2 of @maritsandstad's tests-multi-env job (modern-major venv).

Bug fix: rf_sun_file (commit ed94023, bundled with Solar/Volcanic wire-in)

While wiring up the Solar/Volcanic side of the RCMIP3 work, I verified the upstream ciceroscm v2.1.2 API and found that the adapter was passing rf_solar_file in scendata but upstream's InputHandler only recognises rf_sun_file / rf_sun_data. Anything passed as rf_solar_* was silently dropped and the model fell back to the default solar_IPCC.txt. Renamed throughout (cfg key, scendata key, canonical-file resolver, test override).

Note: same latent bug exists in cscm-calibrate's own run_full_rcmip_protocol.py — flagged to @maritsandstad separately. The published RCMIP3 reference runs may have been using the default solar trajectory instead of the per-scenario solar_RCMIP_*_RCMIP3.txt.

Back-report Solar / Volcanic / Land use (commit bb39feb)

Closes @znicholls's _output_variables.py:158-162 comment. When the canonical RCMIP3 path is active, the adapter back-reports per-scenario Solar / Volcanic / Land use ERF trajectories into the output ScmRun. On the legacy path the request logs a warning and skips the back-report.


Canonical RCMIP3 wire-in (Track A — out of strict scope but felt right)

Both Albedo Change|{Land Use,Irrigation} on @znicholls's hard-coded-irrigation comment and the conc-file-format on @maritsandstad's review point at the same underlying issue: the adapter shouldn't be locked to per-bundle file conventions when a canonical published source exists. Wired both adapters to consume the canonical Zenodo 20430630 wide-tables when the new rcmip3_bundle_path cfg key is set. Legacy paths stay as fallback.

Three axes, three commits (c047e1c, c5787b0, 6affa5c) staged for review clarity:

Axis FaIR2 CICEROSCMPY2
Land use + Irrigation forcings _fill_land_use_from_rcmip3 reads per-CMIP7-category Land Use + Irrigation from input_datafiles_generation/data/; historical uses the published per-component breakdown. _build_rf_luc_data_from_rcmip3 populates rf_luc_data in-memory scendata (CICERO has no irrigation channel).
Solar / Volcanic forcings _fill_natural_from_rcmip3 reads per-scenario Effective Radiative Forcing|Natural|{Solar,Volcanic} from rcmip_phase3_forcing_v2.0.0.csv. _build_natural_data_from_rcmip3 populates rf_sun_data + rf_volc_data in-memory; upstream's InputHandler shape-coercion handles annual → multi-axis.
Emissions + Concentrations _rcmip3_to_fair_emissions_df + build_concentrations_df_from_rcmip3 consume rcmip_phase3_{emissions,concentrations}_v2.0.0.csv; existing user-ScmRun overlay still wins per-species. _merge_rcmip3_canonical_into_user unions canonical rows into user's ScmRun upfront; existing _build_hybrid_* overlay code is untouched.

Scenario→CMIP7-category resolver covers the SSP-RCP set with sensible defaults (ssp119/126→VL, ssp245→M, ssp370→H, ssp434→L, ssp460→ML, ssp534-over→LN, ssp585→HL) and accepts per-cfg scenario_to_category overrides. Native scen7-{cat} names pass through directly. historical / historical-cmip6 use the published per-component breakdowns.

canonicalise_rcmip3_variable handles the IAMC-naming differences (CO2 sub-sectors → MAGICC-style, intermediate |F-Gases|HFC|, |PFC| categories stripped) so the existing per-adapter species translation maps work without changes.


What's deliberately not done

Item Status Why
Maintainership wording for CICEROSCMPY2 + FaIRv2 (#3, #6) Deferred @maritsandstad ok with co-maintaining CICEROSCMPY2 ("baseline ok to be (co-)maintainer"). @chrisroadmap not yet pinged for FaIRv2 — would appreciate a tag here if you want to push that thread forward.
Hard-coded irrigation question (#17) Implicitly addressed by Track A The bundled irrigation_forcing_timebounds_cmip7.csv "M" hard-coding is gone on the canonical path. Adapter resolves per scenario, with override.
Concentrations / emissions ScmRun-only path Already exists build_concentrations_df_from_scmrun covers it; the new Track A path is just an additional baseline source.

Test coverage

  • Unit tests for the RCMIP3 reader, scenario→category resolver, canonicaliser, FaIR2 canonical-path builders, and CICEROSCMPY2 back-report helper all live in tests/unit/io_tests/test_rcmip3.py (29 tests).
  • Mini fixture under tests/test-data/rcmip3-mini/ is a 2-scenario × 3-variable slice of the real Zenodo CSVs, runs without external fetch.
  • Integration tests/integration/test_modern_adapters.py covers FaIR2 + CICEROSCMPY2 ED/CD across ssp126/245/370 with a pytest-regressions snapshot; marked faircicero2_only for @maritsandstad's tests-multi-env env2.
  • Legacy paths (existing bundle CSVs, ScmRun-only conc) are untouched — same code, no behaviour change for callers that don't opt into rcmip3_bundle_path.

🤖 Generated with Claude Code

benmsanderson and others added 12 commits June 3, 2026 10:07
Addresses the concrete code/test items from #97
review (still open: the four maintainership questions and the
back-reporting of input forcings in CICEROSCMPY2):

- magicc7: drop the **kwargs __init__ override; move the pymagicc
  availability check and magicc_scenario_setup init into _init_model
  where they belong on the new base.
- _protocol: drop @runtime_checkable; the structural Protocol stays
  for type-hint use. Remove the isinstance(adapter, AdapterLike)
  runtime check in run.py — duck-typing is enough.
- base.py: drop the default NotImplementedError
  from_native_distribution stub from _Adapter; adapters that ship a
  native bundle (FAIR2, CICEROSCMPY2) already define it themselves.
- run.py: raise ValueError when scenarios is None; require
  IamDataFrame-shaped input (no AttributeError fallback); accept
  either a dict (values: cfgs list or pre-built adapter) or an
  iterable of (name, cfgs) tuples and/or adapter instances, freely
  mixable; require model_name on adapter instances rather than
  falling back to type().__name__.
- ciceroscm_wrapper: trim the distutils-removal comment.
- test_modern_adapters: run all 3 mini-bundle members; use the full
  ssp126/ssp245/ssp370 trio; replace the "0 < 2100 GSAT < 7" smoke
  assert with a pytest-regressions snapshot of GSAT + total ERF at
  1850/1900/2000/2025/2050/2100; expand parametrize ids
  ("fair2-emissions-driven", etc).
- delete tests/unit/adapters/test_{ciceroscmpy2,fair2}.py per request.
- README: drop the "Supported climate models and run modes" and
  "Programmatic API" prose sections (live in code / notebooks).
- pyproject: drop fail_under to 0 with a comment to bump back up
  once the adapter rewrite lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marit's review of #97 corrected the
characterisation of the conc / em files the FaIR2 and
CICEROSCMPY2 adapters consume: they are RCMIP-format tabular
files, not a CICERO-internal format. This is the naming-only
first step of the follow-up to switch both adapters to consuming
the canonical RCMIP wide-tables from Zenodo 20430630.

- _concentrations_translator.py: rename CICERO_TO_FAIR2_SPECIES to
  RCMIP_TO_FAIR2_SPECIES; rename _read_cicero_conc_file to
  _read_rcmip_conc_file; rename local cicero_name to rcmip_name;
  update module + function docstrings to say RCMIP-format
  throughout.
- fair2_adapter.py: update the ValueError message to say
  RCMIP-format ``{scen}_conc_*`` instead of CICERO-format.
- ciceroscmpy2_adapter.py: update docstrings for
  ``historical_em_file`` / ``historical_conc_file`` and the
  _build_hybrid_emissions_data / _build_hybrid_concentrations_data
  helpers to say RCMIP-format.

CICERO-SCM-internal API names (the v1.1.x Fortran adapter, the
_OPENSCM_TO_CICERO_CONC map, _cicero_unit_to_pint, cicero_species
locals in the species-overlay code) are unchanged: those refer
to CICERO-SCM upstream conventions, not file format.

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shared reader module for the canonical RCMIP Phase 3 wide-table CSVs
(Zenodo 20430630 -- concentrations, emissions, forcing). Both
adapters will consume this in the follow-up commit so we can drop
the per-scenario tab-delimited input path the FaIR2 +
CICEROSCMPY2 adapters currently rely on, and the hard-coded
``_DEFAULT_LAND_USE_SCENARIO = "M"`` choice in FaIR2.

API:

- ``load_rcmip3(path, kind, scenarios=None, variables=None,
  region="World", model=None)`` returns a filtered wide-format
  DataFrame in the canonical layout (Model, Scenario, Region,
  Variable, Unit, Activity_Id, Type, Priority, Mip_Era, Version,
  1750, ..., 2500).
- Convenience wrappers ``load_rcmip3_concentrations`` /
  ``_emissions`` / ``_forcings`` for the three kinds.
- ``load_rcmip3_albedo_categories(path, category)`` reads the
  per-category Land Use + Irrigation series from the bundle's
  upstream ``input_datafiles_generation/data/Forcing_AFOLU_CO2.csv``
  and ``Forcing_irrigation_population_scale.csv`` files. Necessary
  because the canonical forcing CSV only publishes the lumped
  ``Albedo Change`` row for ssp scenarios; per-component breakdown
  for ssp inputs comes from the per-category upstream files.

Path resolution accepts either the bundle root, the
``RCMIP3_input_datafiles`` subdir, or a direct CSV path.

Mini fixture under ``tests/test-data/rcmip3-mini/`` is a 2-scenario
x 3-variable x 6-year slice of the real CSVs (no external fetch
needed for tests). Unit tests at ``tests/unit/io/test_rcmip3.py``
cover the filter surface, error paths, and metadata-column
validation.

No adapter wiring yet; that's the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orcings

Replaces the hard-coded ``_DEFAULT_LAND_USE_SCENARIO = "M"`` choice
in FaIR2's ``_fill_natural_forcings`` (and the equivalent
``rf_luc_file`` single-trajectory path in CICEROSCMPY2) with a
per-scenario lookup against the canonical RCMIP Phase 3 bundle
(Zenodo 20430630). Opt-in via the new ``rcmip3_bundle_path`` cfg
key on either adapter; legacy paths remain as the back-compat
default.

How it works:

- :func:`openscm_runner.io.rcmip3.resolve_scenario_category` maps
  the user's scenario name to a CMIP7 ScenarioMIP category
  (``VL`` / ``LN`` / ``L`` / ``ML`` / ``M`` / ``H`` / ``HL``). The
  defaults cover the published SSP-RCP scenario set (ssp119 -> VL,
  ssp126 -> VL, ssp245 -> M, ssp370 -> H, ssp434 -> L, ssp460 ->
  ML, ssp534-over -> LN, ssp585 -> HL); callers override via the
  ``scenario_to_category`` cfg key, and ``scen7-{cat}`` native
  CMIP7 names skip the map entirely.
- ``historical`` / ``historical-cmip6`` use the per-component
  ``Effective Radiative Forcing|Anthropogenic|Albedo Change|{Land
  Use,Irrigation}`` rows published directly in
  ``rcmip_phase3_forcing_v2.0.0.csv``; the per-category lookup is
  bypassed.
- Idealised scenarios (``protocol_land_use_forcing ==
  "constant_zero"``) keep zeroing out via the existing
  suppression mask in both adapters.
- Unknown scenarios log a warning and leave Land use / Irrigation
  at zero, with a pointer to ``scenario_to_category``.

FaIR2 surface:

- ``_fill_natural_forcings`` factors the Land use + Irrigation
  block out into two helpers: ``_fill_land_use_from_legacy_bundle``
  (existing behaviour, kept) and ``_fill_land_use_from_rcmip3``
  (new). The choice is driven by the ``rcmip3_bundle_path`` kwarg.
- Both call sites (``_run_native_cfgs`` -> ``_run_one_calibration``
  and ``_run_translated_cfgs``) pass the cfg-level
  ``rcmip3_bundle_path`` / ``scenario_to_category`` through.
- Translated-cfg path validates uniqueness of both kwargs across
  cfgs (same rule as ``emissions_bundle``).

CICEROSCMPY2 surface:

- ``_build_scendata_list`` grew a third LUC branch: when
  ``cfg.get("rcmip3_bundle_path")`` is set, populates the existing
  in-memory ``rf_luc_data`` scendata slot from
  :func:`_build_rf_luc_data_from_rcmip3` instead of writing
  ``rf_luc_file``. Idealised (zeros) and legacy (path) branches
  unchanged.
- ``_build_rf_luc_data_from_rcmip3`` returns a year-indexed
  single-column DataFrame matching the shape upstream's
  ``DistributionRun`` expects. CICEROSCM has no irrigation channel,
  so only Land Use is wired in; Irrigation is FaIR-only.

Solar / Volcanic deferred to a follow-up commit; this PR is
land-use only per Ben + Marit's scope.

Test fixture under ``tests/test-data/rcmip3-mini/`` expanded with
the historical Albedo Change|{Land Use,Irrigation} breakdown rows
so the new code paths are exercised end-to-end without external
fetch. Unit tests in ``tests/unit/io/test_rcmip3.py`` cover the
resolver, the per-category loader, and the historical-breakdown
canonical CSV path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…trix

Marit's tests-multi-env job (added in 0d9f5b1 / merged via #100) runs
two isolated venvs:

* env1 -- fair>=1.6,<2 + ciceroscm>=1.1,<2, selects
  ``-m "faircicero1x_only or magicc"``.
* env2 -- fair>=2,<3 + ciceroscm>=2,<3, selects
  ``-m "(faircicero2_only or magicc) and not faircicero1x_only"``.

Unmarked tests are caught by the main ``tests`` job, which currently
resolves to legacy majors via the poetry lockfile. The FaIRv2 +
CICEROSCMPY2 integration smoke tests need the modern majors, so route
the module through env2 via a module-level ``pytestmark``. The
existing per-test ``fair2_skip`` / ``cicero_skip`` markers stay so the
module still collects (and individually skips) under the main job's
legacy resolution.

The unit tests in tests/unit/io_tests/test_rcmip3.py stay unmarked --
they're pure pandas-based reader tests with no model dependency, so
the main ``tests`` job (which runs everything unmarked) covers them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ings

Extends the ``rcmip3_bundle_path`` cfg key wired in the previous
land-use commit so it also covers Solar + Volcanic effective
radiative forcings, reading per-scenario rows from
``rcmip_phase3_forcing_v2.0.0.csv``
(``Effective Radiative Forcing|Natural|{Solar,Volcanic}``) instead of
broadcasting a single trajectory from the calibration bundle's
``solar_forcing`` / ``volcanic_forcing`` CSVs.

Per-scenario coverage matters here: the canonical CSV's
``historical`` and ``ssp245`` solar rows differ at every overlapping
year (historical uses observations, the SSP-RCP rows use the
SSP-projection extension); the legacy single-trajectory broadcast
loses that distinction.

- ``_fill_natural_forcings``: branches on ``rcmip3_bundle_path``;
  delegates to ``_fill_natural_from_legacy_bundle`` (existing
  single-CSV-broadcast) or the new ``_fill_natural_from_rcmip3``
  (per-scenario CSV lookup with ``natural_mask`` suppression and a
  warning-then-zero fallback for scenarios with no canonical row).
- Docstring reorganised: both Solar/Volcanic and Land use/Irrigation
  now describe a "canonical RCMIP3" and "legacy bundle" path; the
  former two columns of the structure mirror each other now.

CICEROSCMPY2 Solar+Volcanic is **not** in this commit. Two reasons:

1. Upstream ``ciceroscm`` v2 uses ``rf_sun_*`` (not ``rf_solar_*``)
   as the canonical solar key; the openscm-runner adapter currently
   passes ``rf_solar_file`` which upstream silently ignores in favour
   of its default ``solar_IPCC.txt``. This is a pre-existing bug
   that's orthogonal to the RCMIP3 wire-in.
2. Upstream's volcanic input expects a 12-column monthly DataFrame
   (``rf_volc_n_data`` / ``rf_volc_s_data``); the canonical RCMIP3
   CSV is annual. Translating annual -> monthly cleanly needs a
   physical choice (broadcast vs. equally distribute) that I'd
   rather make in a separate commit after talking with Marit.

Test fixture already covers the historical + ssp245 Solar/Volcanic
rows (added in the land-use commit). New unit tests in
tests/unit/io_tests/test_rcmip3.py verify the per-scenario rows
exist for both natural-forcing components and that the historical /
ssp245 trajectories differ at overlapping years.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al + fix rf_sun_file key

Two changes bundled because they touch overlapping lines:

(1) Fix pre-existing key-name bug: the adapter has been passing
``rf_solar_file`` to scendata, but upstream ``ciceroscm`` v2.x only
recognises ``rf_sun_file`` / ``rf_sun_data`` in its
``InputHandler.set_sun_volc_luc_defaults`` and ``get_data`` paths.
Anything passed under ``rf_solar_*`` was silently dropped and the
model fell back to its bundled default ``solar_IPCC.txt``. Renamed
the cfg key + scendata key + canonical-file-resolver entry to
``rf_sun_file`` to match upstream. (The same latent bug exists in
cscm-calibrate's own ``run_full_rcmip_protocol.py``; flagged for
Marit.)

(2) Wire Solar + Volcanic into the canonical RCMIP3 path that was
added for Land use in the previous commit. When the
``rcmip3_bundle_path`` cfg key is set, the scendata builder now
populates ``rf_sun_data`` and ``rf_volc_data`` (in-memory year-indexed
DataFrames) from per-scenario
``Effective Radiative Forcing|Natural|{Solar,Volcanic}`` rows in
``rcmip_phase3_forcing_v2.0.0.csv`` instead of pointing at the
calibration-bundle paths. Upstream's
``InputHandler.set_sun_volc_luc_defaults`` propagates the single
``rf_volc_data`` to ``rf_volc_n_data`` / ``rf_volc_s_data``
automatically; the input handler's shape-coercion (1D annual ->
``arr[:, None]`` for volcanic) accepts annual single-column
DataFrames without needing a monthly broadcast.

Legacy file path stays as the back-compat default when
``rcmip3_bundle_path`` is not set; idealised scenarios still go
through upstream's ``sunvolc=0`` suppression as before.

Behaviour change for users who currently pass ``rf_solar_file=...``
to ``CICEROSCMPY2.from_native_distribution(...)``: that key is now
``rf_sun_file=...``. Anything passed under the old key was already
being silently dropped, so no working behaviour is lost; this just
makes the key wire-up explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on RCMIP3 path

Closes Zeb's review comment on
``_output_variables.py:158-162`` ("I'd suggest just back-reporting
this anyway, makes life easier for users"). CICERO-SCM v2.x doesn't
expose Solar / Volcanic / Land-use albedo as diagnostic outputs --
the model treats them as forcing inputs read from the bundle. The
adapter now echoes the per-scenario input trajectories back into
the output ScmRun when the canonical RCMIP3 path is active, so
users can request them alongside diagnostics like total ERF or GSAT.

Changes:

- ``_output_variables.RCMIP3_BACK_REPORTABLE_VARIABLES``: new
  frozenset of the three back-reportable variable names.
  ``all_supported()`` includes it (the validator accepts requests
  for these names on either path).
- The "input-side forcing" entry in ``_STRUCTURAL_LIMITS`` was
  already removed in Marit's earlier cleanup commit; the module
  docstring's "What CICERO-SCM v2.x cannot produce" list now drops
  Solar/Volcanic/Land use and the "What it supports" list mentions
  the back-report.
- ``_run_one_distribution`` splits ``output_variables`` into
  upstream-produced + back-reportable. The back-reportable subset
  is stripped from the upstream call and synthesised after the
  model run from the scendata's ``rf_sun_data`` / ``rf_volc_data``
  / ``rf_luc_data`` trajectories. On the legacy path (without
  ``rcmip3_bundle_path``) the back-report is skipped with a
  warning.
- New helper ``_build_rcmip3_backreport_scmrun`` constructs the
  back-report ScmRun with the same metadata schema upstream's
  ``CSCMParWrapper`` uses (``climate_model``, ``model``, ``run_id``,
  ``scenario``, ``region``, ``variable``, ``unit``, then year
  columns). Returns ``None`` when no scendata trajectory matches
  the request (e.g. all-idealised Land-use request).
- Two unit tests in ``tests/unit/io_tests/test_rcmip3.py`` cover
  the happy path (10-row shape across 2 scenarios x 2 cfgs x
  partial-coverage variables) and the empty-fallback path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nonical RCMIP3

Track A wire-in: both adapters now consume the canonical
``rcmip_phase3_emissions_v2.0.0.csv`` and
``rcmip_phase3_concentrations_v2.0.0.csv`` from the Zenodo 20430630
bundle as the per-scenario baseline when ``rcmip3_bundle_path`` is
set. User ScmRun overlays still win per-species; legacy paths
preserved as fallback.

Shared canonicaliser:

- :func:`openscm_runner.io.canonicalise_rcmip3_variable` translates
  canonical RCMIP3 variable names to openscm-runner conventions.
  CO2 sub-sectors ("Emissions|CO2|AFOLU",
  "Emissions|CO2|Energy and Industrial Processes") get the MAGICC-
  style suffix; F-gas / PFC / HFC / Halon / Montreal intermediate
  IAMC categories are stripped so the leaf species name ends up
  adjacent to the "Emissions|" or "Atmospheric Concentrations|"
  prefix. Variables that don't need translation round-trip
  unchanged.

FaIR2:

- ``_emissions_translator._rcmip3_to_fair_emissions_df``: new helper
  that reads canonical CSV, canonicalises variables, and reshapes to
  FaIR's horizontal form (one row per (scenario, FaIR-species)).
- ``build_emissions_df`` grows an ``rcmip3_bundle_path`` kwarg.
  When set, the baseline comes from the canonical CSV (per-scenario
  rows) instead of the bundle's single ``historical_emissions_*.csv``.
  ``_splice_bundle_with_user`` grows a ``bundle_per_scenario`` flag
  to skip the relabel step when the baseline is already
  per-scenario.
- ``_concentrations_translator.build_concentrations_df_from_rcmip3``:
  new third route alongside the existing per-scenario tab-delimited
  reader and the in-memory ScmRun reader. Reads canonical CSV +
  canonicalises + translates leaf species via RCMIP_TO_FAIR2_SPECIES.
- Adapter dispatch: both ``_run_one_calibration`` and
  ``_run_translated_cfgs`` pass ``rcmip3_bundle_path`` through to
  the translators; conc-driven-mode validation accepts
  ``rcmip3_bundle_path`` as a third way to satisfy the
  "need concentrations from somewhere" check.

CICEROSCMPY2:

- ``_merge_rcmip3_canonical_into_user``: new helper that reads
  canonical emissions + concentrations CSVs, canonicalises variable
  names, and unions them into the user's ScmRun upfront.
  Deduplication is per ``(scenario, variable)`` -- user rows win
  when both supply the same key. The existing
  ``_build_hybrid_emissions_data`` and
  ``_build_hybrid_concentrations_data`` see the merged ScmRun and
  work unchanged: their ``cicero_comp_dict`` suffix-matching on
  ``f"Emissions|{openscm_suffix}"`` now picks up the canonicalised
  MAGICC-style names.
- ``_build_scendata_list`` calls the merge at the top when
  ``rcmip3_bundle_path`` is set; the rest of the per-scenario loop
  is untouched.

Tests:

- ``tests/unit/io_tests/test_rcmip3.py`` grows tests for the
  canonicaliser (10 parametrised cases covering flat / CO2-sector /
  PFC-nested / F-Gases-nested round-trips) and for the FaIR2
  canonical-RCMIP3 emissions + concentrations builders against the
  mini fixture.

Legacy paths unchanged: callers that don't set ``rcmip3_bundle_path``
keep the existing behaviour (read bundle historical CSV, or read
per-scenario tab-delimited conc files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@benmsanderson
Copy link
Copy Markdown
Collaborator Author

Supersedes #101 (which was opened from benmsanderson/openscm-runner:address/pr97-comments and couldn't access org secrets, so the MAGICC download step was failing every CI job). This PR's head is the same branch pushed directly to openscm/openscm-runner so the test matrix can actually run.

benmsanderson and others added 10 commits June 4, 2026 23:12
Five categories of failure surfaced when the test matrix actually ran:

1. ``NameError: name 'pd' is not defined`` in
   ``_build_rcmip3_backreport_scmrun``. pandas was used inside the
   helper but the module had only local imports; promote ``import
   pandas as pd`` to module level.

2. Pre-existing unit tests in ``tests/unit/test_{run,installation}.py``
   pass ``scenarios="not used"`` (a string) on the way to exercising
   out_config or adapter-import errors. The strict
   ``scenarios.get_unique_meta`` validation I added in PR97 ran
   first and crashed with ``AttributeError`` before the
   intentional error could fire. Move the scenarios validation
   AFTER ``_check_out_config`` and the adapter-construction loop so
   cfg-shape and missing-package errors keep their original timing.
   Scenarios=None still raises ValueError as Marit requested -- just
   later in the function.

3. Four dummy-adapter dispatch tests in ``tests/unit/test_run.py``
   pass ``scenarios=None`` expecting the legacy "scenarios optional"
   contract. Updated to use a new ``_dummy_scenarios()`` helper that
   returns a minimal ScmRun (one ``Surface Temperature`` row -- the
   variable doesn't start with ``Emissions|`` so
   ``check_variables_are_as_expected`` no-ops).

4. ``test_modern_adapters.py`` was expanded in the initial cleanup
   to ssp126/ssp245/ssp370 per Zeb's "more scenarios" review
   comment. The CICERO mini-bundle only ships ssp245 conc files
   though, so the FaIRv2 CD path crashed with ``IndexError`` in
   ``fair.fill_from_pandas`` for ssp126/ssp370 (no matching rows
   in the conc_df). Scoped back to ssp245 with a TODO comment;
   expanding the mini bundles to cover the full trio is a follow-up.

5. Several "snapshot not found" failures from pytest-regressions
   on its first run are expected -- the framework writes the
   reference CSVs on the first run and the next CI run will
   compare against them. Not a code fix; will commit the snapshots
   in a follow-up commit once CI generates them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sts instead

Previous fix moved the scenarios check after adapter construction so
``test_no_fair`` and ``test_no_pymagicc`` (which pass scenarios as a
literal string) would reach the missing-package ImportError. That
broke ``test_run_rejects_unknown_emissions_variables`` which uses an
unregistered ``"model_a"`` cfg and expects the bad-emissions
``ValueError`` from scenarios validation to fire BEFORE the
``NotImplementedError`` from get_adapter.

Inverting the trade-off: the correct order is
``cfg-shape -> scenarios -> adapter construction`` so user-input
errors are surfaced before package-import errors. ``test_no_fair`` /
``test_no_pymagicc`` / ``test_run_out_config_conflict_error`` updated
to use a minimal ScmRun (Surface Temperature row -- bypasses
``check_variables_are_as_expected`` since the variable name doesn't
start with ``Emissions|``) so the rest of the test runs as
intended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…carbon-cycle fix

Marit raised this in a chat after the original PR97 round: openscm-
runner shouldn't be monkey-patching upstream ciceroscm internals when
we (collectively) maintain ciceroscm too. The patches in
``_upstream_patches.py`` worked around the
``"carbon_cycle_outputs" in cfg`` presence-check bug in
``concentrations_emissions_handler.py`` of ciceroscm 2.0.1 / 2.1.0.
Upstream fixed it to ``cfg.get("carbon_cycle_outputs")`` in 2.1.1.
Our monkey-patches are now wrapping code that no longer exists --
at best a no-op, at worst a maintenance liability that will silently
break on the next upstream API drift.

- Delete ``_upstream_patches.py`` and the ``_apply()`` call from the
  package ``__init__``.
- Move ``_CARBON_CYCLE_VARIABLES`` and ``output_vars_need_carbon_cycle``
  to ``_output_variables.py`` -- they're still useful for the INFO
  log when users request carbon-cycle variables (the 30-50x slowdown
  is real regardless of the patches).
- Add ``require_modern_ciceroscm()`` in ``_compat.py`` that raises a
  clear ``ImportError`` if the installed ciceroscm is below 2.1.1.
  Used by ``CICEROSCMPY2._init_model`` in place of the previous
  major-version-only check.
- Add ``_ciceroscm_version_tuple()`` for fine-grained checks (the
  integration test's ``cicero_skip`` marker now uses this so it
  collects-but-skips on 2.0.1 / 2.1.0).
- Bump the env2 install pin in ``.github/workflows/ci.yaml`` from
  ``ciceroscm>=2,<3`` to ``ciceroscm>=2.1.1,<3``. The pyproject pin
  stays at ``>=1.1,<3`` because the ``[ciceroscmpy]`` extra still
  needs the v1.1.x major; runtime enforcement is the cleaner split.

The deletion is intentionally larger than the additions:
the carbon-cycle fix lives upstream now, and that's where it
should stay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o/forcing paths

After Marit's PR97 review (and a follow-up chat about ducked-error
patterns), the additive "legacy paths preserved alongside canonical
RCMIP3" approach from earlier commits became dead weight. This commit
deletes the legacy paths and makes ``rcmip3_bundle_path`` mandatory
for both modern adapters.

API shift on ``from_native_distribution`` -- now takes two explicit
required args:

    FAIR2.from_native_distribution(calibration_dir, rcmip3_bundle_path)
    CICEROSCMPY2.from_native_distribution(calibration_dir, rcmip3_bundle_path)

``calibration_dir`` carries model-specific calibration only (parameter
posterior, species_configs / gaspam, natural-emissions baselines,
tuning scale factors). ``rcmip3_bundle_path`` carries the canonical
Zenodo 20430630 protocol inputs. The two are different things and
shouldn't be conflated; the previous ``native_distribution_path``
single-arg API + ``**cfg_overrides`` was conflating them. The new
signature has no ``**cfg_overrides`` -- file-path overrides for
protocol files are gone because the canonical CSV is the source of
truth, and the surviving non-path knobs (``member_indices``, ``mode``,
``stochastic_run``, ``max_workers``) are explicit kwargs.

Code deletions:

- ``_concentrations_translator.build_concentrations_df`` (per-scenario
  tab-delimited reader), ``_read_rcmip_conc_file``, ``_pick_conc_file``.
- ``_emissions_translator``: ``bundle_emissions_csv`` arg on
  ``build_emissions_df`` dropped; ``_splice_bundle_with_user``'s
  ``bundle_per_scenario`` flag dropped (always true now).
- ``fair2_adapter._fill_natural_from_legacy_bundle``,
  ``_fill_land_use_from_legacy_bundle``, ``_DEFAULT_LAND_USE_SCENARIO``,
  ``fair2_conc_bundle_dir`` cfg key, the ``f.fill_from_rcmip()``
  fallback in translated-cfg mode, the single-trajectory ``_write``
  helper inside ``_fill_natural_forcings``.
- ``ciceroscmpy2_adapter`` ``_DEFAULT_CANONICAL_FILES``: drops
  ``rf_sun_file`` / ``rf_volc_file`` / ``rf_luc_file``. Scendata
  builder always populates ``rf_sun_data`` / ``rf_volc_data`` /
  ``rf_luc_data`` from the canonical RCMIP3 path; the previous
  three-branch (idealised / canonical / legacy) logic collapses to
  two (idealised / canonical).
- ``_native_calibration._BUNDLE_FILES``: drops
  ``historical_emissions``, ``solar_forcing``, ``volcanic_forcing``,
  ``land_use_forcing``, ``irrigation_forcing``. Calibration bundle
  now carries calibration-only files.

Test fixture cleanup:

- ``tests/test-data/fair2-mini-bundle/``: deleted
  ``historical_emissions_1750-2023_cmip7.csv``,
  ``solar_forcing_timebounds_cmip7.csv``,
  ``volcanic_forcing_timebounds_cmip7.csv``,
  ``land_use_forcing_timebounds_cmip7.csv``,
  ``irrigation_forcing_timebounds_cmip7.csv``. Calibration files
  (parameter posterior, species_configs, scale factors) stay.
- ``tests/test-data/ciceroscm-mini-bundle/``: deleted
  ``solar_RCMIP_ssp245_RCMIP3.txt``,
  ``VOLC_RCMIP_ssp245_RCMIP3.txt``,
  ``LUCalbedo_RCMIP_ssp245_RCMIP3.txt``. Renamed
  ``ssp245_{em,conc}_*.txt`` to ``historical_{em,conc}_*.txt`` to
  match the canonical resolver expectation (those files anchor
  the CICERO species column structure + units; the actual scenario
  time series come from the canonical RCMIP3 merge).
- ``tests/integration/test_modern_adapters.py`` factory functions
  use the new two-arg ``from_native_distribution`` and drop the
  ``_CICERO_SSP245_OVERRIDES`` cfg-override dict.

Net: -7163 / +163 LOC across these files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… on canonical-bundle misses

Marit and Zeb both flagged the cluster of "warn and fall back to zero"
patterns introduced during the canonical-RCMIP3 wire-in: missing
scenario rows, unmapped scenario->category, unconvertible units. Each
one masks a real bug in the input (or in the bundle) while letting
the run produce plausible-but-wrong output -- the "AI cover all your
bases" objection.

Convert these to loud failures now that the canonical-only policy
makes "no fallback" the contract:

  FaIR2:
  - ``_fill_natural_from_rcmip3``: missing Solar/Volcanic row raises
    KeyError instead of warning and zeroing the species.
  - ``_fill_land_use_from_rcmip3``: ``resolve_scenario_category``
    KeyError propagates (caller must add an override); missing
    historical Albedo Change rows raise.
  - ``_emissions_translator._unit_scale``: drop the broad
    ``except Exception -> None`` and the caller's
    "skip-this-species" warning; let pint's
    UndefinedUnitError / DimensionalityError surface.

  CICEROSCMPY2:
  - ``_build_natural_data_from_rcmip3`` /
    ``_build_rf_luc_data_from_rcmip3``: same canonical-row raises as
    FaIR2; ``resolve_scenario_category`` KeyError propagates.
  - ``_run_one_distribution``: requesting back-reportable variables
    without ``rcmip3_bundle_path`` set now raises ValueError instead
    of silently dropping them from the output.
  - ``_build_hybrid_emissions_data``: drop the broad
    ``except Exception -> skipped_unmapped`` on unit conversion;
    fail loudly if a species mapping has a bad target unit or the
    user ScmRun ships an unparseable unit.

Also refresh the CICEROSCMPY2 module docstring's cfg-key reference
section -- it still listed the ``rf_sun_file`` /
``rf_volc_file`` / ``rf_luc_file`` / ``rf_luc_constant_zero_file``
keys removed in the previous commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pts the frame

The first env2 CI run on the canonical-only branch failed both
FaIRv2 smoke tests with::

    fair.exceptions.NonMonotonicError: Time values in the emissions
    file must be strictly monotonically increasing.

Root cause is in ``_splice_bundle_with_user``: the bundle frame may
not cover every year the user supplies (the mini-bundle fixture
trims to a 6-year subset; the full canonical CSV covers 1750-2500
yearly but a user can in principle drop in extra years too). When
the user introduces a year the bundle did not have, the assignment
``spliced_df.loc[mask, year] = value * scale`` (or the
``pd.concat`` path for unmapped-species rows) parks the new column
at the right edge in user-row order -- which is typically not
monotonic relative to the bundle tail (mini-bundle ends at 2100,
user adds 2015 next).

Fix: after the ffill, reindex the frame columns to
``meta_cols + sorted(year_cols)`` so year columns are strictly
monotonic before the frame leaves the translator. Mirrors the same
expectation FaIR's ``_check_csv`` enforces a few lines later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sions-mode species

The first env2 CI run on the canonical-only branch failed both
FaIRv2 smoke tests at ``fill_from_pandas`` with::

    IndexError: index 0 is out of bounds for axis 0 with size 0

Root cause: the rcmip3-mini fixture ships only 3 emissions species
(CO2 FFI + CO2 AFOLU + CH4 for historical + ssp245). FaIR 2.x's
AR6-calibrated species_configs has 51 species with
``input_mode == "emissions"``. When ``fill_from_pandas`` iterates
the full species list, every species we did not supply hits the
empty-row ``.values[0]`` at fair.io.fill_from:219.

In CD mode only CO2 / CH4 / N2O flip to ``concentration``; the
other 48 species still need an emissions row. So both ED and CD
need the full 51-species coverage.

Fix: regenerate the mini fixture's emissions CSV with placeholder
0-emissions rows for every other FaIR species, using the canonical
RCMIP variable names (``Emissions|<species>`` for everything except
CO2 sub-sectors which keep ``Emissions|CO2|AFOLU`` /
``Emissions|CO2|Energy and Industrial Processes``). Values for
CH4 + CO2 FFI + CO2 AFOLU are preserved verbatim; the added
species use 0 across all years (mini fixture is for shape /
smoke only, not scientific reproduction). Historical rows leave
2050/2100 empty to match the existing ssp-vs-historical pattern.

The fixture-shape unit test
``test_fair2_emissions_canonical_path_translates_variables`` got
its hard species-set assertion replaced with a
``.issubset`` check covering the sub-sector + intermediate-category
representatives the translator actually has to handle, so it
remains a meaningful regression on the canonicalisation logic
without re-asserting the exact (now-expanded) fixture content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… to pint errors

The commit 3 broad-except removal caused the cicero env2 smokes to
fail with::

    pint.errors.UndefinedUnitError: 'GgH1211' is not defined in
    the unit registry

Root cause: ``_cicero_unit_to_pint`` reproduces the v1.1.x adapter
gaspam convention of concatenating prefix + species name
(``Gg`` + ``H1211`` -> ``GgH1211``) for the unit string. Most
species this round-trips through openscm-units fine, but Halons
are recorded in the gaspam as ``H-1211`` / ``H-1301`` / ``H-2402``
rather than ``Halon-1211`` etc., so the concatenated form is not
in openscm-units' registry. ``GgHalon1211`` would be, but the
gaspam-naming round-trip would need to happen in the
``_cicero_unit_to_pint`` helper, not in this loop.

The "honest errors" stance from commit 3 was directed at the
canonical-RCMIP3 path -- silently zeroing scientific data when a
bundle row is missing was Marit's concern. The legacy hybrid
emissions overlay is a different situation: the user-side mapping
is best-effort, the species' baseline trajectory stays in place
(not zeros), and the convention pre-dates this PR. Narrow the
except to ``UndefinedUnitError`` / ``DimensionalityError`` only so
real bugs still raise, but the Halon naming quirk doesn't break
every cicero smoke run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…test_modern_adapters

After the canonical-RCMIP3 wire-in (commit 3a07afb) and subsequent
fixture / error-handling fixes, all four ``test_adapter_smoke``
parameterisations run end-to-end through their respective climate
models. The only remaining failure is the pytest-regressions
first-run "File not found in data directory, created:" -- the
snapshot CSV gets written but the test fails so the runner can
notice and commit it.

CI has no way to surface that CSV at the moment, so we cannot
baseline the snapshots without re-running the env2 stack
somewhere we control. Add an ``actions/upload-artifact@v4`` step
gated on ``if: failure()`` that captures every CSV under
``tests/integration/test_modern_adapters/``. After this lands the
next CI run will publish four CSVs as a workflow artifact; those
get committed in a follow-up and the smoke tests turn green.

``if-no-files-found: ignore`` so non-pytest-regressions failures
don't trip an extra error on top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ts from first CI run

CSVs generated by the env2 job on commit 0a33faf and pulled via
the artifact upload step added in that commit. Identical across
py3.10 and py3.11 (deterministic per-member runs), so one set
suffices.

Covers all four parameterised cases:
  - test_adapter_smoke_fair2_emissions_driven_.csv
  - test_adapter_smoke_fair2_concentration_driven_.csv
  - test_adapter_smoke_cicero_emissions_driven_.csv
  - test_adapter_smoke_cicero_concentration_driven_.csv

After this lands the env2 CI step turns green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@benmsanderson benmsanderson marked this pull request as ready for review June 5, 2026 17:15
@benmsanderson benmsanderson requested a review from Copilot June 5, 2026 17:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR follows up on #97 by tightening the openscm_runner.run.run() input surface, wiring FaIRv2 + CICERO-SCM-PY2 to consume canonical RCMIP3 (Zenodo 20430630) wide-table inputs, and reshaping tests/CI to exercise the modern adapters via a dedicated environment.

Changes:

  • Add a new openscm_runner.io.rcmip3 module for loading/filtering canonical RCMIP3 wide-table CSVs (plus a mini in-repo fixture) and use it in FaIRv2 emissions/concentrations pathways.
  • Update run.run() to accept dict or iterable inputs (including pre-constructed adapters) and enforce scenarios as required (ScmRun-like) for variable validation.
  • Refresh integration/unit tests and CI env2 (fair>=2, ciceroscm>=2.1.1) to validate the canonical-path wiring with regression snapshots.

Reviewed changes

Copilot reviewed 44 out of 46 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/unit/test_run.py Updates unit tests to satisfy the new scenarios validation contract.
tests/unit/test_installation.py Adds minimal ScmRun helper so missing-package tests fail at adapter import, not scenario validation.
tests/unit/io_tests/test_rcmip3.py New unit tests for RCMIP3 reader, canonicaliser, resolver, and adapter helper behavior.
tests/unit/adapters/test_fair2.py Removes older FaIRv2 unit-test module (coverage shifts to new tests/integration).
tests/unit/adapters/test_ciceroscmpy2.py Removes older CICEROSCMPY2 unit-test module (coverage shifts to new tests/integration).
tests/test-data/rcmip3-mini/rcmip_phase3_forcing_v2.0.0.csv Adds mini canonical forcing CSV fixture.
tests/test-data/rcmip3-mini/rcmip_phase3_emissions_v2.0.0.csv Adds mini canonical emissions CSV fixture.
tests/test-data/rcmip3-mini/rcmip_phase3_concentrations_v2.0.0.csv Adds mini canonical concentrations CSV fixture.
tests/test-data/rcmip3-mini/input_datafiles_generation/data/Forcing_irrigation_population_scale.csv Adds per-category irrigation split fixture for albedo breakdown.
tests/test-data/rcmip3-mini/input_datafiles_generation/data/Forcing_AFOLU_CO2.csv Adds per-category land-use albedo split fixture.
tests/test-data/fair2-mini-bundle/volcanic_forcing_timebounds_cmip7.csv Removes legacy bundle volcanic forcing CSV (now sourced via RCMIP3).
tests/test-data/ciceroscm-mini-bundle/solar_RCMIP_ssp245_RCMIP3.txt Removes legacy per-scenario solar forcing file fixture.
tests/test-data/ciceroscm-mini-bundle/LUCalbedo_RCMIP_ssp245_RCMIP3.txt Removes legacy per-scenario land-use albedo file fixture.
tests/integration/test_modern_adapters/test_adapter_smoke_fair2_emissions_driven_.csv New pytest-regressions snapshot for FaIRv2 emissions-driven smoke.
tests/integration/test_modern_adapters/test_adapter_smoke_fair2_concentration_driven_.csv New pytest-regressions snapshot for FaIRv2 concentration-driven smoke.
tests/integration/test_modern_adapters/test_adapter_smoke_cicero_emissions_driven_.csv New pytest-regressions snapshot for CICEROSCMPY2 emissions-driven smoke.
tests/integration/test_modern_adapters/test_adapter_smoke_cicero_concentration_driven_.csv New pytest-regressions snapshot for CICEROSCMPY2 concentration-driven smoke.
tests/integration/test_modern_adapters.py Reworks modern-adapter integration testing into regression snapshots and routes via env2 marker.
src/openscm_runner/run.py Supports mixed input shapes (dict/iterable + adapters), normalises entries, and enforces scenarios non-None.
src/openscm_runner/io/rcmip3.py New canonical RCMIP3 wide-table reader + variable canonicaliser + scenario-category resolver.
src/openscm_runner/io/init.py Exposes RCMIP3 I/O utilities from the openscm_runner.io package.
src/openscm_runner/adapters/magicc7/magicc7.py Moves pymagicc availability check/setup into _init_model.
src/openscm_runner/adapters/fair2_adapter/_native_calibration.py Removes scenario inputs/forcings from the FaIR calibration bundle contract (now RCMIP3-driven).
src/openscm_runner/adapters/fair2_adapter/_emissions_translator.py Reads baseline emissions from canonical RCMIP3 and tightens unit-conversion behavior.
src/openscm_runner/adapters/fair2_adapter/_concentrations_translator.py Adds canonical-RCMIP3 concentration builder and removes legacy conc-file reader path.
src/openscm_runner/adapters/ciceroscm_py2_adapter/_upstream_patches.py Deletes now-unneeded upstream monkey-patches (requires ciceroscm>=2.1.1 instead).
src/openscm_runner/adapters/ciceroscm_py2_adapter/_output_variables.py Documents/validates supported outputs and adds back-reportable RCMIP3 forcing variables.
src/openscm_runner/adapters/ciceroscm_py2_adapter/_compat.py Adds version parsing + enforces ciceroscm>=2.1.1.
src/openscm_runner/adapters/ciceroscm_py2_adapter/init.py Updates package docs to reflect the ciceroscm>=2.1.1 floor and removed patches.
src/openscm_runner/adapters/ciceroscm_py_adapter/ciceroscmpy.py Removes redundant __init__ passthrough.
src/openscm_runner/adapters/ciceroscm_adapter/ciceroscm.py Minor exception/message tweak + security linter annotation change.
src/openscm_runner/adapters/ciceroscm_adapter/ciceroscm_wrapper.py Shortens distutils migration comment in tempdir setup.
src/openscm_runner/adapters/base.py Removes unused TYPE_CHECKING import and deletes base from_native_distribution.
src/openscm_runner/adapters/_protocol.py Drops @runtime_checkable from AdapterLike protocol.
README.md Removes programmatic API and supported-models prose blocks.
pyproject.toml Adjusts coverage gate (fail_under=0) and documents mutual-exclusion constraints + ciceroscm floor.
.github/workflows/ci.yaml Pins env2 ciceroscm>=2.1.1 and uploads regression snapshots on failure.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/openscm_runner/run.py Outdated
Comment on lines 102 to 104
scenarios : :obj:`pyam.IamDataFrame`
Scenarios to run.

Comment thread src/openscm_runner/run.py Outdated
Comment on lines 139 to 141
@@ -115,71 +140,46 @@ def run(
``out_config`` has keys which are not in ``climate_models_cfgs``.

Comment thread src/openscm_runner/run.py
Comment on lines +41 to +58
def _normalise_entries(climate_models_cfgs):
"""Coerce dict or iterable input into ``[(name, value), ...]`` pairs."""
if isinstance(climate_models_cfgs, Mapping):
return list(climate_models_cfgs.items())
entries = []
for item in climate_models_cfgs:
if _looks_like_adapter(item):
name = getattr(item, "model_name", None)
if name is None:
raise TypeError(
"Adapter instance is missing a `model_name` attribute: "
f"{type(item).__name__}."
)
entries.append((name, item))
else:
name, value = item # expect a (name, cfgs) tuple
entries.append((name, value))
return entries
Comment on lines +159 to +173
def output_vars_need_carbon_cycle(output_variables) -> bool:
"""Return ``True`` if any requested variable needs the carbon cycle."""
return bool(set(output_variables) & _CARBON_CYCLE_VARIABLES)
"""Forcing inputs the adapter back-reports on the canonical RCMIP3 path.

CICERO-SCM v2.x treats Solar / Volcanic / Land-use albedo as input
forcings (read from the bundle's per-scenario forcing files) rather
than diagnostic outputs. The model neither computes nor exposes them
through :data:`SUPPORTED_VARIABLES`. The adapter back-reports them by
echoing the per-scenario trajectories it already has in scope when
the ``rcmip3_bundle_path`` cfg key is set (see
:func:`_build_natural_data_from_rcmip3` and
:func:`_build_rf_luc_data_from_rcmip3` in the adapter module). On
the legacy bundle path the adapter logs a warning and omits the
back-report from the output."""
Comment on lines +386 to +392
Returns a DataFrame in the same horizontal layout that
:func:`_scmrun_to_fair2_rows` produces (one row per
``(scenario, FaIR-species)``, integer year columns), keyed by user
scenario rather than a single bundle ``"historical"`` row -- so
:func:`_splice_bundle_with_user` must be called with
``bundle_per_scenario=True``.

Comment on lines 332 to 335
bundle_unit = spliced_df.loc[mask, "unit"].iloc[0]
user_unit = user_row.get("unit")
scale = _unit_scale(user_unit, bundle_unit, user_row["variable"])
if scale is None:
LOGGER.warning(
"Could not convert %r emissions from user unit %r to "
"bundle unit %r; user data NOT overlaid for this species, "
"bundle values left in place.",
user_row["variable"],
user_unit,
bundle_unit,
)
continue

… + stricter _normalise_entries)

Copilot left six inline review comments on the PR. None are bugs;
five are docstring/comment drift from the recent strict-canonical
and error-tightening commits, and one is a minor input-validation
hardening.

  run.py
  - The ``scenarios`` parameter is documented as a
    ``pyam.IamDataFrame``, but the function now calls
    ``scenarios.get_unique_meta("variable")`` -- an ``ScmRun``
    method. Update the docstring to require an ``ScmRun`` and tell
    legacy ``IamDataFrame`` users to wrap their input.
  - The ``Raises`` section claimed ``KeyError`` for unknown
    ``out_config`` keys, but ``_check_out_config`` only warns. Move
    the warning behaviour into a ``Notes`` section and drop the
    ``KeyError`` claim.
  - ``_normalise_entries`` would happily iterate over a stray
    string (``"magicc7"``) and then fail with a low-signal
    ``ValueError: too many values to unpack``. Reject ``str`` /
    ``bytes`` explicitly at the top level and per-entry, with a
    clear ``TypeError``. Also wrap the per-entry unpack in a
    ``try/except`` so non-pair entries also produce a clear
    message.

  ciceroscm_py2_adapter/_output_variables.py
  - There was a stray triple-quoted string literal floating between
    ``output_vars_need_carbon_cycle`` and the next module-level
    block. As a standalone expression it has no effect (and lints
    flag it as a pointless statement). Move the prose to a ``#``
    comment block above the ``RCMIP3_BACK_REPORTABLE_VARIABLES``
    definition it was meant to document, and refresh the wording
    since the "legacy bundle path" path it referenced was deleted
    in commit 3a07afb.

  fair2_adapter/_emissions_translator.py
  - The ``_rcmip3_to_fair_emissions_df`` docstring referenced a
    ``bundle_per_scenario=True`` parameter that the
    ``_splice_bundle_with_user`` signature never had (and certainly
    doesn't now). Rewrite the relevant paragraph to describe what
    the current contract actually is.
  - The overlay block above ``_unit_scale`` still claimed that
    conversion failures get logged and the row skipped, but the
    commit 3 tightening made ``_unit_scale`` raise. Update the
    comment so future readers don't re-introduce the silent-skip.

Unit tests still pass (61 passed) -- no behaviour change for
existing call sites since ``_normalise_entries`` only narrows
already-invalid inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@benmsanderson benmsanderson merged commit 40398dc into feat/fair2-ciceroscmpy2-adapters-and-runmode-nonfork Jun 5, 2026
15 checks passed
@benmsanderson benmsanderson deleted the address/pr97-comments branch June 5, 2026 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants