Skip to content

Address PR97 review comments + RCMIP3 wire-in#101

Closed
benmsanderson wants to merge 12 commits into
openscm:feat/fair2-ciceroscmpy2-adapters-and-runmode-nonforkfrom
benmsanderson:address/pr97-comments
Closed

Address PR97 review comments + RCMIP3 wire-in#101
benmsanderson wants to merge 12 commits into
openscm:feat/fair2-ciceroscmpy2-adapters-and-runmode-nonforkfrom
benmsanderson:address/pr97-comments

Conversation

@benmsanderson
Copy link
Copy Markdown
Collaborator

Draft. Stacked on top of #97 to address @znicholls's review comments, plus a Track A canonical-RCMIP3 wire-in that came out of the back-and-forth on the file-format question. Discussion history (with @maritsandstad) lived on the fork at benmsanderson/openscm-runner#15 — that PR description has the long-form context.

Opening as a draft so reviewers can see the shape before we settle the remaining maintainership wording.


How the commits map to #97 review comments

Initial cleanup pass (commit 9ad17c3) — closes 13 of 21

Coverage fail_under dropped to 0 with a comment per @znicholls's followup.

Naming hygiene (commit 7e2cc7b)

CICERO_TO_FAIR2_SPECIESRCMIP_TO_FAIR2_SPECIES; _read_cicero_conc_file_read_rcmip_conc_file; docstring fixes throughout _concentrations_translator.py and adjacent files. Pure rename — addresses @maritsandstad's point that the adapter doesn't read CICERO-internal format anywhere; the files it consumes are RCMIP-shaped tabular files and the code was mislabelled.

CI matrix marker (commit fe2bd70)

test_modern_adapters.py marked faircicero2_only so it routes through env2 of @maritsandstad's tests-multi-env job (modern-major venv).

Bug fix: rf_sun_file (commit ed94023, bundled with Solar/Volcanic wire-in)

While wiring up the Solar/Volcanic side of the RCMIP3 work, I verified the upstream ciceroscm v2.1.2 API and found that the adapter was passing rf_solar_file in scendata but upstream's InputHandler only recognises rf_sun_file / rf_sun_data. Anything passed as rf_solar_* was silently dropped and the model fell back to the default solar_IPCC.txt. Renamed throughout (cfg key, scendata key, canonical-file resolver, test override).

Note: same latent bug exists in cscm-calibrate's own run_full_rcmip_protocol.py — flagged to @maritsandstad separately. The published RCMIP3 reference runs may have been using the default solar trajectory instead of the per-scenario solar_RCMIP_*_RCMIP3.txt.

Back-report Solar / Volcanic / Land use (commit bb39feb)

Closes @znicholls's _output_variables.py:158-162 comment. When the canonical RCMIP3 path is active, the adapter back-reports per-scenario Solar / Volcanic / Land use ERF trajectories into the output ScmRun. On the legacy path the request logs a warning and skips the back-report.


Canonical RCMIP3 wire-in (Track A — out of strict scope but felt right)

Both Albedo Change|{Land Use,Irrigation} on @znicholls's hard-coded-irrigation comment and the conc-file-format on @maritsandstad's review point at the same underlying issue: the adapter shouldn't be locked to per-bundle file conventions when a canonical published source exists. Wired both adapters to consume the canonical Zenodo 20430630 wide-tables when the new rcmip3_bundle_path cfg key is set. Legacy paths stay as fallback.

Three axes, three commits (c047e1c, c5787b0, 6affa5c) staged for review clarity:

Axis FaIR2 CICEROSCMPY2
Land use + Irrigation forcings _fill_land_use_from_rcmip3 reads per-CMIP7-category Land Use + Irrigation from input_datafiles_generation/data/; historical uses the published per-component breakdown. _build_rf_luc_data_from_rcmip3 populates rf_luc_data in-memory scendata (CICERO has no irrigation channel).
Solar / Volcanic forcings _fill_natural_from_rcmip3 reads per-scenario Effective Radiative Forcing|Natural|{Solar,Volcanic} from rcmip_phase3_forcing_v2.0.0.csv. _build_natural_data_from_rcmip3 populates rf_sun_data + rf_volc_data in-memory; upstream's InputHandler shape-coercion handles annual → multi-axis.
Emissions + Concentrations _rcmip3_to_fair_emissions_df + build_concentrations_df_from_rcmip3 consume rcmip_phase3_{emissions,concentrations}_v2.0.0.csv; existing user-ScmRun overlay still wins per-species. _merge_rcmip3_canonical_into_user unions canonical rows into user's ScmRun upfront; existing _build_hybrid_* overlay code is untouched.

Scenario→CMIP7-category resolver covers the SSP-RCP set with sensible defaults (ssp119/126→VL, ssp245→M, ssp370→H, ssp434→L, ssp460→ML, ssp534-over→LN, ssp585→HL) and accepts per-cfg scenario_to_category overrides. Native scen7-{cat} names pass through directly. historical / historical-cmip6 use the published per-component breakdowns.

canonicalise_rcmip3_variable handles the IAMC-naming differences (CO2 sub-sectors → MAGICC-style, intermediate |F-Gases|HFC|, |PFC| categories stripped) so the existing per-adapter species translation maps work without changes.


What's deliberately not done

Item Status Why
Maintainership wording for CICEROSCMPY2 + FaIRv2 (#3, #6) Deferred @maritsandstad ok with co-maintaining CICEROSCMPY2 ("baseline ok to be (co-)maintainer"). @chrisroadmap not yet pinged for FaIRv2 — would appreciate a tag here if you want to push that thread forward.
Hard-coded irrigation question (#17) Implicitly addressed by Track A The bundled irrigation_forcing_timebounds_cmip7.csv "M" hard-coding is gone on the canonical path. Adapter resolves per scenario, with override.
Concentrations / emissions ScmRun-only path Already exists build_concentrations_df_from_scmrun covers it; the new Track A path is just an additional baseline source.

Test coverage

  • Unit tests for the RCMIP3 reader, scenario→category resolver, canonicaliser, FaIR2 canonical-path builders, and CICEROSCMPY2 back-report helper all live in tests/unit/io_tests/test_rcmip3.py (29 tests).
  • Mini fixture under tests/test-data/rcmip3-mini/ is a 2-scenario × 3-variable slice of the real Zenodo CSVs, runs without external fetch.
  • Integration tests/integration/test_modern_adapters.py covers FaIR2 + CICEROSCMPY2 ED/CD across ssp126/245/370 with a pytest-regressions snapshot; marked faircicero2_only for @maritsandstad's tests-multi-env env2.
  • Legacy paths (existing bundle CSVs, ScmRun-only conc) are untouched — same code, no behaviour change for callers that don't opt into rcmip3_bundle_path.

🤖 Generated with Claude Code

benmsanderson and others added 12 commits June 3, 2026 10:07
Addresses the concrete code/test items from openscm#97
review (still open: the four maintainership questions and the
back-reporting of input forcings in CICEROSCMPY2):

- magicc7: drop the **kwargs __init__ override; move the pymagicc
  availability check and magicc_scenario_setup init into _init_model
  where they belong on the new base.
- _protocol: drop @runtime_checkable; the structural Protocol stays
  for type-hint use. Remove the isinstance(adapter, AdapterLike)
  runtime check in run.py — duck-typing is enough.
- base.py: drop the default NotImplementedError
  from_native_distribution stub from _Adapter; adapters that ship a
  native bundle (FAIR2, CICEROSCMPY2) already define it themselves.
- run.py: raise ValueError when scenarios is None; require
  IamDataFrame-shaped input (no AttributeError fallback); accept
  either a dict (values: cfgs list or pre-built adapter) or an
  iterable of (name, cfgs) tuples and/or adapter instances, freely
  mixable; require model_name on adapter instances rather than
  falling back to type().__name__.
- ciceroscm_wrapper: trim the distutils-removal comment.
- test_modern_adapters: run all 3 mini-bundle members; use the full
  ssp126/ssp245/ssp370 trio; replace the "0 < 2100 GSAT < 7" smoke
  assert with a pytest-regressions snapshot of GSAT + total ERF at
  1850/1900/2000/2025/2050/2100; expand parametrize ids
  ("fair2-emissions-driven", etc).
- delete tests/unit/adapters/test_{ciceroscmpy2,fair2}.py per request.
- README: drop the "Supported climate models and run modes" and
  "Programmatic API" prose sections (live in code / notebooks).
- pyproject: drop fail_under to 0 with a comment to bump back up
  once the adapter rewrite lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marit's review of openscm#97 corrected the
characterisation of the conc / em files the FaIR2 and
CICEROSCMPY2 adapters consume: they are RCMIP-format tabular
files, not a CICERO-internal format. This is the naming-only
first step of the follow-up to switch both adapters to consuming
the canonical RCMIP wide-tables from Zenodo 20430630.

- _concentrations_translator.py: rename CICERO_TO_FAIR2_SPECIES to
  RCMIP_TO_FAIR2_SPECIES; rename _read_cicero_conc_file to
  _read_rcmip_conc_file; rename local cicero_name to rcmip_name;
  update module + function docstrings to say RCMIP-format
  throughout.
- fair2_adapter.py: update the ValueError message to say
  RCMIP-format ``{scen}_conc_*`` instead of CICERO-format.
- ciceroscmpy2_adapter.py: update docstrings for
  ``historical_em_file`` / ``historical_conc_file`` and the
  _build_hybrid_emissions_data / _build_hybrid_concentrations_data
  helpers to say RCMIP-format.

CICERO-SCM-internal API names (the v1.1.x Fortran adapter, the
_OPENSCM_TO_CICERO_CONC map, _cicero_unit_to_pint, cicero_species
locals in the species-overlay code) are unchanged: those refer
to CICERO-SCM upstream conventions, not file format.

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shared reader module for the canonical RCMIP Phase 3 wide-table CSVs
(Zenodo 20430630 -- concentrations, emissions, forcing). Both
adapters will consume this in the follow-up commit so we can drop
the per-scenario tab-delimited input path the FaIR2 +
CICEROSCMPY2 adapters currently rely on, and the hard-coded
``_DEFAULT_LAND_USE_SCENARIO = "M"`` choice in FaIR2.

API:

- ``load_rcmip3(path, kind, scenarios=None, variables=None,
  region="World", model=None)`` returns a filtered wide-format
  DataFrame in the canonical layout (Model, Scenario, Region,
  Variable, Unit, Activity_Id, Type, Priority, Mip_Era, Version,
  1750, ..., 2500).
- Convenience wrappers ``load_rcmip3_concentrations`` /
  ``_emissions`` / ``_forcings`` for the three kinds.
- ``load_rcmip3_albedo_categories(path, category)`` reads the
  per-category Land Use + Irrigation series from the bundle's
  upstream ``input_datafiles_generation/data/Forcing_AFOLU_CO2.csv``
  and ``Forcing_irrigation_population_scale.csv`` files. Necessary
  because the canonical forcing CSV only publishes the lumped
  ``Albedo Change`` row for ssp scenarios; per-component breakdown
  for ssp inputs comes from the per-category upstream files.

Path resolution accepts either the bundle root, the
``RCMIP3_input_datafiles`` subdir, or a direct CSV path.

Mini fixture under ``tests/test-data/rcmip3-mini/`` is a 2-scenario
x 3-variable x 6-year slice of the real CSVs (no external fetch
needed for tests). Unit tests at ``tests/unit/io/test_rcmip3.py``
cover the filter surface, error paths, and metadata-column
validation.

No adapter wiring yet; that's the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orcings

Replaces the hard-coded ``_DEFAULT_LAND_USE_SCENARIO = "M"`` choice
in FaIR2's ``_fill_natural_forcings`` (and the equivalent
``rf_luc_file`` single-trajectory path in CICEROSCMPY2) with a
per-scenario lookup against the canonical RCMIP Phase 3 bundle
(Zenodo 20430630). Opt-in via the new ``rcmip3_bundle_path`` cfg
key on either adapter; legacy paths remain as the back-compat
default.

How it works:

- :func:`openscm_runner.io.rcmip3.resolve_scenario_category` maps
  the user's scenario name to a CMIP7 ScenarioMIP category
  (``VL`` / ``LN`` / ``L`` / ``ML`` / ``M`` / ``H`` / ``HL``). The
  defaults cover the published SSP-RCP scenario set (ssp119 -> VL,
  ssp126 -> VL, ssp245 -> M, ssp370 -> H, ssp434 -> L, ssp460 ->
  ML, ssp534-over -> LN, ssp585 -> HL); callers override via the
  ``scenario_to_category`` cfg key, and ``scen7-{cat}`` native
  CMIP7 names skip the map entirely.
- ``historical`` / ``historical-cmip6`` use the per-component
  ``Effective Radiative Forcing|Anthropogenic|Albedo Change|{Land
  Use,Irrigation}`` rows published directly in
  ``rcmip_phase3_forcing_v2.0.0.csv``; the per-category lookup is
  bypassed.
- Idealised scenarios (``protocol_land_use_forcing ==
  "constant_zero"``) keep zeroing out via the existing
  suppression mask in both adapters.
- Unknown scenarios log a warning and leave Land use / Irrigation
  at zero, with a pointer to ``scenario_to_category``.

FaIR2 surface:

- ``_fill_natural_forcings`` factors the Land use + Irrigation
  block out into two helpers: ``_fill_land_use_from_legacy_bundle``
  (existing behaviour, kept) and ``_fill_land_use_from_rcmip3``
  (new). The choice is driven by the ``rcmip3_bundle_path`` kwarg.
- Both call sites (``_run_native_cfgs`` -> ``_run_one_calibration``
  and ``_run_translated_cfgs``) pass the cfg-level
  ``rcmip3_bundle_path`` / ``scenario_to_category`` through.
- Translated-cfg path validates uniqueness of both kwargs across
  cfgs (same rule as ``emissions_bundle``).

CICEROSCMPY2 surface:

- ``_build_scendata_list`` grew a third LUC branch: when
  ``cfg.get("rcmip3_bundle_path")`` is set, populates the existing
  in-memory ``rf_luc_data`` scendata slot from
  :func:`_build_rf_luc_data_from_rcmip3` instead of writing
  ``rf_luc_file``. Idealised (zeros) and legacy (path) branches
  unchanged.
- ``_build_rf_luc_data_from_rcmip3`` returns a year-indexed
  single-column DataFrame matching the shape upstream's
  ``DistributionRun`` expects. CICEROSCM has no irrigation channel,
  so only Land Use is wired in; Irrigation is FaIR-only.

Solar / Volcanic deferred to a follow-up commit; this PR is
land-use only per Ben + Marit's scope.

Test fixture under ``tests/test-data/rcmip3-mini/`` expanded with
the historical Albedo Change|{Land Use,Irrigation} breakdown rows
so the new code paths are exercised end-to-end without external
fetch. Unit tests in ``tests/unit/io/test_rcmip3.py`` cover the
resolver, the per-category loader, and the historical-breakdown
canonical CSV path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…trix

Marit's tests-multi-env job (added in 0d9f5b1 / merged via openscm#100) runs
two isolated venvs:

* env1 -- fair>=1.6,<2 + ciceroscm>=1.1,<2, selects
  ``-m "faircicero1x_only or magicc"``.
* env2 -- fair>=2,<3 + ciceroscm>=2,<3, selects
  ``-m "(faircicero2_only or magicc) and not faircicero1x_only"``.

Unmarked tests are caught by the main ``tests`` job, which currently
resolves to legacy majors via the poetry lockfile. The FaIRv2 +
CICEROSCMPY2 integration smoke tests need the modern majors, so route
the module through env2 via a module-level ``pytestmark``. The
existing per-test ``fair2_skip`` / ``cicero_skip`` markers stay so the
module still collects (and individually skips) under the main job's
legacy resolution.

The unit tests in tests/unit/io_tests/test_rcmip3.py stay unmarked --
they're pure pandas-based reader tests with no model dependency, so
the main ``tests`` job (which runs everything unmarked) covers them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ings

Extends the ``rcmip3_bundle_path`` cfg key wired in the previous
land-use commit so it also covers Solar + Volcanic effective
radiative forcings, reading per-scenario rows from
``rcmip_phase3_forcing_v2.0.0.csv``
(``Effective Radiative Forcing|Natural|{Solar,Volcanic}``) instead of
broadcasting a single trajectory from the calibration bundle's
``solar_forcing`` / ``volcanic_forcing`` CSVs.

Per-scenario coverage matters here: the canonical CSV's
``historical`` and ``ssp245`` solar rows differ at every overlapping
year (historical uses observations, the SSP-RCP rows use the
SSP-projection extension); the legacy single-trajectory broadcast
loses that distinction.

- ``_fill_natural_forcings``: branches on ``rcmip3_bundle_path``;
  delegates to ``_fill_natural_from_legacy_bundle`` (existing
  single-CSV-broadcast) or the new ``_fill_natural_from_rcmip3``
  (per-scenario CSV lookup with ``natural_mask`` suppression and a
  warning-then-zero fallback for scenarios with no canonical row).
- Docstring reorganised: both Solar/Volcanic and Land use/Irrigation
  now describe a "canonical RCMIP3" and "legacy bundle" path; the
  former two columns of the structure mirror each other now.

CICEROSCMPY2 Solar+Volcanic is **not** in this commit. Two reasons:

1. Upstream ``ciceroscm`` v2 uses ``rf_sun_*`` (not ``rf_solar_*``)
   as the canonical solar key; the openscm-runner adapter currently
   passes ``rf_solar_file`` which upstream silently ignores in favour
   of its default ``solar_IPCC.txt``. This is a pre-existing bug
   that's orthogonal to the RCMIP3 wire-in.
2. Upstream's volcanic input expects a 12-column monthly DataFrame
   (``rf_volc_n_data`` / ``rf_volc_s_data``); the canonical RCMIP3
   CSV is annual. Translating annual -> monthly cleanly needs a
   physical choice (broadcast vs. equally distribute) that I'd
   rather make in a separate commit after talking with Marit.

Test fixture already covers the historical + ssp245 Solar/Volcanic
rows (added in the land-use commit). New unit tests in
tests/unit/io_tests/test_rcmip3.py verify the per-scenario rows
exist for both natural-forcing components and that the historical /
ssp245 trajectories differ at overlapping years.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al + fix rf_sun_file key

Two changes bundled because they touch overlapping lines:

(1) Fix pre-existing key-name bug: the adapter has been passing
``rf_solar_file`` to scendata, but upstream ``ciceroscm`` v2.x only
recognises ``rf_sun_file`` / ``rf_sun_data`` in its
``InputHandler.set_sun_volc_luc_defaults`` and ``get_data`` paths.
Anything passed under ``rf_solar_*`` was silently dropped and the
model fell back to its bundled default ``solar_IPCC.txt``. Renamed
the cfg key + scendata key + canonical-file-resolver entry to
``rf_sun_file`` to match upstream. (The same latent bug exists in
cscm-calibrate's own ``run_full_rcmip_protocol.py``; flagged for
Marit.)

(2) Wire Solar + Volcanic into the canonical RCMIP3 path that was
added for Land use in the previous commit. When the
``rcmip3_bundle_path`` cfg key is set, the scendata builder now
populates ``rf_sun_data`` and ``rf_volc_data`` (in-memory year-indexed
DataFrames) from per-scenario
``Effective Radiative Forcing|Natural|{Solar,Volcanic}`` rows in
``rcmip_phase3_forcing_v2.0.0.csv`` instead of pointing at the
calibration-bundle paths. Upstream's
``InputHandler.set_sun_volc_luc_defaults`` propagates the single
``rf_volc_data`` to ``rf_volc_n_data`` / ``rf_volc_s_data``
automatically; the input handler's shape-coercion (1D annual ->
``arr[:, None]`` for volcanic) accepts annual single-column
DataFrames without needing a monthly broadcast.

Legacy file path stays as the back-compat default when
``rcmip3_bundle_path`` is not set; idealised scenarios still go
through upstream's ``sunvolc=0`` suppression as before.

Behaviour change for users who currently pass ``rf_solar_file=...``
to ``CICEROSCMPY2.from_native_distribution(...)``: that key is now
``rf_sun_file=...``. Anything passed under the old key was already
being silently dropped, so no working behaviour is lost; this just
makes the key wire-up explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on RCMIP3 path

Closes Zeb's review comment on
``_output_variables.py:158-162`` ("I'd suggest just back-reporting
this anyway, makes life easier for users"). CICERO-SCM v2.x doesn't
expose Solar / Volcanic / Land-use albedo as diagnostic outputs --
the model treats them as forcing inputs read from the bundle. The
adapter now echoes the per-scenario input trajectories back into
the output ScmRun when the canonical RCMIP3 path is active, so
users can request them alongside diagnostics like total ERF or GSAT.

Changes:

- ``_output_variables.RCMIP3_BACK_REPORTABLE_VARIABLES``: new
  frozenset of the three back-reportable variable names.
  ``all_supported()`` includes it (the validator accepts requests
  for these names on either path).
- The "input-side forcing" entry in ``_STRUCTURAL_LIMITS`` was
  already removed in Marit's earlier cleanup commit; the module
  docstring's "What CICERO-SCM v2.x cannot produce" list now drops
  Solar/Volcanic/Land use and the "What it supports" list mentions
  the back-report.
- ``_run_one_distribution`` splits ``output_variables`` into
  upstream-produced + back-reportable. The back-reportable subset
  is stripped from the upstream call and synthesised after the
  model run from the scendata's ``rf_sun_data`` / ``rf_volc_data``
  / ``rf_luc_data`` trajectories. On the legacy path (without
  ``rcmip3_bundle_path``) the back-report is skipped with a
  warning.
- New helper ``_build_rcmip3_backreport_scmrun`` constructs the
  back-report ScmRun with the same metadata schema upstream's
  ``CSCMParWrapper`` uses (``climate_model``, ``model``, ``run_id``,
  ``scenario``, ``region``, ``variable``, ``unit``, then year
  columns). Returns ``None`` when no scendata trajectory matches
  the request (e.g. all-idealised Land-use request).
- Two unit tests in ``tests/unit/io_tests/test_rcmip3.py`` cover
  the happy path (10-row shape across 2 scenarios x 2 cfgs x
  partial-coverage variables) and the empty-fallback path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nonical RCMIP3

Track A wire-in: both adapters now consume the canonical
``rcmip_phase3_emissions_v2.0.0.csv`` and
``rcmip_phase3_concentrations_v2.0.0.csv`` from the Zenodo 20430630
bundle as the per-scenario baseline when ``rcmip3_bundle_path`` is
set. User ScmRun overlays still win per-species; legacy paths
preserved as fallback.

Shared canonicaliser:

- :func:`openscm_runner.io.canonicalise_rcmip3_variable` translates
  canonical RCMIP3 variable names to openscm-runner conventions.
  CO2 sub-sectors ("Emissions|CO2|AFOLU",
  "Emissions|CO2|Energy and Industrial Processes") get the MAGICC-
  style suffix; F-gas / PFC / HFC / Halon / Montreal intermediate
  IAMC categories are stripped so the leaf species name ends up
  adjacent to the "Emissions|" or "Atmospheric Concentrations|"
  prefix. Variables that don't need translation round-trip
  unchanged.

FaIR2:

- ``_emissions_translator._rcmip3_to_fair_emissions_df``: new helper
  that reads canonical CSV, canonicalises variables, and reshapes to
  FaIR's horizontal form (one row per (scenario, FaIR-species)).
- ``build_emissions_df`` grows an ``rcmip3_bundle_path`` kwarg.
  When set, the baseline comes from the canonical CSV (per-scenario
  rows) instead of the bundle's single ``historical_emissions_*.csv``.
  ``_splice_bundle_with_user`` grows a ``bundle_per_scenario`` flag
  to skip the relabel step when the baseline is already
  per-scenario.
- ``_concentrations_translator.build_concentrations_df_from_rcmip3``:
  new third route alongside the existing per-scenario tab-delimited
  reader and the in-memory ScmRun reader. Reads canonical CSV +
  canonicalises + translates leaf species via RCMIP_TO_FAIR2_SPECIES.
- Adapter dispatch: both ``_run_one_calibration`` and
  ``_run_translated_cfgs`` pass ``rcmip3_bundle_path`` through to
  the translators; conc-driven-mode validation accepts
  ``rcmip3_bundle_path`` as a third way to satisfy the
  "need concentrations from somewhere" check.

CICEROSCMPY2:

- ``_merge_rcmip3_canonical_into_user``: new helper that reads
  canonical emissions + concentrations CSVs, canonicalises variable
  names, and unions them into the user's ScmRun upfront.
  Deduplication is per ``(scenario, variable)`` -- user rows win
  when both supply the same key. The existing
  ``_build_hybrid_emissions_data`` and
  ``_build_hybrid_concentrations_data`` see the merged ScmRun and
  work unchanged: their ``cicero_comp_dict`` suffix-matching on
  ``f"Emissions|{openscm_suffix}"`` now picks up the canonicalised
  MAGICC-style names.
- ``_build_scendata_list`` calls the merge at the top when
  ``rcmip3_bundle_path`` is set; the rest of the per-scenario loop
  is untouched.

Tests:

- ``tests/unit/io_tests/test_rcmip3.py`` grows tests for the
  canonicaliser (10 parametrised cases covering flat / CO2-sector /
  PFC-nested / F-Gases-nested round-trips) and for the FaIR2
  canonical-RCMIP3 emissions + concentrations builders against the
  mini fixture.

Legacy paths unchanged: callers that don't set ``rcmip3_bundle_path``
keep the existing behaviour (read bundle historical CSV, or read
per-scenario tab-delimited conc files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@maritsandstad
Copy link
Copy Markdown
Collaborator

@benmsanderson : I think we can merge this in now so we can get the CIs running and have everything going on in one place?

@benmsanderson
Copy link
Copy Markdown
Collaborator Author

Reopening from the same branch pushed directly to openscm/openscm-runner so the fork-PR secrets restriction stops blocking CI — see #102.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants