Feat/cmip7 awiesm3 veg hr #266
Open
JanStreffing wants to merge 153 commits into prep-release from
Conversation
The entry_points() API changed between Python 3.9 and 3.10:
- Python 3.9: entry_points() returns a dict-like object
- Python 3.10+: entry_points(group='name') takes a keyword argument
Use try/except to detect the API version at runtime.
Fixes: TypeError: entry_points() got an unexpected keyword argument 'group'
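A minimal compatibility shim along the lines described above (the group name used below, `console_scripts`, is just a standard example; the actual pycmor entry-point group may differ):

```python
from importlib.metadata import entry_points


def get_group_entry_points(group):
    """Return entry points for *group* on both Python 3.9 and 3.10+."""
    try:
        # Python 3.10+: entry_points() accepts a `group` keyword argument
        return list(entry_points(group=group))
    except TypeError:
        # Python 3.9: entry_points() returns a dict-like mapping of groups
        return list(entry_points().get(group, []))
```

Detecting the API at runtime via TypeError avoids a version check against `sys.version_info`, so the same code also works on backports like `importlib_metadata`.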
…6_table-based approach

- Use user-specified CMIP7_DReq_metadata file for DataRequest loading
- Fix cmip6_cmor_table -> cmip6_table key mismatch in table.py
- Extract table IDs from cmip6_table values, not the compound-name prefix
- Add a warning when rules have no matching data_request_variables
- Add debug logging to find_matching_rule for troubleshooting

This partially addresses the architectural issue where CMIP7 is forced into CMIP6's table-based structure. Full compound-name matching still needs implementation (see CMIP7_ARCHITECTURE_ISSUE.md). Fixes a silent failure where rules were dropped with no user feedback.
Add a step-by-step failure scenario showing:
- Silent-failure symptoms
- The root-cause discovery process (3 layered bugs)
- Log output at each debugging stage
- Key symptoms and workarounds
The branch fixes immediate bugs (silent failure, config ignored) but architectural issues persist (cmip6_table dependency, partial matching).
- Index variables by full compound name instead of cmip6_table
- Implement exact compound-name matching for CMIP7 (find_matching_rule_cmip7)
- Generate synthetic table headers from variable metadata
- Remove the dependency on the cmip6_table field for CMIP7 data loading
- Add comprehensive unit tests for synthetic header generation
- Maintain full backward compatibility with CMIP6 and existing CMIP7 metadata

Resolves a critical AttributeError for table_header in CMIP7 processing. Addresses the architectural issues identified in CMIP7_ARCHITECTURE_ISSUE.md.

Tests: 15 passed, 1 skipped
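The indexing change can be sketched with toy data structures (the real DataRequest classes and metadata fields differ; the two compound names below are illustrative):

```python
# Before: variables were keyed by cmip6_table, so distinct CMIP7 compound
# names could collide. After: index by the full compound name for exact lookup.
variables = {
    "ocean.tos.tavg-u-hxy-sea.mon.GLB": {"name": "tos", "units": "degC"},
    "ocean.sos.tavg-u-hxy-sea.mon.GLB": {"name": "sos", "units": "0.001"},
}


def find_matching_variable_cmip7(compound_name, index):
    """Exact compound-name match; raise instead of silently skipping."""
    try:
        return index[compound_name]
    except KeyError:
        raise ValueError(f"No data request variable for {compound_name!r}")
```

Raising ValueError on a miss mirrors the "ValueError on zero DRV matches" behaviour this branch introduces, replacing the silent rule drop.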
Fixes trailing whitespace on blank lines in cmorizer.py and reformats several other files to be consistent with black when run from root.
Force-pushed from 1a42875 to 5617a18
Resolves merge conflicts in cmorizer.py and global_attributes.py, keeping CMIP7_DReq_metadata feature and integrating prep-release compound_name table_id derivation logic.
…ected The PycmorConfigManager applies a 'pycmor' namespace, so it looks for keys like 'pycmor_dask_cluster'. But the YAML 'pycmor:' section provides unprefixed keys like 'dask_cluster', which were silently ignored and fell back to defaults (e.g. dask_cluster defaulted to 'local' instead of 'slurm'). Fix by prefixing dict keys in _create_environments. Also adds custom_steps.py with vertical_integrate pipeline step and fixes grid_file path and max_jobs in the minimal example.
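The key-prefix mismatch can be illustrated with a toy version of the fix (the function name `prefix_keys` and the dict shapes are assumptions; the commit places this logic in `_create_environments`):

```python
def prefix_keys(config, namespace="pycmor"):
    """Prefix unqualified YAML keys so the namespaced lookup finds them."""
    return {
        key if key.startswith(namespace + "_") else f"{namespace}_{key}": value
        for key, value in config.items()
    }


# YAML 'pycmor:' section provides unprefixed keys...
yaml_section = {"dask_cluster": "slurm", "pycmor_max_jobs": 4}
env = prefix_keys(yaml_section)
# ...but the config manager looks up namespaced keys. Without the prefixing,
# this lookup silently fell back to the default "local".
cluster = env.get("pycmor_dask_cluster", "local")
```

Already-prefixed keys pass through unchanged, so configs that spell out the full key keep working.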
Fix two bugs where pipelines didn't get the Dask cluster assigned:
1. _post_init_create_pipelines appended a new Pipeline.from_dict(p) instead of the one that had the cluster assigned
2. The DefaultPipeline created at rule init time bypassed CMORizer cluster assignment — now handled in _match_pipelines_in_rules

Switch the example config from adaptive to fixed SLURM scaling to avoid a race condition where the adaptive scaler kills workers before .compute() submits the real Dask graph.
Contributor
Author
…ntested)

- Rules for 20 of 28 core ocean variables in cmip7_awiesm3-veg-hr_ocean.yaml
- New custom steps: load_gridfile (generic), compute_deptho, compute_sftof, compute_thkcello_fx, compute_masscello_fx (FESOM mesh-derived)
- 6 new pipeline definitions for Ofx variables (fx_extract, fx_deptho, etc.)
- namelist.io: vec_autorotate=.true., hnode output, daily sst/sss/ssh
- Todo tracking and missing.md for variables FESOM cannot output
- Research: FESOM uses potential temperature (no bigthetao), MLD3 for mlotst, velocities need rotation, u/v are on the elem grid -> use unod/vnod

Not yet tested — pipelines and rules need validation against actual data.
NOT TESTED — pipelines and custom steps need validation against real data.

- New steps: compute_density (gsw/TEOS-10), compute_mass_transport (Boussinesq rho_0*dz), compute_zostoga (global thermosteric sea level)
- mass_transport_pipeline for umo/vmo/wmo
- zostoga_pipeline using gsw for the EOS computation
- Rules for umo, vmo, wmo, zostoga in the ocean rules file
- gsw package installed in the pycmor_py312 environment

masscello (Omon) still needs a density x hnode pipeline.
…(untested)

- 8 sea ice rules (simass, siu, siv, sithick, snd, ts, siconc, sitimefrac)
- siconc_pipeline (fraction_to_percent) and sitimefrac_pipeline (binary ice presence)
- fraction_to_percent and compute_sitimefrac custom steps
- Runnable sea ice config (examples/awiesm3-cmip7-seaice.yaml)
- namelist.io: added h_ice, h_snow, ist (monthly) and a_ice (daily)
- Moved missing.md and namelist.io up one level per user request
- Removed the old awiesm3-cmip7-example.yaml (superseded by the ocean/seaice configs)
…th inherit

- Add 45 CAP7 sea ice variable rules (direct mapping, scale, multi-variable compute, melt ponds, hemisphere integrals, stress tensor)
- Add custom pipeline steps: scale_by_constant, integrate_over_hemisphere, compute_sispeed, compute_ice_mass_transport, compute_sistressave/max, compute_siflcondtop, compute_sihc, compute_sisnhc, compute_sitempbot, compute_sifb, compute_constant_field, compute_simpeffconc
- Restructure all rules YAMLs into full runnable configs with general, pycmor, jobqueue, pipelines, and inherit sections
- Move data_path into the inherit section with a YAML anchor for reuse in inputs.path across all rules
- Update namelist.io with new monthly/daily diagnostics for CAP7 variables
- Add CAP7 sea ice variables todo tracking (~89 variables, 45 done)
- Add 28 CAP7 ocean variable rules covering easy (pbo, volo, global means, squaring, wfo), medium (tob, sob, pso, phcint, scint, difvho/difvso, difmxylo, masso), decadal (7 variables), and hard (opottemptend) categories
- Add custom pipeline steps: compute_square, extract_bottom, compute_surface_pressure
- Full runnable config with an inherit section (data_path anchor)
- Comprehensive todo tracking of ~147 CAP7 ocean variables (28 done, ~20 skipped, rest blocked or needing a model re-run)
…llo_dec, opottempmint, somint)

Second pass over the CAP7 ocean variables to identify what can be computed purely in pycmor post-processing. Adds volcello_fx and volcello_time custom steps and pipelines, plus rules for virtual salt flux, static/decadal cell volume, decadal cell mass, and yearly depth-integrated temperature and salinity.
…vsfcorr, mlotst_day, uos, vos) Add evap and relaxsalt to monthly output in namelist.io, and MLD3, unod, vnod to daily output. Write corresponding pycmor rules with scale_pipeline, surface_extract_pipeline, and direct mappings. Add extract_surface custom step for daily surface velocity extraction. Note: daily 3D unod/vnod output is very storage-heavy.
Adds a shared home for CF cell-measure fx variables so every config that references ``cell_measures: area: areacello`` / ``areacella`` can produce the companion fx file with a one-line pipeline reference instead of copy-pasting a 7-step pipeline into each yaml.

std_lib/cell_measures.py
* load_gridfile: open ``rule.grid_file`` as the data source (fx variables read a grid/mesh, not time-series output).
* compute_areacello: read ``cell_area``/``cluster_area`` from the mesh Dataset produced by ``load_gridfile``. Works for any model whose mesh carries a per-cell surface-area field (FESOM, ICON, MPAS, ...).
* compute_areacella: spherical-Earth formula on a regular lat/lon grid (R^2 * dlon * |sin(lat+dlat/2) - sin(lat-dlat/2)|).

core/pipeline.py
* AreacelloFxPipeline (FrozenPipeline): load_gridfile -> compute_areacello -> set_global/variable/coordinates -> map_dimensions -> save_dataset. Reference as ``uses: pycmor.core.pipeline.AreacelloFxPipeline``.
* AreacellaFxPipeline: same shape for the atmosphere.

The same functions remain in examples/custom_steps.py unchanged for backward compatibility with existing configs; new configs should prefer the std_lib paths.

examples/_verify_sidmassth.yaml
* Switches the areacello pipeline from inline steps to ``uses: pycmor.core.pipeline.AreacelloFxPipeline``. Output is byte-identical apart from the timestamp; QC state is unchanged (1 polar-cell flag on both files, which is the known mesh quirk).
The atmospheric areacella is computed analytically from lat/lon on a regular grid -- it does not read a per-cell area field from a mesh the way areacello does. Existing per-config areacella pipelines use load_mfdataset + get_variable to pick up lat/lon from any model-output file; the FrozenPipeline now matches that pattern (instead of load_gridfile, which only made sense for the unstructured areacello case).
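The spherical-Earth cell-area formula quoted above can be written out as a standalone sketch (this is not the pycmor implementation; a uniform regular grid is assumed and `radius` is a mean Earth radius in metres):

```python
import numpy as np


def areacella_regular(lat, dlat, dlon, radius=6.371e6):
    """Cell areas for one longitude column of a regular lat/lon grid:
    R^2 * dlon * |sin(lat + dlat/2) - sin(lat - dlat/2)|, angles in radians.
    """
    lat = np.deg2rad(np.asarray(lat, dtype=float))
    dlat, dlon = np.deg2rad(dlat), np.deg2rad(dlon)
    return radius**2 * dlon * np.abs(np.sin(lat + dlat / 2) - np.sin(lat - dlat / 2))


# Sanity check: summing a full 1-degree grid recovers the sphere's area,
# because the |sin(...) - sin(...)| terms telescope to 2 over all latitudes.
lats = np.arange(-89.5, 90.0, 1.0)          # 180 cell centres
total = areacella_regular(lats, 1.0, 1.0).sum() * 360  # 360 longitude columns
# total is approximately 4 * pi * R^2
```

The telescoping sum is why this formula is exact on a sphere regardless of resolution, which makes it a convenient analytic check for any areacella pipeline output.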
…ines

Every CMIP7 run that produces variables with cell_measures: area: areacello (or areacella) must ship the referenced measure as a companion fx file. Now that pycmor.core.pipeline.AreacelloFxPipeline and AreacellaFxPipeline exist in std_lib, the copy-pasted 10-step pipelines across configs become single `uses:` lines. Fewer places to keep in sync and no absolute script:// paths to custom_steps.py.

Migrations (inline steps -> uses:):
* awi-esm3-veg-hr-variables/core_ocean -> AreacelloFxPipeline
* awi-esm3-veg-hr-variables/core_land -> AreacellaFxPipeline
* awi-esm3-veg-hr-variables/extra_land -> AreacellaFxPipeline (feeds areacellr)
* examples/cmip7_core_ocean_core2_test -> AreacelloFxPipeline
* examples/cmip7_core_land_tco95_test -> AreacellaFxPipeline
* examples/cmip7_extra_land_tco95_test -> AreacellaFxPipeline
* examples/awiesm3-cmip7-minimal -> AreacelloFxPipeline

Additions (configs that referenced but did not ship the measure):
* examples/awiesm3-cmip7-minimal: new areacello rule
* awi-esm3-veg-hr-variables/core_atm: new areacella pipeline + rule

Verified _verify_sidmassth.yaml still produces the same 1-finding (polar-cell) CF state on the areacello output, byte-parity otherwise.
Brings HR (awi-esm3-veg-hr-variables/*/cmip7_awiesm3-veg-hr_*.yaml) and
LR (examples/cmip7_*_test.yaml) rule sets into parity; 16 of 17 topics
are now byte-identical at the rule-structure level (same names, same
compound_name, same pipeline assignments). The one exception is
lrcs_ocean, which keeps 15 HR-only entries (msftm_density /
msftmmpa_density / msftmmpa_depth + *_dec variants) as commented-out
stubs in LR with a note explaining why (custom steps not yet
implemented, decadal averages need a 10y+ run).
Substantive changes:
- cap7_aerosol: add ghg_scalar_pipeline + cfc11/cfc12/ch4/n2o_mon
- cap7_atm: drop dead compute_hur_ml pipeline, read hur directly from
new XIOS ml output
- cap7_land: HR moved from 'pipeline:' (singular, non-schema) to the
correct 'pipelines:' list form for 48 LPJ-GUESS rules
- cap7_ocean: unify tauuo/tauvo on 3hr frequency
- core_atm: LR gains areacella + AreacellaFxPipeline frozen pipeline
- core_ocean/lrcs_ocean: attach scale_pipeline to mlotst{,_day}
- lrcs_ocean: HR gains hfbasin/msftmz/sltbasin + pipelines
- lrcs_seaice: LR gains 23 HR-only rules (rad_seaice, siconca(+day),
sidragtop, sifl*top, sisnmass_*_si, siarea/siextent/sivol _day,
regrid_atm_to_fesom_pipeline)
Repointing:
- HR yamls → /work/bb1469/a270092/runtime/awiesm3-develop/HR_test_01,
year_start=year_end=1586 (new HR test run)
- LR yamls → /work/bb1469/a270092/runtime/awiesm3-develop/LR_test_01,
year_start=1900, year_end=1901 (previous LR test run)
- All '*.fesom.<year>.nc' literal-year patterns regex-ified to
'*\.fesom\..*\.nc' so the yamls work across any sim year
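The regex-ified patterns go through Python's `re` module rather than shell globbing; a quick check of the new pattern against some hypothetical FESOM filenames:

```python
import re

# Replaces literal-year globs like '*.fesom.1586.nc'
pattern = r".*\.fesom\..*\.nc"

files = ["temp.fesom.1586.nc", "temp.fesom.1900.nc", "temp.fesom.1586.nc.bak"]
matches = [f for f in files if re.fullmatch(pattern, f)]
# re.fullmatch anchors at both ends, so the trailing .bak file is rejected
```

Note the escaped dots: an unescaped `.` would also match any character, which is usually harmless here but makes the glob-vs-regex distinction easy to miss when editing the yamls.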
Disabled outputs (matching XIOS file_def decisions):
- 6hr model-level rules in cap7_atm (5 rules)
- 3hr plev6 rules in veg_atm (5 rules)
Both blocks commented with a pointer to
doc/awi_cap7_volume_estimate.txt explaining the data-volume driver.
atmos gn switch fallout: the HR and LR runs now write atmos on the
native reduced Gaussian (cell=40320 at LR, cell=421120 at HR), lat/lon
as auxiliary coords with bounds_lat/lon(cell, nvertex=4). Yamls carry
grid_label=gn accordingly.
- examples/run_core_atm_hr.sh: sbatch wrapper that runs the HR
core_atm production yaml locally on a compute node (Prefect server
on compute node, HDF5 file-locking off, scratch-based TMPDIR), with
output_directory rewritten to ./cmorized_output/core_atm_hr so it
does not clash with a parallel LR run.
- doc/awi_cap7_volume_estimate.txt: final DKRZ planning estimate
derived from running estimate_data_volume_{lr,hr}.py against the
current yaml set, with corrected TCo95/TCo319 reduced-Gaussian grid
sizes (40320 / 421120 points, not the old 192x400 regular-grid
assumption). Scenario: native atmos/land/veg, ocean/seaice native
plus 1° (LR) / 0.25° (HR) regrid, 6hr_ml and 3hr_pl6 excluded,
empirical 1.62x compression factor measured on real pycmor output.
JanStreffing commented Apr 22, 2026
> return ds
> ...
> def _attach_bounds_from_mesh(ds, rule, coord_names):
Check if generic enough for backend
JanStreffing commented Apr 22, 2026
> return any(v.chunks is not None for v in ds.data_vars.values())
> ...
> def _encoding_from_dask_chunks(ds, rule):
Deferred for later review
JanStreffing commented Apr 22, 2026
> if table_id is None:
>     # Fallback to user-provided
>     table_id = self.rule_dict.get("table_id", None)
Should be checked. Why deleted?
JanStreffing commented Apr 22, 2026
> """
> return self.rule_dict.get("Conventions", "CF-1.11")
> ...
> # ========================================================================
Should be a class. Do we need it?
JanStreffing commented Apr 22, 2026
> return "hdl:21.14100/" + str(uuid.uuid4())
> """Generate a unique tracking ID (prefix overridable via rule_dict).
> ...
> The CMIP7 tracking_id CV requires the ``hdl:21.14107/<uuid>`` prefix
Does that mean cmip6 won't work anymore?
JanStreffing commented Apr 22, 2026
> approx_interval = drv.table_header.approx_interval
> frequency_str = _frequency_from_approx_interval(approx_interval)
> logger.debug(f"{approx_interval=} {frequency_str=}")
> # attach the frequency_str to rule, it is referenced when creating file name
> rule.frequency_str = frequency_str
> time_method = _get_time_method(drv.frequency)
> rule.time_method = time_method
> # FESOM yearly files and concat'd hemispheric selects can yield a
Hallucination? xarray can sort itself?
JanStreffing commented Apr 22, 2026
> ## Development Commands
> ...
> ### Environment Setup
Delete before merge
Driven by a full sweep of LR test-run failures.

pycmor std_lib:
- cell_measures.compute_areacella now handles native reduced-Gaussian / unstructured grids via per-cell bounds_lat/bounds_lon; returns 1D (cell,) instead of a degenerate (cell, cell). Bounds broadcast along time by open_mfdataset are squeezed out. Resolves 'Bad chunk sizes' on TCo95 and the 258 GB OOM on TCo319 (the old code asked for a 421120x421120 materialization).
- AreacellaFxPipeline drops get_variable so compute_areacella sees the Dataset with its bounds variables.
- chunking + variable_attributes skip _FillValue / missing_value on CF flag variables (flag_values/flag_meanings); a missing_value cast to an integer dtype now checks iinfo bounds and skips on overflow. Resolves the 'basin' OverflowError.

examples/custom_steps.py:
- _load_secondary_mf matches via re.fullmatch against os.listdir (consistent with pycmor's regex-based primary gather_inputs).
- compute_hur_plev recognises additional plev coord names (pressure_levels, plev39/plev7h/plev8).

Config fixes:
- snd_day: add second_input_path/_pattern/_variable for rsn.
- 4 WMGHG scalars (cfc11/cfc12/ch4/n2o): branding suffix tavg-u-hm-air -> tavg-u-hm-u (dreq v1.2.2.2).
- lrcs_seaice: add the missing oifs_data_path &odp anchor; sisnmass NH/SH hm-si -> hm-u.
- 6 atm yamls: normalise secondary-input patterns from glob (*.nc) to regex (.*\.nc), matching the primary pattern convention.
The default flox path for resample().first/mean() routes through flox's numbagg backend, which JIT-compiles each aggregator via numba on first use. On HR runs the compile takes ~30 s per (aggregator, dtype, worker) triple, and that cost is repaid for every fresh Dask worker process. tasmax_mon on TCo319 spent 612 s in trigger_compute almost entirely inside numba compile — save_dataset itself took 1 s.

Make "numpy" the default engine (vectorised, zero JIT cold-start), and add a "flox_engine" knob (rule attribute or pycmor-config) for rules that genuinely benefit from numba — we currently have none.

Measured on the minimal bench (examples/cmip7_slow_write_bench_hr.yaml, tasmax_mon, TCo319, year 1586, 4 workers):
  default (numbagg):  rule total ~615 s, trigger_compute 612 s
  flox_engine=numpy:  rule total ~7 s,   trigger_compute 3.5 s

Also drops the two sbatch wrappers used to run the comparison and the minimal yaml, so the bench is reproducible.
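The knob-resolution pattern described (rule attribute first, then pycmor-config fallback, then the new "numpy" default) might look roughly like this; the helper function itself is hypothetical, while the attribute/key name `flox_engine` and the engine values come from the commit message:

```python
def get_flox_engine(rule, pycmor_config):
    """Resolve the flox engine: rule attribute wins, then config, then "numpy".

    "numpy" avoids numba's ~30 s per-worker JIT cold start; "numbagg" could be
    re-enabled per rule if a reduction genuinely benefited from it.
    """
    engine = getattr(rule, "flox_engine", None)
    if engine is None:
        engine = pycmor_config.get("flox_engine", "numpy")
    return engine


class Rule:  # toy stand-in for a pycmor rule object
    pass


rule = Rule()
default_engine = get_flox_engine(rule, {})                      # "numpy"
config_engine = get_flox_engine(rule, {"flox_engine": "flox"})  # "flox"
rule.flox_engine = "numbagg"
rule_engine = get_flox_engine(rule, {"flox_engine": "flox"})    # "numbagg"
```

The precedence order (rule > config > default) matches the `getattr` + `_pycmor_cfg` fallback convention mentioned elsewhere in this PR.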
Force-pushed from 105e934 to 0700690
Write path was hardcoded to zlib-1 + shuffle and wrapped in
scheduler="synchronous". On the old PyPI-netCDF4 stack that was fine
because the bundled HDF5 is not thread-safe and libnetcdf had no
alternate codecs anyway. On a thread-safe HDF5 build with a modern
libnetcdf that has zstd/blosc filters, both restrictions are leaving
most of the write throughput on the table.
Add two rule-level knobs (with pycmor-config fallbacks):
- netcdf_compression_codec: one of zlib (default), zstd, blosc_lz,
blosc_lz4, blosc_lz4hc, blosc_zlib, blosc_zstd, bzip2, szip.
Sets the `compression=` encoding kwarg that netCDF4-python passes
through to libnetcdf.
- netcdf_write_scheduler: dask scheduler used around save_mfdataset
(default "synchronous" — safe; "threads" wins when HDF5 is built
threadsafe).
Wired through both chunk-encoding paths:
- _calculate_netcdf_chunks → get_encoding_with_chunks
- _encoding_from_dask_chunks (dask-aligned writes)
Measured on the wap_day bench (HR TCo319, 9.1 GB input, 1 year):
zlib-1 + shuffle, sync scheduler (old, bundled stack) ........ 22 MB/s
zlib-1 + shuffle, threaded (new env) ......................... 25 MB/s
blosc_zstd-3 + shuffle, threaded, dask=4 ..................... 56 MB/s
blosc_zstd-3 + shuffle, threaded, dask=1 + BLOSC_NTHREADS=16 . 106 MB/s
Also drops the tasmax_mon bench (it isolated flox's numba cold-start,
already fixed) and adds a wap_day bench pair (sync netCDF4 + system
netCDF4 variants) that exercises the new knobs end-to-end.
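As a config fragment, the two knobs described above might be set per rule like so (the key names come from the commit message; the rule name and placement within a rule are illustrative):

```yaml
rules:
  - name: wap_day
    # zstd inside blosc: best measured throughput on the thread-safe stack
    netcdf_compression_codec: blosc_zstd
    # "threads" is only safe when HDF5 is built thread-safe;
    # the default "synchronous" is always safe
    netcdf_write_scheduler: threads
```

Unset, both knobs fall back to the pycmor-config values and then to the defaults (zlib, synchronous), preserving the old behaviour.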
…ords attr)

Seen together on a 9 GB wap file produced for the HR core-atmosphere benchmark. Each bug is fixable in isolation but all three were biting the same file.

files.py :: _encoding_from_dask_chunks
Mirror the _FillValue logic already present in get_encoding_with_chunks into the dask-aligned encoding path. Without this, large dask-backed float32 data variables (e.g. wap(time, plev19, cell)) were written with the xarray default fill of NaN instead of the CMIP-required 1e20, producing a `_FillValue != missing_value` CF §2.5 finding on every dask-path output.

files.py :: _ensure_lat_lon_bounds_impl
Also accept XIOS-style ``bounds_<coord>`` bounds variables (as emitted by IFS output) in addition to the CF-standard ``<coord>_bnds`` form; rename to the canonical name and fix up the ``bounds`` attr. Without this, the bounds attr on lat/lon pointed at a variable that was never promoted through the pipeline.

generic.py :: get_variable
When a selected model variable's coord has a ``bounds`` attr, attach the bounds variable as a coord on the returned DataArray so it survives downstream steps and is emitted in the final save. XIOS stores bounds as data_vars with an extra ``nvertex`` dim; simple ``data[var_name]`` indexing drops them. Wrapped in try/except so exotic bounds (e.g. time_bounds with ``axis_nbounds``) that xarray refuses to attach as coords are silently skipped.

files.py :: _ensure_coordinates_attr (new save-time pass)
Rebuild the ``coordinates`` attribute of each data variable from the current dim/coord names at save time. ``set_coordinate_attributes`` runs early in the pipeline (before ``map_dimensions``); a rename afterwards (e.g. vertical coord ``pressure_levels`` -> ``plev19``) would otherwise leave the attribute pointing at a variable that no longer exists. Wired into _ensure_lat_lon_bounds_and_external_vars alongside the existing external_variables pass so every save path picks it up.
Verified on examples/_verify_sidmassth.yaml -- no regression (still CF 1 + wcrp_cmip7 8 findings, same as before).
The previous attempt attached bounds as coords of the selected DataArray, but xarray rejects coords whose dims are not a subset of the target (e.g. ``bounds_lat(cell, nvertex)`` on ``wap(time, plev19, cell)``) and returning a Dataset from ``get_variable`` broke every downstream step that relied on ``DataArray.name`` (e.g. ``scale_by_constant``). New strategy: leave ``get_variable`` alone and recover the referenced bounds variable at save time. When ``_ensure_lat_lon_bounds_impl`` sees a ``lat.bounds`` / ``lon.bounds`` attribute pointing at a variable that is not in the live dataset, open the first file named by ``rule.inputs``, locate the bounds variable (candidates: the declared name, ``bounds_<coord>``, ``<coord>_bnds``), verify the first-dim size matches the coord size, and re-attach as ``<coord>_bnds``. Fills the gap for CF §7.1 compliance on HR IFS atmospheric output without touching the pipeline flow.
CF 1.11 §7.1 is explicit that bounds variables must not carry their
own attributes -- they inherit from the parent coordinate. Both
_attach_bounds_from_mesh (FESOM path) and _recover_bounds_from_inputs
(XIOS path) were setting ``units='degrees'`` on the emitted bounds
DataArray, triggering §7.1 findings on every unstructured output:
'lat_bnds' has attr 'units' 'degrees' that does not agree with its
associated variable ('lat')'s attr value 'degrees_north'
...
The Boundary variables 'lat_bnds' should not have the attributes:
'['units']'
Pass an empty attrs dict in both emitters. Verified on sidmassth --
CF findings stay at 1 (polar-cell recommendation) and bounds vars
now carry only the ``coordinates`` attribute inherited via xarray.
libnetcdf >= 4.9 exposes ``quantize_mode`` + ``significant_digits`` encoding knobs for bit-level lossy quantization. Turn on BitGroom quantization with 5 significant digits by default for float data variables, which gives ~30-50% file-size reduction on top of zlib/BLOSC with no measurable impact on typical analyses.

Apply in all three encoding builders -- ``get_encoding_with_chunks`` (chunking.py), ``_encoding_from_dask_chunks`` (dask-aligned path), and ``_calculate_netcdf_chunks`` (simple path) -- so the behaviour is consistent regardless of which write path a rule goes through.

Skip cases that must remain bit-exact:
* ``*_bnds``, ``*_bounds`` and ``bounds_*`` variables (CF §7.1 requires bounds values to agree exactly with the parent coord). Also prevents libnetcdf stamping a ``_QuantizeBitGroom...`` attribute on bounds, which was tripping the CF §2.3 naming check.
* Integer flag / index variables (``dtype.kind != 'f'``).
* Coordinate variables (not in ``ds.data_vars``).

Opt out per rule via ``netcdf_quantize_mode: null``; customise sig digits via ``netcdf_significant_digits``. Defaults were chosen to be safe for CMIP-class output, where 5 significant digits is well above the precision of any model calculation.
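Per rule, the quantization knobs from this commit might be expressed like so (key names come from the commit message; the rule name, the default mode string, and placement are illustrative):

```yaml
rules:
  - name: wap_day
    # BitGroom lossy quantization, applied only to float data variables;
    # bounds, integer flag/index, and coordinate variables stay bit-exact
    netcdf_quantize_mode: BitGroom   # set to null to opt out entirely
    netcdf_significant_digits: 5
```

Since quantization happens at encoding time, it composes with whatever ``netcdf_compression_codec`` the rule selects: the zeroed mantissa bits are what make the subsequent zlib/BLOSC pass so much more effective.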
…es, as well as commenting out dmoc for now
…CMOR_HOME

- core/validate.py + core/utils.py: expand $VARS (and ~ on the loader side) before resolving script:// paths, so configs can reference custom-step scripts via portable env vars instead of hard-coded absolute paths
- migrate 40 example/ and awi-esm3-veg-hr-variables/ yamls from /work/ab0246/a270092/software/pycmor/... to $PYCMOR_HOME/...
- absolute paths and ~/... still work; unset env vars produce a clear "Must be a valid file path" validator error early
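The expansion order described above corresponds to two stdlib calls; `$PYCMOR_HOME` is the variable this commit introduces, and the helper name below is hypothetical:

```python
import os


def resolve_script_path(path):
    """Expand $VARS and ~ before validating a script:// path."""
    return os.path.expanduser(os.path.expandvars(path))


os.environ["PYCMOR_HOME"] = "/opt/pycmor"  # illustrative value
resolved = resolve_script_path("$PYCMOR_HOME/examples/custom_steps.py")
# resolved == "/opt/pycmor/examples/custom_steps.py"
```

Note that `os.path.expandvars` leaves unset variables untouched (the literal `$VAR` text remains), which is what lets the downstream "Must be a valid file path" validator catch the misconfiguration early.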
CMIP7 cmorization for AWI-ESM3-VEG-HR
Adds full CMIP7 support targeting AWI-ESM3-VEG-HR, including a native compound-name
architecture that replaces the legacy cmip6-table-based data request lookup.
Key changes
CMIP7 data request
- `DataRequest` from `CMIP7_DReq_metadata` JSON instead of cmip6 tables (compound names such as `ocean.tos.tavg-u-hxy-sea.mon.GLB`)
- `cmip6_table` → `cmip6_cmor_table` in vendored metadata
- `compound_name` matching against `cmip6_compound_name` and `cmip7_compound_name` attributes
- derive `table_id` from the compound name when not set explicitly
- `ValueError` on zero DRV matches (instead of a silent skip)

Pipeline
- `vertical_integrate` custom pipeline step
- `convert()` step removed from `DefaultPipeline`
- `State` objects not being unwrapped to actual results in parallel runs

Standard library
- time bounds (`src/pycmor/std_lib/time_bounds.py`)
- `getattr` + `_pycmor_cfg` fallback
- `global_attributes` to derive `table_id` from CMIP6/CMIP7 compound names

Xarray accessor API
- `StdLibAccessor` with `.process()`

Test infrastructure
- model-run fixtures (`pycmor.fixtures.model_runs`)
- `pycmor.tutorial` dataset system (`xarray.tutorial`-style API)

Misc fixes
- `entry_points()` compatibility
- `pyfesom2` imports guarded for environments without it

Test plan
- `pytest tests/unit/`
- `pycmor process examples/awiesm3-cmip7-minimal.yaml` runs successfully on Levante