Submission checker reads `combined_params` / `params_dict`, but writer emits `parameters` / `override_parameters` (3.0.26)


## Summary

In `mlpstorage` 3.0.26 the `BaseBenchmark.metadata` writer and the
`submission_checker` readers disagree about the top-level keys under which
the merged DLIO parameters and user-override parameters are stored in
`*_metadata.json`. The writer was renamed as part of the fix for #365
(`combined_params` → `parameters`, `params_dict` → `override_parameters`)
but several reader sites in `submission_checker/checks/` still look up the
old names. As a result, every check that depends on these dicts silently
sees `{}` and either:

- emits a spurious **VIOLATION** ("dataset parameters not found in
  metadata", "record length is 0", "datagen size 0.00GiB < required …"), or
- silently no-ops a CLOSED/OPEN allow-list check that should have fired,

depending on whether the check fails-open or fails-closed when the dict is
empty.

This blocks `mlpstorage validate` from passing on any locally-produced
training submission, even when the run itself is correct.

## Environment

| | |
|---|---|
| `mlpstorage` version | 3.0.26 |
| Install | `pip install mlpstorage` into a venv |
| Affected commands | `mlpstorage validate`, `mlpstorage reports reportgen` (partial — reportgen also uses `parameters`, so it works) |


## Reproduction

1. Run any training benchmark to completion (e.g. `mlpstorage training run --model unet3d ...`) so that
   `closed/<orgname>/results/<system>/training/unet3d/run/<ts>/training_unet3d_metadata.json` is
   written.
2. Confirm the JSON contains the **new** key names:
   ```bash
   python -c 'import json,sys; m=json.load(open(sys.argv[1])); print(sorted(k for k in m if "param" in k))' \
     closed/<orgname>/results/<system>/training/unet3d/run/<ts>/training_unet3d_metadata.json
   # → ['override_parameters', 'parameters']
   ```
3. Run `mlpstorage validate <results-dir>`.
4. Observe spurious violations:
   - `[3.1.1] trainingVerifyDatasizeUsage: dataset parameters not found in metadata`
   - `[3.1.2] trainingRecalculateDatasetSize: record length is 0, cannot calculate dataset size`
   - `[3.2.1] trainingDatagenMinimumSize: datagen size 0.00GiB is less than required 0.00GiB`

## Root cause

The writer at
[`mlpstorage_py/benchmarks/base.py`](mlpstorage_py/benchmarks/base.py) lines 408–420
deliberately stores the merged config under `parameters` and the user-overrides under
`override_parameters` (this rename is referenced in the inline comment that
calls out #365):

```python
# Parameters - YAML defaults with CLI overrides folded in (fixes #365).
# combined_params alone omits overrides added after __init__ (e.g.
# checkpoint.num_checkpoints_*), causing split-phase runs to double-count.
if hasattr(self, 'combined_params'):
    metadata['parameters'] = self._apply_dotted_overrides(
        self.combined_params, getattr(self, 'params_dict', {}))
else:
    metadata['parameters'] = {}

# Override parameters - user-specified overrides only
if hasattr(self, 'params_dict'):
    metadata['override_parameters'] = self.params_dict
else:
    metadata['override_parameters'] = {}
```

But the reader-side checks under `mlpstorage_py/submission_checker/checks/`
still call `metadata.get("combined_params", {})` / `metadata.get("params_dict", {})`.
Because the lookups silently default to `{}`, the downstream code paths
either crash with a `record_length == 0` early-exit or pass the wrong
allow-list check.

### Affected reader sites

| File | Line | Current lookup | Should be |
|---|---|---|---|
| `submission_checker/checks/training_checks.py` | 227 | `metadata.get("combined_params", {})` | `metadata.get("parameters", {})` |
| `submission_checker/checks/training_checks.py` | 238 | `metadata.get("combined_params", {})` | `metadata.get("parameters", {})` |
| `submission_checker/checks/training_checks.py` | 579 | `metadata.get("params_dict", {})` | `metadata.get("override_parameters", {})` |
| `submission_checker/checks/training_checks.py` | 642 | `metadata.get("params_dict", {})` | `metadata.get("override_parameters", {})` |
| `submission_checker/checks/checkpointing_checks.py` | 160 | `metadata.get("combined_params", {})` | `metadata.get("parameters", {})` |
| `submission_checker/checks/checkpointing_checks.py` | 223 | `metadata.get("params_dict", {})` | `metadata.get("override_parameters", {})` |
| `submission_checker/checks/checkpointing_checks.py` | 504 | `metadata.get("params_dict", {})` | `metadata.get("override_parameters", {})` |
| `submission_checker/checks/vdb_checks.py` | 880 | `metadata.get("params_dict", {})` | `metadata.get("override_parameters", {})` |
| `submission_checker/checks/vdb_checks.py` | 941 | `metadata.get("params_dict", {})` | `metadata.get("override_parameters", {})` |

The tests under `tests/test_training_check_tool_injected_params.py` and
`tests/test_vdb_checks.py` construct fixture metadata using the **old**
key names (`params_dict`, `combined_params`), so the existing test suite
does not catch this — the tests test the checker against itself, not
against the actual writer output.

## Expected behaviour

`mlpstorage validate` against a clean, locally-produced submission tree
should pass all "parameter shape" checks (3.1.1, 3.1.2, 3.2.1, 3.6.2,
3.6.3 in the training suite; the corresponding checkpointing/VDB rules)
without manual JSON post-processing.


File	Line	Current lookup	Should be
`submission_checker/checks/training_checks.py`	227	`metadata.get("combined_params", {})`	`metadata.get("parameters", {})`
`submission_checker/checks/training_checks.py`	238	`metadata.get("combined_params", {})`	`metadata.get("parameters", {})`
`submission_checker/checks/training_checks.py`	579	`metadata.get("params_dict", {})`	`metadata.get("override_parameters", {})`
`submission_checker/checks/training_checks.py`	642	`metadata.get("params_dict", {})`	`metadata.get("override_parameters", {})`
`submission_checker/checks/checkpointing_checks.py`	160	`metadata.get("combined_params", {})`	`metadata.get("parameters", {})`
`submission_checker/checks/checkpointing_checks.py`	223	`metadata.get("params_dict", {})`	`metadata.get("override_parameters", {})`
`submission_checker/checks/checkpointing_checks.py`	504	`metadata.get("params_dict", {})`	`metadata.get("override_parameters", {})`
`submission_checker/checks/vdb_checks.py`	880	`metadata.get("params_dict", {})`	`metadata.get("override_parameters", {})`
`submission_checker/checks/vdb_checks.py`	941	`metadata.get("params_dict", {})`	`metadata.get("override_parameters", {})`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Submission checker reads `combined_params` / `params_dict`, but writer emits `parameters` / `override_parameters` (3.0.26) #598

Summary

Environment

Reproduction

Root cause

Affected reader sites

Expected behaviour

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development


`mlpstorage` version	3.0.26
Install	`pip install mlpstorage` into a venv
Affected commands	`mlpstorage validate`, `mlpstorage reports reportgen` (partial — reportgen also uses `parameters`, so it works)

Uh oh!

Submission checker reads combined_params / params_dict, but writer emits parameters / override_parameters (3.0.26) #598

Description

Summary

Environment

Reproduction

Root cause

Affected reader sites

Expected behaviour

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Submission checker reads `combined_params` / `params_dict`, but writer emits `parameters` / `override_parameters` (3.0.26) #598