KV cache Rules for closed and open submission#602
Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
|
@FileSystemGuy There seems to be some sort of automated rules checking going on. Can you update the system that drives this to be compatible with @hazemawadalla's additions? |
|
@dslik I don't understand, what does "to be compatible with" mean? What action are you thinking should happen but isn't, or what action is failing that you think should happen? |
The automated tests check every section of the rules against existing tests (or want OUT_OF_SCOPE_RULES marked), causing the tests to fail against this check-in, even though it only modifies a markdown file. |
|
@dslik Ah, yes the new mlpstorage checks the Rules.md file for new requirements that it has not been told about. That's supposed to force me to not forget to build out the mlpstorage support for any rules changes. I'll dig into it in more detail after the evening WG meeting. |
|
@hazemawadalla @dslik Claude has noticed that:
Could you please confirm that --inter-option-delay = 20 is the correct value for CLOSED and the correct default for OPEN? |
|
Two spec clarifications from offline discussion, so they're recorded here for reviewers: 1. 2. Four-part §6.3 IDs ( I filed #653 to relax the regex to accept N-part IDs. That should be merged first, then this PR's CI will surface the full 19-rule unmapped set, and we can decide per-rule between real TL;DR to unblock merge:
Happy to draft either the #653 fix or the coverage_mapping companion once the WG signs off on the direction. |
…rsers (#656) * test(#653): RED — rules_coverage regex silently skips 4-part IDs Injects a Rules.md line with a 4-part ID (6.3.99.1) and asserts it reaches result['unmapped']. Fails on the current regex, which only matches 3-part IDs — the twelve §6.3.x.x rules added by PR #602 are silently skipped rather than surfacing as unmapped, so the coverage gate can never enforce them. * fix(#653): allow N-part IDs in rules_coverage discovery regex The Phase-3 regex ^([23456]\.\d+\.\d+)\. only matched 3-part IDs, so PR #602's twelve §6.3.x.x kvcache rules (6.3.1.1, 6.3.2.1, 6.3.4.5, etc.) were silently skipped rather than surfacing as unmapped — meaning the coverage gate could never enforce them. Relax the tail from \.\d+\.\d+ to \.\d+(?:\.\d+)+ so 3-, 4-, or deeper nested IDs all match. Downstream consumers (STUB_COVERAGE lookup, unmapped-set membership, CLI table rendering) treat the ID as an opaque string, so this is a strict superset of the prior behavior. Also update the doc-comment in test_rules_coverage.py that restates the locked regex. Follow-up (not this PR): tests/test_definition_of_done.py:129,150 uses [234]\.\d+\.\d+ to parse violation log lines like [2.1.2 someName]. Still 3-part-shape assumption. Currently limited to §2/§3/§4 so no kvcache impact today, but when §6 real checkers emit violations tagged with 4-part IDs, that parser will need the same relaxation. * test(#653): RED — violation-log extractors drop §5/§6 and 4-part IDs Adds TestExtractorRegexParity with four unit tests against synthetic BaseCheck.log_violation-formatted lines: - extract_ids / extract_pairs each drop §5 and §6 IDs because the character class [234] excludes them (a live regression: any VDB §5 violation is invisible to the good/bad fixture assertions). - extract_ids / extract_pairs each drop 4-part IDs because the dotted tail \.\d+\.\d+ pins the shape to 3-part. Both bugs are the log-line-parsing counterpart to the rules_coverage discovery regex fix in the previous commit. * fix(#653): allow §5/§6 and N-part IDs in violation-log extractors Applies the same two-axis regex relaxation the previous commit made to rules_coverage: - Character class [234] → [23456] so §5 (VDB, live) and §6 (kvcache, landing) violations parse. Before the fix any VDB §5 violation was silently dropped from _extract_error_rule_ids, so the good/bad fixture assertions in TestDefinitionOfDone* couldn't observe them — a live regression, not a speculative future concern. - Dotted tail \.\d+\.\d+ → \.\d+(?:\.\d+)+ so PR #602's §6.3.x.x 4-part IDs parse. Both extractors take the same shape change. All four RED tests from the previous commit now pass; the 4-suite parallel sweep is clean.
* test(#655): RED — DoD fixture lacks vector_database and kv_cache subtrees Adds five failing tests exercising the fixture-coverage gap in build_submission (storage#655): - TestBuildSubmissionVdbKvcacheSubtrees (2 unit tests): the sealed kwargs guard rejects include_vdb and include_kv_cache — required before adding the subtrees. - TestDefinitionOfDoneVdb (2 integration tests): bad-fixture case engineering vdb_missing_metric_field='p95_latency_ms' to trip 5.3.4 vdbMetricsReported end-to-end; good-fixture case guarding against spurious §5 violations. - TestDefinitionOfDoneKvcache (1 integration test): good kv_cache fixture routes through MODE_TO_CHECKERS without tripping [2.1.10 workloadCategories] as an unrecognized mode. All five fail on unknown-override TypeError until include_vdb, include_kv_cache, and vdb_missing_metric_field are added to conftest.build_submission. * fix(#655): add vector_database and kv_cache subtrees to build_submission Closes the DoD fixture-coverage gap that made VdbCheck (17 real §5 @rule bindings) and KVCacheCheck (§6 stub) invisible to the end-to-end validator pipeline. conftest.build_submission gains three new kwargs: - include_vdb=True — creates results/<sys>/vector_database/milvus/ DISKANN/{datagen,run}/<ts>/ per loader.py:207-238. Five run timestamps (5.3.1 vdbRunCount requires exactly 5), each with a §5-conformant summary.json + metadata.json + a fs_separation.json CAP-03 sidecar (5.4.2 vdbFilesystemCheck). - include_kv_cache=True — creates results/<sys>/kv_cache/llama3.1-8b/ {datagen,run}/<ts>/ per loader.py:191-205. KVCacheCheck is a stub so the good-fixture case only needs to route through MODE_TO_CHECKERS without tripping [2.1.10 workloadCategories]; the fixture is scaffolded so a follow-up (once PR #602 gives §6 real bindings) can add a kv_cache_missing_field knob mirroring vdb_missing_metric_field. - vdb_missing_metric_field=<name> — pops the named field from the vdb run summary.json to engineer 5.3.4 vdbMetricsReported for the bad-fixture case. Also updates the stale docstring at vdb_checks.py:150-155 that described "Phase 4 land time" loader behavior — issue #612 has since made Loader.load() explicitly fill run_files / datagen_files for both vector_database and kv_cache modes; the guard remains as a defensive walk for test-time SubmissionLogs fakes.
Updated the inter-option delay for CLOSED submissions from 90 seconds to 20 seconds.
* test(#658): RED — PR #602 §6 KVCache IDs unmapped in coverage_mapping Simulates PR #602's Rules.md state (19 new §6 rule-ID lines appended to a fake Rules.md) and asserts none of the IDs surface in reconcile()['unmapped']. Fails now because STUB_COVERAGE['KVCacheCheck'] and OUT_OF_SCOPE_RULES are both empty for §6 — matching the state at which PR #602's merge would break test_reconcile_returns_no_unmapped_ids. Next commit populates the mapping; this test flips GREEN. * fix(#658): GREEN — advertise PR #602 §6 KVCache rules in coverage_mapping Populates two registries in advance of PR #602's merge so the coverage gate (test_reconcile_returns_no_unmapped_ids) does not flip red the moment the new Rules.md §6 IDs land: * STUB_COVERAGE['KVCacheCheck'] gains the seventeen enforceable §6 IDs (6.3.1.1, 6.3.2.x, 6.3.3.x, 6.3.4.x, 6.4.1, 6.5.x, 6.6.1-2). Real @rule bindings on KVCacheCheck are a follow-up phase; until then PR #602's own enforcement-status note applies — the §6 rules are enforced by the run-time CLI locks (6.3.2.1) and manual review. * OUT_OF_SCOPE_RULES gains §6.4.2 (descriptive POSIX I/O-model prose) and §6.6.3 (OPEN-narrative example invocation). Neither carries a submission-validator contract; §6.6.3's 'record exact invocation' clause is satisfied by the OPEN run's normal metadata capture. test_baseline_no_stale_entries is renamed to test_baseline_no_unexpected_stale_entries and allow-lists the PR #602 anticipation window (the 19 IDs that exist in the registries but not yet in live Rules.md). Drift warnings for those IDs fire in CI as an intentional signal until PR #602 lands, at which point the allow-list naturally collapses to the empty set. Any OTHER stale entry still fails the test. Docstring updates on KVCacheCheck and coverage_mapping's module-level docstring reflect the new post-PR-602 posture (§6 no longer 'empty at land time'). No behavior change to KVCacheCheck itself: it still emits zero violations. Sweep: 2611 + 822 + 155 + 238 = 3826 tests pass, 0 fail.
|
Hey Curtis, |
|
@hazemawadalla @dslik If you want 90 instead of 20, I'll change it, I just needed to know which one you want. The new mlpstorage command cross checks all the rules in Rules.md against the code that implements those rules in the validation checker. Its good s/w engineering. |
* test(#664): expect --inter-option-delay=90 + subprocess safety net Per storage#664 the CLOSED --inter-option-delay pin moves from 20 s to 90 s (author confirmed 2026-07-03 for PR #602 §6.3.2.1). Update tests/unit/test_cli_kvcache.py + tests/unit/test_benchmarks_kvcache.py to assert the new value; runtime alignment follows in the next commit. Also add an autouse fixture on TestClosedEnforcement that patches _execute_command, _probe_results_dir_shared, and _interruptible_sleep for every test in the class. The *_returns_1 tests rely on the CLOSED guard short-circuiting before any subprocess launches, so they do not mock these collaborators. When the runtime and the tests drift (as they do during this rename), the guard misses and _execute_run falls into the real mpirun / ssh-probe / aggregation path — on localhost with kvcache_bin_path=None that expands into a fork storm heavy enough to OOM a 32 GB host. The fixture ensures a guard miss can only cause a clean assertion failure, never runaway forking. * fix(#664): move CLOSED --inter-option-delay enforcement from 20 s to 90 s PR #602 §6.3.2.1 fixes the CLOSED --inter-option-delay at 90 s (author hazemawadalla confirmed 2026-07-03). The runtime CLI, effective-value fallback, help text, and ManPage still enforced/documented 20 s — an artifact of a temporary reconciliation while the author was unresponsive. Bring them all in line with the spec. Sites changed: - mlpstorage_py/benchmarks/kvcache.py — CLOSED hard-fail, error text, and effective-value fallback all move 20 -> 90 - mlpstorage_py/cli/kvcache_args.py — closed set_defaults 20 -> 90 - mlpstorage_py/run_summary.py — effective-value fallback 20 -> 90 - mlpstorage_py/cli/help_formatter.py — CLOSED pinned-defaults help text inter-option-delay 20s -> 90s - ManPage.md — closed pins list inter-option-delay 20 -> 90 Rules.md §6.3.2.1 is updated on PR #602's branch as a companion change. Behavior change: CLOSED submissions that previously would have accepted --inter-option-delay 20 will now hard-fail; they must use 90.
Author (hazemawadalla) confirmed on 2026-07-03 that the intended CLOSED inter-option-delay is 90 s, not 20 s. The 20-s value was a temporary reconciliation with shipped code while the author's intent was unresolved. Runtime CLI + tests are already aligned on main via storage#665 (merged 2026-07-03); this restores the spec to match. Sites: §6.3.2.1 (kvcacheClosedSequenceLocks) and §6.6.1 (kvcacheClosedImmutable) both mention the 20 s value inline.
…atches
The two drift-simulation tests replaced the whole OUT_OF_SCOPE_RULES /
STUB_COVERAGE dict with a synthetic stale entry, which worked while
Rules.md had no §6 rules to satisfy. Once this PR adds §6, the bare
replacement drops the real 6.4.2 / 6.6.3 OOS mappings and the seventeen
KVCacheCheck stub mappings, so every §6 ID falls into the unmapped set
and the "drift warnings must not affect unmapped set" assertion breaks.
Fix: extend the real dict with the synthetic stale entry via
`{**REAL, "stale.id": ...}` instead of replacing it. The synthetic
stale entry still triggers the drift warning; the real mappings still
keep the §6 IDs mapped; the unmapped-set invariant holds.
No description provided.