Skip to content

KV cache Rules for closed and open submission#602

Merged
FileSystemGuy merged 8 commits into
mlcommons:mainfrom
hazemawadalla:rules-pr
Jul 3, 2026
Merged

KV cache Rules for closed and open submission#602
FileSystemGuy merged 8 commits into
mlcommons:mainfrom
hazemawadalla:rules-pr

Conversation

@hazemawadalla

Copy link
Copy Markdown
Contributor

No description provided.

@hazemawadalla hazemawadalla requested a review from a team June 30, 2026 19:01
@github-actions

Copy link
Copy Markdown

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@dslik

dslik commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@FileSystemGuy There seems to be some sort of automated rules checking going on. Can you update the system that drives this to be compatible with @hazemawadalla's additions?

@FileSystemGuy

Copy link
Copy Markdown
Contributor

@dslik I don't understand, what does "to be compatible with" mean? What action are you thinking should happen but isn't, or what action is failing that you think should happen?

@dslik

dslik commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@dslik I don't understand, what does "to be compatible with" mean? What action are you thinking should happen but isn't, or what action is failing that you think should happen?

The automated tests check every section of the rules against existing tests (or want OUT_OF_SCOPE_RULES marked), causing the tests to fail against this check-in, even though it only modifies a markdown file.

@FileSystemGuy

Copy link
Copy Markdown
Contributor

@dslik Ah, yes the new mlpstorage checks the Rules.md file for new requirements that it has not been told about. That's supposed to force me to not forget to build out the mlpstorage support for any rules changes. I'll dig into it in more detail after the evening WG meeting.

@FileSystemGuy

Copy link
Copy Markdown
Contributor

@hazemawadalla @dslik Claude has noticed that:

  1. Internal contradiction on inter-option-delay. §6.3.2.1 declares --inter-option-delay = 90 s, but §6.6.1 references "sequence locks of 6.3.2.1 (seed 42, trials 3, inter-option-delay 20 s, no --config)." Only one can be right.

  2. Rules disagree with code. mlpstorage_py/benchmarks/kvcache.py:247 hard-fails any CLOSED run whose inter_option_delay != 20, and the CLI default is 20 (cli/kvcache_args.py:214). §6.6.1's 20 s matches the shipped code; §6.3.2.1's 90 s does not. If 90 s is the actual intent, kvcache.py:247–262 must change too.

Could you please confirm that --inter-option-delay = 20 is the correct value for CLOSED and the correct default for OPEN?

@FileSystemGuy

Copy link
Copy Markdown
Contributor

Two spec clarifications from offline discussion, so they're recorded here for reviewers:

1. --inter-option-delay = 20 seconds (CLOSED).
§6.3.2.1 in this PR currently reads --inter-option-delay = 90 s; §6.6.1 correctly says 20 s. The shipped code (mlpstorage_py/benchmarks/kvcache.py:247 hard-fails on any CLOSED value ≠ 20; mlpstorage_py/cli/kvcache_args.py:214 defaults to 20) is authoritative. Please update §6.3.2.1 to 20 s so it matches §6.6.1 and the code.

2. Four-part §6.3 IDs (6.3.1.1, 6.3.2.1, 6.3.4.5, etc.) are intentional and stay.
The current rules_coverage.py:54 discovery regex only matches 3-part IDs, so it silently skips the twelve §6.3 rules — which is why this PR's CI failure lists only 7 unmapped IDs (§6.4/§6.5/§6.6) instead of 19. The §6.3 rules aren't unmapped, they're undiscovered; without a regex fix, the coverage gate can never enforce them.

I filed #653 to relax the regex to accept N-part IDs. That should be merged first, then this PR's CI will surface the full 19-rule unmapped set, and we can decide per-rule between real @rule bindings and OUT_OF_SCOPE_RULES entries (my guess: 6.4.2 and 6.6.3 are OPEN-narrative and belong out-of-scope with a reason; the rest should get real bindings — but that's a follow-on PR, not something this doc-only PR needs to carry).

TL;DR to unblock merge:

Happy to draft either the #653 fix or the coverage_mapping companion once the WG signs off on the direction.

FileSystemGuy added a commit that referenced this pull request Jul 3, 2026
…rsers (#656)

* test(#653): RED — rules_coverage regex silently skips 4-part IDs

Injects a Rules.md line with a 4-part ID (6.3.99.1) and asserts it
reaches result['unmapped']. Fails on the current regex, which only
matches 3-part IDs — the twelve §6.3.x.x rules added by PR #602 are
silently skipped rather than surfacing as unmapped, so the coverage
gate can never enforce them.

* fix(#653): allow N-part IDs in rules_coverage discovery regex

The Phase-3 regex ^([23456]\.\d+\.\d+)\. only matched 3-part IDs, so
PR #602's twelve §6.3.x.x kvcache rules (6.3.1.1, 6.3.2.1, 6.3.4.5,
etc.) were silently skipped rather than surfacing as unmapped —
meaning the coverage gate could never enforce them. Relax the tail
from \.\d+\.\d+ to \.\d+(?:\.\d+)+ so 3-, 4-, or deeper nested IDs
all match. Downstream consumers (STUB_COVERAGE lookup, unmapped-set
membership, CLI table rendering) treat the ID as an opaque string,
so this is a strict superset of the prior behavior.

Also update the doc-comment in test_rules_coverage.py that restates
the locked regex.

Follow-up (not this PR): tests/test_definition_of_done.py:129,150
uses [234]\.\d+\.\d+ to parse violation log lines like
[2.1.2 someName]. Still 3-part-shape assumption. Currently limited
to §2/§3/§4 so no kvcache impact today, but when §6 real checkers
emit violations tagged with 4-part IDs, that parser will need the
same relaxation.

* test(#653): RED — violation-log extractors drop §5/§6 and 4-part IDs

Adds TestExtractorRegexParity with four unit tests against synthetic
BaseCheck.log_violation-formatted lines:

- extract_ids / extract_pairs each drop §5 and §6 IDs because the
  character class [234] excludes them (a live regression: any VDB
  §5 violation is invisible to the good/bad fixture assertions).
- extract_ids / extract_pairs each drop 4-part IDs because the
  dotted tail \.\d+\.\d+ pins the shape to 3-part.

Both bugs are the log-line-parsing counterpart to the rules_coverage
discovery regex fix in the previous commit.

* fix(#653): allow §5/§6 and N-part IDs in violation-log extractors

Applies the same two-axis regex relaxation the previous commit made to
rules_coverage:

- Character class [234] → [23456] so §5 (VDB, live) and §6 (kvcache,
  landing) violations parse. Before the fix any VDB §5 violation was
  silently dropped from _extract_error_rule_ids, so the good/bad
  fixture assertions in TestDefinitionOfDone* couldn't observe them
  — a live regression, not a speculative future concern.
- Dotted tail \.\d+\.\d+ → \.\d+(?:\.\d+)+ so PR #602's
  §6.3.x.x 4-part IDs parse.

Both extractors take the same shape change. All four RED tests from
the previous commit now pass; the 4-suite parallel sweep is clean.
FileSystemGuy added a commit that referenced this pull request Jul 3, 2026
* test(#655): RED — DoD fixture lacks vector_database and kv_cache subtrees

Adds five failing tests exercising the fixture-coverage gap in
build_submission (storage#655):

- TestBuildSubmissionVdbKvcacheSubtrees (2 unit tests): the sealed
  kwargs guard rejects include_vdb and include_kv_cache — required
  before adding the subtrees.
- TestDefinitionOfDoneVdb (2 integration tests): bad-fixture case
  engineering vdb_missing_metric_field='p95_latency_ms' to trip
  5.3.4 vdbMetricsReported end-to-end; good-fixture case guarding
  against spurious §5 violations.
- TestDefinitionOfDoneKvcache (1 integration test): good kv_cache
  fixture routes through MODE_TO_CHECKERS without tripping
  [2.1.10 workloadCategories] as an unrecognized mode.

All five fail on unknown-override TypeError until include_vdb,
include_kv_cache, and vdb_missing_metric_field are added to
conftest.build_submission.

* fix(#655): add vector_database and kv_cache subtrees to build_submission

Closes the DoD fixture-coverage gap that made VdbCheck (17 real §5
@rule bindings) and KVCacheCheck (§6 stub) invisible to the
end-to-end validator pipeline.

conftest.build_submission gains three new kwargs:

- include_vdb=True — creates results/<sys>/vector_database/milvus/
  DISKANN/{datagen,run}/<ts>/ per loader.py:207-238. Five run
  timestamps (5.3.1 vdbRunCount requires exactly 5), each with a
  §5-conformant summary.json + metadata.json + a
  fs_separation.json CAP-03 sidecar (5.4.2 vdbFilesystemCheck).
- include_kv_cache=True — creates results/<sys>/kv_cache/llama3.1-8b/
  {datagen,run}/<ts>/ per loader.py:191-205. KVCacheCheck is a stub
  so the good-fixture case only needs to route through
  MODE_TO_CHECKERS without tripping [2.1.10 workloadCategories];
  the fixture is scaffolded so a follow-up (once PR #602 gives §6
  real bindings) can add a kv_cache_missing_field knob mirroring
  vdb_missing_metric_field.
- vdb_missing_metric_field=<name> — pops the named field from the
  vdb run summary.json to engineer 5.3.4 vdbMetricsReported for the
  bad-fixture case.

Also updates the stale docstring at vdb_checks.py:150-155 that
described "Phase 4 land time" loader behavior — issue #612 has
since made Loader.load() explicitly fill run_files /
datagen_files for both vector_database and kv_cache modes; the
guard remains as a defensive walk for test-time SubmissionLogs
fakes.
Updated the inter-option delay for CLOSED submissions from 90 seconds to 20 seconds.
FileSystemGuy added a commit that referenced this pull request Jul 3, 2026
* test(#658): RED — PR #602 §6 KVCache IDs unmapped in coverage_mapping

Simulates PR #602's Rules.md state (19 new §6 rule-ID lines appended to
a fake Rules.md) and asserts none of the IDs surface in
reconcile()['unmapped']. Fails now because STUB_COVERAGE['KVCacheCheck']
and OUT_OF_SCOPE_RULES are both empty for §6 — matching the state at
which PR #602's merge would break test_reconcile_returns_no_unmapped_ids.

Next commit populates the mapping; this test flips GREEN.

* fix(#658): GREEN — advertise PR #602 §6 KVCache rules in coverage_mapping

Populates two registries in advance of PR #602's merge so the coverage
gate (test_reconcile_returns_no_unmapped_ids) does not flip red the
moment the new Rules.md §6 IDs land:

* STUB_COVERAGE['KVCacheCheck'] gains the seventeen enforceable §6 IDs
  (6.3.1.1, 6.3.2.x, 6.3.3.x, 6.3.4.x, 6.4.1, 6.5.x, 6.6.1-2). Real
  @rule bindings on KVCacheCheck are a follow-up phase; until then
  PR #602's own enforcement-status note applies — the §6 rules are
  enforced by the run-time CLI locks (6.3.2.1) and manual review.

* OUT_OF_SCOPE_RULES gains §6.4.2 (descriptive POSIX I/O-model prose)
  and §6.6.3 (OPEN-narrative example invocation). Neither carries a
  submission-validator contract; §6.6.3's 'record exact invocation'
  clause is satisfied by the OPEN run's normal metadata capture.

test_baseline_no_stale_entries is renamed to
test_baseline_no_unexpected_stale_entries and allow-lists the PR #602
anticipation window (the 19 IDs that exist in the registries but not
yet in live Rules.md). Drift warnings for those IDs fire in CI as an
intentional signal until PR #602 lands, at which point the allow-list
naturally collapses to the empty set. Any OTHER stale entry still
fails the test.

Docstring updates on KVCacheCheck and coverage_mapping's module-level
docstring reflect the new post-PR-602 posture (§6 no longer 'empty at
land time'). No behavior change to KVCacheCheck itself: it still emits
zero violations.

Sweep: 2611 + 822 + 155 + 238 = 3826 tests pass, 0 fail.
@hazemawadalla

Copy link
Copy Markdown
Contributor Author

Hey Curtis,
I'm confused why do we need a test for changes to a markdown file ? I mean it seems like a good feature it's just odd . I thought we said the inter run delay is 90s

@FileSystemGuy

Copy link
Copy Markdown
Contributor

@hazemawadalla @dslik If you want 90 instead of 20, I'll change it, I just needed to know which one you want.

The new mlpstorage command cross checks all the rules in Rules.md against the code that implements those rules in the validation checker. Its good s/w engineering.

FileSystemGuy added a commit that referenced this pull request Jul 3, 2026
* test(#664): expect --inter-option-delay=90 + subprocess safety net

Per storage#664 the CLOSED --inter-option-delay pin moves from 20 s to
90 s (author confirmed 2026-07-03 for PR #602 §6.3.2.1). Update
tests/unit/test_cli_kvcache.py + tests/unit/test_benchmarks_kvcache.py
to assert the new value; runtime alignment follows in the next commit.

Also add an autouse fixture on TestClosedEnforcement that patches
_execute_command, _probe_results_dir_shared, and _interruptible_sleep
for every test in the class. The *_returns_1 tests rely on the CLOSED
guard short-circuiting before any subprocess launches, so they do not
mock these collaborators. When the runtime and the tests drift (as
they do during this rename), the guard misses and _execute_run falls
into the real mpirun / ssh-probe / aggregation path — on localhost
with kvcache_bin_path=None that expands into a fork storm heavy enough
to OOM a 32 GB host. The fixture ensures a guard miss can only cause a
clean assertion failure, never runaway forking.

* fix(#664): move CLOSED --inter-option-delay enforcement from 20 s to 90 s

PR #602 §6.3.2.1 fixes the CLOSED --inter-option-delay at 90 s (author
hazemawadalla confirmed 2026-07-03). The runtime CLI, effective-value
fallback, help text, and ManPage still enforced/documented 20 s — an
artifact of a temporary reconciliation while the author was
unresponsive. Bring them all in line with the spec.

Sites changed:
- mlpstorage_py/benchmarks/kvcache.py — CLOSED hard-fail, error text,
  and effective-value fallback all move 20 -> 90
- mlpstorage_py/cli/kvcache_args.py — closed set_defaults 20 -> 90
- mlpstorage_py/run_summary.py — effective-value fallback 20 -> 90
- mlpstorage_py/cli/help_formatter.py — CLOSED pinned-defaults help
  text inter-option-delay 20s -> 90s
- ManPage.md — closed pins list inter-option-delay 20 -> 90

Rules.md §6.3.2.1 is updated on PR #602's branch as a companion change.

Behavior change: CLOSED submissions that previously would have accepted
--inter-option-delay 20 will now hard-fail; they must use 90.
Author (hazemawadalla) confirmed on 2026-07-03 that the intended CLOSED
inter-option-delay is 90 s, not 20 s. The 20-s value was a temporary
reconciliation with shipped code while the author's intent was
unresolved. Runtime CLI + tests are already aligned on main via
storage#665 (merged 2026-07-03); this restores the spec to match.

Sites: §6.3.2.1 (kvcacheClosedSequenceLocks) and §6.6.1
(kvcacheClosedImmutable) both mention the 20 s value inline.
…atches

The two drift-simulation tests replaced the whole OUT_OF_SCOPE_RULES /
STUB_COVERAGE dict with a synthetic stale entry, which worked while
Rules.md had no §6 rules to satisfy. Once this PR adds §6, the bare
replacement drops the real 6.4.2 / 6.6.3 OOS mappings and the seventeen
KVCacheCheck stub mappings, so every §6 ID falls into the unmapped set
and the "drift warnings must not affect unmapped set" assertion breaks.

Fix: extend the real dict with the synthetic stale entry via
`{**REAL, "stale.id": ...}` instead of replacing it. The synthetic
stale entry still triggers the drift warning; the real mappings still
keep the §6 IDs mapped; the unmapped-set invariant holds.
@FileSystemGuy FileSystemGuy merged commit 8691fcb into mlcommons:main Jul 3, 2026
4 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jul 3, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants