fix(#655): add vector_database + kv_cache subtrees to DoD fixture #657

Merged
FileSystemGuy merged 2 commits into main from fix/655-dod-fixture-vdb-kvcache-subtrees
Jul 3, 2026
@FileSystemGuy

Closes #655.

Summary

The Definition-of-Done end-to-end test (mlpstorage_py/tests/test_definition_of_done.py) exercises the full validator pipeline as a subprocess against a fixture built by conftest.build_submission. That fixture only ever creates training + checkpointing subtrees, so MODE_TO_CHECKERS never routed to VdbCheck (17 real §5 @rule bindings) or KVCacheCheck (§6 stub, will gain real bindings via PR #602). Their unit tests in test_vdb_checks.py (41 tests, all passing) exercise them in isolation, but the loader → checker hand-off, the log-line format, and the origin-path binding were unverified end-to-end.
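To make the coverage gap concrete, here is a minimal sketch of mode-keyed checker dispatch. It assumes MODE_TO_CHECKERS is a mapping from mode name to a list of checkers; the table, the checker functions, and `run_checkers` below are hypothetical stand-ins, not the project's actual code.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for the real checker classes.
def training_check(leaf: str) -> str:
    return f"TrainingCheck ran on {leaf}"


def vdb_check(leaf: str) -> str:
    return f"VdbCheck ran on {leaf}"


# Assumed shape: mode name -> checkers to run for leaves of that mode.
MODE_TO_CHECKERS_SKETCH: Dict[str, List[Callable[[str], str]]] = {
    "training": [training_check],
    "vector_database": [vdb_check],
}


def run_checkers(leaves_by_mode: Dict[str, List[str]]) -> List[str]:
    """Run each registered checker on every leaf found for its mode."""
    results: List[str] = []
    for mode, leaves in leaves_by_mode.items():
        for checker in MODE_TO_CHECKERS_SKETCH.get(mode, []):
            results.extend(checker(leaf) for leaf in leaves)
    return results
```

With a fixture that only contains training leaves, `vdb_check` is registered but never called, which is the situation this PR addresses.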

This PR adds three kwargs on build_submission:

  • include_vdb=True — creates results/<sys>/vector_database/milvus/DISKANN/{datagen,run}/<ts>/ per loader.py:207-238. Five run timestamps because 5.3.1 vdbRunCount requires exactly 5. Each run dir carries a §5-conformant summary.json + metadata.json + a CAP-03 fs_separation.json sidecar so 5.4.2 vdbFilesystemCheck passes.
  • include_kv_cache=True — creates results/<sys>/kv_cache/llama3.1-8b/{datagen,run}/<ts>/ per loader.py:191-205. Because KVCacheCheck is a stub, the good-fixture assertion only needs to prove the loader routes through MODE_TO_CHECKERS without tripping [2.1.10 workloadCategories]. The subtree is scaffolded so a follow-up (once PR #602, "KV cache Rules for closed and open submission", gives §6 real bindings) can add a kv_cache_missing_field knob mirroring vdb_missing_metric_field.
  • vdb_missing_metric_field=<name> — pops the named field from the vdb run summary.json to engineer 5.3.4 vdbMetricsReported end-to-end (bad-fixture case).
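The directory layout described above can be sketched roughly as follows. This is an illustration only: `add_vdb_subtree` is a hypothetical helper, the timestamp format is assumed, and every JSON payload is a placeholder apart from the `p95_latency_ms` key, which is the field the new bad-fixture test removes. The real §5-conformant contents live in conftest.build_submission.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Placeholder summary; only p95_latency_ms is a field name taken from the PR.
SUMMARY_TEMPLATE = {"p95_latency_ms": 12.5, "placeholder_metric": 1.0}


def add_vdb_subtree(root: Path, system: str, run_count: int = 5,
                    missing_metric_field: Optional[str] = None) -> Path:
    """Create <root>/results/<system>/vector_database/milvus/DISKANN/{datagen,run}/<ts>/."""
    leaf = root / "results" / system / "vector_database" / "milvus" / "DISKANN"
    # Assumed timestamp format; one datagen timestamp, run_count run timestamps.
    (leaf / "datagen" / "20260101_000000").mkdir(parents=True)
    for i in range(run_count):
        run_dir = leaf / "run" / f"20260101_00000{i + 1}"
        run_dir.mkdir(parents=True)
        summary = dict(SUMMARY_TEMPLATE)
        if missing_metric_field is not None:
            # Drop the named field to engineer a missing-metric violation.
            summary.pop(missing_metric_field, None)
        files = {
            "summary.json": summary,
            "metadata.json": {},        # placeholder payload
            "fs_separation.json": {},   # placeholder sidecar payload
        }
        for name, payload in files.items():
            (run_dir / name).write_text(json.dumps(payload))
    return leaf


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        leaf = add_vdb_subtree(Path(tmp), "sys_a", missing_metric_field="p95_latency_ms")
        print(sorted(p.name for p in (leaf / "run").iterdir()))
```

Defaulting `run_count` to 5 mirrors the 5.3.1 vdbRunCount requirement, so a good fixture passes without the caller having to know the rule.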

Also fixed: stale docstring

vdb_checks.py:150-155 — the _iter_run_files docstring described "Phase 4 land time" behavior:

Loader.load() only fills run_files / datagen_files for mode == "training"; the else branch fills checkpoint_files for everything else. For vector_database leaves this means run_files is None.

That was already outdated when #655 was filed — #612 added explicit run_files / datagen_files population for both vector_database (loader.py:224-225) and kv_cache (loader.py:198-199). Updated the docstring to reflect the current loader behavior; the empty-iterable guard remains as a defensive walk for test-time SubmissionLogs fakes.
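For readers unfamiliar with the guard, a minimal sketch of an empty-iterable defensive walk looks like this. It is not the body of `_iter_run_files`; the function name is hypothetical, and the sketch assumes only that a test-time fake may lack `run_files` or set it to None.

```python
from types import SimpleNamespace
from typing import Any, Iterator


def iter_run_files_sketch(logs: Any) -> Iterator[Any]:
    """Yield run files, treating a missing or None run_files as empty."""
    yield from (getattr(logs, "run_files", None) or ())


# A loader-populated object and two kinds of test-time fakes.
populated = SimpleNamespace(run_files=["run/ts1/summary.json"])
fake_without_attr = SimpleNamespace()
fake_with_none = SimpleNamespace(run_files=None)
```

All three objects can be walked with the same loop, so a checker never has to special-case fakes.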

Follow-up not in this PR

  • Once PR #602 ("KV cache Rules for closed and open submission") gives §6 real @rule bindings, add a kv_cache_missing_field kwarg and a TestDefinitionOfDoneKvcache bad-fixture case mirroring the vdb structure.
  • The vanilla-fixture "no §5 rules tripped" assertion only checks that no 5.x.y prefix appears at ERROR level. This PR does not extend TestDefinitionOfDoneGood.test_good_fixture_does_not_trip_bad_fixture_rule_ids by adding 5.3.4 to _EXPECTED_BAD_FIXTURE_RULE_IDS: doing so would require bad-fixture engineering of both §5.3.4 and one §6 rule (once those exist), so that both live-fixture cases assert the same subset. This is bundled into the same §6 follow-up.
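A prefix check of the kind described in the last bullet could look roughly like the sketch below. The `[<id> <name>]` tag shape follows the rule tags quoted in this PR (for example `[2.1.10 workloadCategories]`); the rest of the log-line format, including the literal `ERROR` marker, is an assumption, and `error_rule_ids` is a hypothetical helper rather than the test's actual code.

```python
import re
from typing import Set

# Matches tags such as "[5.3.4 vdbMetricsReported]" and captures the rule id.
RULE_TAG = re.compile(r"\[(\d+(?:\.\d+)+) \w+\]")


def error_rule_ids(log_text: str, section: str) -> Set[str]:
    """Return rule ids under the given section that appear on ERROR lines."""
    ids: Set[str] = set()
    for line in log_text.splitlines():
        if "ERROR" not in line:  # assumed level marker
            continue
        for rule_id in RULE_TAG.findall(line):
            if rule_id.startswith(section + "."):
                ids.add(rule_id)
    return ids
```

A good-fixture test would then assert that `error_rule_ids(output, "5")` is empty, while a bad-fixture test would assert that a specific id such as `5.3.4` is present.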

Test plan

TDD RED-first (per project convention):

All 5 new tests failed on RED with TypeError: unknown override. All pass on GREEN after two rounds: the initial GREEN attempt tripped 5.3.1 vdbRunCount and 5.4.2 vdbFilesystemCheck against my one-timestamp fixture, which I fixed by extending to 5 timestamps and adding fs_separation.json sidecars.
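The RED failure mode comes from the sealed kwargs guard on build_submission. A minimal sketch of such a guard follows, assuming the fixture builder accepts overrides as keyword arguments and rejects names it does not know; the function name, the default values, and the exact error text after "unknown override" are assumptions.

```python
from typing import Any, Dict

# The three kwargs this PR adds; the real builder accepts others as well.
ALLOWED_OVERRIDES = frozenset(
    {"include_vdb", "include_kv_cache", "vdb_missing_metric_field"}
)


def build_submission_sketch(**overrides: Any) -> Dict[str, Any]:
    """Reject unknown override names, then merge overrides into assumed defaults."""
    unknown = sorted(set(overrides) - ALLOWED_OVERRIDES)
    if unknown:
        raise TypeError(f"unknown override: {', '.join(unknown)}")
    config: Dict[str, Any] = {
        "include_vdb": False,              # assumed default
        "include_kv_cache": False,         # assumed default
        "vdb_missing_metric_field": None,  # assumed default
    }
    config.update(overrides)
    return config
```

Failing loudly on an unknown name is what lets the RED step prove the new tests really depend on the new kwargs, instead of silently ignoring them.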

Parallel 4-suite sweep:

Related

…rees

Adds five failing tests exercising the fixture-coverage gap in
build_submission (storage#655):

- TestBuildSubmissionVdbKvcacheSubtrees (2 unit tests): the sealed
  kwargs guard rejects include_vdb and include_kv_cache — required
  before adding the subtrees.
- TestDefinitionOfDoneVdb (2 integration tests): bad-fixture case
  engineering vdb_missing_metric_field='p95_latency_ms' to trip
  5.3.4 vdbMetricsReported end-to-end; good-fixture case guarding
  against spurious §5 violations.
- TestDefinitionOfDoneKvcache (1 integration test): good kv_cache
  fixture routes through MODE_TO_CHECKERS without tripping
  [2.1.10 workloadCategories] as an unrecognized mode.

All five fail on unknown-override TypeError until include_vdb,
include_kv_cache, and vdb_missing_metric_field are added to
conftest.build_submission.

Closes the DoD fixture-coverage gap that made VdbCheck (17 real §5
@rule bindings) and KVCacheCheck (§6 stub) invisible to the
end-to-end validator pipeline.

conftest.build_submission gains three new kwargs:

- include_vdb=True — creates results/<sys>/vector_database/milvus/
  DISKANN/{datagen,run}/<ts>/ per loader.py:207-238. Five run
  timestamps (5.3.1 vdbRunCount requires exactly 5), each with a
  §5-conformant summary.json + metadata.json + a
  fs_separation.json CAP-03 sidecar (5.4.2 vdbFilesystemCheck).
- include_kv_cache=True — creates results/<sys>/kv_cache/llama3.1-8b/
  {datagen,run}/<ts>/ per loader.py:191-205. KVCacheCheck is a stub
  so the good-fixture case only needs to route through
  MODE_TO_CHECKERS without tripping [2.1.10 workloadCategories];
  the fixture is scaffolded so a follow-up (once PR #602 gives §6
  real bindings) can add a kv_cache_missing_field knob mirroring
  vdb_missing_metric_field.
- vdb_missing_metric_field=<name> — pops the named field from the
  vdb run summary.json to engineer 5.3.4 vdbMetricsReported for the
  bad-fixture case.

Also updates the stale docstring at vdb_checks.py:150-155 that
described "Phase 4 land time" loader behavior — issue #612 has
since made Loader.load() explicitly fill run_files /
datagen_files for both vector_database and kv_cache modes; the
guard remains as a defensive walk for test-time SubmissionLogs
fakes.
@FileSystemGuy FileSystemGuy requested a review from a team July 3, 2026 01:23
@github-actions

github-actions Bot commented Jul 3, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@FileSystemGuy FileSystemGuy merged commit ccfa819 into main Jul 3, 2026
4 checks passed
@FileSystemGuy FileSystemGuy deleted the fix/655-dod-fixture-vdb-kvcache-subtrees branch July 3, 2026 01:28
@github-actions github-actions Bot locked and limited conversation to collaborators Jul 3, 2026
Development

Successfully merging this pull request may close these issues.

DoD end-to-end fixture omits vector_database and kv_cache subtrees — §5 checks and future §6 checks never fire