fix(#655): add vector_database + kv_cache subtrees to DoD fixture#657
Merged
Conversation
…rees Adds five failing tests exercising the fixture-coverage gap in build_submission (storage#655): - TestBuildSubmissionVdbKvcacheSubtrees (2 unit tests): the sealed kwargs guard rejects include_vdb and include_kv_cache — required before adding the subtrees. - TestDefinitionOfDoneVdb (2 integration tests): bad-fixture case engineering vdb_missing_metric_field='p95_latency_ms' to trip 5.3.4 vdbMetricsReported end-to-end; good-fixture case guarding against spurious §5 violations. - TestDefinitionOfDoneKvcache (1 integration test): good kv_cache fixture routes through MODE_TO_CHECKERS without tripping [2.1.10 workloadCategories] as an unrecognized mode. All five fail on unknown-override TypeError until include_vdb, include_kv_cache, and vdb_missing_metric_field are added to conftest.build_submission.
Closes the DoD fixture-coverage gap that made VdbCheck (17 real §5 @rule bindings) and KVCacheCheck (§6 stub) invisible to the end-to-end validator pipeline. conftest.build_submission gains three new kwargs: - include_vdb=True — creates results/<sys>/vector_database/milvus/ DISKANN/{datagen,run}/<ts>/ per loader.py:207-238. Five run timestamps (5.3.1 vdbRunCount requires exactly 5), each with a §5-conformant summary.json + metadata.json + a fs_separation.json CAP-03 sidecar (5.4.2 vdbFilesystemCheck). - include_kv_cache=True — creates results/<sys>/kv_cache/llama3.1-8b/ {datagen,run}/<ts>/ per loader.py:191-205. KVCacheCheck is a stub so the good-fixture case only needs to route through MODE_TO_CHECKERS without tripping [2.1.10 workloadCategories]; the fixture is scaffolded so a follow-up (once PR #602 gives §6 real bindings) can add a kv_cache_missing_field knob mirroring vdb_missing_metric_field. - vdb_missing_metric_field=<name> — pops the named field from the vdb run summary.json to engineer 5.3.4 vdbMetricsReported for the bad-fixture case. Also updates the stale docstring at vdb_checks.py:150-155 that described "Phase 4 land time" loader behavior — issue #612 has since made Loader.load() explicitly fill run_files / datagen_files for both vector_database and kv_cache modes; the guard remains as a defensive walk for test-time SubmissionLogs fakes.
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #655.
Summary
The Definition-of-Done end-to-end test (
mlpstorage_py/tests/test_definition_of_done.py) exercises the full validator pipeline as a subprocess against a fixture built byconftest.build_submission. That fixture only ever creates training + checkpointing subtrees, soMODE_TO_CHECKERSnever routed toVdbCheck(17 real §5@rulebindings) orKVCacheCheck(§6 stub, will gain real bindings via PR #602). Their unit tests intest_vdb_checks.py(41 tests, all passing) exercise them in isolation, but the loader → checker hand-off, the log-line format, and the origin-path binding were unverified end-to-end.This PR adds three kwargs on
build_submission:include_vdb=True— createsresults/<sys>/vector_database/milvus/DISKANN/{datagen,run}/<ts>/perloader.py:207-238. Five run timestamps because 5.3.1vdbRunCountrequires exactly 5. Each run dir carries a §5-conformantsummary.json+metadata.json+ a CAP-03fs_separation.jsonsidecar so 5.4.2vdbFilesystemCheckpasses.include_kv_cache=True— createsresults/<sys>/kv_cache/llama3.1-8b/{datagen,run}/<ts>/perloader.py:191-205. BecauseKVCacheCheckis a stub, the good-fixture assertion only needs to prove the loader routes throughMODE_TO_CHECKERSwithout tripping[2.1.10 workloadCategories]. The subtree is scaffolded so a follow-up (once PR KV cache Rules for closed and open submission #602 gives §6 real bindings) can add akv_cache_missing_fieldknob mirroringvdb_missing_metric_field.vdb_missing_metric_field=<name>— pops the named field from the vdb run summary.json to engineer 5.3.4vdbMetricsReportedend-to-end (bad-fixture case).Also fixed: stale docstring
vdb_checks.py:150-155— the_iter_run_filesdocstring described "Phase 4 land time" behavior:That was already outdated when #655 was filed — #612 added explicit
run_files/datagen_filespopulation for bothvector_database(loader.py:224-225) andkv_cache(loader.py:198-199). Updated the docstring to reflect the current loader behavior; the empty-iterable guard remains as a defensive walk for test-timeSubmissionLogsfakes.Follow-up not in this PR
@rulebindings, add akv_cache_missing_fieldkwarg and aTestDefinitionOfDoneKvcachebad-fixture case mirroring the vdb structure.5.x.yprefix appears at ERROR level; it does not extendTestDefinitionOfDoneGood.test_good_fixture_does_not_trip_bad_fixture_rule_idsto add5.3.4to_EXPECTED_BAD_FIXTURE_RULE_IDS. Doing so would require a bad-fixture engineering of both §5.3.4 and one §6 rule (once those exist), so both live-fixture cases assert the same subset. Bundled into the same §6 follow-up.Test plan
TDD RED-first (per project convention):
All 5 new tests failed on RED with
TypeError: unknown override; all pass on GREEN after two rounds (initial GREEN tripped 5.3.1 vdbRunCount + 5.4.2 vdbFilesystemCheck against my one-timestamp fixture; fixed by extending to 5 timestamps + adding fs_separation.json sidecars).Parallel 4-suite sweep:
Related
_extract_error_rule_idsusage depends on.kv_cache_missing_fieldfollow-up here once §6 gets real@rulebindings.