fix(#664): move CLOSED --inter-option-delay from 20 s to 90 s by FileSystemGuy · Pull Request #665 · mlcommons/storage

FileSystemGuy · 2026-07-03T04:34:02Z

Summary

Closes #664 ("kvcache CLOSED --inter-option-delay: move enforced value from 20 s to 90 s"). Aligns the CLOSED --inter-option-delay enforcement, defaults, help text, and manpage with §6.3.2.1 of PR #602 ("KV cache Rules for closed and open submission"); the author, @hazemawadalla, confirmed 90 s on 2026-07-03. Until this change, the runtime and tests were still on the temporary 20 s reconciliation that predates the confirmation.
Adds a class-level autouse fixture on TestClosedEnforcement that patches _execute_command, _probe_results_dir_shared, and _interruptible_sleep. The *_returns_1 tests rely on the CLOSED guard short-circuiting before any subprocess launches; when the runtime and tests drift (as they did during this rename), the guard misses and _execute_run falls into the real mpirun / ssh-probe / aggregation path — on localhost with kvcache_bin_path=None that expands into a fork storm heavy enough to OOM a 32 GB host. The fixture ensures a future guard miss can only produce a clean assertion failure, never a runaway fork.
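The safety-net idea described above can be sketched in miniature. This is an illustration, not the code from this PR: FakeKVCacheBenchmark and its guard are hypothetical stand-ins, and the stdlib unittest setUp/addCleanup pair is swapped in for the pytest autouse fixture so the example runs without pytest. Only the three patched method names come from the description.

```python
import unittest
from unittest import mock

# Method names taken from the PR description; everything else is hypothetical.
PATCHED = ("_execute_command", "_probe_results_dir_shared", "_interruptible_sleep")


class FakeKVCacheBenchmark:
    """Hypothetical stand-in for the benchmark class; not the project's code."""

    def __init__(self, category, inter_option_delay):
        self.category = category
        self.inter_option_delay = inter_option_delay

    # The real collaborators launch subprocesses, probe hosts, and sleep.
    # Here they raise so that any unpatched call is obvious.
    def _execute_command(self, cmd):
        raise RuntimeError("would launch a real subprocess")

    def _probe_results_dir_shared(self):
        raise RuntimeError("would probe other hosts")

    def _interruptible_sleep(self, seconds):
        raise RuntimeError("would really sleep")

    def _execute_run(self):
        # CLOSED guard: short-circuit before anything is launched.
        if self.category == "closed" and self.inter_option_delay != 90:
            return 1
        self._probe_results_dir_shared()
        self._interruptible_sleep(self.inter_option_delay)
        self._execute_command(["mpirun", "placeholder"])
        return 0


class TestClosedEnforcement(unittest.TestCase):
    def setUp(self):
        # Runs before every test in the class, like a class-level autouse fixture.
        self.stubs = {}
        for name in PATCHED:
            patcher = mock.patch.object(FakeKVCacheBenchmark, name)
            self.stubs[name] = patcher.start()
            self.addCleanup(patcher.stop)

    def test_closed_inter_option_delay_non_90_returns_1(self):
        bench = FakeKVCacheBenchmark("closed", 20)
        self.assertEqual(bench._execute_run(), 1)
        for stub in self.stubs.values():
            self.assertFalse(stub.called)

    def test_run_path_only_reaches_stubs(self):
        # If the guard does not fire, the run path only reaches mocks.
        bench = FakeKVCacheBenchmark("closed", 90)
        self.assertEqual(bench._execute_run(), 0)
        self.stubs["_execute_command"].assert_called_once()
```

In pytest, the same effect comes from a fixture declared with autouse=True inside the test class. Either way, a test that expects the guard to return 1 fails on its assertion if the guard misses, because the collaborators behind the guard are mocks.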

Behavior change

CLOSED runs that pass --inter-option-delay 20, which were previously accepted, now hard-fail; they must use 90. This is consistent with the merged §6.3.2.1 text on PR #602.
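A minimal sketch of the hard-fail rule, for illustration only. The function name, the error wording, and the assumption that only the CLOSED category is pinned are hypothetical; the real check lives in mlpstorage_py/benchmarks/kvcache.py and may look different.

```python
import sys

CLOSED_INTER_OPTION_DELAY_S = 90  # seconds; the value this PR describes as enforced


def enforce_closed_inter_option_delay(category, inter_option_delay, err=None):
    """Return 0 when the delay is acceptable, 1 on a CLOSED violation."""
    err = sys.stderr if err is None else err
    if category != "closed":
        # Assumption of this sketch: the pin applies to CLOSED only.
        return 0
    if inter_option_delay == CLOSED_INTER_OPTION_DELAY_S:
        return 0
    print(
        f"ERROR: CLOSED requires --inter-option-delay "
        f"{CLOSED_INTER_OPTION_DELAY_S} (got {inter_option_delay})",
        file=err,
    )
    return 1
```

With this shape, a CLOSED run that passes 20 gets exit status 1 and a message naming the required value, while 90 passes through.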

Sites changed

Runtime code:

mlpstorage_py/benchmarks/kvcache.py — CLOSED hard-fail, error text, and effective-value fallback: 20 → 90
mlpstorage_py/cli/kvcache_args.py — closed set_defaults: 20 → 90
mlpstorage_py/run_summary.py — effective-value fallback: 20 → 90
mlpstorage_py/cli/help_formatter.py — CLOSED pinned-defaults help text: 20s → 90s
ManPage.md — closed pins list: 20 → 90
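How a closed-only set_defaults and an effective-value fallback can fit together is sketched below. It uses only stdlib argparse; the parser layout, the constant name, and the shape of the fallback helper are assumptions for illustration, not the contents of kvcache_args.py or run_summary.py.

```python
import argparse

CLOSED_INTER_OPTION_DELAY_S = 90  # seconds; the pinned value described in this PR


def build_parser():
    """Hypothetical two-category CLI; not the project's parser."""
    parser = argparse.ArgumentParser(prog="kvcache-sketch")
    sub = parser.add_subparsers(dest="category", required=True)
    closed = sub.add_parser("closed")
    opened = sub.add_parser("open")
    for p in (closed, opened):
        p.add_argument("--inter-option-delay", type=int, default=None)
    # Only the closed subcommand pins a default.
    closed.set_defaults(inter_option_delay=CLOSED_INTER_OPTION_DELAY_S)
    return parser


def effective_inter_option_delay(recorded):
    """Return the recorded delay, falling back to the pinned value if absent."""
    value = recorded.get("inter_option_delay")
    return CLOSED_INTER_OPTION_DELAY_S if value is None else value
```

Keeping the pinned value in one constant means the default and the fallback cannot drift apart, which is the kind of mismatch this PR fixes across several files.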

Tests:

tests/unit/test_cli_kvcache.py — closed set_defaults assertion: 20 → 90
tests/unit/test_benchmarks_kvcache.py — CLOSED fixture: 20 → 90; renamed test_closed_inter_option_delay_non_20_returns_1 → _non_90_returns_1 (deliberately uses 20 as the non-90 value to catch a regression back to 20); autouse fork-storm safety net on TestClosedEnforcement.

Rules.md §6.3.2.1 is updated on PR #602's branch as a companion change (not touched here — this PR ships the runtime alignment independently so #602 can rebase and pass CI cleanly).

Test plan

uv run pytest tests/unit — 2516 passed, 1 skipped
uv run pytest mlpstorage_py/tests — 822 passed
uv run pytest vdb_benchmark/tests — 155 passed
uv run pytest kv_cache_benchmark/tests — 238 passed
RED-first verified: with runtime stashed, test_closed_inter_option_delay_non_90_returns_1 and 5 sibling tests fail cleanly (assertion errors, no subprocess spawn) — safety-net fixture works as designed

Per storage#664 the CLOSED --inter-option-delay pin moves from 20 s to 90 s (author confirmed 2026-07-03 for PR #602 §6.3.2.1). Update tests/unit/test_cli_kvcache.py + tests/unit/test_benchmarks_kvcache.py to assert the new value; runtime alignment follows in the next commit.

Also add an autouse fixture on TestClosedEnforcement that patches _execute_command, _probe_results_dir_shared, and _interruptible_sleep for every test in the class. The *_returns_1 tests rely on the CLOSED guard short-circuiting before any subprocess launches, so they do not mock these collaborators. When the runtime and the tests drift (as they do during this rename), the guard misses and _execute_run falls into the real mpirun / ssh-probe / aggregation path. On localhost with kvcache_bin_path=None, that path expands into a fork storm heavy enough to OOM a 32 GB host. The fixture ensures a guard miss can only cause a clean assertion failure, never runaway forking.

…90 s

PR #602 §6.3.2.1 fixes the CLOSED --inter-option-delay at 90 s (author hazemawadalla confirmed 2026-07-03). The runtime CLI, effective-value fallback, help text, and ManPage still enforced/documented 20 s, an artifact of a temporary reconciliation while the author was unresponsive. Bring them all in line with the spec.

Sites changed:

- mlpstorage_py/benchmarks/kvcache.py: CLOSED hard-fail, error text, and effective-value fallback all move 20 -> 90
- mlpstorage_py/cli/kvcache_args.py: closed set_defaults 20 -> 90
- mlpstorage_py/run_summary.py: effective-value fallback 20 -> 90
- mlpstorage_py/cli/help_formatter.py: CLOSED pinned-defaults help text inter-option-delay 20s -> 90s
- ManPage.md: closed pins list inter-option-delay 20 -> 90

Rules.md §6.3.2.1 is updated on PR #602's branch as a companion change.

Behavior change: CLOSED submissions that previously would have accepted --inter-option-delay 20 will now hard-fail; they must use 90.

github-actions · 2026-07-03T04:34:12Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

FileSystemGuy added 2 commits July 2, 2026 21:31

FileSystemGuy requested a review from a team July 3, 2026 04:34

FileSystemGuy merged commit 94802d3 into main Jul 3, 2026
4 checks passed

FileSystemGuy deleted the fix/664-inter-option-delay-90 branch July 3, 2026 04:39

github-actions Bot locked and limited conversation to collaborators Jul 3, 2026
