fix(#568): refine CAP-01 object-storage skip — key on storage_type, add helper + dedicated tests#581

Merged
russfellows merged 5 commits into main from fix/568-cap01-object-storage-skip
Jun 29, 2026

@FileSystemGuy commented on Jun 29, 2026

Contributor

Refines the already-merged #579 fix for #568.

#579 (now in main) shipped a working but narrow fix that keyed on data_access_protocol == 'object'. After reviewing PR #574 (closed, no test coverage, broke 2 existing fixtures), this PR combines the best of both:

What changes vs. #579

  • mlpstorage_py/benchmarks/dlio.py: replaces the two data_access_protocol-based checks with a single shared _is_object_storage() helper. The helper reads storage.storage_type from params_dict / combined_params (the same lookup as _check_storage_scheme_consistency) and returns True for {'s3', 's3_torch'}. direct_fs (--o-direct) is deliberately NOT skipped, because the local statvfs check still applies.
  • tests/unit/test_capacity_gate.py: adds TestDLIOIsObjectStorage (9 tests covering lookup precedence, fallback, the direct_fs negative case, the None combined_params guard, and the edge case where `--params storage.storage_type=s3` is passed without `--object`). Restructures the two test_destination_is_none_when_object_storage tests to use the helper-stub pattern.
  • tests/integration/test_systemname_yaml_end_to_end.py: stubs _is_object_storage=False on the A7-lock fixture, which would otherwise silently fail now that the helper exists.
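
As an illustration of the helper described above, here is a minimal sketch. It is not the code from dlio.py: the free-function form, the flat dotted key in params_dict, the nested layout of combined_params, and the params_dict-first precedence are all assumptions made for the example.

```python
from typing import Any, Mapping, Optional

# Membership set named in the PR description.
OBJECT_STORAGE_TYPES = frozenset({"s3", "s3_torch"})


def is_object_storage(
    params_dict: Optional[Mapping[str, Any]],
    combined_params: Optional[Mapping[str, Any]],
) -> bool:
    """Return True when storage.storage_type names an object backend."""
    storage_type = None
    # Assumption: explicit --params overrides are stored under a dotted key
    # and take precedence over the merged configuration.
    if params_dict:
        storage_type = params_dict.get("storage.storage_type")
    # Assumption: combined_params is a nested mapping and may be None.
    if storage_type is None and combined_params:
        storage = combined_params.get("storage") or {}
        storage_type = storage.get("storage_type")
    return storage_type in OBJECT_STORAGE_TYPES


print(is_object_storage({"storage.storage_type": "s3"}, None))
print(is_object_storage({}, {"storage": {"storage_type": "direct_fs"}}))
```

Keying on the resolved storage type rather than on how the user selected the backend means one check covers every path that ends in an object backend, while direct_fs keeps the local capacity check.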

Test plan

RED commit fails with AttributeError: type object 'DLIOBenchmark' has no attribute '_is_object_storage' on 9 helper tests + 2 destination tests. GREEN commit passes with no regressions:

  • tests/: 2392 passed, 13 deselected
  • mlpstorage_py/tests: 780 passed, 1 xfailed
  • vdb_benchmark/tests: 144 passed
  • kv_cache_benchmark/tests: 238 passed
  • Reporter (@jadnov) confirms the original --params storage.storage_type=s3 repro path also skips the gate cleanly under this refinement.

@FileSystemGuy requested a review from a team on June 29, 2026 at 21:21
@github-actions


MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

Adds 11 new tests covering the object-storage escape hatch:

TestDLIOIsObjectStorage (new class): the shared helper that
reads storage.storage_type from params_dict / combined_params.
Locks the lookup precedence, the {'s3','s3_torch'} membership,
the direct_fs (--o-direct) NEGATIVE case, the
None-combined_params guard, and the
`--params storage.storage_type=s3 without --object` edge case
that a data_access_protocol-keyed signal would miss.

TestTrainingBenchmarkRequiredBytes /
TestCheckpointingBenchmarkRequiredBytes (extended):
test_destination_is_none_when_object_storage on both — gates
must return None so _pre_execution_gate fires the A8 hatch.

Also isolates the three existing destination tests
(test_destination_is_args_data_dir,
test_destination_is_checkpoint_folder_joined_with_model,
test_destination_is_none_when_checkpoint_folder_empty) from the
new helper by stubbing bm._is_object_storage = False; the prior
form passed by accident because the helper did not exist.

Drive-by: fixes pre-existing collection rot in the dep-stub block
(find_spec('pyarrow.ipc') raised after pyarrow itself was stubbed
with MagicMock).
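
The collection failure mentioned above can be reproduced in isolation. The sketch below is an illustration, not the repository's dep-stub block: it uses a hypothetical package name in place of pyarrow, and `has_module` is a hypothetical guard. A bare MagicMock refuses unconfigured dunder attributes such as `__path__`, so a submodule lookup under a stubbed parent raises instead of returning None.

```python
import importlib.util
import sys
from unittest.mock import MagicMock, patch

# Hypothetical package name standing in for pyarrow, so the sketch never
# touches a real installation.
PARENT = "hypothetical_stubbed_pkg"


def probe(name: str) -> str:
    """Report whether find_spec(name) returns or raises."""
    try:
        importlib.util.find_spec(name)
    except ImportError as exc:  # ModuleNotFoundError subclasses ImportError
        return f"raised {type(exc).__name__}"
    return "returned"


def has_module(name: str) -> bool:
    """Hypothetical guard: treat any lookup failure as 'not available'."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError, AttributeError):
        return False


# Stub only the parent, the way a dep-stub block might stub pyarrow.
with patch.dict(sys.modules, {PARENT: MagicMock()}):
    unguarded = probe(f"{PARENT}.ipc")
    guarded = has_module(f"{PARENT}.ipc")

print(unguarded, guarded)
```

Checking submodules before stubbing the parent, or wrapping the lookup as in `has_module`, are two ways to keep test collection from failing; which one the PR used is not stated here.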

The 2 destination tests and 9 helper tests fail as expected (RED); the fix
lands in the next commit.

Training and Checkpointing _capacity_gate_destination() returned the
raw destination unconditionally. In object mode that's an s3:// URI;
check_capacity_4field walks the parent chain to the filesystem root
and aborts with `[E401] CAP-01: no valid parent for s3://…` before any
work starts. --skip-validation does not help — CAP-01 lives in
_pre_execution_gate(), not validate_benchmark_environment().
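
To see why a parent-chain walk cannot succeed for an s3:// URI, consider the sketch below. `nearest_existing_parent` is a hypothetical function, not check_capacity_4field; it uses POSIX path rules and an injected `exists` predicate so the result does not depend on the machine it runs on.

```python
import posixpath
from typing import Callable, Optional


def nearest_existing_parent(
    path: str, exists: Callable[[str], bool]
) -> Optional[str]:
    """Walk up from path until exists() is true; None when the chain runs out."""
    current = path
    while current:
        if exists(current):
            return current
        parent = posixpath.dirname(current)
        if parent == current:  # reached "/" without finding anything
            return None
        current = parent
    return None  # the chain collapsed to "" (relative or URI-like input)


# Pretend filesystem: only "/" and "/mnt" exist.
fake_fs = {"/", "/mnt"}
local = nearest_existing_parent("/mnt/data/run1", fake_fs.__contains__)
remote = nearest_existing_parent("s3://bucket/data", fake_fs.__contains__)
print(local, remote)
```

The URI is treated as a relative path whose parents are "s3://bucket", then "s3:", then the empty string, none of which exist locally, so the walk has nothing to hand to statvfs.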

Adds a shared DLIOBenchmark._is_object_storage() helper that reads
storage.storage_type from params_dict / combined_params (same lookup
as _check_storage_scheme_consistency) and returns True for {'s3',
's3_torch'}. Keying on storage_type (post-_apply_object_storage_params
state) rather than data_access_protocol catches the
`--params storage.storage_type=s3` path where a user wires up object
storage without passing `--object`. direct_fs (--o-direct) still
resolves to a local path and is unaffected.

Both _capacity_gate_destination overrides short-circuit on the helper,
returning None so Benchmark._pre_execution_gate fires the existing A8
remote-backend escape hatch (logged as
`CAP-01 skipped: destination not local`).
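
The control flow described above can be sketched as follows. `SketchBenchmark` is a hypothetical stand-in, not the real Benchmark or DLIOBenchmark classes; the constructor arguments and the string return values are assumptions, and the local branch only marks where a statvfs-based check would run.

```python
import logging
from typing import Optional

log = logging.getLogger("cap01_sketch")


class SketchBenchmark:
    """Hypothetical stand-in for the benchmark classes named in the PR."""

    def __init__(self, destination: str, storage_type: str) -> None:
        self.destination = destination
        self.storage_type = storage_type

    def _is_object_storage(self) -> bool:
        return self.storage_type in {"s3", "s3_torch"}

    def _capacity_gate_destination(self) -> Optional[str]:
        # Short-circuit: an object backend has no local path to measure.
        if self._is_object_storage():
            return None
        return self.destination

    def _pre_execution_gate(self) -> str:
        destination = self._capacity_gate_destination()
        if destination is None:
            # Escape hatch for remote backends.
            log.info("CAP-01 skipped: destination not local")
            return "skipped"
        # The real gate would run its statvfs-based capacity check here.
        return "checked"


remote = SketchBenchmark("s3://bucket/data", "s3")._pre_execution_gate()
local = SketchBenchmark("/tmp/data", "direct_fs")._pre_execution_gate()
print(remote, local)
```

Returning None from the destination hook, rather than adding a second skip flag, reuses the escape hatch that the gate already has for non-local destinations.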

Also updates the integration A7-lock test to stub _is_object_storage
on the MagicMock fixture — the prior form passed by accident because
the helper did not exist.
@FileSystemGuy force-pushed the fix/568-cap01-object-storage-skip branch from 4dc4d6d to 90a761d on June 29, 2026 at 21:29
@FileSystemGuy changed the title from "fix(#568): skip CAP-01 statvfs for object-storage DLIO runs" to "fix(#568): refine CAP-01 object-storage skip — key on storage_type, add helper + dedicated tests" on Jun 29, 2026
data_access_protocol is registered as a positional with choices=
['file','object'] in cli/common_args.py, not as a --object flag.
Updates the helper docstring, the TestDLIOIsObjectStorage class
docstring, the edge-case test name (was
test_storage_type_set_via_params_without_object_flag → now
test_storage_type_set_via_params_under_file_positional), and that
test's docstring to use the correct grammar.

Behavior unchanged — _is_object_storage keys on
storage.storage_type, which is downstream of however the user
selected their backend.
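
A small argparse sketch shows how the positional and the override can disagree. This is not the parser from cli/common_args.py: the `prog` name and the `nargs="*"` shape of `--params` are assumptions; only the positional name and its choices come from the commit message above.

```python
import argparse

parser = argparse.ArgumentParser(prog="sketch")
# Positional with fixed choices, as described in the commit message.
parser.add_argument("data_access_protocol", choices=["file", "object"])
# Assumption: --params takes space-separated key=value overrides.
parser.add_argument("--params", nargs="*", default=[])

# The user selects the "file" positional but wires up S3 through --params.
args = parser.parse_args(["file", "--params", "storage.storage_type=s3"])
params_dict = dict(item.split("=", 1) for item in args.params)

print(args.data_access_protocol)
print(params_dict)
```

A check keyed on data_access_protocol sees "file" here and would run the local capacity gate, whereas a check keyed on storage.storage_type sees "s3".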

@russfellows left a comment

Contributor

We'll get this working eventually.

@russfellows merged commit 4beca4b into main on Jun 29, 2026
3 checks passed
@russfellows deleted the fix/568-cap01-object-storage-skip branch on June 29, 2026 at 22:51
@github-actions (bot) locked and limited conversation to collaborators on Jun 29, 2026
2 participants