fix/results explorer disk budget resume plan#245
Conversation
joeharris76
commented
May 7, 2026
- chore(uat): capture disk budget resume evidence
- feat(uat): estimate disk budget and resume sweeps
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 93fd89ae38
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| print(f"[preflight warn] {warning}", file=sys.stderr) | ||
| if result.aborted: | ||
| print(f"[preflight] ABORT: {result.abort_reason}", file=sys.stderr) | ||
| return 0 |
There was a problem hiding this comment.
Return nonzero when preflight reports an abort
preflight_main always returns 0 even when run_preflight sets aborted=True, so automation cannot detect a failed preflight and may continue into execute/sweep steps despite an explicit free-space abort message. This only appears when python -m tests.uat._cli preflight is used, but in that path the command currently signals success for failure conditions.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Follow-up sweep actioned this PR review comment.
Disposition: fixed
Branch: `chore/pr-review-followups-since-2026-05-05`
Summary:
Disposition: fixed
Evidence: Current tests/uat/_cli.py still returned success after a preflight abort, while execute_main already returned 2 for the same condition. I changed standalone preflight_main to return 2 on abort and added focused dispatch coverage in tests/uat/test_cli_dispatch.py.
Verification: uv run -- python -m pytest tests/uat/test_cli_dispatch.py -q passed, 13 tests. uv run -- ruff check tests/uat/_cli.py tests/uat/test_cli_dispatch.py passed.
Future sweeps skip comments that already have this marker reply.
| warnings: list[str] = [] | ||
| aborted = False | ||
| abort_reason: str | None = None | ||
| disk_budget_summary = estimate_disk_budget_summary(disk_budget_config) if disk_budget_config is not None else None |
There was a problem hiding this comment.
Guard advisory disk-budget errors in preflight
The new disk-budget summary is documented as advisory, but this direct call can raise (for example, malformed TSV headers or unparsable numeric fields) and crash preflight before PreflightResult is returned. That turns a non-gating estimate into a hard stop for all sweeps, which contradicts the intended behavior where only the free-space cutoff should abort execution.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Follow-up sweep actioned this PR review comment.
Disposition: fixed
Branch: `chore/pr-review-followups-since-2026-05-05`
Summary:
Disposition: fixed
Evidence: Current preflight.py still had the advisory disk-budget call on the preflight path; it now converts estimator exceptions into warnings. Added test_preflight.py coverage for a malformed TSV header proving preflight returns non-aborted with no summary.
Verification: uv run -- python -m pytest tests/uat/test_preflight.py -q -n 0 passed; uv run -- ruff check tests/uat/phases/preflight.py tests/uat/test_preflight.py passed.
Future sweeps skip comments that already have this marker reply.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 93fd89ae38
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| print(f"[preflight warn] {warning}", file=sys.stderr) | ||
| if result.aborted: | ||
| print(f"[preflight] ABORT: {result.abort_reason}", file=sys.stderr) | ||
| return 0 |
There was a problem hiding this comment.
Return nonzero when preflight reports an abort
preflight_main always returns 0 even when run_preflight sets aborted=True, so automation cannot detect a failed preflight and may continue into execute/sweep steps despite an explicit free-space abort message. This only appears when python -m tests.uat._cli preflight is used, but in that path the command currently signals success for failure conditions.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Follow-up sweep actioned this PR review comment.
Disposition: no-current-action
Branch: `chore/pr-review-followups-since-2026-05-05`
Summary:
Disposition: no-current-action
Evidence: Current tests/uat/_cli.py already returns 2 when result.aborted is true, and tests/uat/test_cli_dispatch.py has focused regression coverage asserting the preflight subcommand returns 2 while printing the abort/warning output. Files were left unchanged.
Verification: uv run -- python -m pytest tests/uat/test_cli_dispatch.py::test_main_preflight_returns_2_when_preflight_aborts -q passed: 1 passed in 2.06s.
Future sweeps skip comments that already have this marker reply.
| warnings: list[str] = [] | ||
| aborted = False | ||
| abort_reason: str | None = None | ||
| disk_budget_summary = estimate_disk_budget_summary(disk_budget_config) if disk_budget_config is not None else None |
There was a problem hiding this comment.
Guard advisory disk-budget errors in preflight
The new disk-budget summary is documented as advisory, but this direct call can raise (for example, malformed TSV headers or unparsable numeric fields) and crash preflight before PreflightResult is returned. That turns a non-gating estimate into a hard stop for all sweeps, which contradicts the intended behavior where only the free-space cutoff should abort execution.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Follow-up sweep actioned this PR review comment.
Disposition: no-current-action
Branch: `chore/pr-review-followups-since-2026-05-05`
Summary:
Disposition: no-current-action
Evidence: Current preflight.py already guards estimate_disk_budget_summary(...) with try/except Exception and records an advisory warning. Focused coverage exists in test_preflight.py, asserting malformed disk-budget TSV input does not abort preflight. Worktree stayed unchanged.
Verification: uv run -- python -m pytest tests/uat/test_preflight.py::test_preflight_disk_budget_table_error_is_advisory -q passed, 1 test.
Future sweeps skip comments that already have this marker reply.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 93fd89ae38
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| print(f"[preflight warn] {warning}", file=sys.stderr) | ||
| if result.aborted: | ||
| print(f"[preflight] ABORT: {result.abort_reason}", file=sys.stderr) | ||
| return 0 |
There was a problem hiding this comment.
Return nonzero when preflight reports an abort
preflight_main always returns 0 even when run_preflight sets aborted=True, so automation cannot detect a failed preflight and may continue into execute/sweep steps despite an explicit free-space abort message. This only appears when python -m tests.uat._cli preflight is used, but in that path the command currently signals success for failure conditions.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Follow-up sweep actioned this PR review comment.
Disposition: no-current-action
Branch: `chore/pr-review-followups-since-2026-05-05`
Summary:
Disposition: no-current-action
Evidence: Current tests/uat/_cli.py already returns 2 when result.aborted, and tests/uat/test_cli_dispatch.py has focused coverage asserting the preflight subcommand returns 2 and emits the abort message. Worktree remained unchanged.
Verification: uv run -- python -m pytest tests/uat/test_cli_dispatch.py::test_main_preflight_returns_2_when_preflight_aborts -q passed: 1 passed.
Future sweeps skip comments that already have this marker reply.
| warnings: list[str] = [] | ||
| aborted = False | ||
| abort_reason: str | None = None | ||
| disk_budget_summary = estimate_disk_budget_summary(disk_budget_config) if disk_budget_config is not None else None |
There was a problem hiding this comment.
Guard advisory disk-budget errors in preflight
The new disk-budget summary is documented as advisory, but this direct call can raise (for example, malformed TSV headers or unparsable numeric fields) and crash preflight before PreflightResult is returned. That turns a non-gating estimate into a hard stop for all sweeps, which contradicts the intended behavior where only the free-space cutoff should abort execution.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Follow-up sweep actioned this PR review comment.
Disposition: no-current-action
Branch: `chore/pr-review-followups-since-2026-05-05`
Summary:
Disposition: no-current-action
Evidence: Current preflight.py already guards estimate_disk_budget_summary(...) with try/except Exception and records an advisory warning. Focused regression coverage exists in test_preflight.py, using a malformed TSV header and asserting preflight does not abort. Worktree remains unchanged.
Verification: uv run -- python -m pytest tests/uat/test_preflight.py -m fast -k disk_budget_table_error_is_advisory -q passed: 1 passed in 2.25s.
Future sweeps skip comments that already have this marker reply.
Address remaining items from the post-#255 review: - preflight: derive AUTOMATED_LOCAL_PLATFORMS from docker_assets so the bring-up script and the preflight allowlist share a single source of truth. Adds a regression test that pins both sides to the same set. - spec: document the matrix-summary TSV column-order bump that PR #247 introduced (`submit_terminal_state` between `result_path` and `validator_status`). - ops doc: warn that `make uat-explorer-smoke` runs a full `npm ci` + `npm run build` per invocation and is not locally cached. The disk_budget_table.tsv `peak_database_gib=0.0` clarification is deferred until PR #245 lands; nothing to annotate on develop yet. Co-authored-by: Joe Harris <57046+joeharris76@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
93fd89a to
f0e26b6
Compare
* fix(pr-followup): PR #245 comment 3201562602 — **<sub><sub> Path: tests/uat/_cli.py * fix(pr-followup): PR #245 comment 3201562610 — **<sub><sub> Path: tests/uat/phases/preflight.py * fix(pr-followup): PR #246 comment 3202118174 — **<sub><sub> Path: _project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507/screenshot-manifest.json * fix(pr-followup): PR #255 comment 3202201883 — **<sub><sub> Path: tests/uat/phases/explorer_smoke.py * fix(pr-followup): PR #256 comment 3202212814 — **<sub><sub> Path: results-explorer/src/pages/BenchmarkIndex.tsx * fix(pr-followup): PR #266 comment 3204995419 — **<sub><sub> Path: results-explorer/src/lib/facetDisplay.ts * fix(pr-followup): PR #267 comment 3205001733 — **<sub><sub> Path: _project/scripts/skill_sync_lock_audit.py --------- Co-authored-by: Joe Harris <57046+joeharris76@users.noreply.github.com>