fix(uat): address results explorer review followups by joeharris76 · Pull Request #255 · joeharris76/BenchBox

joeharris76 · 2026-05-07T14:24:49Z

No description provided.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f03a17302

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-07T14:30:01Z


+    _stage_playwright_fixture_dir(
+        source_dir=output_dir,
+        fixture_dir=playwright_fixture_dir or _default_playwright_fixture_dir(),


Use per-run Playwright fixture directory

run_explorer_smoke() now rewrites a single shared fixture mount (results-explorer/test-fixtures/.generated/data) when playwright_fixture_dir is not provided, and both the CLI/orchestrator call this default path. If two UAT sweeps run concurrently in the same checkout, one run can overwrite the other run’s fixture link/copy before Playwright executes, causing flaky failures or validating the wrong dataset.

Useful? React with 👍 / 👎.

Follow-up sweep actioned this PR review comment. Disposition: fixed Branch: `chore/pr-review-followups-since-2026-05-05` Summary: Disposition: fixed

Evidence: Current explorer_smoke.py still defaulted to the shared fixture path when callers omitted playwright_fixture_dir. I changed it to use log_dir/playwright-fixtures/data, pass that via E2E_FIXTURE_DIR, assign a per-run E2E_PORT, and prevent Playwright server reuse for fixture overrides. Regression coverage in test_explorer_smoke.py now checks the per-run default, env wiring, and server reuse guard.

Verification: uv run -- python -m pytest tests/uat/test_explorer_smoke.py -m fast -q passed, uv run -- ruff check tests/uat/phases/explorer_smoke.py tests/uat/test_explorer_smoke.py passed, node --check results-explorer/scripts/serve-browser-tests.mjs passed, npm --prefix results-explorer run typecheck passed, and git diff --check passed.

Future sweeps skip comments that already have this marker reply.

Address remaining items from the post-#255 review: - preflight: derive AUTOMATED_LOCAL_PLATFORMS from docker_assets so the bring-up script and the preflight allowlist share a single source of truth. Adds a regression test that pins both sides to the same set. - spec: document the matrix-summary TSV column-order bump that PR #247 introduced (`submit_terminal_state` between `result_path` and `validator_status`). - ops doc: warn that `make uat-explorer-smoke` runs a full `npm ci` + `npm run build` per invocation and is not locally cached. The disk_budget_table.tsv `peak_database_gib=0.0` clarification is deferred until PR #245 lands; nothing to annotate on develop yet. Co-authored-by: Joe Harris <57046+joeharris76@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

The function was extended in #255 to flag both genuine status downgrades AND newly-derived rows missing from the checked-in matrix. Rename to `static_matrix_drift` so the name reflects the broader semantics; expand the docstring to enumerate the three drift kinds it detects. Also lock the contract between `_default_playwright_fixture_dir()` in explorer_smoke.py and `serve-browser-tests.mjs` with a unit test that asserts both ends agree on the `test-fixtures/.generated/data` mount. Surfaced as nits during the review of #255. Co-authored-by: Joe Harris <57046+joeharris76@users.noreply.github.com>

* fix(pr-followup): PR #245 comment 3201562602 — **![P2 Badge](https://img.shields.io/badge… Disposition: fixed Source: #245 (comment) Path: tests/uat/_cli.py * fix(pr-followup): PR #245 comment 3201562610 — **![P1 Badge](https://img.shields.io/badge… Disposition: fixed Source: #245 (comment) Path: tests/uat/phases/preflight.py * fix(pr-followup): PR #246 comment 3202118174 — **![P2 Badge](https://img.shields.io/badge… Disposition: fixed Source: #246 (comment) Path: _project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507/screenshot-manifest.json * fix(pr-followup): PR #255 comment 3202201883 — **![P2 Badge](https://img.shields.io/badge… Disposition: fixed Source: #255 (comment) Path: tests/uat/phases/explorer_smoke.py * fix(pr-followup): PR #256 comment 3202212814 — **![P2 Badge](https://img.shields.io/badge… Disposition: fixed Source: #256 (comment) Path: results-explorer/src/pages/BenchmarkIndex.tsx * fix(pr-followup): PR #266 comment 3204995419 — **![P1 Badge](https://img.shields.io/badge… Disposition: fixed Source: #266 (comment) Path: results-explorer/src/lib/facetDisplay.ts * fix(pr-followup): PR #267 comment 3205001733 — **![P2 Badge](https://img.shields.io/badge… Disposition: fixed Source: #267 (comment) Path: _project/scripts/skill_sync_lock_audit.py --------- Co-authored-by: Joe Harris <57046+joeharris76@users.noreply.github.com>

fix(uat): address results explorer review followups

9f03a17

joeharris76 enabled auto-merge (squash) May 7, 2026 14:24

chatgpt-codex-connector Bot reviewed May 7, 2026

View reviewed changes

joeharris76 merged commit ee86acc into develop May 7, 2026
6 checks passed

joeharris76 deleted the fix/results-explorer-uat-review-followups branch May 7, 2026 14:35

This was referenced May 7, 2026

fix(uat): resolve review-followup remediations #264

Merged

chore(tuning): rename static_regressions to static_matrix_drift #265

Merged

joeharris76 mentioned this pull request May 8, 2026

chore/pr review followups since 2026 05 05 #275

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(uat): address results explorer review followups#255

fix(uat): address results explorer review followups#255
joeharris76 merged 1 commit intodevelopfrom
fix/results-explorer-uat-review-followups

joeharris76 commented May 7, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Uh oh!

joeharris76 May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

joeharris76 commented May 7, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

joeharris76 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant