Docs: audit Results Explorer hierarchy usability data #246
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c61f27d57f
| "visibleFlags": { | ||
| "rawError": false, | ||
| "binderError": false, | ||
| "cannotCompare": true |
Mark comparable DuckDB-vs-AWS captures correctly
The manifest marks this compare capture as cannotCompare: true, but the corresponding screenshot evidence (compare-duckdb-aws-*.png, e.g. 1280px) shows a valid comparable run with a “No differences” comparability receipt and normal decision summary. This misclassification can skew any audit tooling or manual triage that relies on visibleFlags to distinguish guarded mixed-cohort compares from same-cohort comparisons, leading to incorrect conclusions for the DQ-4/U-2 evidence set.
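For context, a minimal sketch of how audit tooling might consume `visibleFlags` from this manifest during triage, assuming the manifest is a JSON object keyed by capture name (that layout, the helper name, and the entry point are illustrative; only the manifest path comes from this PR):

```python
import json
from pathlib import Path

# Manifest path taken from this PR; the entry layout below is an assumption.
MANIFEST = Path(
    "_project/audits/screenshots/"
    "results-explorer-hierarchy-usability-data-audit-20260507/screenshot-manifest.json"
)

def suspect_compare_captures(manifest_path: Path) -> list[str]:
    """List compare-duckdb-aws captures still flagged cannotCompare, a flag
    meant to mark guarded mixed-cohort compares rather than same-cohort runs."""
    entries = json.loads(manifest_path.read_text())
    return [
        name
        for name, capture in entries.items()
        if name.startswith("compare-duckdb-aws")
        and capture.get("visibleFlags", {}).get("cannotCompare")
    ]

if __name__ == "__main__":
    for name in suspect_compare_captures(MANIFEST):
        print(f"re-check: {name} is flagged cannotCompare despite comparable evidence")
```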
Follow-up sweep actioned this PR review comment.
Disposition: fixed
Branch: `chore/pr-review-followups-since-2026-05-05`
Summary:
Evidence: Updated the PR #246 hierarchy screenshot manifest so all four compare-duckdb-aws-* entries now mark visibleFlags.cannotCompare: false, matching the comparable DuckDB-vs-AWS evidence. Added focused regression coverage in test_results_explorer_audit_manifest.py.
Verification: `uv run -- python -m pytest tests/unit/explorer/test_results_explorer_audit_manifest.py -q` passed, and `uv run -- ruff check tests/unit/explorer/test_results_explorer_audit_manifest.py` passed.
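A minimal sketch of the kind of regression check described above, assuming the same keyed-by-capture-name manifest layout; the actual `tests/unit/explorer/test_results_explorer_audit_manifest.py` may be structured differently and assert more:

```python
import json
from pathlib import Path

MANIFEST = Path(
    "_project/audits/screenshots/"
    "results-explorer-hierarchy-usability-data-audit-20260507/screenshot-manifest.json"
)

def test_compare_duckdb_aws_captures_are_comparable():
    """The four compare-duckdb-aws-* entries must not carry cannotCompare,
    matching the comparable DuckDB-vs-AWS screenshot evidence."""
    entries = json.loads(MANIFEST.read_text())
    compare_captures = {
        name: capture
        for name, capture in entries.items()
        if name.startswith("compare-duckdb-aws")
    }
    assert len(compare_captures) == 4
    for name, capture in compare_captures.items():
        assert capture["visibleFlags"]["cannotCompare"] is False, name
```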
Future sweeps skip comments that already have this marker reply.
* fix(pr-followup): PR #245 comment 3201562602 — Path: tests/uat/_cli.py
* fix(pr-followup): PR #245 comment 3201562610 — Path: tests/uat/phases/preflight.py
* fix(pr-followup): PR #246 comment 3202118174 — Path: _project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507/screenshot-manifest.json
* fix(pr-followup): PR #255 comment 3202201883 — Path: tests/uat/phases/explorer_smoke.py
* fix(pr-followup): PR #256 comment 3202212814 — Path: results-explorer/src/pages/BenchmarkIndex.tsx
* fix(pr-followup): PR #266 comment 3204995419 — Path: results-explorer/src/lib/facetDisplay.ts
* fix(pr-followup): PR #267 comment 3205001733 — Path: _project/scripts/skill_sync_lock_audit.py

---------

Co-authored-by: Joe Harris <57046+joeharris76@users.noreply.github.com>
Summary
This PR supplements PR #244 with the robust evidence that was missing for visual hierarchy, usability, and data-presentation quality in the BenchBox Results Explorer.
Adds a retained-worktree audit artifact, fresh screenshot evidence for the requested route and viewport matrix, and six evidence-bound remediation TODOs that reference PR #246, finding IDs, and exact screenshot filenames. The audit mapping was updated so every PR #246 finding has both broad PR #244 coverage and a screenshot-specific PR #246 TODO.
Evidence
- `_project/audits/results-explorer-hierarchy-usability-data-audit-20260507.md`
- `_project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507/`
- `_project/TODO/main/planning/results-explorer-pr246-*.yaml`

PR #246 TODOs
- results-explorer-pr246-home-leaderboard-evidence-remediation
- results-explorer-pr246-responsive-navigation-overflow-remediation
- results-explorer-pr246-result-compare-evidence-remediation
- results-explorer-pr246-browse-platform-evidence-remediation
- results-explorer-pr246-query-workbench-evidence-remediation
- results-explorer-pr246-final-evidence-gate

Validation
- `uv run --project _project/scripts -- python _project/scripts/todo_cli.py validate _project/TODO`
- `uv run --project _project/scripts -- python _project/scripts/todo_cli.py check-graph`
- `uv run --project _project/scripts -- python _project/scripts/todo_cli.py reindex`
- `/todo review` quality pass: no findings after checking evidence linkage, screenshot existence, work breakdown, verification, guardrails, scope limits, and full screenshot coverage
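For illustration, a minimal sketch of the screenshot-existence leg of that quality pass, assuming each `results-explorer-pr246-*.yaml` TODO lists its screenshot filenames under an `evidence` key (the key name and YAML layout are assumptions, not the `todo_cli.py` schema):

```python
from pathlib import Path

import yaml  # assumption: PyYAML is available in the scripts environment

TODO_DIR = Path("_project/TODO/main/planning")
SCREENSHOT_DIR = Path(
    "_project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507"
)

def missing_screenshot_evidence() -> list[str]:
    """Report screenshot filenames referenced by PR #246 TODOs that are
    not present in the audit's screenshot directory."""
    missing = []
    for todo_path in sorted(TODO_DIR.glob("results-explorer-pr246-*.yaml")):
        todo = yaml.safe_load(todo_path.read_text()) or {}
        # Assumption: evidence entries are filenames relative to the audit directory.
        for filename in todo.get("evidence", []):
            if not (SCREENSHOT_DIR / filename).exists():
                missing.append(f"{todo_path.name}: {filename}")
    return missing

if __name__ == "__main__":
    for item in missing_screenshot_evidence():
        print("missing screenshot evidence:", item)
```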