Docs: audit Results Explorer hierarchy usability data #246
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c61f27d57f
| "visibleFlags": { | ||
| "rawError": false, | ||
| "binderError": false, | ||
| "cannotCompare": true |
Mark comparable DuckDB-vs-AWS captures correctly
The manifest marks this compare capture as cannotCompare: true, but the corresponding screenshot evidence (compare-duckdb-aws-*.png, e.g. 1280px) shows a valid comparable run with a “No differences” comparability receipt and normal decision summary. This misclassification can skew any audit tooling or manual triage that relies on visibleFlags to distinguish guarded mixed-cohort compares from same-cohort comparisons, leading to incorrect conclusions for the DQ-4/U-2 evidence set.
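For context, a minimal sketch of how audit tooling might consume `visibleFlags` from this manifest during triage, assuming the manifest is a JSON object keyed by capture name (that layout, the helper name, and the entry point are illustrative; only the manifest path comes from this PR):

```python
import json
from pathlib import Path

# Manifest path taken from this PR; the entry layout below is an assumption.
MANIFEST = Path(
    "_project/audits/screenshots/"
    "results-explorer-hierarchy-usability-data-audit-20260507/screenshot-manifest.json"
)

def suspect_compare_captures(manifest_path: Path) -> list[str]:
    """List compare-duckdb-aws captures still flagged cannotCompare, a flag
    meant to mark guarded mixed-cohort compares rather than same-cohort runs."""
    entries = json.loads(manifest_path.read_text())
    return [
        name
        for name, capture in entries.items()
        if name.startswith("compare-duckdb-aws")
        and capture.get("visibleFlags", {}).get("cannotCompare")
    ]

if __name__ == "__main__":
    for name in suspect_compare_captures(MANIFEST):
        print(f"re-check: {name} is flagged cannotCompare despite comparable evidence")
```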
Follow-up sweep actioned this PR review comment.
Disposition: fixed
Branch: `chore/pr-review-followups-since-2026-05-05`
Summary:
Evidence: Updated the PR #246 hierarchy screenshot manifest so all four compare-duckdb-aws-* entries now mark visibleFlags.cannotCompare: false, matching the comparable DuckDB-vs-AWS evidence. Added focused regression coverage in test_results_explorer_audit_manifest.py.
Verification: `uv run -- python -m pytest tests/unit/explorer/test_results_explorer_audit_manifest.py -q` passed, and `uv run -- ruff check tests/unit/explorer/test_results_explorer_audit_manifest.py` passed.
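A minimal sketch of the kind of regression check described above, assuming the same keyed-by-capture-name manifest layout; the actual `tests/unit/explorer/test_results_explorer_audit_manifest.py` may be structured differently and assert more:

```python
import json
from pathlib import Path

MANIFEST = Path(
    "_project/audits/screenshots/"
    "results-explorer-hierarchy-usability-data-audit-20260507/screenshot-manifest.json"
)

def test_compare_duckdb_aws_captures_are_comparable():
    """The four compare-duckdb-aws-* entries must not carry cannotCompare,
    matching the comparable DuckDB-vs-AWS screenshot evidence."""
    entries = json.loads(MANIFEST.read_text())
    compare_captures = {
        name: capture
        for name, capture in entries.items()
        if name.startswith("compare-duckdb-aws")
    }
    assert len(compare_captures) == 4
    for name, capture in compare_captures.items():
        assert capture["visibleFlags"]["cannotCompare"] is False, name
```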
Future sweeps skip comments that already have this marker reply.
* fix(pr-followup): PR #245 comment 3201562602 — Path: tests/uat/_cli.py
* fix(pr-followup): PR #245 comment 3201562610 — Path: tests/uat/phases/preflight.py
* fix(pr-followup): PR #246 comment 3202118174 — Path: _project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507/screenshot-manifest.json
* fix(pr-followup): PR #255 comment 3202201883 — Path: tests/uat/phases/explorer_smoke.py
* fix(pr-followup): PR #256 comment 3202212814 — Path: results-explorer/src/pages/BenchmarkIndex.tsx
* fix(pr-followup): PR #266 comment 3204995419 — Path: results-explorer/src/lib/facetDisplay.ts
* fix(pr-followup): PR #267 comment 3205001733 — Path: _project/scripts/skill_sync_lock_audit.py

---------

Co-authored-by: Joe Harris <57046+joeharris76@users.noreply.github.com>
Summary
This PR supplements PR #244 with the robust evidence that was missing for visual hierarchy, usability, and data-presentation quality in the BenchBox Results Explorer.
Adds a retained-worktree audit artifact, fresh screenshot evidence for the requested route and viewport matrix, and six evidence-bound remediation TODOs that reference PR #246, finding IDs, and exact screenshot filenames. The audit mapping was updated so every PR #246 finding has both broad PR #244 coverage and a screenshot-specific PR #246 TODO.
Evidence
- `_project/audits/results-explorer-hierarchy-usability-data-audit-20260507.md`
- `_project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507/`
- `_project/TODO/main/planning/results-explorer-pr246-*.yaml`

PR #246 TODOs
- results-explorer-pr246-home-leaderboard-evidence-remediation
- results-explorer-pr246-responsive-navigation-overflow-remediation
- results-explorer-pr246-result-compare-evidence-remediation
- results-explorer-pr246-browse-platform-evidence-remediation
- results-explorer-pr246-query-workbench-evidence-remediation
- results-explorer-pr246-final-evidence-gate

Validation
- `uv run --project _project/scripts -- python _project/scripts/todo_cli.py validate _project/TODO`
- `uv run --project _project/scripts -- python _project/scripts/todo_cli.py check-graph`
- `uv run --project _project/scripts -- python _project/scripts/todo_cli.py reindex`
- `/todo review` quality pass: no findings after checking evidence linkage, screenshot existence, work breakdown, verification, guardrails, scope limits, and full screenshot coverage
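For illustration, a minimal sketch of the screenshot-existence leg of that quality pass, assuming each `results-explorer-pr246-*.yaml` TODO lists its screenshot filenames under an `evidence` key (the key name and YAML layout are assumptions, not the `todo_cli.py` schema):

```python
from pathlib import Path

import yaml  # assumption: PyYAML is available in the scripts environment

TODO_DIR = Path("_project/TODO/main/planning")
SCREENSHOT_DIR = Path(
    "_project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507"
)

def missing_screenshot_evidence() -> list[str]:
    """Report screenshot filenames referenced by PR #246 TODOs that are
    not present in the audit's screenshot directory."""
    missing = []
    for todo_path in sorted(TODO_DIR.glob("results-explorer-pr246-*.yaml")):
        todo = yaml.safe_load(todo_path.read_text()) or {}
        # Assumption: evidence entries are filenames relative to the audit directory.
        for filename in todo.get("evidence", []):
            if not (SCREENSHOT_DIR / filename).exists():
                missing.append(f"{todo_path.name}: {filename}")
    return missing

if __name__ == "__main__":
    for item in missing_screenshot_evidence():
        print("missing screenshot evidence:", item)
```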