Docs: audit Results Explorer hierarchy usability data#246

Merged
github-actions[bot] merged 2 commits into develop from
docs/results-explorer-hierarchy-usability-data-audit
May 7, 2026

Owner

@joeharris76 joeharris76 commented May 7, 2026

Summary

This supplements PR #244 with the missing robust evidence for visual hierarchy, usability, and data-presentation quality in the BenchBox Results Explorer.

Adds a retained-worktree audit artifact, fresh screenshot evidence for the requested route and viewport matrix, and six evidence-bound remediation TODOs that reference PR #246, finding IDs, and exact screenshot filenames. The audit mapping was updated so every PR #246 finding has both broad PR #244 coverage and a screenshot-specific PR #246 TODO.
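
The coverage rule described above (each finding needs both a PR #244 TODO and a PR #246 TODO) could be checked roughly as follows. This is a minimal sketch, not the audit's actual tooling: the mapping shape, the `results-explorer-pr244-` prefix, the `results-explorer-pr244-example-todo` name, and the pairing of finding IDs to TODOs are all assumptions made for illustration.

```python
# Hypothetical prefixes. Only the pr246 prefix appears in this PR; the pr244
# prefix is assumed by analogy.
PR244_PREFIX = "results-explorer-pr244-"
PR246_PREFIX = "results-explorer-pr246-"


def uncovered_findings(mapping: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return findings that lack a PR #244 TODO, a PR #246 TODO, or both."""
    gaps: dict[str, list[str]] = {}
    for finding_id, todos in mapping.items():
        missing = []
        if not any(t.startswith(PR244_PREFIX) for t in todos):
            missing.append("pr244")
        if not any(t.startswith(PR246_PREFIX) for t in todos):
            missing.append("pr246")
        if missing:
            gaps[finding_id] = missing
    return gaps


# Illustrative data only: the finding-to-TODO pairing is invented.
example = {
    "DQ-4": [
        "results-explorer-pr244-example-todo",
        "results-explorer-pr246-result-compare-evidence-remediation",
    ],
    "U-2": ["results-explorer-pr246-result-compare-evidence-remediation"],
}
print(uncovered_findings(example))  # {'U-2': ['pr244']}
```

An empty result would mean every finding in the mapping has both kinds of coverage.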

Evidence

  • Audit: _project/audits/results-explorer-hierarchy-usability-data-audit-20260507.md
  • Screenshots: _project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507/
  • TODOs: _project/TODO/main/planning/results-explorer-pr246-*.yaml

PR #246 TODOs

  • results-explorer-pr246-home-leaderboard-evidence-remediation
  • results-explorer-pr246-responsive-navigation-overflow-remediation
  • results-explorer-pr246-result-compare-evidence-remediation
  • results-explorer-pr246-browse-platform-evidence-remediation
  • results-explorer-pr246-query-workbench-evidence-remediation
  • results-explorer-pr246-final-evidence-gate

Validation

  • uv run --project _project/scripts -- python _project/scripts/todo_cli.py validate _project/TODO
  • uv run --project _project/scripts -- python _project/scripts/todo_cli.py check-graph
  • uv run --project _project/scripts -- python _project/scripts/todo_cli.py reindex
  • /todo review quality pass: no findings after checking evidence linkage, screenshot existence, work breakdown, verification, guardrails, scope limits, and full screenshot coverage

@joeharris76 joeharris76 marked this pull request as ready for review May 7, 2026 14:15
@github-actions github-actions Bot enabled auto-merge (squash) May 7, 2026 14:15

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c61f27d57f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

"visibleFlags": {
  "rawError": false,
  "binderError": false,
  "cannotCompare": true

[P2] Mark comparable DuckDB-vs-AWS captures correctly

The manifest marks this compare capture as cannotCompare: true, but the corresponding screenshot evidence (compare-duckdb-aws-*.png, e.g. 1280px) shows a valid comparable run with a “No differences” comparability receipt and normal decision summary. This misclassification can skew any audit tooling or manual triage that relies on visibleFlags to distinguish guarded mixed-cohort compares from same-cohort comparisons, leading to incorrect conclusions for the DQ-4/U-2 evidence set.
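
To illustrate why the flag matters, here is a minimal sketch of triage tooling that splits captures by `visibleFlags.cannotCompare`. The manifest shape (a list of entries with a `file` key) and both file names are assumptions; only the `visibleFlags` keys come from the snippet above.

```python
import json

# Assumed manifest shape and hypothetical file names. The first entry mirrors
# the flagged state: a comparable capture marked cannotCompare: true.
MANIFEST_JSON = """
[
  {"file": "compare-duckdb-aws-1280.png",
   "visibleFlags": {"rawError": false, "binderError": false, "cannotCompare": true}},
  {"file": "compare-mixed-cohort-1280.png",
   "visibleFlags": {"rawError": false, "binderError": false, "cannotCompare": true}}
]
"""


def split_by_comparability(entries):
    """Partition capture file names into (guarded, comparable) lists."""
    guarded, comparable = [], []
    for entry in entries:
        flags = entry.get("visibleFlags", {})
        bucket = guarded if flags.get("cannotCompare") else comparable
        bucket.append(entry["file"])
    return guarded, comparable


entries = json.loads(MANIFEST_JSON)
guarded, comparable = split_by_comparability(entries)
print(guarded)     # both files, including the comparable DuckDB-vs-AWS capture
print(comparable)  # []
```

With the flag set incorrectly, the DuckDB-vs-AWS capture lands in the guarded bucket alongside genuinely mixed-cohort compares, which is the skew the comment describes.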

Useful? React with 👍 / 👎.

Owner Author


Follow-up sweep actioned this PR review comment.

Disposition: fixed
Branch: `chore/pr-review-followups-since-2026-05-05`

Summary:

Evidence: Updated the PR #246 hierarchy screenshot manifest so all four `compare-duckdb-aws-*` entries now mark `visibleFlags.cannotCompare: false`, matching the comparable DuckDB-vs-AWS evidence. Added focused regression coverage in `test_results_explorer_audit_manifest.py`.

Verification: `uv run -- python -m pytest tests/unit/explorer/test_results_explorer_audit_manifest.py -q` passed, and `uv run -- ruff check tests/unit/explorer/test_results_explorer_audit_manifest.py` passed.

Future sweeps skip comments that already have this marker reply.
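
A regression check of the kind described in the evidence note might look like the sketch below. It is not the contents of `test_results_explorer_audit_manifest.py`, which are not shown in this thread: the entry shape, the helper name, and the placeholder file suffixes (`-a` to `-d`) are assumptions, and a real test would load the manifest JSON from disk instead of using inline data.

```python
import fnmatch


def miscategorized_compare_entries(entries, pattern="compare-duckdb-aws-*"):
    """Return matching file names whose cannotCompare flag is not exactly False."""
    return [
        entry["file"]
        for entry in entries
        if fnmatch.fnmatch(entry["file"], pattern)
        and entry.get("visibleFlags", {}).get("cannotCompare") is not False
    ]


def test_duckdb_aws_compare_entries_are_comparable():
    # Placeholder entries standing in for the four compare-duckdb-aws-* captures.
    entries = [
        {"file": f"compare-duckdb-aws-{suffix}.png",
         "visibleFlags": {"rawError": False, "binderError": False,
                          "cannotCompare": False}}
        for suffix in ("a", "b", "c", "d")
    ]
    assert miscategorized_compare_entries(entries) == []


test_duckdb_aws_compare_entries_are_comparable()
```

Checking `is not False` rather than truthiness also catches entries where the flag is missing, so a dropped key fails the test instead of passing silently.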

@github-actions github-actions Bot merged commit 555708d into develop May 7, 2026
6 checks passed
joeharris76 added a commit that referenced this pull request May 8, 2026
* fix(pr-followup): PR #245 comment 3201562602 — **<sub><sub>![P2 Badge](https://img.shields.io/badge…

Disposition: fixed
Source: #245 (comment)
Path: tests/uat/_cli.py

* fix(pr-followup): PR #245 comment 3201562610 — **<sub><sub>![P1 Badge](https://img.shields.io/badge…

Disposition: fixed
Source: #245 (comment)
Path: tests/uat/phases/preflight.py

* fix(pr-followup): PR #246 comment 3202118174 — **<sub><sub>![P2 Badge](https://img.shields.io/badge…

Disposition: fixed
Source: #246 (comment)
Path: _project/audits/screenshots/results-explorer-hierarchy-usability-data-audit-20260507/screenshot-manifest.json

* fix(pr-followup): PR #255 comment 3202201883 — **<sub><sub>![P2 Badge](https://img.shields.io/badge…

Disposition: fixed
Source: #255 (comment)
Path: tests/uat/phases/explorer_smoke.py

* fix(pr-followup): PR #256 comment 3202212814 — **<sub><sub>![P2 Badge](https://img.shields.io/badge…

Disposition: fixed
Source: #256 (comment)
Path: results-explorer/src/pages/BenchmarkIndex.tsx

* fix(pr-followup): PR #266 comment 3204995419 — **<sub><sub>![P1 Badge](https://img.shields.io/badge…

Disposition: fixed
Source: #266 (comment)
Path: results-explorer/src/lib/facetDisplay.ts

* fix(pr-followup): PR #267 comment 3205001733 — **<sub><sub>![P2 Badge](https://img.shields.io/badge…

Disposition: fixed
Source: #267 (comment)
Path: _project/scripts/skill_sync_lock_audit.py

---------

Co-authored-by: Joe Harris <57046+joeharris76@users.noreply.github.com>