
docs: add benchmark proof and distribution assets #42

Merged
mohanagy merged 9 commits into main from feature/proof-distribution
May 2, 2026

Conversation

Owner

@mohanagy mohanagy commented May 2, 2026

Summary

  • add a real GoValidate Platform PR-review benchmark artifact with committed raw evidence and verifier
  • publish benchmark pages from docs/benchmarks/ via GitHub Pages and link them from the README
  • add repo-side Smithery and awesome-mcp listing assets for future submissions

Test Plan

  • npm run typecheck
  • npm run test:run
  • npm run build

Summary by CodeRabbit

  • New Features

    • Automated GitHub Pages deployment for the benchmark documentation hub
    • New public benchmark proof pages (retrieval and PR-review) with headline metrics and evidence
  • Documentation

    • Added "Public proof" section and a marketplace listing pack (marketplace manifests and submission docs)
    • New benchmark landing and styled documentation pages linking evidence and reproducibility instructions
  • Tests

    • Added verification tests and reproducibility scripts for benchmark artifacts and reports

mohanagy and others added 7 commits May 2, 2026 03:29
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai Bot commented May 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 49612dbe-02bd-4bca-a939-88db3e6bd4be

📥 Commits

Reviewing files that changed from the base of the PR and between 929874c and 1168106.

📒 Files selected for processing (2)
  • .github/workflows/pages.yml
  • docs/benchmarks/styles.css
✅ Files skipped from review due to trivial changes (2)
  • .github/workflows/pages.yml
  • docs/benchmarks/styles.css

Walkthrough

Adds benchmark proof artifacts and a styled benchmarks landing, marketplace submission docs, and a GitHub Pages workflow; introduces sanitization (hashing) of path-derived identifiers in PR-impact payloads and updates tests to verify sanitized prompts and benchmark artifacts.

Changes

Benchmark Proofs & Marketplace Distribution

| Layer | File(s) | Summary |
| --- | --- | --- |
| Benchmark Data & Artifacts | `docs/benchmarks/2026-04-30-govalidate/index.html`, `docs/benchmarks/2026-05-02-govalidate-pr-review/*` | Adds two benchmark proof runs (retrieval and PR-review) with pages, README, report.json, verify.sh, prompt/answer artifacts and related static files. |
| Presentation Layer | `docs/benchmarks/styles.css`, `docs/benchmarks/index.html` | Adds CSS design tokens, layout components, and a benchmarks landing page linking individual benchmark proofs. |
| Marketplace Documentation | `docs/distribution/marketplaces/README.md`, `.../smithery.json`, `.../smithery.md`, `.../awesome-mcp.md` | Adds marketplace listing pack and submission artifacts (Smithery manifest, submission doc, Awesome MCP entry, install/quickstart and proof links). |
| Project Linkage | `README.md` | Adds a “Public proof” section linking to benchmark hubs and updates the Documentation list to include the marketplace listing pack. |
| Deployment Infrastructure | `.github/workflows/pages.yml` | Adds a GitHub Actions workflow that deploys `docs/benchmarks/` to GitHub Pages on push to main and via manual dispatch, with least-privilege permissions and concurrency settings. |

PR-Impact Payload Sanitization

| Layer | File(s) | Summary |
| --- | --- | --- |
| Core Sanitization | `src/infrastructure/review-compare.ts` | Detects path-derived identifier strings and replaces them with stable hashed `review_node_*` identifiers before rendering verbose/compact review prompts; updates token accounting to use sanitized payloads/prompts. |
| Wiring / Artifact Generation | `src/infrastructure/review-compare.ts`, `docs/benchmarks/...` | Sanitized payloads are used to render and write prompt artifacts and to compute prompt/token estimates saved in report outputs. |
| Tests / Verification | `tests/unit/review-compare.test.ts`, `tests/unit/benchmark-artifact.test.ts` | The `createRepo` helper accepts `pathLikeNodeIds` to generate path-like IDs; new tests assert that prompts contain no raw path-derived IDs and that sanitized `node_id` values match `review_node_[0-9a-f]+`; benchmark artifact tests validate README/report/prompt artifacts and `verify.sh` execution. |
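
To make the sanitization step above concrete, here is a minimal sketch of how a path-derived ID could be mapped to a stable `review_node_*` identifier. It is not the code in `src/infrastructure/review-compare.ts`: the helper names (`looksPathDerived`, `toReviewNodeId`, `sanitizeId`), the detection regex, the choice of SHA-256, and the 12-character truncation are all assumptions made for illustration. Only the `review_node_` prefix and the hexadecimal suffix come from the summary above.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical heuristic: treat an ID as path-derived when it contains a path
// separator or a flattened home-directory fragment such as "users_<name>_".
function looksPathDerived(id: string): boolean {
  return /[\\/]/.test(id) || /(^|_)(users|home)_[a-z0-9]+_/i.test(id)
}

// Map an ID to a stable identifier that does not reveal the original path.
// The same input always yields the same output, so references between
// nodes (for example from_id/to_id pairs) stay consistent after sanitization.
function toReviewNodeId(id: string): string {
  const digest = createHash('sha256').update(id).digest('hex')
  return `review_node_${digest.slice(0, 12)}` // truncation length is an assumption
}

// Only rewrite IDs that look path-derived; leave symbolic IDs readable.
function sanitizeId(id: string): string {
  return looksPathDerived(id) ? toReviewNodeId(id) : id
}

// Made-up example values, not taken from the benchmark artifacts.
console.log(sanitizeId('users_alice_projects_app_src_index_ts'))
console.log(sanitizeId('AuthService'))
```

The first call prints an identifier of the form `review_node_<12 hex characters>`; the second prints `AuthService` unchanged. Hashing rather than random renaming keeps the committed artifacts reproducible across runs, which is the property the tests described above appear to rely on.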

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through docs and pages bright,

proofs bundled up, deployed to light.
IDs hushed, hashed safe and neat,
tests nod true with every beat.
A market pack and styles so sweet — hop, celebrate! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: adding benchmark proof materials and distribution assets (marketplace listings). |
| Description check | ✅ Passed | The description covers the Summary section with clear objectives and includes a Test Plan with all specified checks (typecheck, test:run, build) marked as completed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pages.yml:
- Around line 27-40: Replace floating action tags with immutable commit SHAs for
each `uses:` entry in the workflow: change `actions/checkout@v6`,
`actions/configure-pages@v5`, `actions/upload-pages-artifact@v4`, and
`actions/deploy-pages@v4` to their corresponding pinned commit SHAs (e.g.,
`actions/checkout@<sha>`). Locate these `uses:` lines in the
`.github/workflows/pages.yml` snippet and replace the tag syntax with the full
commit SHA fetched from the action's GitHub repo (ensure you pin the exact
commit you tested), committing the updated workflow.

In `@docs/benchmarks/2026-05-01-govalidate-pr-review/compact-prompt.txt`:
- Around line 395-403: The benchmark payloads currently commit
workstation/username-derived node_id values (e.g., path fragments like
users_mohammednaji_...) so update the serialization in mergeRunSections() to
sanitize node_id before persisting: detect node_id values that contain
filesystem/username patterns and replace them with a stable, non-PII identifier
(e.g., a deterministic hash or slug derived from the original id) and use that
sanitized id in the committed payload; apply the same sanitization logic to the
other payload assembly sites referenced around the same area (the blocks that
produce node_id entries) so all emitted node_id fields are PII-free and stable
across runs.

In `@docs/benchmarks/styles.css`:
- Around line 154-157: The code selector's font-family list contains a
duplicated and quoted token ("SFMono-Regular" and SFMono-Regular) which causes
stylelint errors; edit the code rule in styles.css (selector: code) to remove
the duplicate and the unnecessary quotes so the font-stack reads SFMono-Regular,
ui-monospace, Menlo, Monaco, Consolas, monospace (i.e., keep one SFMono-Regular
entry without quotes).
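
The first finding above (pin every `uses:` entry to a full commit SHA) can be checked mechanically. The following is a rough sketch of such a check, assuming the workflow file is available as a string. The function name `findUnpinnedActions` and the line-based regex are hypothetical; a real linter would parse the YAML properly. The 40-character SHA in the sample is a placeholder, not a real commit of any action.

```typescript
// A full-length Git commit SHA-1 is exactly 40 hexadecimal characters.
const FULL_SHA = /^[0-9a-f]{40}$/i

interface UnpinnedUse {
  line: number // 1-based line number within the workflow text
  action: string
  ref: string
}

// Scan workflow text line by line and report `uses: owner/repo@ref` entries
// whose ref is a tag or branch rather than a full commit SHA.
// Entries without an "@ref" part (for example local "./path" actions) are skipped.
function findUnpinnedActions(workflowYaml: string): UnpinnedUse[] {
  const findings: UnpinnedUse[] = []
  workflowYaml.split('\n').forEach((text, index) => {
    const match = /^\s*-?\s*uses:\s*['"]?([^@\s'"]+)@([^\s'"#]+)/.exec(text)
    if (!match) return
    const action = match[1] ?? ''
    const ref = match[2] ?? ''
    if (!FULL_SHA.test(ref)) findings.push({ line: index + 1, action, ref })
  })
  return findings
}

const sample = [
  'steps:',
  '  - uses: actions/checkout@v6',
  // Placeholder SHA for illustration only.
  '  - uses: actions/deploy-pages@0123456789abcdef0123456789abcdef01234567 # v4',
].join('\n')

console.log(findUnpinnedActions(sample))
```

For the sample above, only the `actions/checkout@v6` line is reported; the entry with a 40-character ref passes. Keeping the human-readable tag in a trailing comment, as in the sample, is a common way to preserve readability after pinning.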

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a286f439-6fb2-428e-8dd0-4ac2d335af3b

📥 Commits

Reviewing files that changed from the base of the PR and between 3f7914d and bf41607.

📒 Files selected for processing (17)
  • .github/workflows/pages.yml
  • README.md
  • docs/benchmarks/2026-04-30-govalidate/index.html
  • docs/benchmarks/2026-05-01-govalidate-pr-review/README.md
  • docs/benchmarks/2026-05-01-govalidate-pr-review/compact-answer.txt
  • docs/benchmarks/2026-05-01-govalidate-pr-review/compact-prompt.txt
  • docs/benchmarks/2026-05-01-govalidate-pr-review/index.html
  • docs/benchmarks/2026-05-01-govalidate-pr-review/report.json
  • docs/benchmarks/2026-05-01-govalidate-pr-review/verbose-answer.txt
  • docs/benchmarks/2026-05-01-govalidate-pr-review/verbose-prompt.txt
  • docs/benchmarks/2026-05-01-govalidate-pr-review/verify.sh
  • docs/benchmarks/index.html
  • docs/benchmarks/styles.css
  • docs/distribution/marketplaces/README.md
  • docs/distribution/marketplaces/awesome-mcp.md
  • docs/distribution/marketplaces/smithery.json
  • docs/distribution/marketplaces/smithery.md

Comment thread .github/workflows/pages.yml Outdated
Comment thread docs/benchmarks/2026-05-01-govalidate-pr-review/compact-prompt.txt Outdated
Comment thread docs/benchmarks/styles.css
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/infrastructure/review-compare.ts`:
- Around line 86-87: The sanitization allowlist REVIEW_PROMPT_ID_FIELDS is
missing ID-bearing "node" fields so add "node" to the set (and any other
plural/array variants used for impact items) so path-derived identifiers like
per_node_impact[].node get replaced; make the identical update to the other
allowlist instance around PATH_DERIVED_ID_TOKENS usage (the second set defined
later) to ensure both REVIEW_PROMPT_ID_FIELDS and the later token/field set
include "node" (and related per_node_impact key) so these values are sanitized
before persisting prompts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9d714646-5b17-44e4-8975-50cbd0df2872

📥 Commits

Reviewing files that changed from the base of the PR and between bf41607 and 929874c.

📒 Files selected for processing (16)
  • README.md
  • docs/benchmarks/2026-05-02-govalidate-pr-review/README.md
  • docs/benchmarks/2026-05-02-govalidate-pr-review/compact-answer.txt
  • docs/benchmarks/2026-05-02-govalidate-pr-review/compact-prompt.txt
  • docs/benchmarks/2026-05-02-govalidate-pr-review/index.html
  • docs/benchmarks/2026-05-02-govalidate-pr-review/report.json
  • docs/benchmarks/2026-05-02-govalidate-pr-review/verbose-answer.txt
  • docs/benchmarks/2026-05-02-govalidate-pr-review/verbose-prompt.txt
  • docs/benchmarks/2026-05-02-govalidate-pr-review/verify.sh
  • docs/benchmarks/index.html
  • docs/distribution/marketplaces/README.md
  • docs/distribution/marketplaces/smithery.json
  • docs/distribution/marketplaces/smithery.md
  • src/infrastructure/review-compare.ts
  • tests/unit/benchmark-artifact.test.ts
  • tests/unit/review-compare.test.ts
✅ Files skipped from review due to trivial changes (10)
  • docs/benchmarks/index.html
  • docs/benchmarks/2026-05-02-govalidate-pr-review/index.html
  • docs/benchmarks/2026-05-02-govalidate-pr-review/README.md
  • docs/distribution/marketplaces/smithery.md
  • docs/benchmarks/2026-05-02-govalidate-pr-review/compact-prompt.txt
  • docs/distribution/marketplaces/README.md
  • docs/distribution/marketplaces/smithery.json
  • docs/benchmarks/2026-05-02-govalidate-pr-review/report.json
  • docs/benchmarks/2026-05-02-govalidate-pr-review/compact-answer.txt
  • README.md

Comment on lines +86 to +87
const REVIEW_PROMPT_ID_FIELDS = new Set(['node_id', 'from_id', 'to_id'])
const PATH_DERIVED_ID_TOKENS = new Set([

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sanitization allowlist misses ID-bearing node fields.

Line 86 currently limits replacement to node_id/from_id/to_id, but pr_impact also carries IDs in fields like per_node_impact[].node. Those values can remain unsanitized and leak path-derived identifiers into persisted prompts.

Proposed fix
-const REVIEW_PROMPT_ID_FIELDS = new Set(['node_id', 'from_id', 'to_id'])
+const REVIEW_PROMPT_ID_FIELDS = new Set(['node_id', 'from_id', 'to_id', 'node'])

 function sanitizePersistedReviewPayload<T>(value: T): T {
@@
-      if (
-        REVIEW_PROMPT_ID_FIELDS.has(key) &&
+      if (
+        REVIEW_PROMPT_ID_FIELDS.has(key) &&
         typeof entryValue === 'string' &&
         entryValue.length > 0 &&
         isPathDerivedIdentifier(entryValue)
       ) {
         return [key, sanitizePersistedIdentifier(entryValue)]
       }

Also applies to: 263-271
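
To illustrate why the allowlist matters, here is a small self-contained sketch of a recursive payload walk in the spirit of the proposed fix. It is not the project's `sanitizePersistedReviewPayload`: the `Json` type, the `sanitizePayload` name, and the injected `isPathDerived` / `replaceId` callbacks are assumptions, and the callbacks used in the example are deliberately trivial stand-ins.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Field names whose string values are treated as identifiers.
// 'node' is the addition suggested in the review comment above.
const ID_FIELDS = new Set(['node_id', 'from_id', 'to_id', 'node'])

// Walk the payload recursively. A string is replaced only when its key is on
// the allowlist AND it looks path-derived; everything else is copied as-is.
function sanitizePayload(
  value: Json,
  isPathDerived: (id: string) => boolean,
  replaceId: (id: string) => string,
): Json {
  if (Array.isArray(value)) {
    return value.map((item) => sanitizePayload(item, isPathDerived, replaceId))
  }
  if (value !== null && typeof value === 'object') {
    const result: { [key: string]: Json } = {}
    for (const [key, entry] of Object.entries(value)) {
      if (ID_FIELDS.has(key) && typeof entry === 'string' && entry.length > 0 && isPathDerived(entry)) {
        result[key] = replaceId(entry)
      } else {
        result[key] = sanitizePayload(entry, isPathDerived, replaceId)
      }
    }
    return result
  }
  return value
}

// Made-up payload shaped like the fields named in the review comment.
const payload: Json = {
  per_node_impact: [{ node: 'src/app/index.ts::main', score: 3 }],
  edges: [{ from_id: 'src/a.ts', to_id: 'Logger' }],
}

const cleaned = sanitizePayload(
  payload,
  (id) => id.includes('/'), // stand-in detector
  () => 'review_node_placeholder', // stand-in; a real replacement would hash the ID
)
console.log(JSON.stringify(cleaned, null, 2))
```

With `'node'` on the allowlist, `per_node_impact[0].node` is replaced, while `to_id: 'Logger'` is kept because the stand-in detector does not flag it. Removing `'node'` from `ID_FIELDS` reproduces the leak described in the comment: the value passes through the walk untouched.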


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mohanagy mohanagy merged commit b7dd285 into main May 2, 2026
6 checks passed