Skip to content

ci: agent-driven UI smoke test on Vercel preview#2238

Merged
teeohhem merged 1 commit into
mainfrom
teeohhem/ui-preview-agent-smoke
May 7, 2026
Merged

ci: agent-driven UI smoke test on Vercel preview#2238
teeohhem merged 1 commit into
mainfrom
teeohhem/ui-preview-agent-smoke

Conversation

@teeohhem
Copy link
Copy Markdown
Contributor

@teeohhem teeohhem commented May 7, 2026

Summary

Adds a workflow that runs an agent (claude-code-action + Playwright MCP) against the Vercel preview build of any PR touching packages/app or packages/common-utils. The agent reads the "How to test on Vercel preview" section of the PR body, parses the listed routes and numbered steps, and executes them verbatim against the preview — treating Verify / Confirm / Assert steps as assertions and posting a single ✅ / ❌ summary comment.

Tightens the existing PR template's test section into a parseable shape: an explicit **Preview routes:** line and a numbered **Steps:** list. PRs with no plan, or that leave the section as the template placeholder, get a one-line skip comment instead of speculative testing.

The preview is LOCAL_MODE with a pre-configured demo ClickHouse, so the agent doesn't need to register a user or add a connection — it just opens the listed routes and runs the author's steps.

Why

Today the PR description tells humans what to verify, but nothing executes it. This pipes that same plan to an agent and surfaces results back as a comment. It's deliberately observe-only for now — no failing status check — so we can watch the false-positive rate before promoting it to a required check.

How it works

  1. Triggers on pull_request_target for paths under packages/app/** or packages/common-utils/**. Also workflow_dispatch so we can manually re-run on a PR for testing.
  2. Waits for the Vercel preview using patrickedqvist/wait-for-vercel-preview.
  3. Installs Playwright + Chromium.
  4. Spawns anthropics/claude-code-action@v1 with the @playwright/mcp server. Prompt instructs the agent to:
    • read /tmp/pr-body.md,
    • find ### How to test on Vercel preview,
    • parse **Preview routes:** and **Steps:**,
    • execute each step on each route, treating Verify / Confirm / Assert as assertions,
    • post a single PR comment summarizing pass / fail per route, with console errors and 5xx responses inline.
  5. Comment is identified by <!-- ui-preview-smoke --> so subsequent runs replace it.

Test plan

This workflow can't smoke-test itself, so:

  • Confirm ANTHROPIC_API_KEY is set in repo secrets (already in use by claude-code-review.yml).
  • Confirm NEXT_PUBLIC_IS_LOCAL_MODE, NEXT_PUBLIC_HDX_LOCAL_DEFAULT_CONNECTIONS, NEXT_PUBLIC_HDX_LOCAL_DEFAULT_SOURCES are set on the Vercel Preview environment (not just Production).
  • After merge, manually run via gh workflow run ui-preview-smoke.yml -f pr_number=<some-recent-ui-pr> to validate end-to-end without waiting for a new PR.
  • Watch the first ~10 PR runs for false positives before adding any blocking behavior.

How to test on Vercel preview

N/A — this PR only changes CI workflow + PR template.

References

Adds a workflow that runs an agent (claude-code-action + Playwright MCP)
against the Vercel preview build of any PR touching packages/app or
packages/common-utils. The agent reads the "How to test on Vercel
preview" section of the PR body and executes the listed routes and
numbered steps verbatim, treating Verify/Confirm/Assert steps as
assertions and posting a single ✅/❌ summary comment on the PR.

Tightens the existing PR template's "How to test" section so authors
write a parseable plan: an explicit "**Preview routes:**" line and a
numbered "**Steps:**" list. PRs without a plan, or with the section
left as the template placeholder, get a one-line skip comment from the
agent rather than speculative testing.

The preview is LOCAL_MODE with a pre-configured demo ClickHouse, so the
agent does not need to register a user or add a connection — it just
opens the listed routes and runs the author's steps. Run shape:
~30-90s, single PR comment, no failing status check (start observe-only;
promote to required once false-positive rate is known).
@vercel
Copy link
Copy Markdown

vercel Bot commented May 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
hyperdx-oss Ignored Ignored May 7, 2026 6:41pm

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 7, 2026

⚠️ No Changeset found

Latest commit: 749a842

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions Bot added the review/tier-1 Trivial — auto-merge candidate once CI passes label May 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

🟢 Tier 1 — Trivial

Docs, images, lock files, or a dependency bump. No functional code changes detected.

Why this tier:

  • All files are docs / images / lock files

Review process: Auto-merge once CI passes. No human review required.
SLA: Resolves automatically.

Stats
  • Production files changed: 0
  • Production lines changed: 0
  • Branch: teeohhem/ui-preview-agent-smoke
  • Author: teeohhem

To override this classification, remove the review/tier-1 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@teeohhem teeohhem merged commit c8a7043 into main May 7, 2026
18 checks passed
@teeohhem teeohhem deleted the teeohhem/ui-preview-agent-smoke branch May 7, 2026 18:43
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

PR Review

  • Comment never gets posted. The prompt tells the agent to "post a single PR comment" (skip path and result path), but --allowedTools only grants Bash(cat /tmp/pr-body.md), Bash(gh pr view:*), mcp__playwright__* — there is no gh pr comment / gh issue comment and no follow-up workflow step that consumes the summary structured output (compare claude-code-review.yml's peter-evans/find-comment + create-or-update-comment pair). The summary field goes nowhere. → Add an Extract summary + find-comment + create-or-update-comment chain (mirror claude-code-review.yml) and remove the "post a comment" instructions from the agent prompt.
  • 🔒 Prompt injection via PR body in pull_request_target context. pr.body is attacker-controlled, and the agent has Playwright network egress + pull-requests: write via GITHUB_TOKEN. A malicious "Steps:" entry like 1. Open https://attacker.com?x=<secret> can exfiltrate via navigation, and the agent will happily render attacker-authored markdown back into a comment if the posting path is ever wired up. → Gate the job on github.event.pull_request.head.repo.full_name == github.repository (or maintainer author_association) for pull_request_target, and/or restrict Playwright navigation to the resolved preview origin.
  • workflow_dispatch runs will not match the PR commit. The inline comment acknowledges "For workflow_dispatch we need to point at the PR head commit" but the step never passes one — wait-for-vercel-preview will resolve against the workflow ref (main), not steps.pr.outputs.head_sha. → Pass sha: ${{ steps.pr.outputs.head_sha }} to patrickedqvist/wait-for-vercel-preview.
  • ⚠️ Setup Node + playwright install --with-deps chromium is dead weight. The Playwright MCP server runs via npx -y @playwright/mcp@latest in its own subprocess and manages its own browser binary; the system-level globals don't get used. → Either drop both steps, or replace the MCP launch with a command that reuses the installed binary.
  • ⚠️ Expression-into-JS injection on inputs.pr_number. Number('${{ inputs.pr_number }}') interpolates raw user input into a JS source string (a value like 1');doSomething();// would execute). Inherited from claude-code-review.yml, but worth fixing here. → Pass via env: PR_NUMBER: ${{ inputs.pr_number }} and use Number(process.env.PR_NUMBER).
  • ⚠️ id-token: write is unused. No OIDC consumer in this job. → Drop it to keep the token surface minimal.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

E2E Test Results

All tests passed • 166 passed • 3 skipped • 1213s

Status Count
✅ Passed 166
❌ Failed 0
⚠️ Flaky 3
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Deep Review

Scope: 2 files (192 lines): .github/pull_request_template.md and new .github/workflows/ui-preview-smoke.yml — a pull_request_target workflow that runs claude-code-action + Playwright MCP against Vercel preview deploys and parses the PR body as a test plan.

🔴 P0/P1 -- must fix

  • .github/workflows/ui-preview-smoke.yml:36-152 -- pull_request_target writes the attacker-controlled PR body to /tmp/pr-body.md and feeds it to an LLM that holds pull-requests: write, id-token: write, ANTHROPIC_API_KEY, and mcp__playwright__*; prompt-injection in the body can coerce the agent to forge approval comments, read other PRs via gh pr view, or navigate Playwright to an attacker URL with secrets in the query.
    • Fix: Move the trigger to pull_request plus a maintainer-gated workflow_run / labeled-PR pattern, or pre-extract and sanitize the test plan in a deterministic JS step so the agent never sees raw body markdown.
    • security, adversarial
  • .github/workflows/ui-preview-smoke.yml:36,58,67,77,87 -- Every action is pinned to a mutable tag (actions/github-script@v9, patrickedqvist/wait-for-vercel-preview@v1.3.1, actions/setup-node@v4, anthropics/claude-code-action@v1, @playwright/mcp@latest); on a pull_request_target workflow with secrets, any retag is a direct exfil path, and the third-party single-maintainer Vercel-wait action is the highest-risk link.
    • Fix: Pin every third-party action to a full commit SHA with the version as a trailing comment, pin @playwright/mcp to an exact version, and add Dependabot for the actions ecosystem.
    • security, adversarial
  • .github/workflows/ui-preview-smoke.yml:43 -- Number('${{ inputs.pr_number }}') interpolates a string-typed dispatch input directly into a JS literal inside actions/github-script; a value like 1'); /* code */; (' breaks out of the string and runs arbitrary JS with pull-requests: write and id-token: write.
    • Fix: Pass the input via env: PR_NUMBER_INPUT: ${{ inputs.pr_number }} and read with process.env.PR_NUMBER_INPUT, then validate Number.isInteger(prNumber) && prNumber > 0.
    • adversarial, security, correctness
  • .github/workflows/ui-preview-smoke.yml:151 -- mcp__playwright__* wildcard authorizes every Playwright tool including navigate, evaluate, and request methods with no origin allowlist, giving prompt-injected agents an arbitrary outbound channel for in-context data.
    • Fix: Enumerate only the tools needed (navigate, screenshot, click, fill) and pass an origin allowlist to @playwright/mcp constraining navigation to the resolved Vercel preview host.
    • security, adversarial
  • .github/workflows/ui-preview-smoke.yml:113-152 -- The agent's --allowedTools permits only Bash(cat /tmp/pr-body.md), Bash(gh pr view:*), and mcp__playwright__* — none can post a PR comment, and the workflow has no find-comment / create-or-update-comment step like sibling claude-code-review.yml and deep-review.yml; the structured-output summary field has no posting wiring, so the workflow's primary deliverable likely never appears.
    • Fix: Add an explicit peter-evans/find-comment@<sha> + peter-evans/create-or-update-comment@<sha> step keyed on <!-- ui-preview-smoke --> consuming the action's structured output, mirroring the existing review workflows.
    • reliability, agent-native
  • .github/workflows/ui-preview-smoke.yml:56-65 -- wait-for-vercel-preview receives no sha: input; on workflow_dispatch github.event.pull_request is null, so the action falls back to the default-branch HEAD and either times out or returns the wrong preview URL — the inline comment acknowledges the gap and core.setOutput('head_sha', ...) is set but never consumed.
    • Fix: Pass sha: ${{ steps.pr.outputs.head_sha }} to wait-for-vercel-preview, drop the now-explanatory comment, and remove the dead output if it remains unused.
    • correctness, reliability, maintainability
  • .github/workflows/ui-preview-smoke.yml:151 -- Bash(gh pr view:*) accepts any args, so a prompt-injected agent can run gh pr view <other-PR> --json body,comments,reviews against private security PRs and embed the result in its public comment.
    • Fix: Drop Bash(gh pr view:*) entirely (the body is already on disk via cat /tmp/pr-body.md); if PR metadata is genuinely needed, pre-stage it to a JSON file in the github-script step and only allow Bash(cat /tmp/pr-meta.json).
    • security, adversarial
  • .github/workflows/ui-preview-smoke.yml:26,61,73-74,146 -- timeout-minutes: 15 minus wait-for-vercel-preview max_timeout: 600 minus ~120s for setup-node + playwright install --with-deps chromium leaves ~3 min for an agent the prompt budgets at 8 min; slow Vercel builds get hard-killed before any comment posts.
    • Fix: Either raise timeout-minutes to 25 or drop max_timeout to 300, and make the budget explicit in a comment so future changes don't reintroduce the gap.
    • reliability, correctness
  • .github/workflows/ui-preview-smoke.yml:108-128 -- The parser depends on four exact magic strings (### How to test on Vercel preview, **Preview routes:**, **Steps:**, <!-- ui-preview-smoke -->) shared with pull_request_template.md with no single source of truth or tests; a template rename silently turns every PR into "no plan, skip."
    • Fix: Extract the section deterministically in the github-script step (regex on the heading, strip HTML comments and code fences) and pass the parsed {routes, steps} to the agent as structured input rather than letting the LLM parse markdown.
    • maintainability, correctness
  • .github/workflows/ui-preview-smoke.yml:76-152 -- If claude-code-action fails (Anthropic 5xx, schema validation failure, OOM, cancellation), the job goes red with no PR comment; authors see no signal and no run-link to investigate.
    • Fix: Add a final if: failure() step that posts a fallback comment via gh pr comment linking ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}.
  • .github/workflows/ui-preview-smoke.yml:56-97 -- An empty steps.vercel.outputs.url (action timeout, deployment skipped, status check missing) interpolates into the prompt as Preview URL: and the agent navigates to bare paths like /chart, returning misleading agent commentary instead of an infrastructure error.
    • Fix: After the wait step, fail fast: if [[ -z "${{ steps.vercel.outputs.url }}" ]]; then echo "::error::Vercel preview URL is empty"; exit 1; fi, optionally with curl --fail --silent --head to also detect 5xx previews.
  • .github/workflows/ui-preview-smoke.yml:125-147 -- The agent decides ✅/❌ from its own observations with no out-of-band verifier; under model variance a 200-with-broken-DOM page becomes "all routes passed", and the <!-- ui-preview-smoke --> sticky framing makes the comment look like authoritative CI status.
    • Fix: Capture HAR + console logs + screenshots in a deterministic Playwright step, attach as artifacts, and reduce the agent's role to summarizing those artifacts; phrase the comment explicitly as "agent-observed, advisory" and never use ✅ without a deterministic check.
  • .github/workflows/ui-preview-smoke.yml:97 -- ${{ steps.vercel.outputs.url }} is interpolated into the agent prompt with no origin validation; a buggy or compromised wait-for-vercel-preview can redirect navigation to an arbitrary host.
    • Fix: Validate the URL against ^https://[a-z0-9-]+\.vercel\.app(/|$) before passing to the prompt and fail-closed otherwise.

🟡 P2 -- recommended

  • .github/workflows/ui-preview-smoke.yml:76-152 -- The PR description claims <!-- ui-preview-smoke --> makes "subsequent runs replace" the comment, but no find-and-update logic exists in the workflow and the agent has no comment-write tool; each sync push will accumulate a fresh comment.
  • .github/workflows/ui-preview-smoke.yml:87 -- @playwright/mcp@latest resolves to whatever is current at run time; an upstream breaking change silently breaks the entire mcp__playwright__* allowlist with no diff to attribute the failure to.
    • Fix: Pin to an exact version (e.g. @playwright/mcp@0.0.28) and update deliberately via Dependabot.
    • reliability, maintainability
  • .github/workflows/ui-preview-smoke.yml:43-50 -- Number() of a non-integer or empty inputs.pr_number produces NaN, which pulls.get rejects with an opaque 422/404 mid-workflow rather than a clear "invalid PR number" message.
    • Fix: Validate before calling: if (!Number.isInteger(prNumber) || prNumber <= 0) { core.setFailed('Invalid pr_number: must be a positive integer'); return; }.
    • reliability, correctness
  • .github/workflows/ui-preview-smoke.yml:93-148 -- A 56-line freeform LLM prompt is embedded in YAML where it has no syntax highlighting, no diff-by-intent, and is hard to iterate on without unrelated workflow churn.
    • Fix: Move the prompt body to .github/agent-prompts/ui-smoke.md, read it with a cat/run step, and reference the captured output in prompt:.
  • .github/workflows/ui-preview-smoke.yml:30 -- id-token: write is declared but no step exchanges an OIDC token; this needlessly enables any compromised step (or jailbroken agent) to mint a JWT against any cloud trust policy that accepts this repo's workflows.
    • Fix: Remove id-token: write until a concrete OIDC consumer is added.
  • .github/workflows/ui-preview-smoke.yml:149-150 -- --setting-sources user is a no-op on a fresh GitHub-hosted runner today but means any future runner-image change that populates ~/.claude/ will silently alter agent behavior in CI.
    • Fix: Drop the flag, or replace with --setting-sources project if project-level config is intended.
  • .github/workflows/ui-preview-smoke.yml:71-74 -- npm install -g playwright followed by playwright install --with-deps chromium adds 60-120s of CI time, but the agent uses npx -y @playwright/mcp@latest which manages its own browser; the global install is almost certainly dead weight.
    • Fix: Delete the "Install Playwright + Chromium" step; if the MCP server actually requires pre-staged binaries, document why and pin both versions.
    • reliability, maintainability
  • .github/workflows/ui-preview-smoke.yml:17-18 -- cancel-in-progress: true between the agent step and the (currently missing) comment post means a fast push on a long-running PR can leave a previous run's stale comment with no replacement.
    • Fix: Combine with the upsert step above and run the comment-update inside if: always() so a cancelled run still removes its in-flight signal.
  • .github/workflows/ui-preview-smoke.yml:3-18 -- Concurrency is per-PR but global; N concurrent UI PRs (Dependabot lockfile updates, simultaneous review-ready conversions) launch N concurrent ~15-minute Anthropic-credit-burning jobs with no global cap.
    • Fix: Add a global queue with a small max-parallel (or guard with a ui-smoke label that maintainers apply) and add a billing alert on Anthropic spend.
🔵 P3 nitpicks (10)
  • .github/workflows/ui-preview-smoke.yml:54 -- core.setOutput('head_sha', ...) is set but never read.
    • Fix: Either consume it (see P1 Update README.md spelling #6) or remove the output and the misleading comment beneath it.
    • correctness, maintainability
  • .github/workflows/ui-preview-smoke.yml:108-127 -- The parser is sensitive to heading-level drift, missing bold markers, and reformatted route lines; any author edit silently classifies the PR as "no plan."
    • Fix: Use a tolerant regex (case-insensitive heading match, accept Preview routes with or without bold) or document the contract loudly inside the template's HTML comment.
    • correctness, agent-native
  • .github/workflows/ui-preview-smoke.yml:22-24 -- The multi-line if: breaks == across lines and reads as two separate conditions to a hasty reader.
    • Fix: Use a single-line expression or a YAML block scalar (>-) with one clause per line and explicit parentheses.
  • .github/workflows/ui-preview-smoke.yml:56-65 -- A failed Vercel build that still exposes a 500 page is reported by the agent as a PR-introduced regression with a ❌ comment, falsely indicting the author.
    • Fix: Preflight curl --fail --silent --head '${{ steps.vercel.outputs.url }}' and surface 5xx as a distinct workflow error.
  • .github/workflows/ui-preview-smoke.yml:76-92 -- A transient Anthropic 503 after a 9-minute Vercel wait wastes the entire run with no retry.
    • Fix: Wrap the agent invocation in a small step-level retry, or rely on the new failure-fallback comment to make transient failures actionable.
  • .github/workflows/ui-preview-smoke.yml:46-54 -- A workflow_dispatch against a closed/merged PR runs the agent and posts on a closed PR with no state check.
    • Fix: Early-return when pr.state !== 'open': core.setFailed('PR is not open'); return;.
  • .github/pull_request_template.md:38-40 -- The template says leave blank or write "N/A — non-UI change" (em-dash) while the prompt checks for "N/A" and "non-UI change" as separate tokens; lowercase or "Not applicable" silently flow into the execute path.
    • Fix: Define a single canonical token (e.g. smoke: skip) used verbatim in both files.
  • .github/pull_request_template.md:38-44 -- The assertion verb list (Verify/Confirm/Assert/Check/Ensure) lives only in the workflow prompt; authors writing "Make sure" or "Validate" bypass assertion handling.
    • Fix: Reproduce the verb list in the template's HTML comment.
  • .github/workflows/ui-preview-smoke.yml:117-122 -- The skip-comment text in the prompt prefixes <!-- ui-preview-smoke --> with > (markdown blockquote); a future find-comment step searching the raw body for the bare marker may not match.
    • Fix: Emit the marker on its own line without the blockquote prefix.
  • .github/workflows/ui-preview-smoke.yml:31 -- actions: read is granted but no step reads workflow run logs; if the GITHUB_TOKEN is exfiltrated it lets the attacker fetch any prior run's logs (where masked secrets sometimes leak).
    • Fix: Drop actions: read.

Reviewers (9): correctness, security, adversarial, reliability, maintainability, project-standards, testing, agent-native, learnings.

Testing gaps:

  • No fixture-based test for the PR-template ↔ workflow-parser contract (e.g. .github/scripts/__tests__/pr-body-parsing.test.js mirroring pr-triage-classify.test.js) covering well-formed, blank, N/A, placeholder-only, and malformed-heading inputs.
  • No deterministic skip-condition guard before the expensive Vercel wait + Playwright install + agent steps.
  • No actionlint (or equivalent) validation of .github/workflows/** in CI; expression-injection patterns and dead-output regressions would be caught at PR time.
  • No prompt-injection canary fixture (a deliberately adversarial PR body) exercised in a sandbox to confirm the agent honors its constraints rather than the body's instructions.
  • No assertion that required Vercel env vars (NEXT_PUBLIC_IS_LOCAL_MODE, etc.) are present on the Preview environment — currently a manual checklist item in the PR test plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-1 Trivial — auto-merge candidate once CI passes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant