Skip to content

revert: remove ui-preview-smoke agent workflow#2247

Merged
teeohhem merged 1 commit into
mainfrom
teeohhem/revert-ui-preview-smoke
May 8, 2026
Merged

revert: remove ui-preview-smoke agent workflow#2247
teeohhem merged 1 commit into
mainfrom
teeohhem/revert-ui-preview-smoke

Conversation

@teeohhem
Copy link
Copy Markdown
Contributor

@teeohhem teeohhem commented May 8, 2026

Summary

Backs out the agent-driven UI smoke workflow added in #2238 (and patched in #2240). Keeps the PR template's "How to test on Vercel preview" section — it's still useful guidance for human reviewers.

Why

The workflow was meant to give a low-friction "did the UI break?" signal on PRs by having an agent (claude-code-action + Playwright MCP) execute a PR-author-written test plan against the Vercel preview. In practice it overlapped heavily with the existing Playwright e2e suite without the durability:

  • Author still had to write the test plan. **Preview routes:** + numbered **Steps:** is most of the cognitive work of writing a real Playwright spec.
  • Result was ephemeral. An LLM-interpreted, non-deterministic per-run check produced no test artifact. A Playwright spec asserting the same behavior would run on every future PR for years and catch regressions in code paths the original author never thought about.
  • Per-run cost was ~$0.85 + a Vercel deploy + ~5 min of CI. Flaky by construction — LLM judgement varies run-to-run.
  • Maintenance was a tax, not an investment. PR ci(ui-preview-smoke): wire MCP via .mcp.json and post via workflow steps #2242 went through 5 review rounds resolving security and reliability findings (heredoc injection, mutable-tag pins, sed silent pass-through, plan-check throw paths, Vercel timeout fallbacks, JSON-array bypass, marker placement, leading-newline edge cases, dual-poster collision...). The maximalist resolution was +543 lines; the disciplined /ce-resolve-pr-feedback resolution was +79; the latest deep-review still surfaced new P1 findings (JS injection on Number('${{ inputs.pr_number }}'), third-party tag pinning, missing sha: input on workflow_dispatch).

The marginal value-add over a deterministic test was small enough that the maintenance ratchet exceeded it.

Better path forward

  • Lean into the existing Playwright stack (make e2e / make dev-e2e, tests/e2e/page-objects/). For PRs like fix(table-chart): wrap mode now breaks long URLs/IDs instead of overflowing into adjacent columns #2234 (table-chart text overflow), a Playwright spec asserting "wrap mode keeps long URLs within their column" would have caught the bug AND prevented future regressions across every chart PR.
  • For changes that genuinely benefit from preview-deploy testing, a deterministic screenshot-based workflow (e.g. Argos / Percy, or a ~50-line Playwright run against ${{ vercel.outputs.url }}) is cheaper and more durable than the LLM-driven version.

How to test on Vercel preview

N/A — CI workflow removal.

References

Backs out the agent-driven UI smoke workflow added in #2238 (and
patched in #2240). Keeps the PR template's "How to test on Vercel
preview" section — it's still useful guidance for human reviewers.

Why:

The workflow was supposed to give us a low-friction "did the UI
break?" signal on PRs by having an agent (claude-code-action +
Playwright MCP) execute a PR-author-written test plan against the
Vercel preview. In practice it overlapped heavily with the existing
Playwright e2e suite without the durability:

  - Author still had to write `**Preview routes:**` + numbered
    `**Steps:**` — most of the cognitive work of writing a real test.
  - Result was an LLM-interpreted, non-deterministic check producing
    no test artifact. A Playwright spec asserting the same behavior
    would run on every future PR for years.
  - Per-run cost was ~$0.85 of agent time + a Vercel deploy + ~5 min
    of CI; flaky by construction (LLM judgement varies run-to-run).
  - Maintenance cost was real and ongoing: PR #2242 went through
    5 review rounds resolving security and reliability findings,
    cumulatively producing 543 added lines (later trimmed to +79
    via /ce-resolve-pr-feedback), and the latest deep-review still
    surfaced new P1 findings (JS injection via Number('${{ inputs }}'),
    mutable-tag pinning of third-party actions, missing sha: input
    on workflow_dispatch).

Better path forward: lean into the existing Playwright stack
(`make e2e` / `make dev-e2e`). For changes that warrant
preview-deploy testing specifically, a deterministic screenshot-based
workflow (~50 lines) is a cheaper and more durable replacement than
the LLM-driven version.
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 8, 2026

⚠️ No Changeset found

Latest commit: 309d7b2

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
hyperdx-oss Ignored Ignored May 8, 2026 5:37pm

Request Review

@github-actions github-actions Bot added the review/tier-1 Trivial — auto-merge candidate once CI passes label May 8, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🟢 Tier 1 — Trivial

Docs, images, lock files, or a dependency bump. No functional code changes detected.

Why this tier:

  • All files are docs / images / lock files

Review process: Auto-merge once CI passes. No human review required.
SLA: Resolves automatically.

Stats
  • Production files changed: 0
  • Production lines changed: 0
  • Branch: teeohhem/revert-ui-preview-smoke
  • Author: teeohhem

To override this classification, remove the review/tier-1 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

PR Review

  • ⚠️ .github/pull_request_template.md lines 23-36 still reference the deleted agent and workflow path (.github/workflows/ui-preview-smoke.yml) — stale after this revert → Update the HTML comment to drop the agent-consumption language and reframe the section as human-reviewer guidance only (matches the PR body's stated intent of keeping the section).

Otherwise the revert is clean: single-file deletion, no other references to the workflow remain in the repo.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Deep Review

This PR is a single-file revert: it deletes .github/workflows/ui-preview-smoke.yml (163 lines). The commit message states the PR template section was deliberately kept as human guidance, and the deleted workflow is independent of the durable Playwright suite (packages/app/tests/e2e/, driven by .github/workflows/e2e-tests.yml), so no test artifacts are orphaned. One residual cleanup item below.

✅ No critical issues found.

🟡 P2 -- recommended

  • .github/pull_request_template.md:24 -- The retained "How to test on Vercel preview" section still attributes itself to a workflow this PR deletes (HTML comment says it is "consumed by the UI preview smoke-test agent (.github/workflows/ui-preview-smoke.yml)" and that "the agent executes these steps verbatim"), so every future PR will ship with a dangling reference even though the commit message states the section is intentionally kept for human reviewers.
    • Fix: Rewrite the HTML comment block (lines 23-36) to describe the section as manual Vercel-preview testing guidance — drop the "consumed by the … agent" wording, the .github/workflows/ui-preview-smoke.yml path, and the "executes these steps verbatim" sentence.
    • correctness, maintainability, testing, project-standards

Reviewers (4): correctness, maintainability, testing, project-standards.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

E2E Test Results

All tests passed • 166 passed • 3 skipped • 1196s

Status Count
✅ Passed 166
❌ Failed 0
⚠️ Flaky 3
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@teeohhem teeohhem merged commit 02171a3 into main May 8, 2026
19 checks passed
@teeohhem teeohhem deleted the teeohhem/revert-ui-preview-smoke branch May 8, 2026 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-1 Trivial — auto-merge candidate once CI passes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant