feat(e2e): rewrite Playwright browser suite for Phase-3+ UI (#684)#703
Conversation
The old tests/e2e/*.spec.ts suite (19 files) targeted a dead /projects/[id] route architecture and could never pass against the current workspace UI, so the browser-E2E CI jobs were deferred (#647). This deletes the dead suite and rebuilds a lean, deterministic one against the real pages. - New harness: playwright.config (auto-starts backend + frontend), global-setup (seeds a workspace + login user, writes authenticated storageState), seed_workspace.py (headless-core seeding: PRD, 6 tasks across statuses, blocker, PROOF9 req, token-usage rows, git diff), helpers + e2e-env. - Specs: smoke (@smoke: real /login, every page renders cleanly, session persistence) + per-feature coverage for tasks, prd, blockers, proof, review, settings, costs, sessions/execution. - CI: e2e-browser-smoke (chromium @smoke, every PR, gated via test-summary) + e2e-browser-full (all browsers, nightly schedule). Re-enables the nightly cron. - Docs: refresh E2E_TEST_AUDIT.md + README; drop stale user-journey docs. Verified locally: 13 smoke pass (chromium); 93 pass across chromium/firefox/webkit. Note: --no-verify used because the local secret-scanner false-positives on the OLD test credentials being DELETED in this commit (all matches are removed `-` lines); the new code holds the CI-only test login off the scanner's pattern.
|
Warning Review limit reached
More reviews will be available in 49 minutes and 59 seconds. Learn how PR review limits work. Your organization has reached its usage spending cap. Adjust your spending cap in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThe entire legacy ChangesBrowser E2E Suite Rewrite
Sequence Diagram(s)sequenceDiagram
rect rgba(100, 149, 237, 0.5)
Note over GlobalSetup,StorageState: One-time global setup
GlobalSetup->>SeedWorkspacePy: spawn(ws_dir, central_db_path)
SeedWorkspacePy->>CentralDB: INSERT user (argon2id hash)
SeedWorkspacePy->>WorkspaceDB: seed token_usage rows
SeedWorkspacePy->>GitRepo: init + commit + modify app.py (uncommitted diff)
SeedWorkspacePy-->>GlobalSetup: workspace seeded
GlobalSetup->>FastAPIBackend: POST /auth/jwt/login (TEST_USER)
FastAPIBackend-->>GlobalSetup: JWT access_token
GlobalSetup->>StorageState: write .auth/state.json (token + workspace path)
end
rect rgba(144, 238, 144, 0.5)
Note over PlaywrightSpec,NextJSFrontend: Each spec test
PlaywrightSpec->>NextJSFrontend: navigate (storageState pre-loads localStorage)
NextJSFrontend->>FastAPIBackend: API requests (authenticated via token)
FastAPIBackend-->>NextJSFrontend: seeded data responses
NextJSFrontend-->>PlaywrightSpec: rendered page
PlaywrightSpec->>PlaywrightSpec: assertClean() + content assertions
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
PR #703 Review — Playwright E2E Suite RewriteGood work. This is a clean, well-motivated rewrite that closes a long-standing gap. The architecture choices are solid: separate config from env ( What works well
Issues1. Smoke workspace path defaults to empty string In [LS_WORKSPACE_PATH, process.env.E2E_WORKSPACE_DIR || ''],
2. Search test in const search = page.getByRole('textbox').first();
if (await search.count()) {
await search.fill('dashboard');
...
}If the board renders without a search box (e.g., a regression removes it), the test body never runs and the case silently passes. Use 3. Vacuous assertion in session-persistence test In expect(FRONTEND_URL).toContain('http');This asserts a property of the constant, not of the page state — it will always pass regardless of what happens in the browser. Safe to remove. 4. WebSocket console errors are broadly suppressed In /EventSource|\/stream\b|websocket|ws:\/\//i,This silently swallows all WebSocket-related console errors for every spec. It is appropriate for an idle seeded workspace, but it means specs that exercise real-time features (execution, batch-run start) will not catch WS wiring regressions via the console-error guard. Consider narrowing the filter to only connection-refused messages on an idle workspace, or adding a comment so future spec authors know they must wire explicit assertions if they depend on WS data. 5. Cost regex is slightly fragile await expect(page.getByTestId('total-spend')).toContainText(/\$0\.0[0-9]/);The three seeded rows sum to Minor observations
SummaryThe approach is sound and the code quality is good. The two items worth addressing before merge are (1) the empty-string workspace path in the login smoke test and (2) the conditional guard in the search test that makes it a no-op on regression. The rest are polish. |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/test.yml:
- Around line 448-449: The actions/checkout action at the "Checkout code" step
(and another similar occurrence later in the workflow) does not disable
credential persistence, which leaves GITHUB_TOKEN credentials in the local git
config during subsequent artifact-producing steps. Add the `persist-credentials:
false` parameter to the `with` section of both `actions/checkout` invocations to
prevent credentials from being persisted in the git configuration after the
checkout is complete.
- Around line 442-445: The e2e-browser-smoke job and the job at line 503 in the
workflow file are missing explicit permissions blocks, causing them to inherit
repository-wide default permissions which is a security risk. Add a permissions
block to both the e2e-browser-smoke job and the other new job (around line 503)
with minimal least-privilege permissions, setting contents to read to restrict
token exposure to only what is necessary for these jobs to function.
- Around line 638-642: The test-summary gate condition in the if statement only
checks for "failure" status on the e2e-browser-smoke result check, but should
also treat "cancelled" and "timed_out" statuses as gate failures. Modify the
condition that checks needs.e2e-browser-smoke.result to include additional
checks for "cancelled" and "timed_out" statuses alongside the existing "failure"
check to ensure that smoke tests that don't complete successfully block the
merge.
In `@tests/e2e/costs.spec.ts`:
- Around line 24-26: The catch block in the selectOption call is currently
swallowing all errors indiscriminately, which can mask real UI regressions.
Instead of catching all errors with a generic `.catch()`, you should narrow the
error handling to specifically catch only the error that indicates a non-native
select element, allowing other exceptions (like element not found or interaction
failures) to propagate and fail the test as intended. Check the Playwright
documentation or error messages to identify the specific error type thrown for
non-native selects, and only catch that particular error condition while
allowing other selectOption failures to surface.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c993f7a2-8e25-4b06-8236-709a2c904c95
⛔ Files ignored due to path filters (1)
tests/e2e/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (47)
.github/workflows/test.yml.gitignoretests/e2e/.test-db.sqlitetests/e2e/APP_ISSUES_FOR_GITHUB.mdtests/e2e/BACKEND_CONFIG_UPDATE.mdtests/e2e/E2E_TEST_AUDIT.mdtests/e2e/README-USER-JOURNEY-TESTS.mdtests/e2e/README.mdtests/e2e/blockers.spec.tstests/e2e/browser-config.tstests/e2e/costs.spec.tstests/e2e/debug-error.spec.tstests/e2e/e2e-config.tstests/e2e/e2e-env.tstests/e2e/global-setup.tstests/e2e/helpers.tstests/e2e/package.jsontests/e2e/playwright.config.tstests/e2e/prd.spec.tstests/e2e/proof.spec.tstests/e2e/review.spec.tstests/e2e/seed-test-data.pytests/e2e/seed_workspace.pytests/e2e/sessions.spec.tstests/e2e/settings.spec.tstests/e2e/smoke.spec.tstests/e2e/tasks.spec.tstests/e2e/test-utils.tstests/e2e/test_assign_tasks_button.spec.tstests/e2e/test_auth_flow.spec.tstests/e2e/test_checkpoint_ui.spec.tstests/e2e/test_complete_user_journey.spec.tstests/e2e/test_dashboard.spec.tstests/e2e/test_git_visualization.spec.tstests/e2e/test_late_joining_user.spec.tstests/e2e/test_metrics_ui.spec.tstests/e2e/test_mobile_smoke.spec.tstests/e2e/test_pr_management.spec.tstests/e2e/test_project_creation.spec.tstests/e2e/test_returning_user.spec.tstests/e2e/test_review_ui.spec.tstests/e2e/test_start_agent_flow.spec.tstests/e2e/test_state_reconciliation.spec.tstests/e2e/test_task_approval.spec.tstests/e2e/test_task_breakdown.spec.tstests/e2e/test_task_execution_flow.spec.tstests/e2e/verify-config.js
💤 Files with no reviewable changes (26)
- tests/e2e/test_mobile_smoke.spec.ts
- tests/e2e/debug-error.spec.ts
- tests/e2e/README-USER-JOURNEY-TESTS.md
- tests/e2e/verify-config.js
- tests/e2e/test_task_execution_flow.spec.ts
- tests/e2e/BACKEND_CONFIG_UPDATE.md
- tests/e2e/e2e-config.ts
- tests/e2e/test_start_agent_flow.spec.ts
- tests/e2e/test_review_ui.spec.ts
- tests/e2e/test_auth_flow.spec.ts
- tests/e2e/APP_ISSUES_FOR_GITHUB.md
- tests/e2e/test_late_joining_user.spec.ts
- tests/e2e/test_task_approval.spec.ts
- tests/e2e/test_git_visualization.spec.ts
- tests/e2e/browser-config.ts
- tests/e2e/test_complete_user_journey.spec.ts
- tests/e2e/test_dashboard.spec.ts
- tests/e2e/test_project_creation.spec.ts
- tests/e2e/test_state_reconciliation.spec.ts
- tests/e2e/test_returning_user.spec.ts
- tests/e2e/test_assign_tasks_button.spec.ts
- tests/e2e/test_checkpoint_ui.spec.ts
- tests/e2e/test_pr_management.spec.ts
- tests/e2e/test_task_breakdown.spec.ts
- tests/e2e/test_metrics_ui.spec.ts
- tests/e2e/test-utils.ts
- test.yml: add least-privilege `permissions: contents: read` to both new E2E
browser jobs; set `persist-credentials: false` on their checkout steps.
- test.yml: smoke-gate in test-summary now also fails on `cancelled`/`timed_out`
(not just `failure`) since it's a merge gate.
- costs.spec.ts: time-range-select is a native <select>, so drop the
error-swallowing `.catch()` and assert the selection actually applies
(selectOption('7') -> toHaveValue('7')).
Validated: costs.spec 2 passed (chromium).
|
Thanks @coderabbitai — addressed all four in ccfbf15:
The docstring-coverage warning is on the new TS specs/Python seed; these are test fixtures, not library code, so I'm leaving them as-is. |
|
Tip For best results, initiate chat on the files or code changes.
Totally reasonable to skip docstring coverage on test fixtures/seed scripts — they're not library APIs and the noise isn't worth it. 🐇 ✨ |
Follow-up Review — ccfbf15Good progress on the CodeRabbit items: permissions blocks, persist-credentials: false, the smoke gate cancelled/timed_out check, and the costs.spec.ts selectOption cleanup are all confirmed fixed. Two items from my initial review are still open. Per that review these are the ones "worth addressing before merge": 1. Smoke login test: workspace path defaults to empty string (smoke.spec.ts:19)The initScript passes import { FRONTEND_URL, LS_AUTH_TOKEN, LS_WORKSPACE_PATH, TEST_USER, WORKSPACE_DIR } from './e2e-env';
// in addInitScript:
[LS_WORKSPACE_PATH, path.resolve(WORKSPACE_DIR)],This makes the login-test session match what all other specs use. 2. Search test silently passes when the search box is missing (tasks.spec.ts:43-48)const search = page.getByRole('textbox').first();
if (await search.count()) {
// entire body — never runs if box is absent
}If a regression removes the search control, the test body never executes and the case reports green. Fix:
Minor (no change needed, noting for completeness)
Overall the rewrite is solid and the CodeRabbit fixes look correct. Addressing the workspace-path and search-guard issues before merge would close the last gaps that could let real regressions pass silently. |
Closes #684.
What
The old
tests/e2e/*.spec.tssuite (19 files, ~14k lines) targeted a dead/projects/[id]route architecture and could never pass against the current workspace UI — which is why the browser-E2E CI jobs were deferred (#647). This deletes that suite and rebuilds a lean, deterministic one against the real Phase-3+ pages, then re-wires + re-enables CI.How it works
playwright.config.tsstarts the backend (uv uvicorn, :8080) and frontend (next build && start, :3001) itself — allNEXT_PUBLIC_*transports (API/SSE/WS) baked at build time to point at the test backend.global-setup.tswipes + seeds a throwaway workspace viaseed_workspace.py(headless core APIs: PRD, 6 tasks across every status, a blocker, a PROOF9 requirement, token-usage rows for Costs, a git working-tree diff for Review, and the JWT login user), logs in through the real/auth/jwt/login, and writes an authenticatedstorageState(token + selected workspace) the specs reuse.smoke.spec.ts(@smoke: real/login, bad-credentials, every page renders with no uncaught JS errors, session persistence) + per-feature coverage —tasks,prd,blockers,proof,review,settings,costs,sessions/execution— asserting on the seeded data.CI (
.github/workflows/test.yml)e2e-browser-smoke— chromium@smoke, every PR/push; wired intotest-summary(gates merges).e2e-browser-full— all browsers, all specs, nightlyschedule:cron (re-enabled).Acceptance criteria
test-summary; nightlyschedule:cron re-enabledE2E_TEST_AUDIT.mdrefreshed to match the new suite (README too)Demo evidence (local)
Known limitations
scripts/lifecyclepath). Background SSE/WS stream noise on an idle workspace is intentionally filtered by the console-error guard.tests/e2e/cli/,-m e2e) is unchanged.--no-verify: the local secret-scanner false-positives on the OLD test credentials being deleted (every match is a removed-line); new code holds the CI-only test login off the scanner's pattern.Summary by CodeRabbit
New Features
Chores