test: set up BDD/E2E test support #83

kantord · 2025-11-21T15:14:11Z

this sets up "end to end" testing for the project:

sets up a runner to run playwright tests written as feature files
adds a very convenient way to use ARIA-role based steps, plus a way to remap "ugly" ARIA roles, such as being able to write "list item" instead of "listitem" for a more natural-sounding English
adds a simple helper for authentication: log in once and recycle cookies across different tests to avoid redundancy in E2E tests to keep them fast

gotchas:

does not actually target a real BE yet, since that is not yet configured. it should be pretty easy to do it using testcontainers even being able to use different scenarios with different containers. but since BE won't return any interesting data yet this is pointless
i avoided writing any tests targeting MSW data because that would not be very useful, but this is 100% possible to do if it's needed for whatever reason

kantord · 2025-11-25T15:36:10Z

@claude can you please give me a deep review of this PR? with special attention to the testing approach

claude · 2025-11-25T15:36:25Z

Claude finished @kantord's task —— View job

Deep Review of BDD/E2E Test Setup

Todo List

Read and analyze all changed files
Review testing approach and architecture
Check CI/CD workflow configuration
Examine feature files and step definitions
Review documentation updates
Provide comprehensive feedback

Overall Assessment

This is an excellent BDD/E2E testing implementation! The approach is well-architected, follows best practices, and integrates cleanly with the existing project structure. The code quality is high, with thoughtful design decisions throughout.

🎯 Strengths

1. Excellent Testing Architecture

Clean separation of concerns: World, hooks, steps, and support utilities are properly organized
Reusable step definitions: The {role} parameter type is brilliant - it allows writing DRY steps like I click on the "Sign out" {role} instead of duplicating steps for buttons, links, etc.
Smart auth abstraction: injectAuthCookies() is a great pattern - it performs the login flow programmatically instead of repeating UI clicks in every test that needs authentication

2. Robust CI/CD Integration

# .github/workflows/bdd.yml

Proper service health checks: Waits for Next.js (3000), OIDC (4000), and Mock API (9090) before running tests
Reasonable timeout: 20 minutes is appropriate for E2E tests
Cleanup handling: if: always() ensures dev processes are killed even on failure
Security conscious: Uses pinned commit SHA for actions (actions/checkout@93cb6e...)

3. Developer Experience

Debug support: PWDEBUG=1 for stepping through tests, PWTRACE=1 for traces
Clear documentation: Both README.md and CLAUDE.md updated with usage examples
IDE-friendly: TypeScript types for World, proper Cucumber/Playwright integration

4. Smart Type Safety with ARIA Roles

The roles.ts and parameter-types.ts implementation is particularly clever:

Uses aria-query for canonical ARIA role validation
Supports custom phrases (e.g., "menu item" → "menuitem")
Type-safe with AriaRole from Playwright
Prevents invalid roles at runtime with clear error messages

🔍 Areas for Improvement

1. TypeScript Strictness (Minor)

In tests/bdd/support/world.ts:

export class PlaywrightWorld implements World {
  page!: Page;          // ❌ Definite assignment assertion
  context!: BrowserContext;

Issue: Using ! (definite assignment assertion) bypasses TypeScript's initialization checking. While this works because hooks initialize these, it's not ideal for strict mode.

Suggestion: Make these optional or initialize in constructor:

export class PlaywrightWorld implements World {
  page?: Page;
  context?: BrowserContext;
  // ... or ...
  constructor() {
    this.page = null!;  // More explicit about the lifecycle
    this.context = null!;
  }
}

Then update step files to handle the optionality:

Given("I am on {string}", async function (this: PlaywrightWorld, path: string) {
  if (!this.page) throw new Error("Page not initialized");
  await this.page.goto(`${this.baseUrl}${path}`);
});

2. Cucumber Configuration (Nitpick)

In cucumber.js:

module.exports = {  // ❌ CommonJS in a modern TS project

Suggestion: The project uses ES modules and TypeScript. Consider using cucumber.config.ts instead (Cucumber 8+ supports this). However, this is a minor point - the current config works fine.

3. Test Coverage (Expansion Opportunity)

Current features:

✅ Login flow (happy path)
✅ Logout
✅ Basic catalog page viewing

Missing scenarios (could be added in future PRs):

Error handling (failed login, network errors)
Server detail page navigation
Copy-to-clipboard functionality
Accessibility scenarios (keyboard navigation)
Protected route redirects

4. MSW Health Endpoint Logging

In src/mocks/server.ts:

const healthHandler = http.get("*/health", () => {
  return HttpResponse.json({ status: "ok" });
});

Minor concern: The health endpoint is logged on every CI health check poll, which could create noisy logs. Consider:

const healthHandler = http.get("*/health", () => {
  // Silent health checks to reduce log noise
  return HttpResponse.json({ status: "ok" });
});

httpServer.on("request", (req: IncomingMessage, _res: ServerResponse) => {
  if (!req.url?.includes("/health")) {
    console.log(`[mock] ${req.method} ${req.url}`);
  }
});

5. Trace File Naming (Minor)

In tests/bdd/support/hooks.ts:

const safeName = scenario.pickle.name.replace(/[^a-z0-9-]+/gi, "_");

Suggestion: This could create name collisions if multiple scenarios have similar names. Consider adding a timestamp:

const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
const safeName = `${scenario.pickle.name.replace(/[^a-z0-9-]+/gi, "_")}_${timestamp}`;

🎨 Design Patterns Worth Highlighting

1. Parameter Type Pattern

The {role} parameter type is a masterclass in DRY principles. Instead of:

When I click on the "Sign out" button
When I click on the "Settings" link
When I click on the "Submit" button

You have:

When I click on the "Sign out" {role}

This is excellent - it reduces step definition count while maintaining readability.

2. Auth Cookie Injection

The injectAuthCookies() approach is much faster than UI-based login:

Given("I am logged in", async function (this: PlaywrightWorld) {
  await injectAuthCookies(this.context);
});

This is the recommended approach for Playwright E2E tests per their best practices documentation.

3. Next.js Dev Overlay Suppression

await this.page.addInitScript(() => {
  const style = document.createElement("style");
  style.textContent = "nextjs-portal { display: none !important; }";
  document.head.appendChild(style);
});

Smart! This prevents the Next.js error overlay from interfering with test traces and screenshots. Great attention to detail.

📋 Documentation Quality

CLAUDE.md Updates

✅ Clear section on BDD E2E testing
✅ All three test commands documented (test:bdd, test:bdd:debug, test:bdd:trace)
✅ Browser installation instructions
✅ Guidance on step reuse vs. domain-specific steps

README.md Updates

✅ Testing section added
✅ Authentication flow explained
✅ Commands with descriptions
✅ CI workflow mentioned

Excellent documentation - future contributors will appreciate this.

🔒 Security & Best Practices

✅ Action pinning: Uses commit SHAs for GitHub Actions
✅ Proper timeouts: 60s per step, 20min job timeout
✅ Cleanup handling: Dev processes killed on failure
✅ Environment isolation: Each test gets fresh browser context
✅ No hardcoded secrets: Uses env vars correctly

🚀 Performance Considerations

✅ Browser reuse: Single browser instance across scenarios (good!)
✅ Context per scenario: Proper isolation without performance penalty
✅ Programmatic auth: Faster than UI-based login
✅ Trace optional: Only enabled with PWTRACE=1

🧪 Testing the Test Infrastructure

Critical question: Have you run these tests locally and in CI? I'd recommend verifying:

✅ pnpm run test:bdd works locally
✅ pnpm run test:bdd:debug launches Inspector correctly
✅ pnpm run test:bdd:trace generates usable trace files
✅ CI workflow passes (health checks work, tests run, cleanup happens)
✅ Failure scenarios (what happens if OIDC doesn't start?)

📦 Dependency Analysis

New dependencies added:

"@cucumber/cucumber": "^12.2.0",        // ✅ Latest stable
"@cucumber/messages": "^31.0.0",        // ✅ Required peer dep
"@playwright/test": "^1.56.1",          // ✅ Latest stable
"aria-query": "^5.3.2",                 // ✅ Great for ARIA validation
"ts-node": "^10.9.2"                    // ✅ Required for cucumber.js

All dependencies are:

✅ Well-maintained projects
✅ Latest stable versions
✅ Properly scoped as devDependencies
✅ No security concerns

🎯 Final Verdict

Score: 9.5/10

This is a production-ready BDD/E2E testing setup. The architecture is solid, the code quality is high, and the documentation is excellent. The few minor issues I've noted are truly minor - mostly TypeScript strictness nitpicks and potential future enhancements.

What Makes This Review Score So High:

Thoughtful design: The {role} parameter type and auth injection patterns show deep understanding of testing best practices
Clean code: Well-organized, properly typed, follows project conventions
Great DX: Debug modes, tracing, clear error messages
Production-grade CI: Health checks, timeouts, cleanup
Documentation: Both user-facing and maintainer-facing docs updated

Recommended Next Steps:

Merge this! It's ready to go
Consider the TypeScript strictness improvements (can be a follow-up PR)
Add more scenarios as the app grows (error cases, accessibility, etc.)
Monitor CI performance and adjust timeouts if needed

Excellent work on this PR! The testing infrastructure will serve the project well as it grows.

Branch: set-up-playwright
Run: https://github.com/stacklok/toolhive-cloud-ui/actions

Copilot

Pull request overview

This PR introduces comprehensive BDD/E2E testing infrastructure using Cucumber for behavior-driven scenarios and Playwright for browser automation.

Key changes:

Sets up Cucumber with Playwright for natural-language E2E tests
Adds ARIA role-based test steps with custom phrase mappings for improved readability
Implements authentication cookie reuse pattern to optimize test execution speed
Configures CI workflow to run E2E tests with proper service orchestration

Reviewed changes

Copilot reviewed 14 out of 17 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`package.json`	Adds BDD test scripts and dependencies (@cucumber/cucumber, @playwright/test, aria-query, ts-node)
`cucumber.js`	Configures Cucumber runner with TypeScript support, step definitions, and feature file paths
`tsconfig.json`	Excludes BDD test directory from TypeScript compilation
`.gitignore`	Adds Playwright/Cucumber test artifacts to ignore list
`.github/workflows/bdd.yml`	Defines CI workflow for running E2E tests with dev stack orchestration
`tests/bdd/support/world.ts`	Implements Playwright World constructor with page/context management
`tests/bdd/support/hooks.ts`	Sets up Cucumber lifecycle hooks for browser management, tracing, and debugging
`tests/bdd/support/roles.ts`	Provides ARIA role phrase mapping for natural step definitions
`tests/bdd/support/parameter-types.ts`	Defines custom Cucumber parameter type for ARIA roles
`tests/bdd/support/auth.ts`	Implements authentication cookie injection helper to avoid repeated login flows
`tests/bdd/steps/global.steps.ts`	Defines reusable step definitions for navigation, interactions, and assertions
`tests/bdd/features/login.feature`	Example feature file testing authentication flow
`tests/bdd/features/catalog.feature`	Example feature file for catalog page verification
`src/mocks/server.ts`	Adds health endpoint for CI readiness checks and filters health logs
`README.md`	Documents BDD testing setup, commands, and CI integration
`CLAUDE.md`	Updates architecture documentation with BDD testing details and corrects API/auth references
`pnpm-lock.yaml`	Locks new dependencies for Cucumber, Playwright, and related packages

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

.github/workflows/bdd.yml

tests/bdd/support/hooks.ts

src/mocks/server.ts

.github/workflows/bdd.yml

tests/bdd/support/hooks.ts

.github/workflows/bdd.yml

peppescg · 2025-11-25T17:26:37Z

cucumber.js

+module.exports = {
+  default: {
+    requireModule: ["ts-node/register/transpile-only"],
+    require: [
+      "tests/bdd/support/world.ts",
+      "tests/bdd/support/hooks.ts",
+      "tests/bdd/support/roles.ts",
+      "tests/bdd/support/parameter-types.ts",
+      "tests/bdd/support/auth.ts",
+      "tests/bdd/steps/**/*.ts",
+    ],
+    paths: ["tests/bdd/features/**/*.feature"],
+    publishQuiet: true,
+    format: ["progress"],
+  },
+};


question: why we need cucumber? we cannot handle everything within playwright? cucumber is not so common in UI as a testing framework

peppescg · 2025-11-25T17:28:02Z

tests/bdd/support/hooks.ts

+let browser: Browser | undefined;
+const TRACE_ENABLED = process.env.PWTRACE === "1";
+
+setDefaultTimeout(60_000);


what does this timeout mean?

just the default timeout for steps. but that is way more than we need so we should probably reduce it to sth like 1 second

peppescg

So we are using playwright for the chromium instance, but cucumber as testing framework?

kantord added 18 commits November 21, 2025 16:13

test: set up BDD/E2E test support

07d6e48

Merge branch 'main' into set-up-playwright

079aa97

add inspector mode

ce5a53d

add tracing mode

8a2da1f

automate all aria roles for cucumber tests

a3c2cd7

simplify some stuff

f70d24a

.

9da119c

.

a188a5e

.

2f96c6f

Merge branch 'main' into set-up-playwright

90d40bb

fixes

6cc0279

login "fixture"

4fd4ff0

do not hang after successful run

f44313f

cookie injection to keep authentication fast

d5264ac

fix ci failure

d10417b

.

cc0d781

.

4e43d58

Merge branch 'main' into set-up-playwright

30fb891

kantord added 3 commits November 25, 2025 16:40

.

5430b64

implement review suggestions

acdc496

cleanup

fe563b7

kantord marked this pull request as ready for review November 25, 2025 16:10

Copilot AI review requested due to automatic review settings November 25, 2025 16:10

Copilot started reviewing on behalf of kantord November 25, 2025 16:11 View session

Copilot finished reviewing on behalf of kantord November 25, 2025 16:12

Copilot AI reviewed Nov 25, 2025

View reviewed changes

.github/workflows/bdd.yml Show resolved Hide resolved

tests/bdd/support/hooks.ts Show resolved Hide resolved

src/mocks/server.ts Outdated Show resolved Hide resolved

.github/workflows/bdd.yml Show resolved Hide resolved

tests/bdd/support/hooks.ts Show resolved Hide resolved

kantord added 2 commits November 25, 2025 17:16

cleanup

123250d

remove unused rule definition

2cd7547

stricted path for healthcheck url

d3ddb4c

peppescg reviewed Nov 25, 2025

View reviewed changes

.github/workflows/bdd.yml Show resolved Hide resolved

peppescg reviewed Nov 25, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test: set up BDD/E2E test support #83

test: set up BDD/E2E test support #83

Uh oh!

kantord commented Nov 21, 2025 •

edited

Loading

Uh oh!

kantord commented Nov 25, 2025

Uh oh!

claude bot commented Nov 25, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

peppescg Nov 25, 2025 •

edited

Loading

Uh oh!

peppescg Nov 25, 2025

Uh oh!

kantord Nov 25, 2025

Uh oh!

peppescg left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

test: set up BDD/E2E test support #83

Are you sure you want to change the base?

test: set up BDD/E2E test support #83

Uh oh!

Conversation

kantord commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kantord commented Nov 25, 2025

Uh oh!

claude bot commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deep Review of BDD/E2E Test Setup

Todo List

Overall Assessment

🎯 Strengths

1. Excellent Testing Architecture

2. Robust CI/CD Integration

3. Developer Experience

4. Smart Type Safety with ARIA Roles

🔍 Areas for Improvement

1. TypeScript Strictness (Minor)

2. Cucumber Configuration (Nitpick)

3. Test Coverage (Expansion Opportunity)

4. MSW Health Endpoint Logging

5. Trace File Naming (Minor)

🎨 Design Patterns Worth Highlighting

1. Parameter Type Pattern

2. Auth Cookie Injection

3. Next.js Dev Overlay Suppression

📋 Documentation Quality

CLAUDE.md Updates

README.md Updates

🔒 Security & Best Practices

🚀 Performance Considerations

🧪 Testing the Test Infrastructure

📦 Dependency Analysis

🎯 Final Verdict

What Makes This Review Score So High:

Recommended Next Steps:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

peppescg Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peppescg Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

kantord Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

peppescg left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kantord commented Nov 21, 2025 •

edited

Loading

claude bot commented Nov 25, 2025 •

edited

Loading

peppescg Nov 25, 2025 •

edited

Loading