Skip to content

Conversation

@kantord
Copy link
Member

@kantord kantord commented Nov 21, 2025

this sets up "end to end" testing for the project:

  • sets up a runner to run playwright tests written as feature files
  • adds a very convenient way to use ARIA-role based steps, plus a way to remap "ugly" ARIA roles, such as being able to write "list item" instead of "listitem" for a more natural-sounding English
  • adds a simple helper for authentication: log in once and recycle cookies across different tests to avoid redundancy in E2E tests to keep them fast

gotchas:

  • does not actually target a real BE yet, since that is not yet configured. it should be pretty easy to do it using testcontainers even being able to use different scenarios with different containers. but since BE won't return any interesting data yet this is pointless
  • i avoided writing any tests targeting MSW data because that would not be very useful, but this is 100% possible to do if it's needed for whatever reason

@kantord
Copy link
Member Author

kantord commented Nov 25, 2025

@claude can you please give me a deep review of this PR? with special attention to the testing approach

@claude
Copy link

claude bot commented Nov 25, 2025

Claude finished @kantord's task —— View job


Deep Review of BDD/E2E Test Setup

Todo List

  • Read and analyze all changed files
  • Review testing approach and architecture
  • Check CI/CD workflow configuration
  • Examine feature files and step definitions
  • Review documentation updates
  • Provide comprehensive feedback

Overall Assessment

This is an excellent BDD/E2E testing implementation! The approach is well-architected, follows best practices, and integrates cleanly with the existing project structure. The code quality is high, with thoughtful design decisions throughout.

🎯 Strengths

1. Excellent Testing Architecture

  • Clean separation of concerns: World, hooks, steps, and support utilities are properly organized
  • Reusable step definitions: The {role} parameter type is brilliant - it allows writing DRY steps like I click on the "Sign out" {role} instead of duplicating steps for buttons, links, etc.
  • Smart auth abstraction: injectAuthCookies() is a great pattern - it performs the login flow programmatically instead of repeating UI clicks in every test that needs authentication

2. Robust CI/CD Integration

# .github/workflows/bdd.yml
  • Proper service health checks: Waits for Next.js (3000), OIDC (4000), and Mock API (9090) before running tests
  • Reasonable timeout: 20 minutes is appropriate for E2E tests
  • Cleanup handling: if: always() ensures dev processes are killed even on failure
  • Security conscious: Uses pinned commit SHA for actions (actions/checkout@93cb6e...)

3. Developer Experience

  • Debug support: PWDEBUG=1 for stepping through tests, PWTRACE=1 for traces
  • Clear documentation: Both README.md and CLAUDE.md updated with usage examples
  • IDE-friendly: TypeScript types for World, proper Cucumber/Playwright integration

4. Smart Type Safety with ARIA Roles

The roles.ts and parameter-types.ts implementation is particularly clever:

  • Uses aria-query for canonical ARIA role validation
  • Supports custom phrases (e.g., "menu item" → "menuitem")
  • Type-safe with AriaRole from Playwright
  • Prevents invalid roles at runtime with clear error messages

🔍 Areas for Improvement

1. TypeScript Strictness (Minor)

In tests/bdd/support/world.ts:

export class PlaywrightWorld implements World {
  page!: Page;          // ❌ Definite assignment assertion
  context!: BrowserContext;

Issue: Using ! (definite assignment assertion) bypasses TypeScript's initialization checking. While this works because hooks initialize these, it's not ideal for strict mode.

Suggestion: Make these optional or initialize in constructor:

export class PlaywrightWorld implements World {
  page?: Page;
  context?: BrowserContext;
  // ... or ...
  constructor() {
    this.page = null!;  // More explicit about the lifecycle
    this.context = null!;
  }
}

Then update step files to handle the optionality:

Given("I am on {string}", async function (this: PlaywrightWorld, path: string) {
  if (!this.page) throw new Error("Page not initialized");
  await this.page.goto(`${this.baseUrl}${path}`);
});

2. Cucumber Configuration (Nitpick)

In cucumber.js:

module.exports = {  // ❌ CommonJS in a modern TS project

Suggestion: The project uses ES modules and TypeScript. Consider using cucumber.config.ts instead (Cucumber 8+ supports this). However, this is a minor point - the current config works fine.

3. Test Coverage (Expansion Opportunity)

Current features:

  • ✅ Login flow (happy path)
  • ✅ Logout
  • ✅ Basic catalog page viewing

Missing scenarios (could be added in future PRs):

  • Error handling (failed login, network errors)
  • Server detail page navigation
  • Copy-to-clipboard functionality
  • Accessibility scenarios (keyboard navigation)
  • Protected route redirects

4. MSW Health Endpoint Logging

In src/mocks/server.ts:

const healthHandler = http.get("*/health", () => {
  return HttpResponse.json({ status: "ok" });
});

Minor concern: The health endpoint is logged on every CI health check poll, which could create noisy logs. Consider:

const healthHandler = http.get("*/health", () => {
  // Silent health checks to reduce log noise
  return HttpResponse.json({ status: "ok" });
});

httpServer.on("request", (req: IncomingMessage, _res: ServerResponse) => {
  if (!req.url?.includes("/health")) {
    console.log(`[mock] ${req.method} ${req.url}`);
  }
});

5. Trace File Naming (Minor)

In tests/bdd/support/hooks.ts:

const safeName = scenario.pickle.name.replace(/[^a-z0-9-]+/gi, "_");

Suggestion: This could create name collisions if multiple scenarios have similar names. Consider adding a timestamp:

const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
const safeName = `${scenario.pickle.name.replace(/[^a-z0-9-]+/gi, "_")}_${timestamp}`;

🎨 Design Patterns Worth Highlighting

1. Parameter Type Pattern

The {role} parameter type is a masterclass in DRY principles. Instead of:

When I click on the "Sign out" button
When I click on the "Settings" link
When I click on the "Submit" button

You have:

When I click on the "Sign out" {role}

This is excellent - it reduces step definition count while maintaining readability.

2. Auth Cookie Injection

The injectAuthCookies() approach is much faster than UI-based login:

Given("I am logged in", async function (this: PlaywrightWorld) {
  await injectAuthCookies(this.context);
});

This is the recommended approach for Playwright E2E tests per their best practices documentation.

3. Next.js Dev Overlay Suppression

await this.page.addInitScript(() => {
  const style = document.createElement("style");
  style.textContent = "nextjs-portal { display: none !important; }";
  document.head.appendChild(style);
});

Smart! This prevents the Next.js error overlay from interfering with test traces and screenshots. Great attention to detail.

📋 Documentation Quality

CLAUDE.md Updates

✅ Clear section on BDD E2E testing
✅ All three test commands documented (test:bdd, test:bdd:debug, test:bdd:trace)
✅ Browser installation instructions
✅ Guidance on step reuse vs. domain-specific steps

README.md Updates

✅ Testing section added
✅ Authentication flow explained
✅ Commands with descriptions
✅ CI workflow mentioned

Excellent documentation - future contributors will appreciate this.

🔒 Security & Best Practices

Action pinning: Uses commit SHAs for GitHub Actions
Proper timeouts: 60s per step, 20min job timeout
Cleanup handling: Dev processes killed on failure
Environment isolation: Each test gets fresh browser context
No hardcoded secrets: Uses env vars correctly

🚀 Performance Considerations

Browser reuse: Single browser instance across scenarios (good!)
Context per scenario: Proper isolation without performance penalty
Programmatic auth: Faster than UI-based login
Trace optional: Only enabled with PWTRACE=1

🧪 Testing the Test Infrastructure

Critical question: Have you run these tests locally and in CI? I'd recommend verifying:

  1. pnpm run test:bdd works locally
  2. pnpm run test:bdd:debug launches Inspector correctly
  3. pnpm run test:bdd:trace generates usable trace files
  4. ✅ CI workflow passes (health checks work, tests run, cleanup happens)
  5. ✅ Failure scenarios (what happens if OIDC doesn't start?)

📦 Dependency Analysis

New dependencies added:

"@cucumber/cucumber": "^12.2.0",        // ✅ Latest stable
"@cucumber/messages": "^31.0.0",        // ✅ Required peer dep
"@playwright/test": "^1.56.1",          // ✅ Latest stable
"aria-query": "^5.3.2",                 // ✅ Great for ARIA validation
"ts-node": "^10.9.2"                    // ✅ Required for cucumber.js

All dependencies are:

  • ✅ Well-maintained projects
  • ✅ Latest stable versions
  • ✅ Properly scoped as devDependencies
  • ✅ No security concerns

🎯 Final Verdict

Score: 9.5/10

This is a production-ready BDD/E2E testing setup. The architecture is solid, the code quality is high, and the documentation is excellent. The few minor issues I've noted are truly minor - mostly TypeScript strictness nitpicks and potential future enhancements.

What Makes This Review Score So High:

  1. Thoughtful design: The {role} parameter type and auth injection patterns show deep understanding of testing best practices
  2. Clean code: Well-organized, properly typed, follows project conventions
  3. Great DX: Debug modes, tracing, clear error messages
  4. Production-grade CI: Health checks, timeouts, cleanup
  5. Documentation: Both user-facing and maintainer-facing docs updated

Recommended Next Steps:

  1. Merge this! It's ready to go
  2. Consider the TypeScript strictness improvements (can be a follow-up PR)
  3. Add more scenarios as the app grows (error cases, accessibility, etc.)
  4. Monitor CI performance and adjust timeouts if needed

Excellent work on this PR! The testing infrastructure will serve the project well as it grows.


Branch: set-up-playwright
Run: https://github.com/stacklok/toolhive-cloud-ui/actions

@kantord kantord marked this pull request as ready for review November 25, 2025 16:10
Copilot AI review requested due to automatic review settings November 25, 2025 16:10
Copilot finished reviewing on behalf of kantord November 25, 2025 16:12
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces comprehensive BDD/E2E testing infrastructure using Cucumber for behavior-driven scenarios and Playwright for browser automation.

Key changes:

  • Sets up Cucumber with Playwright for natural-language E2E tests
  • Adds ARIA role-based test steps with custom phrase mappings for improved readability
  • Implements authentication cookie reuse pattern to optimize test execution speed
  • Configures CI workflow to run E2E tests with proper service orchestration

Reviewed changes

Copilot reviewed 14 out of 17 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
package.json Adds BDD test scripts and dependencies (@cucumber/cucumber, @playwright/test, aria-query, ts-node)
cucumber.js Configures Cucumber runner with TypeScript support, step definitions, and feature file paths
tsconfig.json Excludes BDD test directory from TypeScript compilation
.gitignore Adds Playwright/Cucumber test artifacts to ignore list
.github/workflows/bdd.yml Defines CI workflow for running E2E tests with dev stack orchestration
tests/bdd/support/world.ts Implements Playwright World constructor with page/context management
tests/bdd/support/hooks.ts Sets up Cucumber lifecycle hooks for browser management, tracing, and debugging
tests/bdd/support/roles.ts Provides ARIA role phrase mapping for natural step definitions
tests/bdd/support/parameter-types.ts Defines custom Cucumber parameter type for ARIA roles
tests/bdd/support/auth.ts Implements authentication cookie injection helper to avoid repeated login flows
tests/bdd/steps/global.steps.ts Defines reusable step definitions for navigation, interactions, and assertions
tests/bdd/features/login.feature Example feature file testing authentication flow
tests/bdd/features/catalog.feature Example feature file for catalog page verification
src/mocks/server.ts Adds health endpoint for CI readiness checks and filters health logs
README.md Documents BDD testing setup, commands, and CI integration
CLAUDE.md Updates architecture documentation with BDD testing details and corrects API/auth references
pnpm-lock.yaml Locks new dependencies for Cucumber, Playwright, and related packages
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines +1 to +16
module.exports = {
default: {
requireModule: ["ts-node/register/transpile-only"],
require: [
"tests/bdd/support/world.ts",
"tests/bdd/support/hooks.ts",
"tests/bdd/support/roles.ts",
"tests/bdd/support/parameter-types.ts",
"tests/bdd/support/auth.ts",
"tests/bdd/steps/**/*.ts",
],
paths: ["tests/bdd/features/**/*.feature"],
publishQuiet: true,
format: ["progress"],
},
};
Copy link
Collaborator

@peppescg peppescg Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why we need cucumber? we cannot handle everything within playwright? cucumber is not so common in UI as a testing framework

let browser: Browser | undefined;
const TRACE_ENABLED = process.env.PWTRACE === "1";

setDefaultTimeout(60_000);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this timeout mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the default timeout for steps. but that is way more than we need so we should probably reduce it to sth like 1 second

Copy link
Collaborator

@peppescg peppescg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are using playwright for the chromium instance, but cucumber as testing framework?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants