22 commits
3c8b2b4
feat: add autonomous E2E CI failure fix workflow with skills and Play…
zdrapela Apr 14, 2026
80d18bd
fix(skills): clarify /test ? response format and job selection in sub…
zdrapela Apr 14, 2026
c682a96
fix(skills): reduce /test ? poll interval to 5s, bot responds in seconds
zdrapela Apr 14, 2026
cd6eeac
fix(skills): clarify required jobs not triggered on draft PRs
zdrapela Apr 14, 2026
215d1cd
chore: regenerate rulesync output files
zdrapela Apr 14, 2026
af8e35c
fix(skills): fix step numbering, add Playwright report input, script …
zdrapela Apr 14, 2026
bbb22ec
fix(skills): standardize healer initialization phrasing across all 3 …
zdrapela Apr 14, 2026
77c92a4
fix(skills): document handling of multiple test failures from a singl…
zdrapela Apr 14, 2026
7b3f6ba
fix(skills): clarify when full project stability check is required vs…
zdrapela Apr 14, 2026
bf75623
fix(skills): add Cursor fallback notes for Playwright healer agent
zdrapela Apr 14, 2026
138c106
chore: regenerate rulesync output files
zdrapela Apr 14, 2026
643d1c9
fix: prevent Playwright HTML report from blocking the session
zdrapela Apr 14, 2026
377b6f2
refactor: deduplicate e2e-fix-workflow, cross-reference existing rules
zdrapela Apr 14, 2026
65918b8
fix(skills): check main for existing fix before healer on release bra…
zdrapela Apr 14, 2026
0496957
fix(skills): cherry-pick from main always takes priority, resolve con…
zdrapela Apr 14, 2026
1fc3fcc
fix(skills): fall through to healer if cherry-picked fix does not res…
zdrapela Apr 14, 2026
96d669b
fix(rules): replace hardcoded branch/image tables with regex derivation
zdrapela Apr 14, 2026
699828a
fix(rules): remove unnecessary feature branch base detection
zdrapela Apr 14, 2026
c7c42c2
feat(commands): add --no-qodo flag to skip Qodo review in /fix-e2e
zdrapela Apr 14, 2026
27e8d45
refactor: strip e2e-fix-workflow rule to mapping tables and overview
zdrapela Apr 14, 2026
c6a4bc4
fix(commands): replace hardcoded image mapping with generic pattern
zdrapela Apr 14, 2026
c1440dd
fix(commands): remove inline image mapping, defer to e2e-fix-workflow…
zdrapela Apr 14, 2026
174 changes: 174 additions & 0 deletions .claude/commands/fix-e2e.md
@@ -0,0 +1,174 @@
---
description: >-
Autonomously investigate and fix a failing RHDH E2E CI test. Accepts a Prow
job URL or Jira ticket ID. Deploys RHDH, reproduces the failure, fixes the
test using Playwright agents, and submits a PR with Qodo review.
---
# Fix E2E CI Failure

Autonomous workflow to investigate, reproduce, fix, and submit a PR for a failing RHDH E2E test.

## Input

`$ARGUMENTS` — A failure URL or ticket, optionally followed by `--no-qodo`:
- **Prow URL**: `https://prow.ci.openshift.org/view/gs/...`
- **Playwright report URL**: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/.../index.html[#?testId=...]`
- **Jira ticket ID**: `RHIDP-XXXX`
- **Jira URL**: `https://redhat.atlassian.net/browse/RHIDP-XXXX`

**Options**:
- `--no-qodo` — Skip Qodo agentic review (steps 5-7 in Phase 7). Use this to avoid depleting a limited Qodo quota.

## Workflow

Execute the following phases in order. Load each skill as needed for detailed instructions. If a phase fails, report the error and stop — do not proceed blindly.

### Phase 1: Parse CI Failure

**Skill**: `e2e-parse-ci-failure`

Parse the input to extract:
- Failing test name and spec file path
- Playwright project name
- Release branch (main, release-1.9, etc.)
- Platform (OCP, AKS, EKS, GKE)
- Deployment method (Helm, Operator)
- Error type and message
- local-run.sh job name parameter

**Decision gate**: If the input cannot be parsed (invalid URL, inaccessible Jira ticket), report the error and ask the user for clarification.

**Multiple failures**: If the job has more than one failing test:
1. Present all failures in a table with test name, spec file, error type, and consistency (e.g., "failed 3/3" vs "failed 1/3")
2. Group failures that likely share a root cause (same spec file, same error pattern, same page object)
3. **Ask the user** which failure(s) to focus on
4. If failures share a root cause, fix them together in one PR. If they're unrelated, fix them in separate branches/PRs — complete one before starting the next.

### Phase 2: Setup Fix Branch

First, check the current branch:

```bash
git branch --show-current
```

- **On `main` or `release-*`**: You're on a base branch, so create a feature branch from the upstream release branch:
```bash
git fetch upstream <release-branch>
git checkout -b fix/e2e-<test-description> upstream/<release-branch>
```
If a Jira ticket was provided, include the ticket ID in the branch name:
`fix/RHIDP-XXXX-e2e-<test-description>`

- **On any other branch** (e.g., `fix/e2e-*`): You're likely already on a feature branch. **Ask the user** whether to:
1. Use the current branch as-is
2. Create a new branch from the upstream release branch
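
The branch decision and naming convention above can be sketched as two small helpers. This is a minimal illustration, not part of the command itself: `is_base_branch` and `fix_branch_name` are hypothetical names, and `catalog-import` and `RHIDP-1234` are placeholder values.

```bash
# Hypothetical helper: succeeds when the branch is a base branch
# (main or release-*), i.e. a new feature branch should be created.
is_base_branch() {
  case $1 in
    main|release-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: builds the fix branch name from a short test
# description and an optional Jira ticket ID.
fix_branch_name() {
  desc=$1
  ticket=${2:-}
  if [ -n "$ticket" ]; then
    printf 'fix/%s-e2e-%s\n' "$ticket" "$desc"
  else
    printf 'fix/e2e-%s\n' "$desc"
  fi
}

fix_branch_name catalog-import             # prints fix/e2e-catalog-import
fix_branch_name catalog-import RHIDP-1234  # prints fix/RHIDP-1234-e2e-catalog-import
```

In practice the first argument to `is_base_branch` would be the output of `git branch --show-current`.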

### Phase 3: Deploy RHDH

**Skill**: `e2e-deploy-rhdh`

Deploy RHDH to a cluster using `e2e-tests/local-run.sh`. CLI mode requires **all three** flags (`-j`, `-r`, `-t`):

**OCP jobs** — use `-s` (deploy-only) to skip automated test execution so you can run the specific failing test manually:
```bash
cd e2e-tests
./local-run.sh -j <full-prow-job-name> -r <image-repo> -t <image-tag> -s
```

**K8s jobs (AKS, EKS, GKE)** — do **not** use `-s`. These jobs require the full execution pipeline and do not support deploy-only mode:
```bash
cd e2e-tests
./local-run.sh -j <full-prow-job-name> -r <image-repo> -t <image-tag>
```

Use the **full Prow CI job name** for `-j` (not shortened names).

Derive the image repo (`-r`) and tag (`-t`) from the release branch — see the `e2e-fix-workflow` rule for the derivation logic.
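
As a rough sketch of the `-s` rule above, the helper below assembles the command line and appends `-s` only for OCP jobs. `local_run_command` is a hypothetical name, the platform check assumes the job name contains `ocp` (as in the mapping tables of the `e2e-fix-workflow` rule), and the job names in the usage lines are illustrative. OSD-GCP jobs are not covered by this sketch.

```bash
# Hypothetical helper: prints the local-run.sh invocation for a job.
# Assumption: OCP jobs are recognized by the substring "ocp".
local_run_command() {
  job=$1
  repo=$2
  tag=$3
  cmd="./local-run.sh -j $job -r $repo -t $tag"
  case $job in
    *ocp*) cmd="$cmd -s" ;;   # deploy-only mode, OCP only
  esac
  printf '%s\n' "$cmd"
}

# Illustrative job names; substitute the full Prow job name from Phase 1.
local_run_command periodic-ci-redhat-developer-rhdh-main-e2e-ocp-helm-nightly rhdh-community/rhdh next
local_run_command periodic-ci-redhat-developer-rhdh-main-e2e-aks-helm-nightly rhdh-community/rhdh next
```

The helper only prints the command, so the result can be reviewed before it is run from `e2e-tests/`.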

After deployment completes, set up the local test environment:
```bash
source e2e-tests/local-test-setup.sh <showcase|rbac>
```

**Decision gate**: Before attempting deployment, verify cluster connectivity (`oc whoami`). If no cluster is available, **ask the user for explicit approval** before skipping this phase — do not skip silently. If deployment fails, the `e2e-deploy-rhdh` skill has error recovery procedures. If deployment cannot be recovered after investigation, report the deployment issue and stop.

### Phase 4: Reproduce Failure

**Skill**: `e2e-reproduce-failure`

Run the specific failing test to confirm it reproduces locally. Use `--project=any-test` to avoid running the smoke test dependency — it matches any spec file without extra overhead:

```bash
cd e2e-tests
yarn playwright test <spec-file> --project=any-test --retries=0 --workers=1
```

**Decision gates**:
- **No cluster or deployment available**: If Phase 3 was skipped or no running RHDH instance exists, **ask the user for explicit approval** before skipping reproduction — do not skip silently.
- **Consistent failure**: Proceed to Phase 5
- **Flaky** (fails sometimes): Proceed to Phase 5, focus on reliability
- **Cannot reproduce** (passes on all 10 runs): Before giving up, try running the entire CI project with `CI=true yarn playwright test --project=<ci-project> --retries=0` to simulate CI conditions (3 workers, full test suite). If that also passes, report the results and **ask the user for explicit approval** before proceeding.
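
One way to apply the consistent / flaky / cannot-reproduce gates is to count failures over repeated runs and classify the result. The helpers below are a sketch under that assumption; `count_failures` and `classify_repro` are hypothetical names, not part of the `e2e-reproduce-failure` skill.

```bash
# Hypothetical helper: runs CMD N times and prints how many runs failed.
# Usage: count_failures N CMD [ARGS...]
count_failures() {
  n=$1
  shift
  fails=0
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null 2>&1 || fails=$((fails + 1))
    i=$((i + 1))
  done
  echo "$fails"
}

# Hypothetical helper: maps FAILS out of TOTAL runs to a decision gate.
classify_repro() {
  if [ "$1" -eq "$2" ]; then
    echo consistent
  elif [ "$1" -gt 0 ]; then
    echo flaky
  else
    echo cannot-reproduce
  fi
}

# Example wiring (commented out because it needs a deployed RHDH instance):
# fails=$(count_failures 10 yarn playwright test <spec-file> --project=any-test --retries=0 --workers=1)
# classify_repro "$fails" 10
```

Discarding the test output keeps the count clean; drop the redirection when the failure details are needed for Phase 5.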

### Phase 5: Diagnose and Fix

**Skill**: `e2e-diagnose-and-fix`

Analyze the failure and implement a fix:

1. **Classify the failure**: locator drift, timing, assertion mismatch, data dependency, platform-specific, deployment config
2. **Use Playwright Test Agents**: Invoke the healer agent (`@playwright-test-healer`) for automated test repair — it can debug the test, inspect the UI, generate locators, and edit the code
3. **Follow Playwright best practices**: Consult the `playwright-locators` and `ci-e2e-testing` project rules. Use semantic role-based locators (`getByRole`, `getByLabel`), auto-waiting assertions, Page Object Model, component annotations. Fetch official Playwright best practices via Context7 or https://playwright.dev/docs/best-practices if needed
4. **Cross-repo investigation**: If the issue is in deployment config, search `rhdh-operator` and `rhdh-chart` repos. Use Sourcebot or Context7 if available; otherwise fall back to `gh search code` or clone the repo locally and grep

**Decision gate**: If the analysis reveals a product bug (not a test issue), you must be **absolutely certain** before marking a test with `test.fixme()`. The Playwright healer agent must have confirmed the test is correct and the application behavior is wrong. Ask the user for confirmation before proceeding. Then:
1. File or update a Jira bug in the `RHDHBUGS` project
2. Mark the test with `// TODO:` linking to the Jira ticket, followed by `test.fixme()`:
```typescript
// TODO: https://redhat.atlassian.net/browse/RHDHBUGS-XXXX
test.fixme('Description of the product bug');
```
3. Proceed to Phase 6 with the `test.fixme()` change

### Phase 6: Verify Fix

**Skill**: `e2e-verify-fix`

Verify the fix:
1. Run the fixed test once — must pass
2. Run 5 times — must pass 5/5
3. Run code quality checks: `yarn tsc:check`, `yarn lint:check`, `yarn prettier:check`
4. Fix any lint/formatting issues

**Decision gate**: If the test still fails or is flaky, return to Phase 5 and iterate. If verification cannot be run (no cluster, environment issues), **ask the user for explicit approval** before proceeding without it.

### Phase 7: Submit PR and Handle Review

**Skill**: `e2e-submit-and-review`

1. **Resolve pre-commit hooks**: Run `yarn install` in all relevant workspaces (root, `e2e-tests/`, `.ci/`) before committing
2. **Commit**: Stage changes, commit with conventional format
3. **Push**: `git push -u origin <branch>`
4. **Create draft PR**: Always use `--draft`. Determine the GitHub username from the fork remote: `git remote get-url origin | sed 's|.*github.com[:/]||;s|/.*||'`. Then use `gh pr create --draft --repo redhat-developer/rhdh --head <username>:<branch> --base <release-branch>`
5. **Trigger Qodo review** (skip if `--no-qodo`): Comment `/agentic_review` on the PR
6. **Wait for review** (skip if `--no-qodo`): Poll for Qodo bot review (check every 15s, up to 5 minutes)
7. **Address feedback** (skip if `--no-qodo`): Apply valid suggestions, explain rejections
8. **Trigger affected CI job**: Comment `/test ?` on the PR to list available presubmit jobs, then comment `/test <job-name>` to trigger the presubmit job matching the platform and deployment method from Phase 1
9. **Monitor CI**: Watch CI checks with `gh pr checks`
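
Two pieces of this phase lend themselves to a short sketch: the username extraction in step 4 and the polling budget in step 6 (15 s intervals for up to 5 minutes, i.e. 20 attempts). `fork_owner` wraps the same `sed` expression so it can be checked against both SSH and HTTPS remote URLs. `poll_until` is a hypothetical generic helper, and `alice` is a placeholder account name.

```bash
# Prints the GitHub account that owns a remote URL. Same sed expression
# as step 4, applied to an argument instead of `git remote get-url origin`.
fork_owner() {
  printf '%s\n' "$1" | sed 's|.*github.com[:/]||;s|/.*||'
}

# Hypothetical helper: re-runs CMD until it succeeds.
# Usage: poll_until INTERVAL_SECONDS MAX_ATTEMPTS CMD [ARGS...]
# Returns 1 when all attempts are used up.
poll_until() {
  interval=$1
  attempts=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    "$@" && return 0
    attempts=$((attempts - 1))
    if [ "$attempts" -gt 0 ]; then
      sleep "$interval"
    fi
  done
  return 1
}

fork_owner git@github.com:alice/rhdh.git      # prints alice
fork_owner https://github.com/alice/rhdh.git  # prints alice
```

For the Qodo wait, the call would look like `poll_until 15 20 <check-command>`, where the check command is whatever test for a posted review the `e2e-submit-and-review` skill defines.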

### Final Report

After all phases complete, produce a summary:

```
E2E Fix Summary:
- Input: <Prow URL or Jira ticket>
- Test: <spec file> (<playwright project>)
- Branch: <fix branch> → <release branch>
- Root cause: <classification and description>
- Fix: <what was changed>
- Verification: <X/X passes>
- PR: <PR URL>
- CI Status: <PASS/PENDING/FAIL>
- Qodo Review: <status>
```
78 changes: 78 additions & 0 deletions .claude/rules/e2e-fix-workflow.md
@@ -0,0 +1,78 @@
# E2E Test Fix Workflow

Reference knowledge for the `/fix-e2e` command. For detailed instructions, load the corresponding skill for each phase.

## Workflow Overview

The `/fix-e2e` command orchestrates a 7-phase workflow to autonomously fix E2E CI failures:

1. **Parse CI Failure** (`e2e-parse-ci-failure`) — Extract failure details from Prow URL, Playwright report, or Jira ticket
2. **Setup Fix Branch** — Create a branch from the correct upstream release branch
3. **Deploy RHDH** (`e2e-deploy-rhdh`) — Deploy RHDH to a cluster using `local-run.sh`
4. **Reproduce Failure** (`e2e-reproduce-failure`) — Confirm the failure reproduces locally
5. **Diagnose and Fix** (`e2e-diagnose-and-fix`) — Analyze root cause and implement a fix
6. **Verify Fix** (`e2e-verify-fix`) — Run the test multiple times and check code quality
7. **Submit and Review** (`e2e-submit-and-review`) — Create PR, trigger review, monitor CI

**Critical rule**: No phase may be skipped without **explicit user approval**.

## Job Name Mapping Tables

These tables are the **single source of truth** — referenced by `e2e-parse-ci-failure` and other skills.

### Job Name → Release Branch

Extract the release branch from the Prow job name using the `-rhdh-<branch>-` pattern:

```bash
BRANCH=$(echo "$JOB_NAME" | grep -oE '\-rhdh-(main|release-[0-9]+\.[0-9]+)-' | sed 's/^-rhdh-//;s/-$//')
```

### Job Name → Platform and Deployment Method

| Pattern | Platform | Method |
|---------|----------|--------|
| `*ocp*helm*` | OCP | Helm |
| `*ocp*operator*` | OCP | Operator |
| `*aks*helm*` | AKS | Helm |
| `*aks*operator*` | AKS | Operator |
| `*eks*helm*` | EKS | Helm |
| `*eks*operator*` | EKS | Operator |
| `*gke*helm*` | GKE | Helm |
| `*gke*operator*` | GKE | Operator |
| `*osd-gcp*` | OSD-GCP | Helm/Operator |
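
The table maps directly onto a shell `case` statement, since its patterns are already glob-style. The sketch below keeps the rows in table order; `job_target` is a hypothetical name and the job names in the usage lines are illustrative.

```bash
# Hypothetical helper: prints "<platform> <method>" for a Prow job name,
# using the same glob patterns as the table, in the same order.
job_target() {
  case $1 in
    *ocp*helm*)     echo "OCP Helm" ;;
    *ocp*operator*) echo "OCP Operator" ;;
    *aks*helm*)     echo "AKS Helm" ;;
    *aks*operator*) echo "AKS Operator" ;;
    *eks*helm*)     echo "EKS Helm" ;;
    *eks*operator*) echo "EKS Operator" ;;
    *gke*helm*)     echo "GKE Helm" ;;
    *gke*operator*) echo "GKE Operator" ;;
    *osd-gcp*)      echo "OSD-GCP Helm/Operator" ;;
    *)              echo "unknown"; return 1 ;;
  esac
}

job_target periodic-ci-redhat-developer-rhdh-main-e2e-ocp-helm-nightly            # prints OCP Helm
job_target periodic-ci-redhat-developer-rhdh-release-1.9-e2e-aks-operator-nightly # prints AKS Operator
```

The non-zero return for unmatched names lets a caller stop and ask the user instead of guessing a platform.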

### Job Name → Playwright Projects

| Job pattern | Projects |
|-------------|----------|
| `*ocp*helm*nightly*` (not upgrade) | `showcase`, `showcase-rbac`, `showcase-runtime`, `showcase-sanity-plugins`, `showcase-localization-*` |
| `*ocp*helm*upgrade*` | `showcase-upgrade` |
| `*ocp*operator*nightly*` (not auth) | `showcase-operator`, `showcase-operator-rbac` |
| `*ocp*operator*auth-providers*` | `showcase-auth-providers` |
| `*ocp*helm*pull*` | `showcase`, `showcase-rbac` |
| `*aks*`/`*eks*`/`*gke*` helm | `showcase-k8s`, `showcase-rbac-k8s` |
| `*aks*`/`*eks*`/`*gke*` operator | `showcase-k8s`, `showcase-rbac-k8s` |

### Job Name → local-run.sh `-j` Parameter

Use the **full Prow CI job name** directly as the `-j` parameter. Do NOT use shortened names.

**OCP** (deploy-only with `-s`): `./local-run.sh -j <full-job-name> -r <repo> -t <tag> -s`
**K8s** (full execution, no `-s`): `./local-run.sh -j <full-job-name> -r <repo> -t <tag>`

### Release Branch → Image Repo and Tag

```bash
if [[ "$BRANCH" == "main" ]]; then
  REPO="rhdh-community/rhdh"; TAG="next"
else
  REPO="rhdh/rhdh-hub-rhel9"; TAG="${BRANCH#release-}"
fi
```
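
Chaining the branch extraction with the image derivation gives the `-r` and `-t` values straight from a job name. The following is a sketch with two deliberate differences from the snippets above: the `grep` pattern is passed with `-e`, so the leading hyphen needs no backslash, and `derive_image` rejects anything that is neither `main` nor `release-*` instead of treating it as a release branch. Both function names are hypothetical and the job name is illustrative.

```bash
# Hypothetical helper: extracts "main" or "release-X.Y" from a job name.
derive_branch() {
  printf '%s\n' "$1" \
    | grep -oE -e '-rhdh-(main|release-[0-9]+\.[0-9]+)-' \
    | sed 's/^-rhdh-//;s/-$//'
}

# Hypothetical helper: prints "<repo> <tag>" for a release branch.
derive_image() {
  case $1 in
    main)      echo "rhdh-community/rhdh next" ;;
    release-*) echo "rhdh/rhdh-hub-rhel9 ${1#release-}" ;;
    *)         return 1 ;;   # unknown branch: let the caller decide
  esac
}

JOB_NAME=periodic-ci-redhat-developer-rhdh-release-1.9-e2e-ocp-helm-nightly
BRANCH=$(derive_branch "$JOB_NAME")
echo "$BRANCH"            # prints release-1.9
derive_image "$BRANCH"    # prints rhdh/rhdh-hub-rhel9 1.9
```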

## Coding Conventions

All test code must follow the project's coding rules:
- **`playwright-locators`** — locator priority, anti-patterns, assertions, Page Objects
- **`ci-e2e-testing`** — test structure, component annotations, utility classes, CI scripts