Add automated Claude failure analysis for Prow CI #2034

kaovilai · 2025-11-22T07:00:00Z

Integrates Claude Code via Vertex AI to automatically analyze E2E test
failures in Prow CI and generate comprehensive failure reports.

Changes:

- build/ci-Dockerfile: Install Node.js 20.x and Claude CLI with multi-arch support
- tests/e2e/scripts/analyze_failures.sh: New analysis script with:
  - Vertex AI integration via Claude Code CLI headless mode (--print flag)
  - Comprehensive secret redaction (AWS keys, GCP keys, tokens, passwords)
  - Graceful degradation when credentials unavailable
  - 10-minute timeout with partial analysis on timeout
- Makefile: Set GOOGLE_APPLICATION_CREDENTIALS and ANTHROPIC_VERTEX_PROJECT_ID
  from vault files before running analysis script
- docs/design/claude-prow-failure-analysis_design.md: Complete design document
- tests/e2e/claude_test_failure_test.go: Simple failing test for verification
- tests/e2e/backup_restore_suite_test.go: Realistic failing test that triggers
  must-gather collection
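
The "10-minute timeout with partial analysis on timeout" item could be approached roughly as below. This is a minimal sketch, not the code in analyze_failures.sh: the function name `run_with_timeout` and the wording of the appended note are hypothetical, and it assumes GNU coreutils `timeout`, which exits with status 124 when the time limit is hit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a time limit, capturing its output.
# Usage: run_with_timeout <seconds> <output-file> <command> [args...]
run_with_timeout() {
  local limit="$1" out="$2"
  shift 2
  local rc=0
  # GNU timeout exits with 124 if the command had to be killed.
  timeout "${limit}" "$@" > "${out}" 2>&1 || rc=$?
  if [ "${rc}" -eq 124 ]; then
    # Keep whatever was written so far and mark it as partial.
    echo "Analysis timed out after ${limit}s; output above is partial." >> "${out}"
  fi
  return "${rc}"
}
```

A caller that must never affect the job result can ignore the return value, for example `run_with_timeout 600 "${ARTIFACT_DIR}/claude-failure-analysis.md" some-command || true`, where `some-command` stands in for the real analysis invocation.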

The analysis runs post-suite and does not affect test execution or results.
Output is saved to ${ARTIFACT_DIR}/claude-failure-analysis.md with automatic
secret redaction to prevent credential leakage.
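
To illustrate what redacting the report before it reaches the artifact directory might look like, here is a small sketch. The function name `redact_secrets` and all three patterns are illustrative assumptions; the actual patterns used by the PR's script are not shown in this conversation and are likely broader.

```shell
#!/usr/bin/env bash
# Hypothetical redaction filter: reads text on stdin, writes redacted text to stdout.
# The patterns are examples only, not the ones used by analyze_failures.sh.
redact_secrets() {
  sed -E \
    -e 's/AKIA[0-9A-Z]{16}/[REDACTED_AWS_KEY_ID]/g' \
    -e 's/(Bearer )[A-Za-z0-9._-]+/\1[REDACTED_TOKEN]/g' \
    -e 's/(([Pp]assword|[Tt]oken|[Ss]ecret)[=:] *)[^ ]+/\1[REDACTED]/g'
}
```

Used as a filter, this would sit at the end of the pipeline that writes the report, for example `generate_report | redact_secrets > "${ARTIFACT_DIR}/claude-failure-analysis.md"`, with `generate_report` as a placeholder for whatever produces the raw text.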

Requires Vertex AI credentials in vault collection files:

- /var/run/oadp-credentials/gcp-claude-code-credentials
- /var/run/oadp-credentials/gcp-claude-code-project-id
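
The graceful-degradation behavior described above could be sketched as a credential check that runs before any analysis. The function name `load_vertex_credentials` is hypothetical, and two details are assumptions rather than facts from the PR: that the credentials file is used by path via GOOGLE_APPLICATION_CREDENTIALS, and that the project-id file holds the project ID as plain text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export Vertex AI settings from the vault files,
# or return non-zero so the caller can skip the analysis quietly.
load_vertex_credentials() {
  local cred_dir="${1:-/var/run/oadp-credentials}"
  local cred_file="${cred_dir}/gcp-claude-code-credentials"
  local project_file="${cred_dir}/gcp-claude-code-project-id"
  if [ ! -r "${cred_file}" ] || [ ! -r "${project_file}" ]; then
    echo "Vertex AI credentials not found in ${cred_dir}; skipping analysis" >&2
    return 1
  fi
  # Assumption: the credentials file is referenced by path.
  export GOOGLE_APPLICATION_CREDENTIALS="${cred_file}"
  # Assumption: the project ID is the file's content, minus whitespace.
  ANTHROPIC_VERTEX_PROJECT_ID="$(tr -d '[:space:]' < "${project_file}")"
  export ANTHROPIC_VERTEX_PROJECT_ID
}
```

A caller would typically write `load_vertex_credentials || exit 0`, so that missing credentials end the analysis step without failing the CI job.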

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

openshift-ci · 2025-11-22T07:00:04Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2025-11-22T07:00:05Z

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (1)

do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

openshift-ci · 2025-11-22T07:00:12Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [kaovilai]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kaovilai · 2025-11-23T20:22:23Z

/test all

kaovilai · 2025-11-24T16:14:19Z

/test all

kaovilai · 2025-11-24T18:40:43Z

new cred
/test all

kaovilai · 2025-11-24T22:42:37Z

/test all

…mentation

This commit introduces a new file, CLAUDE.md, which provides comprehensive guidance for developers working with the OADP project. It includes sections on project overview, prerequisites, essential development commands, testing commands, cloud authentication deployment, and important environment variables.

Additionally, the existing failure analysis documentation in docs/design/claude-prow-failure-analysis_design.md has been updated to reflect changes in the analysis process, emphasizing the use of JUnit reports and per-test logs instead of build logs, which are not available during analysis. The analysis script has also been modified to focus on available artifacts and improve clarity in the analysis tasks.

Changes:
- New file: CLAUDE.md with detailed developer instructions
- Updated failure analysis design document to clarify artifact usage and analysis process
- Modifications to the analyze_failures.sh script to remove references to build-log.txt and enhance artifact handling

This update aims to streamline the development workflow and improve the efficiency of failure analysis in Prow CI.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

kaovilai · 2025-11-25T16:32:38Z

/test all

This commit updates the analyze_failures.sh script to include the 'set -o pipefail' option, so that a pipeline returns the exit status of the last command that exited non-zero instead of only the final command's status. Additionally, the documentation within the script has been revised to clarify the known flake patterns, now referencing the source file that contains detailed information about flake patterns and error ignore patterns. This aims to streamline the failure analysis process by providing clearer guidance on diagnosing test failures.

Changes:
- Added 'set -o pipefail' to improve error handling in pipelines.
- Updated documentation to reference the source file for known flake patterns and error ignore patterns.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
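
The effect of `set -o pipefail` is easiest to see side by side. The sketch below is a standalone demonstration, not code from the PR, and the function names are made up for illustration. Each function runs in a subshell so the option change does not leak into the caller.

```shell
#!/usr/bin/env bash
# Without pipefail, the pipeline status is that of the last command (cat).
status_without_pipefail() (
  set +o pipefail
  rc=0
  false | cat > /dev/null || rc=$?
  echo "${rc}"   # prints 0
)

# With pipefail, a failure earlier in the pipeline is no longer hidden.
status_with_pipefail() (
  set -o pipefail
  rc=0
  false | cat > /dev/null || rc=$?
  echo "${rc}"   # prints 1
)

# When several stages fail, the rightmost non-zero status is reported.
status_rightmost_failure() (
  set -o pipefail
  rc=0
  (exit 2) | (exit 3) | true || rc=$?
  echo "${rc}"   # prints 3
)
```

This matters for a script that pipes a tool's output through `tee` or a filter: without the option, a crash in the first stage would look like success.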

This commit improves the analyze_failures.sh script by adding support for preprocessing large log files using a subagent pattern. It introduces functions to extract relevant error messages and context from large logs, creating a summary file for quick access during analysis. Additionally, the script now checks for the availability of the Claude CLI before execution and captures exit codes properly to ensure accurate error handling.

Changes:
- Added functions for extracting errors from large log files.
- Implemented preprocessing of large artifacts to create a summary of extracted errors.
- Enhanced documentation to clarify the analysis process and artifact usage.

These updates aim to streamline the failure analysis workflow and improve the accuracy of insights generated from log files.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
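
One plausible shape for the preprocessing step described in this commit is sketched below. The function names, the size threshold, the keyword list, and the summary layout are all assumptions made for illustration; the commit message does not show the real implementation.

```shell
#!/usr/bin/env bash
# Hypothetical: print error-like lines with surrounding context, capped in length.
# Usage: extract_errors <log-file> [context-lines] [max-output-lines]
extract_errors() {
  local log="$1" context="${2:-2}" max_lines="${3:-200}"
  # grep exits non-zero when nothing matches; that is not an error here.
  grep -n -i -E -C "${context}" 'error|failed|panic' "${log}" | head -n "${max_lines}" || true
}

# Hypothetical: append a summary section only for logs above a size threshold.
# Usage: preprocess_large_log <log-file> <summary-file> [threshold-bytes]
preprocess_large_log() {
  local log="$1" summary="$2" threshold_bytes="${3:-1048576}"
  local size
  size="$(wc -c < "${log}" | tr -d ' ')"
  if [ "${size}" -gt "${threshold_bytes}" ]; then
    {
      echo "=== ${log} (${size} bytes) ==="
      extract_errors "${log}"
    } >> "${summary}"
  fi
}
```

The summary file then gives the analysis a short starting point with line numbers, so that only the relevant regions of a large log need to be read in full.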

kaovilai · 2025-11-25T18:41:33Z

/test all

…e analysis

This commit updates the `.claude/config.json` file to remove specific path permissions that are now handled at runtime using the `--allowedTools` flag. The documentation in `docs/design/claude-prow-failure-analysis_design.md` has been expanded to clarify the new permission model, emphasizing the use of runtime permissions to bypass sandbox restrictions. Additionally, the `analyze_failures.sh` script has been updated to utilize the `--allowedTools` flag for explicit file access during analysis.

Changes:
- Removed specific path permissions from `.claude/config.json`.
- Enhanced documentation to explain the runtime permissions model.
- Updated `analyze_failures.sh` to use `--allowedTools` for file access.

These updates aim to improve the clarity and functionality of the failure analysis process in Prow CI.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

kaovilai · 2025-11-25T21:22:59Z

/test all

kaovilai · 2025-11-25T21:24:11Z

/retest

ai-retester: Both failures are “intentional” – they’re deliberately triggered by the test author to make sure the Claude‑based failure‑analysis plumbing works.

- Claude Analysis Test Failure – The test asserts true should be false. That is exactly what the test code is checking, so it fails as expected.
- MySQL CSI Claude Test (INTENTIONAL FAILURE) – In this test the code raises an error (CLAUDE TEST: This is an intentional test failure…) after the backup/restore flow, which is meant to force a test failure so the Claude analysis is invoked.

Hence both errors are intentional, not a bug in the deployment itself.

- Add --allowedTools flag to grant Read access to artifact paths
- Fix argument order: --allowedTools must come before --print
- Add AVAILABLE TOOLS section to prompts so Claude knows its constraints
- Simplify .claude/config.json (path permissions handled at runtime)
- Update design doc with runtime permissions approach

The Claude Code CLI sandbox restricts filesystem access to the current working directory. Since artifacts are at /logs/artifacts/ (outside CWD), we use --allowedTools to explicitly grant Read permissions at runtime.

…n and scripts

- Replace `--allowedTools` with `--add-dir` for granting directory access in the analysis script.
- Enhance documentation to clarify the use of `--add-dir` and `--allowedTools` for bypassing sandbox CWD restrictions.
- Ensure consistent usage of CLI flags across the `analyze_failures.sh` script and design documentation.

These changes improve clarity and functionality in the failure analysis process, ensuring proper access to necessary directories during runtime.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

kaovilai · 2025-11-28T14:17:31Z

/test all

kaovilai · 2025-11-29T09:56:29Z

Current output

Perfect! Now I have all the information I need. Let me generate the comprehensive failure analysis report:

OADP E2E Test Failure Analysis

Generated by Claude via Vertex AI on 2025-11-28 16:02 UTC

Executive Summary

Total Tests: 51

Failed Tests: 2

Known Flakes: 0

Critical Issues: 0 (both failures are intentional test cases)

Environmental Issues: 0

Analysis Result: All test failures are intentional test cases designed to verify the Claude AI failure analysis system. The OADP backup/restore functionality is working correctly - backups completed successfully, restores completed successfully, and cluster health is good.

Failed Tests Analysis

1. MySQL CSI Claude Test (INTENTIONAL FAILURE) [FLAKE]

Root Cause: Intentional test failure injected via custom verification function to validate Claude analysis pipeline

Evidence:
junit_report.xml: "CLAUDE TEST: This is an intentional test failure to verify Claude analysis script"
Test code (/go/src/github.com/openshift/oadp-operator/tests/e2e/backup_restore_suite_test.go:421):
  return fmt.Errorf("CLAUDE TEST: This is an intentional test failure to verify Claude analysis script")
must-gather: Both backup and restore completed successfully (Phase: Completed)
  - Backup: mysql-csi-claude-test-405f0736-cc6f-11f0-a9e6-0a580a81a1fb (Completed)
  - Restore: mysql-csi-claude-test-405f073b-cc6f-11f0-a9e6-0a580a81a1fb (Completed)
Diagnosis: This test intentionally injects a failure in the PostRestoreVerify custom function at line 421 of backup_restore_suite_test.go. The actual backup and restore operations completed successfully:

Backup Phase (15:32:04 - 15:33:11): Successfully created CSI snapshot (snap-0df4d9ac3e001128b) of 1GB MySQL PVC

Restore Phase (15:34:00 - 15:34:03): Successfully restored from snapshot, created PVC and pods

Failure Injection (15:34:11): Test deliberately returns error from custom verification function

The test demonstrates that OADP functionality is working correctly - the failure is purely synthetic.

Likely Cause: Test scaffolding for AI analysis validation (not a real failure)

Recommended Actions:

Remove this test after validating the Claude analysis system is working

This test should not be included in production CI runs

Consider moving to a separate test suite for tooling validation

Related Issues: N/A - This is a deliberate test case, not a bug

2. Claude Analysis Test Failure [FLAKE]

Root Cause: Intentional assertion failure to verify Claude analysis script functionality

Evidence:
junit_report.xml: "Expected <bool>: true to be false"
Test code (/go/src/github.com/openshift/oadp-operator/tests/e2e/claude_test_failure_test.go:13):
  gomega.Expect(true).To(gomega.BeFalse(), "This is an intentional failure to test Claude analysis script")
Diagnosis: This is a standalone test case (claude_test_failure_test.go) that deliberately invokes an impossible assertion (expecting true to be false). The test file includes clear documentation:
// This test is intentionally designed to fail for testing Claude failure analysis
// It should be removed after verifying the Claude analysis script works correctly
This test executes instantly (0.000228615s) and has no actual OADP operations - it's purely a validation test for the analysis pipeline.

Likely Cause: Tooling validation test (not a real failure)

Recommended Actions:

Delete /go/src/github.com/openshift/oadp-operator/tests/e2e/claude_test_failure_test.go after Claude analysis validation is complete

Remove the "MySQL CSI Claude Test" entry from backup_restore_suite_test.go (lines 412-423)

Update CI configuration to exclude these tests from production runs

Related Issues: N/A - This is a deliberate test case designed to be temporary

Known Flakes Detected

✗ VolumeSnapshotBeingCreated race condition (not detected)
✗ AWS rate limiting (not detected)

No known flake patterns were found in the logs.

Cluster Health Summary

From must-gather analysis:

OADP Components:

Velero deployment: Running successfully (pod: velero-bc95449f7-8mmdz)

OADP Controller: Running (pod: openshift-adp-controller-manager-d899f68bc-vjznh)

Backup Storage Location: ts-velero-test-1 - Available ✅

Volume Snapshot Location: None configured (expected for CSI-only backups)

OADP Version: 99.0.0 (development build)

Cluster Resources:

Platform: OpenShift 4.20.0-0.nightly-2025-11-27-212931 on AWS

Architecture: linux/amd64

CSI drivers: ebs.csi.aws.com available

Storage classes: Available (gp2-csi storage class in use)

Backup/Restore Operations:
Successfully completed 2 backup/restore cycles during test run:

mysql-csi-e2e-9844e6ae: Backup Completed (15:27:17-15:28:12), Restore Completed (15:29:03-15:29:06)

mysql-csi-claude-test: Backup Completed (15:32:04-15:33:11), Restore Completed (15:34:00-15:34:03)

All CSI snapshots created successfully:

snap-0bff8df5fc977aa6d (1GB)

snap-0df4d9ac3e001128b (1GB)

DataProtectionTest:

e2e-uploadtest-dpt-1764344071: Complete ✅

Upload speed: 283 MBps

Encryption: AES256 enabled

Recent Events:

Some expected restore warnings for pre-existing resources (ConfigMaps, Secrets, RoleBindings) - these are normal

Minor errors about authorization.openshift.io/v1 RoleBindings not supporting watch (expected, handled by fallback)

No critical errors in velero.log (preprocessed-logs.txt shows only debug-level EOF messages from plugin lifecycle)

Recommendations (Prioritized)

Immediate Actions (Critical)

None required - No critical bugs detected. Both failures are intentional test cases.

Investigation Needed

None required - All OADP functionality operating correctly.

Flake Handling

None required - No flakes detected.

Configuration Review

Remove validation tests: Delete temporary Claude analysis test files after validation complete

File: /go/src/github.com/openshift/oadp-operator/tests/e2e/claude_test_failure_test.go

Lines 412-423 in /go/src/github.com/openshift/oadp-operator/tests/e2e/backup_restore_suite_test.go

Test suite organization: Consider creating separate test suite for tooling/pipeline validation tests

CI configuration: Ensure intentional failure tests are excluded from production test runs

Analysis Confidence

High Confidence: Both failures are intentional test cases (100% certainty from source code inspection)

Medium Confidence: N/A

Low Confidence: N/A

Suggested Next Steps for Developer

✅ Validate Claude analysis is working - This analysis confirms the AI system can correctly identify and analyze test failures

Clean up test scaffolding: Remove the two intentional failure tests now that validation is complete

Document the pattern: If periodic validation of the analysis pipeline is needed, create a dedicated test suite

Re-run tests: After removing intentional failures, the test suite should have 100% pass rate (49 passed tests, 30 skipped, 0 failed)

Additional Context

The test run demonstrates excellent OADP functionality:

CSI snapshot backup/restore: Working perfectly

Backup storage: S3-compatible storage (AWS) functioning correctly

Restore operations: All resources restored successfully with expected warnings

Performance: 283 MBps upload speed, efficient snapshot operations

No real failures: The only failures are synthetic test cases

Passing Tests Include:

MySQL application CSI backup/restore (baseline test passed)

Multiple BSL with custom CA cert handling (3 BSLs tested)

DPA reconciliation tests (11 different DPA configurations tested)

DPA deletion test passed

All tests confirm OADP operator is functioning correctly

The test suite demonstrates comprehensive validation of OADP deployment, configuration, and backup/restore capabilities on AWS infrastructure.

weshayutin · 2025-12-01T14:23:01Z

that's pretty hot! I'll be back tomorrow if you guys want to chat :) THANK YOU! For pushing this along

weshayutin · 2025-12-03T15:53:22Z

/test 4.20-e2e-test-aws

Original PR: openshift#2034
Author: Tiger Kaovilai <tkaovila@redhat.com>
Date: Fri Nov 28 09:17:23 2025 -0500

refactor: update runtime permissions in failure analysis documentation and scripts

- Replace `--allowedTools` with `--add-dir` for granting directory access in the analysis script.
- Enhance documentation to clarify the use of `--add-dir` and `--allowedTools` for bypassing sandbox CWD restrictions.
- Ensure consistent usage of CLI flags across the `analyze_failures.sh` script and design documentation.

These changes improve clarity and functionality in the failure analysis process, ensuring proper access to necessary directories during runtime.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

weshayutin · 2025-12-03T15:53:53Z

/hold

weshayutin · 2025-12-03T15:54:00Z

moving to #2038

openshift-ci · 2025-12-03T18:40:53Z

@kaovilai: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.20-e2e-test-kubevirt-aws | `57152d3` | link | true | `/test 4.20-e2e-test-kubevirt-aws` |
| ci/prow/4.20-e2e-test-cli-aws | `57152d3` | link | true | `/test 4.20-e2e-test-cli-aws` |
| ci/prow/4.20-e2e-test-aws | `57152d3` | link | true | `/test 4.20-e2e-test-aws` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 22, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 22, 2025

kaovilai force-pushed the claude-prow-failure-analysis branch from b51f15b to 5429631 Compare November 24, 2025 16:12

kaovilai force-pushed the claude-prow-failure-analysis branch from 5429631 to 22ad6b9 Compare November 24, 2025 22:42

kaovilai added 2 commits November 25, 2025 11:40

kaovilai added 2 commits November 28, 2025 09:09

mpryc mentioned this pull request Dec 3, 2025

PR to analyze failures based on the Tigers PR. #2038

Open

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 3, 2025
