
Conversation

@mpryc
Contributor

@mpryc mpryc commented Dec 3, 2025

Original PR: openshift#2034

Author: Tiger Kaovilai <tkaovila@redhat.com>
Date:   Fri Nov 28 09:17:23 2025 -0500

    refactor: update runtime permissions in failure analysis documentation and scripts

    - Replace `--allowedTools` with `--add-dir` for granting directory access in the analysis script.
    - Enhance documentation to clarify the use of `--add-dir` and `--allowedTools` for bypassing sandbox CWD restrictions.
    - Ensure consistent usage of CLI flags across the `analyze_failures.sh` script and design documentation.

    These changes improve clarity and functionality in the failure analysis process, ensuring proper access to necessary directories during runtime.

    Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
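
For context, a minimal sketch of the invocation pattern the commit message describes, assuming the current Claude Code CLI flag spellings; the prompt, directory, and tool list below are illustrative and are not copied from `analyze_failures.sh`:

```bash
# Illustrative only: --add-dir grants the sandboxed CLI access to a directory
# outside its working directory, while --allowedTools limits which tools the
# model may invoke. The path and prompt are placeholders.
claude -p "Summarize the E2E test failures found in the artifact logs" \
  --add-dir "${ARTIFACT_DIR}" \
  --allowedTools "Read" "Grep"
```

Here `--add-dir` handles filesystem scope while `--allowedTools` handles tool scope, which is the distinction the updated documentation draws.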

Why the changes were made

How to test the changes made

@coderabbitai
Contributor

coderabbitai bot commented Dec 3, 2025

Walkthrough

These changes integrate Claude AI into the E2E testing pipeline for automated failure analysis. A new permissions configuration is added, documentation describes Claude integration requirements and workflows, the Dockerfile installs Claude CLI and Node.js dependencies, the Makefile's test-e2e target orchestrates failure analysis post-run with CI-specific credential setup, and a new analysis script preprocesses artifacts and invokes Claude via Vertex AI.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration & Documentation**<br>`.claude/config.json`, `CLAUDE.md`, `docs/design/claude-prow-failure-analysis_design.md` | Adds Claude permission controls (allow/deny arrays) and a new integration guide covering prerequisites, development commands, E2E workflows, and Claude-based automated failure analysis; introduces a comprehensive design document detailing end-to-end Prow integration, artifact preprocessing, Claude prompt generation, Vertex AI credential setup, security considerations, and a phased rollout plan. |
| **CI Infrastructure**<br>`Makefile`, `build/ci-Dockerfile` | Modifies the test-e2e target to capture exit codes, conditionally set Google Cloud credentials and the Vertex AI region on OPENSHIFT_CI, and invoke the failure analysis script; updates the Dockerfile to use multi-arch kubectl installation via TARGETARCH, install Node.js and the Claude CLI, and make the analyze script executable. |
| **Analysis Implementation**<br>`tests/e2e/scripts/analyze_failures.sh` | New script for post-test failure analysis that preprocesses large log artifacts (redaction, Claude-based extraction), generates structured prompts, executes Claude with timeouts, captures output as markdown, and preserves the original test exit code. |
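
Taken together, the Makefile and script changes amount to a wrapper pattern along these lines (a rough sketch only; preprocessing and prompt-building are omitted, and the real `analyze_failures.sh` differs in detail):

```bash
#!/usr/bin/env bash
# Sketch of the post-test analysis flow, not the actual script.
set -uo pipefail

EXIT_CODE=${1:-0}                                   # test result handed over by the Makefile
ARTIFACT_DIR=${ARTIFACT_DIR:-/tmp/artifacts}
REPORT="${ARTIFACT_DIR}/claude-failure-analysis.md"

# Analyze only failed runs, and never change the test verdict.
if [ "$EXIT_CODE" -eq 0 ] || ! command -v claude >/dev/null 2>&1; then
    exit "$EXIT_CODE"
fi

# (Artifact preprocessing and secret redaction would happen here; omitted in this sketch.)

# Bound the analysis so a hung CLI cannot block CI, and never fail the build because of it.
timeout 600 claude -p "Analyze the E2E failures under ${ARTIFACT_DIR}" \
    --add-dir "${ARTIFACT_DIR}" > "${REPORT}" 2>&1 \
    || echo "WARN: failure analysis did not complete" >&2

exit "$EXIT_CODE"                                   # preserve the original test exit code
```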

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring extra attention:

  • tests/e2e/scripts/analyze_failures.sh — Complex logic involving artifact preprocessing, redaction patterns, Claude CLI invocation with timeouts, error handling fallbacks, and secret sanitization; verify robustness of preprocessing decisions and Claude output parsing.
  • Makefile test-e2e target — Credential setup and conditional CI logic; ensure environment variable handling is correct and exit code propagation doesn't mask CI failures.
  • build/ci-Dockerfile — Multi-arch kubectl approach and Node.js installation via external setup script; verify architecture detection and package manager interactions.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from sseago and weshayutin December 3, 2025 15:53
@openshift-ci

openshift-ci bot commented Dec 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mpryc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 3, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
tests/e2e/scripts/analyze_failures.sh (2)

56-72: Tighten large‑log preprocessing to avoid duplication and follow shell best practices

Two points worth adjusting here:

1. Avoid masking get_file_size exit status (SC2155)

   Declaring and assigning file_size in one line can hide a failure from get_file_size. Prefer:

   -    local file_size=$(get_file_size "$log_file")
   +    local file_size
   +    file_size=$(get_file_size "$log_file")

2. Prevent double-processing must-gather logs

   The first find "${ARTIFACT_DIR}" ... walk will already include ${ARTIFACT_DIR}/must-gather/**, and then you loop over ${ARTIFACT_DIR}/must-gather again. That can duplicate work and inflate token usage for big runs. One simple fix is to exclude must-gather from the first search:

   -        done < <(find "${ARTIFACT_DIR}" -maxdepth 4 -name "*.log" -type f 2>/dev/null | while read f; do
   +        done < <(find "${ARTIFACT_DIR}" -maxdepth 4 -name "*.log" -type f ! -path "${ARTIFACT_DIR}/must-gather/*" 2>/dev/null | while read f; do

This keeps per-test logs and must-gather logs logically separated in the two loops and avoids redundant subagent calls.

Also applies to: 119-167
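
A self-contained sketch of what both suggestions look like together; `get_file_size` and the loop body here are stand-ins for the real helpers in `analyze_failures.sh`:

```bash
#!/usr/bin/env bash
ARTIFACT_DIR=${ARTIFACT_DIR:-/tmp/artifacts}

# Stand-in for the script's helper.
get_file_size() { stat -c%s "$1" 2>/dev/null || echo 0; }

summarize_log() {
    local log_file=$1
    local file_size
    # Assign on a separate line so a non-zero status from get_file_size is not masked (SC2155).
    file_size=$(get_file_size "$log_file")
    echo "would summarize ${log_file} (${file_size} bytes)"
}

# Excluding must-gather here leaves it to the dedicated must-gather loop,
# so each log is processed exactly once.
while IFS= read -r f; do
    summarize_log "$f"
done < <(find "${ARTIFACT_DIR}" -maxdepth 4 -name "*.log" -type f \
            ! -path "${ARTIFACT_DIR}/must-gather/*" 2>/dev/null)
```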


21-22: Make exit‑code handling and temp‑file cleanup more robust

The failure‑analysis behavior is correct, but a couple of small tweaks will harden the script:

1. Initialize and quote EXIT_CODE defensively

   Right now the script uses EXIT_CODE=$1, and later if [ $EXIT_CODE -ne 0 ]; then and exit $EXIT_CODE. This is fine when the caller always passes a numeric code, but it is easy to make the script more bullet-proof:

   -EXIT_CODE=$1
   +EXIT_CODE=${1:-0}
   ...
   -if [ $EXIT_CODE -ne 0 ]; then
   +if [ "$EXIT_CODE" -ne 0 ]; then
   ...
   -exit $EXIT_CODE
   +exit "$EXIT_CODE"

2. Use a single-quoted trap to avoid premature expansion (SC2064)

   -    TEMP_OUTPUT=$(mktemp)
   -    trap "rm -f $TEMP_OUTPUT" EXIT
   +    TEMP_OUTPUT=$(mktemp)
   +    trap 'rm -f "$TEMP_OUTPUT"' EXIT

This ensures the trap always refers to the actual temp file path and behaves correctly even if the variable were changed later.

Also applies to: 169-181, 188-193, 365-367, 372-373, 432-432
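
A minimal sketch combining both hardening suggestions (the surrounding analysis logic is omitted):

```bash
#!/usr/bin/env bash
EXIT_CODE=${1:-0}                  # default to 0 if the caller passes no argument

TEMP_OUTPUT=$(mktemp)
# Single quotes defer expansion until the trap fires, and the inner double
# quotes keep the path intact even if it contains spaces (SC2064).
trap 'rm -f "$TEMP_OUTPUT"' EXIT

if [ "$EXIT_CODE" -ne 0 ]; then
    echo "tests failed with exit code ${EXIT_CODE}; writing analysis scratch data to ${TEMP_OUTPUT}"
fi

exit "$EXIT_CODE"                  # always hand the original test result back to CI
```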

docs/design/claude-prow-failure-analysis_design.md (1)

629-648: Update test-e2e Makefile snippet to match the current implementation

The Makefile example here has diverged from the real test-e2e target (flag names, provider/credentials wiring, and CI‑only Claude setup). For someone copying this snippet, that mismatch will be confusing and could break local runs.

Recommend regenerating this section from the live Makefile (including the EXIT_CODE handling and analyze_failures.sh call) so the design doc remains an accurate reference.

build/ci-Dockerfile (1)

8-22: CI image wiring is correct; consider pinning Node/Claude versions

The changes correctly:

  • Ensure tests/e2e/scripts/analyze_failures.sh is executable for CI.
  • Switch kubectl installation to a multi‑arch flow using TARGETARCH.
  • Add Node.js and the global @anthropic-ai/claude-code CLI needed by the analysis script.

To improve reproducibility of CI images, consider pinning:

  • A specific Node.js major/minor (via a fixed NodeSource channel or explicit package version).
  • A specific @anthropic-ai/claude-code version (e.g. npm install -g @anthropic-ai/claude-code@<version>).

That will make it easier to audit and debug CI behavior over time when dependencies evolve.
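
A hedged sketch of what pinning could look like, assuming an rpm-based CI base image and a NodeSource-style setup script (as the review's mention of an external setup script suggests); the version numbers below are placeholders, not vetted releases:

```bash
# Placeholder versions -- substitute audited releases before adopting.
curl -fsSL https://rpm.nodesource.com/setup_20.x -o /tmp/nodesource_setup.sh \
    && bash /tmp/nodesource_setup.sh \
    && dnf install -y nodejs   # or yum/microdnf, depending on the base image
npm install -g @anthropic-ai/claude-code@1.0.0   # placeholder version pin
```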

Makefile (1)

860-886: EXIT_CODE capture and Claude invocation look good; optionally initialize EXIT_CODE explicitly

The updated test-e2e target correctly:

  • Runs ginkgo with the extended args (including $(HCP_EXTERNAL_ARGS)).
  • Captures the test exit status into EXIT_CODE via ... || EXIT_CODE=$$?.
  • In OpenShift CI, wires up Vertex AI env vars from /var/run/oadp-credentials/* and invokes analyze_failures.sh with the captured code.
  • Exits with the original test result so Prow status is preserved.

For extra robustness against odd environments where EXIT_CODE might already be set, you could initialize it in this recipe shell before running ginkgo:

test-e2e: test-e2e-setup install-ginkgo
	EXIT_CODE=0; \
	ginkgo run ... || EXIT_CODE=$$?; \
	...
	exit $$EXIT_CODE

The current behavior is still correct in normal CI usage; this is just a defensive hardening.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between fcd0541 and fa6ce87.

📒 Files selected for processing (6)
  • .claude/config.json (1 hunks)
  • CLAUDE.md (1 hunks)
  • Makefile (1 hunks)
  • build/ci-Dockerfile (1 hunks)
  • docs/design/claude-prow-failure-analysis_design.md (1 hunks)
  • tests/e2e/scripts/analyze_failures.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • build/ci-Dockerfile
  • CLAUDE.md
  • tests/e2e/scripts/analyze_failures.sh
  • Makefile
  • docs/design/claude-prow-failure-analysis_design.md
🪛 LanguageTool
docs/design/claude-prow-failure-analysis_design.md

[style] ~12-~12: Consider a different adjective to strengthen your wording.
Context: ...analysis is time-consuming and requires deep domain knowledge of Velero, CSI snapsho...

(DEEP_PROFOUND)


[grammar] ~26-~26: Use a hyphen to join words.
Context: ...n't block test result reporting) ## Non Goals - Live cluster diagnostics during...

(QB_NEW_EN_HYPHEN)


[grammar] ~956-~956: Ensure spelling is correct
Context: ...ly more complex plumbing Decision: Chose Option B for better error isolation and...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~1075-~1075: Ensure spelling is correct
Context: ...original test code - Vertex AI timeout (>10min): Script logs timeout, preserves test r...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~1256-~1256: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ifact Sets Question: How to handle very large must-gather archives or many per-test l...

(EN_WEAK_ADJECTIVE)


[style] ~1259-~1259: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...Claude Code CLI may truncate or fail on very large inputs - Some test runs generate extens...

(EN_WEAK_ADJECTIVE)

🪛 markdownlint-cli2 (0.18.1)
docs/design/claude-prow-failure-analysis_design.md

434-434: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


517-517: Headings must start at the beginning of the line

(MD023, heading-start-left)


527-527: Headings must start at the beginning of the line

(MD023, heading-start-left)


531-531: Headings must start at the beginning of the line

(MD023, heading-start-left)


532-532: Headings must start at the beginning of the line

(MD023, heading-start-left)


533-533: Headings must start at the beginning of the line

(MD023, heading-start-left)


534-534: Headings must start at the beginning of the line

(MD023, heading-start-left)


566-566: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)


596-596: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


623-623: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)


632-632: Hard tabs
Column: 1

(MD010, no-hard-tabs)


633-633: Hard tabs
Column: 1

(MD010, no-hard-tabs)


634-634: Hard tabs
Column: 1

(MD010, no-hard-tabs)


635-635: Hard tabs
Column: 1

(MD010, no-hard-tabs)


636-636: Hard tabs
Column: 1

(MD010, no-hard-tabs)


637-637: Hard tabs
Column: 1

(MD010, no-hard-tabs)


638-638: Hard tabs
Column: 1

(MD010, no-hard-tabs)


639-639: Hard tabs
Column: 1

(MD010, no-hard-tabs)


640-640: Hard tabs
Column: 1

(MD010, no-hard-tabs)


641-641: Hard tabs
Column: 1

(MD010, no-hard-tabs)


642-642: Hard tabs
Column: 1

(MD010, no-hard-tabs)


643-643: Hard tabs
Column: 1

(MD010, no-hard-tabs)


644-644: Hard tabs
Column: 1

(MD010, no-hard-tabs)


645-645: Hard tabs
Column: 1

(MD010, no-hard-tabs)


646-646: Hard tabs
Column: 1

(MD010, no-hard-tabs)


647-647: Hard tabs
Column: 1

(MD010, no-hard-tabs)


757-757: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


788-788: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


821-821: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


850-850: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


873-873: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
tests/e2e/scripts/analyze_failures.sh

[warning] 62-62: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 366-366: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

🔇 Additional comments (3)
.claude/config.json (1)

1-45: Permissions policy matches intended read‑only, sandboxed usage

Allow/deny lists are coherent with the log‑analysis use case and correctly block destructive and networked actions (rm, curl/wget, kubectl apply/delete, docker, git push, WebSearch/WebFetch). No issues from a security/CI safety standpoint.

tests/e2e/scripts/analyze_failures.sh (1)

27-44: Overall flow, safety checks, and redaction look solid

The script’s structure is sound:

  • It gates analysis on a non‑zero test exit code and explicit CLI/Vertex configuration checks.
  • It keeps analysis read‑only and bounded via timeout (60s per subagent, 600s overall) and early exits that preserve the original test result.
  • All user‑visible Claude output is passed through redact_secrets, which substantially reduces credential‑leakage risk in the generated markdown.

No blocking issues here from a correctness or security perspective.

Also applies to: 169-181, 368-424
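
As an illustration of the redaction step, a minimal filter in the spirit of `redact_secrets`; the patterns below are invented for the example and are not the script's actual rules:

```bash
# Hypothetical redaction filter: mask a few common credential shapes before
# the analysis output is written into the CI artifacts.
redact_secrets() {
    sed -E \
        -e 's/(AWS_SECRET_ACCESS_KEY[=:][[:space:]]*)[^[:space:]]+/\1[REDACTED]/g' \
        -e 's/(password[=:][[:space:]]*)[^[:space:]]+/\1[REDACTED]/g' \
        -e 's/(Bearer )[A-Za-z0-9._-]+/\1[REDACTED]/g'
}

# Usage: pipe any model-visible or model-generated text through the filter, e.g.
#   claude -p "$PROMPT" --add-dir "$ARTIFACT_DIR" | redact_secrets > "$REPORT"
```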

CLAUDE.md (1)

1-239: Claude integration overview is consistent with the implementation

The description of how Claude hooks into Prow (artifacts used, where claude-failure-analysis.md appears, and the opt‑out/credential behavior) matches the bash script and Makefile wiring. This should be a useful high‑level entry point for maintainers.

@weshayutin
Contributor

/test 4.20-e2e-test-aws

@weshayutin
Contributor

/test all

@openshift-ci

openshift-ci bot commented Dec 4, 2025

@mpryc: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.20-e2e-test-aws | fa6ce87 | link | true | /test 4.20-e2e-test-aws |
| ci/prow/4.20-e2e-test-cli-aws | fa6ce87 | link | true | /test 4.20-e2e-test-cli-aws |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
