
adding investigate-ci-failure skills #1659

Closed
blublinsky wants to merge 1 commit into openshift:main from blublinsky:ci-investigation

Conversation

Contributor

@blublinsky blublinsky commented May 27, 2026

Description

Summary
Adds an operator-specific investigate-ci-failure Cursor skill so agents can triage failed PR checks on openshift/lightspeed-operator using Prow and Konflux, in most cases without Konflux SSO.

Adapted from the lightspeed-service skill and validated against PR #1641 (smoke-tested gh, GCS, and Quay paths).

What’s in the skill
  • Dual CI entry point: gh pr checks; Prow commit statuses vs Konflux check runs
  • Prow (operator): GCS artifact layout for ci/prow/*, especially bundle-e2e-4-XX/e2e-test/build-log.txt (Ginkgo in test/e2e/), distinct from service e2e-ols-cluster/
  • Konflux: Task tables in check run output.text (not output.summary); build vs integration check patterns; Quay ols-operator-artifacts:{sha} and on-pr-{sha} retention
  • Repo reference: Prow job list, Konflux pipeline/check names, .tekton triggers (including related_images.json → ols-bundle-on-pull-request), local command mapping (make test, etc.)
  • Triage workflow: Prow → GCS → Konflux API → oras → Konflux UI fallback; report template (retry / fix / escalate)
  • Skill validation: Copy-paste smoke commands against a known green PR
How to use
Invoke the skill when a PR has red checks, e.g. with /investigate-ci-failure or by @-mentioning the skill together with a PR URL. The skill sets disable-model-invocation: true, so it never auto-attaches and must be invoked explicitly.
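As a rough illustration of the dual entry point, the sketch below sorts check names into Prow and Konflux buckets by name pattern alone. The `ci/prow/` prefix and the Konflux name patterns are the ones quoted elsewhere in this PR; the helper name `classify_check` and the catch-all `other` bucket are hypothetical, and the skill itself may separate the two by API source (commit statuses vs check runs) rather than by name.

```bash
#!/usr/bin/env bash
# classify_check NAME: hypothetical helper, not part of the skill.
# Prints "prow", "konflux", or "other" based on the check name only.
classify_check() {
  local name="$1"
  case "$name" in
    ci/prow/*)
      echo "prow" ;;
    *-on-pull-request|*operator-e2e-tests-*|*upgrade-e2e-tests|*service-e2e-tests-*)
      echo "konflux" ;;
    *)
      # Assumption: anything unrecognized (e.g. Enterprise Contract) is handled by hand.
      echo "other" ;;
  esac
}

for check in "ci/prow/unit" "ols-bundle-on-pull-request" "upgrade-e2e-tests"; do
  printf '%s -> %s\n' "$check" "$(classify_check "$check")"
done
```

Matching on names keeps the sketch offline and easy to test; a real triage run would still need the status or check-run payload to find the log links.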

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up dependent library

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guide for investigating CI build failures, including step-by-step procedures for diagnosing failures, retrieving build logs and artifacts, cross-referencing with code changes, and taking recommended actions.

@openshift-ci openshift-ci Bot requested review from joshuawilson and xrajesh May 27, 2026 10:55

coderabbitai Bot commented May 27, 2026

📝 Walkthrough


This PR introduces a comprehensive Cursor skill guide for investigating CI failures in the lightspeed-operator repository. The guide covers end-to-end procedures for diagnosing Prow and Konflux failures from PR URLs, including artifact triage, task analysis, job reference data, and validation workflows.

Changes

CI Failure Investigation Skill Guide

| Layer / File(s) | Summary |
|---|---|
| Document frontmatter and setup: .cursor/skills/investigate-ci-failure/SKILL.md (lines 1–6) | Metadata declarations (name: "Investigate CI Failure", purpose, description) and configuration to disable automatic model invocation. |
| Prow failure investigation procedure: .cursor/skills/investigate-ci-failure/SKILL.md (lines 7–186) | Complete workflow: extract PR metadata and changed files, list failed Prow commit statuses, construct GCS artifact URLs, triage using finished.json and build logs with artifact tree exploration, cross-reference with PR changes, and produce structured failure reports with retry/fix/escalate actions. |
| Konflux failure investigation procedure: .cursor/skills/investigate-ci-failure/SKILL.md (lines 187–462) | Full playbook for Konflux: list and select check runs, fetch check output for task triage, handle annotations, match task names to causes, cross-reference with PR changes, retrieve scan results from Quay via oras with media-type parsing, and provide fallback flows for user-provided logs. |
| Repository-specific CI job mappings and triage reference: .cursor/skills/investigate-ci-failure/SKILL.md (lines 473–546) | Tables documenting local make targets mapped to Prow jobs, Prow context URLs, Konflux check names (build/integration/EC), and an ordered triage sequence for diagnosing failed PR checks in this repository. |
| Skill validation and tool usage notes: .cursor/skills/investigate-ci-failure/SKILL.md (lines 547–589) | Smoke-test validation commands to verify check presence, GCS artifact availability, and Quay artifact discoverability against a known PR; guidance on gh CLI, oras tool, and WebFetch vs raw GCS endpoint usage. |
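The "construct GCS artifact URLs" step can be sketched as a pure string rewrite. This is a minimal sketch under stated assumptions: the Prow view prefix comes from the URL pattern quoted in the review, while the raw host `https://storage.googleapis.com/`, the helper name `gcs_base_from_prow_url`, and the placement of `finished.json` and `build-log.txt` directly under the build directory are assumptions rather than something this PR confirms. The build ID in the example is a made-up placeholder.

```bash
#!/usr/bin/env bash
PROW_VIEW_PREFIX="https://prow.ci.openshift.org/view/gs/"
GCS_RAW_HOST="https://storage.googleapis.com/"   # assumption: public raw GCS endpoint

# gcs_base_from_prow_url URL: hypothetical helper.
# Swaps the Prow view prefix for the raw GCS host; fails on any other URL.
gcs_base_from_prow_url() {
  local url="${1%/}"   # tolerate one trailing slash
  case "$url" in
    "$PROW_VIEW_PREFIX"*)
      printf '%s%s\n' "$GCS_RAW_HOST" "${url#"$PROW_VIEW_PREFIX"}" ;;
    *)
      echo "not a Prow view URL: $1" >&2
      return 1 ;;
  esac
}

# Placeholder job URL: PR number and job name appear in this PR, the build ID does not.
url="https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_lightspeed-operator/1641/pull-ci-openshift-lightspeed-operator-main-unit/1234567890"
base="$(gcs_base_from_prow_url "$url")"
echo "$base/finished.json"
echo "$base/build-log.txt"
```

The resulting URLs would then be fetched with a plain `curl -sf`, one explicit build at a time.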

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'adding investigate-ci-failure skills' directly and specifically describes the main change: adding a new Cursor skill for CI failure investigation. It is concise, clear, and accurately summarizes the primary addition in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



openshift-ci Bot commented May 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.cursor/skills/investigate-ci-failure/SKILL.md:
- Around line 218-223: The table row for "Enterprise Contract / trusted links"
is missing the third cell and triggers MD056; update that row in the Markdown
table to include an `output.summary` column (e.g., add a third pipe cell with a
short summary like "Use EC docs/UI — no task table" or "N/A") so the row has
three cells matching the headers (`Check type`, `Task table location`,
`output.summary`) and ensure the row uses the same pipe-separated format as the
other rows.
- Around line 54-56: Several fenced code blocks in SKILL.md are missing language
tags (e.g., the block containing the Prow URL ```` ``` https://prow... ````);
update every fenced block to include an appropriate language identifier so
markdownlint MD040 passes: use "text" for plain URLs/snippets, "bash" for
shell/CLI examples, and "json" for JSON payloads. Scan the file for all
triple-backtick blocks (including the ones noted in the review) and add the
correct tag right after the opening backticks (for example change ``` to ```text
or ```bash/```json) keeping the block content unchanged.
- Around line 563-565: The example uses a literal '*' in the HTTP URL (curl -sf
"$BASE/pull-ci-openshift-lightspeed-operator-main-unit/*/finished.json"), which
doesn't work because the URL is quoted (so the shell never expands it) and
curl's URL globbing covers only {} and [], not '*'; replace the
one-step wildcard example with a two-step approach: first retrieve the builds
index (e.g., list the directory at
"$BASE/pull-ci-openshift-lightspeed-operator-main-unit/"), determine the desired
build_id (latest or explicit), then request that specific finished.json URL
(i.e., construct and curl "$BASE/.../{build_id}/finished.json"); update the
SKILL.md example to show these two steps and emphasize preferring an explicit
build_id rather than a wildcard.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f51eeaca-b8e0-417f-bdae-c8b7e6008285

📥 Commits

Reviewing files that changed from the base of the PR and between f020286 and 6b1da49.

📒 Files selected for processing (1)
  • .cursor/skills/investigate-ci-failure/SKILL.md

Comment on lines +54 to +56
```
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/{org}_{repo}/{pr}/{job_name}/{build_id}
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks to satisfy markdownlint (MD040).

Several fenced blocks are missing a language tag. Add text/bash/json as appropriate to keep docs lint-clean.

Also applies to: 70-72, 78-80, 94-96, 99-112, 166-168, 422-424, 435-439, 443-447
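A quick local approximation of this check is sketched below. It is not markdownlint's implementation: the helper name `untagged_fences` is hypothetical, only three-backtick fences are handled (tilde fences and longer fences are ignored), and the fence string is built from octal escapes purely so the example does not contain a literal fence.

```bash
#!/usr/bin/env bash
# untagged_fences FILE: hypothetical helper.
# Prints the line number of every opening code fence that has no language tag.
untagged_fences() {
  local fence
  fence="$(printf '\140\140\140')"   # three backticks, written in octal
  awk -v fence="$fence" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)              # ignore indentation
      if (index(line, fence) != 1) next     # not a fence line
      if (in_fence) { in_fence = 0; next }  # closing fence
      in_fence = 1
      rest = substr(line, 4)                # text after the three backticks
      gsub(/[ \t]/, "", rest)
      if (rest == "") print FNR             # opening fence without a tag
    }
  ' "$1"
}

# Demo on a throwaway file: one untagged block, one tagged block.
fence="$(printf '\140\140\140')"
tmp="$(mktemp)"
printf '%s\n' "Intro" "$fence" "https://example.invalid/job" "$fence" \
  "${fence}bash" "echo ok" "$fence" > "$tmp"
untagged_fences "$tmp"   # prints 2
rm -f "$tmp"
```

Running the real linter remains the authoritative check; the sketch only helps locate candidates before a commit.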

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 54-54: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment on lines +218 to +223
| Check type | Task table location | `output.summary` |
|---|---|---|
| Build pipelines (`lightspeed-operator-on-pull-request`, `ols-bundle-on-pull-request`) | **`output.text`** — HTML `<h4>Task Statuses:</h4>` table with 🟢/🔴 and per-task Konflux UI log links | Often short (e.g. "Build pipeline … has passed") |
| Integration tests (`operator-e2e-tests-*`, `upgrade-e2e-tests`, `service-e2e-tests-*`, console tests) | **`output.text`** — markdown table: Task \| Duration \| Status \| Details; pipelinerun link at top | One line: "Integration test for component … has passed/failed" |
| Enterprise Contract / trusted links | `https://red.ht/trusted` in `gh pr checks` — use EC docs or UI; not a task table in the API |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the table row with missing third column (MD056).

The Enterprise Contract / trusted links row has only 2 cells in a 3-column table, which breaks rendering/linting. Add the missing output.summary cell.
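The column mismatch can be reproduced locally with a small cell counter. This is a minimal sketch, assuming every table row starts with a pipe and that escaped pipes (`\|`) stay inside their cell; the helper name `short_table_rows` is hypothetical and the logic only approximates what markdownlint's MD056 rule does.

```bash
#!/usr/bin/env bash
# short_table_rows FILE: hypothetical helper.
# Reports rows whose cell count differs from the first row of the same table.
short_table_rows() {
  awk '
    /^[ \t]*\|/ {
      row = $0
      gsub(/\\\|/, "", row)        # drop escaped pipes; they do not split cells
      sub(/^[ \t]*\|/, "", row)    # drop the leading pipe
      sub(/\|[ \t]*$/, "", row)    # drop the trailing pipe, if present
      cells = split(row, parts, "|")
      if (!in_table) { expected = cells; in_table = 1 }
      else if (cells != expected)
        printf "%d: expected %d, got %d\n", FNR, expected, cells
      next
    }
    { in_table = 0 }               # any other line ends the current table
  ' "$1"
}

# Demo with abbreviated rows modeled on the table under review.
tmp="$(mktemp)"
printf '%s\n' \
  '| Check type | Task table location | output.summary |' \
  '|---|---|---|' \
  '| Integration tests | Task \| Duration \| Status \| Details | One line |' \
  '| Enterprise Contract | not a task table in the API |' > "$tmp"
short_table_rows "$tmp"   # prints: 4: expected 3, got 2
rm -f "$tmp"
```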

🧰 Tools
🪛 LanguageTool

[uncategorized] ~221-~221: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ..., console tests) | **output.text`** — markdown table: Task | Duration | Status | De...

(MARKDOWN_NNP)

🪛 markdownlint-cli2 (0.22.1)

[warning] 222-222: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


Comment on lines +563 to +565
curl -sf "$BASE/pull-ci-openshift-lightspeed-operator-main-unit/*/finished.json" 2>/dev/null | head -1
# Prefer explicit build_id from gh pr checks URL tail


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the broken wildcard curl example in smoke tests.

curl does not expand * in HTTP URLs, so this command won’t reliably fetch finished.json and can mislead validation.

Proposed doc fix
-# Or take build_id from prow target_url in statuses API
-curl -sf "$BASE/pull-ci-openshift-lightspeed-operator-main-unit/*/finished.json" 2>/dev/null | head -1
-# Prefer explicit build_id from gh pr checks URL tail
+# Prefer explicit build_id from prow target_url (statuses API) or gh pr checks URL tail
+BUILD_ID=$(gh api "repos/$REPO/statuses/$SHA" \
+  --jq '.[] | select(.context=="ci/prow/unit") | .target_url' \
+  | awk -F/ '{print $NF}' | head -1)
+curl -sf "$BASE/pull-ci-openshift-lightspeed-operator-main-unit/$BUILD_ID/finished.json" | head -1
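One caveat with taking the last `/`-separated field is that a URL ending in a slash yields an empty build ID. The sketch below shows a slightly more defensive extraction using only shell parameter expansion. The helper name `build_id_from_url` is hypothetical, the assumption that build IDs are purely numeric is not stated anywhere in this PR, and the build ID in the example is a placeholder.

```bash
#!/usr/bin/env bash
# build_id_from_url URL: hypothetical helper.
# Prints the last path segment if it is numeric; fails otherwise.
build_id_from_url() {
  local url="${1%/}"      # tolerate one trailing slash
  local id="${url##*/}"   # keep everything after the last slash
  case "$id" in
    ''|*[!0-9]*)
      # Assumption: a valid build ID consists of digits only.
      echo "no numeric build ID at the end of: $1" >&2
      return 1 ;;
    *)
      printf '%s\n' "$id" ;;
  esac
}

# Placeholder URL; prints 1234567890
build_id_from_url "https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_lightspeed-operator/1641/pull-ci-openshift-lightspeed-operator-main-unit/1234567890/"
```

Failing loudly on a non-numeric tail keeps a malformed `target_url` from turning into a request for the wrong `finished.json`.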


openshift-ci Bot commented May 27, 2026

@blublinsky: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@blublinsky blublinsky closed this May 27, 2026