adding investigate-ci-failure skills by blublinsky · Pull Request #1659 · openshift/lightspeed-operator

blublinsky · 2026-05-27T10:55:41Z

Description

Summary
Adds an operator-specific investigate-ci-failure Cursor skill so agents can triage failed PR checks on openshift/lightspeed-operator using Prow and Konflux, in most cases without Konflux SSO.

Adapted from the lightspeed-service skill and validated against PR #1641 (smoke-tested gh, GCS, and Quay paths).

What’s in the skill
- Dual CI entry point: gh pr checks, Prow commit statuses vs. Konflux check runs
- Prow (operator): GCS artifact layout for ci/prow/*, especially bundle-e2e-4-XX/e2e-test/build-log.txt (Ginkgo in test/e2e/), distinct from service e2e-ols-cluster/
- Konflux: task tables in check run output.text (not output.summary); build vs. integration check patterns; Quay ols-operator-artifacts:{sha} and on-pr-{sha} retention
- Repo reference: Prow job list, Konflux pipeline/check names, .tekton triggers (including related_images.json → ols-bundle-on-pull-request), local command mapping (make test, etc.)
- Triage workflow: Prow → GCS → Konflux API → oras → Konflux UI fallback; report template (retry / fix / escalate)
- Skill validation: copy-paste smoke commands against a known green PR
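The dual entry point in the list above can be sketched with a few shell wrappers around the GitHub API. This is only an illustration, assuming an authenticated `gh` CLI; the function names and jq filters are hypothetical and not taken from SKILL.md. The combined status endpoint is used so that only the latest status per context is considered.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names and filters are illustrative, not from SKILL.md.

# Prow reports through commit statuses. The combined status endpoint
# returns the latest status per context, so stale "pending" entries are skipped.
list_failed_prow_statuses() {
  local repo="$1" sha="$2"
  gh api "repos/${repo}/commits/${sha}/status" \
    --jq '.statuses[] | select(.context | startswith("ci/prow/")) | select(.state == "failure" or .state == "error") | "\(.context)\t\(.state)\t\(.target_url)"'
}

# Konflux reports through check runs; print id, name, and conclusion of failed ones.
# Only the first page of results is read here.
list_failed_check_runs() {
  local repo="$1" sha="$2"
  gh api "repos/${repo}/commits/${sha}/check-runs" \
    --jq '.check_runs[] | select(.conclusion == "failure") | "\(.id)\t\(.name)\t\(.conclusion)"'
}

# Per the PR description, task tables live in output.text, not output.summary.
check_run_text() {
  local repo="$1" run_id="$2"
  gh api "repos/${repo}/check-runs/${run_id}" --jq '.output.text'
}
```

A typical call would be `list_failed_prow_statuses openshift/lightspeed-operator "$SHA"`, followed by `check_run_text` on any check run ID reported by `list_failed_check_runs`.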
How to use
Invoke the skill when a PR has red checks, e.g. with /investigate-ci-failure or by @-mentioning the skill together with a PR URL. The skill sets disable-model-invocation: true, so it does not auto-attach and must be invoked explicitly.

Type of change

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Documentation
- Added comprehensive guide for investigating CI build failures, including step-by-step procedures for diagnosing failures, retrieving build logs and artifacts, cross-referencing with code changes, and taking recommended actions.

coderabbitai · 2026-05-27T10:55:56Z

📝 Walkthrough


This PR introduces a comprehensive Cursor skill guide for investigating CI failures in the lightspeed-operator repository. The guide covers end-to-end procedures for diagnosing Prow and Konflux failures from PR URLs, including artifact triage, task analysis, job reference data, and validation workflows.

Changes

CI Failure Investigation Skill Guide

| Layer / File(s) | Summary |
|---|---|
| Document frontmatter and setup: `.cursor/skills/investigate-ci-failure/SKILL.md` (lines 1–6) | Metadata declarations (name: "Investigate CI Failure", purpose, description) and configuration to disable automatic model invocation. |
| Prow failure investigation procedure: `.cursor/skills/investigate-ci-failure/SKILL.md` (lines 7–186) | Complete workflow: extract PR metadata and changed files, list failed Prow commit statuses, construct GCS artifact URLs, triage using `finished.json` and build logs with artifact tree exploration, cross-reference with PR changes, and produce structured failure reports with retry/fix/escalate actions. |
| Konflux failure investigation procedure: `.cursor/skills/investigate-ci-failure/SKILL.md` (lines 187–462) | Full playbook for Konflux: list and select check runs, fetch check output for task triage, handle annotations, match task names to causes, cross-reference with PR changes, retrieve scan results from Quay via `oras` with media-type parsing, and provide fallback flows for user-provided logs. |
| Repository-specific CI job mappings and triage reference: `.cursor/skills/investigate-ci-failure/SKILL.md` (lines 473–546) | Tables documenting local make targets mapped to Prow jobs, Prow context URLs, Konflux check names (build/integration/EC), and an ordered triage sequence for diagnosing failed PR checks in this repository. |
| Skill validation and tool usage notes: `.cursor/skills/investigate-ci-failure/SKILL.md` (lines 547–589) | Smoke-test validation commands to verify check presence, GCS artifact availability, and Quay artifact discoverability against a known PR; guidance on `gh` CLI, `oras` tool, and WebFetch vs raw GCS endpoint usage. |
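The "construct GCS artifact URLs" step in the table above can be illustrated with a small pure-shell sketch. The Prow view URL layout follows the template quoted later in this review; the `storage.googleapis.com` host for raw artifact access and all function names are assumptions made for illustration, not values confirmed by SKILL.md.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; GCS_HTTP_BASE is an assumed public endpoint.
PROW_VIEW_BASE="https://prow.ci.openshift.org/view/gs/test-platform-results"
GCS_HTTP_BASE="https://storage.googleapis.com/test-platform-results"

# Arguments: org repo pr job_name build_id
prow_job_path() {
  printf 'pr-logs/pull/%s_%s/%s/%s/%s' "$1" "$2" "$3" "$4" "$5"
}

# Human-facing Prow page for one job run.
prow_view_url() {
  printf '%s/%s\n' "$PROW_VIEW_BASE" "$(prow_job_path "$@")"
}

# Raw artifact URL. Arguments: org repo pr job_name build_id relative_path
gcs_artifact_url() {
  local rel="$6"
  printf '%s/%s/%s\n' "$GCS_HTTP_BASE" \
    "$(prow_job_path "$1" "$2" "$3" "$4" "$5")" "$rel"
}
```

For example, `gcs_artifact_url openshift lightspeed-operator "$PR" "$JOB" "$BUILD_ID" finished.json` yields a URL that can be passed to `curl -sf`. Keeping the path construction in one function means the view URL and the artifact URL cannot drift apart.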

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'adding investigate-ci-failure skills' directly and specifically describes the main change: adding a new Cursor skill for CI failure investigation. It is concise, clear, and accurately summarizes the primary addition in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


openshift-ci · 2026-05-27T10:55:56Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.cursor/skills/investigate-ci-failure/SKILL.md:
- Around line 218-223: The table row for "Enterprise Contract / trusted links"
is missing the third cell and triggers MD056; update that row in the Markdown
table to include an `output.summary` column (e.g., add a third pipe cell with a
short summary like "Use EC docs/UI — no task table" or "N/A") so the row has
three cells matching the headers (`Check type`, `Task table location`,
`output.summary`) and ensure the row uses the same pipe-separated format as the
other rows.
- Around line 54-56: Several fenced code blocks in SKILL.md are missing language
tags (e.g., the block containing the Prow URL ```` ``` https://prow... ````);
update every fenced block to include an appropriate language identifier so
markdownlint MD040 passes: use "text" for plain URLs/snippets, "bash" for
shell/CLI examples, and "json" for JSON payloads. Scan the file for all
triple-backtick blocks (including the ones noted in the review) and add the
correct tag right after the opening backticks (for example change ``` to ```text
or ```bash/```json) keeping the block content unchanged.
- Around line 563-565: The example uses a literal '*' in the HTTP URL (curl -sf
"$BASE/pull-ci-openshift-lightspeed-operator-main-unit/*/finished.json"), which
does not work: the quoted '*' reaches curl unchanged, and curl does not expand
'*' in URLs. Replace the one-step wildcard example with a two-step approach:
first retrieve the builds index (e.g., list the directory at
"$BASE/pull-ci-openshift-lightspeed-operator-main-unit/"), determine the desired
build_id (latest or explicit), then request that specific finished.json URL
(i.e., construct and curl "$BASE/.../{build_id}/finished.json"). Update the
SKILL.md example to show these two steps and emphasize preferring an explicit
build_id over a wildcard.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f51eeaca-b8e0-417f-bdae-c8b7e6008285

📥 Commits

Reviewing files that changed from the base of the PR and between f020286 and 6b1da49.

📒 Files selected for processing (1)

.cursor/skills/investigate-ci-failure/SKILL.md

coderabbitai · 2026-05-27T10:58:27Z

+```
+https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/{org}_{repo}/{pr}/{job_name}/{build_id}
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks to satisfy markdownlint (MD040).

Several fenced blocks are missing a language tag. Add text/bash/json as appropriate to keep docs lint-clean.

Also applies to: 70-72, 78-80, 94-96, 99-112, 166-168, 422-424, 435-439, 443-447

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 54-54: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


coderabbitai · 2026-05-27T10:58:27Z

+| Check type | Task table location | `output.summary` |
+|---|---|---|
+| Build pipelines (`lightspeed-operator-on-pull-request`, `ols-bundle-on-pull-request`) | **`output.text`** — HTML `<h4>Task Statuses:</h4>` table with 🟢/🔴 and per-task Konflux UI log links | Often short (e.g. "Build pipeline … has passed") |
+| Integration tests (`operator-e2e-tests-*`, `upgrade-e2e-tests`, `service-e2e-tests-*`, console tests) | **`output.text`** — markdown table: Task \| Duration \| Status \| Details; pipelinerun link at top | One line: "Integration test for component … has passed/failed" |
+| Enterprise Contract / trusted links | `https://red.ht/trusted` in `gh pr checks` — use EC docs or UI; not a task table in the API |
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the table row with missing third column (MD056).

The Enterprise Contract / trusted links row has only 2 cells in a 3-column table, which breaks rendering/linting. Add the missing output.summary cell.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~221-~221: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ..., console tests) | **output.text`** — markdown table: Task | Duration | Status | De...

(MARKDOWN_NNP)

🪛 markdownlint-cli2 (0.22.1)

[warning] 222-222: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


coderabbitai · 2026-05-27T10:58:27Z

+curl -sf "$BASE/pull-ci-openshift-lightspeed-operator-main-unit/*/finished.json" 2>/dev/null | head -1
+# Prefer explicit build_id from gh pr checks URL tail
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the broken wildcard curl example in smoke tests.

curl does not expand * in HTTP URLs, so this command won’t reliably fetch finished.json and can mislead validation.

Proposed doc fix

-# Or take build_id from prow target_url in statuses API
-curl -sf "$BASE/pull-ci-openshift-lightspeed-operator-main-unit/*/finished.json" 2>/dev/null | head -1
-# Prefer explicit build_id from gh pr checks URL tail
+# Prefer explicit build_id from prow target_url (statuses API) or gh pr checks URL tail
+BUILD_ID=$(gh api "repos/$REPO/statuses/$SHA" \
+  --jq '.[] | select(.context=="ci/prow/unit") | .target_url' \
+  | awk -F/ '{print $NF}' | head -1)
+curl -sf "$BASE/pull-ci-openshift-lightspeed-operator-main-unit/$BUILD_ID/finished.json" | head -1
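The "URL tail" step in the proposed fix can also be done without awk. The sketch below is a hypothetical helper, not part of the PR: it takes the last path segment of a Prow target URL and rejects anything that is not purely numeric, on the assumption that Prow build IDs are numeric. That check would also catch a stray `*` before it reaches curl.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the build ID from the tail of a Prow target_url.
build_id_from_url() {
  local url="${1%/}"      # drop one trailing slash, if present
  local id="${url##*/}"   # keep everything after the last slash
  case "$id" in
    ''|*[!0-9]*)
      # Assumption: build IDs are numeric; anything else is rejected.
      echo "not a numeric build id: '$id'" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$id"
}
```

With this in place, `BUILD_ID=$(build_id_from_url "$TARGET_URL") || exit 1` fails early instead of requesting a finished.json path that cannot exist.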


openshift-ci · 2026-05-27T11:11:55Z

@blublinsky: all tests passed!


adding investigate-ci-failure skills

6b1da49

openshift-ci Bot requested review from joshuawilson and xrajesh May 27, 2026 10:55

coderabbitai Bot reviewed May 27, 2026


blublinsky closed this May 27, 2026

blublinsky wants to merge 1 commit into openshift:main from blublinsky:ci-investigation
