NO-JIRA: tls-scanner: Fix exit code handling, lingering failed pods, and clarify docs #78789
|
Skipping CI for Draft Pull Request. |
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-84968, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough
Switches the scanner container command from
Changes
- TLS Scanner Execution & Completion Signaling
- PQC_CHECK Documentation Wording
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ Passed checks (12 passed)
|
/jira refresh |
|
/test all |
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-84968, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-84968, which is valid. 3 validation(s) were run on this bug
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ci-operator/step-registry/tls/scanner/run/tls-scanner-run-commands.sh`:
- Around line 77-83: The script captures the scanner exit in SCAN_EXIT_CODE from
${PIPESTATUS[0]} but never propagates it (the exit is commented out), so change
the flow to preserve artifact collection yet exit with the scanner's code: after
touching /results/scan.done and the sleep 120 (or replace sleep with a
background wait if needed), add an explicit exit ${SCAN_EXIT_CODE} so the
container exits with the original scanner status; ensure you reference
SCAN_EXIT_CODE and PIPESTATUS exactly as used in the snippet.
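The flow described in this comment can be sketched in Bash as follows. This is only an illustration under stated assumptions, not the script from the PR: `run_scan` is a hypothetical stand-in for the real scanner command, `RESULTS_DIR` is a temporary directory standing in for `/results`, and `GRACE_SECONDS` is an illustrative knob replacing the hard-coded `sleep 120`.

```shell
#!/bin/bash
# Hypothetical stand-in for the scanner: prints one line and fails with status 3.
run_scan() {
  echo "scanning..."
  return 3
}

# Illustrative results directory; the script under review writes to /results.
RESULTS_DIR="$(mktemp -d)"

scan_and_signal() {
  # The pipeline's own status is tee's, so read the scanner's from PIPESTATUS.
  run_scan 2>&1 | tee "${RESULTS_DIR}/output.log"
  local scan_exit_code=${PIPESTATUS[0]}
  echo "Scan complete. Exit code: ${scan_exit_code}"
  # Signal completion so that artifacts can be collected.
  touch "${RESULTS_DIR}/scan.done"
  # Grace period for artifact collection (120 seconds in the reviewed script).
  sleep "${GRACE_SECONDS:-0}"
  # Propagate the scanner's status instead of masking it.
  return "${scan_exit_code}"
}

status=0
scan_and_signal || status=$?
echo "Container would exit with: ${status}"
```

In a container entrypoint, the last two lines would presumably be replaced by `scan_and_signal; exit $?`, so that the pod phase reflects the scanner result while the marker file and the grace period are still honored.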
📒 Files selected for processing (1)
ci-operator/step-registry/tls/scanner/run/tls-scanner-run-commands.sh
dbee2ce to d23a0cb
|
@DavidHurta let's coordinate on this, because I just put up a similar fix here: #78781, though your pipefail + PIPESTATUS[0] is an improvement that is worth keeping, I think. My PR fixes the exit code issue for the artifacts and uses |
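For readers less familiar with the two mechanisms mentioned here, the following is a small, self-contained Bash sketch (not taken from either PR) of how `pipefail` and `PIPESTATUS[0]` differ. The commands `false`, `cat`, and `( exit 7 )` are placeholders for the scanner and `tee`.

```shell
#!/bin/bash
# Without pipefail, a pipeline reports the status of its last command only.
set +o pipefail
false | cat
default_status=$?

# With pipefail, the pipeline reports the last non-zero status of any stage.
set -o pipefail
pipefail_status=0
false | cat || pipefail_status=$?

# PIPESTATUS holds one status per stage, even with pipefail switched off.
set +o pipefail
( exit 7 ) | cat
first_status=${PIPESTATUS[0]}

echo "default=${default_status} pipefail=${pipefail_status} first=${first_status}"
```

Note that `PIPESTATUS` is overwritten by the next command that runs, so it has to be read immediately after the pipeline.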
|
Hi @smith-xyz! Haha, I was looking for an existing similar PR yesterday, and there wasn't any 😁 |
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-84968. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state. |
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-84968, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. |
|
Misclick 🙌 |
|
@smith-xyz, sure, let's use your PR to fix the artifact collection. I will then repurpose this PR to propose changes for the remaining potential issues, which this PR aimed to fix/address. |
|
[REHEARSALNOTIFIER]
A total of 36 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here. |
|
@DavidHurta: This pull request explicitly references no jira issue. |
|
/unhold |
|
/pj-rehearse pull-ci-openshift-tls-scanner-release-5.0-tls13-adherence |
|
@DavidHurta: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
oc wait --for=jsonpath='{.status.phase}'=Succeeded pod/tls-scanner -n "${NAMESPACE}" --timeout=4h || {
  echo "Scanner did not complete successfully"
oc wait --for=jsonpath='{.status.phase}'=Succeeded pod/tls-scanner -n "${NAMESPACE}" --timeout=10m || {
This was much needed before the ad819e6 changes. It is less critical now, but the 4-hour timeout still seems excessive and is still prone to leaving lingering pods.
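A bounded wait with cleanup could look roughly like the sketch below. It is one possible shape, not the code in this PR: the `wait_for_scanner` function, the `WAIT_TIMEOUT` variable, the default namespace, and the decision to delete the pod on failure are all assumptions made for illustration. Only the `oc wait` invocation mirrors the snippet under review. `OC` is overridable so the function can be exercised without a cluster.

```shell
#!/bin/bash
# Command used to talk to the cluster; overridable for dry runs (assumption).
OC="${OC:-oc}"
# Hypothetical default namespace, used only when NAMESPACE is not set.
NAMESPACE="${NAMESPACE:-tls-scanner}"

# Wait for the scanner pod to reach Succeeded. On failure or timeout, report
# it and delete the pod so that it does not linger in the namespace.
wait_for_scanner() {
  local timeout="${WAIT_TIMEOUT:-10m}"
  if "${OC}" wait --for=jsonpath='{.status.phase}'=Succeeded \
      pod/tls-scanner -n "${NAMESPACE}" --timeout="${timeout}"; then
    return 0
  fi
  echo "Scanner did not complete successfully" >&2
  # Illustrative cleanup; do not block on pod termination.
  "${OC}" delete pod/tls-scanner -n "${NAMESPACE}" --ignore-not-found --wait=false
  return 1
}
```

A step script would then call `wait_for_scanner || exit 1`. Any artifact collection would need to happen before the delete, which this sketch leaves out.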
|
@coderabbitai summary |
✅ Actions performed: Summary regeneration triggered. |
|
@coderabbitai review |
✅ Actions performed: Review triggered. |
|
@coderabbitai full review |
✅ Actions performed: Full review triggered. |
♻️ Duplicate comments (1)
ci-operator/step-registry/tls/scanner/run/tls-scanner-run-commands.sh (1)
81-87: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Scanner failures are still masked because `SCAN_EXIT_CODE` is never propagated.
`SCAN_EXIT_CODE` is captured at Line 81, but Line 86-Line 87 keeps the exit commented, so the pod still exits successfully after `sleep 120` even when the scan fails.
Proposed fix:
      SCAN_EXIT_CODE=\${PIPESTATUS[0]}
      echo "Scan complete. Exit code: \${SCAN_EXIT_CODE}" | tee -a /results/output.log
      touch /results/scan.done
      # Keep pod alive for artifact collection
      sleep 120
    - # We are intentionally ignoring the scanner exit code for the moment
    - # exit \${SCAN_EXIT_CODE}
    + exit \${SCAN_EXIT_CODE}
Run this read-only check to confirm current behavior in-branch:
    #!/bin/bash
    set -euo pipefail
    FILE="ci-operator/step-registry/tls/scanner/run/tls-scanner-run-commands.sh"
    echo "Checking scanner exit-code handling block..."
    sed -n '76,90p' "$FILE"
    echo
    echo "Verifying capture vs propagation markers..."
    rg -n 'PIPESTATUS\[0\]|scan\.done|exit \\\$\{SCAN_EXIT_CODE\}|intentionally ignoring' "$FILE"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/tls/scanner/run/tls-scanner-run-commands.sh` around lines 81 - 87, The script captures the scanner result into SCAN_EXIT_CODE but never propagates it, so un-comment and restore exit propagation after the sleep (or explicitly call "exit ${SCAN_EXIT_CODE}") so the pod returns the scanner's status; update the block around SCAN_EXIT_CODE, the touch /results/scan.done line, and the sleep 120 line to ensure the script ends with exit ${SCAN_EXIT_CODE} (or equivalent explicit exit) instead of a commented-out exit, so failures are not masked.
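The `scan.done` marker and the post-scan sleep imply that something outside the scanner waits for that file before collecting artifacts. How the collecting side is actually implemented is not shown in this thread, so the following is only a hypothetical sketch of such a wait loop, using a local path and an invented `wait_for_marker` helper.

```shell
#!/bin/bash
# Poll for a completion marker until it appears or the deadline passes.
# Arguments: marker path, timeout in seconds (default 60), poll interval
# in seconds (default 1). Returns 0 if the marker exists, 1 on timeout.
wait_for_marker() {
  local marker="$1" timeout="${2:-60}" interval="${3:-1}"
  # SECONDS is a Bash built-in counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  while [ ! -e "${marker}" ]; do
    if [ "${SECONDS}" -ge "${deadline}" ]; then
      echo "Timed out waiting for ${marker}" >&2
      return 1
    fi
    sleep "${interval}"
  done
  return 0
}
```

For example, `wait_for_marker /results/scan.done 120 5` would poll every 5 seconds for up to two minutes. In a real job the check would more likely run against the pod than against a local path.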
📒 Files selected for processing (2)
ci-operator/step-registry/tls/scanner/run/tls-scanner-run-commands.sh
ci-operator/step-registry/tls/scanner/run/tls-scanner-run-ref.yaml
@@ -80,6 +80,7 @@ spec:
 --log-file /results/scan.log 2>&1 | tee /results/output.log
 SCAN_EXIT_CODE=\${PIPESTATUS[0]}
 echo "Scan complete. Exit code: \${SCAN_EXIT_CODE}" | tee -a /results/output.log
Suggested change:
- echo "Scan complete. Exit code: \${SCAN_EXIT_CODE}" | tee -a /results/output.log
+ echo "Scan complete. Exit code: \${SCAN_EXIT_CODE}"
|
/pj-rehearse pull-ci-openshift-tls-scanner-main-smoke-tls |
|
@smith-xyz: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/lgtm |
|
@smith-xyz: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: DavidHurta, smith-xyz. The full list of commands accepted by this bot can be found here. The pull request process is described here. |
|
@DavidHurta: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
The PR aims to fix exit code handling and lingering failed pods, and to clarify the documentation.
These are a few things I found while integrating the steps into our component's CI. I am happy to address any feedback and to drop any commits.