OCPBUGS-84218: Fix units rollback if update failure#5876
OCPBUGS-84218: Fix units rollback if update failure#5876openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
Conversation
This changes fixes an issue that happened only during rollbacks that had updated existing units. The existing modified unit were never rolledback as the write logic was using the non-rollback arguments effectively writting the target units twice, instead of going back to the previous units. Signed-off-by: Pablo Rodriguez Nava <git@amail.pablintino.eu>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@pablintino: This pull request references Jira Issue OCPBUGS-84218, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThe rollback logic in Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 12✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/daemon/update.go (1)
1206-1207: Consider centralizing rollback unit selection to reduce future regressions.The same reverse-diff snippet is duplicated in two paths, and argument order is semantically non-obvious. A helper would make intent explicit.
♻️ Suggested refactor
+func unitsToRestoreOnRollback(failedIgnConfig, previousIgnConfig ign3types.Config) []ign3types.Unit { + rollbackUnitDiff := ctrlcommon.GetChangedConfigUnitsByType(&failedIgnConfig, &previousIgnConfig) + return slices.Concat(rollbackUnitDiff.Added, rollbackUnitDiff.Updated) +} ... -rollbackUnitDiff := ctrlcommon.GetChangedConfigUnitsByType(&newIgnConfig, &oldIgnConfig) -if err := dn.updateFiles(newIgnConfig, oldIgnConfig, slices.Concat(rollbackUnitDiff.Added, rollbackUnitDiff.Updated), skipCertificateWrite, false); err != nil { +if err := dn.updateFiles(newIgnConfig, oldIgnConfig, unitsToRestoreOnRollback(newIgnConfig, oldIgnConfig), skipCertificateWrite, false); err != nil { ... } ... -rollbackUnitDiff := ctrlcommon.GetChangedConfigUnitsByType(&newIgnConfig, &oldIgnConfig) -if err := dn.updateFiles(newIgnConfig, oldIgnConfig, slices.Concat(rollbackUnitDiff.Added, rollbackUnitDiff.Updated), false, false); err != nil { +if err := dn.updateFiles(newIgnConfig, oldIgnConfig, unitsToRestoreOnRollback(newIgnConfig, oldIgnConfig), false, false); err != nil { ... }Also applies to: 1427-1428
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/daemon/update.go` around lines 1206 - 1207, The duplicated reverse-diff logic should be extracted into a single helper to make intent explicit and avoid argument-order mistakes: create a function (e.g., buildRollbackUnits or selectRollbackUnits) that accepts oldIgnConfig, newIgnConfig and returns the concat of rollback unit names (using ctrlcommon.GetChangedConfigUnitsByType and slices.Concat internally) and use it in both call sites instead of duplicating the snippet; then call dn.updateFiles(newIgnConfig, oldIgnConfig, helper(...), skipCertificateWrite, false) so the selection logic is centralized and the meaning of the unit list is unambiguous.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/daemon/update.go`:
- Around line 1206-1207: The duplicated reverse-diff logic should be extracted
into a single helper to make intent explicit and avoid argument-order mistakes:
create a function (e.g., buildRollbackUnits or selectRollbackUnits) that accepts
oldIgnConfig, newIgnConfig and returns the concat of rollback unit names (using
ctrlcommon.GetChangedConfigUnitsByType and slices.Concat internally) and use it
in both call sites instead of duplicating the snippet; then call
dn.updateFiles(newIgnConfig, oldIgnConfig, helper(...), skipCertificateWrite,
false) so the selection logic is centralized and the meaning of the unit list is
unambiguous.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 1b0af6c2-0d56-409f-bd9b-5f18f649ef9d
📒 Files selected for processing (1)
pkg/daemon/update.go
|
/jira refresh |
|
@yuqi-zhang: This pull request references Jira Issue OCPBUGS-84218, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Scheduling tests matching the |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pablintino, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Verified usin IPI on AWS with proxy
A first one to create the elements that we will modify later in the test A second one modifying those values and forcing a failure in the image (not pullable) to trigger the rollback We could verify that the rollback was properly working and it was not failing and blocking the workflow in the confidrif checks. We found that likely the password and sshkeys are not being restored. However, it doesn't make the config drift fail, and it doesn't block the workflow. This issue can be fixed in another PR. Currently running the regression tests related to "units" and "config drift". When those test cases are passed we can add the verified label. PS: all "units" and "config drift" test cases passed. We added the verified label. Thank you for the quick fix!! |
|
/retest-required /verified by @sergiordlr |
|
@sergiordlr: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
/override ci/prow/e2e-hypershift |
|
@pablintino: Overrode contexts on behalf of pablintino: ci/prow/e2e-hypershift DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test unit |
|
@pablintino: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@pablintino: Jira Issue Verification Checks: Jira Issue OCPBUGS-84218 Jira Issue OCPBUGS-84218 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Fix included in release 5.0.0-0.nightly-2026-04-25-015310 |
Fixes: #OCPBUGS-84218
- What I did
This changes fixes an issue that happened only during rollbacks that had updated existing units. The existing modified unit were never rolledback as the write logic was using the non-rollback arguments effectively writting the target units twice, instead of going back to the previous units.
- How to verify it
(credit goes to @sergiordlr)
- Description for the changelog
Fix the rollback logic of systemd units in case of a mid-update error.
Summary by CodeRabbit
Bug Fixes