Bug 1861026: daemon: perform other rpm-ostree operations after OS rebase #2029

Merged

sinnykumari
Contributor

We perform operations such as the rt-kernel switch using packages
from the latest machine-os-content, which is the image we are about
to rebase the OS to. It therefore makes sense to overlay additional
packages only after performing the OS rebase. This also prevents issues
such as missing dependencies that may now already be part of BaseOS.

Related: https://bugzilla.redhat.com/show_bug.cgi?id=1861026
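
The ordering described above can be sketched roughly as follows. This is only an illustration of "rebase first, overlay afterwards", not the daemon's actual code: `planUpdate` and the step names are hypothetical and do not come from the machine-config-operator.

```go
package main

import "fmt"

// planUpdate returns the order in which a hypothetical updater would run
// its rpm-ostree operations. The function and the step names are
// illustrative assumptions, not identifiers from the real daemon.
func planUpdate(osChanged, rtKernel bool) []string {
	var steps []string
	if osChanged {
		// Rebase first so that later steps resolve packages against
		// the new base OS rather than the old one.
		steps = append(steps, "rebase OS")
	}
	if rtKernel {
		// Package-level changes such as the rt-kernel switch come
		// after the rebase.
		steps = append(steps, "switch to rt-kernel")
	}
	return steps
}

func main() {
	fmt.Println(planUpdate(true, true))
}
```

Running the package operations after the rebase means a dependency that has moved into the base OS is already present when the overlay is computed.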

@openshift-ci-robot
Contributor

@sinnykumari: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

daemon: perform other rpm-ostree operations after OS rebase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the approved label on Aug 27, 2020
@sinnykumari
Contributor Author

Going to test it locally to ensure it fixes the bug
/hold

openshift-ci-robot added the do-not-merge/hold label on Aug 27, 2020
Member

@cgwalters cgwalters left a comment

LGTM

return err
}

defer func() {
if retErr != nil {
if err := dn.switchKernel(newConfig, oldConfig); err != nil {
retErr = errors.Wrapf(retErr, "error rolling back Real time Kernel %v", err)
if err := dn.applyOSChanges(newConfig, oldConfig); err != nil {
Member

Not strictly related to this change but we're now grouping the "defer rollback" of the kargs/kernel type switch - I was going to comment about how this might cause us to incorrectly roll kargs back if we failed at the kernel type switch but then I realized:

The only defer rollback we need is the removePendingDeployment() - that reverts everything.

(I think that defer will run last and hence win, so this code is fine as is)
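
The point about defer ordering can be checked with a small standalone sketch. Go runs deferred functions in last-in, first-out order, so a defer registered first runs last. The `update` function and the step strings below are hypothetical stand-ins, and the sketch assumes the pending-deployment cleanup is the defer registered first; it does not reproduce the daemon's real control flow.

```go
package main

import (
	"errors"
	"fmt"
)

// update is a hypothetical stand-in for an update flow with two deferred
// rollbacks. It records which rollback ran, in order, into steps.
func update(failAt string, steps *[]string) (retErr error) {
	// Registered first, so it runs last (assumed to be the cleanup of
	// the pending deployment in this sketch).
	defer func() {
		if retErr != nil {
			*steps = append(*steps, "remove pending deployment")
		}
	}()
	if failAt == "rebase" {
		return errors.New("rebase failed")
	}

	// Registered second, so it runs first.
	defer func() {
		if retErr != nil {
			*steps = append(*steps, "roll back kernel/kargs")
		}
	}()
	if failAt == "kernel" {
		return errors.New("kernel switch failed")
	}
	return nil
}

func main() {
	var steps []string
	err := update("kernel", &steps)
	fmt.Println(err, steps)
}
```

With a failure at the kernel step, the finer-grained rollback runs first and the deployment cleanup runs last, which matches the "will run last and hence win" reasoning.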

Comment on lines 114 to 115
args := []string{"cleanup", "-p"}
_, err := runGetOut("rpm-ostree", args...)
Member

Suggested change
- args := []string{"cleanup", "-p"}
- _, err := runGetOut("rpm-ostree", args...)
+ _, err := runGetOut("rpm-ostree", "cleanup", "-p")

Contributor Author

fixed
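
For readers unfamiliar with the suggestion: expanding a slice with `args...` and passing the same values inline are equivalent for a variadic function. The sketch below assumes `runGetOut` takes a command followed by variadic string arguments, as the call in the diff suggests; `describeCommand` is a hypothetical stand-in that only formats the command line and does not execute anything.

```go
package main

import (
	"fmt"
	"strings"
)

// describeCommand is a hypothetical stand-in for a variadic helper such
// as runGetOut. It formats the command line instead of running it.
func describeCommand(command string, args ...string) string {
	return strings.TrimSpace(command + " " + strings.Join(args, " "))
}

func main() {
	// Original form: build a slice, then expand it at the call site.
	args := []string{"cleanup", "-p"}
	viaSlice := describeCommand("rpm-ostree", args...)

	// Suggested form: pass the arguments inline.
	inline := describeCommand("rpm-ostree", "cleanup", "-p")

	fmt.Println(viaSlice == inline)
}
```

The inline form saves a temporary variable when the argument list is fixed; the slice form remains useful when arguments are assembled conditionally.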

@sinnykumari
Contributor Author

looks fine locally, I miss #1766 and at the same time I am happy that at least we have this in 4.6+

Should be good to go.

/hold cancel

openshift-ci-robot removed the do-not-merge/hold label on Aug 27, 2020
sinnykumari changed the title from "daemon: perform other rpm-ostree operations after OS rebase" to "Bug 1861026: daemon: perform other rpm-ostree operations after OS rebase" on Aug 27, 2020
openshift-ci-robot added the bugzilla/severity-high and bugzilla/invalid-bug labels on Aug 27, 2020
@openshift-ci-robot
Contributor

@sinnykumari: This pull request references Bugzilla bug 1861026, which is invalid:

  • expected the bug to target the "4.5.z" release, but it targets "4.6.0" instead
  • expected Bugzilla bug 1861026 to depend on a bug targeting a release in 4.6.0, 4.6.z and in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1861026: daemon: perform other rpm-ostree operations after OS rebase

@sinnykumari
Contributor Author

@ashcrow This doesn't have a corresponding 4.6 bug because we already fixed this as part of the extensions epic #1941

@ashcrow
Member

ashcrow commented Aug 27, 2020

@ashcrow This doesn't have a corresponding 4.6 bug because we already fixed this as part of the extensions epic #1941

The referenced bug is set to target 4.6. Should it target 4.5 instead? I'm happy to override the check if 4.6 was fixed due to a change in how things worked in extensions.

@sinnykumari
Contributor Author

/bugzilla refresh

@openshift-ci-robot
Contributor

@sinnykumari: This pull request references Bugzilla bug 1861026, which is invalid:

  • expected Bugzilla bug 1861026 to depend on a bug targeting a release in 4.6.0, 4.6.z and in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

ashcrow added the bugzilla/valid-bug label and removed the bugzilla/invalid-bug label on Aug 27, 2020
@ashcrow
Member

ashcrow commented Aug 27, 2020

Overriding the BZ check since 4.6 was fixed by underlying, non-bug-related changes.

etcPivotFile = "/etc/pivot/image-pullspec"
runPivotRebootFile = "/run/pivot/reboot-needed"
// etcPivotFile is used for 4.1 bootimages and is how the MCD
// currently communicated with this service.
Member

nit: It's unclear whether this means now or previously. "Currently" would indicate it's in use now, while "communicated" would indicate the past.

@sinnykumari
Contributor Author

tests are passing, can we have lgtm :)

@cgwalters
Member

/lgtm

openshift-ci-robot added the lgtm label on Aug 27, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ashcrow,cgwalters,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sinnykumari
Contributor Author

/cherry-pick release-4.4

@openshift-cherrypick-robot

@sinnykumari: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.4

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

crawford added the cherry-pick-approved label on Aug 27, 2020
openshift-merge-robot merged commit 0157b68 into openshift:release-4.5 on Aug 27, 2020
@openshift-ci-robot
Contributor

@sinnykumari: All pull requests linked via external trackers have merged:

Bugzilla bug 1861026 has been moved to the MODIFIED state.

In response to this:

Bug 1861026: daemon: perform other rpm-ostree operations after OS rebase

@openshift-cherrypick-robot

@sinnykumari: #2029 failed to apply on top of branch "release-4.4":

Applying: daemon: perform other rpm-ostree operations after OS rebase
Using index info to reconstruct a base tree...
M	pkg/daemon/daemon.go
M	pkg/daemon/rpm-ostree.go
M	pkg/daemon/update.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/daemon/update.go
CONFLICT (content): Merge conflict in pkg/daemon/update.go
Auto-merging pkg/daemon/rpm-ostree.go
Auto-merging pkg/daemon/daemon.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 daemon: perform other rpm-ostree operations after OS rebase
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.4

@cgwalters
Member

One note about this: Anyone affected by this will need to apply this update before they try an OS update. From this comment:

@sinny, any plans to backport it into 4.4? I see the same issue during minor upgrades of OCP 4.4.5 -> 4.4.17 (node with RT kernel becomes degraded)

Since 4.4.17 has already shipped this requires manual intervention:

  • Create a custom release image with new MCO from this patch based on e.g. 4.4.5
  • Upgrade to custom
  • Upgrade to 4.4.18 or a new release that still has this patch

sinnykumari added a commit to sinnykumari/installer that referenced this pull request Sep 2, 2020
…s after OS rebase

This fixes the issues we have today during cluster install when
multiple rpm-ostree operations are involved, such as an OS rebase
combined with an rt-kernel switch.
PR openshift/machine-config-operator#2029 fixes
the issue for day2; for day1 we need to update the boot images so
that they include a machine-config-daemon containing the fixes.

The boot image update contains machine-config-daemon-4.5.0-202008280032.p0.git.2558.a93c8dc.el8,
which contains the necessary fixes.

Used:

$ hack/update-rhcos-bootimage.py https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.5/45.82.202008280129-0/x86_64/meta.json amd64