Bug 2012969: emit kube events for skipped and upgraded OSes #3153
@cheesesashimi: This pull request references Bugzilla bug 2012969, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lgtm |
/bugzilla refresh |
@kikisdeliveryservice: This pull request references Bugzilla bug 2012969, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request. In response to this:
/retest-required Please review the full test history for this PR and help us cut down flakes. |
This file changed out from under you due to changes in the daemon in b6cfd42
/lgtm cancel
pkg/daemon/update.go
Outdated
@@ -379,6 +379,9 @@ func (dn *CoreOSDaemon) applyOSChanges(mcDiff machineConfigDiff, oldConfig, newC
 MCDPivotErr.WithLabelValues(nodeName, newConfig.Spec.OSImageURL, err.Error()).SetToCurrentTime()
 return err
 }
+dn.recorder.Eventf(getNodeRef(dn.node), corev1.EventTypeNormal, "OSUpgradeApplied", "OS upgrade applied; new MachineConfig (%s) has new OS (%s)", newConfig.Name, newConfig.Spec.OSImageURL)
There's no more recorder, so we need something like:
-dn.recorder.Eventf(getNodeRef(dn.node), corev1.EventTypeNormal, "OSUpgradeApplied", "OS upgrade applied; new MachineConfig (%s) has new OS (%s)", newConfig.Name, newConfig.Spec.OSImageURL)
+dn.nodeWriter.Eventf(corev1.EventTypeNormal, "OSUpgradeApplied", "OS upgrade applied; new MachineConfig (%s) has new OS (%s)", newConfig.Name, newConfig.Spec.OSImageURL)
Ahh that's right! Good catch! I'll make the changes.
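The suggestion above replaces a recorder call that takes the node reference on every call with a nodeWriter call that does not. A minimal sketch of how such a wrapper could bind the node once follows. The `eventSink`, `memorySink`, and `nodeWriter` types here are hypothetical stand-ins written for illustration, not the actual machine-config-operator or client-go types.

```go
package main

import "fmt"

// eventSink is a hypothetical stand-in for an event recorder.
// It is NOT the real client-go record.EventRecorder interface.
type eventSink interface {
	Eventf(nodeName, eventType, reason, messageFmt string, args ...interface{})
}

// memorySink collects formatted events in memory so the sketch runs
// without a cluster.
type memorySink struct {
	events []string
}

func (m *memorySink) Eventf(nodeName, eventType, reason, messageFmt string, args ...interface{}) {
	msg := fmt.Sprintf(messageFmt, args...)
	m.events = append(m.events, fmt.Sprintf("%s %s %s: %s", nodeName, eventType, reason, msg))
}

// nodeWriter (hypothetical) binds the node once, so call sites pass only
// the event type, reason, and message.
type nodeWriter struct {
	nodeName string
	sink     eventSink
}

func (w *nodeWriter) Eventf(eventType, reason, messageFmt string, args ...interface{}) {
	w.sink.Eventf(w.nodeName, eventType, reason, messageFmt, args...)
}

func main() {
	sink := &memorySink{}
	w := &nodeWriter{nodeName: "node-a", sink: sink}
	// The MachineConfig name and image below are made-up example values.
	w.Eventf("Normal", "OSUpgradeApplied",
		"OS upgrade applied; new MachineConfig (%s) has new OS (%s)",
		"rendered-worker-example", "example.invalid/os@sha256:abc")
	for _, e := range sink.events {
		fmt.Println(e)
	}
}
```

Binding the node inside the writer keeps each call site shorter and removes the need for a `getNodeRef`-style helper at every emit point, which matches the shape of the suggested change.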
pkg/daemon/update.go
Outdated
@@ -379,6 +379,9 @@ func (dn *CoreOSDaemon) applyOSChanges(mcDiff machineConfigDiff, oldConfig, newC
 MCDPivotErr.WithLabelValues(nodeName, newConfig.Spec.OSImageURL, err.Error()).SetToCurrentTime()
 return err
 }
+dn.recorder.Eventf(getNodeRef(dn.node), corev1.EventTypeNormal, "OSUpgradeApplied", "OS upgrade applied; new MachineConfig (%s) has new OS (%s)", newConfig.Name, newConfig.Spec.OSImageURL)
+} else {
+dn.recorder.Eventf(getNodeRef(dn.node), corev1.EventTypeNormal, "OSUpgradeSkipped", "OS upgrade skipped; new MachineConfig (%s) has same OS (%s) as old MachineConfig (%s)", newConfig.Name, newConfig.Spec.OSImageURL, oldConfig.Name)
Same here:
-dn.recorder.Eventf(getNodeRef(dn.node), corev1.EventTypeNormal, "OSUpgradeSkipped", "OS upgrade skipped; new MachineConfig (%s) has same OS (%s) as old MachineConfig (%s)", newConfig.Name, newConfig.Spec.OSImageURL, oldConfig.Name)
+dn.nodeWriter.Eventf(corev1.EventTypeNormal, "OSUpgradeSkipped", "OS upgrade skipped; new MachineConfig (%s) has same OS (%s) as old MachineConfig (%s)", newConfig.Name, newConfig.Spec.OSImageURL, oldConfig.Name)
Also, instead of "same OS ", it should be "same OS image"
Agreed. That sounds better.
5a4d04b to 758385c
/retest-required |
Weird, everything is failing. /retest-required
758385c to b5139a1
Found the source of the failures:
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: Changes queued for next boot. Run "systemctl reboot" to start a reboot
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: I0519 16:17:25.268339 2149 update.go:2002] Removing SIGTERM protection
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: panic: runtime error: invalid memory address or nil pointer dereference
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1796d0a]
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: goroutine 1 [running]:
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: github.com/openshift/machine-config-operator/pkg/daemon.(*CoreOSDaemon).applyOSChanges(0xc0007534f0, {0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 0xc0004ba680, ...)
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/pkg/daemon/update.go:382 +0x68a
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: github.com/openshift/machine-config-operator/pkg/daemon.(*Daemon).update(0xc0001ca600, 0xc00030e374, 0xc000102680)
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/pkg/daemon/update.go:573 +0xd1b
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: github.com/openshift/machine-config-operator/pkg/daemon.(*Daemon).RunFirstbootCompleteMachineconfig(0xc0001ca600)
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/pkg/daemon/daemon.go:852 +0x172
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: main.runFirstBootCompleteMachineConfig(0xc0005dfce0, {0xc0005dfd08, 0x502d80, 0xc000092050})
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/cmd/machine-config-daemon/firstboot_complete_machineconfig.go:39 +0x11d
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: main.executeFirstbootCompleteMachineConfig(0x2e0c4e0, {0x2e583a8, 0x0, 0x0})
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/cmd/machine-config-daemon/firstboot_complete_machineconfig.go:44 +0xdd
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: github.com/spf13/cobra.(*Command).execute(0x2e0c4e0, {0x2e583a8, 0x0, 0x0})
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:860 +0x5f8
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: github.com/spf13/cobra.(*Command).ExecuteC(0x2e0c760)
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:974 +0x3bc
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: github.com/spf13/cobra.(*Command).Execute(...)
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:902
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: k8s.io/component-base/cli.Run(0x2e0c760)
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/component-base/cli/run.go:105 +0x389
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: main.main()
May 19 16:17:25 ip-10-0-131-92 machine-config-daemon[2149]: /go/src/github.com/openshift/machine-config-operator/cmd/machine-config-daemon/main.go:28 +0x25
May 19 16:17:25 ip-10-0-131-92 systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
May 19 16:17:25 ip-10-0-131-92 systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'exit-code'.
May 19 16:17:25 ip-10-0-131-92 systemd[1]: Failed to start Machine Config Daemon Firstboot.
May 19 16:17:25 ip-10-0-131-92 systemd[1]: machine-config-daemon-firstboot.service: Consumed 13.342s CPU time
During firstboot,
For posterity, how I found this was:
b5139a1 to 61c9b11
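The trace above panics at update.go:382 inside applyOSChanges, which appears to be the line where this PR adds the event call (the hunk starts at line 379 with three context lines). The thread does not spell out the root cause, so the following is only one plausible reading: the event emitter is not constructed on the firstboot path. A minimal sketch of an emitter that tolerates being nil, assuming that reading; `eventWriter` is a hypothetical type, not the daemon's real one:

```go
package main

import "fmt"

// eventWriter is a hypothetical event emitter. A nil *eventWriter models a
// code path (such as firstboot, by assumption) where no emitter was built.
type eventWriter struct {
	emitted []string
}

// Eventf is safe to call on a nil receiver: it checks for nil before
// touching any field, reports false, and does nothing else.
func (w *eventWriter) Eventf(eventType, reason, messageFmt string, args ...interface{}) bool {
	if w == nil {
		return false
	}
	w.emitted = append(w.emitted, eventType+" "+reason+": "+fmt.Sprintf(messageFmt, args...))
	return true
}

func main() {
	// No emitter available: the call is a no-op instead of a panic.
	var missing *eventWriter
	fmt.Println("emitted without writer:", missing.Eventf("Normal", "OSUpgradeApplied", "new OS (%s)", "example"))

	// Emitter available: the event is recorded.
	ready := &eventWriter{}
	fmt.Println("emitted with writer:", ready.Eventf("Normal", "OSUpgradeApplied", "new OS (%s)", "example"))
	fmt.Println(ready.emitted)
}
```

Go allows calling a pointer-receiver method on a nil pointer as long as the method does not dereference it, so the guard can live in one place rather than at every call site.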
To provide better information about what the MCD is doing, the MCD should emit an event whenever a MachineConfig specifies that a new OS will be installed, as well as whenever it does not.
61c9b11 to 73d6584
/test e2e-agnostic-upgrade
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheesesashimi, kikisdeliveryservice, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here
/retest-required Please review the full test history for this PR and help us cut down flakes. |
@cheesesashimi: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@cheesesashimi: All pull requests linked via external trackers have merged: Bugzilla bug 2012969 has been moved to the MODIFIED state. In response to this:
@cheesesashimi do we need to backport it to 4.10, since the bug was reported during the 4.10 timeframe?
To provide better information about what the MCD is doing, the MCD should emit an event whenever a MachineConfig specifies that a new OS will be installed, as well as whenever it does not.
- What I did
I added two events, OSUpgradeApplied and OSUpgradeSkipped, which are emitted depending on whether the MCD determines that an OS upgrade is required.
- How to verify it
$ oc logs -n openshift-machine-config-operator -c machine-config-daemon -f pod/machine-config-daemon-8hqvd
.-v4
) flag:
I0518 19:51:24.703522 2623 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-130-248.ec2.internal", UID:"cdac4d45-1c5f-4e31-b106-b9d76583d624", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NoOSUpgradeAvailable' New MachineConfig (rendered-infra-d4288fbd661c4a61fbc31a4614a9271f) has same OS (quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e42eb8c6aa523511d928b3dde3a5e0251e6baaba0b60ed00b431e17698556777) as old MachineConfig (rendered-infra-f02bc763e0574470ff44e34981d34e12); skipping OS upgrade
For the upgrade cases:
.spec.osImageURL
value.
$ oc logs -n openshift-machine-config-operator -c machine-config-daemon -f pod/machine-config-daemon-8hqvd
.
- Description for the changelog
Emit a Kube event when an OS upgrade is applied or skipped
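The applied-or-skipped choice described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the merged code: `machineConfig` and `osUpgradeEvent` are hypothetical names, the decision is assumed to rest only on comparing the two OS image URLs, and the skipped message uses the "same OS image" wording agreed in review.

```go
package main

import "fmt"

// machineConfig is a hypothetical, trimmed-down stand-in for the real
// MachineConfig type; only the fields used in the event messages are kept.
type machineConfig struct {
	Name       string
	OSImageURL string
}

// osUpgradeEvent returns the event reason and message for a transition from
// oldConfig to newConfig. It assumes the decision depends only on whether
// the OS image URL changed.
func osUpgradeEvent(oldConfig, newConfig machineConfig) (reason, message string) {
	if oldConfig.OSImageURL != newConfig.OSImageURL {
		return "OSUpgradeApplied", fmt.Sprintf(
			"OS upgrade applied; new MachineConfig (%s) has new OS (%s)",
			newConfig.Name, newConfig.OSImageURL)
	}
	return "OSUpgradeSkipped", fmt.Sprintf(
		"OS upgrade skipped; new MachineConfig (%s) has same OS image (%s) as old MachineConfig (%s)",
		newConfig.Name, newConfig.OSImageURL, oldConfig.Name)
}

func main() {
	// All names and image references below are made-up example values.
	oldMC := machineConfig{Name: "rendered-worker-old", OSImageURL: "example.invalid/os@sha256:aaa"}
	sameOS := machineConfig{Name: "rendered-worker-new", OSImageURL: oldMC.OSImageURL}
	newOS := machineConfig{Name: "rendered-worker-new", OSImageURL: "example.invalid/os@sha256:bbb"}

	for _, next := range []machineConfig{sameOS, newOS} {
		reason, message := osUpgradeEvent(oldMC, next)
		fmt.Printf("%s: %s\n", reason, message)
	}
}
```

Keeping the decision separate from the emit call makes both branches easy to unit test without a cluster or an event recorder.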