pkg/daemon/drain: Clarify "cordon/uncordon" messages #2659
0708519
to
5e2ca0f
Compare
sinny already has an approved PR that is waiting to merge: /hold
@@ -38,16 +45,19 @@ func (dn *Daemon) cordonOrUncordonNode(desired bool) error {
 		err := drain.RunCordonOrUncordon(dn.drainer, dn.node, desired)
 		if err != nil {
 			lastErr = err
-			glog.Infof("cordon/uncordon failed with: %v, retrying", err)
+			glog.Infof("%s failed with: %v, retrying", verb, err)
@kikisdeliveryservice , any interest in this portion of the PR, or should we just close this?
yep, it makes sense to keep this part.
Rerolled to keep this, but leave eventing alone, in 5e2ca0f6 -> f302d7e4. The new commit message tries to motivate logging but not including events within cordonOrUncordonNode, but let me know if my attempt at that motivation doesn't make sense.
5e2ca0f
to
f302d7e
Compare
pkg/daemon/drain.go
Outdated
	}

	dn.logSystem("Node has been successfully %s", verb)
For consistency, I would say let caller decide both logging and sending event.
ok, rebased onto master and dropped this on-success log change with f302d7e4 -> 342c0135.
f302d7e
to
342c013
Compare
I like this, just 1 change to match sentence.
I'm tracking the specific verb, so that logs and error messages can be more specific than the generic "cordon/uncordon" they used before.

95ede9b (daemon: add log and event for uncordoning node, 2021-07-02, openshift#2657) added uncordon events, so now we have cordonOrUncordonNode callers handling the eventing side of both cordons and uncordons. And one benefit of the caller eventing is that they can provide additional context about why the node is being (un)cordoned, like "to apply update", and we don't have that context within cordonOrUncordonNode. But events are best-effort, and maybe in the future we will add a new cordonOrUncordonNode caller and forget to event on success there. That seems unlikely enough that it's not worth a duplicate event created within cordonOrUncordonNode.

And while I personally think it's worth logging the successful action from within cordonOrUncordonNode so we don't have future callers who forget to log the change, the maintainers would prefer to leave both logging and eventing on success to the callers [1].

[1]: openshift#2659 (comment)
342c013
to
da38bbb
Compare
Thanks Trevor!
/assign @sinnykumari
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kikisdeliveryservice, sinnykumari, wking
/retest-required Please review the full test history for this PR and help us cut down flakes.
/skip
Are commands dead, or am I impatient?
HAHAHA. Impatient it is.
Before marking cordon/uncordon successful, also check that node.Spec.Unschedulable has been set correctly. Also added an additional log while performing cordon/uncordon. This is to help debug bugs such as https://bugzilla.redhat.com/show_bug.cgi?id=2022387

Manual backport of PRs:
- openshift#2829
- openshift#2659
- openshift#2657
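The check described above might look roughly like the following. This is a sketch under assumptions, not the backported code: Node and NodeSpec are simplified local stand-ins for the Kubernetes node types, modeling only the Unschedulable field, and verifyCordonState is a hypothetical helper name.

```go
package main

import "fmt"

// NodeSpec and Node are simplified local stand-ins for the Kubernetes node
// types; only the field relevant to cordoning is modeled here.
type NodeSpec struct {
	Unschedulable bool
}

type Node struct {
	Name string
	Spec NodeSpec
}

// verifyCordonState (hypothetical helper) returns an error unless the node's
// Spec.Unschedulable matches the desired state: true after a cordon, false
// after an uncordon.
func verifyCordonState(node *Node, desired bool) error {
	if node.Spec.Unschedulable != desired {
		return fmt.Errorf("node %s: Spec.Unschedulable is %t, want %t",
			node.Name, node.Spec.Unschedulable, desired)
	}
	return nil
}

func main() {
	node := &Node{Name: "worker-0"}

	// The cordon call returned without error, but the field was never set,
	// so the operation is not treated as successful.
	if err := verifyCordonState(node, true); err != nil {
		fmt.Println("cordon not yet effective:", err)
	}

	node.Spec.Unschedulable = true
	if err := verifyCordonState(node, true); err == nil {
		fmt.Println("cordon verified")
	}
}
```

In a retry loop, a failed verification would be treated like any other failed attempt, so the operation is only reported as successful once the field reflects the desired state.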
So we're symmetric for both cordoning and uncordoning.
Including the node name in the message is not strictly necessary, because it is already in the event's referenced object. But the message is pretty short otherwise, so repeating the node name inline seemed like it might be a good use of space.