pkg/daemon/drain: Clarify "cordon/uncordon" messages #2659

Merged
merged 1 commit into from Jul 19, 2021

Conversation

wking
Member

@wking wking commented Jul 2, 2021

So we're symmetric for both cordoning and uncordoning.

Including the node name in the message is not strictly necessary, because it is already in the event's referenced object. But the message is pretty short otherwise, so repeating the node name inline seemed like it might be a good use of space.

@wking wking force-pushed the events-for-uncordon branch 2 times, most recently from 0708519 to 5e2ca0f Compare July 2, 2021 17:45
@kikisdeliveryservice
Contributor

kikisdeliveryservice commented Jul 2, 2021

sinny already has an approved PR that is waiting to merge:
#2657

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 2, 2021
@kikisdeliveryservice kikisdeliveryservice requested review from kikisdeliveryservice and removed request for yuqi-zhang July 2, 2021 20:36
@@ -38,16 +45,19 @@ func (dn *Daemon) cordonOrUncordonNode(desired bool) error {
 err := drain.RunCordonOrUncordon(dn.drainer, dn.node, desired)
 if err != nil {
 lastErr = err
-glog.Infof("cordon/uncordon failed with: %v, retrying", err)
+glog.Infof("%s failed with: %v, retrying", verb, err)
Member Author

@kikisdeliveryservice , any interest in this portion of the PR, or should we just close this?

Contributor

yep, it makes sense to keep this part.

Member Author

Rerolled to keep this, but leave eventing alone, in 5e2ca0f6 -> f302d7e4. The new commit message tries to motivate logging but not including events within cordonOrUncordonNode, but let me know if my attempt at that motivation doesn't make sense.

@wking wking changed the title pkg/daemon/drain: Event for all cordon changes pkg/daemon/drain: Clarify "cordon/uncordon" messages Jul 6, 2021
}

dn.logSystem("Node has been successfully %s", verb)
Contributor

For consistency, I would say let caller decide both logging and sending event.

Member Author

ok, rebased onto master and dropped this on-success log change with f302d7e4 -> 342c0135.
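
For illustration, the split agreed on in this thread (the caller handles both the success log and the event) could look roughly like the sketch below. The `recorder` type, the `applyUpdate` caller, and the `cordon` callback are hypothetical stand-ins made up for this example; they are not the daemon's real types or the actual event recorder API.

```go
package main

import "fmt"

// recorder is a hypothetical, in-memory stand-in for an event recorder.
type recorder struct{ events []string }

// Eventf stores a formatted event message under the given reason.
func (r *recorder) Eventf(reason, format string, args ...interface{}) {
	r.events = append(r.events, reason+": "+fmt.Sprintf(format, args...))
}

// applyUpdate is a hypothetical caller. The cordon callback stands in for
// cordonOrUncordonNode(true); on success the caller, not the helper,
// logs and emits the event, so it can add context such as "to apply update".
func applyUpdate(nodeName string, rec *recorder, cordon func() error) error {
	if err := cordon(); err != nil {
		return err
	}
	fmt.Printf("Node %s has been successfully cordoned\n", nodeName)
	rec.Eventf("Cordon", "Cordoned node %s to apply update", nodeName)
	return nil
}

func main() {
	rec := &recorder{}
	if err := applyUpdate("worker-0", rec, func() error { return nil }); err != nil {
		fmt.Println("error:", err)
	}
	for _, e := range rec.events {
		fmt.Println(e)
	}
}
```

The trade-off is the one raised above: the helper stays free of success-side reporting, at the cost that a future caller could forget to log or event.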

Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

I like this, just 1 change to match sentence.

I'm tracking the specific verb, so that logs and error messages can be
more specific than the generic "cordon/uncordon" they used before.

95ede9b (daemon: add log and event for uncordoning node,
2021-07-02, openshift#2657) added uncordon events, so now we have
cordonOrUncordonNode callers handling the eventing side of both
cordons and uncordons.  And one benefit of the caller eventing is that
they can provide additional context about why the node is being
(un)cordoned, like "to apply update", and we don't have that context
within cordonOrUncordonNode.  But events are best-effort, and maybe in
the future we will add a new cordonOrUncordonNode caller and forget to
event on success there.  That seems unlikely enough that it's not
worth a duplicate event created within cordonOrUncordonNode.  And
while I personally think it's worth logging the successful action from
within cordonOrUncordonNode so we don't have future callers who forget
to log the change, the maintainers would prefer to leave both logging
and eventing on success to the callers.

[1]: openshift#2659 (comment)
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Thanks Trevor!

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2021
@kikisdeliveryservice
Contributor

/assign @sinnykumari

@openshift-ci
Contributor

openshift-ci bot commented Jul 17, 2021

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-metal-ipi | da38bbb | link | /test e2e-metal-ipi |
| ci/prow/e2e-gcp-op-single-node | da38bbb | link | /test e2e-gcp-op-single-node |
| ci/prow/e2e-aws-disruptive | da38bbb | link | /test e2e-aws-disruptive |
| ci/prow/okd-e2e-aws | da38bbb | link | /test okd-e2e-aws |
| ci/prow/e2e-aws-workers-rhel7 | da38bbb | link | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-ovn-step-registry | da38bbb | link | /test e2e-ovn-step-registry |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sinnykumari
Contributor

/lgtm
/hold cancel

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 19, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 19, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, sinnykumari, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@kikisdeliveryservice
Contributor

/skip

@kikisdeliveryservice
Contributor

Are commands dead, or am I impatient?

@openshift-merge-robot openshift-merge-robot merged commit 011c4a9 into openshift:master Jul 19, 2021
@kikisdeliveryservice
Contributor

HAHAHA. Impatient it is.

@wking wking deleted the events-for-uncordon branch July 20, 2021 02:20
sinnykumari pushed a commit to sinnykumari/machine-config-operator that referenced this pull request Dec 2, 2021
sinnykumari added a commit to sinnykumari/machine-config-operator that referenced this pull request Dec 2, 2021
Before marking cordon/uncordon successful,
also check that node.Spec.Unschedulable has been set
correctly.
Also add an additional log while performing cordon/uncordon.

This is to help debug bugs such as
https://bugzilla.redhat.com/show_bug.cgi?id=2022387

Manual backport of PRs:
- openshift#2829
- openshift#2659
- openshift#2657
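
The check described in that backport commit can be sketched as below. This is a minimal illustration, not the backported code: `nodeSpec` and `node` are hypothetical stand-ins for the Kubernetes `corev1.Node` type (only the `Unschedulable` field is modeled), and `verifyCordonState` is a name made up for this example.

```go
package main

import "fmt"

// nodeSpec and node are hypothetical, minimal stand-ins for corev1.Node;
// only Spec.Unschedulable matters for this sketch.
type nodeSpec struct{ Unschedulable bool }

type node struct {
	Name string
	Spec nodeSpec
}

// verifyCordonState returns an error when the observed Spec.Unschedulable
// does not match the desired state (true = cordoned, false = uncordoned).
func verifyCordonState(n node, desired bool) error {
	if n.Spec.Unschedulable == desired {
		return nil
	}
	verb := "uncordon"
	if desired {
		verb = "cordon"
	}
	return fmt.Errorf("node %s: %s not applied, Spec.Unschedulable=%t",
		n.Name, verb, n.Spec.Unschedulable)
}

func main() {
	n := node{Name: "worker-0", Spec: nodeSpec{Unschedulable: false}}
	// The cordon call reported success, but the node is still schedulable.
	if err := verifyCordonState(n, true); err != nil {
		fmt.Println(err)
	}
}
```

In a real daemon the node object would be re-read from the API server before the comparison, so the check reflects the cluster's state and not a stale local copy.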