kubelet waits for node lister to sync at least once #94087

Merged
Merged 1 commit into kubernetes:master from the node-sync-once branch on Jan 12, 2021

Conversation

derekwaynecarr (Member)

What type of PR is this?
/kind bug

What this PR does / why we need it:
If the kubelet has a kube client, it waits to ensure the node lister has synced at least once before trusting its data.
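For illustration only, a minimal sketch of that behavior, assuming a lister-backed helper with hypothetical names (the real kubelet wiring and the exact timeout differ; see the diff excerpts below):

package kubeletsync

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	corelisters "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/cache"
)

// getNodeOnceSynced (hypothetical) blocks, bounded in time, until the node
// informer reports an initial sync, then reads the node from the lister.
func getNodeOnceSynced(lister corelisters.NodeLister, hasSynced cache.InformerSynced, nodeName string) (*v1.Node, error) {
	if err := wait.PollImmediate(time.Second, 10*time.Second, func() (bool, error) {
		return hasSynced(), nil
	}); err != nil {
		return nil, fmt.Errorf("node informer has not synced yet: %w", err)
	}
	return lister.Get(nodeName)
}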

Which issue(s) this PR fixes:
Fixes #92067

Special notes for your reviewer:
Still under discussion: where best to wait (in the getter, or in main kubelet startup).

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 18, 2020
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 18, 2020
@derekwaynecarr (Member Author)

@sjenning the idea is to do something similar to what is seen here; we just need to work out the pros and cons of where we wait for the node lister to sync at least once.

@derekwaynecarr (Member Author)

After following up with @deads2k, it seems like a good time to move away from list/watch to a node informer with a filter; then we can just use the regular HasSynced function. I will iterate; I am just trying to determine the best place to wait for the node informer to sync at least once without causing issues (I do not want to do it in the kubelet_getters if possible).
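For reference, a rough sketch of what a filtered node informer with a regular HasSynced check could look like; the helper name and wiring are illustrative, not the actual kubelet code:

package kubeletsync

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newNodeInformer (hypothetical) builds an informer factory filtered down to
// the kubelet's own Node object; its HasSynced then answers whether that
// object has been listed at least once.
func newNodeInformer(client kubernetes.Interface, nodeName string, stopCh <-chan struct{}) cache.InformerSynced {
	factory := informers.NewSharedInformerFactoryWithOptions(client, time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = fields.OneTermEqualSelector("metadata.name", nodeName).String()
		}))
	informer := factory.Core().V1().Nodes().Informer()
	factory.Start(stopCh) // non-blocking; list/watch runs in the background
	return informer.HasSynced
}

A caller could then do a one-time cache.WaitForCacheSync(stopCh, hasSynced) at startup, or poll the returned function, which is roughly what the PR ends up doing.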

@derekwaynecarr derekwaynecarr force-pushed the node-sync-once branch 2 times, most recently from b510d22 to 5a39f16 on August 18, 2020 19:49
@sjenning (Contributor)

this might also fix #93338

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 18, 2020
@derekwaynecarr derekwaynecarr force-pushed the node-sync-once branch 2 times, most recently from 3de6020 to 7847919 on August 18, 2020 21:08
@derekwaynecarr (Member Author)

@sjenning take a look. The idea is to move the predicate function to call GetNode, which, if we have a kube client, ensures we have synced the node lister at least once (otherwise it errors). I am still trying to think through an edge case in the scheduler scenario that we may miss, but this would ensure that we do not fall back to the kubelet "initialNode" and stop tripping up the scheduler.

@@ -245,7 +256,7 @@ func (kl *Kubelet) GetNode() (*v1.Node, error) {
 // zero capacity, and the default labels.
 func (kl *Kubelet) getNodeAnyWay() (*v1.Node, error) {
 	if kl.kubeClient != nil {
-		if n, err := kl.nodeLister.Get(string(kl.nodeName)); err == nil {
+		if n, err := kl.GetNode(); err == nil {
Contributor

Pretty sure this whole function collapses to just return kl.GetNode(), since GetNode() contains the standalone logic.

Member Author

GetNode does not guarantee a response in the case where kubeClient is not nil, whereas getNodeAnyWay does (it falls back to the initial node).
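To make the two contracts concrete, a toy sketch with simplified stand-ins (not the real kubelet types):

package kubeletsync

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// kubeletLike is a toy stand-in used only to contrast the two contracts.
type kubeletLike struct {
	hasClient   bool
	fromLister  func() (*v1.Node, error)                // may error until the lister has synced
	initialNode func(context.Context) (*v1.Node, error) // locally built fallback node
}

// getNode mirrors GetNode: when a client is present, it can return an error.
func (k *kubeletLike) getNode() (*v1.Node, error) {
	if !k.hasClient {
		return k.initialNode(context.TODO())
	}
	return k.fromLister()
}

// getNodeAnyWay always produces a node: it tries the lister-backed path first
// and falls back to the locally built initial node on any error.
func (k *kubeletLike) getNodeAnyWay() (*v1.Node, error) {
	if k.hasClient {
		if n, err := k.getNode(); err == nil {
			return n, nil
		}
	}
	return k.initialNode(context.TODO())
}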

@@ -235,6 +237,15 @@ func (kl *Kubelet) GetNode() (*v1.Node, error) {
 	if kl.kubeClient == nil {
 		return kl.initialNode(context.TODO())
 	}
+	// if we have a valid kube client, we wait up to 5s for initial lister to sync
+	if !kl.nodeHasSynced() {
+		err := wait.PollImmediate(time.Second, 5*time.Second, func() (bool, error) {
Contributor

Choose a weird number so we can find this later if it comes up. How about 8 seconds?

@derekwaynecarr derekwaynecarr force-pushed the node-sync-once branch 2 times, most recently from 524527b to ee3f26d on August 20, 2020 17:57
@derekwaynecarr (Member Author)

/retest

@derekwaynecarr (Member Author)

I am fine removing my own hold on this, as it's possible that what I saw was related to another unreliable aspect of CI at that time.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 12, 2021
@k8s-ci-robot k8s-ci-robot merged commit 4e93dbc into kubernetes:master Jan 12, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jan 12, 2021
k8s-ci-robot added a commit that referenced this pull request Feb 12, 2021
…m-release-1.18

Cherry pick of #94087 upstream release 1.18
k8s-ci-robot added a commit that referenced this pull request Feb 12, 2021
…87-upstream-release-1.19

Automated cherry pick of #94087: node sync at least once
k8s-ci-robot added a commit that referenced this pull request Feb 12, 2021
…87-upstream-release-1.20

Automated cherry pick of #94087: node sync at least once
klog.Infof("kubelet nodes sync")
return true
}
klog.Infof("kubelet nodes not sync")
Member

This message is posted a lot in the logs; I think it can be omitted.

Member

Line 447 klog.Infof("kubelet nodes sync") is removed in #98137


Comment on lines +240 to +248
	// if we have a valid kube client, we wait for initial lister to sync
	if !kl.nodeHasSynced() {
		err := wait.PollImmediate(time.Second, maxWaitForAPIServerSync, func() (bool, error) {
			return kl.nodeHasSynced(), nil
		})
		if err != nil {
			return nil, fmt.Errorf("nodes have not yet been read at least once, cannot construct node object")
		}
	}
@neolit123 (Member) Feb 21, 2021

This poll is problematic.
GetNode is called in a hot loop, which means that the poll is present on each call.

The sequence looks like:

  • GetNode()
    • Poll for nodeHasSynced()
    • Poll ....
  • GetNode()
    • Poll for nodeHasSynced()
    • Poll ....

If there is a valid client, arguably the informer sync check should be outside of this loop.
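A minimal sketch of the direction suggested here, assuming the wait becomes a one-time startup step instead of living inside GetNode (the function name is illustrative):

package kubeletsync

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// waitForNodeListerOnce (hypothetical) blocks until the node informer has
// synced or stopCh is closed; called once at startup, it keeps the wait out
// of hot paths such as GetNode.
func waitForNodeListerOnce(stopCh <-chan struct{}, hasSynced cache.InformerSynced) error {
	if !cache.WaitForCacheSync(stopCh, hasSynced) {
		return fmt.Errorf("node informer did not sync before shutdown")
	}
	return nil
}

The caller would still need to bound or close stopCh (or skip the wait when no API server can exist yet), which is the static-pod bootstrap concern discussed further down.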

Member

Wouldn't it be enough to check HasSynced once before creating the Kubelet object?

Member

It would only sync after the API server is up.

Member

If the API server is down, there is nothing to sync. Am I missing something?

@neolit123 (Member) Feb 23, 2021

BTW, we are discussing how to improve this in:
#99336

If the kubelet is managing the first API server instance in the cluster as a static pod, and the HasSynced check happens before the Kubelet object is created, then for such a kubelet instance the single HasSynced check will always be false.

Maybe that's what we want, given this is the first node in the cluster and there is no need to sync it on the first kubelet run (ever).

For subsequent runs of the same kubelet, or for additional kubelets, the check should pass if there is an API server.
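As a sketch of the single pre-construction check discussed above (hypothetical helper, illustrative timeout): it would report false on the very first run of the first node and true once an API server is reachable, without blocking kubelet construction.

package kubeletsync

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/tools/cache"
)

// checkNodeListerOnce (hypothetical) waits briefly for the node informer but
// tolerates failure, so a kubelet that will itself bootstrap the API server
// is not blocked at construction time.
func checkNodeListerOnce(hasSynced cache.InformerSynced, timeout time.Duration) bool {
	err := wait.PollImmediate(100*time.Millisecond, timeout, func() (bool, error) {
		return hasSynced(), nil
	})
	return err == nil
}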

@neolit123 (Member)

The Minikube maintainers (cc @medyagh) found that this change introduced a performance regression.
When client credentials are fed to a primary kubelet in a cluster (--kubeconfig), but there are no other nodes or an API server running yet (because this primary kubelet will boot it from a static pod), this blocks for an extra ~40 seconds.

I can see the informer wait as something desirable, but I think the logic here can be improved.
cc @derekwaynecarr @deads2k @liggitt

We also backported this change across the support skew, and since we don't have direct performance tests, nobody saw it until now.

@snowplayfire (Contributor)

If these pods are rescheduled automatically, would that solve the problem? @derekwaynecarr

Labels: approved, area/kubelet, cncf-cla: yes, kind/bug, lgtm, priority/critical-urgent, release-note-none, sig/node, size/M

Successfully merging this pull request may close these issues.

Kubernetes scheduler spams cluster with pods in NodeAffinity status