pkg/kubelet: improve the node informer sync check #99336

Merged
merged 1 commit into from Apr 22, 2021

Conversation

neolit123
Member

@neolit123 neolit123 commented Feb 23, 2021

What this PR does / why we need it:

GetNode() is called in many places, including a hot loop in
fastStatusUpdateOnce. Polling inside it delays
the kubelet /readyz status=200 report.

If a client is available, attempt to wait for the sync to happen
before starting the list watch for pods at the apiserver.
This is done to avoid caching of Node objects.

Test data for a kubeadm setup that manages the kube-apiserver as a static pod,
measuring the wait for the kube-apiserver and kubelet to report 200 at /healthz:

  • Without this patch: 72 seconds
  • With this patch: 12 seconds
    (it restores the old fast timing / behavior)

Which issue(s) this PR fixes:

xref kubernetes/kubeadm#2395
xref kubernetes/kubelet#23

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kubelet: fixes a performance regression when waiting for a synchronization of the node list with the kube-apiserver

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 23, 2021
@neolit123
Member Author

/kind regression
(^ debatable for self-hosting kube-apiservers)

/cc @derekwaynecarr @liggitt

@k8s-ci-robot k8s-ci-robot added kind/regression Categorizes issue or PR as related to a regression from a prior release. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Feb 23, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 23, 2021
@dims
Member

dims commented Feb 23, 2021

cc @adisky

@neolit123
Member Author

given the problem shown in #99336 (comment)

i'm marking this as release blocking for 1.21.
if someone has objections to this, i'm open to discussion.
/priority critical-urgent
/milestone v1.21

in the meantime we are discussing and trying to fix it.

@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Feb 23, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Feb 23, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Feb 23, 2021
@ehashman ehashman added this to Triage in SIG Node PR Triage Feb 23, 2021
@ehashman
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 23, 2021
@ehashman
Member

flake #98856
/retest

@derekwaynecarr
Member

derekwaynecarr commented Apr 21, 2021

just to correct my stmt above, static pods do go through kubelet pod admission (mirror pods do not), but by definition static pods need to work well with the kubelet local view of the node, so I think we do not have any risks.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 21, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 21, 2021
@liggitt
Member

liggitt commented Apr 21, 2021

/lgtm

thanks for all the iterations and care

@neolit123
Member Author

neolit123 commented Apr 21, 2021

thank you to all who helped with review and the discussion.
i can send backports shortly.

apparently there will be another 1.18 PATCH release, so this can be backported there too.

@neolit123
Member Author

cherry picks:
#101343
#101344
#101345
#101346

^ need review / LGTM / approval.
diffs involved some adaptation around function argument changes and no structured logging in older versions.
hope i didn't make any mistakes.

all patches tested with a local cluster.
random observation: 1.21 kubelet + apiserver report 200 at healthz faster than e.g. 1.18.

@k8s-ci-robot k8s-ci-robot merged commit 232d930 into kubernetes:master Apr 22, 2021
SIG Node PR Triage automation moved this from Needs Reviewer to Done Apr 22, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Apr 22, 2021
@liggitt
Member

liggitt commented Apr 23, 2021

picks reviewed, need approval from kubelet owners and then release branch acks

k8s-ci-robot added a commit that referenced this pull request Apr 28, 2021
…9336-origin-release-1.20

Automated cherry pick of #99336: pkg/kubelet: improve the node informer sync check
k8s-ci-robot added a commit that referenced this pull request Apr 28, 2021
…9336-origin-release-1.19

Automated cherry pick of #99336: pkg/kubelet: improve the node informer sync check
k8s-ci-robot added a commit that referenced this pull request Apr 28, 2021
…9336-origin-release-1.18

Automated cherry pick of #99336: pkg/kubelet: improve the node informer sync check
k8s-ci-robot added a commit that referenced this pull request Apr 28, 2021
…9336-origin-release-1.21

Automated cherry pick of #99336: pkg/kubelet: improve the node informer sync check
@pacoxu pacoxu mentioned this pull request Aug 6, 2021
13 tasks
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.