Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix init order during starup for dynamic kubelet config #83184

Merged
merged 1 commit into from
Oct 10, 2019

Conversation

odinuge
Copy link
Member

@odinuge odinuge commented Sep 26, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #83183

Special notes for your reviewer:
This is just a "proof of concept" fix in order to make it easier to understand the problem. From what I understand it does work, and all the tests pass, but I don't think the is the correct solution long term. If this makes sense and works, I guess
we can use this fix on the v1.16 branch to mitigate the problem.

The probelm is essentially that the sync function changed in the PR is executed before the kubelet is started and the metrics are "Created" with the way #81534 works.

If the metrics ain't created, the values will not be registered, and the
metrics will not be visible in the metric endpoint.

Feel free to change the priority; but since this breaks all the related tests, and the metrics are missing, I guess this is ok
/priority critical-urgent
/sig node

Does this PR introduce a user-facing change?:

Fix error where metrics related to dynamic kubelet config isn't registered

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/kubelet labels Sep 26, 2019
@odinuge
Copy link
Member Author

odinuge commented Sep 26, 2019

As you see here, the node e2e tests (running with FOCUS="DynamicKubeletConfig" SKIP="") pass (note that they take 2hours to run 😅).
Screenshot from 2019-09-25 19-30-25

Non of them pass in master: https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial

/cc @alejandrox1

@liggitt
Copy link
Member

liggitt commented Sep 26, 2019

/assign @logicalhan

@logicalhan
Copy link
Member

If I understand this correctly, then the metrics do show up eventually, right? Once the kubelet is created, then then metrics should reflect the syncs.

@odinuge
Copy link
Member Author

odinuge commented Sep 27, 2019

@logicalhan If I understand this correctly, then the metrics do show up eventually, right? Once the kubelet is created, then then metrics should reflect the syncs.

I think #83184 (comment) answers this question.
Also, the tests using the metrics in master still fails, so it looks like they never appear.

@logicalhan
Copy link
Member

Why can't we just move https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/server.go#L593-L598 down to below where the kubelet is actually initialized?

If the metrics ain't created, the values will not be registered, and the
metrics will not be visible in the metric endpoint.

Therefore move init of dynamic kubelet config below the startup of the
kubelet server (and the init of metrics).
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 28, 2019
@odinuge
Copy link
Member Author

odinuge commented Sep 28, 2019

Thanks for looking at it @logicalhan!

Why can't we just move
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/server.go#L593-L598 down to > below where the kubelet is actually initialized?

Yes, I think that is a good solution. I just need confirmation from someone that it will work, and that there is no dependencies that require it to stay where it is. I have updated the PR now, and it looks like the e2e tests pass too (the first few passed, but the tests takes 2 hours to complete).. All DynamicKubeletConfig node-e2e tests pass with this PR.

Metrics are lazily instantiated now to allow us to disable them. We've had issues in the past with high cardinality metrics (#73587). In order to disable metrics, we need to no-opt creation of metrics until they actually become registered, which is the current behavior. We can force metrics initialization like so (logicalhan/kubernetes@master...kubelet-metrics-force-initialization), but I really don't want to do that.

Agree that force-registering the metrics is a bad solution, and goes against the new metric stuff; so I don't think that is a viable solution to the issue.

It looks to me that these tests are failing because they are literally testing that the metrics are immediately registered. That seems like a weird thing to test.

The test has a 1min timeout when looking for the metrics. And as I said in the other comments; from what I see and understand, if the first metric write is lost, the metrics will not be reapplied (Sync will not run) unless some of the data is changed.

@odinuge odinuge changed the title [WIP] Add metric created check in dynamic kubelet sync [WIP] Fix dynamic kubelet config init order Sep 28, 2019
@odinuge odinuge changed the title [WIP] Fix dynamic kubelet config init order Fix dynamic kubelet config init order Oct 2, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2019
@odinuge
Copy link
Member Author

odinuge commented Oct 2, 2019

cc @kubernetes/sig-node-bugs

@odinuge
Copy link
Member Author

odinuge commented Oct 3, 2019

Ping @logicalhan. Any new thoughts?

@odinuge
Copy link
Member Author

odinuge commented Oct 8, 2019

/cc mtaufen derekwaynecarr

Copy link
Member

@logicalhan logicalhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 8, 2019
@odinuge
Copy link
Member Author

odinuge commented Oct 9, 2019

/assign dchen1107 derekwaynecarr

@odinuge odinuge changed the title Fix dynamic kubelet config init order Fix init order during starup for dynamic kubelet config Oct 10, 2019
@liggitt
Copy link
Member

liggitt commented Oct 10, 2019

all synchronous setup is done in Controller#Bootstrap(), everything done by this method (other than programming error sanity checks that exit the process) is done in goroutines

/lgtm

@liggitt
Copy link
Member

liggitt commented Oct 10, 2019

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, odinuge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2019
@odinuge
Copy link
Member Author

odinuge commented Oct 10, 2019

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot merged commit 91b2a7a into kubernetes:master Oct 10, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 10, 2019
k8s-ci-robot added a commit that referenced this pull request Oct 11, 2019
…4-upstream-release-1.16

Automated cherry pick of #83184: Fix dynamic kubelet config init order
ohsewon pushed a commit to ohsewon/kubernetes that referenced this pull request Oct 16, 2019
Fix init order during starup for dynamic kubelet config
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dynamic kubelet config metrics missing
6 participants