
Conversation

@harche (Contributor) commented Jan 29, 2021:

Dynamic sizing providers for kubelet

Enhancement Issue: #2369
/sig node
/cc @derekwaynecarr @dchen1107 @mrunalp @ehashman @rphillips

Signed-off-by: Harshal Patil harpatil@redhat.com

@k8s-ci-robot added the sig/node, cncf-cla: yes, kind/kep, and size/L labels on Jan 29, 2021
@harche mentioned this pull request on Jan 29, 2021
@harche changed the title from "KEP 3000: Kubelet Sizing Providers" to "KEP 2370: Kubelet Sizing Providers" on Jan 29, 2021
Comment on lines 316 to 319
## Alternatives

1. add a built-in sizing provider in-tree.

Member:

Why not add an HTTP endpoint with a standardized schema that one can POST the desired values to, with the appropriate authn and authz?

Member:

Please make sure you add a consideration and discussion of this to the Alternatives section. Right now it has one bullet about an in-tree implementation, but doesn't consider pros and cons. Pros and cons of all alternative solutions should be outlined.

Contributor Author:

Thanks @ehashman
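
For illustration only, the HTTP-endpoint alternative raised above might look roughly like the following sketch. The path, payload fields, and port are hypothetical, and authentication/authorization are omitted for brevity:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Reservations is a hypothetical payload for POSTing desired
// system/kube reservations to the kubelet.
type Reservations struct {
	SystemReservedCPU    string `json:"systemReservedCpu"`    // e.g. "500m"
	SystemReservedMemory string `json:"systemReservedMemory"` // e.g. "1Gi"
	KubeReservedCPU      string `json:"kubeReservedCpu"`
	KubeReservedMemory   string `json:"kubeReservedMemory"`
}

func main() {
	http.HandleFunc("/sizing", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "only POST is supported", http.StatusMethodNotAllowed)
			return
		}
		var res Reservations
		if err := json.NewDecoder(r.Body).Decode(&res); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// A real implementation would validate and apply the values;
		// here we just log them.
		log.Printf("received sizing values: %+v", res)
		w.WriteHeader(http.StatusAccepted)
	})
	log.Fatal(http.ListenAndServe("localhost:10270", nil))
}
```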


## Summary

Kubelet should have a sizing provider mechanism, which could give the kubelet the ability to dynamically fetch sizing values for memory and CPU reservations.

Member:

The title of the KEP is a little bit misleading. When I saw it initially, I thought it was for dynamically changing the node capacity, instead of the node allocatable. Both were requested in the past by me. ;-)

Contributor Author:

@dchen1107 Thanks for your comment. I will be happy to modify the title if it helps avoid the confusion. Please let me know what your suggestion would be.
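
To make the proposed mechanism concrete, here is a minimal sketch of what an exec-based sizing provider could look like: a standalone binary that the kubelet would run at startup and that prints reservation values on stdout. The output schema, field names, and formula are illustrative assumptions, not the API defined by the KEP:

```go
// sizing-provider: a hypothetical exec plugin that the kubelet would run
// at startup to obtain system/kube reservations for this node.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"runtime"
)

// SizingResponse is an assumed output schema; the real KEP would define
// a versioned API type for this.
type SizingResponse struct {
	SystemReserved map[string]string `json:"systemReserved"`
	KubeReserved   map[string]string `json:"kubeReserved"`
}

func main() {
	// Illustrative formula only: reserve 100 millicores per CPU for the
	// system and flat amounts of memory for the system and the kubelet.
	cpus := runtime.NumCPU()
	resp := SizingResponse{
		SystemReserved: map[string]string{
			"cpu":    fmt.Sprintf("%dm", cpus*100),
			"memory": "1Gi",
		},
		KubeReserved: map[string]string{
			"cpu":    "100m",
			"memory": "500Mi",
		},
	}
	if err := json.NewEncoder(os.Stdout).Encode(resp); err != nil {
		os.Exit(1)
	}
}
```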


### Non-Goals

* For now, the plugin mechanism proposed here is only for fetching values of `system reserved` and `kube reserved`. A similar approach could be taken to dynamically fetch the values of other kubelet parameters (e.g. `evictionHard`), but those are out of scope for this KEP.

Member:

It would be an interesting follow-on exercise to enumerate the kubelet configuration values that reasonably vary by instance type. Implicit in this proposal is a thesis I agree with: merging the existing kubelet file-based configuration with instance-type-specific overrides is cumbersome in many practical environments, versus exec-ing out to some other local intelligence.

Contributor Author:

Thanks @derekwaynecarr for your comment.

@harche force-pushed the sizing-providers branch 2 times, most recently from c04d99f to aa26351 on February 4, 2021

@harche (Contributor Author) commented Feb 4, 2021:

/assign @deads2k @derekwaynecarr

@JamesLaverack (Member) commented:

I'm aware this KEP is in review and hasn't been merged yet. But I just wanted to highlight that the KEP number (in the title of this PR, the kep.yaml file, the directory name, the PRR filename, and the title line of the KEP README.md) should match the issue number on the enhancements issue. In this case that should be 2369, not 2370.

@harche changed the title from "KEP 2370: Kubelet Sizing Providers" to "KEP 2369: Kubelet Sizing Providers" on Feb 4, 2021

@harche (Contributor Author) commented Feb 4, 2021:

Thanks @JamesLaverack for pointing that out. I have updated all the references from 2370 to 2369.

@harche (Contributor Author) commented Feb 4, 2021:

Not sure why pull-enhancements-verify is failing

@JamesLaverack (Member) commented:

I think your table of contents is out of date. The PRR section in the ToC has a few extra headings that are not in the text.

@harche (Contributor Author) commented Feb 4, 2021:

> I think your table of contents is out of date. The PRR section in the ToC has a few extra headings that are not in the text.

Yeah, that was the issue.

@ehashman thanks for the hint to run hack/update-toc.sh; that fixed the ToC issue automatically.


TBD for beta.

* **What specific metrics should inform a rollback?**

Contributor:

This isn't strictly required for alpha, but as I recall, if these values are too low it can lead to node readiness/responsiveness problems. It may be a good idea to indicate how to tell if the current size is well suited to the current usage.

Contributor Author:

Thanks @deads2k for your comment.

@deads2k (Contributor) commented Feb 4, 2021:

Approving PRR for alpha.

/approve

@ehashman (Member) commented Feb 8, 2021:

/assign @dchen1107

#### Alpha -> Beta Graduation

* integration or e2e tests.
* at least one working plugin implementation.

Member:

Two metrics worth tracking for exec plugins were defined as part of the kubelet cred provider plugin.

See:
https://github.com/kubernetes/enhancements/pull/2457/files#diff-7664c031fa927970c5fc47a1a3e1991e6691918066576490797650aaced971d3R316

Similar metrics would apply here.

Contributor Author:

There is a slight difference between the cred provider plugin and this node sizing provider plugin.

The cred provider plugin gets executed throughout the lifetime of the kubelet, whenever there is a request to provision a pod with a matching image name. So it makes sense for it to have kubelet_credential_provider_plugin_errors, which tracks the number of errors that occurred from invoking an exec plugin, and kubelet_credential_provider_plugin_duration, which tracks the duration of execution by plugins.

On the other hand, the node sizing provider plugin gets executed only during kubelet startup. Unlike the cred provider plugin, there is no repeated execution of the node sizing provider plugin by the kubelet spread over time (as long as the kubelet is running). If a node sizing provider plugin is enabled and it fails to fetch the sizing values, the kubelet will abort startup and log the error in the kubelet logs. Since the user has explicitly requested that the sizing values be fetched using the exec plugin, the kubelet should not default to any other values. Defaulting to other values and bringing up the kubelet may result in unpredictable behavior, such as node lock-ups, as the default sizing values may not be optimal.

The node sizing provider plugin is either going to succeed or fail during kubelet startup. Since it's not going to generate any time series data like the cred provider plugin, I have deliberately dropped the Metrics section from this KEP.

But if you still feel that there is scope for adding any other metrics, please let me know.
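
To illustrate the startup-only behavior described above, a rough sketch follows; the package, types, and function names are hypothetical, not the kubelet's actual code. The plugin is executed exactly once, and any failure aborts kubelet startup instead of falling back to defaults:

```go
// Hypothetical sketch of the startup-only sizing provider flow.
package sizing

import (
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// Reservations mirrors the assumed plugin output schema.
type Reservations struct {
	SystemReserved map[string]string `json:"systemReserved"`
	KubeReserved   map[string]string `json:"kubeReserved"`
}

// FetchReservations runs the configured exec plugin once. If the plugin
// fails, the error is returned and the caller (kubelet startup) is
// expected to abort rather than fall back to default values.
func FetchReservations(pluginPath string) (*Reservations, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	out, err := exec.CommandContext(ctx, pluginPath).Output()
	if err != nil {
		return nil, fmt.Errorf("sizing provider %q failed, aborting kubelet startup: %w", pluginPath, err)
	}
	var r Reservations
	if err := json.Unmarshal(out, &r); err != nil {
		return nil, fmt.Errorf("sizing provider %q returned invalid output: %w", pluginPath, err)
	}
	return &r, nil
}
```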

@derekwaynecarr (Member) commented:

I am supportive of the idea to delegate computation of these fields dynamically to a plugin that can apply its own local formula.

I suspect that as this moves from the alpha to the beta phase, we may want to clarify whether the pods-per-core heuristic is provided as an argument to this plugin OR derived from this plugin, as the desired pod density is something that would skew reservations. I think that can be evolved in a pre-beta evaluation of the feature.

Deferring to @dchen1107 to provide further comment.
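
To illustrate the pods-per-core question (purely hypothetical; the KEP does not define this interface), the kubelet could pass its pod-density setting to the plugin as an argument, and the plugin could fold expected pod count into its formula:

```go
// Hypothetical: the kubelet passes its pods-per-core setting to the plugin,
// which factors expected pod density into the reservation it reports.
package main

import (
	"flag"
	"fmt"
	"runtime"
)

func main() {
	podsPerCore := flag.Int("pods-per-core", 10, "desired pod density passed in by the kubelet")
	flag.Parse()

	expectedPods := *podsPerCore * runtime.NumCPU()
	// Illustrative only: reserve a small amount of memory per expected pod
	// on top of a flat base for the kubelet itself.
	kubeReservedMemMi := 256 + expectedPods*8
	fmt.Printf("kubeReserved memory (illustrative): %dMi\n", kubeReservedMemMi)
}
```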

@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: deads2k, harche
To complete the pull request process, please assign dchen1107 after the PR has been reviewed.
You can assign the PR to them by writing /assign @dchen1107 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@harche force-pushed the sizing-providers branch 3 times, most recently from d45d31a to edaf034 on February 9, 2021

@derekwaynecarr (Member) commented:

Another potential idea was to allow the kubelet --config option to accept either a file or a URL, similar to kubectl create -f. That way, providers could serve kubelet configuration from a local metadata server, not manage files, and also not have to persist kubelet configuration in the Kubernetes cluster control plane.
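
A rough sketch of that idea (hypothetical helper, not kubelet code): treat the --config argument as a URL when it parses as one, and as a local file path otherwise, similar to how kubectl create -f accepts either:

```go
// Hypothetical sketch of accepting --config as either a file path or a URL.
package config

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// LoadKubeletConfig returns the raw kubelet configuration bytes from either
// a local file or an HTTP(S) URL, e.g. a local metadata server.
func LoadKubeletConfig(source string) ([]byte, error) {
	if u, err := url.Parse(source); err == nil && (u.Scheme == "http" || u.Scheme == "https") {
		resp, err := http.Get(source)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return nil, fmt.Errorf("fetching kubelet config from %s: %s", source, resp.Status)
		}
		return io.ReadAll(resp.Body)
	}
	// Fall back to treating the argument as a local file path.
	return os.ReadFile(source)
}
```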

Comment on lines +199 to +203
### Test Plan

Alpha:
* Unit tests for the exec plugin provider
* Unit tests for API validation

Member:

Is there any more detail here?
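
For illustration, a unit test for the hypothetical FetchReservations helper sketched earlier in this conversation (not code from the KEP) might look roughly like this, assuming a Unix-like environment:

```go
package sizing

import (
	"os"
	"path/filepath"
	"testing"
)

// TestFetchReservations exercises the hypothetical FetchReservations helper
// against a fake plugin script that prints fixed sizing values.
func TestFetchReservations(t *testing.T) {
	dir := t.TempDir()
	plugin := filepath.Join(dir, "fake-sizing-provider")
	script := "#!/bin/sh\necho '{\"systemReserved\":{\"cpu\":\"500m\",\"memory\":\"1Gi\"},\"kubeReserved\":{\"cpu\":\"100m\",\"memory\":\"500Mi\"}}'\n"
	if err := os.WriteFile(plugin, []byte(script), 0o755); err != nil {
		t.Fatal(err)
	}

	got, err := FetchReservations(plugin)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if got.SystemReserved["cpu"] != "500m" {
		t.Errorf("unexpected systemReserved cpu: %q", got.SystemReserved["cpu"])
	}
}
```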

@ravisantoshgudimetla (Contributor) left a comment:

Overall this LGTM. Thank you for working on this @harche.

I think it will be beneficial from a Windows node standpoint too, considering we provide static recommendations.

cc @marosset @jsturtevant @jayunit100 @aravindhp


### Metrics

N/A

Contributor:

I think one useful metric is startup time. Even though it is not time series data, it will be good to know how long the kubelet took to start up with a plugin.
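
If such a startup-time metric were added, a sketch using Prometheus client_golang might look like this; the metric name and helper are invented for illustration:

```go
// Hypothetical sketch of a startup-duration metric for the sizing plugin.
package sizing

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var pluginStartupDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "kubelet_sizing_provider_plugin_duration_seconds", // illustrative name
	Help:    "Time taken by the node sizing provider plugin during kubelet startup.",
	Buckets: prometheus.DefBuckets,
})

func init() {
	prometheus.MustRegister(pluginStartupDuration)
}

// observePluginRun records how long the single startup invocation took.
func observePluginRun(run func() error) error {
	start := time.Now()
	err := run()
	pluginStartupDuration.Observe(time.Since(start).Seconds())
	return err
}
```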

## Alternatives

### Add a built-in sizing provider in-tree.
#### Pros

Contributor:

Another alternative is what @derekwaynecarr suggested. We implemented a plugin-based mechanism in the scheduler using both in-tree and HTTP. Using a metadata server is more in line with what cloud providers like AWS do. We can perhaps add this as a supported implementation when we graduate to beta?

@derekwaynecarr (Member) commented:

@ravisantoshgudimetla just updating from the conversation in sig-node: @dchen1107 was against a sparse type of plugin model, but appeared more open to supporting --config as a URL per my comment above on Slack.

@marosset (Contributor) commented:

> I think it will be beneficial from a Windows node standpoint too, considering we provide static recommendations.

+1 that this would be beneficial for Windows.
Windows doesn't have OOM killing, but we have seen issues in the past where the kubelet was getting starved of CPU time due to poorly set system reserves. As mentioned in the motivation section, trying to tune these values for many node sizes is hard to do manually.

@ehashman (Member) commented:

We decided not to pursue this, I believe.

/close

@k8s-ci-robot (Contributor) commented:

@ehashman: Closed this PR.

In response to this:

> We decided not to pursue this, I believe.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@YanzhaoLi (Member) commented:

Do we have an alternative for this feature?

@ehashman (Member) commented:

This can be calculated by the code responsible for provisioning a cluster / setting up a node with the kubelet, rather than delegating it to the kubelet itself as in this proposal.
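
A rough sketch of that alternative, with an entirely illustrative formula: the provisioning tooling computes the reservations itself and hands them to the kubelet through its existing --system-reserved/--kube-reserved flags (or configuration file), so the kubelet never has to call out to a plugin:

```go
// Hypothetical provisioning-time helper: compute reservations outside the
// kubelet and emit the corresponding kubelet flags.
package main

import (
	"flag"
	"fmt"
)

func main() {
	cpus := flag.Int("cpus", 4, "number of CPUs on the node")
	memGiB := flag.Int("memory-gib", 16, "node memory in GiB")
	flag.Parse()

	// Illustrative formula only: scale reservations with node size.
	systemCPU := *cpus * 50         // millicores
	systemMemMi := 256 + *memGiB*32 // MiB
	kubeCPU := 100                  // millicores
	kubeMemMi := 256 + *memGiB*16   // MiB

	// Provisioning tooling would write these into the kubelet configuration
	// (or systemd unit) before the kubelet starts.
	fmt.Printf("--system-reserved=cpu=%dm,memory=%dMi\n", systemCPU, systemMemMi)
	fmt.Printf("--kube-reserved=cpu=%dm,memory=%dMi\n", kubeCPU, kubeMemMi)
}
```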
