-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AEP] docs: add proposal for kwok
provider
#5869
Conversation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: vadasambar The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
kwok
providerkwok
provider
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
Note that the original plan was to create a first draft and then edit the proposal to match the KEP template but after discussion in sig-autoscaling meeting yesterday, I found out that I don't need to write a proposal to add a cloud provider. I think having a proposal is still valuable (but it doesn't follow standard KEP template). It helps users understand the motivation for adding the cloud provider and it provides a document everyone can review and add comments/suggestions in. I will keep this PR open for reviews/suggestions from users/contributors before merging it in. I have asked |
* `BuildKwokCloudProvider`: | ||
* Create a lister to watch for nodes with the annotation `kwok.x-k8s.io/node: fake` | ||
* The annotation is used to identify nodes created by the `kwok` provider (`kwok` provider creates nodes with this annotation). Limiting the lister `kwok` created nodes ensure we don't touch actual nodes. | ||
* Use `KWOK_TEMPLATES_PATH` env variable to fetch user defined template nodes. This is so that the users can specify their own custom template nodes. This can be the yaml result of `kubectl get nodes` from a different cluster with some changes. Or, it can be nodes defined by the user from scratch in yaml. The yaml has to be mounted in the CA pod at the path specified in `KWOK_TEMPLATES_PATH` env variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also designing this Mock data, if you're interested take look at the API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for sharing the link. kubernetes-sigs/kwok#668 looks interesting. 👍
You can see a PoC implementation of rest of the functions at https://github.com/kubernetes/autoscaler/pull/5820/files#diff-44474ffb56eda61e9a6b16c3ca66461cdf8e02a7e89b659f5a45ca32f5fa8588 (only supports template nodes passed via env variable as of writing this) | ||
|
||
### Things to note | ||
1. Once the user is done with using the `kwok` provider, `--cloudprovider` flag can be reset to any other cloud provider. Deleting the CA pod will cleanup all the fake nodes created by the `kwok` provider and restore the cluster to its original state. This behavior doesn't seem to be supported as of writing this (based on my tests). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting the CA pod will cleanup all the fake nodes created by the
kwok
provider and restore the cluster to its original state. This behavior doesn't seem to be supported as of writing this (based on my tests).
Yes, kwok will not delete any nodes, we need to filter the nodes to be deleted based on the label/annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior doesn't seem to be supported as of writing this
Sorry for the confusion. I don't expect kwok
to clean up nodes. This
here is CA. I need to implement a go interface for this proposal. This interface has a Cleanup()
function. My expectation is if I implement this function, CA would call it when the CA pod is terminating. But, this expectation is wrong because Cleanup
function is for cleaning go routines when the CA pod is terminating. This is why I said This behavior doesn't seem to be supported as of writing this (based on my tests).
.
TODO: change the sentence to make it less confusing
* relevant unit tests have been added | ||
* `kwok` cloud provider is `demo`ed in one of the sig-autoscaling meetings | ||
|
||
### FAQ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question, as I understand it, CA depends on metric-server, right? I'm not sure if kwok nodes can scale up/down properly without /metrics/resource
. I'm currently working on supporting metric-server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the link. Fortunately, CA doesn't depend on kubectl top pods
or kubectl top nodes
metrics. It is based on kube state metrics (it uses pod CPU and memory requests metrics).
That being said, having metrics server would definitely be nice. 😄
@wzshiming thank you for taking a look at my proposal 🙏 ❤️ |
/assign This looks quite interesting, I think some scalability issues could indeed be found with such cloud provider. Definitely not all of them, since some of them are cloud provider specific. I'd be happy to marge this, though maybe it would make sense to discuss on the sig meeting first? Or did that happen already? (I see this PR is quite old by now.) |
Thank you for showing interest ❤️ I already talked about this proposal in the sig meeting. Based on the discussion, I don't need an AEP to add a new cloudprovider. I have kept this PR so that people can find it easily. I am working on getting the first release out. I plan to include this proposal as a doc in the PR. PR: #5820 I think the code can improve (would love your review :) ). I want to get the first release out so that users can try it and work on the feedback. |
Gotcha. I'd close this PR then since you're already working on the other one. Anyone reading this will know where to look next. Does that make sense? Also, #5820 is currently marked as "DON'T REVIEW". Is it already in reviewable state? |
Agreed. I will close this PR.
I am writing tests right now. I would like to get a review on the PR in its current state but I will refrain (don't want to waste your time). Once I am done with the tests, I will mention you in the PR to let you know it is ready. |
Signed-off-by: vadasambar surajrbanakar@gmail.com
What type of PR is this?
/kind documentation
/kind feature
What this PR does / why we need it:
This is a proposal for implementing
kwok
provider. Check the files for more explanation.Which issue(s) this PR fixes:
It will help fix issues like #5769
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Not sure what should be added ^
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: