Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache crd objects to speed up VpaTargetSelectorFetcher and ControllerFetcher #3412

Closed
wants to merge 3 commits into from

Conversation

chenchun
Copy link
Contributor

@chenchun chenchun commented Aug 6, 2020

Fixes #3266
Without this patch, it takes 1m41s (from I0806 07:50:08.676461 to I0806 07:51:49.018659) to get all selectors for each of 3111 vpas (mostly crd workload targets).
image

With this patch, it takes 1s (from I0806 08:02:49.137257 to I0806 08:02:50.266574).
image

We also save 3111*3/2/60 qps of scale http calls to kube-apiserver and speed up vpa admission.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 6, 2020
@chenchun chenchun changed the title cache objects to speed up VpaTargetSelectorFetcher and ControllerFetcher cache crd objects to speed up VpaTargetSelectorFetcher and ControllerFetcher Aug 6, 2020
@krzysied
Copy link
Contributor

krzysied commented Aug 6, 2020

Wow! The performance improvement looks amazing.
Can I review this?

@bskiba
Copy link
Member

bskiba commented Aug 7, 2020

/assign @krzysied
Krzysztof please do take a look.
My main concern with the approach is that we need get, list and watch permissions on all objects in the cluster.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign krzysied
You can assign the PR to them by writing /assign @krzysied in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@krzysied
Copy link
Contributor

My main concern with the approach is that we need get, list and watch permissions on all objects in the cluster.

I'm not sure if we can do it differently. The best solution would be having watch on scale subresource, but k8s doesn't prove such option.
These permissions give a lot of access, but it's only read access.

vertical-pod-autoscaler/pkg/crdcache/crdcache.go Outdated Show resolved Hide resolved
vertical-pod-autoscaler/pkg/crdcache/crdcache.go Outdated Show resolved Hide resolved
vertical-pod-autoscaler/pkg/crdcache/crdcache.go Outdated Show resolved Hide resolved
vertical-pod-autoscaler/pkg/crdcache/crdcache.go Outdated Show resolved Hide resolved
vertical-pod-autoscaler/pkg/crdcache/crdcache.go Outdated Show resolved Hide resolved
@@ -108,7 +109,8 @@ func NewClusterStateFeeder(config *rest.Config, clusterState *model.ClusterState
kubeClient := kube_client.NewForConfigOrDie(config)
podLister, oomObserver := NewPodListerAndOOMObserver(kubeClient, namespace)
factory := informers.NewSharedInformerFactoryWithOptions(kubeClient, defaultResyncPeriod, informers.WithNamespace(namespace))
controllerFetcher := controllerfetcher.NewControllerFetcher(config, kubeClient, factory)
crdCache := crdcache.NewCrdCache(config, make(chan struct{}), defaultResyncPeriod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop channel should provided via function params.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what you mean. I deleted stop chan param and follow the rules in other package to create stop chan in crdcache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about having stop channel as an input parameter. This would be helpful if cluster feeder was recreated many times during recommender life.
However, I looked at the recommender code and it didn't seem like a use case. Moreover other informers are also not propagating stop channels.
Deleting stop channel seems ok.

@chenchun chenchun force-pushed the crdcache branch 2 times, most recently from 04ede85 to f4ff989 Compare August 11, 2020 06:53
@krzysied
Copy link
Contributor

/hold
I talked with security folks and having rbac that allows to list the whole universe doesn't seems to be a great idea.
I will look for some alternatives/ways to restrict the access.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 18, 2020
@bskiba
Copy link
Member

bskiba commented Oct 2, 2020

From investigation by @krzysied, the current understanding is that giving VPA read access to all objects is a no-go from security perspective (gives access to secrets) and there is no way to express "give access to everything but secrets". Unfortunately, I don't see how we can make this PR work without that so I'm closing it, but feel free to reopen if you see an alternative.
/close

@k8s-ci-robot
Copy link
Contributor

@bskiba: Closed this PR.

In response to this:

Form investigation by @krzysied, the current understanding is that giving VPA read access to all objects is a no-go from security perspective (gives access to secrets) and there is no way to express "give access to everything but secrets". Unfortunately, I don't see how we can make this PR work without that so I'm closing it, but feel free to reopen if you see an alternative.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chenchun
Copy link
Contributor Author

there is no way to express "give access to everything but secrets"

I find kubernetes/kubernetes#85963 talks about this.
@bskiba Do you think it's worth adding a flag to make this feature turned off by default and can be turned on for specific clusters by adding permissions for certain workloads instead of all objects to RBAC ?

@bskiba
Copy link
Member

bskiba commented Oct 16, 2020

Sadly, looks like kubernetes/kubernetes#85963 is rejected. We're looking currently at a different way of tackling this, I'm hoping #3589 will help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vpa-recommender: lots of duplicate scale calls
4 participants