
StatefulSet controller creates pods too slowly #75495

Closed
likakuli opened this issue Mar 20, 2019 · 9 comments

@likakuli commented Mar 20, 2019

/area controller-manager
What would you like to be added:
Don't resync StatefulSets in the StatefulSet controller.
Why is this needed:
There are 2000+ StatefulSets in my k8s cluster, so when I create a new one it takes a long time before its pods are created. I found that syncing each StatefulSet costs about 40 ms, so a full pass over 2000 StatefulSets costs about 80 s, and there is also a statefulSetResyncPeriod that periodically resyncs all of them. Why does statefulSetResyncPeriod need to resync every StatefulSet? Can we drop the resync and just watch for changes?
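For reference, here is a minimal client-go sketch (not the actual kube-controller-manager wiring) of the mechanism being described: a SharedInformerFactory created with a non-zero resync period periodically replays its entire cache through the event handlers, so every StatefulSet is re-enqueued even when nothing changed. The 30-second period below is an illustrative value, not the controller's default.

```go
package main

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// A non-zero defaultResync makes every informer from this factory replay
	// its whole cache through UpdateFunc on that interval, even for objects
	// that have not changed. At ~40 ms per sync, 2000 StatefulSets make each
	// replay cost on the order of 80 s of queue time.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	factory.Apps().V1().StatefulSets().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			sts := newObj.(*appsv1.StatefulSet)
			fmt.Printf("enqueued %s/%s\n", sts.Namespace, sts.Name)
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}
```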

@likakuli (Author) commented Mar 20, 2019

/sig api-machinery

@yue9944882 (Member) commented Mar 21, 2019

/remove-sig api-machinery
/sig apps
/sig scalability

@krmayankk (Contributor) commented Mar 22, 2019

The resync pattern is followed by all controllers to account for bugs in the watch mechanism or notifications missed for any other reason. I don't think an individual controller's resync period is configurable, though perhaps we could increase it. Also, I'm not sure whether running 2000 StatefulSets is within the currently supported scale limits. @wojtek-t FYI

@likakuli (Author) commented Mar 22, 2019

@krmayankk I have three clusters: A has 20 StatefulSets, B has 1300, and C has 2300. I set the kube-controller-manager log level to 4 and found that the cost of processing a single StatefulSet is 2 ms, 30 ms, and 45 ms respectively.

The resync pattern is followed by all controllers to account for bugs in the watch mechanism or notifications missed for any other reason.

Could we record each processed object in a cache, and every time we get a new object to process, compare it with the cached item? If they are the same, then we don't need to sync it.
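One way this suggestion could look, sketched with a common client-go pattern (the enqueue callback is a hypothetical stand-in for the controller's work-queue add, not existing controller code): on a resync the informer re-delivers the cached object unchanged, so its ResourceVersion matches the previous one and the no-op sync can be skipped.

```go
package statefulset

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/tools/cache"
)

// newStatefulSetHandler filters out no-op resync deliveries before they
// reach the work queue. enqueue is a hypothetical callback.
func newStatefulSetHandler(enqueue func(*appsv1.StatefulSet)) cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			enqueue(obj.(*appsv1.StatefulSet))
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			oldSts := oldObj.(*appsv1.StatefulSet)
			newSts := newObj.(*appsv1.StatefulSet)
			// A real change always bumps ResourceVersion; a periodic resync
			// replays the cache copy with the ResourceVersion unchanged.
			if oldSts.ResourceVersion == newSts.ResourceVersion {
				return
			}
			enqueue(newSts)
		},
	}
}
```

Note that a controller relying on resyncs to reconcile external state would not want this filter; that is exactly the distinction drawn later in the thread.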

@wojtek-t (Member) commented Mar 22, 2019

The resync pattern is followed by all controllers to account for bugs in the watch mechanism or notifications missed for any other reason. I don't think an individual controller's resync period is configurable, though perhaps we could increase it. Also, I'm not sure whether running 2000 StatefulSets is within the currently supported scale limits. @wojtek-t FYI

The "resync pattern" doesn't help with watch bugs (resync is done purely from the internal cache, so in the hypothetical situation where we missed a watch event, it wouldn't help at all).

Resync period should basically be used only in controllers that also sync some external state (like load balancers), where watching changes of that state is impossible.

Unless I'm missing something, the resync period can be safely dropped in this case.

@kubernetes/sig-apps-bugs @kow3ns @janetkuo
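In client-go terms, dropping the resync period amounts to a zero defaultResync; a minimal sketch:

```go
package statefulset

import (
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// newFactoryWithoutResync shows what "dropping the resync period" means:
// a defaultResync of 0 disables periodic cache replays entirely, so event
// handlers fire only on genuine watch events.
func newFactoryWithoutResync(client kubernetes.Interface) informers.SharedInformerFactory {
	return informers.NewSharedInformerFactory(client, 0)
}
```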

@likakuli (Author) commented Mar 23, 2019

@wojtek-t I think all the controllers have the same problem, because the SharedInformerFactory has a default resync period unless we set it to 0 in the config file. Also, the more StatefulSets there are, the higher the cost of processing each one becomes.
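For context, kube-controller-manager derives that shared default from its --min-resync-period flag (12h by default) and jitters it per controller, roughly along the lines of this sketch; that jitter is where the 12 to 24 hour range mentioned below comes from.

```go
package app

import (
	"math/rand"
	"time"
)

// resyncPeriod mirrors the jitter kube-controller-manager applies to its
// minimum resync period: the effective value is uniform in
// [minResyncPeriod, 2*minResyncPeriod), so the 12h default yields 12-24h.
func resyncPeriod(minResyncPeriod time.Duration) time.Duration {
	factor := rand.Float64() + 1 // uniform in [1, 2)
	return time.Duration(float64(minResyncPeriod.Nanoseconds()) * factor)
}
```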

@krmayankk (Contributor) commented Mar 23, 2019

Resync period should basically be used only in controllers that also sync some external state (like load balancers), where watching changes of that state is impossible.

@wojtek-t interesting, but the resync only happens for k8s objects. Why is watching changes on those impossible? And by "external things", do you mean load balancers in gcloud? The resync only happens for the k8s Services corresponding to those load balancers, right? Or are you saying that when a k8s Service is linked to a gcloud load balancer, the resync lets the controller check gcloud to see whether the load balancer is provisioned or whether anything has changed there?

We need to document this somewhere, much like the API conventions doc.

@wojtek-t (Member) commented Mar 24, 2019

@krmayankk - I mean a load balancer in GCP as an example (but it can also be other load balancers, persistent volumes in a cloud, or things like that).
Yes - we resync on k8s objects, but that is what lets you sync the external things. As an example, imagine a Service of type LoadBalancer: if you periodically sync the k8s Service object, this triggers the operation of syncing the load balancer (and gives us the ability to check whether it exists, is healthy, etc.); see the sketch after this comment.

@likakuli - hmm, I would need to double check it, but I thought the default resync period is pretty long...
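A sketch of that distinction, with a hypothetical cloudLB interface standing in for a real provider API: the external load balancer emits no watch events, so the periodic resync of the Service object is the controller's only chance to notice out-of-band drift.

```go
package servicelb

import (
	"context"

	corev1 "k8s.io/api/core/v1"
)

// cloudLB is a hypothetical cloud-provider client.
type cloudLB interface {
	// EnsureLoadBalancer creates or repairs the external load balancer
	// backing a Service.
	EnsureLoadBalancer(ctx context.Context, svc *corev1.Service) error
}

// syncService runs for every Add/Update, including resync replays. For a
// controller that only manages k8s objects the replay is a no-op, but here
// it is what detects a load balancer deleted or broken out-of-band.
func syncService(ctx context.Context, lb cloudLB, svc *corev1.Service) error {
	if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
		return nil
	}
	return lb.EnsureLoadBalancer(ctx, svc)
}
```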

@likakuli (Author) commented Mar 24, 2019

@wojtek-t The default resync period is 12 to 24 hours. That is long, but the bad experience still happens every day.
