On large cluster, routecontroller takes forever to program routes due to rate limit errors / reconcile #26119
c.f. #23327, which is the other leg of this problem
I took a look into the code and it seems that we are doing a full reconciliation every 10s. If we need to create 2000 routes at the beginning, that seems too frequent. I think that we need to:
I will try to put some small PR together as a starting point.
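For context, a rough sketch of the behavior being described above (the function names and wiring are illustrative, not the actual routecontroller code): with a full reconcile on a short fixed period, every still-missing route is re-attempted each cycle until it finally sticks.

```go
package main

import "time"

// Illustrative only: a full reconcile every reconcilePeriod means that, on a
// cold start with ~2000 missing routes, all of them are re-attempted each
// cycle until they are actually programmed.
const reconcilePeriod = 10 * time.Second

func runRouteReconciler(listNodes func() []string, routeExists func(string) bool,
	createRoute func(string) error) {
	for range time.Tick(reconcilePeriod) {
		for _, node := range listNodes() {
			if !routeExists(node) {
				node := node
				// Fire-and-forget; a failure just waits for the next cycle.
				go func() { _ = createRoute(node) }()
			}
		}
	}
}
```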
Instead of just applying rate limits to operation polling, send all API calls through a rate-limited RoundTripper. This isn't a perfect solution, since the QPS is obviously getting split between different controllers, etc., but it's also spread across different APIs, which, in practice, rate limit differently. Fixes kubernetes#26119 (hopefully)
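A minimal sketch of what such a rate-limited RoundTripper could look like, assuming golang.org/x/time/rate; the type and constructor names here are illustrative, not the actual PR code:

```go
package main

import (
	"net/http"

	"golang.org/x/time/rate"
)

// rateLimitedRoundTripper is a hypothetical wrapper: every outgoing request
// waits for a token from the limiter before hitting the underlying transport.
type rateLimitedRoundTripper struct {
	limiter  *rate.Limiter
	delegate http.RoundTripper
}

func (rt *rateLimitedRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	// Block until the limiter grants a token, or the request's context is cancelled.
	if err := rt.limiter.Wait(req.Context()); err != nil {
		return nil, err
	}
	return rt.delegate.RoundTrip(req)
}

// newThrottledClient builds an http.Client whose every call is rate limited.
func newThrottledClient(qps float64, burst int) *http.Client {
	return &http.Client{
		Transport: &rateLimitedRoundTripper{
			limiter:  rate.NewLimiter(rate.Limit(qps), burst),
			delegate: http.DefaultTransport,
		},
	}
}
```

Because the limiter sits in the transport, every caller sharing that client is throttled together, regardless of which API or controller the request came from.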
I'm reopening this one, since this doesn't seem to be solved. So basically something strange is happening here. With @zmerlynn's PR we have a 10/s limit on API calls sent to GCE, and we are still rate-limited. However, according to the documentation we are allowed 20/s: So something either doesn't work or is weird.
What is super interesting from the logs of a recent run: it starts creating routes:
Two minutes after that, we hit "Rate limit Exceeded" for the first time:
And only after 9 minutes do we create the first route.
So it seems that nothing useful is happening for the first 2 minutes, and then again between 3m30 and 9m30.
OK - so I think that what is happening is that all controllers are sharing the GCE interface (and thus they have common throttling). So if one controller is generating a lot of API calls, then other controllers may be throttled. @gmarek has a hypothesis that it may be caused by the nodecontroller.
One thing my PR didn't address (because it was late): Now we have some obnoxious differences in logging when you go to make a request, because the
@zmerlynn - yes, I'm aware of that. And probably this is exactly the case here.
I got fooled by that - I guess we should fix it somehow.
One approach to fixing it is a similar approach to the operation polling code, but in the
I found something else - the Kubelet is also building a GCE interface: We have 2000 kubelets, so if all of them send multiple requests to GCE, we may hit limits.
Any idea under what circumstances the Kubelet will contact the cloud provider? @dchen1107
Hmm - it seems it contacts GCE exactly once at the beginning. So it doesn't explain much.
I think it's only here: kubernetes/cmd/kubelet/app/server.go Line 622 in 6224f44
I've rarely seen GCE rate limit readonly calls in practice, and kubelets come up pretty well staggered.
OK - so with the logs that I added, it is pretty clear that we are very heavily throttled on GCE API calls. In particular, when I was running a 1000-node cluster, I saw lines where we were throttled for 1m30. I'm pretty sure it's significantly higher in a 2000-node cluster.
I'm going to investigate a bit more where all those requests come from.
Actually - it seems that we have a pretty big problem here. So to clarify:
We should think about how to solve this issue, but I think the options are: increasing the QPS quota or calling GCE only once per X statusUpdates.
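A tiny sketch of the "call GCE only once per X statusUpdates" option (purely illustrative; the cache shape and names are assumptions, not kubelet code): the expensive cloud call is refreshed only every Nth status update and the cached result is reused in between.

```go
package main

// cachedFetcher refreshes the expensive cloud-provider call only once every
// refreshEvery invocations and serves the cached result otherwise.
type cachedFetcher struct {
	refreshEvery int
	calls        int
	cached       []string // e.g. node addresses from the cloud provider
	fetch        func() ([]string, error)
}

func (c *cachedFetcher) Get() ([]string, error) {
	// Hit the cloud provider on the first call, then once per refreshEvery calls.
	if c.cached == nil || c.calls%c.refreshEvery == 0 {
		addrs, err := c.fetch()
		if err != nil {
			return nil, err
		}
		c.cached = addrs
	}
	c.calls++
	return c.cached, nil
}
```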
The second issue is that we are throttled at the controller level too, and this one I'm trying to understand right now.
@zmerlynn - that said, any rate limiting on our side will not solve the kubelet-related problem.
Hmm - actually, looking into the implementation, it seems that NodeAddresses only contacts the metadata server, so I'm no longer that convinced...
Also, I think the important thing is that a single call to the GCE API can translate to multiple API calls to GCE. In particular, CreateRoute translates to:
OK - so I think that what is happening here is that since CreateRoute translates to:
That means that if we call, say, 2000 CreateRoute() at the same time, all of them will try to issue getInstance at the beginning. So we will quickly accumulate 2000 GET requests in the queue, and what is happening here is kind of expected. So getting back to Zach's PR - I think that throttling POST, GET, and OPERATION calls separately is a good idea in general.
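To make that concrete, a rough sketch of the per-route request pattern as described in this thread (the helpers below are hypothetical stand-ins, not the real cloud provider code): a GET for the instance first, then the route insert, then operation polling, so N concurrent CreateRoute calls front-load N GETs before any route is actually inserted.

```go
package main

import "context"

// Hypothetical stand-ins for the real GCE calls; they only exist to show the
// request pattern, not actual behavior.
type operation struct{}

var (
	getInstance      func(ctx context.Context, node string) (string, error)
	insertRoute      func(ctx context.Context, instance, cidr string) (*operation, error)
	waitForOperation func(ctx context.Context, op *operation) error
)

// createRoute sketches the fan-out: one GET, one POST, then polling GETs.
func createRoute(ctx context.Context, node, cidr string) error {
	instance, err := getInstance(ctx, node) // GET: N concurrent callers all issue this first
	if err != nil {
		return err
	}
	op, err := insertRoute(ctx, instance, cidr) // POST: this is where "Rate Limit Exceeded" shows up
	if err != nil {
		return err
	}
	return waitForOperation(ctx, op) // more GETs: poll until the operation completes
}
```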
But I still don't really understand why we get "RateLimitExceeded". Is that because of the Kubelets?
I think that one problem we have at the RouteController level is that if CreateRoute fails, we simply don't react to it, and we will wait at least 5 minutes before even retrying it.
But I still don't understand why we get those "RateLimitExceeded" errors.
OK - so as an update: in GCE, we have a separate rate limit on the number of in-flight CreateRoute calls per project.
Automatic merge from submit-queue
GCE provider: Revert rate limits
This reverts #26140 and #26170. After testing with #26263, #26140 is unnecessary, and we need to be able to prioritize normal GET / POST requests over operation polling requests, which is what the pre-#26140 requests do. c.f. #26119
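One way to read "prioritize normal GET / POST requests over operation polling requests" is to keep a separate limiter per request class rather than one shared limiter; a rough sketch under that assumption (the rates and names are made up, not the actual provider code):

```go
package main

import (
	"context"

	"golang.org/x/time/rate"
)

// Separate limiters per request class, so chatty operation polling cannot
// starve the GETs and POSTs that do the real work.
var (
	readLimiter   = rate.NewLimiter(10, 10) // instance / route GETs
	mutateLimiter = rate.NewLimiter(10, 10) // route inserts and other POSTs
	pollLimiter   = rate.NewLimiter(2, 2)   // operation polling, deliberately slower
)

// throttled waits on the given limiter before issuing the call.
func throttled(ctx context.Context, l *rate.Limiter, call func() error) error {
	if err := l.Wait(ctx); err != nil {
		return err
	}
	return call()
}
```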
Automatic merge from submit-queue
Limit concurrent route creations
Ref #26119
This is supposed to improve 2 things:
- retry creating a route in the routecontroller in case of failure
- limit the number of concurrent CreateRoute calls in flight
We need something like that, because we have a limit on concurrent in-flight CreateRoute requests in GCE. @gmarek @cjcullen
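A minimal sketch of those two ideas as described (a buffered channel as a semaphore plus a bounded retry; the cap, retry count, and backoff here are assumptions, not the merged code):

```go
package main

import (
	"fmt"
	"time"
)

const (
	maxConcurrentRouteCreations = 200 // illustrative cap on in-flight CreateRoute calls
	maxRetries                  = 5
)

// createRouteLimited bounds how many route creations are in flight at once and
// retries a failed creation instead of waiting for the next full reconcile.
func createRouteLimited(sem chan struct{}, createRoute func() error) error {
	sem <- struct{}{}        // acquire a slot; blocks once the cap is reached
	defer func() { <-sem }() // release the slot when done

	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		if lastErr = createRoute(); lastErr == nil {
			return nil
		}
		time.Sleep(time.Second << uint(attempt)) // simple exponential backoff between retries
	}
	return fmt.Errorf("route creation failed after %d attempts: %v", maxRetries, lastErr)
}

func main() {
	sem := make(chan struct{}, maxConcurrentRouteCreations)
	_ = createRouteLimited(sem, func() error {
		// call the cloud provider's CreateRoute here
		return nil
	})
}
```

The semaphore mirrors the per-project limit on concurrent in-flight CreateRoute requests mentioned above, and the retry avoids parking a failed route until the next reconcile.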
OK - so this one is actually fixed. I mean that it is still long, but currently it's purely blocked on GCE.
As a comment - we have an internal bug for it already.
On a large cluster, the routecontroller basically makes O(n) Routes.Insert requests all at once, and most of them error out with "Rate Limit Exceeded". We then wait for operations to complete on the ones that did stick, and damn the rest, they'll be caught next reconcile (which is a while). In fact, in some cases it appears we're spamming hard enough to get rate limited by the upper-level API DOS protections and seeing something akin to golang/go#14627, e.g.:
We can do better than this!