Bound Concurrency of Pod Scraping (Probably with a Work Pool) #8377
In the same vein, we should likely switch the probes to a workqueue-based solution (like the net-XXX probers) to be able to more easily control the lifecycle of the respective probes.
For the color
Here I am a bit on the fence, given that we probably have requests queueing up in the activator that we want to start forwarding as fast as possible, and until the pod is probed it receives no requests... so 🤷
Not sure we talk about the same thing @vagababov? What does the activator have to do with pod probing in the autoscaler? 🤔
We only probe during scale to 0. If we consider 500 revisions all going to 0 at the same time, it's unfortunate, but a much more corner case than just 500 revisions that we need to scrape :)
But we do a lot of probing in activator :)
Okay, we clarified offline: in my comment about switching to workqueue-based stuff I meant metric scraping, not probing. Sorry for the confusion.
@julz @vagababov @markusthoemmes I could help here if possible.
Yep, just know that this might be a somewhat bigger refactor of the current scraper code. I think we first need to agree that we actually want work-pooling here, as it comes with the risk of us setting the concurrency too low.
so how do we want to proceed on this? Would a feature proposal doc to hash out the details and any foreseeable edge cases be the best next step?
I believe so @julz
cool - @skonto happy to work together on that if you like?
@julz sure, let's do it :) I will start a doc and we can iterate on it. I will ping you.
Note that this has to be a configurable feature, since a user might wish to throw CPU at the AS to ensure enough resources to support more revisions.
@vagababov, as you probably noticed we (cc @julz @markusthoemmes) had a long discussion on Slack (autoscaling channel), so absolutely that is one option we discussed, but even within the context of a single instance there should be room for improvement. I will try to summarize here; what can actually be improved needs verification in any case. The total max cost is roughly the number of revisions times the 16 scrapes per revision. We could either make 16 smaller or limit the two factors at any given time, e.g. control how much scraping we do or how many revisions are served. Using a workqueue to limit scraping has other benefits besides bounding concurrency, for example the easier lifecycle control of the scrapers that @markusthoemmes mentioned above.
But there are also challenges, as mentioned in #8610. Note here, though, that we already use a workqueue for probing. In addition, the current max of 16 pods scraped per revision could be lowered if we had a smarter algorithm than relying on the normal distribution; maybe a time-series algorithm could do better here with fewer pods used in scraping. The key question, as discussed, is whether we should limit the number of concurrently served revisions or how much scraping we do per revision at any time. For the latter, a related issue is this one. In that issue the max keep-alive was increased, but this could lead to running out of file descriptors if we keep increasing it. IMHO, from a UX and operations perspective we could make both the number of revisions served and the workqueue concurrency (or other knobs, if we go down that path) configurable.
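For illustration only, here is a minimal sketch of what a workqueue-driven scrape loop could look like, assuming a hypothetical `scrapeRevision` function and a placeholder worker count; this is not the actual autoscaler code:

```go
package main

import (
	"log"
	"time"

	"k8s.io/client-go/util/workqueue"
)

// scrapeRevision is a hypothetical stand-in for scraping the metrics of one revision.
func scrapeRevision(rev string) error {
	log.Printf("scraping %s", rev)
	return nil
}

func main() {
	// The rate limiter bounds retries; the worker count below bounds concurrency.
	q := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	defer q.ShutDown()

	const workers = 8 // placeholder; the real value would be configurable
	for i := 0; i < workers; i++ {
		go func() {
			for {
				item, shutdown := q.Get()
				if shutdown {
					return
				}
				rev := item.(string)
				if err := scrapeRevision(rev); err != nil {
					q.AddRateLimited(rev) // back off and retry instead of hammering the pod
				} else {
					q.Forget(rev)
				}
				q.Done(item)
			}
		}()
	}

	// Revisions are enqueued on each scrape tick instead of spawning a goroutine per revision.
	for _, rev := range []string{"rev-a", "rev-b", "rev-c"} {
		q.Add(rev)
	}
	time.Sleep(time.Second)
}
```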
@vagababov @julz should we move on with a workqueue approach, have a first version and take it from there?
We might want to think about the guarding measures we want to take first, maybe. That would make accepting/denying this a no-brainer imo. How do we want to measure success/failure, or rather better/worse, once we do this?
@markusthoemmes from my POV, my definition of success would be to add configuration capabilities while keeping the same performance as in the unbounded case, and to avoid corner cases like those described in #8610.
Hm, but wouldn't we want to show that implementing this gives an advantage over the existing way of doing things (which doesn't require hand-tuning either)? We might want to produce a micro-benchmark that gives us an idea of the scale needed to reap benefits and the scale at which things start to become detrimental.
The way I see this, the workqueue imposes a rate-limiting strategy in order to optimize revision-scraping throughput and latency. From an operations perspective, the current approach already implicitly requires hand-tuning, since you have to increase your connection pool size in large environments, and btw there is no control over bursty traffic.
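For context, the hand-tuning mentioned above is roughly about knobs like these on Go's `http.Transport`; the values below are purely illustrative, not what the project ships:

```go
package main

import (
	"net/http"
	"time"
)

// newScrapeClient builds an HTTP client whose keep-alive pool must be sized to
// the expected number of concurrent scrapes; values here are illustrative only.
func newScrapeClient() *http.Client {
	return &http.Client{
		Timeout: 3 * time.Second,
		Transport: &http.Transport{
			// If concurrent scrapes exceed these limits, connections are opened and
			// torn down per request; raising them instead consumes file descriptors.
			MaxIdleConns:        1000,
			MaxIdleConnsPerHost: 100,
			IdleConnTimeout:     30 * time.Second,
		},
	}
}
```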
I would also want to see how StatefulSet AS scaleout works, since I don't want us to complicate the system if there's no necessity to.
@vagababov reading these benchmarks from the
100K goroutines is probably well above what we support per task in any case...
@julz describes an issue that occurs in large environments, so we need to clarify how many pods are being scraped when issues occur. Btw, in theory, if we follow the guidelines for building large clusters, we could have thousands of pods and the limit is 150K. However, the benchmarks I mentioned depend on the processing being done, which there is a dummy function. I don't think the threshold is the same with a realistic workload; in that case the cost per scraping request could be much higher (in ms) and could use more resources because of HTTP connections overflowing the keep-alive pools, etc.
Yeah, let's write a Go benchmark (I don't think an E2E one is necessary) and see if we can find a break-even point between just spawning new goroutines and pooling them for our workloads.
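A rough shape for such a micro-benchmark, with a hypothetical `scrapeOnce` standing in for the real per-pod scrape, might be:

```go
package scrape

import (
	"sync"
	"testing"
	"time"
)

// scrapeOnce simulates the per-pod scrape cost; a realistic benchmark
// would do an actual HTTP round trip instead.
func scrapeOnce() {
	time.Sleep(time.Millisecond)
}

// BenchmarkUnbounded spawns one goroutine per scrape, mirroring today's behaviour.
func BenchmarkUnbounded(b *testing.B) {
	var wg sync.WaitGroup
	for i := 0; i < b.N; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			scrapeOnce()
		}()
	}
	wg.Wait()
}

// BenchmarkPooled feeds the same work through a fixed number of workers.
func BenchmarkPooled(b *testing.B) {
	const workers = 64 // placeholder pool size
	work := make(chan struct{})
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range work {
				scrapeOnce()
			}
		}()
	}
	for i := 0; i < b.N; i++ {
		work <- struct{}{}
	}
	close(work)
	wg.Wait()
}
```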
This issue is stale because it has been open for 90 days with no activity.
/reopen
/lifecycle frozen
@nader-ziada: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/triage accepted
Describe the feature
We've discussed this a few times in slack but it seems worth having an issue for it and I didn't spot one, so here one is :).
As of right now we spawn pod scrapes in parallel for ~every revision with no real bound on the number we'll attempt at once across revisions. This means we can potentially have an extremely large number of active outgoing scrapes, causing more contention than needed, exceeding the keep-alive connection pool size and buffer pools, etc. In our larger environments, we see large numbers of connections waiting on DNS and GC when revisions*pods gets large.
We should probably introduce some sort of work pool for the scrapers.
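One possible shape for such a bound, sketched with a weighted semaphore and hypothetical names (`scrapePod`, `maxInFlight`); the real design would come out of the proposal doc:

```go
package main

import (
	"context"
	"log"

	"golang.org/x/sync/semaphore"
)

// scrapePod is a hypothetical stand-in for one outgoing scrape request.
func scrapePod(ctx context.Context, pod string) {
	log.Printf("scraping pod %s", pod)
}

func main() {
	// One global limit shared across all revisions, so revisions*pods can no
	// longer translate into an unbounded number of in-flight scrapes.
	const maxInFlight = 200 // placeholder; would be operator-configurable
	sem := semaphore.NewWeighted(maxInFlight)
	ctx := context.Background()

	pods := []string{"pod-1", "pod-2", "pod-3"}
	for _, pod := range pods {
		pod := pod
		if err := sem.Acquire(ctx, 1); err != nil {
			break // context cancelled
		}
		go func() {
			defer sem.Release(1)
			scrapePod(ctx, pod)
		}()
	}
	// Wait for all in-flight scrapes by acquiring the full weight.
	_ = sem.Acquire(ctx, maxInFlight)
}
```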