[close #1577] Negative Backpressure Metric #1579
Conversation
This PR introduces the `pool_capacity` stat, which can be used as a “negative backpressure metric”. What does a “negative backpressure metric” mean? It means that when this number is low, we have a higher amount of backpressure. When it is zero, our worker has no ability to process additional requests. This information could be used to scale out by adding additional servers. When that happens, requests should be redistributed across the servers, and the “pool capacity” number for each individual server should go up.
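As a sketch of how a monitoring script might consume this stat: Puma's control server exposes stats as JSON, and in clustered mode each worker reports a `last_status` hash containing `pool_capacity`. The payload below is hypothetical (made-up values, trimmed to the relevant fields), but the field names follow this PR:

```ruby
require 'json'

# Hypothetical stats payload, roughly the shape Puma's control server
# returns in clustered mode (values are made up for illustration).
stats_json = '{"workers":2,"worker_status":[' \
             '{"last_status":{"backlog":0,"running":5,"pool_capacity":3}},' \
             '{"last_status":{"backlog":1,"running":5,"pool_capacity":0}}]}'

stats = JSON.parse(stats_json)

# Sum pool_capacity across workers: the total number of additional
# requests this instance can take on right now.
total_capacity = stats["worker_status"].sum do |w|
  w["last_status"]["pool_capacity"]
end

puts "total pool_capacity: #{total_capacity}" # low number => high backpressure
```

A value trending toward zero across all workers would be the signal to scale out.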
Related to puma/puma#1579
@schneems Is this ready to be merged?
Yep!
sweet! a few questions
I think this is the thing™ that "fixes" #1577. Though if you disagree, I'm happy to re-open. Ahh, I see I missed some of your comments over there, sorry about that. I had intended conversation about this specific interface to happen on this PR. I'm using this in production right now and it is providing good feedback.
That's already been asked; I can add that in as another metric. Seems worthwhile. Per worker it would be max-threads. It's a good idea to let the stat tell us that directly. While I'm doing that I can also expose the current number of threads per worker. The two values are related.
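With max-threads exposed alongside `pool_capacity`, a utilization percentage falls out of simple arithmetic. A minimal sketch, assuming both values are available per worker (the `utilization` helper itself is illustrative, not part of Puma):

```ruby
# Hypothetical helper: derive a utilization percentage from the two
# stats discussed above. A worker with max_threads 5 and pool_capacity 3
# has 2 threads busy, i.e. 40% utilized.
def utilization(max_threads:, pool_capacity:)
  busy = max_threads - pool_capacity
  (busy.to_f / max_threads * 100).round(1)
end

puts utilization(max_threads: 5, pool_capacity: 3) # 2 of 5 busy
puts utilization(max_threads: 5, pool_capacity: 0) # saturated worker
```

An autoscaler could then target a utilization band (say 50–80%) rather than reacting to raw capacity numbers.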
Going down to 0 in "pool capacity" is bad in the N:N config. There may be a race condition in the 0:N case where all your workers drop to zero (because you've not yet hit N threads), the stat fires and reports that you have 0 capacity, and then they all dynamically create a new thread, so the actual capacity would be 1 times the number of processes. I think this is mitigated by the fact that we would need to use a stream of this metric rather than just one value. I.e. even if it's showing 0 right now, under the same sustained load it would show a positive number the next time it reports. Even if we took some scaling action based on the 0 indicator, a future stream of positive numbers would perhaps indicate we could safely scale back down.
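The "use a stream, not one value" mitigation could look like this: only treat zero capacity as a scale-out signal when a whole window of recent samples is zero, so the transient 0:N race described above is ignored. This is an illustrative sketch (the `CapacityWindow` class and window size are assumptions, not anything in Puma):

```ruby
# Illustrative sketch: keep a rolling window of pool_capacity samples
# and only signal "scale out" when the entire window is zero, so a
# single transient zero reading does not trigger scaling.
class CapacityWindow
  def initialize(size)
    @size = size
    @samples = []
  end

  def record(capacity)
    @samples << capacity
    @samples.shift if @samples.length > @size
  end

  def scale_out?
    @samples.length == @size && @samples.all?(&:zero?)
  end
end

window = CapacityWindow.new(3)
[0, 1, 0].each { |c| window.record(c) }  # transient zero: no action
puts window.scale_out?                   # false
[0, 0, 0].each { |c| window.record(c) }  # sustained zeros
puts window.scale_out?                   # true
```

The same window, inverted (all samples positive), could drive the scale-down decision mentioned above.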
Yesss, thank you! Now we can publish a "Utilization" metric for sensible autoscaling, at least in the N:N configuration. 💯
Let me know what you think! The other major question to answer with metrics is "am I using a good number for my thread count?". I'm not totally sure about that one. My current best suggestion is to pick a magic number and live with it. Alternatively, adjust it up or down and note if the average response time increases or decreases. I guess I want some combination of a CPU metric (want to be close to 100% utilization) but also some kind of a "contention" metric. For example, you can certainly saturate an app with 1000 threads, but your contention will be through the roof. Maybe contention could be time spent idle per thread. I don't know if it's possible to get that from Ruby; it might require a patch to ruby/ruby and wouldn't be useful for at least a year until it gets released. I'm open to other thoughts/suggestions. Maybe we should open another issue on it for some brainstorming.
I've been thinking about this problem for years. Ideally one would like to try different thread counts under similar circumstances and see how performance and CPU load are affected (assuming that processes are scaled up to use available RAM, and the amount of RAM more threads take is not an issue). But that's hard to do because it's hard to get directly comparable traffic circumstances, and also cumbersome to schedule the experiment and directly compare results.

But... I just now had this idea: one could have, on the same machine, Puma processes with different thread counts. Then the behavior of each can be compared. Most important is probably simply throughput. If a process with 4 threads serves almost 2 times as many requests as a process with 2 threads, 4 threads is worth the DB connections. If a process with 8 threads serves only 20% more requests than a process with 4 threads, 8 threads is too many and a waste of DB connections (of course one would want to have enough incoming traffic to feel confident there was a sufficient test... and maybe having enough would require scaling down the number of processes for some period of time until there is a tad bit of backpressure).

Doing this would require that the logic which distributes requests to processes is very accurate in its determination of whether a process has available capacity. I don't know to what extent this is possible. If it is possible, then we can add features to Puma to facilitate this sort of experiment. We could even eventually create some sort of autoscaling.
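The throughput comparison above is just a ratio check, which could be automated once per-process request counts are available. A back-of-envelope sketch using the comment's example numbers (the helper and threshold are hypothetical, not a Puma feature):

```ruby
# Illustrative check: when doubling a process's thread count, is the
# throughput gain big enough to justify doubling its DB connections?
# gain_threshold is an arbitrary assumed cutoff for "worth it".
def worth_doubling?(reqs_before, reqs_after, gain_threshold: 1.5)
  (reqs_after.to_f / reqs_before) >= gain_threshold
end

# 2 -> 4 threads serving ~1.9x the requests: worth the connections.
puts worth_doubling?(100, 190)
# 4 -> 8 threads serving only ~1.2x the requests: a waste.
puts worth_doubling?(190, 228)
```

Running several processes side by side and feeding their counts through a check like this is essentially the experiment described above, minus the hard part: making request distribution fair enough that the counts are comparable.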
Interesting. That could be really cool. There are lots of apps that can't run that many workers. I also worry about large apps that might need to tightly control DB connection counts. I think this is a good experiment. We could maybe do this manually in an `on_worker_boot` block of an app, though we would need a way to determine the thread multiplier for each process. I'm not sure how to log and record throughput of each individual worker in that scenario.
Puma 3.12 does a good job of this AFAIK after a recent patch. You can read the docs about how this all works in #1576. Let me know if you've got any other ideas. The value of a brainstorming session usually comes from the number of ideas. With quantity comes quality.