
all async operations still block forever at 500 concurrent subscriptions #226

Closed

ccbrown opened this issue Sep 23, 2021 · 11 comments

@ccbrown
Contributor

ccbrown commented Sep 23, 2021

Under the hood, asynchronous subscriptions just wrap synchronous calls with blocking::unblock, which moves the work to a pool of up to 500 threads.

This creates an obvious bottleneck: If you create 500 subscriptions, all other asynchronous operations that you attempt will just block forever (or until those subscriptions receive data or are closed).

Perhaps the blocking crate isn't the best fit here. And perhaps there should be a bigger warning about everything breaking if you hit this limit.

This issue was first raised in #210. That issue was closed with this comment:

0.15.2 has been published which includes the fix for this

But 0.15.2 does not include a fix or any changes related to the 500-worker limit.

@derekcollison
Member

Is the 500 a limit we self impose or inherit?

Any thoughts on how you would approach a fix?

@Jarema
Member

Jarema commented Sep 27, 2021

@derekcollison I faced this issue in my project, and the 500-thread limit is hardcoded in the blocking crate. The author has already agreed to change that.

    fn grow_pool(&'static self, mut inner: MutexGuard<'static, Inner>) {
        // If runnable tasks greatly outnumber idle threads and there aren't too many threads
        // already, then be aggressive: wake all idle threads and spawn one more thread.
        while inner.queue.len() > inner.idle_count * 5 && inner.thread_count < 500 {

smol-rs/blocking#14
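Reading that predicate as a pure function makes the cap obvious (a simplified restatement for illustration; `should_grow` and `MAX_THREADS` are our names, not the crate's):

```rust
// The pool only spawns another worker while queued jobs heavily outnumber
// idle threads AND the hard cap has not been reached.
const MAX_THREADS: usize = 500; // hardcoded in the released `blocking` crate

fn should_grow(queue_len: usize, idle_count: usize, thread_count: usize) -> bool {
    queue_len > idle_count * 5 && thread_count < MAX_THREADS
}

fn main() {
    // A large backlog below the cap triggers growth...
    assert!(should_grow(10_000, 0, 499));
    // ...but at the cap the pool stops growing no matter how deep the queue
    // gets, which is why the 500th blocked subscription stalls everything else.
    assert!(!should_grow(10_000, 0, 500));
    println!("cap reached: growth stops");
}
```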

@Jarema
Member

Jarema commented Oct 2, 2021

My PR in blocking was merged.
smol-rs/blocking#14

Although, because of the nature of the blocking crate, it requires setting an environment variable to increase the number of threads.
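For anyone landing here later: assuming you are on a blocking release that includes that PR, the variable it reads is BLOCKING_MAX_THREADS (verify against the docs of the crate version you actually depend on):

```shell
# Raise the blocking pool's thread cap before starting the process.
# The variable is read once when the pool initializes; the binary name
# below is a placeholder for your own service.
BLOCKING_MAX_THREADS=2000 ./my-nats-service
```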

I think we should mention that fact in the asynk docs, together with information about setting the environment variable if needed, and then close this issue.

Whether the blocking crate should be used for handling the async API in the future is a topic for another discussion.

@andrewbanchich
Contributor

andrewbanchich commented Oct 12, 2021

I think we ran into this as well. Our prod deployment inexplicably stopped processing all NATS requests when running more than one stream of tasks. This only affected req / rep streams, not pub / sub.

I removed all of our worker code so that it was literally just replying with PONG and it still just stopped forever.

It would be great if we could consider just using async code from the ground up instead of wrapping blocking code.

@caspervonb
Collaborator

caspervonb commented Oct 31, 2021

"It would be great if we could consider just using async code from the ground up instead of wrapping blocking code."

500 non-cooperative blocking threads is a lot in the context of an async Rust crate and is definitely going to hurt throughput.

Long term, that is, after feature parity, I think we'd be better off shipping both nats and async-nats, the latter using async I/O from async-std.

@databasedav

just landed here and want to highlight these comments from the previous thread to summarize what's going on #210 (comment) #210 (comment) #210 (comment)

and i have a question about the explanation in this comment #210 (comment). i may be misunderstanding the broader implications and advantages of async, but isn't the asynchronicity itself worth some throughput decrease (25%, or perhaps even more)? i understand that a single async client is less performant than a single sync client, and the nature of the benchmarking wasn't clear so i might be wrong, but shouldn't each client's throughput be measured relative to some maximum saturation of clients per thread? for example, a single thread can hold more than 1 async client but only 1 sync client, so 100 threads could house some ideal/maximum number of async clients but only 100 sync clients. is the claim then that the latter still beats the former by 25%?

i'm fairly new to this space, so i'm genuinely curious what the implications of this are, given that throughput is obviously a primary point of concern

@derekcollison
Member

We are currently working on JetStream parity at the moment; after that work, we will take up async again in earnest and figure out the best course of action.

@Jarema
Member

Jarema commented Jul 10, 2022

@ccbrown & @andrewbanchich, as the new async client supports JetStream now and does not suffer from this issue, I assume we can close this ticket?

@andrewbanchich
Contributor

@Jarema I don't have a chance to confirm since I'm no longer at the company where I experienced this bug.

However, for clarification, the bug wasn't with JetStream - it was with normal NATS.

@Jarema
Member

Jarema commented Jul 10, 2022

Well, that's a totally new client (a full rewrite) using tokio, so there is no reason to expect that the bug is in it.

I know it was not about JetStream.
I just thought it would not be fair to close the issue while the new client was there without good feature parity with the old one. Now that it has JetStream, I think it's a good moment to get back to this one.

@andrewbanchich
Contributor

Makes sense! I'm fine with this being closed. Not sure about @ccbrown

@ccbrown ccbrown closed this as completed Jul 10, 2022