
all async operations still block forever at 500 concurrent subscriptions #226

Closed

ccbrown opened this issue Sep 23, 2021 · 11 comments

@ccbrown
Contributor

ccbrown commented Sep 23, 2021

Under the hood, asynchronous subscriptions just wrap synchronous calls with blocking::unblock, which moves the work to a pool of up to 500 threads.

This creates an obvious bottleneck: If you create 500 subscriptions, all other asynchronous operations that you attempt will just block forever (or until those subscriptions receive data or are closed).

Perhaps the blocking crate isn't the best fit here. And perhaps there should be a bigger warning about everything breaking if you hit this limit.

This issue was first raised in #210. That issue was closed with this comment:

0.15.2 has been published which includes the fix for this

But 0.15.2 does not include a fix or any changes related to the 500-worker limit.

@derekcollison
Member

Is the 500 a limit we self impose or inherit?

Any thoughts on how you would approach a fix?

@Jarema
Member

Jarema commented Sep 27, 2021

@derekcollison I faced this issue in my project, and the 500-thread limit is hardcoded in the blocking crate. The author has already agreed to change that.

    fn grow_pool(&'static self, mut inner: MutexGuard<'static, Inner>) {
        // If runnable tasks greatly outnumber idle threads and there aren't too many threads
        // already, then be aggressive: wake all idle threads and spawn one more thread.
        while inner.queue.len() > inner.idle_count * 5 && inner.thread_count < 500 {

smol-rs/blocking#14
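Reading that predicate as a pure function makes the cap obvious (a simplified restatement for illustration; `should_grow` and `MAX_THREADS` are our names, not the crate's):

```rust
// The pool only spawns another worker while queued jobs heavily outnumber
// idle threads AND the hard cap has not been reached.
const MAX_THREADS: usize = 500; // hardcoded in the released `blocking` crate

fn should_grow(queue_len: usize, idle_count: usize, thread_count: usize) -> bool {
    queue_len > idle_count * 5 && thread_count < MAX_THREADS
}

fn main() {
    // A large backlog below the cap triggers growth...
    assert!(should_grow(10_000, 0, 499));
    // ...but at the cap the pool stops growing no matter how deep the queue
    // gets, which is why the 500th blocked subscription stalls everything else.
    assert!(!should_grow(10_000, 0, 500));
    println!("cap reached: growth stops");
}
```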

@Jarema
Member

Jarema commented Oct 2, 2021

My PR in blocking was merged.
smol-rs/blocking#14

Although, because of the nature of the blocking crate, it requires setting an environment variable to increase the number of threads.
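For anyone landing here later: assuming you are on a blocking release that includes that PR, the variable it reads is BLOCKING_MAX_THREADS (verify against the docs of the crate version you actually depend on):

```shell
# Raise the blocking pool's thread cap before starting the process.
# The variable is read once when the pool initializes; the binary name
# below is a placeholder for your own service.
BLOCKING_MAX_THREADS=2000 ./my-nats-service
```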

I think we should mention that fact in the asynk docs, together with information about setting the environment variable if needed, and then close this issue.

Whether the blocking crate should be used for handling the async API in the future is a topic for another discussion.

@andrewbanchich
Contributor

andrewbanchich commented Oct 12, 2021

I think we ran into this as well. Our prod deployment inexplicably stopped processing all NATS requests when running more than one stream of tasks. This only affected req / rep streams, not pub / sub.

I removed all of our worker code so that it was literally just replying with PONG and it still just stopped forever.

It would be great if we could consider just using async code from the ground up instead of wrapping blocking code.

@caspervonb
Collaborator

caspervonb commented Oct 31, 2021

"It would be great if we could consider just using async code from the ground up instead of wrapping blocking code."

500 non-cooperative blocking threads is a lot in the context of an async Rust crate and is definitely going to hurt throughput.

Long term, that is, after feature parity, I think we'd be better off shipping both nats and async-nats, the latter using async I/O from async-std.

@databasedav

just landed here and want to highlight these comments from the previous thread to summarize what's going on #210 (comment) #210 (comment) #210 (comment)

and i have a question about the explanation in this comment #210 (comment). i may be misunderstanding the broader implications and advantages of async, but isn't the asynchronicity itself worth some throughput decrease (25%, or perhaps even more)? i understand that a single async client is less performant than a single sync client, and the nature of the benchmarking wasn't clear so i might be wrong, but shouldn't each client's throughput be measured relative to some maximum saturation of clients per thread? for example, a single thread can hold more than 1 async client but only 1 sync client, so 100 threads could house some ideal/maximum number of async clients but only 100 sync clients. is the claim then that the latter still beats the former by 25%?

i'm fairly new to this space, so i'm genuinely curious what the implications of this are, given that throughput is obviously a primary point of concern

@derekcollison
Member

We are currently working on JetStream parity at the moment; after that work, we will take up async again in earnest and figure out the best course of action.

@Jarema
Member

Jarema commented Jul 10, 2022

@ccbrown & @andrewbanchich, as the new async client supports JetStream now and does not suffer from this issue, I assume we can close this ticket?

@andrewbanchich
Contributor

@Jarema I don't have a chance to confirm since I'm no longer at the company where I experienced this bug.

However, for clarification, the bug wasn't with JetStream - it was with normal NATS.

@Jarema
Member

Jarema commented Jul 10, 2022

Well, that's a totally new client (a full rewrite) using tokio, so there is no reason to expect that the bug is in it.

I know it was not about JetStream.
I just thought it would not be fair to close the issue while the new client was there without good feature parity with the old one. Now that it has JetStream, I think it's a good moment to get back to this one.

@andrewbanchich
Contributor

Makes sense! I'm fine with this being closed. Not sure about @ccbrown

@ccbrown ccbrown closed this as completed Jul 10, 2022