Improve compatibility with tokio #1367
Comments
On further thought, a more general description of the issue is "holding a sync mutex across async I/O". I believe the deadlock scenario is:
Thread 2 is blocked on the mutex, and therefore the IO reactor is stalled. Thread 1 is stalled waiting for IO to complete. No progress can be made - deadlock.
Right, I believe the issue here is in thread 1: we are in a sync context, take a sync mutex, and then move to an async context. In general we should never be going from a sync context to (blocking on) an async context.
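A minimal sketch of that anti-pattern, assuming tokio's `Handle::spawn` API; `grpc_sign_request`, `sign_remotely`, and `handle_message` are illustrative names, not the actual LDK code:

```rust
use std::sync::{mpsc, Mutex};
use tokio::runtime::Handle;

struct Peers;

// Stand-in for the remote signer gRPC call (hypothetical).
async fn grpc_sign_request() -> Vec<u8> {
    Vec::new()
}

// Sync code that parks the current thread waiting on an async result.
fn sign_remotely(handle: &Handle) -> Vec<u8> {
    let (tx, rx) = mpsc::channel();
    handle.spawn(async move {
        // This future needs the I/O reactor to make progress.
        let _ = tx.send(grpc_sign_request().await);
    });
    rx.recv().unwrap() // blocks until the spawned task completes
}

// Thread 1 runs this: it takes the sync mutex, then blocks on the
// signer call. If thread 2 is meanwhile parked in peers.lock() and no
// other worker is awake to poll the signer future, neither thread can
// make progress: deadlock.
fn handle_message(peers: &Mutex<Peers>, handle: &Handle) {
    let _guard = peers.lock().unwrap();
    let _sig = sign_remotely(handle);
}
```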
Another variation of this is likely the cause of L2-Technology/sensei#73:
Thread 1 can't make progress, because the task it is waiting on is starved on thread 2. Thread 2 can't make progress because of the sync lock wait. My previous conclusion was that it's never safe to spawn user tasks on the runtime LDK uses. That's why we use the following code to spawn lnrod user tasks in a separate runtime:
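(The original snippet was not captured here; below is a minimal sketch of the separate-runtime pattern described, using tokio's runtime `Builder`. The function name `user_task_runtime` and the thread name are illustrative, not lnrod's actual code.)

```rust
use tokio::runtime::{Builder, Runtime};

// Build a runtime dedicated to user tasks, separate from the runtime
// driving lightning-net-tokio, so a stall on one cannot starve the other.
fn user_task_runtime() -> Runtime {
    Builder::new_multi_thread()
        .worker_threads(2)
        .thread_name("user-tasks")
        .enable_all()
        .build()
        .expect("failed to build user task runtime")
}

fn main() {
    let user_rt = user_task_runtime();
    // User work is spawned onto the dedicated runtime, never onto the
    // runtime that LDK's I/O is running on.
    let handle = user_rt.spawn(async {
        // ... user task body ...
    });
    user_rt.block_on(handle).expect("user task panicked");
}
```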
I don't quite understand how that is a deadlock - futures aren't tied to a thread, so a thread 3 should have no problem working on the future from thread 1, at least if I understand tokio correctly.
I think tokio can't move tasks between threads in general - it can steal work in some cases as an optimization, but not in every case.
Ah! tokio discord agrees - there's nothing preventing work from being stolen, but tokio doesn't do it aggressively; if other threads are sleeping, they aren't going to wake up to steal work.
This fixes a deadlock issue that appears with too many concurrent requests. This solution was taken from devrandom's issue: lightningdevkit/rust-lightning#1367. Another solution that would probably work is for `lightning-net-tokio` to wrap our sync calls with `tokio::task::spawn_blocking(|| sync_call()).await.unwrap()`, but I'm not sure how that would affect performance for other users with no async code requirements. Note that peer_manager_tests no longer need to be annotated with `flavor = "multi_thread"`; that's because the runtime we block_on (inside `lightning-net-tokio`) is the artificial runtime created in `api::lightning::server`, which is already "multi_thread".
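A sketch of that alternative wrapper, assuming tokio's `spawn_blocking`; `sync_call` is a hypothetical stand-in for the mutex-taking `PeerManager` entry point:

```rust
// Stand-in for a PeerManager-style sync entry point that may take the
// peers mutex (hypothetical).
fn sync_call() -> usize {
    0
}

async fn call_from_async() -> usize {
    // Move the mutex-taking sync call onto tokio's dedicated blocking
    // thread pool so it cannot stall a reactor worker thread.
    tokio::task::spawn_blocking(|| sync_call())
        .await
        .expect("blocking task panicked")
}
```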
Background
The VLS project is using gRPC to connect to a remote signer. This uses tokio/tonic.
Symptom
If the gRPC client is created in the default tokio runtime, a deadlock may result.
Discussion
`lightning-net-tokio` calls into `PeerManager` sync code. This code takes a lock on the `PeerManager.peers` Mutex, which is held across any calls to the signer.

The `ldksample` calls `PeerManager.get_peer_node_ids` in `do_connect_peer`, which also takes a lock on said Mutex.

The above sync code can stall the I/O reactor, resulting in the signer gRPC call never completing and the `peers` Mutex never being released.

A possible solution is to always call `task::block_in_place` or perhaps `Handle::spawn_blocking` when calling any Mutex-using sync code from inside `lightning-net-tokio` and `ldksample`.
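A minimal sketch of that possible solution, using `tokio::task::block_in_place`; `mutex_using_sync_code` and `do_connect_peer_like` are illustrative stand-ins, not the actual crate code:

```rust
use tokio::task;

// Stand-in for sync code that takes the peers mutex, e.g. something
// like a PeerManager::get_peer_node_ids call (hypothetical wrapper).
fn mutex_using_sync_code() {
    // ... lock the mutex, do sync work, release ...
}

async fn do_connect_peer_like() {
    // block_in_place tells the scheduler this worker thread is about to
    // block, so pending tasks are moved to other workers first. It
    // requires the multi_thread runtime flavor and panics under
    // current_thread; spawn_blocking (shown earlier) avoids that caveat.
    task::block_in_place(mutex_using_sync_code);
}
```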