update tokio to 1.26.0 #31055
Ran bench-tps tests with/without the fix, found no significant difference: master: (aa3e0b9) tpu-client [2023-04-04T21:16:01.483767949Z INFO solana_bench_tps::bench] http://35.233.177.221:8899 | 18163.48 | 348329
Please don't merge this. Tokio has been pinned because of this long-standing issue: #24644
There is a lot of motivation to update, but the issue has been a critical one for RPC operators.
We (and by we, I mean mostly @mschneider and team) have just started actively working with RPC providers to repro, instrument, and test various newer tokio versions.
Codecov Report
@@ Coverage Diff @@
## master #31055 +/- ##
=========================================
- Coverage 81.5% 81.5% -0.1%
=========================================
Files 728 728
Lines 205737 205737
=========================================
- Hits 167695 167688 -7
- Misses 38042 38049 +7
Withdrawing this PR due to the RPC concerns, but I think we need to root-cause that issue in order to move forward, since we are many versions behind the Tokio mainline. For the deadlock I mentioned here, there is a workaround: the Endpoint has to be dropped carefully to avoid it. Any client reference to the same Endpoint must be dropped before the listener (server code) on that Endpoint is dropped. This requires careful ordering of our services during shutdown.
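For illustration, here is a minimal sketch of that drop ordering. It uses only the standard library: `Endpoint`, `Handle`, `Services`, and `shutdown_order` are hypothetical stand-ins, not the quinn or Solana types, and the sketch only demonstrates how to make "client first, listener second" deterministic. It does not reproduce the hang itself.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the shared QUIC endpoint (not the quinn API).
struct Endpoint;

// Hypothetical handle held by either the client side or the listener side.
// It records its name in a shared log when it is dropped.
struct Handle {
    name: &'static str,
    _endpoint: Arc<Endpoint>,
    log: Arc<Mutex<Vec<&'static str>>>,
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.log.lock().unwrap().push(self.name);
    }
}

// Rust drops struct fields in declaration order, so declaring `client`
// before `listener` guarantees the client reference is released first.
#[allow(dead_code)]
struct Services {
    client: Handle,
    listener: Handle,
}

// Builds both handles on one endpoint, shuts them down, and returns the
// order in which they were dropped.
fn shutdown_order() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let endpoint = Arc::new(Endpoint);
    let services = Services {
        client: Handle {
            name: "client",
            _endpoint: Arc::clone(&endpoint),
            log: Arc::clone(&log),
        },
        listener: Handle {
            name: "listener",
            _endpoint: endpoint,
            log: Arc::clone(&log),
        },
    };
    drop(services);
    let order = log.lock().unwrap().clone();
    order
}
```

Calling `shutdown_order()` returns the names in the order the handles were released. Relying on field declaration order (or explicit `drop` calls) keeps the ordering in one place instead of depending on whichever thread happens to release the last `Arc`.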
Problem
Noticed a deadlock when a Quinn endpoint is dropped. To support repair over QUIC, we now return the endpoint used for the streamer to the caller so that the client can use that same endpoint to talk to the server side. The server side can then send the response to the same endpoint. We have found a sporadic hang in the Tokio code when a Quinn Endpoint is dropped.
The hung stack looks like the following:
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00005640aa854a65 in parking_lot::raw_mutex::RawMutex::lock_slow ()
#2 0x00005640ac46d56d in tokio::io::driver::scheduled_io::ScheduledIo::wake0 ()
#3 0x00005640ac4736a2 in <tokio::io::driver::Inner as core::ops::drop::Drop>::drop ()
#4 0x00005640abfa32d5 in alloc::sync::Arc::drop_slow ()
#5 0x00005640abf9fe9b in core::ptr::drop_in_place<tokio::park::either::Either<tokio::process::imp::driver::Driver,tokio::park::thread::ParkThread>> ()
#6 0x00005640abfa22a1 in alloc::sync::Arc::drop_slow ()
#7 0x00005640abfa2a17 in alloc::sync::Arc::drop_slow ()
#8 0x00005640abfa27c2 in alloc::sync::Arc::drop_slow ()
#9 0x00005640abfa9636 in tokio::runtime::task::harness::Harness<T,S>::dealloc ()
#10 0x00005640ac44cda9 in <tokio::io::driver::registration::Registration as core::ops::drop::Drop>::drop ()
#11 0x00005640abfa59b3 in core::ptr::drop_in_place<tokio::net::udp::UdpSocket> ()
#12 0x00005640aac12e11 in alloc::sync::Arc::drop_slow ()
#13 0x00005640aac0ba66 in core::ptr::drop_in_place<quinn::endpoint::Endpoint> ()
#14 0x00005640aac13df9 in alloc::sync::Arc::drop_slow ()
#15 0x00005640aac1577d in alloc::sync::Arc::drop_slow ()
#16 0x00005640aaa0402b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#17 0x00005640aac5ec39 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#18 0x00005640ac5b8923 in std::sys::unix::thread::Thread::new::thread_start ()
#19 0x00007fe5f23096db in start_thread (arg=0x7fda55015700) at pthread_create.c:463
#20 0x00007fe5f1c9461f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Summary of Changes
Upgrade Tokio to version 1.26.0, which does not exhibit this hang.
Fixes #