
"SendableFrames was SendableFrames { acks: false, other: true }, but only ACKs have been written" #1850

Closed
nemethf opened this issue May 7, 2024 · 11 comments · Fixed by #1854
Labels
bug Something isn't working

Comments

@nemethf
Contributor

nemethf commented May 7, 2024

I get the backtrace below using the latest commit 1e54758. Can you please tell me how I should debug this issue? Thanks.

thread 'gst-quic-runtime' panicked at ~/quinn/quinn-proto/src/connection/mod.rs:903:13:
SendableFrames was SendableFrames { acks: false, other: true }, but only ACKs have been written
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'gst-quic-runtime' panicked at ~/quinn/quinn/src/mutex.rs:140:42:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
stack backtrace:
   0:     0x7f7fab854486 - std::backtrace_rs::backtrace::libunwind::trace::hbee8a7973eeb6c93
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x7f7fab854486 - std::backtrace_rs::backtrace::trace_unsynchronized::hc8ac75eea3aa6899
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f7fab854486 - std::sys_common::backtrace::_print_fmt::hc7f3e3b5298b1083
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:68:5
   3:     0x7f7fab854486 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hbb235daedd7c6190
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7f7fab87d3e0 - core::fmt::rt::Argument::fmt::h76c38a80d925a410
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/rt.rs:142:9
   5:     0x7f7fab87d3e0 - core::fmt::write::h3ed6aeaa977c8e45
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/mod.rs:1120:17
   6:     0x7f7fab851e4f - std::io::Write::write_fmt::h78b18af5775fedb5
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/io/mod.rs:1810:15
   7:     0x7f7fab854264 - std::sys_common::backtrace::_print::h5d645a07e0fcfdbb
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x7f7fab854264 - std::sys_common::backtrace::print::h85035a511aafe7a8
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x7f7fab855937 - std::panicking::default_hook::{{closure}}::hcce8cea212785a25
  10:     0x7f7fab855699 - std::panicking::default_hook::hf5fcb0f213fe709a
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:292:9
  11:     0x7f7fab855dc8 - std::panicking::rust_panic_with_hook::h095fccf1dc9379ee
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:779:13
  12:     0x7f7fab855ca2 - std::panicking::begin_panic_handler::{{closure}}::h032ba12139b353db
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:657:13
  13:     0x7f7fab854986 - std::sys_common::backtrace::__rust_end_short_backtrace::h9259bc2ff8fd0f76
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:171:18
  14:     0x7f7fab855a00 - rust_begin_unwind
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
  15:     0x7f7fab4dc8f5 - core::panicking::panic_fmt::h784f20a50eaab275
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
  16:     0x7f7fab4dcf33 - core::result::unwrap_failed::h03d8a5018196e1cd
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/result.rs:1649:5
  17:     0x7f7fab5eb22d - core::result::Result<T,E>::unwrap::h2a363cce2c150898
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/result.rs:1073:23
  18:     0x7f7fab5eb22d - quinn::mutex::non_tracking::Mutex<T>::lock::hae8113dcc534427a
                               at ~/quinn/quinn/src/mutex.rs:140:24
  19:     0x7f7fab5df713 - <quinn::connection::ConnectionRef as core::ops::drop::Drop>::drop::he98c22e794feb227
                               at ~/quinn/quinn/src/connection.rs:886:26
  20:     0x7f7fab5de8ed - core::ptr::drop_in_place<quinn::connection::ConnectionRef>::h4888864623947ccc
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ptr/mod.rs:507:1
  21:     0x7f7fab5de8ed - core::ptr::drop_in_place<quinn::connection::ConnectionDriver>::h2521dedde294d72a
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ptr/mod.rs:507:1
  22:     0x7f7fab5e4ad8 - quinn::connection::Connecting::new::{{closure}}::h18914da83de132f9
                               at ~/quinn/quinn/src/connection.rs:66:40
  23:     0x7f7fab5e4ad8 - <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h1d757db3fbc3d594
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-0.1.40/src/instrument.rs:321:9
  24:     0x7f7fab5d7958 - <core::pin::Pin<P> as core::future::future::Future>::poll::h2d990df112587ec7
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/future/future.rs:124:9
  25:     0x7f7fab5d7958 - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::h6dd936cb2d2a55f9
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/core.rs:328:17
  26:     0x7f7fab5d7958 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::hef780cdb20e9c1ca
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/loom/std/unsafe_cell.rs:16:9
  27:     0x7f7fab5d7958 - tokio::runtime::task::core::Core<T,S>::poll::h7fa3b8fa16e2e37b
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/core.rs:317:13
  28:     0x7f7fab5e1f25 - tokio::runtime::task::harness::poll_future::{{closure}}::h1fc582a33c1be0f4
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:485:19
  29:     0x7f7fab5e1f25 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h0f139c2d2fce7789
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9
  30:     0x7f7fab5e1f25 - std::panicking::try::do_call::hb0b7b8aa23f7b13a
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40
  31:     0x7f7fab5e1f25 - std::panicking::try::h958518d396e49bd2
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19
  32:     0x7f7fab5e1f25 - std::panic::catch_unwind::h01c4acdfbf8f9316
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14
  33:     0x7f7fab5e1f25 - tokio::runtime::task::harness::poll_future::h9cb2112b9c6596b7
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:473:18
  34:     0x7f7fab5e230b - tokio::runtime::task::harness::Harness<T,S>::poll_inner::hf4cf8b64687a797f
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:208:27
  35:     0x7f7fab5e2764 - tokio::runtime::task::harness::Harness<T,S>::poll::h53d8212280722229
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:153:15
  36:     0x7f7fab740eb7 - tokio::runtime::task::raw::RawTask::poll::hd004d59887f9711a
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/raw.rs:201:18
  37:     0x7f7fab740eb7 - tokio::runtime::task::LocalNotified<S>::run::hea2540b641084be6
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/mod.rs:416:9
  38:     0x7f7fab740eb7 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}}::h719f7ea389ab8c49
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/multi_thread/worker.rs:639:17
  39:     0x7f7fab740eb7 - tokio::runtime::coop::with_budget::h82ea195ed6007e8c
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/coop.rs:107:5
  40:     0x7f7fab740eb7 - tokio::runtime::coop::budget::hccb8b31d371bba32
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/coop.rs:73:5
  41:     0x7f7fab740eb7 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h6cf1fabe7b7478b5
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/multi_thread/worker.rs:575:9
  42:     0x7f7fab73f67a - tokio::runtime::scheduler::multi_thread::worker::Context::run::h20d7306ff0b6d749
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/multi_thread/worker.rs:526:24
  43:     0x7f7fab73771e - tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::{{closure}}::h5c3abab3a364d2d7
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/multi_thread/worker.rs:491:21
  44:     0x7f7fab73771e - tokio::runtime::context::scoped::Scoped<T>::set::h2fe87a79fa62de5e
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/context/scoped.rs:40:9
  45:     0x7f7fab748d04 - tokio::runtime::context::set_scheduler::{{closure}}::h899fbba7b7b4fa30
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/context.rs:176:26
  46:     0x7f7fab748d04 - std::thread::local::LocalKey<T>::try_with::h67ad726b9ce48286
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:270:16
  47:     0x7f7fab748d04 - std::thread::local::LocalKey<T>::with::h680a5068d626340e
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:246:9
  48:     0x7f7fab7533ec - tokio::runtime::context::set_scheduler::h0de1ea243b044de5
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/context.rs:176:9
  49:     0x7f7fab7533ec - tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::h2c5d448ecd61279a
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/multi_thread/worker.rs:486:9
  50:     0x7f7fab7533ec - tokio::runtime::context::runtime::enter_runtime::h15d83ea2e5803e1a
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/context/runtime.rs:65:16
  51:     0x7f7fab73ede0 - tokio::runtime::scheduler::multi_thread::worker::run::h1e62f9b5ec602f42
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/multi_thread/worker.rs:478:5
  52:     0x7f7fab74de02 - tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}::h4761a582ff48d3b1
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/multi_thread/worker.rs:447:45
  53:     0x7f7fab74de02 - <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::had55906b4380e013
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/blocking/task.rs:42:21
  54:     0x7f7fab75405a - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::h2a34826b9bb0df09
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/core.rs:328:17
  55:     0x7f7fab75405a - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h35f775c14173b776
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/loom/std/unsafe_cell.rs:16:9
  56:     0x7f7fab75405a - tokio::runtime::task::core::Core<T,S>::poll::hbfc2596834ec0926
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/core.rs:317:13
  57:     0x7f7fab74d7e5 - tokio::runtime::task::harness::poll_future::{{closure}}::h9923ee752a6f7651
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:485:19
  58:     0x7f7fab74d7e5 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h3febb4f199c35cc6
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9
  59:     0x7f7fab74d7e5 - std::panicking::try::do_call::he6e0d8e55a893679
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40
  60:     0x7f7fab74d7e5 - std::panicking::try::hac66d9c0ce95257d
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19
  61:     0x7f7fab74d7e5 - std::panic::catch_unwind::ha83461197d4a195a
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14
  62:     0x7f7fab74d7e5 - tokio::runtime::task::harness::poll_future::h5aa2264b85d3c064
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:473:18
  63:     0x7f7fab74cd4b - tokio::runtime::task::harness::Harness<T,S>::poll_inner::h7ea79aeb4ee83126
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:208:27
  64:     0x7f7fab74cc45 - tokio::runtime::task::harness::Harness<T,S>::poll::h177fd5e77edc0506
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:153:15
  65:     0x7f7fab744cd4 - tokio::runtime::task::raw::RawTask::poll::hd004d59887f9711a
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/raw.rs:201:18
  66:     0x7f7fab744cd4 - tokio::runtime::task::UnownedTask<S>::run::h06d0fa5ac21b1f42
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/mod.rs:453:9
  67:     0x7f7fab744cd4 - tokio::runtime::blocking::pool::Task::run::hd73dbe1a5378ccd9
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/blocking/pool.rs:159:9
  68:     0x7f7fab744cd4 - tokio::runtime::blocking::pool::Inner::run::h6f7ec22049dadbe0
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/blocking/pool.rs:513:17
  69:     0x7f7fab73c30d - tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}::h2ce9d5cd21cdd9c4
                               at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/blocking/pool.rs:471:13
  70:     0x7f7fab73c30d - std::sys_common::backtrace::__rust_begin_short_backtrace::h2e58a1e9fa309d5b
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:155:18
  71:     0x7f7fab7473fc - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hce965f342ceb2c1e
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/mod.rs:529:17
  72:     0x7f7fab7473fc - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h9bf04f6f67ad2d05
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9
  73:     0x7f7fab7473fc - std::panicking::try::do_call::hcc56899aa1094f5e
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40
  74:     0x7f7fab7473fc - std::panicking::try::h9c75f928a365025d
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19
  75:     0x7f7fab7473fc - std::panic::catch_unwind::h26ab19e38ca17e14
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14
  76:     0x7f7fab7473fc - std::thread::Builder::spawn_unchecked_::{{closure}}::h5e31bfda1cd11716
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/mod.rs:528:30
  77:     0x7f7fab7473fc - core::ops::function::FnOnce::call_once{{vtable.shim}}::h187e6116a3838c9c
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
  78:     0x7f7fab858c85 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h12de4fc57affb195
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
  79:     0x7f7fab858c85 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h3c619f45059d5cf1
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
  80:     0x7f7fab858c85 - std::sys::unix::thread::Thread::new::thread_start::hbac657605e4b7389
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys/unix/thread.rs:108:17
  81:     0x7f7fac5c9134 - start_thread
                               at ./nptl/pthread_create.c:442:8
  82:     0x7f7fac6497dc - __GI___clone3
                               at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  83:                0x0 - <unknown>
thread 'gst-quic-runtimethread '<unnamed>' panicked at ~/quinn/quinn/src/mutex.rs:140:42:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
' panicked at library/core/src/panicking.rs:163:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
@Ralith
Collaborator

Ralith commented May 7, 2024

Thanks for the report! This indicates that Connection::space_can_send determined that non-ACK frames could be sent, but Connection::populate_packet failed to write any non-ACK frames. In other words, we predicted that packet assembly would send some data, but it didn't. Likely some logic in can_send_1rtt is inconsistent with the corresponding check in populate_packet or the helpers it calls.

Are you transmitting large application datagrams on the endpoint where this panic occurs? If so, #1837 is a likely culprit. In particular I wonder if 003b1d5 or ad4919b might be responsible. I can think of at least one likely broken pattern:

  1. A near-maximum-size application datagram is queued
  2. space_can_send determines that the datagram may be sent, which implies that an ACK-only packet may not be sent.
  3. populate_packet writes an ACK frame to the next outgoing packet
  4. populate_packet determines that insufficient space remains to include the queued application datagram
  5. The completed packet remains ACK-only, triggering the debug_assert.

In short, I think the actual behavior here is correct, and we should arrange for the assert not to fire, either by relaxing the assert somehow or by allowing can_send.acks to be set according to PendingAcks::can_send on the packet space in question even when application data is queued. The application datagram will be sent in the following packet regardless, where it is guaranteed to fit because there are no longer any new ACKs to send.

I'll try to draft a fix for at least that case this week, though more info is needed from you to provide confidence that this is specifically what you're hitting. Do you have a consistent repro?

@Ralith Ralith added the bug Something isn't working label May 7, 2024
@Ralith
Collaborator

Ralith commented May 7, 2024

On second thought, I suspect the issue I described above predates #1837, though if that's not what you're experiencing then it's still a plausible culprit.

@Ralith
Collaborator

Ralith commented May 8, 2024

Drafted some related cleanup in #1851. Testing indicates that the hypothesis I presented above is incorrect; we already handled that case properly, so I remain unsure what the root cause might be. I don't expect that PR will fix your issue, though it makes some of the code involved easier to follow. Let us know if you have a repro, and I'll look around a little more for other possible causes.

@nemethf
Contributor Author

nemethf commented May 8, 2024

Are you transmitting large application datagrams on the endpoint where this panic occurs?

Yes, I am. (Independently of this issue, it seems send_datagram puts the datagram into a queue. What happens if max_datagram_size decreases before the datagram is sent out of the queue?)

If so, #1837 is a likely culprit.

We usually run our tests with all the offloading features disabled via ethtool; otherwise, Wireshark is not able to dissect QUIC packets. Now I've run a video transmission test with offloading enabled, and it seems there is a regression, but this might be the result of some modifications to our local application. I'll create a separate issue after a detailed investigation. But I think this bug is probably not GSO-related.

Unfortunately, #1851 does not fix the issue. The attached patch deterministically triggers the bug for me. If the server wants to send just 10 datagrams instead of the current 100, then the panic does not occur.

0001-Modify-example-to-demostrate-bug-1850.patch.txt

(I think it would be helpful if the repository contained a datagram example. It took me a long time to figure out a way to keep the connection alive until an ApplicationClosed arrives. I'm happy to contribute, but I cannot come up with a good example. Maybe a simple file transfer with datagrams?)

@Ralith
Collaborator

Ralith commented May 8, 2024

Thanks for the repro! I see the same behavior. I'll investigate.

As a temporary workaround, it looks like sending smaller datagrams avoids the assert, probably by leaving enough room for whatever other frames are competing for space in the packet. 1 KiB would be a very safe threshold.

It took me a long time to figure out a way to keep the connection alive until an ApplicationClosed arrives.

Did you see Connection::closed? Typically you'd just wait for your actual workload (in this case the task calling send_datagram) to complete. It's not obvious to me how this differs from stream-oriented patterns.

@Ralith
Collaborator

Ralith commented May 8, 2024

This will be fixed by #1854.

@djc djc closed this as completed in #1854 May 8, 2024
@nemethf
Contributor Author

nemethf commented May 9, 2024

This will be fixed by #1854.

I confirm this problem disappeared with the git HEAD. Thank you.

It took me a long time to figure out a way to keep the connection alive until an ApplicationClosed arrives.

Did you see Connection::closed?

I don't know what I was thinking. I didn't check the documentation :( It is clear and easy to understand.


Concerning the current solution, if I understand correctly, applications are better off sending datagrams smaller than max_datagram_size() because that way they can achieve higher performance by not sending small, non-piggybacked ACK packets. Am I right? If so, wouldn't it make sense to have a next_datagram_size() that takes into account control-like QUIC frames in the next packet? (We segment a byte-stream into datagrams, which might not be a good idea after all.)

@djc
Collaborator

djc commented May 9, 2024

I confirm this problem disappeared with the git HEAD. Thank you.

Note that if you just upgrade your dependencies with cargo update, this should be fixed, since we released quinn-proto 0.11.1 last night.

@djc
Collaborator

djc commented May 9, 2024

(We segment a byte-stream into datagrams, which might not be a good idea after all.)

Right, that works against the grain of the protocol. What were you trying to achieve with using datagrams in this design?

@nemethf
Contributor Author

nemethf commented May 9, 2024

(We segment a byte-stream into datagrams, which might not be a good idea after all.)

Right, that works against the grain of the protocol. What were you trying to achieve with using datagrams in this design?

We try to optimize media transmission in different network environments. Our baseline is RTP/UDP. It would be good to see how QUIC performs in comparison to the baseline when both use unreliable transmission. We also measure when the media is transmitted in QUIC streams.

@Ralith
Collaborator

Ralith commented May 9, 2024

Concerning the current solution, if I understand correctly, then applications are better off sending smaller datagrams than max_datagram_size() because that way they can achieve higher performance by not sending small, non-piggybacked ACK packets. Am I right?

The intention is that these voluntary ACKs are sent infrequently enough for this not to be a significant factor. If you observe it imposing meaningful costs in practice, we have a lot of room to dial the rate further, too.

If so, wouldn't it make sense to have a next_datagram_size() that takes into account control-like quic frames in the next packet.

I don't think so. Most packets do not include extra control frames, and packet scheduling and assembly is only loosely bound to application call patterns. The only case where a call like this would give you accurate information is when you're only sending one packet at a time, without any pipelining or queuing, which is exactly when overhead matters the least. Otherwise, it's difficult to be confident that the datagrams you queue will, in fact, be in the next packet sent.

(We segment a byte-stream into datagrams, which might not be a good idea after all.)

Real-time streaming media is a complex case. Using a stream gives the QUIC implementation much more flexibility in packet assembly, but is subject to head-of-line blocking, which I presume you wish to avoid.

A good middle ground might be to use many independent streams and take care that your receiver is willing to read them concurrently/out-of-order, where each stream is a data unit that's not useful unless received in full. You could even reset such streams when they become too old, though that may not be worth the effort.
