async: always defer task wakes via ngx_post_event#295
Conversation
`schedule()` ran `runnable.run()` synchronously when a task was woken from outside its own poll (`woken_while_running == false`). That violates the `Waker::wake()` contract (wakes must be non-blocking and non-re-entrant): when a wake fires from a `Drop` that holds a lock the woken task also needs — e.g. h2's `Streams::drop` waking its `Connection` task while holding `Arc<Mutex<Inner>>` — the synchronous re-poll re-enters and deadlocks on that lock. Always defer the wake via `ngx_post_event` instead; the runnable is re-polled on the next event-loop tick by `ngx_event_process_posted`. On the single-threaded event loop that is one worker-local list insert — one tick of latency. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a freestanding test in `async_::spawn` (no deps beyond `async_task`) reproducing the deadlock fixed by the previous commit: a `Drop` impl wakes a parked task while holding a lock. With synchronous re-poll the re-poll finds the lock still held — the deadlock signature, surfaced via `Mutex::try_lock` returning `WouldBlock` so the test cannot hang; with deferred wakes the re-poll acquires the lock cleanly. The test supplies its own `schedule` functions, so no NGINX event loop is required. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
I was asked to review this to identify if it could cause any headaches within NGINX WASM. I appreciate that this change will also make the interleaving of WASM guest execution (or any rust async code) with other components in the NGINX worker process easier to reason about. |
|
What are the chances 😲? I can't do an official review here but: lgtm, and thanks! |
Do you have a source for this? Neither the current documentation nor the design doc at https://github.com/rust-lang/rfcs/blob/master/text/2592-futures.md includes such a contract. There's a very good reason why the scheduler was implemented like this: all IO events and callbacks from the nginx event loop must be handled synchronously. Otherwise, we'll start observing some rare but nonetheless amusing consequences: operation results can be freed at the callback exit, file descriptors can be incorrectly registered for subsequent operations on the event loop, etc. It would make sense to extend the |
|
Thanks @bavshin-f5, you're right about the wording. "Contract" was the wrong word; it was more the general idea that I don't think inline re-poll can be made safe in general. The failure is deterministic: when We've seen it with h2, and now it looks like hyper too, so I don't think it's specific to one library. I'd expect it to surface in any library that wakes a task while holding one of its own private locks that the woken task then tries to acquire, which I think is a legal thing for a library to do: the Waker API permits waking from any context, and the executor has no way to detect that the caller is holding such a lock. Fixing it upstream library-by-library would be whack-a-mole across an open-ended set; the executor is the one place where deferring is robust against all of them at once. On extending On the synchronous-IO concern...where you know the code better than I do, so correct me if I've read it wrong. As I understand it, your concern assumes a design where the nginx event callback itself does the IO: reads the socket, owns the result, touches fd registration, so a task must run before the callback returns. If that's the model, I agree deferral breaks it. However, when I run For what it's worth, enqueuing the woken task instead of re-polling it inline is what the mainstream executors already do. I checked three: Tokio's LocalSet pushes the woken task onto its queue ( I appreciate your time on this, either way. |
I talked to @alexcrichton, who created Waker, about this. As far as he knows, there is nothing in the Rust project docs that say you can't poll a future on the call stack of Waker::wake(), and in the early days of building executors, many implementations did do so. Practically, all executors that did so no longer do so, and the Rust ecosystem contains lots of Future impls that, whether by use of locks or unsafe, call wake with the assumption of no major side effects (such as polling a Future) will happen synchronously. Given that this issue bites us on real code today with a deadlock and could potentially bite with less-detectable bad behavior in the future, I think the best way to treat this is that Waker has an invariant which is not reflected in the Rust docs, and thats a deficiency of those docs. |
|
The main difference between all the mainstream async executors and our one is the entire nginx doing its work behind the scenes. It's not a simple event loop like libevent, libuv or mio, it's a complete, complex server application that accepts connections, handles errors, and can decide to drop our async task context without a notification, before we even have a chance to handle this gracefully. If we have to respect the undocumented Waker invariant, as an nginx developer, I no longer believe it is possible to build a safe async abstraction over the nginx event loop. We'll need to drop everything we currently have, and likely return to the approach of running tokio as a sidecar. I'd prefer to find an approach that does not require such drastic measures. There are some less important issues, such as always queuing events to the Interestingly enough, a similar deadlock was reported to h2 in 2021. It was stated directly that there's a problem in h2, but the change in tokio that caused the task to be shutdown too early got reverted, and nobody bothered to fix h2.
Edit: I realized that the scenario as described should not possible, because we already left the call stack with the lock. |
Don't throw out the baby with the bathwater. 1. nginx as an async executor.This is valuable on its own, so handlers can be async and we can do non-blocking i/o, which is crucial. We could use hickory-resolver instead of the Resolution future and hyper on TokioIo instead of PeerConnection, although my recent benchmarks show that nginx futures might be preferable for efficiency reasons. Worst case, with just an nginx executor and a bridge into its epoll which wakes it in response to external events (e.g. eventfd), you have another epoll loop in the tokio runtime thread. The nginx epoll would handle the incoming connection fds, etc. and tokio's epoll the fds for hyper clients, file i/o, etc.1 2. futures wrapping nginx internals like Resolution, PeerConnectionIf nginx frees resources after the completion callback returns, the future itself should move the read into the callback, so the data is available at poll time. I don't know of any concrete examples of this, but you clearly have something in mind? Maybe you can share one, which would make it easier for me to wrap my head around. I believe read/recv handlers don't generally require this, e.g. PeerConnection can recv() at poll time, nothing in nginx recv()'s our data away from that socket. In my benchmarks, I'm using your PeerConnection with both In summary, I would say that 1. is a necessity, and 2. is a nice-to-have, and that the issues you are mentioning are related to 2. So how to get 1., i.e. the ability to have handlers async? use std::sync::OnceLock;
use async_compat::CompatExt;
use ngx::core::Status;
use ngx::ffi::{
NGX_HTTP_MODULE, ngx_array_push, ngx_command_t, ngx_conf_t, ngx_http_handler_pt,
ngx_http_module_t, ngx_http_phases_NGX_HTTP_PRECONTENT_PHASE, ngx_int_t, ngx_module_t,
};
use ngx::http::{self, HTTPStatus, HttpModule, Request};
use ngx::http::{HttpModuleMainConf, NgxHttpCoreModule};
use ngx::{http_request_handler, ngx_modules};
use tokio::runtime::Runtime;
use ngx_tickle::prelude::*;
static UPSTREAM: &str = "example.com";
async fn async_handler_compat(request: &mut Request) {
// future 1, poll #1 -> nginx thread, safe to use Request, and other !Send
let response = reqwest::get(format!("{UPSTREAM}/{}", request.path()))
// reqwest starts a hyper driver task internally using tokio::spawn().
// The compat runtime is a tokio new_current_thread in a secondary thread, and it sets up a
// context that makes global tokio::spawn() work, using that runtime.
// Let's call that future 2 -> compat thread
.compat()
.await
// future 1, poll #2 -> nginx thread again
.unwrap();
request.add_header_out("X-example-status", &format!("{}", response.status()));
finalize_request(request, HTTPStatus::NO_CONTENT.into());
}
http_request_handler!(entry_handler_compat, |request: &mut http::Request| {
request.spawn(async_handler_compat).unwrap(); // RequestSpawn, see below
Status::NGX_AGAIN
});
async fn async_handler_tokio(request: &mut Request) {
// future 1, poll #1 -> nginx thread (safe to use Request, and other !Send)
let path = request.path().to_str().unwrap().to_string();
let response = tokio_runtime()
.spawn(async move {
// future 2 -> tokio thread, *not* safe to use !Send, but the compiler will stop you:
// reqwest::get(format!("{UPSTREAM}/{}", request.path()))
//
// error: future cannot be sent between threads safely
// --> examples/compat.rs:45:36
// |
// 45 | let response = tokio_runtime().spawn(async move {
// | ^^^^^ future created by async block is not `Send`
// |
// = help: within `{async block@examples/compat.rs:45:42: 45:52}`, the trait `Send` is not implemented for `*mut u8`
// note: captured value is not `Send` because `&mut` references cannot be sent unless their referent is `Send`
// --> examples/compat.rs:46:47
// |
// 46 | reqwest::get(format!("{UPSTREAM}/{}", request.path()))
// | ^^^^^^^ has type `&mut ngx::http::Request` which is not `Send`, because `ngx::http::Request` is not `Send`
// note: required by a bound in `Runtime::spawn`
// --> /home/p/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.52.3/src/runtime/runtime.rs:241:21
// |
// 239 | pub fn spawn<F>(&self, future: F) -> JoinHandle<F::Output>
// | ----- required by a bound in this associated function
// 240 | where
// 241 | F: Future + Send + 'static,
// | ^^^^ required by this bound in `Runtime::spawn`
// again, reqwest spawns an internal hyper driver task, let's call that future 2.1 -> tokio thread
reqwest::get(format!("{UPSTREAM}/{}", path)).await.unwrap()
})
.await
.unwrap();
// future 1, poll #2 -> nginx thread again
request.add_header_out("X-example-status", &format!("{}", response.status()));
finalize_request(request, HTTPStatus::NO_CONTENT.into());
}
http_request_handler!(entry_handler_tokio, |request: &mut http::Request| {
request.spawn(async_handler_tokio).unwrap(); // RequestSpawn, see below
Status::NGX_AGAIN
});
static RUNTIME: OnceLock<Runtime> = OnceLock::new();
fn tokio_runtime() -> &'static Runtime {
RUNTIME.get_or_init(|| {
// or new_current_thread, but doesn't make a difference at this point
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.expect("runtime")
})
}In this example, future 1 runs in the nginx executor. It can use !Send prevents you from accidentally borrowing Request in future 2, the compiler will stop you. That is why you want something like For some context: we are using the ngx-tickle approach for a customer project now. We use some tokio futures, mainly reqwest and file i/o, and I wrote a RequestBody future to read the client body with the helpful pointers you've given me in #222. In it, I'm just collecting buf pointers in the completion handler, but only call ngx_http_finalize_request (for the ngx_http_read_client_request_body "subrequest") after I have read them fully. We have an extensive black-box test suite (external to that repo), are now in production, and do load-testing regularly. There are no issues in the "twilight zone" (segfaults, deadlocks, ...) that would arise from unsafe behaviour. So I do know what you are talking about, but the crashes you mean were, in our experience, category 2 — futures misusing the nginx APIs — not the scheduler itself. Now, after derailing this discussion with my thread-safe schedule/Waker stuff again 🙂 , let me bring it back: A very rare deadlock that happens when hyper connections are Dropped, related to a lock guarding its connection pool that is held while notifying a task waiting for a connection. ngx-tickle had the exact same recursive-poll behavior, so the waking of a connection waiter ran in-line, and the woken task tries to acquire the same lock re-entrantly, which it is not prepared for. Footnotes
|
We do have to respect the Waker invariant. Dropping everything we currently have isn't a realistic approach, we have multiple projects in progress that depend on async rust on top of the nginx event loop, and switching to a sidecar would amount to a rewrite on those projects and would require escalation way up the business chain for the disruption to deliverables that would cause. There are already numerous problems with using Rust safely on top of nginx's C abstractions, and any additional problems this creates will need to be tackled systematically.
I don't love AsyncRead's design but its an immovable part of the world this project is living in, so the next step is to start building the functionality to make this possible, and make it reasonable for the users of these async Rust specific abstractions to be able to understand and manage the costs of doing things the way they must. |
Fixes #294.
schedule()ranrunnable.run()synchronously when a task was woken fromoutside its own poll. But
Waker::wake()may be called from any context,including a
Dropthat is holding a lock the woken task also needs — e.g. h2'sStreams::drop— and re-polling inline then re-enters that task on the caller'sstack and deadlocks on the held lock. Since the executor can't tell whether a
wake's caller is holding such a lock, it shouldn't re-poll inside
wake()atall. This always defers the wake via
ngx_post_event(one event-loop tick).runnable.run(); alwaysSCHEDULER.schedule().async_task) — synchronous re-pollreproduces the held-lock deadlock signature, deferred re-poll avoids it; no
NGINX event loop needed.
Verified on Linux + macOS aarch64:
cargo test/clippy --all-targets -Dwarnings/
fmt --checkall clean.