
multithread and singlethread executors differ #3798

Closed
Ekleog opened this issue Jan 20, 2024 · 3 comments

@Ekleog
Contributor

Ekleog commented Jan 20, 2024

This is a continuation of a discord chat with @daxpedda.

The issue arises in this specific part of the wasm-bindgen code when using IndexedDB, under the circumstances described below:

// Unlike `singlethread.rs` we are responsible for ensuring there's
// a closure to handle the notification that a Future is ready. In
// the single-threaded case the notification itself enqueues work,
// but in the multithreaded case we don't know what thread a
// notification comes from so we need to ensure the current running
// thread is the one that enqueues the work. To do that we execute
// `Atomics.waitAsync`, creating a local Promise on our own thread
// which will resolve once `Atomics.notify` is called.
//
// We could be in one of two states as we execute this:
//
// * `SLEEPING` - we'll get notified via `Atomics.notify`
//   and then this Promise will resolve.
//
// * `AWAKE` - the Promise will immediately be resolved and
//   we'll execute the work on the next microtask queue.
Poll::Pending => {
    match wait_async(&self.atomic.state, SLEEPING) {
        Some(promise) => drop(promise.then(&inner.closure)),
        // our state has already changed so we can just do the work
        // again inline.
        None => continue,
    }
}


(The steps below are pseudo-code, simplified for clarity.)

  1. Assume I have access to an IDBDatabase. This is normal code (for indexed_db_futures; indexed-db already hits the same issue earlier due to supporting one more use case, but let's ignore that).
  2. I call db.transaction("foo"). This starts a transaction. From that point on, as soon as the application returns to the browser's event loop without a request pending on that transaction, it'll auto-commit.
  3. I call request = transaction.object_store("bar").get("baz"). This opens a request that will return me the contents of baz.
  4. I call request.await on that request. Currently, basically all IndexedDB crates just return Poll::Pending from the task that called the await, and use the onsuccess/onerror callbacks of request to wake that task. This is the part that differs between executors:
  • For singlethread, the task is immediately woken. This means that the browser event loop was never reached, and thus that the transaction, as expected, didn't auto-commit.
  • For multithread, AFAIU a Promise is created and awaited before the awaiting task runs. This means that the browser event loop is reached (though it almost instantly resumes execution). In particular, the transaction will auto-commit at that step.
  5. Now, just after request.await, I call transaction.object_store("bar").put("baz", "quux").await. This works only for the singlethread runtime, because on multithread the transaction has already committed.

The way IndexedDB is designed to be used, the put call should have gone inside the onsuccess handler itself, but writing code that way is unfortunately very uncomfortable, with significant rightward drift and hard-to-use lifetime behavior. This is why IndexedDB crates try to "hide" that fact by using async functions as state machines.
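For illustration, here is a rough sketch of that spec-intended style (not code from the issue; store and key names are placeholders, and `transaction` is assumed to be an IdbTransaction obtained as in the reproducer below):

// Illustrative sketch only: the follow-up request is scheduled from inside
// the previous request's `onsuccess` callback, so the browser never sees
// the transaction idle and cannot auto-commit it in between.
// (Assumes js_sys::JsString, wasm_bindgen::JsCast and Closure are in scope,
// as in the reproducer below.)
let store = transaction.object_store("example").unwrap();
let first_req = store.get(&JsString::from("foo")).unwrap();
let onsuccess = Closure::once(move |_: web_sys::Event| {
    // Still inside the success callback: the transaction is guaranteed to
    // be active here, so this put cannot hit an already-committed
    // transaction.
    let _put_req = store
        .put_with_key(&JsString::from("quux"), &JsString::from("baz"))
        .unwrap();
});
first_req.set_onsuccess(Some(onsuccess.as_ref().unchecked_ref()));
// Keep the closure alive until the callback has fired.
onsuccess.forget();

This is exactly the nesting (and the closure bookkeeping) that the async-function style tries to hide.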


It is possible to reproduce the issue by using this code:

// Imports assumed by the test below.
use std::rc::Rc;

use js_sys::{Function, JsString};
use wasm_bindgen::{closure::Closure, JsCast};
use wasm_bindgen_test::{wasm_bindgen_test, wasm_bindgen_test_configure};
use web_sys::IdbDatabase;

wasm_bindgen_test_configure!(run_in_browser);

#[wasm_bindgen_test]
async fn reproducer() {
    let factory = web_sys::window().unwrap().indexed_db().unwrap().unwrap();
    let db_req = Rc::new(factory.open_with_u32("reproducer", 1).unwrap());
    let (tx, rx) = futures_channel::oneshot::channel();
    let db_req2 = db_req.clone();
    let onupgradeneeded = Closure::once(move |_: web_sys::Event| {
        db_req2
            .result()
            .unwrap()
            .dyn_into::<IdbDatabase>()
            .unwrap()
            .create_object_store("example")
            .unwrap();
    });
    db_req.set_onupgradeneeded(Some(
        &onupgradeneeded.as_ref().dyn_ref::<Function>().unwrap(),
    ));
    let onsuccess = Closure::once(|_: web_sys::Event| tx.send(()).unwrap());
    db_req.set_onsuccess(Some(&onsuccess.as_ref().dyn_ref::<Function>().unwrap()));
    rx.await.unwrap();
    let db = db_req.result().unwrap().dyn_into::<IdbDatabase>().unwrap();
    // We now have an IDBDatabase. Start a transaction, and run one request on it.
    let transaction = db.transaction_with_str("example").unwrap();
    let req = transaction
        .object_store("example")
        .unwrap()
        .get(&JsString::from("foo"))
        .unwrap();
    let (tx, rx) = futures_channel::oneshot::channel();
    let onsuccess = Closure::once(|_: web_sys::Event| tx.send(()).unwrap());
    req.set_onsuccess(Some(&onsuccess.as_ref().dyn_ref::<Function>().unwrap()));
    // getting a non-existent value will succeed with a null result
    rx.await.unwrap();
    // Here, at this await, singlethread continues right away while multithread goes back to the event loop once before resuming.
    // This makes the lines below fail (panic on unwrap) on multithread, but pass on singlethread.
    transaction
        .object_store("example")
        .unwrap()
        .get(&JsString::from("bar"))
        .unwrap();
}

To be completely honest, I do believe that this is a bug in the reproducer: nothing in the specification of async executors guarantees that calling .await on an already-ready task will not first return to the event loop. In particular, if the executor is work-stealing, then it's very possible that the ready task will be awoken on another thread where it's already cached, which will naturally force the current thread to go back to the event loop and thus the transaction to commit.

As such, I have written a fix for my code, which is available here.

However, considering that basically all other IndexedDB crates have the same bug (that is, they don't work with the multi-threaded executor), I'm opening this issue for their sake. I will also refrain from landing the fix for my code until the question of whether this is treated as a bug in the wasm-bindgen executor is answered, as the fix introduces one (tiny) bit of unsafe code and changes the public API of the crate.

This being said, I do think that the new implementation is more in line with how IndexedDB developers intended it to be used, and so would have no issues at all landing it and releasing indexed-db v4 :)

@Ekleog
Contributor Author

Ekleog commented Jan 20, 2024

I almost forgot: kudos to @Fedeparma74 for noticing the bug (in the combination of wasm-bindgen and that version of my code) in this PR :)

@daxpedda
Collaborator

I have confirmed that this isn't a bug in wasm-bindgen-futures.

My fear was that the single-threaded implementation might sometimes run the future immediately, which would differ from the multi-threaded implementation and would be a bug indeed!
But with the help of Scheduler.postTask() I've confirmed that both implementations correctly take at least one tick.

(Both the wasm-bindgen-futures single- and multi-threaded implementations use queueMicrotask, which doesn't actually return to the browser event loop. But the success events in the example don't get triggered without returning to the event loop. So my confirmation is a bit misleading, but it nonetheless shows that there is no bug in wasm-bindgen-futures in the context of this example.)
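To make that distinction concrete, here is a small illustrative sketch (not from the issue): a microtask queued via Promise.resolve().then(...) runs before the browser gets back to its event loop, whereas a 0 ms setTimeout only fires after an event-loop turn, which is the kind of boundary the IndexedDB success events need.

use js_sys::Promise;
use wasm_bindgen::{closure::Closure, JsCast, JsValue};

// A microtask: runs before control returns to the browser event loop.
let microtask = Closure::once(|_: JsValue| web_sys::console::log_1(&"microtask".into()));
let _ = Promise::resolve(&JsValue::UNDEFINED).then(&microtask);
microtask.forget();

// A 0 ms timeout: only fires after a full event-loop turn.
let timeout = Closure::once(|| web_sys::console::log_1(&"event-loop turn".into()));
web_sys::window()
    .unwrap()
    .set_timeout_with_callback_and_timeout_and_arguments_0(timeout.as_ref().unchecked_ref(), 0)
    .unwrap();
timeout.forget();
// Expected console order: "microtask" first, then "event-loop turn".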

To go back to why the example fails though: my understanding is that you can schedule a bunch of requests on the same transaction, which will then auto-commit after a while. But if you schedule any request after it has committed, that will fail. Awaiting between scheduling requests might leave enough time for the transaction to auto-commit. This is triggered in the multi-threaded implementation of wasm-bindgen-futures because it takes a bit longer, enough to make the difference.


I will close the issue, but I'm happy to re-open it if anybody finds a flaw in my findings here!

@Ekleog
Contributor Author

Ekleog commented Jan 20, 2024

Copying the Discord exchange that followed here, for posterity:


@daxpedda

@Ekleog I experimented with a fix in wasm-bindgen-futures, basically the "solution" is to add more artificial delays in the single-threaded implementation.
Apparently both the single- and multi-threaded implementations return to the event loop, which I tested by using Scheduler.postTask(): it got successfully triggered before the await finished.

So my conclusion is that wasm-bindgen-futures works correctly, it's just that the multi-threaded implementation takes one tick longer, but both take at least one tick.

To go back to why it fails though: my understanding is that you can schedule a bunch of requests on the same transaction, which will then auto-commit after a while. But if you schedule any request after it has committed, that will fail. Awaiting between scheduling requests might leave enough time for the transaction to auto-commit. This is triggered in the multi-threaded implementation of wasm-bindgen-futures because it takes 1 tick longer, enough to make the difference.

AFAICT you might want to consider using the IDBTransaction.complete event to know when the transaction has been completed.
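(For reference, a minimal sketch of that suggestion, illustrative only: `transaction` is the IdbTransaction from the reproducer, and the same imports plus an async context are assumed. As noted later in this exchange, the complete event fires too late to schedule additional requests, so this only tells you when the commit has happened.)

// Illustrative sketch: observe the transaction's `complete` event to learn
// when it has actually committed.
let (done_tx, done_rx) = futures_channel::oneshot::channel::<()>();
let oncomplete = Closure::once(move |_: web_sys::Event| {
    let _ = done_tx.send(());
});
transaction.set_oncomplete(Some(oncomplete.as_ref().unchecked_ref()));
oncomplete.forget();
// ... issue requests on the transaction ...
done_rx.await.unwrap(); // resolves once the transaction has committed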

Please let me know if you think I missed anything!
(also, you mentioned specializing the multi-threaded implementation for the single-threaded use case; I have no clue how that could be done, so I'm all ears)


@Ekleog

I see, thank you for the investigation! You're entirely right that as soon as the task returns to the event loop with no in-flight request, it's all implementation-dependent behavior, so browsers are allowed to commit the transaction or not. I thought the behavior was correct with singlethread, but it sounds like even there it's just falling through the cracks of the implementations.

As for specializing the multi-thread implementation, what I was thinking of was adding a field to the Waker that identifies its thread and, upon waking, if the waker is being called from the same thread, immediately enqueueing a microtask with the to-be-awoken future, rather than going through the atomics logic. AFAIU, being a microtask would make it trigger before returning to the browser's event loop, and thus make the end result for IndexedDB the same.
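(A very rough sketch of that idea, purely illustrative; the names, the placeholder functions and the structure are hypothetical, not wasm-bindgen-futures' actual internals:)

use std::sync::Arc;
use std::task::Wake;
use std::thread::{self, ThreadId};

// Hypothetical task state; not how wasm-bindgen-futures is structured.
struct TaskWaker {
    // The thread (worker) the task was spawned on, recorded at spawn time.
    origin: ThreadId,
}

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        if thread::current().id() == self.origin {
            // Same-thread wake: enqueue the poll directly as a microtask and
            // skip the Atomics.waitAsync round-trip entirely.
            enqueue_poll_as_microtask(&self);
        } else {
            // Cross-thread wake: fall back to the existing atomics path so
            // the origin thread's Atomics.waitAsync promise resolves.
            notify_via_atomics(&self);
        }
    }
}

// Placeholders for the executor's existing machinery.
fn enqueue_poll_as_microtask(_task: &Arc<TaskWaker>) { /* queueMicrotask(...) */ }
fn notify_via_atomics(_task: &Arc<TaskWaker>) { /* Atomics.notify(...) */ }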

It's a bit handwavy still but I think this has some likelihood of working, because I'm pretty sure all indexeddb callbacks get triggered on the worker they originated from.

However, whether the change is worth it is an even less obvious question, especially as you already pointed out that there's a return to the event loop anyway, and so even with this change the validity of the library would still be implementation-dependent. WDYT?

(As for the fix on my end, unfortunately it's more complex because oncomplete gets called too late to schedule additional requests; to support the proper behavior I need to run everything in the request's onsuccess/onerror callbacks rather than use a channel to run that on the main task; that said the implementation I have seems to work fine and I think is not actually implementation-dependent :))


@Ekleog

Oh, and I just read your message on GitHub; I'd just like to check: when you say it returns to the event loop, do I understand correctly that you mean between the tx.send and the rx.await?
It's expected that there'll be a return to the event loop before onsuccess is called, as IndexedDB will not start the request before then; but the validity of current IndexedDB crate versions depends on whether there's a return to the event loop between the time onsuccess (for the first .get()'s request) is triggered and the time the second .get() is called.


@daxpedda

I mean the return to the event loop happens before the await finishes.
It happens more than once, even.


@Ekleog

Sorry if I'm being dense, I have no idea how to test this out 😅 but just to be sure I understand correctly: these returns to the event loop happen after onsuccess was triggered and before .await resolves, right?
That makes me think I should have written the correct code too, for reference: the correct-as-per-the-spec reproducer would not use a channel and would just call .get() directly inside the onsuccess callback; this reproducer currently works fine on all executors.


@daxpedda

No, they happen before .await resolves; it probably doesn't happen between triggering onsuccess and resolving .await, as that is probably near-instantaneous.

// Debugging snippet (requires web_sys_unstable_apis for the Scheduler API):
// a postTask callback is scheduled right before the await; if "main thread"
// is logged between "before 2" and "after 2", the browser event loop was
// reached before the await finished.
web_sys::console::log_1(&"before 2".into());
let _ = web_sys::window()
    .unwrap()
    .scheduler()
    .post_task_with_options(
        &Closure::once_into_js(|| web_sys::console::log_1(&"main thread".into()))
            .unchecked_into(),
        SchedulerPostTaskOptions::new().priority(TaskPriority::UserBlocking),
    );
rx.recv().await.unwrap();
web_sys::console::log_1(&"after 2".into());

@daxpedda

> As for specializing the multi-thread implementation, what I was thinking of was adding a field to the Waker that identifies its thread and, upon waking, if the waker is being called from the same thread, immediately enqueueing a microtask with the to-be-awoken future, rather than going through the atomics logic.

queueMicrotask already happens on both implementations before anything else happens.
The problem isn't the call to the atomics logic, because it returns immediately if the task has already been awoken.
The issue is simply that waking up directly is just faster than waiting for Atomics.waitAsync to resolve.

So unless we can somehow ascertain beforehand that the task can't be woken up from another thread, I don't know how that could be done.

I'm happy to look at a PR though.


@Ekleog

Oh, OK, I see the problem. Thank you for your answers! I'm not at a computer right now, but I will try out your debugging steps and report back 🙂


@Ekleog

Uuuuh, so this was a weird story: running with --cfg web_sys_unstable_apis makes the issue stop reproducing. Maybe because it switches between the polyfill and the browser implementation of waitAsync?

Anyway, I tried with this code:
https://github.com/Ekleog/indexed-db/blob/675ee4a8bf9499ab96035c251f2305952b827b80/tests/js.rs#L10-L70
And with a patched web-sys to remove the #[cfg] so that I don't need the rustflags.

With these changes I can confirm that, between the onsuccess callback and the rx.await, there is a return to the event loop in the multithread case but not in the singlethread case.

That said, as you mentioned, I don't have a good suggestion for a solution. Do you confirm this behavior will not be changed in the foreseeable future, and that I should land my API-breaking fix? 🙂


@daxpedda

web_sys_unstable_apis doesn't affect wasm-bindgen-futures, but the delay you introduce by running the additional code might.
The whole issue is that this is a race condition, so whether whatever you do triggers it or not should be considered random.
Take my opinion with a grain of salt; unfortunately I'm clueless about IndexedDB, so I might have missed something.

But no, I don't see how exactly we could "fix" wasm-bindgen-futures, though I'm happy to look at any PR doing so.
