Skip to content

Conversation

sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Oct 1, 2025

See #19046

@sbc100 sbc100 requested a review from kripken October 1, 2025 21:12
@sbc100 sbc100 enabled auto-merge (squash) October 1, 2025 21:12
@sbc100 sbc100 merged commit 09534bb into emscripten-core:main Oct 1, 2025
33 checks passed
@sbc100 sbc100 deleted the coverage branch October 1, 2025 23:08
@juj
Copy link
Collaborator

juj commented Oct 2, 2025

Remarkably, the new test combination added in this PR fails consistently in Firefox, except for Firefox Nightly:

image

The reason for this is that Atomics.waitAsync() was not implemented in Firefox, until very recently, 15 days ago.

We have a polyfill for Atomics.waitAsync in

// Chrome 87 shipped Atomics.waitAsync:
// https://www.chromestatus.com/feature/6243382101803008
// However its implementation is faulty:
// https://bugs.chromium.org/p/chromium/issues/detail?id=1167541
// Firefox Nightly 86.0a1 (2021-01-15) does not yet have it:
// https://bugzilla.mozilla.org/show_bug.cgi?id=1467846
// And at the time of writing, no other browser has it either.
#if MIN_CHROME_VERSION < 91 || MIN_SAFARI_VERSION != TARGET_NOT_SUPPORTED || MIN_FIREFOX_VERSION != TARGET_NOT_SUPPORTED || ENVIRONMENT_MAY_BE_NODE
// Partially polyfill Atomics.waitAsync() if not available in the browser.
// Also polyfill for old Chrome-based browsers, where Atomics.waitAsync is
// broken until Chrome 91, see:
// https://bugs.chromium.org/p/chromium/issues/detail?id=1167541
// https://github.com/tc39/proposal-atomics-wait-async/blob/master/PROPOSAL.md
// This polyfill performs polling with setTimeout() to observe a change in the
// target memory location.
$polyfillWaitAsync__postset: `if (!Atomics.waitAsync || (globalThis.navigator?.userAgent && Number((navigator.userAgent.match(/Chrom(e|ium)\\/([0-9]+)\\./)||[])[2]) < 91)) {
let __Atomics_waitAsyncAddresses = [/*[i32a, index, value, maxWaitMilliseconds, promiseResolve]*/];
function __Atomics_pollWaitAsyncAddresses() {
let now = performance.now();
let l = __Atomics_waitAsyncAddresses.length;
for (let i = 0; i < l; ++i) {
let a = __Atomics_waitAsyncAddresses[i];
let expired = (now > a[3]);
let awoken = (Atomics.load(a[0], a[1]) != a[2]);
if (expired || awoken) {
__Atomics_waitAsyncAddresses[i--] = __Atomics_waitAsyncAddresses[--l];
__Atomics_waitAsyncAddresses.length = l;
a[4](awoken ? 'ok': 'timed-out');
}
}
if (l) {
// If we still have addresses to wait, loop the timeout handler to continue polling.
setTimeout(__Atomics_pollWaitAsyncAddresses, 10);
}
}
#if ASSERTIONS && WASM_WORKERS
if (!ENVIRONMENT_IS_WASM_WORKER) err('Current environment does not support Atomics.waitAsync(): polyfilling it, but this is going to be suboptimal.');
#endif
/**
* @param {number=} maxWaitMilliseconds
*/
Atomics.waitAsync = (i32a, index, value, maxWaitMilliseconds) => {
let val = Atomics.load(i32a, index);
if (val != value) return { async: false, value: 'not-equal' };
if (maxWaitMilliseconds <= 0) return { async: false, value: 'timed-out' };
maxWaitMilliseconds = performance.now() + (maxWaitMilliseconds || Infinity);
let promiseResolve;
let promise = new Promise((resolve) => { promiseResolve = resolve; });
if (!__Atomics_waitAsyncAddresses[0]) setTimeout(__Atomics_pollWaitAsyncAddresses, 10);
__Atomics_waitAsyncAddresses.push([i32a, index, value, maxWaitMilliseconds, promiseResolve]);
return { async: true, value: promise };
};
}`,
#endif
$polyfillWaitAsync__internal: true,
$polyfillWaitAsync: () => {
// nop, used for its postset to ensure `Atomics.waitAsync()` polyfill is
// included exactly once and only included when needed.
// Any function using Atomics.waitAsync should depend on this.
},
. The fact that the new added test combo fails in Firefox, seems to suggest that something is not up-to-par in our polyfill.

@sbc100
Copy link
Collaborator Author

sbc100 commented Oct 2, 2025

Oh man, that could be a pain to debug.

I guess the polyfill must be in use a in a lot of places in Firefox already though right? Basically all pthread tests depend on this right? So it must be some strange edge case that is being hit here, in code that works most of the time?

@juj
Copy link
Collaborator

juj commented Oct 2, 2025

It will have to be a very specific condition when the Atomics.waitAsync() wait comes into play. It is not used in general: I never added an use of it into our system libraries that I could remember. (because I considered it to be just broken and misdesigned at the time) I vaguely recall reading through @tlively 's mailbox/proxying queue impl. that maybe it might have had an use, but can't find one now either.

@sbc100
Copy link
Collaborator Author

sbc100 commented Oct 2, 2025

It will have to be a very specific condition when the Atomics.waitAsync() wait comes into play. It is not used in general: I never added an use of it into our system libraries that I could remember. (because I considered it to be just broken and misdesigned at the time) I vaguely recall reading through @tlively 's mailbox/proxying queue impl. that maybe it might have had an use, but can't find one now either.

Yes, its used in _emscripten_thread_mailbox_await which is used in basically all pthread-based programs to wake up threads when they have work to do. IIUC.

@sbc100
Copy link
Collaborator Author

sbc100 commented Oct 2, 2025

I guess it only happens when pthreads exit to their event loop, which might be be too common?

sbc100 added a commit to sbc100/emscripten that referenced this pull request Oct 2, 2025
The code didn't take into about that system_libs could also contains
flags such as `--whole-archive`.

In this particular test the value of `system_libs` is:

```
['-lGL-getprocaddr', '-lal', '-lhtml5', '-lstubs-debug', '-lc-debug', '-ldlmalloc-debug', '-lcompiler_rt', '--whole-archive', '-lc++-debug-noexcept', '--no-whole-archive', '-lc++abi-debug-noexcept', '-lsockets', '--whole-archive', '-llsan_rt', '--no-whole-archive', '-llsan_common_rt', '-lsanitizer_common_rt']
```
sbc100 added a commit to sbc100/emscripten that referenced this pull request Oct 2, 2025
The code didn't take into about that system_libs could also contains
flags such as `--whole-archive`.

In this particular test the value of `system_libs` is:

```
['-lGL-getprocaddr', '-lal', '-lhtml5', '-lstubs-debug', '-lc-debug', '-ldlmalloc-debug', '-lcompiler_rt', '--whole-archive', '-lc++-debug-noexcept', '--no-whole-archive', '-lc++abi-debug-noexcept', '-lsockets', '--whole-archive', '-llsan_rt', '--no-whole-archive', '-llsan_common_rt', '-lsanitizer_common_rt']
```
sbc100 added a commit that referenced this pull request Oct 2, 2025
This failure was causing the emscripten-reeleases roller to fail.

The code didn't take into about that system_libs could also contains
flags such as `--whole-archive`.

In this particular test the value of `system_libs` is:

```
['-lGL-getprocaddr', '-lal', '-lhtml5', '-lstubs-debug', '-lc-debug', '-ldlmalloc-debug', '-lcompiler_rt', '--whole-archive', '-lc++-debug-noexcept', '--no-whole-archive', '-lc++abi-debug-noexcept', '-lsockets', '--whole-archive', '-llsan_rt', '--no-whole-archive', '-llsan_common_rt', '-lsanitizer_common_rt']
```

Fixes: #25472
@tlively
Copy link
Member

tlively commented Oct 2, 2025

We only use Atomics.waitAsync when it is available and otherwise fall back to using postMessage. I suppose that code probably tries to use the polyfill, although it might be better if it didn't.

@sbc100
Copy link
Collaborator Author

sbc100 commented Oct 2, 2025

We only use Atomics.waitAsync when it is available and otherwise fall back to using postMessage. I suppose that code probably tries to use the polyfill, although it might be better if it didn't.

Ah I see, so maybe the polyfill is only being injected under certain circumstances?

@juj
Copy link
Collaborator

juj commented Oct 2, 2025

The polyfill is always being injected if the target browser does not support Atomics.waitAsync().

The issue is that in order to be 100% correct, it is not possible to polyfill Atomics.waitAsync() without also monkey-patching Atomics.notify(), to provide a separate channel for notifying the emulated waitAsync() waiters.

To avoid that issue, I implemented the polyfill waitAsync() as a setTimeout() loop that polls the waitAsync() target memory location, and notifies waitAsync() when the memory location changes value.

This matches the idea of futex wait "sleep this thread as long as this memory location has its current value X." And it works as long as the notifier will change the value of the memory location before notifying. That is the pattern that is used when implementing the common mutexes/semaphores/barries etc.

Reading the way that atomics.wait/.notify is being used by the mailbox implementation, it seems that it uses futex sleep/wake as a signal. When the wakeup notification is done, the memory location value is not changed, but the location is just notified. The polyfill waitAsync() cannot detect this as being a notification, so it never wakes up the waitAsync() waiter.

In general notifying without first changing the value is racy when locks/semaphores/barriers are concerned - changing the value helps implement the atomicity needed in the sleep+wakeup., But this is probably not an issue in the mailbox implementation.

To resolve this problem, three possible ways: remove the waitAsync polyfill, or restrict the polyfill to not cover the mailbox (essentially mailbox has its own polyfill), or change the mailbox impl. to also change the memory location value before waking up the waiter.

juj added a commit to juj/emscripten that referenced this pull request Oct 4, 2025
…polyfill over the usage pattern in Emscripten mailbox implementation. See emscripten-core#25449 (comment) . The mailbox implementation has its own postMessage() fallback, so does not need the polyfill. Fixes test browser.test_gl_textures_pthreads_main_module in Firefox."
juj added a commit that referenced this pull request Oct 6, 2025
Delete the Atomics.waitAsync() polyfill, which is not good enough to
polyfill over the usage pattern in Emscripten mailbox implementation.
See
#25449 (comment)
. The mailbox implementation has its own postMessage() fallback, so does
not need the polyfill. Fixes test
browser.test_gl_textures_pthreads_main_module in Firefox."

This will be a breaking change to any users who successfully depended on
the polyfill (i.e. only needed the partial support it brought), though
it is uncertain if there are any.

Fixes test `browser.test_gl_textures_pthreads_main_module` on Firefox.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants