Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to overwrite worker's onmessage when using MODULARIZE #20192

Open
miloszmaki opened this issue Sep 6, 2023 · 14 comments
Open

Unable to overwrite worker's onmessage when using MODULARIZE #20192

miloszmaki opened this issue Sep 6, 2023 · 14 comments

Comments

@miloszmaki
Copy link

Consider this example, which creates a new worker thread. The worker overwrites its onmessage to handle custom messages. Then the main thread sends a custom message to the worker.

#include <cstdio>
#include <emscripten.h>
#include <thread>

int main() {
  printf("hello from main\n");

  std::thread t{[](){
    EM_ASM({
        console.log("hello from thread");
        const def_onmsg = self.onmessage;
        self.onmessage = (e) => {
                console.log("handling message:", e, e.data, e.data.cmd);
                if (e["data"]["cmd"] != "custom1" && e["data"]["cmd"] != "custom2") {
                        def_onmsg(e);
                }
        };
        // self.onmessage({"data": {"cmd": "custom1", "id": 1}}); // this works always
    });
  }};

  EM_ASM({
  // setTimeout(()=>{ // doesn't help either
    const threads = Module["PThread"].pthreads;
    const id = Object.keys(threads)[0];
    const worker = threads[id];
    worker.postMessage({"cmd": "custom2", "id": 2}); // this fails with MODULARIZE
  // },1000);
  });

  return 0;
}

It works properly ("custom2" message gets handled by the worker) when I build with:
emcc main.cpp -o test.html --std=c++20 -pthread -s PTHREAD_POOL_SIZE_STRICT=0

However, it stops working (the console prints error "worker.js received unknown command custom2") when I build with:
emcc main.cpp -o test.html --std=c++20 -pthread -s PTHREAD_POOL_SIZE_STRICT=0 -s MODULARIZE=1 -s EXPORT_ES6=1

My output of emcc -v is:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.45 (ef3e4e3b044de98e1811546e0bc605c65d3412f4)
clang version 18.0.0 (https://github.com/llvm/llvm-project d1e685df45dc5944b43d2547d0138cd4a3ee4efe)
Target: wasm32-unknown-emscripten
Thread model: posix

Interestingly, it was working with some older versions (at least I checked 3.1.15).

@sbc100
Copy link
Collaborator

sbc100 commented Sep 6, 2023

I'm not sure we have ever officially support doing this kind of thing. We have had others ask about it in the past though so there are at least few folks who want to do it. If we do want to support it we should add some tests that do this so it doesn't break again.

We should also make some kind of official API for going from a pthread ID to a worker handle in JS I suppose?

@miloszmaki
Copy link
Author

I see. That would be great if you decide to support this. I think that being able to establish the communication between workers is quite important. Do you know of any workarounds available currently? One I can think of, is to call some C++ function from JS and then do the thread communication on the C++ side. Do you think it's a good way to go?

@sbc100
Copy link
Collaborator

sbc100 commented Sep 7, 2023

I see. That would be great if you decide to support this. I think that being able to establish the communication between workers is quite important. Do you know of any workarounds available currently? One I can think of, is to call some C++ function from JS and then do the thread communication on the C++ side. Do you think it's a good way to go?

Normally pthread communication happens through shared memory. is there some reason you can't do that in this case? The fact that pthreads are implemented as web workers is kind of an implementation detail that ideally you would not rely on. Is there some reason you need to use postMessage rather than shared memory?

@sbc100
Copy link
Collaborator

sbc100 commented Sep 7, 2023

do the thread communication on the C++ side.

Yes, I would recommend doing your thread communication using the pthread APIs (or C++ thread APIs) and shared memory.

@miloszmaki
Copy link
Author

miloszmaki commented Sep 8, 2023

Is there some reason you need to use postMessage rather than shared memory?

I'm capturing camera frames on the main thread and need to transfer the ownership of a VideoFrame to the worker thread, which handles the frame by processing its data and eventually calling close() on it. Not sure if this is possible using shared memory only.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 8, 2023

Yes, I can see that in that case you would need to use postMessage.

@juj doesn't this sounds like a reasonable use case for postMessage or is there some way this can be done using offscreen canvas perhaps?

@sbc100
Copy link
Collaborator

sbc100 commented Sep 8, 2023

Wait, if you are processing the data using native code then don't you have to copy the data out of the VideoFrame and into linear memory anyway? Why not copy the data out on the main thread?

@miloszmaki
Copy link
Author

Right, I'm copying the data for further processing in C++. However, I'm concerned about the efficiency and would like to offload work from the main thread if possible. I think I will need to try and profile the approach you suggested to know if it works for me.

@juj
Copy link
Collaborator

juj commented Sep 12, 2023

I'm not sure we have ever officially support doing this kind of thing.

I think we should support this. It helps composability with other libraries, and helps users utilize the existing constructs that they might have and know about. Alternatively, we should clearly document that the .onmessage parameter is off limits to end users.

I have tried to do this for the recent libraries, e.g. Wasm Workers already utilize addEventListener instead of .onmessage:

// The Wasm Worker runtime is now up, so we can start processing
// any postMessage function calls that have been received. Drop the temp
// message handler that queued any pending incoming postMessage function calls ...
removeEventListener('message', _wasmWorkerAppendToQueue);
// ... then flush whatever messages we may have already gotten in the queue,
// and clear _wasmWorkerDelayedMessageQueue to undefined ...
_wasmWorkerDelayedMessageQueue = _wasmWorkerDelayedMessageQueue.forEach(_wasmWorkerRunPostMessage);
// ... and finally register the proper postMessage handler that immediately
// dispatches incoming function calls without queueing them.
addEventListener('message', _wasmWorkerRunPostMessage);
even though it is larger code size.

We might code golf the size a bit for this in the future to use .onmessage with a hypothetical WORKERS_ARE_PRIVATE or something along those lines. There are too many tutorials and other articles that showcase modifying .onmessage that it's better to let users have it.

@miloszmaki
Copy link
Author

Why not copy the data out on the main thread?

I tried this approach suggested by @sbc100 and it turns out to work quite well.

@miloszmaki
Copy link
Author

However, I'm still a bit concerned with this new approach, since I need to do some kind of synchronization (e.g. mutex) which requires blocking on the main thread (not recommended, as pointed out in the documentation).

@miloszmaki
Copy link
Author

@sbc100 @juj any updates on that? Is it possible to merge #20235 ?

@sbc100
Copy link
Collaborator

sbc100 commented Oct 2, 2024

I'd be OK with landing #20235 but I think we should add a test/example and some docs about how to do it in the blessed way.

@juj
Copy link
Collaborator

juj commented Oct 7, 2024

Sorry, I never quite had time to design the tests for that PR. I'll see if I can get a bit of time slotted for this soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants