Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

custom worker runs multiple times when compiled with pthreads #19210

Closed
TheTrio opened this issue Apr 20, 2023 · 12 comments
Closed

custom worker runs multiple times when compiled with pthreads #19210

TheTrio opened this issue Apr 20, 2023 · 12 comments

Comments

@TheTrio
Copy link

TheTrio commented Apr 20, 2023

Here's the output of emcc -v

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.5 (d33328b22db1aa62a51c50731e9645951a2038bc)
clang version 15.0.0 (https://github.com/llvm/llvm-project f3bc7fd5465a3a919388b6a0307553ef4a6d39c9)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /usr/lib/emsdk/upstream/bin

I have defined a worker which does nothing but call the emscripten glue code(a.out.js).

// file: my_worker.js

console.log('called')
importScripts('a.out.js')

The C++ code is irrelevant I think. I used to following to reproduce the issue

// file: main.cpp
#include <emscripten/bind.h>
using namespace emscripten;
EMSCRIPTEN_BINDINGS(my_module){}

The JS does just one thing - it creates a worker from worker.js

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <script>
      const worker = new Worker('./my_worker.js')
    </script>
  </head>
</html>

I compile the code with the following command emcc main.cpp --bind -s PTHREAD_POOL_SIZE=4 -pthread -O2 -s ALLOW_MEMORY_GROWTH=1

Notice that the worker is executed multiple times(confirmed by the multiple console.logs). You can also see in the dev tools that its requested more than once by a.out.worker.js which I find confusing.

image

Is this expected behavior? Why is a.out.worker.js executing my_worker.js? In this example, this perhaps isn't that big of an issue. However, the real code I had defined onmessage on the worker as follows

// file: worker.js
importScripts('a.out.js')
onmessage = (e) => ...

The problem was, since this code was being run by a.out.worker.js as well, it redefined their onmessage and my worker was getting a bunch of events I didn't expect it to.

Is there some flag or some configuration I'm missing here? Thanks!

@sbc100
Copy link
Collaborator

sbc100 commented Apr 20, 2023

The 4 requests would seem to correspond to the 4 worker threads that correspond to PTHREAD_POOL_SIZE=4.

The new threads should load a.out.worker.js which should then load a.out.js. Are you sure it not the case that this is 4 loads of a.out.worker.js ? That would make some sense, but it would make any sense that your my_worker.js would be loaded 4 times.

@TheTrio
Copy link
Author

TheTrio commented Apr 20, 2023

Are you sure it not the case that this is 4 loads of a.out.worker.js ?

I don't think so. In the screenshot I posted above, you can see the multiple fetches to my_worker.js. This is in the 4th column. The 5th column is the initiator - which is a.out.worker.js

In either case, the console.log can only appear from my worker so I'm pretty certain its being called from the other workers from some reason

@sbc100
Copy link
Collaborator

sbc100 commented Apr 20, 2023

Can you using devtools to figure out why my_worker.js would be loaded by a.out.worker.js .. it odd since I don't see how the emscripten-generated code could even know about the existence or my_worker.js? Perhaps you can share the backtrace that leads to the loading the the code?

@TheTrio
Copy link
Author

TheTrio commented Apr 20, 2023

Looks like its here in a.out.worker.js (I've added the console.log)

// file a.out.worker.js

Module['ENVIRONMENT_IS_PTHREAD'] = true
      if (typeof e.data.urlOrBlob == 'string') {
        console.log('loading ' + e.data.urlOrBlob)
        importScripts(e.data.urlOrBlob)

image

@TheTrio
Copy link
Author

TheTrio commented Apr 20, 2023

I think I've figured it out - this is part of a.out.js which does a postmessage on the workers

// file: a.out.js
worker.postMessage({
      cmd: 'load',
      urlOrBlob: Module['mainScriptUrlOrBlob'] || _scriptDir,
      wasmMemory: wasmMemory,
      wasmModule: wasmModule,
    })

and _scriptDir is defined as my_worker.js. That happens here

// file: a.out.js
if (ENVIRONMENT_IS_WORKER) {
  _scriptDir = self.location.href
}

Indeed, if I rewrite my worker as follows, everything works as expected(if indeed it is expected to have a.out.js run multiple times)

console.log('called')
var Module = {
  mainScriptUrlOrBlob: 'a.out.js',
}
importScripts('a.out.js')

@sbc100
Copy link
Collaborator

sbc100 commented Apr 20, 2023

I see, so when you load a.out.js within a worker like that you need tell it its own name. I guess it can't figure it out on its own and it just uses self.location.href a best guess.

Sounds like you found the solution using the mainScriptUrlOrBlob fix which I assume is indented for this very purpose.

@TheTrio
Copy link
Author

TheTrio commented Apr 20, 2023

Looks like its documented here, although I don't see how I could've found it without knowing exactly what I was looking for 🤷

Either way, I guess nothing survives in this issue?

@sbc100
Copy link
Collaborator

sbc100 commented Apr 20, 2023

Yes, I think this can be closed.

@TheTrio
Copy link
Author

TheTrio commented Apr 20, 2023

Well, hopefully this prevents someone from scratching their heads in confusion in the future.

Thanks for your help! Closing this.

@TheTrio TheTrio closed this as completed Apr 20, 2023
@TheTrio
Copy link
Author

TheTrio commented Apr 21, 2023

This is unrelated but it arises out of this - is there a way to bundle a.out.js into a.out.worker.js? importScripts in a.out.worker.js does a network request and so a.out.js is fetched multiple times, slowing things down.

@sbc100
Copy link
Collaborator

sbc100 commented Apr 21, 2023

We don't currently have a way to do that no. Presumably it would result in the same number of bytes over the network though?

@TheTrio
Copy link
Author

TheTrio commented Apr 21, 2023

Yes, it will probably be cached by the browser but I think bundling might be more performant overall.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants