Module.newWorker() to allow loading hosted libs via blob #8338

hostilefork · 2019-03-23T17:59:32Z

(I wouldn't mind tinkering around and providing this as a PR, if the idea was considered sound.)

When you build a library that uses pthreads in emscripten, it produces an additional my-lib.worker.js file.

If you try to put your main my-lib.js in a location besides where your page is being served from, and tell locateFile() that the my-lib.worker.js file is right there with it...you get a bit of "browser security theatre":

Uncaught (in promise) DOMException: Failed to construct 'Worker':
Script at 'http://example.com/my-lib.worker.js' cannot be accessed from
origin 'http://example.com'.

The failing call originates from within library-pthread.js, which is built into the my-lib.js file when you are using pthreads.

Clients don't want to be editing either of these files. But if you could do so, and if your hosting location has CORS enabled, you might workaround by changing:

var worker = new Worker(pthreadMainJs)

To fetch() the JS into a Blob and then use:

var worker = new Worker(URL.createObjectURL(blob))

I tested this and it works, so we're having to do a post-build patch step to rewrite the build product.

But it would be better if emscripten had a Module.newWorker = function (url, callback) {...} option. That could default to being implemented as just callback(new Worker(url)). Then interested parties could provide their own replacement, such as:

Module.newWorker = function(url, callback) {
    fetch(url)
        .then(function(response) {
             if (!response.ok)
                 throw Error(response.statusText)
             return response.blob()
         })
         .then(function(blob) {
             callback(new Worker(URL.createObjectURL(blob)))
         })
 }

Does that sound reasonable? I've gathered Emscripten doesn't want to take Promises for granted, but if it did this could use them instead of a callback.

The text was updated successfully, but these errors were encountered:

The worker can now be fetch()'d remotely via CORS from the S3 instance. emscripten-core/emscripten#8338

kripken · 2019-03-27T20:31:52Z

Interesting. I'm surprised such a workaround for CORS is necessary, in that I'm surprised it's so easy to work around it...

As to adding a new Module hook - in general we are trying to avoid adding more, as they increase code size, unless absolutely necessary. I think you can polyfill this in JS, so it's not necessary here, that is, you can monkeypatch Worker itself in your code?

hostilefork · 2019-03-28T17:23:06Z

I think you can polyfill this in JS, so it's not necessary here, that is, you can monkeypatch Worker itself in your code?

The worker.js calling code is expecting a synchronous result to be returned, and that it is a functional worker with the supported API. There's a completely different contract between new Worker() giving back such an object, and being willing to accept a contract to get that object later. Hence the callback parameter.

Without that, you'd need a way to handle every synchronous call to a worker (present and future API) via an asynchronous implementation. That's not something JavaScript can do, e.g. no top-level AWAITs.

If it worked for this use case, it would be non-trivial...it would mean giving back a stub object from a new Worker() call that wasn't itself a worker, but had (for instance) a postMessage() handler. That handler would have to be able to accept messages before the worker was ready, and put them in its own queue. When the actual worker loaded, it would have to process that deferred list of messages. etc.

kripken · 2019-03-28T18:12:34Z

I think you can still polyfill it? Yes, new Worker returns a worker synchronously, which could contain a polyfill object that proxies the various APIs, and "buffers" as needed - so if you postMessage before the real underlying worker is available, it would wait to send that message until it is, etc.

Can even use a JS Proxy object to support future APIs, but yeah, there might be special logic that is extra work, so it's not 100% future-proof.

I realize this isn't easy to do, but adding a new hook in emscripten has a cost too. On the other hand, maybe there's a way to implement that without adding code size or complexity?

hostilefork · 2019-03-28T18:40:06Z

I realize this isn't easy to do, but adding a new hook in emscripten has a cost too.

It may be the case that workers are such that there are no purely synchronous demands of them (or at least emscripten vows to not use any such APIs), so a proxy atop a worker that may not be in existence at the time of the API calls is a technical possibility.

But... though I can't speak for everyone... if my only two choices were to design/build/debug/maintain this artificial proxying object that simulates buffers for a Worker it doesn't have yet...vs. do a post-build patch of the worker.js output...I'd definitely pick patching the worker.js output to say newWorker(pthreadInitUrl).then(...). :-/

As to adding a new Module hook - in general we are trying to avoid adding more, as they increase code size, unless absolutely necessary.

Code-size-wise, this strikes me as a rather minimal in the scheme of things. It need not affect non-pthread builds...and anyone using pthreads is probably prepared for some size. And in that build, all it's asking for is there be an overridable default:

newWorker = function(url, callback) { callback(new Worker(url)) }

I'd be happy enough supplying the fetch()-based override hook in client code, without EMSCRIPTEN_USE_FETCH_FOR_WORKERS doing it automatically or having that code sitting inert in library-pthread.js for people who aren't using it.

Perhaps the long tail of new Worker() will sort this out and start working someday without any of this rigmarole. In the meantime, this seems pretty important (in that I don't think I am the only one who will be interested in having pthread-enabled code load from a domain other than the one in the URL bar)...

kripken · 2019-04-01T20:24:31Z

I think those are good points.

Meanwhile I realized, can't the locateFile API work for this?

hostilefork · 2019-04-01T20:47:33Z

Meanwhile I realized, can't the locateFile API work for this?

I don't see how--as it's a synchronous API that returns a URL. And so long as that URL is on a remote domain and you don't go through the fetch API, the browser won't be happy.

kripken · 2019-04-01T21:07:55Z

Yeah, true - I wonder if we shouldn't consider making it an async API, as a general fix for this type of issue. Not a simple change though.

hostilefork · 2019-04-01T22:09:30Z

I wonder if we shouldn't consider making it an async API, as a general fix for this type of issue.

You mean locateFile would be able to give either URLs or Blobs asynchronously? That might be a forward-looking choice. (...though making you rethink all the locateFile() callsites)

If you wanted backwards compatibility, you could have the callback be the second parameter, and if locateFile() had a return value then use that...else if undefined, assume it's coming later by way of callback.

kripken · 2019-04-02T21:44:03Z

Yeah, that might be a good long-term plan, but would indeed require changing all the locations.

Another option here might be to do the async operation before - that is, before you load the main JS, load the worker JS and prepare the object URL etc., and stash that, and then you can hand it off synchronously in locateFile. A way to do that easily might be of general use perhaps (for any file we try to load)?

prestomation · 2019-04-18T22:28:34Z

@kripken This idea works. Something like the following:

const base = "https://my/cross/origin/base/url;
const workerURL = `${base}/module.worker.js`;
const workerBlob = await(await fetch(workerURL)).blob();
const workerObjectURL = URL.createObjectURL(workerBlob);
const module = await new Module({
    "locateFile": name => name.endsWith(".worker.js") ? workerObjectURL : `${base}/${name}`
});

hostilefork · 2019-04-18T23:30:20Z

This idea works.

That is an interesting feature to know about, and it seems URL.createObjectURL has fairly wide support (at least, as much support as other things being used here)...

https://caniuse.com/#feat=bloburls

But in practice that would seem to run against the "you can't use async/await at the top level" issue:

https://stackoverflow.com/a/51525286
https://stackoverflow.com/questions/46515764/how-can-i-use-async-await-at-the-top-level
https://gist.github.com/Rich-Harris/0b6f317657f5167663b493c722647221

prestomation · 2019-04-19T00:43:15Z

You don't need to instantiate your module at the top level and the async/await is just for brevity...

hostilefork · 2019-04-19T12:06:40Z

You don't need to instantiate your module at the top level and the async/await is just for brevity...

What I meant is--it's not something that can necessarily be done without larger scale restructuring of the calling script. (The fact that rearrangements from synchronous to asynchronous can be invasive is one of the sticking points of modifying emscripten itself to address the issue.)

But yes, URL.createObjectURL does seem to offer a workaround here. Though if emscripten's goal is a turnkey provision of a workable pthread-emulating system...it may be within the project scope to take on this concern and make the client's life easier in this regard.

They renamed `unusedWorkerPool` to `unusedWorkers`, breaking a patch step in the Pthread build. emscripten-core/emscripten#8338

VirtualTim · 2019-11-19T07:21:44Z

@hostilefork What to you think of my proposed idea here: #9796? What that also fix (or avoid) this issue for you?

hostilefork · 2019-11-19T14:31:39Z

@VirtualTim I wound up giving up on my patch, and just bit the bullet and reorganized my loading process. I fetch the worker in a pre-load step as suggested by @prestomation. Things stall until it gets back, then I kick off the webassembly loading process...knowing that when the request for the worker comes in I have the data on hand to use URL.createObjectURL() on it.

In terms of "what I think" ...well... 🙄 ...anything that makes it easier would be nice. But I think that when you're having to use these workarounds to ultimately achieve the same effects as if the browser had let you load the resource like you wanted in the first place--it's hard not to feel like blaming the browser. One of two things is true: the capability itself is a security bug and they have not securely blocked the capability -or- they are introducing more insecurity by forcing you into ever-more-hare-brained mechanics to accomplish a desire that should be deemed legitimate.

The category of problem continues and I've joined in griping about it. (That issue may affect you if you are worried about this problem, so perhaps you want to look into it.) One wonders if there's any chance at pushing back against this trend. So maybe the browser makers could be lobbied to just let you use the URL like you wanted in the first place.

X-Kent · 2020-05-13T23:28:57Z

@hostilefork
Can you please share your way of using URL.createObjectURL() so people less familiar with WASM internal can utilize it until the issue is resolved ?

I am new to WASM/JS (coming from C/C++). Banged my head against enabling CORS headers and why it still doesn't work until I found this thread.

I under the solution @prestomation has proposed but I am not sure where I do put the worker preloading code so I can suspend the WASM initialization until the fetch() is complete.

Edit:
Can the other solution be to "embed" the libname.worker.js inside the main libname.js as a string and load it from string ? I know it has performance issues but in some cases the worker is not as big (mine is about 10kb). The WASM file is the big part in my case.
At least it should be available at the time of the initialization so no need for async.

hostilefork · 2020-05-22T15:53:46Z

@X-Kent Sorry to miss your message. The worker preloading code simply has to happen before you load the emscripten .js file (which is what triggers the locateFile method in Module for the .worker.js)

If it still helps... I have set up the loading process as a chain of "promiser" functions.

One of those promisers is the prefetch_worker_js_promiser.
It is linked into the main promise chain before the load_js_promiser.
Doing so makes it available in time for the Module.locateFile method to have the data it needs.

Hopefully that helps, or maybe the code and comments will give you some ideas...

stale · 2021-06-02T16:18:49Z

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

hostilefork · 2021-06-03T02:37:55Z

I now believe using multithreaded code as a library (e.g. where worker-based-Wasm is pulled casually in the way that one might use a hosted version of jQuery) is a lost cause. (No fault of emscripten, of course!!)

I'm closing this issue, as COOP/COEP was the last straw...I'm not going to swim upstream against that current.

emscripten-core/emscripten#8338 (comment)

kleisauke · 2022-03-14T20:53:58Z

Note for further readers; you could also leverage importScripts within the web worker instead of fetch()-ing Emscripten's *.worker.js. You can combine it with the scriptDirectory parameter of locateFile to avoid having to specify the CDN base URL at all.

const module = await Module({
  // https://stackoverflow.com/q/25458104
  locateFile: (fileName, scriptDirectory) => {
    const url = scriptDirectory + fileName;
    if (url.endsWith('.worker.js')) {
      return URL.createObjectURL(new Blob(
        [`importScripts('${url}');`],
        {'type': 'application/javascript'}));
    }
    return url;
  }
});

(if you use multiple threads, you may want to cache the output of URL.createObjectURL, since locateFile is called for each web worker instantiation)

See also: emscripten-core/emscripten#8338.

hostilefork added a commit to hostilefork/replpad-js that referenced this issue Mar 24, 2019

Remove libr3.worker.js from repository

9936926

The worker can now be fetch()'d remotely via CORS from the S3 instance. emscripten-core/emscripten#8338

hostilefork mentioned this issue Mar 25, 2019

Demo: UI Builder hostilefork/replpad-js#35

Closed

hostilefork added a commit to metaeducation/ren-c that referenced this issue Sep 14, 2019

Accomodate rename in patched code for emscripten

48cd4df

They renamed `unusedWorkerPool` to `unusedWorkers`, breaking a patch step in the Pthread build. emscripten-core/emscripten#8338

stale bot added the wontfix label Jun 2, 2021

hostilefork closed this as completed Jun 3, 2021

arashigaoka added a commit to arashigaoka/YaneuraOu.wasm that referenced this issue Jun 3, 2021

stop using cdn

4d51d44

emscripten-core/emscripten#8338 (comment)

arashigaoka added a commit to arashigaoka/YaneuraOu.wasm that referenced this issue Jun 3, 2021

stop using cdn

449e6d0

emscripten-core/emscripten#8338 (comment)

arashigaoka mentioned this issue Jun 3, 2021

stop using cdn arashigaoka/YaneuraOu.wasm#4

Merged

arashigaoka added a commit to arashigaoka/YaneuraOu.wasm that referenced this issue Jun 3, 2021

stop using cdn (#4)

7d62ec8

emscripten-core/emscripten#8338 (comment)

kleisauke added a commit to kleisauke/wasm-vips that referenced this issue Mar 15, 2022

Workaround CORS issue for externally hosted JS (#12)

0d776ce

See also: emscripten-core/emscripten#8338.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Module.newWorker() to allow loading hosted libs via blob #8338

Module.newWorker() to allow loading hosted libs via blob #8338

hostilefork commented Mar 23, 2019 •

edited

kripken commented Mar 27, 2019

hostilefork commented Mar 28, 2019 •

edited

kripken commented Mar 28, 2019

hostilefork commented Mar 28, 2019 •

edited

kripken commented Apr 1, 2019

hostilefork commented Apr 1, 2019

kripken commented Apr 1, 2019

hostilefork commented Apr 1, 2019 •

edited

kripken commented Apr 2, 2019

prestomation commented Apr 18, 2019

hostilefork commented Apr 18, 2019 •

edited

prestomation commented Apr 19, 2019

hostilefork commented Apr 19, 2019

VirtualTim commented Nov 19, 2019

hostilefork commented Nov 19, 2019 •

edited

X-Kent commented May 13, 2020 •

edited

hostilefork commented May 22, 2020

stale bot commented Jun 2, 2021

hostilefork commented Jun 3, 2021 •

edited

kleisauke commented Mar 14, 2022

Module.newWorker() to allow loading *hosted* libs via blob #8338

Module.newWorker() to allow loading *hosted* libs via blob #8338

Comments

hostilefork commented Mar 23, 2019 • edited

kripken commented Mar 27, 2019

hostilefork commented Mar 28, 2019 • edited

kripken commented Mar 28, 2019

hostilefork commented Mar 28, 2019 • edited

kripken commented Apr 1, 2019

hostilefork commented Apr 1, 2019

kripken commented Apr 1, 2019

hostilefork commented Apr 1, 2019 • edited

kripken commented Apr 2, 2019

prestomation commented Apr 18, 2019

hostilefork commented Apr 18, 2019 • edited

prestomation commented Apr 19, 2019

hostilefork commented Apr 19, 2019

VirtualTim commented Nov 19, 2019

hostilefork commented Nov 19, 2019 • edited

X-Kent commented May 13, 2020 • edited

hostilefork commented May 22, 2020

stale bot commented Jun 2, 2021

hostilefork commented Jun 3, 2021 • edited

kleisauke commented Mar 14, 2022

Module.newWorker() to allow loading hosted libs via blob #8338

Module.newWorker() to allow loading hosted libs via blob #8338

hostilefork commented Mar 23, 2019 •

edited

hostilefork commented Mar 28, 2019 •

edited

hostilefork commented Mar 28, 2019 •

edited

hostilefork commented Apr 1, 2019 •

edited

hostilefork commented Apr 18, 2019 •

edited

hostilefork commented Nov 19, 2019 •

edited

X-Kent commented May 13, 2020 •

edited

hostilefork commented Jun 3, 2021 •

edited