Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module.newWorker() to allow loading *hosted* libs via blob #8338

Closed
hostilefork opened this issue Mar 23, 2019 · 20 comments
Closed

Module.newWorker() to allow loading *hosted* libs via blob #8338

hostilefork opened this issue Mar 23, 2019 · 20 comments
Labels

Comments

@hostilefork
Copy link
Contributor

hostilefork commented Mar 23, 2019

(I wouldn't mind tinkering around and providing this as a PR, if the idea was considered sound.)

When you build a library that uses pthreads in emscripten, it produces an additional my-lib.worker.js file.

If you try to put your main my-lib.js in a location besides where your page is being served from, and tell locateFile() that the my-lib.worker.js file is right there with it...you get a bit of "browser security theatre":

Uncaught (in promise) DOMException: Failed to construct 'Worker':
Script at 'http://example.com/my-lib.worker.js' cannot be accessed from
origin 'http://example.com'.

The failing call originates from within library-pthread.js, which is built into the my-lib.js file when you are using pthreads.

Clients don't want to be editing either of these files. But if you could do so, and if your hosting location has CORS enabled, you might workaround by changing:

var worker = new Worker(pthreadMainJs)

To fetch() the JS into a Blob and then use:

var worker = new Worker(URL.createObjectURL(blob))

I tested this and it works, so we're having to do a post-build patch step to rewrite the build product.

But it would be better if emscripten had a Module.newWorker = function (url, callback) {...} option. That could default to being implemented as just callback(new Worker(url)). Then interested parties could provide their own replacement, such as:

Module.newWorker = function(url, callback) {
    fetch(url)
        .then(function(response) {
             if (!response.ok)
                 throw Error(response.statusText)
             return response.blob()
         })
         .then(function(blob) {
             callback(new Worker(URL.createObjectURL(blob)))
         })
 }

Does that sound reasonable? I've gathered Emscripten doesn't want to take Promises for granted, but if it did this could use them instead of a callback.

hostilefork added a commit to hostilefork/replpad-js that referenced this issue Mar 24, 2019
The worker can now be fetch()'d remotely via CORS from the S3 instance.

emscripten-core/emscripten#8338
@kripken
Copy link
Member

kripken commented Mar 27, 2019

Interesting. I'm surprised such a workaround for CORS is necessary, in that I'm surprised it's so easy to work around it...

As to adding a new Module hook - in general we are trying to avoid adding more, as they increase code size, unless absolutely necessary. I think you can polyfill this in JS, so it's not necessary here, that is, you can monkeypatch Worker itself in your code?

@hostilefork
Copy link
Contributor Author

hostilefork commented Mar 28, 2019

I think you can polyfill this in JS, so it's not necessary here, that is, you can monkeypatch Worker itself in your code?

The worker.js calling code is expecting a synchronous result to be returned, and that it is a functional worker with the supported API. There's a completely different contract between new Worker() giving back such an object, and being willing to accept a contract to get that object later. Hence the callback parameter.

Without that, you'd need a way to handle every synchronous call to a worker (present and future API) via an asynchronous implementation. That's not something JavaScript can do, e.g. no top-level AWAITs.

If it worked for this use case, it would be non-trivial...it would mean giving back a stub object from a new Worker() call that wasn't itself a worker, but had (for instance) a postMessage() handler. That handler would have to be able to accept messages before the worker was ready, and put them in its own queue. When the actual worker loaded, it would have to process that deferred list of messages. etc.

@kripken
Copy link
Member

kripken commented Mar 28, 2019

I think you can still polyfill it? Yes, new Worker returns a worker synchronously, which could contain a polyfill object that proxies the various APIs, and "buffers" as needed - so if you postMessage before the real underlying worker is available, it would wait to send that message until it is, etc.

Can even use a JS Proxy object to support future APIs, but yeah, there might be special logic that is extra work, so it's not 100% future-proof.

I realize this isn't easy to do, but adding a new hook in emscripten has a cost too. On the other hand, maybe there's a way to implement that without adding code size or complexity?

@hostilefork
Copy link
Contributor Author

hostilefork commented Mar 28, 2019

I realize this isn't easy to do, but adding a new hook in emscripten has a cost too.

It may be the case that workers are such that there are no purely synchronous demands of them (or at least emscripten vows to not use any such APIs), so a proxy atop a worker that may not be in existence at the time of the API calls is a technical possibility.

But... though I can't speak for everyone... if my only two choices were to design/build/debug/maintain this artificial proxying object that simulates buffers for a Worker it doesn't have yet...vs. do a post-build patch of the worker.js output...I'd definitely pick patching the worker.js output to say newWorker(pthreadInitUrl).then(...). :-/

As to adding a new Module hook - in general we are trying to avoid adding more, as they increase code size, unless absolutely necessary.

Code-size-wise, this strikes me as a rather minimal in the scheme of things. It need not affect non-pthread builds...and anyone using pthreads is probably prepared for some size. And in that build, all it's asking for is there be an overridable default:

newWorker = function(url, callback) { callback(new Worker(url)) }

I'd be happy enough supplying the fetch()-based override hook in client code, without EMSCRIPTEN_USE_FETCH_FOR_WORKERS doing it automatically or having that code sitting inert in library-pthread.js for people who aren't using it.

Perhaps the long tail of new Worker() will sort this out and start working someday without any of this rigmarole. In the meantime, this seems pretty important (in that I don't think I am the only one who will be interested in having pthread-enabled code load from a domain other than the one in the URL bar)...

@kripken
Copy link
Member

kripken commented Apr 1, 2019

I think those are good points.

Meanwhile I realized, can't the locateFile API work for this?

@hostilefork
Copy link
Contributor Author

Meanwhile I realized, can't the locateFile API work for this?

I don't see how--as it's a synchronous API that returns a URL. And so long as that URL is on a remote domain and you don't go through the fetch API, the browser won't be happy.

@kripken
Copy link
Member

kripken commented Apr 1, 2019

Yeah, true - I wonder if we shouldn't consider making it an async API, as a general fix for this type of issue. Not a simple change though.

@hostilefork
Copy link
Contributor Author

hostilefork commented Apr 1, 2019

I wonder if we shouldn't consider making it an async API, as a general fix for this type of issue.

You mean locateFile would be able to give either URLs or Blobs asynchronously? That might be a forward-looking choice. (...though making you rethink all the locateFile() callsites)

If you wanted backwards compatibility, you could have the callback be the second parameter, and if locateFile() had a return value then use that...else if undefined, assume it's coming later by way of callback.

@kripken
Copy link
Member

kripken commented Apr 2, 2019

Yeah, that might be a good long-term plan, but would indeed require changing all the locations.

Another option here might be to do the async operation before - that is, before you load the main JS, load the worker JS and prepare the object URL etc., and stash that, and then you can hand it off synchronously in locateFile. A way to do that easily might be of general use perhaps (for any file we try to load)?

@prestomation
Copy link

@kripken This idea works. Something like the following:

const base = "https://my/cross/origin/base/url;
const workerURL = `${base}/module.worker.js`;
const workerBlob = await(await fetch(workerURL)).blob();
const workerObjectURL = URL.createObjectURL(workerBlob);
const module = await new Module({
    "locateFile": name => name.endsWith(".worker.js") ? workerObjectURL : `${base}/${name}`
});

@hostilefork
Copy link
Contributor Author

hostilefork commented Apr 18, 2019

This idea works.

That is an interesting feature to know about, and it seems URL.createObjectURL has fairly wide support (at least, as much support as other things being used here)...

https://caniuse.com/#feat=bloburls

But in practice that would seem to run against the "you can't use async/await at the top level" issue:

https://stackoverflow.com/a/51525286
https://stackoverflow.com/questions/46515764/how-can-i-use-async-await-at-the-top-level
https://gist.github.com/Rich-Harris/0b6f317657f5167663b493c722647221

@prestomation
Copy link

You don't need to instantiate your module at the top level and the async/await is just for brevity...

@hostilefork
Copy link
Contributor Author

You don't need to instantiate your module at the top level and the async/await is just for brevity...

What I meant is--it's not something that can necessarily be done without larger scale restructuring of the calling script. (The fact that rearrangements from synchronous to asynchronous can be invasive is one of the sticking points of modifying emscripten itself to address the issue.)

But yes, URL.createObjectURL does seem to offer a workaround here. Though if emscripten's goal is a turnkey provision of a workable pthread-emulating system...it may be within the project scope to take on this concern and make the client's life easier in this regard.

hostilefork added a commit to metaeducation/ren-c that referenced this issue Sep 14, 2019
They renamed `unusedWorkerPool` to `unusedWorkers`, breaking a patch
step in the Pthread build.

emscripten-core/emscripten#8338
@VirtualTim
Copy link
Collaborator

@hostilefork What to you think of my proposed idea here: #9796? What that also fix (or avoid) this issue for you?

@hostilefork
Copy link
Contributor Author

hostilefork commented Nov 19, 2019

@VirtualTim I wound up giving up on my patch, and just bit the bullet and reorganized my loading process. I fetch the worker in a pre-load step as suggested by @prestomation. Things stall until it gets back, then I kick off the webassembly loading process...knowing that when the request for the worker comes in I have the data on hand to use URL.createObjectURL() on it.

In terms of "what I think" ...well... 🙄 ...anything that makes it easier would be nice. But I think that when you're having to use these workarounds to ultimately achieve the same effects as if the browser had let you load the resource like you wanted in the first place--it's hard not to feel like blaming the browser. One of two things is true: the capability itself is a security bug and they have not securely blocked the capability -or- they are introducing more insecurity by forcing you into ever-more-hare-brained mechanics to accomplish a desire that should be deemed legitimate.

The category of problem continues and I've joined in griping about it. (That issue may affect you if you are worried about this problem, so perhaps you want to look into it.) One wonders if there's any chance at pushing back against this trend. So maybe the browser makers could be lobbied to just let you use the URL like you wanted in the first place.

@X-Kent
Copy link

X-Kent commented May 13, 2020

@hostilefork
Can you please share your way of using URL.createObjectURL() so people less familiar with WASM internal can utilize it until the issue is resolved ?

I am new to WASM/JS (coming from C/C++). Banged my head against enabling CORS headers and why it still doesn't work until I found this thread.

I under the solution @prestomation has proposed but I am not sure where I do put the worker preloading code so I can suspend the WASM initialization until the fetch() is complete.

Edit:
Can the other solution be to "embed" the libname.worker.js inside the main libname.js as a string and load it from string ? I know it has performance issues but in some cases the worker is not as big (mine is about 10kb). The WASM file is the big part in my case.
At least it should be available at the time of the initialization so no need for async.

@hostilefork
Copy link
Contributor Author

@X-Kent Sorry to miss your message. The worker preloading code simply has to happen before you load the emscripten .js file (which is what triggers the locateFile method in Module for the .worker.js)

If it still helps... I have set up the loading process as a chain of "promiser" functions.

Hopefully that helps, or maybe the code and comments will give you some ideas...

@stale
Copy link

stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Jun 2, 2021
@hostilefork
Copy link
Contributor Author

hostilefork commented Jun 3, 2021

I now believe using multithreaded code as a library (e.g. where worker-based-Wasm is pulled casually in the way that one might use a hosted version of jQuery) is a lost cause. (No fault of emscripten, of course!!)

I'm closing this issue, as COOP/COEP was the last straw...I'm not going to swim upstream against that current.

arashigaoka added a commit to arashigaoka/YaneuraOu.wasm that referenced this issue Jun 3, 2021
arashigaoka added a commit to arashigaoka/YaneuraOu.wasm that referenced this issue Jun 3, 2021
arashigaoka added a commit to arashigaoka/YaneuraOu.wasm that referenced this issue Jun 3, 2021
@kleisauke
Copy link
Collaborator

Note for further readers; you could also leverage importScripts within the web worker instead of fetch()-ing Emscripten's *.worker.js. You can combine it with the scriptDirectory parameter of locateFile to avoid having to specify the CDN base URL at all.

const module = await Module({
  // https://stackoverflow.com/q/25458104
  locateFile: (fileName, scriptDirectory) => {
    const url = scriptDirectory + fileName;
    if (url.endsWith('.worker.js')) {
      return URL.createObjectURL(new Blob(
        [`importScripts('${url}');`],
        {'type': 'application/javascript'}));
    }
    return url;
  }
});

(if you use multiple threads, you may want to cache the output of URL.createObjectURL, since locateFile is called for each web worker instantiation)

kleisauke added a commit to kleisauke/wasm-vips that referenced this issue Mar 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants