# [Web] WebGPU and WASM Backends Unavailable within Service Worker #20876
## Comments
Thank you for reporting this issue. I will try to figure out how to fix this problem.
So it turns out that dynamic import is the problem: currently, the WebAssembly factory (wasm-factory.ts) uses dynamic import to load the JS glue, and this does not work in a service worker. A few potential workarounds turn out to be unavailable as well.
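For context, a minimal sketch of the failure mode, assuming a module-type service worker (file name hypothetical):

```js
// service worker registered with { type: "module" }

// a static import like this is fine in a module service worker:
import * as ort from "onnxruntime-web";

self.addEventListener("message", async () => {
  // ...but dynamic import() is disallowed inside service workers,
  // which is what the wasm-factory glue loading runs into:
  const glue = await import("./ort-wasm-simd-threaded.mjs"); // rejects with a TypeError
});
```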
I am now trying to make a JS bundle that does not use dynamic import, specifically for service worker usage. Still working on it.
Thanks, I appreciate your efforts around this. It does seem like some special-case bundle will need to be built after all; you might need …
I have considered this option. However, Emscripten does not offer an option to output both UMD (IIFE+CJS) and ESM for the JS glue (emscripten-core/emscripten#21899); I have to choose one or the other. I chose the ES6 format output for the JS glue because of a couple of problems that arise when importing UMD from ESM, and I found a way to make ORT Web work; yes, this needs the build script to do some special handling. And this will only work for ESM, because the JS glue is ESM and there seems to be no way to import ESM from UMD in a service worker.
### Description

This PR adds a build of ORT Web as `ort{.all|.webgpu}.bundle.min.mjs`, which does not contain any dynamic import. This makes it possible to use ORT Web via static import in a service worker. Fixes #20876
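For illustration, a minimal sketch of what the new bundle allows; the exact dist path is an assumption based on the artifact names above:

```js
// background.js (module service worker): everything is statically imported,
// so no dynamic import happens at runtime
import * as ort from "onnxruntime-web/dist/ort.webgpu.bundle.min.mjs";

self.addEventListener("message", async () => {
  const session = await ort.InferenceSession.create("model.onnx", {
    executionProviders: ["webgpu"],
  });
  console.log("session ready", session);
});
```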
@ggaabe Could you please help to try …
@fs-eire my project depends on transformers.js, which imports the onnxruntime WebGPU backend like this: https://github.com/xenova/transformers.js/blob/v3/src/backends/onnx.js#L24 Is this the right usage? In my project I've added this to my package.json to resolve onnxruntime-web to this new version, though the issue is still occurring:
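For reference, a minimal sketch of that kind of pinning, assuming npm's `overrides` field (yarn uses `resolutions`); the exact snippet the commenter used was not captured, and the version string is the dev build mentioned later in this thread:

```json
{
  "overrides": {
    "onnxruntime-web": "1.19.0-dev.20240612-94aa21c3dd"
  }
}
```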
Maybe also important: the same error is still occurring in the same spot in the inference session, in the onnx package and not from transformers.js. Do I need to add a resolver for onnxruntime-common as well?
#20991 makes the default ESM import use non-dynamic import; I hope this change may fix this problem. The PR is still in progress.
Hi @fs-eire, is the newly merged fix in a released build I can try?
Please try 1.19.0-dev.20240612-94aa21c3dd.
@fs-eire EDIT: Never mind the comment I just deleted; that error was because I didn't set the webpack … However, I'm getting a new error now (progress!):
Update: I found the error is happening here: onnxruntime/js/common/lib/backend-impl.ts, lines 83 to 86 (at fff68c3). For some reason the webgpu `backend.init` promise is rejecting due to the …
Could you share the steps to reproduce?
@fs-eire You'll need to run the WebGPU setup in a Chrome extension.
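For anyone reproducing this, a minimal MV3 manifest sketch (name and version are hypothetical); `"type": "module"` is what permits static ESM imports in the background service worker:

```json
{
  "manifest_version": 3,
  "name": "ort-webgpu-demo",
  "version": "0.1.0",
  "background": {
    "service_worker": "background.js",
    "type": "module"
  }
}
```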
@ggaabe I did some debugging on my box and made some fixes …
Awesome, thank you for your thoroughness in explaining this and tackling it head-on. Is there a dev channel version I can test out?
Not yet. Will update here once it is ready.
Sorry to bug you; is there any dev build number yet? I wasn't sure how often releases run.
Please try 1.19.0-dev.20240621-69d522f4e9.
@fs-eire I'm getting one new error:
I pushed the code changes to my repo and fixed the call to the tokenizer. To reproduce, just type one letter in the Chrome extension's text input and wait.
Hey, I also need this. I am struggling with importing this version. So far I have been importing ONNX using …
Just replace …
This may be a problem with transformers.js. Could you try whether this problem happens in a normal page? If so, please report the issue to transformers.js. If it's only happening in a service worker, I can take a closer look.
@fs-eire I can verify that using

```js
import * as ONNX_WEBGPU from "onnxruntime-web/webgpu";

// any Blob that contains a valid ORT model would work
// I'm using Xenova/multilingual-e5-small/onnx/model_quantized.with_runtime_opt.ort
const buffer = await mlModel.blob.arrayBuffer();

const sessionwebGpu = await ONNX_WEBGPU.InferenceSession.create(buffer, {
  executionProviders: ["webgpu"],
});

console.log("Loading embedding model using sessionwebGpu", sessionwebGpu);
```

results in a successful execution, yay! 💯 :)

I think we can ignore the warning (printed as an error), as the session loads.

WebAssembly would work in a service worker too. Just because service workers are limited in their ability to load external resources such as WASM runtime files … Passing down a … It's even much simpler for non-Web-Extension use cases, as you simply only use the …

I'm aware that the current implementation hard-codes a few things, like:
```js
import ortWasmRuntime from "onnxruntime-web/dist/ort-wasm-simd-threaded";
```

The runtime exports a default runtime function:
```js
Module["instantiateWasm"] = async (imports, onSuccess) => {
  let result;
  if (WebAssembly.instantiateStreaming) {
    result = await WebAssembly.instantiateStreaming(Module["wasmModule"], imports);
  } else {
    result = await WebAssembly.instantiate(Module["wasmModule"], imports);
  }
  return onSuccess(result.instance, result.module);
};
```

Of course, we don't want it that way, but I mention it as this is the "documented way".
The options passed down could then conditionally merge in the custom loader:

```js
{
  numThreads,
  // just conditionally merge in:
  instantiateWasm: ONNX_WASM.env.wasm.instantiateWasm
}
```
```js
import * as ONNX_WASM from "onnxruntime-web/wasm";

// the difference is that this will be bundled in by the user-land bundler,
// while the conditional dynamic import that happens in the ONNX runtime would not:
// the ternary operator here: https://github.com/microsoft/onnxruntime/blob/83e0c6b96e77634dd648e890cead598b6e065cde/js/web/lib/wasm/wasm-utils-import.ts#L157
// and all its following code cannot be statically analyzed by bundlers; tree-shaking and inlining cannot happen,
// so the bundler is forced to generate dynamic import() code.
// This could also lead to downstream issues with the transformers.js package and other package/bundler combinations,
// while this is explicit and inlined
import ortWasmRuntime from "onnxruntime-web/dist/ort-wasm-simd-threaded";

// could maybe be passed a Blob via https://emscripten.org/docs/api_reference/module.html#Module.mainScriptUrlOrBlob
ONNX_WASM.env.wasm.proxy = false;

// instead of always calling importWasmModule() in wasm-factory.ts, allow passing down the callback of the Emscripten JS runtime
ONNX_WASM.env.wasm.wasmRuntime = ortWasmRuntime;

// allow setting a custom Emscripten loader as well
ONNX_WASM.env.wasm.instantiateWasm = async (imports, onSuccess) => {
  let result;
  if (WebAssembly.instantiateStreaming) {
    // please note that wasmRuntimeBlob comes from user-land code. It may be passed via a MessageChannel.
    // instantiateStreaming() expects a Response, so the Blob is wrapped in one.
    result = await WebAssembly.instantiateStreaming(
      new Response(wasmRuntimeBlob, { headers: { "Content-Type": "application/wasm" } }),
      imports,
    );
  } else {
    // please note that wasmRuntimeBlob comes from user-land code. It may be passed via a MessageChannel
    result = await WebAssembly.instantiate(await wasmRuntimeBlob.arrayBuffer(), imports);
  }
  return onSuccess(result.instance, result.module);
};

// then continuing as usual
// please note that mlModel comes from user-land code. It may have been passed via a MessageChannel
const modelBuffer = await mlModel.blob.arrayBuffer();
const sessionWasm = await ONNX_WASM.InferenceSession.create(modelBuffer, {
  executionProviders: ["wasm"],
});
console.log("Loading embedding model using sessionWasm", sessionWasm);
```

So with a 1 LoC change here (using the passed-down runtime callback), and a 1 LoC change here (add the …), …

Currently, when I call the WASM implementation:

```js
import * as ONNX_WASM from "onnxruntime-web/wasm";

const sessionWasm = await ONNX_WASM.InferenceSession.create(buffer, {
  executionProviders: ["wasm"],
});
console.log("Loading embedding model using sessionWasm", sessionWasm);
```

Thank you for your help!
I can confirm WebGPU is working for my little Chrome extension app as well, but I'm having a problem disabling the warning.
You can numb it using a brittle monkey patch...

```js
// store the original reference
const originalConsole = self.console;

// override the function reference with a new arrow function that does nothing
self.console.error = () => {};

// the runtime will internally call the function that does nothing...
const sessionwebGpu = await ONNX_WEBGPU.InferenceSession.create(buffer, {
  executionProviders: ["webgpu"],
});

// still works; we only replaced the reference for the .error() function
console.log("Loading embedding model using sessionwebGpu", sessionwebGpu);

// restore the original function reference, so that console.error() works just as before
self.console.error = originalConsole.error;
```

But I agree... it should probably be a …
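For completeness, ORT Web also exposes log-severity settings; a minimal sketch of those knobs, with the caveat that, per the follow-up below, they did not silence this particular message:

```js
import * as ONNX_WEBGPU from "onnxruntime-web/webgpu";

// global runtime log level: 'verbose' | 'info' | 'warning' | 'error' | 'fatal'
ONNX_WEBGPU.env.logLevel = "fatal";

// per-session severity: 0=verbose, 1=info, 2=warning, 3=error, 4=fatal
const session = await ONNX_WEBGPU.InferenceSession.create(buffer, {
  executionProviders: ["webgpu"],
  logSeverityLevel: 4,
});
```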
Thank you so much!!!!! The whole time I was trying to change the ORT log severity; now it's fast and beautiful!!!!
@ChTiSh You're welcome 🫶 Always happy to help :)
Did data structures change for the …
### Describe the issue
I'm running into issues trying to use the WebGPU or WASM backends inside a service worker (in a Chrome extension). More specifically, I'm attempting to use Phi-3 with transformers.js v3.

Every time I attempt this, I get the following error:
This is originating in the `InferenceSession` class in `js/common/lib/inference-session-impl.ts`. More specifically, it's happening in this call:

```js
const [backend, optionsWithValidatedEPs] = await resolveBackendAndExecutionProviders(options);
```

where the implementation is in `js/common/lib/backend-impl.ts` and `tryResolveAndInitializeBackend` fails to initialize any of the execution providers.

WebGPU is now supported in service workers, though; it is a recent change and it should be feasible. Here were the Chrome release notes.
Additionally, here is an example browser extension from the mlc-ai/web-llm framework that implements WebGPU usage in service workers successfully:
https://github.com/mlc-ai/web-llm/tree/main/examples/chrome-extension-webgpu-service-worker
Here is some further discussion on this new support from Google itself:
https://groups.google.com/a/chromium.org/g/chromium-extensions/c/ZEcSLsjCw84/m/WkQa5LAHAQAJ
So technically I think it should be possible for this to be supported now, unless I'm doing something else glaringly wrong. Is it possible to add support for this?
### To reproduce
Download and set up the transformers.js extension example and put this into the background.js file:
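The snippet itself was not captured here; a hypothetical sketch of the kind of background.js described, assuming the transformers.js v3 branch package and a Phi-3 ONNX checkpoint (model id and options are assumptions):

```js
// background.js (module service worker of the extension)
import { pipeline } from "@xenova/transformers";

self.addEventListener("message", async (event) => {
  // hypothetical model id; any Phi-3 ONNX checkpoint would do
  const generator = await pipeline(
    "text-generation",
    "Xenova/Phi-3-mini-4k-instruct",
    { device: "webgpu" },
  );
  const output = await generator(event.data, { max_new_tokens: 32 });
  console.log(output);
});
```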
### Urgency
This would help enable a new ecosystem to build up around locally intelligent browser extensions and tooling.

It's urgent for me because it would be fun to build, and I want to build it, and it would be fun to be building it rather than not be building it.
### ONNX Runtime Installation
Built from Source

### ONNX Runtime Version or Commit ID
1.19.0-dev.20240509-69cfcba38a

### Execution Provider
'webgpu' (WebGPU)