[Web] WebGPU issues tracking #15796

fs-eire · 2023-05-03T20:39:54Z

This issue is for tracking WebGPU related problems. WebGPU EP is available since ONNX Runtime Web v1.15.0 as experimental feature. We are working on improving stability, operator coverage and performance.

For a list of supported/WIP operators, comments or any operator specific issues: #15952

Can not consume

Q: How to build?
A: Building ort-web with webgpu support from source: please refer to this gist

Q: [Web] An error occurred during model execution: "TypeError: Cannot read properties of undefined (reading 'apply')".
A: #15780 <--- this PR fixed it

Q: no available backend found. ERR: ...
A: Need to make sure webgpu is available in the current context. Upgrade to latest Chrome or Edge (v113), and served in a secured location ( https or localhost )

Runtime failures

Q: Non-zero status code returned while running Transpose node. ....
A: #15819 <--- This PR should fix it

Q: crash in the transpose optimizer for various models (#15869: cannot load model https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/onnx/vae_encoder)
A: issue being investigated - see the PR for detailed info

Kernel coverage or running slow

Q: General investigation tips?
A: a few tools that can be used to taking deeper look at it: ( don't do them together, it will generate too many logs )

env.logLevel = 'verbose'; env.debug = true; - This will let onnxruntime-web to output some logs helpful for analysing the execution. including telling which operators are running on webgpu and which are on CPU (fallback). to improve performance caused by fallback we need to improve the operator coverage. I can help to implement the missing ops.
env.webgpu.profilingMode = 'default'; - This will output quite a lot of logs into console for each webgpu shaders - by aggregating and analyzing those we can know which shader is slow. Need to launch chrome/edge with flag --disable-dawn-features=disallow_unsafe_apis.
set sessionOptions.enableProfiling = true when creating inference session. This shows which operator running on GPU, which fallback to CPU.

Q: running slow on image classification model. (logs)
A: jsepCopyGpuToCpu occurred 114 times, which indicating frequent CPU <--> GPU data transfer. Adding implementation of the missing operators may improve performance.

The text was updated successfully, but these errors were encountered:

xenova · 2023-05-03T21:38:36Z

For now, need chrome/edge canary launched with flag --enable-unsafe-webgpu, and served in a secured location ( https or localhost )

WebGPU is now supported in the latest version of the official Chrome build (no longer only Canary, and not locked behind the flag). That said, I do not know about support for other browsers.

xenova · 2023-05-03T22:29:54Z

Q: Non-zero status code returned while running Transpose node. ....
A: investigating. @xenova could you share the model and corresponding input?

Here are the model files.

Here is some example input (for the encoder):

let input = {
    attention_mask: new Tensor(
        'int64',
        new BigInt64Array([1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n]),
        [1, 12]
    ),
    input_ids: new Tensor(
        'int64',
        new BigInt64Array([13959n, 1566n, 12n, 2379n, 10n, 8774n, 6n, 149n, 33n, 25n, 58n, 1n]),
        [1, 12]
    )
}

Note: These are the same as I mentioned in the original issue: #15719 (comment)

nagadomi · 2023-05-03T23:45:57Z

Q: Non-zero status code returned while running Transpose node. ....

I got the similar error.

23-05-04 08:10:11.420400 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
i @ ort-wasm.js:49
23-05-04 08:10:11.427000 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Transpose node. Name:'/Transpose' Status Message: Failed to run JSEP kernel
Non-zero status code returned while running Transpose node. Name:'/Transpose' Status Message: Failed to run JSEP kernel
i @ ort-wasm.js:49
caught (in promise) Error: failed to call OrtRun(). error code = 1.
    at t.run (wasm-core-impl.ts:282:15)
    at async t.OnnxruntimeWebAssemblySessionHandler.run (session-handler.ts:81:11)
    at async d.run (inference-session-impl.js:91:15)
    at async Object.padding (script.js:498:19)
    at async Object.tiled_render (script.js:397:17)
    at async process (script.js:617:9)
    at async HTMLInputElement.<anonymous> (script.js:718:13)

The model(Object.padding) is very simple reflection padding.

PyTorch module
https://github.com/nagadomi/nunif/blob/6b605f8687b0ff5439f4c3776fd794f104cc2e15/nunif/models/onnx_helper_models.py#L12-L38
javascript caller
https://github.com/nagadomi/nunif/blob/6b605f8687b0ff5439f4c3776fd794f104cc2e15/waifu2x/unlimited_waifu2x/public_html/script.js#L483-L494
onnx file
pad.zip

fs-eire · 2023-05-05T06:07:37Z

The Transpose issue is figured out and a fix is in PR: #15819.

However, a model with a Transpose node requiring uint8 may not be running with a good performance. WGSL does not support uint8 (spec), so it's hard to get to hardware accelerated from int8 quantized model. So far we only support f32. f16 may be taken into consideration, but since Float16Array is not ready, it's hard to get it work E2E.

DK013 · 2023-05-08T12:57:48Z

@fs-eire with the latest build (following your gist: here) with the following code with the same model and inputs as @xenova :

<script src="./dist/ort.webgpu.min.js"></script>
<script>
    document.addEventListener('DOMContentLoaded', async () => {

     // Load model
     let url = 'https://huggingface.co/Xenova/bert-base-cased_web/resolve/main/onnx/model_quantized.onnx'
     let model = await fetch(url);
     let buffer = await model.arrayBuffer();
     let array = new Uint8Array(buffer);

     // Create a new session
     ort.env.wasm.simd = false;
     ort.env.wasm.numThreads = 1;
     let session = await ort.InferenceSession.create(array)
      ...
   })
</script>

results in the following error at line ort.InferenceSession.create :

Uncaught (in promise) Error: no available backend found. ERR: [wasm] RuntimeError: Aborted(TypeError: WebAssembly.instantiate(): Import #0 module="a" error: module is not an object or function), [cpu] Error: previous call to 'initializeWebAssembly()' failed., [xnnpack] Error: previous call to 'initializeWebAssembly()' failed., [webgpu] Error: previous call to 'initializeWebAssembly()' failed.

If I provide only 'webgpu' as executionProviders, the error is as follows:

Uncaught (in promise) Error: no available backend found. ERR: [wasm] RuntimeError: Aborted(TypeError: WebAssembly.instantiate(): Import #0 module="a" error: module is not an object or function)

which is the same as before.

If I use <script src="./dist/ort.js"></script> instead of ort.webgpu.min.js and use webgpu as executionProviders, the error is: Uncaught (in promise) Error: no available backend found.

fs-eire · 2023-05-08T17:09:38Z

@DK013 The error message looks like the corresponding .wasm file is not served. please check from the devtool/network tab, if any 404 error occurs on *.wasm

DK013 · 2023-05-08T19:46:00Z

@DK013 The error message looks like the corresponding .wasm file is not served. please check from the devtool/network tab, if any 404 error occurs on *.wasm

Doesn't look like it. However I was under the impression it's gonna load the jsep wasm files which exist in dist directory after build (I double checked). Seems like that's not the case.

fs-eire · 2023-05-08T20:24:10Z

I think I know the reason. so it is because of this line
ort.env.wasm.simd = false;

I don't provide a non-simd version of webgpu wasm file because I assume every environment that supports webgpu should always also support wasm fixed-size SIMD

so remove the line should make it work.

I think I can add a warning message if simd is off and webgpu is request.

xenova · 2023-05-13T01:40:12Z

Is there a list of supported WebGPU ops, as well as those planned to be implemented?

visheratin · 2023-05-13T02:18:32Z

I think I can add a warning message if simd is off and webgpu is request.

@fs-eire Maybe along with the warning, you could ignore the ort.env.wasm.simd or reset it to true? Because warnings are often missed/ignored and the execution will still fail.

fs-eire · 2023-05-13T21:55:32Z

Is there a list of supported WebGPU ops, as well as those planned to be implemented?

Let me update to the summary.

fs-eire · 2023-05-13T21:56:36Z

I think I can add a warning message if simd is off and webgpu is request.

@fs-eire Maybe along with the warning, you could ignore the ort.env.wasm.simd or reset it to true? Because warnings are often missed/ignored and the execution will still fail.

I will fail the initialization. see this PR: #15924

xenova · 2023-05-17T21:42:44Z

@fs-eire Are the versions released under the dev tag (e.g., https://www.npmjs.com/package/onnxruntime-web/v/1.16.0-dev.20230508-045c623415) built automatically from the main branch? This will mean I don't have to build the files myself for testing.

fs-eire · 2023-05-17T21:52:04Z

@fs-eire Are the versions released under the dev tag (e.g., https://www.npmjs.com/package/onnxruntime-web/v/1.16.0-dev.20230508-045c623415) built automatically from the main branch? This will mean I don't have to build the files myself for testing.

They are, but it seems that the release pipeline is not doing perfectly. We are reworking the release pipeline for nightly builds recently. Before our work is done, you can use this link to download latest artifacts from our public CI. hope this helps save your time.

xenova · 2023-05-17T21:59:03Z

~~Okay great! Can you perhaps just show where I can download the final builds? I'm not too familiar with the Azure DevOps UI.~~ Nevermind, found it

xenova · 2023-05-17T23:01:46Z

So, I got the imports working (for this build), but I'm getting a lot of errors when running a simple text-classification model:

input:

{
    attention_mask: Tensor {
      type: 'int64',
      data: [1n, 1n, 1n],
      dims: [1,3],
    },
    input_ids: Tensor {
      type: 'int64',
      data: [101n, 3231n, 102n],
      dims: [1,3],
    },
}

### Description because of #15618 , the default allocator changed to device allocator, which will be GPU instead of CPU. in transpose optimizer we expect to read data from initializers so a CPU allocator is required here. this change fixes transpose optimizer on GPU EP Fixes the issue referred to in #15869, #15796

### Description fix buffer size when download. buffer size should always be padded to multiple of 4. resolved issue described in #15796 > ![Image](https://user-images.githubusercontent.com/26504141/239093785-9417dffc-6f00-47b2-956d-402b43bdb0a9.png)

gyagp · 2023-10-27T17:07:47Z

@mrdomino, to make WebGPU work, you may just need to import ort.webgpu.min.js.
I have some sample code at https://github.com/webatintel/ort-toolkit/blob/main/index.html#L109, and you may try a live demo at https://webatintel.github.io/ort-toolkit/?tasks=performance&ep=webgpu&modelName=mobilenetv2-12&modelUrl=hf&enableReadback=true

gabrielgrant · 2023-10-27T18:00:46Z

@gyagp thanks for the pointers!

Your demo does run, but seems to throw some errors, so it's a little unclear what's being run on GPU vs CPU:

ort-wasm-simd.jsep.js:54 2023-10-27 10:51:47.156700 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
j @ ort-wasm-simd.jsep.js:54
ort-wasm-simd.jsep.js:54 2023-10-27 10:51:47.162400 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

I think the suggestion to "rerun with verbose output on a non-minimal build" to show node assignments requires you to re-build, right? Is that something you're able to do?

Thanks again!

mrdomino · 2023-10-27T20:24:33Z

@mrdomino, to make WebGPU work, you may just need to import ort.webgpu.min.js. I have some sample code at https://github.com/webatintel/ort-toolkit/blob/main/index.html#L109, and you may try a live demo at https://webatintel.github.io/ort-toolkit/?tasks=performance&ep=webgpu&modelName=mobilenetv2-12&modelUrl=hf&enableReadback=true

Thanks for the link! I'm now exploring further, and the behavior I'm seeing in Firefox (which I just installed) is different from the behavior I was seeing with Chrome on my Android phone. I'm going to focus on Firefox for now, as it's harder to debug on the phone.

First of all, it turns out that the import is not the issue after all — if I just pass ["wasm"] as the executionProvider, it works whether I import onnxruntime-web or onnxruntime-web/webgpu (which is just an alias for ort.webgpu.min.js IIUC).

If I just pass ["webgpu"] on Firefox or Safari, I get no available backend found. ERR: (with nothing after ERR.) This seems to be coming from here, but it's surprising that there is no error message propagated up.

If I pass either ["webgpu", "wasm"], or ["wasm", "webgpu"], on Chrome, it works. (I cannot tell which execution provider it chose though.) But if I do so on either Firefox or Safari, I get: i.Ea is not a function with a reference into the minified ort.webgpu.min.js source.

I have thus far been working around this by manually testing navigator.gpu.requestAdapter() and setting the backend based on that, but I'm confused as to why that is not happening automatically.

mrdomino · 2023-10-27T20:51:22Z

Okay, wow, this is getting complicated.

I just checked again with Chrome on Android, and even with my navigator.gpu.requestAdapter() workaround, I still get WebGpuBackend: Failed to get GPU adapter., which appears to be coming from here. The only way to work around this is to make not just the executionProvider, but the file import, conditional on the result of requestAdapter. I'm not sure, but I bet your code has this problem too on Chrome on Android @gyagp: when I load the page there, the test results are never filled in with anything.

So to recap:

Recent desktop Chrome: everything fine.
Firefox/Safari on desktop: 'wasm' works, 'webgpu' does not work, and passing an array of backends causes things to break in a strange way, regardless of whether onnxruntime-web or onnxruntime-web/webgpu.
Chrome on Android x onnxruntime-web: wasm works, webgpu presumably doesn't work (haven't tested since it seems pointless to)
Chrome on Android x onnxruntime-web/webgpu: webgpu does not work, wasm does not work, and in both cases the error thrown is different from either of the Firefox/Safari errors.

gyagp · 2023-10-28T01:50:39Z

@gyagp thanks for the pointers!

Your demo does run, but seems to throw some errors, so it's a little unclear what's being run on GPU vs CPU:
ort-wasm-simd.jsep.js:54 2023-10-27 10:51:47.156700 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
j @ ort-wasm-simd.jsep.js:54
ort-wasm-simd.jsep.js:54 2023-10-27 10:51:47.162400 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
I think the suggestion to "rerun with verbose output on a non-minimal build" to show node assignments requires you to re-build, right? Is that something you're able to do?

Thanks again!

@gabrielgrant, these errors could be ignored now. If we use webgpu as ep, if some ops are not supported by WebGPU, they would automatically fall back to wasm. My script has a ortProfiling task (Whole url is https://webatintel.github.io/ort-toolkit/?tasks=ortProfiling&ep=webgpu&modelName=mobilenetv2-12&modelUrl=hf&enableReadback=true), and it will show you where each op is running on (JsExecutionProvider means WebGPU, while CPUExecutionProvider means wasm).
We're still working hard on the WebGPU backend, including many missing op supports. You're always welcome to raise your requirement so that we can prioritize the effort.

gyagp · 2023-10-28T01:54:15Z

Okay, wow, this is getting complicated.

I just checked again with Chrome on Android, and even with my navigator.gpu.requestAdapter() workaround, I still get WebGpuBackend: Failed to get GPU adapter., which appears to be coming from here. The only way to work around this is to make not just the executionProvider, but the file import, conditional on the result of requestAdapter. I'm not sure, but I bet your code has this problem too on Chrome on Android @gyagp: when I load the page there, the test results are never filled in with anything.

So to recap:

Recent desktop Chrome: everything fine.

Firefox/Safari on desktop: 'wasm' works, 'webgpu' does not work, and passing an array of backends causes things to break in a strange way, regardless of whether onnxruntime-web or onnxruntime-web/webgpu.

Chrome on Android x onnxruntime-web: wasm works, webgpu presumably doesn't work (haven't tested since it seems pointless to)

Chrome on Android x onnxruntime-web/webgpu: webgpu does not work, wasm does not work, and in both cases the error thrown is different from either of the Firefox/Safari errors.

@mrdomino I'm not sure about the exact status of Firefox and Safari, but for Chrome, WebGPU was only officially supported on Windows, macOS and ChromeOS (since M113). Android support is still behind the flag "--enable-unsafe-webgpu". Fortunately the its status is very good now, as Google just sent out "intent to ship" in Chrome (https://chromestatus.com/feature/5119617865613312), and planned to ship it in M121 (Jan 23 is the release date). So before then, you still need to pass switch "--enable-unsafe-webgpu" to enable WebGPU on Android with latest Chrome (better to experiment with Chrome Canary).

gyagp · 2023-10-28T01:56:36Z

@mrdomino BTW, if you have interest to follow up the WebGPU status on Safari, here has some clue: gpuweb/gpuweb#4238

mrdomino · 2023-10-28T03:07:41Z

I'm actually not that worried about webgpu on Safari — using pure wasm as a fallback is acceptable for my use case, and actually works well enough on Firefox.

The things I'm concerned about are basically just:

Having to conditionally import one of two different headers depending on navigator.gpu is painful, particularly with TypeScript — it'd be nice if one header worked in both situations.
The inconsistent fallback / failure behavior is surprising, and seems like it's unintentional.

gyagp · 2023-10-28T13:52:18Z

I'm actually not that worried about webgpu on Safari — using pure wasm as a fallback is acceptable for my use case, and actually works well enough on Firefox.

The things I'm concerned about are basically just:

Having to conditionally import one of two different headers depending on navigator.gpu is painful, particularly with TypeScript — it'd be nice if one header worked in both situations.

You only need to import ort.webgpu.min.js. Then if WebGPU is supported, use webgpu as ep; otherwise, change ep to wasm. Pseudo code:
if (webgpuSupported) {
ep = 'webgpu';
} else {
ep = 'wasm';
}
const option = {
executionProviders: [
{
name: ep,
}
]
};
const session = await createSession(option);

The inconsistent fallback / failure behavior is surprising, and seems like it's unintentional.

If WebGPU is supported, the fallback to wasm is either a limitation (some ops, including dataType variants, are not implemented by WebGPU) or an optimization (ORT has heuristic to use wasm over WebGPU for better performance). We will continue to improve the framework, including the profiling mechanism, so that it's easier to differentiate these two. You're always welcome to report perf issue when in doubt.

mrdomino · 2023-10-28T15:45:12Z

You only need to import ort.webgpu.min.js. Then if WebGPU is supported, use webgpu as ep; otherwise, change ep to wasm.

No, that is not the case in Chrome on Android. That works on Chrome on Desktop, and on Firefox and Safari on desktop, but not on Android. On Android, no matter what ep is, importing ort.webgpu.min.js causes a crash.

And by point 2, I was referring to the strange error message thrown on Firefox/Safari (i.Ea is not a function) when I pass an array of backend hints.

mrdomino · 2023-10-28T15:52:29Z

The specific code I am using to decide which backend to use is:

const backend = await (async () => {
  if (!navigator.gpu) return 'wasm'
  const adapter = await navigator.gpu.requestAdapter()
  if (!adapter) return 'wasm'
  return 'webgpu'
})()

That backend is then passed in to InferenceSession.create (as { executionProviders: [backend], ...}). On Android Chrome, depending on which javascript file is imported, that either crashes with "WebGpuBackend: Failed to get GPU adapter." or works.

I will try to get a code sandbox up with a minimal example.

mrdomino · 2023-10-28T16:53:26Z

Here: https://ort-test.vercel.app/

The only difference between the WASM and WebGPU pages is which file is imported. Both are using "wasm" as the ep. On Chrome Android, WASM says "Everything worked" and WebGPU says "Failed during InferenceSession.create" with the WebGpuBackend error message.

https://bitbucket.org/mrdomino/ort-test/src/main/app/ort/page.tsx
https://bitbucket.org/mrdomino/ort-test/src/main/app/ort-webgpu/page.tsx

mrdomino · 2023-10-28T17:18:16Z

Curiously, Chrome on Android exposes a navigator.gpu. It just doesn't produce an adapter if you request one. Firefox and Safari both do not have a navigator.gpu at all. (I added text to the test app's main page to distinguish this.)

Is it possible that somewhere in the code there is a simple check for the presence of navigator.gpu to decide to use WebGPU?

mrdomino · 2023-10-28T17:30:30Z

It looks like registerBackend has such a check:

onnxruntime/js/web/lib/index.ts

Lines 18 to 23 in 24f9c1a

    
           if (!BUILD_DEFS.DISABLE_WASM) { 
        
             const wasmBackend = BUILD_DEFS.DISABLE_TRAINING ? require('./backend-wasm-inference').wasmBackend : 
        
                                                               require('./backend-wasm-training').wasmBackend; 
        
             if (!BUILD_DEFS.DISABLE_WEBGPU && typeof navigator !== 'undefined' && navigator.gpu) { 
        
               registerBackend('webgpu', wasmBackend, 5); 
        
             }

So there is a difference between the good cases of Firefox/Safari and the bad case of Android Chrome: the former do not have the backend registered, while the latter does.

Still, the logic in resolveBackend really looks like it should be handling the error, and from the observed behavior, it is not. So a different code path must be getting taken that is trying to initialize a WebGpuBackend.

Ah, and indeed, here we go:

onnxruntime/js/web/lib/wasm/jsep/init.ts

Lines 133 to 140 in 24f9c1a

    
           export const init = async(module: OrtWasmModule, env: Env): Promise<void> => { 
        
             const init = module.jsepInit; 
        
             if (init && navigator.gpu) { 
        
               if (!env.wasm.simd) { 
        
                 throw new Error( 
        
                     'Not supported for WebGPU=ON and SIMD=OFF. Please set `env.wasm.simd` to true when using WebGPU EP'); 
        
               } 
        
               const backend = new WebGpuBackend();

I think I can probably submit a PR to fix that.

Just testing for the presence of navigator.gpu is not sufficient to establish WebGPU support: in particular, at time of writing, Chrome on Android exposes a navigator.gpu but does not return anything from requestAdapter. Context: microsoft#15796 (comment)

gyagp · 2023-10-29T16:02:37Z

Thanks for the PR, @fs-eire and @guschmue, any comments on this?
@mrdomino It's also easy for you to work around this at your side like below:
async function getEp() {
if (!navigator.gpu) {
return 'wasm';
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
delete Navigator.prototype.gpu;
return 'wasm';
}
return 'webgpu';
}

mrdomino · 2023-10-29T16:28:54Z

Hahaha, I like it.

One thought I had is it probably makes sense to also check the adapter at the registerBackend call site, and in that case, maybe it makes sense for hasGpu to be in the env?

mrdomino · 2024-02-03T15:46:25Z

FYI, using onnxruntime-web 1.17.0, executionProviders: ['webgpu', 'wasm'] still does not work on Firefox — it dies with the same "[minified-name] is not a function" as before.

alba-saco · 2024-03-12T19:36:47Z

Hi there,

I'm getting this error when I set executionProviders=['webgpu'] (I am running on Chrome via https):
audioProcessor.js:2 Error during inference: Error: no available backend found. ERR: [webgpu] RuntimeError: null function or function signature mismatch

When I remove ort.env.wasm.simd = false I get the following error:
Error during inference: TypeError: Cannot read properties of undefined (reading 'apply')
I am aware of this issue and to confirm, I am importing from onnxruntime-web/webgpu

When I run my code with executionProviders=['wasm'] everything executes perfectly.

I wasn't sure if I should create a new issue or just put a comment here

fs-eire · 2024-03-12T22:52:58Z

Hi there,

I'm getting this error when I set executionProviders=['webgpu'] (I am running on Chrome via https): audioProcessor.js:2 Error during inference: Error: no available backend found. ERR: [webgpu] RuntimeError: null function or function signature mismatch

When I remove ort.env.wasm.simd = false I get the following error: Error during inference: TypeError: Cannot read properties of undefined (reading 'apply') I am aware of this issue and to confirm, I am importing from onnxruntime-web/webgpu

When I run my code with executionProviders=['wasm'] everything executes perfectly.

I wasn't sure if I should create a new issue or just put a comment here

This looks like you may import from onnxruntime-web not onnxruntime-web/webgpu , see https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/importing_onnxruntime-web#conditional-importing

alba-saco · 2024-03-13T17:57:12Z

Thank you for your prompt response. I get
Error during inference: Error: no available backend found. ERR:
when importing from onnxruntime-web. I am running on the latest version of chrome served with https and have confirmed that webGPU is available in my environment

### Description This PR rewrite the backend resolve logic to support specifying multiple EPs. #### Backend The first version of ONNX Runtime Web actually carried some existing code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes the "backend" concept. The original "backend" in ONNX.js is designed in a way assuming there is only one backend from user's backend hint list will be used. For example, in ONNX.js, if user specify a backend hint as `['webgl', 'wasm']`, ONNX.js will first try to use WebGL backend - if it loads successfully (the browser supports webgl), then "webgl" backend will be used and "wasm" will be ignored; otherwise, "webgl" will be ignored and try to load "wasm" backend. In short: only one backend will be used when initializing a session. #### Execution Provider Execution Provider, or EP, in ONNX Runtime is a different concept. One of the differences is that users are allow to specify multiple EPs, and if one does not support a particular kernel, it can fallback to other EP. This is a very common case when using a GPU EP in ONNX Runtime. #### Current Status: Backend v.s. EP Because of the history reasons mentioned above, the current status is quite confusing. There are **real backend**s, which means it's different implementation in code; and there are **backend hint**s, which are used as string names for backend hint; and there are **EP**s of the ONNX Runtime concepts. currently there are only 2 **backend**s in our code base: The "onnxjs backend", and the "wasm backend". The "onnxjs backend" currently only powers backend hint "webgl", which go into the old onnx.js code path. All other backend hints including "wasm", "cpu"(alias to wasm), "webgpu" and "webnn" are all powered by "wasm backend". And because ORT Web treat "backend" as an internal concept and want to align with ONNX Runtime, so those names of backend hints are becoming EP names. The following table shows today's status: | Execution Provider Name (public) / Backend Hint (internal) | Backend | EP in ORT | -------- | ------- | ------- | | "wasm"/"cpu" | WasmBackend | CPU EP | "webgl" | OnnxjsBackend | \* technically not an EP | "webgpu" | WasmBackend | JSEP | "webnn" | WasmBackend | WebNN EP #### Problem While the API allows to specify multiple EPs, the backend resolving only allows one backend. This causes issues when user specify multiple EP names in session options, the backend resolve behavior and EP registration behavior is inconsistent. Specifically, in this issue: #15796 (comment): EP list `['webgpu', 'wasm']` on a browser without WebGPU support resolves to 'wasm' backend, but the full EP list is passed in session options, so JSEP is still enabled, causing the runtime error. #### Solution Since we still need WebGL backend, we cannot totally remove the backend register/resolve system. In this PR I made the following changes: - initialize every backend from the EP list, instead of only do that for the first successful one. - for the first resolved backend, filter all EP using the exact same backend. Remove all EPs not using this backend from session options - for every explicitly specified EP, if it's removed, show a warning message in console

fs-eire · 2024-03-16T00:28:29Z

Thank you for your prompt response. I get Error during inference: Error: no available backend found. ERR: when importing from onnxruntime-web. I am running on the latest version of chrome served with https and have confirmed that webGPU is available in my environment

Yes, as I explained, if you want to use WebGPU, you need to import onnxruntime-web/webgpu. Otherwise it does not work.

### Description This PR rewrite the backend resolve logic to support specifying multiple EPs. #### Backend The first version of ONNX Runtime Web actually carried some existing code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes the "backend" concept. The original "backend" in ONNX.js is designed in a way assuming there is only one backend from user's backend hint list will be used. For example, in ONNX.js, if user specify a backend hint as `['webgl', 'wasm']`, ONNX.js will first try to use WebGL backend - if it loads successfully (the browser supports webgl), then "webgl" backend will be used and "wasm" will be ignored; otherwise, "webgl" will be ignored and try to load "wasm" backend. In short: only one backend will be used when initializing a session. #### Execution Provider Execution Provider, or EP, in ONNX Runtime is a different concept. One of the differences is that users are allow to specify multiple EPs, and if one does not support a particular kernel, it can fallback to other EP. This is a very common case when using a GPU EP in ONNX Runtime. #### Current Status: Backend v.s. EP Because of the history reasons mentioned above, the current status is quite confusing. There are **real backend**s, which means it's different implementation in code; and there are **backend hint**s, which are used as string names for backend hint; and there are **EP**s of the ONNX Runtime concepts. currently there are only 2 **backend**s in our code base: The "onnxjs backend", and the "wasm backend". The "onnxjs backend" currently only powers backend hint "webgl", which go into the old onnx.js code path. All other backend hints including "wasm", "cpu"(alias to wasm), "webgpu" and "webnn" are all powered by "wasm backend". And because ORT Web treat "backend" as an internal concept and want to align with ONNX Runtime, so those names of backend hints are becoming EP names. The following table shows today's status: | Execution Provider Name (public) / Backend Hint (internal) | Backend | EP in ORT | -------- | ------- | ------- | | "wasm"/"cpu" | WasmBackend | CPU EP | "webgl" | OnnxjsBackend | \* technically not an EP | "webgpu" | WasmBackend | JSEP | "webnn" | WasmBackend | WebNN EP #### Problem While the API allows to specify multiple EPs, the backend resolving only allows one backend. This causes issues when user specify multiple EP names in session options, the backend resolve behavior and EP registration behavior is inconsistent. Specifically, in this issue: #15796 (comment): EP list `['webgpu', 'wasm']` on a browser without WebGPU support resolves to 'wasm' backend, but the full EP list is passed in session options, so JSEP is still enabled, causing the runtime error. #### Solution Since we still need WebGL backend, we cannot totally remove the backend register/resolve system. In this PR I made the following changes: - initialize every backend from the EP list, instead of only do that for the first successful one. - for the first resolved backend, filter all EP using the exact same backend. Remove all EPs not using this backend from session options - for every explicitly specified EP, if it's removed, show a warning message in console

### Description fix buffer size when download. buffer size should always be padded to multiple of 4. resolved issue described in microsoft#15796 > ![Image](https://user-images.githubusercontent.com/26504141/239093785-9417dffc-6f00-47b2-956d-402b43bdb0a9.png)

guschmue · 2024-05-23T18:20:13Z

closing this one, gotten a bit stale.
For webgpu issues lets create new issues for each problem.

fs-eire added the platform:web issues related to ONNX Runtime web; typically submitted using template label May 3, 2023

fs-eire changed the title ~~[Web] editing ...~~ [Web] WebGPU issues tracking May 3, 2023

fs-eire mentioned this issue May 3, 2023

[Web] An error occurred during model execution: "TypeError: Cannot read properties of undefined (reading 'apply')". #15719

Closed

github-actions bot added the model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. label May 3, 2023

nagadomi mentioned this issue May 5, 2023

[unlimited:waifu2x] Multithreading is possible but not configured properly nagadomi/nunif#34

Open

3 tasks

nagadomi mentioned this issue May 6, 2023

[Web] TypeError: a.jsepAlloc is not a function #15832

Closed

tuna2134 mentioned this issue May 15, 2023

WebGPUに対応させる VOICEVOX/voicevox_core#491

Open

fs-eire self-assigned this May 15, 2023

fs-eire mentioned this issue May 15, 2023

[js/web] WebGPU backend #11695

Closed

This was referenced May 17, 2023

[js/webgpu] fix buffer size when download #15990

Merged

fix transpose optimizer on GPU EP #15988

Merged

kungfooman mentioned this issue May 18, 2023

[Bug] Percent X and transcripts get stuck with console error huggingface/transformers.js#66

Closed

mrdomino mentioned this issue Oct 28, 2023

Correct check for WebGPU support #18144

Closed

gabrielgrant mentioned this issue Nov 28, 2023

Whisper on webGPU? huggingface/transformers.js#100

Closed

fs-eire mentioned this issue Mar 1, 2024

[js/web] rewrite backend resolve to allow multiple EPs #19735

Merged

guschmue closed this as completed May 23, 2024

[Web] WebGPU issues tracking #15796

[Web] WebGPU issues tracking #15796

Comments

fs-eire commented May 3, 2023 • edited Loading

Can not consume

Runtime failures

Kernel coverage or running slow

xenova commented May 3, 2023

xenova commented May 3, 2023 • edited Loading

nagadomi commented May 3, 2023

fs-eire commented May 5, 2023 • edited Loading

DK013 commented May 8, 2023 • edited Loading

fs-eire commented May 8, 2023

DK013 commented May 8, 2023

fs-eire commented May 8, 2023

xenova commented May 13, 2023

visheratin commented May 13, 2023

fs-eire commented May 13, 2023

fs-eire commented May 13, 2023

xenova commented May 17, 2023 • edited Loading

fs-eire commented May 17, 2023

xenova commented May 17, 2023 • edited Loading

xenova commented May 17, 2023 • edited Loading

gyagp commented Oct 27, 2023

gabrielgrant commented Oct 27, 2023

mrdomino commented Oct 27, 2023 • edited Loading

mrdomino commented Oct 27, 2023

gyagp commented Oct 28, 2023 • edited Loading

gyagp commented Oct 28, 2023

gyagp commented Oct 28, 2023

mrdomino commented Oct 28, 2023

gyagp commented Oct 28, 2023 • edited Loading

mrdomino commented Oct 28, 2023

mrdomino commented Oct 28, 2023 • edited Loading

mrdomino commented Oct 28, 2023

mrdomino commented Oct 28, 2023

mrdomino commented Oct 28, 2023

gyagp commented Oct 29, 2023

mrdomino commented Oct 29, 2023 • edited Loading

mrdomino commented Feb 3, 2024 • edited Loading

alba-saco commented Mar 12, 2024 • edited Loading

fs-eire commented Mar 12, 2024

alba-saco commented Mar 13, 2024 • edited Loading

fs-eire commented Mar 16, 2024

guschmue commented May 23, 2024

fs-eire commented May 3, 2023 •

edited

Loading

xenova commented May 3, 2023 •

edited

Loading

fs-eire commented May 5, 2023 •

edited

Loading

DK013 commented May 8, 2023 •

edited

Loading

xenova commented May 17, 2023 •

edited

Loading

xenova commented May 17, 2023 •

edited

Loading

xenova commented May 17, 2023 •

edited

Loading

mrdomino commented Oct 27, 2023 •

edited

Loading

gyagp commented Oct 28, 2023 •

edited

Loading

gyagp commented Oct 28, 2023 •

edited

Loading

mrdomino commented Oct 28, 2023 •

edited

Loading

mrdomino commented Oct 29, 2023 •

edited

Loading

mrdomino commented Feb 3, 2024 •

edited

Loading

alba-saco commented Mar 12, 2024 •

edited

Loading

alba-saco commented Mar 13, 2024 •

edited

Loading