
[Web] WebGPU issues tracking #15796

Closed · fs-eire opened this issue May 3, 2023 · 46 comments

Labels: model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.) · platform:web (issues related to ONNX Runtime web; typically submitted using template)

Comments

@fs-eire
Contributor

fs-eire commented May 3, 2023

This issue tracks WebGPU-related problems. The WebGPU EP has been available since ONNX Runtime Web v1.15.0 as an experimental feature. We are working on improving stability, operator coverage, and performance.

For the list of supported/WIP operators, comments, or any operator-specific issues, see #15952.


Cannot consume

Q: How to build?
A: To build ort-web with WebGPU support from source, please refer to this gist.

Q: [Web] An error occurred during model execution: "TypeError: Cannot read properties of undefined (reading 'apply')".
A: #15780 <--- this PR fixed it

Q: no available backend found. ERR: ...
A: Make sure WebGPU is available in the current context. Upgrade to the latest Chrome or Edge (v113+), and serve the page from a secure context (https or localhost); see the detection sketch below.
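
For illustration, a minimal detection sketch (the same idea appears in later comments on this thread):

async function webgpuAvailable() {
    // Being able to obtain a WebGPU adapter is a stronger check than the mere
    // presence of navigator.gpu (e.g. Chrome on Android exposes navigator.gpu
    // but may not return an adapter).
    if (typeof navigator === 'undefined' || !navigator.gpu) return false;
    return (await navigator.gpu.requestAdapter()) !== null;
}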

Runtime failures

Q: Non-zero status code returned while running Transpose node. ....
A: #15819 <--- This PR should fix it

Q: Crash in the transpose optimizer for various models (#15869: cannot load model https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/onnx/vae_encoder)
A: Issue being investigated; see the PR for detailed info.

Kernel coverage or running slow

Q: General investigation tips?
A: A few tools can be used to take a deeper look (don't enable them all at once, or they will generate too many logs); see the sketch after this list:

  • env.logLevel = 'verbose'; env.debug = true; - This makes onnxruntime-web output logs helpful for analyzing the execution, including which operators run on WebGPU and which fall back to CPU. Performance loss caused by fallback is addressed by improving operator coverage; I can help implement the missing ops.
  • env.webgpu.profilingMode = 'default'; - This outputs a large number of console logs, one per WebGPU shader; aggregating and analyzing them shows which shaders are slow. Chrome/Edge must be launched with the flag --disable-dawn-features=disallow_unsafe_apis.
  • Set sessionOptions.enableProfiling = true when creating the inference session. This shows which operators run on GPU and which fall back to CPU.
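
For illustration, a minimal sketch of where each option goes (enable one at a time, per the note above; the model path is a placeholder, not from this thread):

import * as ort from 'onnxruntime-web/webgpu';

// Option 1: verbose execution logs, showing which operators run on WebGPU
// and which fall back to CPU.
ort.env.logLevel = 'verbose';
ort.env.debug = true;

// Option 2: per-shader profiling logs; requires launching Chrome/Edge with
// --disable-dawn-features=disallow_unsafe_apis.
// ort.env.webgpu.profilingMode = 'default';

const session = await ort.InferenceSession.create('model.onnx', {
    executionProviders: ['webgpu'],
    // Option 3: ORT profiling of operator/EP assignment.
    // enableProfiling: true,
});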

Q: Running slow on an image classification model. (logs)
A: jsepCopyGpuToCpu occurred 114 times, which indicates frequent CPU <-> GPU data transfer. Implementing the missing operators may improve performance.

@fs-eire fs-eire added the platform:web issues related to ONNX Runtime web; typically submitted using template label May 3, 2023
@fs-eire fs-eire changed the title [Web] editing ... [Web] WebGPU issues tracking May 3, 2023
@github-actions github-actions bot added the model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. label May 3, 2023
@xenova

xenova commented May 3, 2023

For now, need chrome/edge canary launched with flag --enable-unsafe-webgpu, and served in a secured location ( https or localhost )

WebGPU is now supported in the latest version of the official Chrome build (no longer only Canary, and not locked behind the flag). That said, I do not know about support for other browsers.

@xenova

xenova commented May 3, 2023

Q: Non-zero status code returned while running Transpose node. ....
A: investigating. @xenova could you share the model and corresponding input?

Here are the model files.

Here is some example input (for the encoder):

let input = {
    attention_mask: new Tensor(
        'int64',
        new BigInt64Array([1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n]),
        [1, 12]
    ),
    input_ids: new Tensor(
        'int64',
        new BigInt64Array([13959n, 1566n, 12n, 2379n, 10n, 8774n, 6n, 149n, 33n, 25n, 58n, 1n]),
        [1, 12]
    )
}

Note: These are the same as I mentioned in the original issue: #15719 (comment)

@nagadomi

nagadomi commented May 3, 2023

Q: Non-zero status code returned while running Transpose node. ....

I got a similar error.

23-05-04 08:10:11.420400 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
i @ ort-wasm.js:49
23-05-04 08:10:11.427000 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Transpose node. Name:'/Transpose' Status Message: Failed to run JSEP kernel
Non-zero status code returned while running Transpose node. Name:'/Transpose' Status Message: Failed to run JSEP kernel
i @ ort-wasm.js:49
caught (in promise) Error: failed to call OrtRun(). error code = 1.
    at t.run (wasm-core-impl.ts:282:15)
    at async t.OnnxruntimeWebAssemblySessionHandler.run (session-handler.ts:81:11)
    at async d.run (inference-session-impl.js:91:15)
    at async Object.padding (script.js:498:19)
    at async Object.tiled_render (script.js:397:17)
    at async process (script.js:617:9)
    at async HTMLInputElement.<anonymous> (script.js:718:13)

The model (Object.padding) is a very simple reflection padding.

PyTorch module
https://github.com/nagadomi/nunif/blob/6b605f8687b0ff5439f4c3776fd794f104cc2e15/nunif/models/onnx_helper_models.py#L12-L38
javascript caller
https://github.com/nagadomi/nunif/blob/6b605f8687b0ff5439f4c3776fd794f104cc2e15/waifu2x/unlimited_waifu2x/public_html/script.js#L483-L494
onnx file
pad.zip

@fs-eire
Contributor Author

fs-eire commented May 5, 2023

The Transpose issue has been figured out and a fix is in PR #15819.

However, a model with a Transpose node requiring uint8 may not run with good performance. WGSL does not support uint8 (spec), so it is hard to get hardware acceleration for an int8-quantized model. So far we only support f32. f16 may be taken into consideration, but since Float16Array is not ready, it is hard to get it working end-to-end.

@DK013

DK013 commented May 8, 2023

@fs-eire, with the latest build (following your gist: here), the following code with the same model and inputs as @xenova:

<script src="./dist/ort.webgpu.min.js"></script>
<script>
    document.addEventListener('DOMContentLoaded', async () => {

        // Load model
        let url = 'https://huggingface.co/Xenova/bert-base-cased_web/resolve/main/onnx/model_quantized.onnx'
        let model = await fetch(url);
        let buffer = await model.arrayBuffer();
        let array = new Uint8Array(buffer);

        // Create a new session
        ort.env.wasm.simd = false;
        ort.env.wasm.numThreads = 1;
        let session = await ort.InferenceSession.create(array)
        ...
    })
</script>

results in the following error at the ort.InferenceSession.create line:

Uncaught (in promise) Error: no available backend found. ERR: [wasm] RuntimeError: Aborted(TypeError: WebAssembly.instantiate(): Import #0 module="a" error: module is not an object or function), [cpu] Error: previous call to 'initializeWebAssembly()' failed., [xnnpack] Error: previous call to 'initializeWebAssembly()' failed., [webgpu] Error: previous call to 'initializeWebAssembly()' failed.

If I provide only 'webgpu' as executionProviders, the error is as follows:

Uncaught (in promise) Error: no available backend found. ERR: [wasm] RuntimeError: Aborted(TypeError: WebAssembly.instantiate(): Import #0 module="a" error: module is not an object or function)

which is the same as before.

If I use <script src="./dist/ort.js"></script> instead of ort.webgpu.min.js and use webgpu as executionProviders, the error is: Uncaught (in promise) Error: no available backend found.

@fs-eire
Contributor Author

fs-eire commented May 8, 2023

@DK013 The error message suggests the corresponding .wasm file is not being served. Please check the devtools Network tab for any 404 errors on *.wasm.

@DK013

DK013 commented May 8, 2023

@DK013 The error message suggests the corresponding .wasm file is not being served. Please check the devtools Network tab for any 404 errors on *.wasm.

(screenshot attached)
Doesn't look like it. However, I was under the impression it would load the JSEP wasm files, which do exist in the dist directory after the build (I double-checked). Seems like that's not the case.

@fs-eire
Contributor Author

fs-eire commented May 8, 2023

I think I know the reason. It is because of this line:
ort.env.wasm.simd = false;

I don't provide a non-SIMD version of the WebGPU wasm file, because I assume every environment that supports WebGPU should also support wasm fixed-size SIMD.

So removing that line should make it work; see the sketch below.

I think I can add a warning message if SIMD is off and WebGPU is requested.
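
For illustration, a minimal sketch of the corrected setup (same model URL as above, with simd left at its default):

<script src="./dist/ort.webgpu.min.js"></script>
<script>
    document.addEventListener('DOMContentLoaded', async () => {
        const resp = await fetch('https://huggingface.co/Xenova/bert-base-cased_web/resolve/main/onnx/model_quantized.onnx');
        const array = new Uint8Array(await resp.arrayBuffer());

        // Do not set ort.env.wasm.simd = false; the WebGPU build requires SIMD.
        const session = await ort.InferenceSession.create(array, {
            executionProviders: ['webgpu'],
        });
    });
</script>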

@xenova

xenova commented May 13, 2023

Is there a list of supported WebGPU ops, as well as those planned to be implemented?

@visheratin
Contributor

I think I can add a warning message if SIMD is off and WebGPU is requested.

@fs-eire Maybe along with the warning, you could ignore the ort.env.wasm.simd or reset it to true? Because warnings are often missed/ignored and the execution will still fail.

@fs-eire
Contributor Author

fs-eire commented May 13, 2023

Is there a list of supported WebGPU ops, as well as those planned to be implemented?

Let me update the summary.

@fs-eire
Contributor Author

fs-eire commented May 13, 2023

I think I can add a warning message if SIMD is off and WebGPU is requested.

@fs-eire Maybe along with the warning, you could ignore the ort.env.wasm.simd or reset it to true? Because warnings are often missed/ignored and the execution will still fail.

I will make initialization fail instead. See this PR: #15924

@xenova

xenova commented May 17, 2023

@fs-eire Are the versions released under the dev tag (e.g., https://www.npmjs.com/package/onnxruntime-web/v/1.16.0-dev.20230508-045c623415) built automatically from the main branch? This will mean I don't have to build the files myself for testing.

@fs-eire
Contributor Author

fs-eire commented May 17, 2023

@fs-eire Are the versions released under the dev tag (e.g., https://www.npmjs.com/package/onnxruntime-web/v/1.16.0-dev.20230508-045c623415) built automatically from the main branch? This will mean I don't have to build the files myself for testing.

They are, but the release pipeline is not working perfectly. We are currently reworking the release pipeline for nightly builds. Until that work is done, you can use this link to download the latest artifacts from our public CI. Hope this helps save you some time.

@xenova

xenova commented May 17, 2023

Okay great! Can you perhaps just show where I can download the final builds? I'm not too familiar with the Azure DevOps UI. Never mind, found it.

@xenova

xenova commented May 17, 2023

So, I got the imports working (for this build), but I'm getting a lot of errors when running a simple text-classification model:

input:

{
    attention_mask: Tensor {
      type: 'int64',
      data: [1n, 1n, 1n],
      dims: [1,3],
    },
    input_ids: Tensor {
      type: 'int64',
      data: [101n, 3231n, 102n],
      dims: [1,3],
    },
}

(screenshot attached)

fs-eire added a commit that referenced this issue May 19, 2023
### Description
Because of #15618, the default allocator changed to the device allocator, which is GPU instead of CPU. In the transpose optimizer we expect to read data from initializers, so a CPU allocator is required here.

This change fixes the transpose optimizer on GPU EPs.

Fixes the issue referred to in #15869, #15796
fs-eire added a commit that referenced this issue May 20, 2023
### Description
Fix buffer size when downloading. Buffer size should always be padded to a multiple of 4.

Resolves the issue described in #15796.

![Image](https://user-images.githubusercontent.com/26504141/239093785-9417dffc-6f00-47b2-956d-402b43bdb0a9.png)
@gyagp
Contributor

gyagp commented Oct 27, 2023

@mrdomino, to make WebGPU work, you may just need to import ort.webgpu.min.js.
I have some sample code at https://github.com/webatintel/ort-toolkit/blob/main/index.html#L109, and you may try a live demo at https://webatintel.github.io/ort-toolkit/?tasks=performance&ep=webgpu&modelName=mobilenetv2-12&modelUrl=hf&enableReadback=true

@gabrielgrant

@gyagp thanks for the pointers!

Your demo does run, but seems to throw some errors, so it's a little unclear what's being run on GPU vs CPU:

ort-wasm-simd.jsep.js:54 2023-10-27 10:51:47.156700 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
j @ ort-wasm-simd.jsep.js:54
ort-wasm-simd.jsep.js:54 2023-10-27 10:51:47.162400 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

(screenshot attached)

I think the suggestion to "rerun with verbose output on a non-minimal build" to show node assignments requires you to re-build, right? Is that something you're able to do?

Thanks again!

@mrdomino

mrdomino commented Oct 27, 2023

@mrdomino, to make WebGPU work, you may just need to import ort.webgpu.min.js. I have some sample code at https://github.com/webatintel/ort-toolkit/blob/main/index.html#L109, and you may try a live demo at https://webatintel.github.io/ort-toolkit/?tasks=performance&ep=webgpu&modelName=mobilenetv2-12&modelUrl=hf&enableReadback=true

Thanks for the link! I'm now exploring further, and the behavior I'm seeing in Firefox (which I just installed) is different from the behavior I was seeing with Chrome on my Android phone. I'm going to focus on Firefox for now, as it's harder to debug on the phone.

First of all, it turns out that the import is not the issue after all — if I just pass ["wasm"] as the executionProvider, it works whether I import onnxruntime-web or onnxruntime-web/webgpu (which is just an alias for ort.webgpu.min.js IIUC).

If I just pass ["webgpu"] on Firefox or Safari, I get no available backend found. ERR: (with nothing after ERR.) This seems to be coming from here, but it's surprising that there is no error message propagated up.

If I pass either ["webgpu", "wasm"], or ["wasm", "webgpu"], on Chrome, it works. (I cannot tell which execution provider it chose though.) But if I do so on either Firefox or Safari, I get: i.Ea is not a function with a reference into the minified ort.webgpu.min.js source.

I have thus far been working around this by manually testing navigator.gpu.requestAdapter() and setting the backend based on that, but I'm confused as to why that is not happening automatically.

@mrdomino

Okay, wow, this is getting complicated.

I just checked again with Chrome on Android, and even with my navigator.gpu.requestAdapter() workaround, I still get WebGpuBackend: Failed to get GPU adapter., which appears to be coming from here. The only way to work around this is to make not just the executionProvider, but the file import, conditional on the result of requestAdapter. I'm not sure, but I bet your code has this problem too on Chrome on Android @gyagp: when I load the page there, the test results are never filled in with anything.

So to recap:

  • Recent desktop Chrome: everything fine.
  • Firefox/Safari on desktop: 'wasm' works, 'webgpu' does not work, and passing an array of backends causes things to break in a strange way, regardless of whether onnxruntime-web or onnxruntime-web/webgpu.
  • Chrome on Android x onnxruntime-web: wasm works, webgpu presumably doesn't work (haven't tested since it seems pointless to)
  • Chrome on Android x onnxruntime-web/webgpu: webgpu does not work, wasm does not work, and in both cases the error thrown is different from either of the Firefox/Safari errors.

@gyagp
Contributor

gyagp commented Oct 28, 2023

@gyagp thanks for the pointers!

Your demo does run, but seems to throw some errors, so it's a little unclear what's being run on GPU vs CPU:

ort-wasm-simd.jsep.js:54 2023-10-27 10:51:47.156700 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
j @ ort-wasm-simd.jsep.js:54
ort-wasm-simd.jsep.js:54 2023-10-27 10:51:47.162400 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

(screenshot attached)

I think the suggestion to "rerun with verbose output on a non-minimal build" to show node assignments requires you to re-build, right? Is that something you're able to do?

Thanks again!

@gabrielgrant, these errors can be ignored for now. When using webgpu as the EP, ops that are not supported by WebGPU automatically fall back to wasm. My script has an ortProfiling task (full URL: https://webatintel.github.io/ort-toolkit/?tasks=ortProfiling&ep=webgpu&modelName=mobilenetv2-12&modelUrl=hf&enableReadback=true), and it will show you where each op runs (JsExecutionProvider means WebGPU, while CPUExecutionProvider means wasm).
We're still working hard on the WebGPU backend, including many missing op implementations. You're always welcome to raise your requirements so that we can prioritize the effort.

@gyagp
Contributor

gyagp commented Oct 28, 2023

Okay, wow, this is getting complicated.

I just checked again with Chrome on Android, and even with my navigator.gpu.requestAdapter() workaround, I still get WebGpuBackend: Failed to get GPU adapter., which appears to be coming from here. The only way to work around this is to make not just the executionProvider, but the file import, conditional on the result of requestAdapter. I'm not sure, but I bet your code has this problem too on Chrome on Android @gyagp: when I load the page there, the test results are never filled in with anything.

So to recap:

  • Recent desktop Chrome: everything fine.
  • Firefox/Safari on desktop: 'wasm' works, 'webgpu' does not work, and passing an array of backends causes things to break in a strange way, regardless of whether onnxruntime-web or onnxruntime-web/webgpu.
  • Chrome on Android x onnxruntime-web: wasm works, webgpu presumably doesn't work (haven't tested since it seems pointless to)
  • Chrome on Android x onnxruntime-web/webgpu: webgpu does not work, wasm does not work, and in both cases the error thrown is different from either of the Firefox/Safari errors.

@mrdomino I'm not sure about the exact status of Firefox and Safari, but for Chrome, WebGPU is only officially supported on Windows, macOS and ChromeOS (since M113). Android support is still behind the flag "--enable-unsafe-webgpu". Fortunately, its status is very good now, as Google just sent out an "intent to ship" for Chrome (https://chromestatus.com/feature/5119617865613312) and plans to ship it in M121 (Jan 23 is the release date). Until then, you still need to pass the switch "--enable-unsafe-webgpu" to enable WebGPU on Android with the latest Chrome (better to experiment with Chrome Canary).

@gyagp
Contributor

gyagp commented Oct 28, 2023

@mrdomino BTW, if you are interested in following the WebGPU status on Safari, here is some context: gpuweb/gpuweb#4238

@mrdomino

I'm actually not that worried about webgpu on Safari — using pure wasm as a fallback is acceptable for my use case, and actually works well enough on Firefox.

The things I'm concerned about are basically just:

  1. Having to conditionally import one of two different headers depending on navigator.gpu is painful, particularly with TypeScript — it'd be nice if one header worked in both situations.
  2. The inconsistent fallback / failure behavior is surprising, and seems like it's unintentional.

@gyagp
Contributor

gyagp commented Oct 28, 2023

I'm actually not that worried about webgpu on Safari — using pure wasm as a fallback is acceptable for my use case, and actually works well enough on Firefox.

The things I'm concerned about are basically just:

  1. Having to conditionally import one of two different headers depending on navigator.gpu is painful, particularly with TypeScript — it'd be nice if one header worked in both situations.

You only need to import ort.webgpu.min.js. Then if WebGPU is supported, use webgpu as ep; otherwise, change ep to wasm. Pseudo code:
if (webgpuSupported) {
    ep = 'webgpu';
} else {
    ep = 'wasm';
}
const option = {
    executionProviders: [
        {
            name: ep,
        }
    ]
};
const session = await createSession(option);

  2. The inconsistent fallback / failure behavior is surprising, and seems like it's unintentional.

If WebGPU is supported, the fallback to wasm is either a limitation (some ops, including data-type variants, are not implemented for WebGPU) or an optimization (ORT has heuristics to prefer wasm over WebGPU for better performance). We will continue to improve the framework, including the profiling mechanism, so that it's easier to differentiate the two. You're always welcome to report a perf issue when in doubt.

@mrdomino

You only need to import ort.webgpu.min.js. Then if WebGPU is supported, use webgpu as ep; otherwise, change ep to wasm.

No, that is not the case in Chrome on Android. That works on Chrome on Desktop, and on Firefox and Safari on desktop, but not on Android. On Android, no matter what ep is, importing ort.webgpu.min.js causes a crash.

And by point 2, I was referring to the strange error message thrown on Firefox/Safari (i.Ea is not a function) when I pass an array of backend hints.

@mrdomino

mrdomino commented Oct 28, 2023

The specific code I am using to decide which backend to use is:

const backend = await (async () => {
  if (!navigator.gpu) return 'wasm'
  const adapter = await navigator.gpu.requestAdapter()
  if (!adapter) return 'wasm'
  return 'webgpu'
})()

That backend is then passed to InferenceSession.create (as { executionProviders: [backend], ...}). On Android Chrome, depending on which JavaScript file is imported, that either crashes with "WebGpuBackend: Failed to get GPU adapter." or works.

I will try to get a code sandbox up with a minimal example.

@mrdomino

Here: https://ort-test.vercel.app/

The only difference between the WASM and WebGPU pages is which file is imported. Both are using "wasm" as the ep. On Chrome Android, WASM says "Everything worked" and WebGPU says "Failed during InferenceSession.create" with the WebGpuBackend error message.

https://bitbucket.org/mrdomino/ort-test/src/main/app/ort/page.tsx
https://bitbucket.org/mrdomino/ort-test/src/main/app/ort-webgpu/page.tsx

@mrdomino

Curiously, Chrome on Android exposes a navigator.gpu. It just doesn't produce an adapter if you request one. Firefox and Safari both do not have a navigator.gpu at all. (I added text to the test app's main page to distinguish this.)

Is it possible that somewhere in the code there is a simple check for the presence of navigator.gpu to decide to use WebGPU?

@mrdomino

It looks like registerBackend has such a check:

if (!BUILD_DEFS.DISABLE_WASM) {
  const wasmBackend = BUILD_DEFS.DISABLE_TRAINING ? require('./backend-wasm-inference').wasmBackend :
                                                    require('./backend-wasm-training').wasmBackend;
  if (!BUILD_DEFS.DISABLE_WEBGPU && typeof navigator !== 'undefined' && navigator.gpu) {
    registerBackend('webgpu', wasmBackend, 5);
  }

So there is a difference between the good cases of Firefox/Safari and the bad case of Android Chrome: the former do not have the backend registered, while the latter does.

Still, the logic in resolveBackend really looks like it should be handling the error, and from the observed behavior, it is not. So a different code path must be getting taken that is trying to initialize a WebGpuBackend.

Ah, and indeed, here we go:

export const init = async(module: OrtWasmModule, env: Env): Promise<void> => {
  const init = module.jsepInit;
  if (init && navigator.gpu) {
    if (!env.wasm.simd) {
      throw new Error(
          'Not supported for WebGPU=ON and SIMD=OFF. Please set `env.wasm.simd` to true when using WebGPU EP');
    }
    const backend = new WebGpuBackend();

I think I can probably submit a PR to fix that.

mrdomino added a commit to mrdomino/onnxruntime that referenced this issue Oct 28, 2023
Just testing for the presence of navigator.gpu is not sufficient to
establish WebGPU support: in particular, at the time of writing, Chrome on
Android exposes a navigator.gpu but does not return anything from
requestAdapter.

Context: microsoft#15796 (comment)
@gyagp
Contributor

gyagp commented Oct 29, 2023

Thanks for the PR, @fs-eire and @guschmue, any comments on this?
@mrdomino It's also easy to work around this on your side, like below:
async function getEp() {
    if (!navigator.gpu) {
        return 'wasm';
    }
    const adapter = await navigator.gpu.requestAdapter();
    if (!adapter) {
        delete Navigator.prototype.gpu;
        return 'wasm';
    }
    return 'webgpu';
}

@mrdomino

mrdomino commented Oct 29, 2023

Hahaha, I like it.

One thought I had is it probably makes sense to also check the adapter at the registerBackend call site, and in that case, maybe it makes sense for hasGpu to be in the env?

@mrdomino

mrdomino commented Feb 3, 2024

FYI, using onnxruntime-web 1.17.0, executionProviders: ['webgpu', 'wasm'] still does not work on Firefox — it dies with the same "[minified-name] is not a function" as before.

@alba-saco

alba-saco commented Mar 12, 2024

Hi there,

I'm getting this error when I set executionProviders=['webgpu'] (I am running on Chrome via https):
audioProcessor.js:2 Error during inference: Error: no available backend found. ERR: [webgpu] RuntimeError: null function or function signature mismatch

When I remove ort.env.wasm.simd = false I get the following error:
Error during inference: TypeError: Cannot read properties of undefined (reading 'apply')
I am aware of this issue, and to confirm, I am importing from onnxruntime-web/webgpu.

When I run my code with executionProviders=['wasm'] everything executes perfectly.

I wasn't sure if I should create a new issue or just put a comment here

@fs-eire
Contributor Author

fs-eire commented Mar 12, 2024

Hi there,

I'm getting this error when I set executionProviders=['webgpu'] (I am running on Chrome via https): audioProcessor.js:2 Error during inference: Error: no available backend found. ERR: [webgpu] RuntimeError: null function or function signature mismatch

When I remove ort.env.wasm.simd = false I get the following error: Error during inference: TypeError: Cannot read properties of undefined (reading 'apply') I am aware of this issue and to confirm, I am importing from onnxruntime-web/webgpu

When I run my code with executionProviders=['wasm'] everything executes perfectly.

I wasn't sure if I should create a new issue or just put a comment here

This looks like you may be importing from onnxruntime-web rather than onnxruntime-web/webgpu. See https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/importing_onnxruntime-web#conditional-importing; a sketch follows.
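
For illustration, a minimal sketch of conditional importing (the dynamic-import form and model path are assumptions, not taken from the linked example):

// Choose the entry point and EP based on actual WebGPU adapter availability.
const adapter = navigator.gpu ? await navigator.gpu.requestAdapter() : null;
const ort = adapter
    ? await import('onnxruntime-web/webgpu')
    : await import('onnxruntime-web');
const session = await ort.InferenceSession.create('model.onnx', {
    executionProviders: adapter ? ['webgpu'] : ['wasm'],
});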

@alba-saco

alba-saco commented Mar 13, 2024

Thank you for your prompt response. I get
Error during inference: Error: no available backend found. ERR:
when importing from onnxruntime-web. I am running on the latest version of Chrome, served with https, and have confirmed that WebGPU is available in my environment.

fs-eire added a commit that referenced this issue Mar 15, 2024
### Description

This PR rewrites the backend resolve logic to support specifying multiple
EPs.

#### Backend

The first version of ONNX Runtime Web carried over some existing
code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes
the "backend" concept. The original "backend" in ONNX.js is designed
with the assumption that only one backend from the user's backend hint list
will be used. For example, in ONNX.js, if the user specifies a backend hint of
`['webgl', 'wasm']`, ONNX.js will first try to use the WebGL backend - if it
loads successfully (the browser supports WebGL), the "webgl" backend
will be used and "wasm" will be ignored; otherwise, "webgl" will be
ignored and the "wasm" backend will be loaded.

In short: only one backend is used when initializing a session.

#### Execution Provider

Execution Provider, or EP, is a different concept in ONNX Runtime. One
of the differences is that users are allowed to specify multiple EPs, and
if one does not support a particular kernel, execution can fall back to
another EP. This is a very common case when using a GPU EP in ONNX Runtime.

#### Current Status: Backend v.s. EP

Because of the historical reasons mentioned above, the current status is
quite confusing. There are **real backend**s, which are different
implementations in code; there are **backend hint**s, which are string
names used as hints; and there are **EP**s, the ONNX Runtime concept.

Currently there are only 2 **backend**s in our code base: the "onnxjs
backend" and the "wasm backend". The "onnxjs backend" currently only
powers the backend hint "webgl", which goes into the old ONNX.js code path.
All other backend hints, including "wasm", "cpu" (alias of wasm), "webgpu"
and "webnn", are powered by the "wasm backend".

And because ORT Web treats "backend" as an internal concept and wants to
align with ONNX Runtime, the names of backend hints are becoming EP
names.

The following table shows today's status:

| Execution Provider Name (public) / Backend Hint (internal) | Backend | EP in ORT |
| -------- | ------- | ------- |
| "wasm"/"cpu" | WasmBackend | CPU EP |
| "webgl" | OnnxjsBackend | \* technically not an EP |
| "webgpu" | WasmBackend | JSEP |
| "webnn" | WasmBackend | WebNN EP |

#### Problem

While the API allows specifying multiple EPs, backend resolving only
allows one backend. This causes issues when a user specifies multiple EP
names in session options: the backend resolve behavior and the EP
registration behavior become inconsistent. Specifically, in this issue,
#15796 (comment):

The EP list `['webgpu', 'wasm']` on a browser without WebGPU support
resolves to the 'wasm' backend, but the full EP list is passed in session
options, so JSEP is still enabled, causing the runtime error.


#### Solution

Since we still need the WebGL backend, we cannot totally remove the backend
register/resolve system. In this PR I made the following changes:
- initialize every backend from the EP list, instead of only the first
successful one.
- for the first resolved backend, keep only the EPs using that exact
backend; remove all other EPs from the session options.
- for every explicitly specified EP that is removed, show a warning
message in the console.
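
For illustration, a minimal sketch of the multi-EP usage this change targets (the model path is a placeholder):

// After this change, ['webgpu', 'wasm'] resolves to a single backend, EPs not
// served by that backend are removed from the session options, and a console
// warning is shown for each explicitly specified EP that was dropped.
const session = await ort.InferenceSession.create('model.onnx', {
    executionProviders: ['webgpu', 'wasm'],
});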
@fs-eire
Contributor Author

fs-eire commented Mar 16, 2024

Thank you for your prompt response. I get Error during inference: Error: no available backend found. ERR: when importing from onnxruntime-web. I am running on the latest version of chrome served with https and have confirmed that webGPU is available in my environment

Yes, as I explained, if you want to use WebGPU, you need to import onnxruntime-web/webgpu. Otherwise it does not work.

fs-eire added a commit that referenced this issue Mar 20, 2024
siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this issue May 9, 2024
@guschmue
Contributor

Closing this one; it has gotten a bit stale.
For WebGPU issues, let's create a new issue for each problem.
