[ExecuTorch][WebGPU] Enable backend test suite + x86 CI by JulianCloudNTH · Pull Request #19964 · pytorch/executorch

JulianCloudNTH · 2026-06-02T22:55:46Z

Stack from ghstack (oldest at bottom):

Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter, via `wgpu-native`, minimal dependencies, no GPU).

Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full `requiredLimits` at device creation (software adapters default storage-buffer limits to 0), and make the `add` op's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridable `override wg_size: u32 = 256` and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later.

The `add` op also validates its 1D dispatch count before allocating any GPU objects, against the device's queried `maxComputeWorkgroupsPerDimension` (falling back to the WebGPU spec-default floor of 65535 only when the limit query fails).

Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`) so the other ops can share them rather than re-inlining the logic.

The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc.
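
For readers without the diff open, the two helpers described above might look roughly like the sketch below. This is not the code from `runtime/WebGPUUtils.h`: the signatures, the `bool` plus out-parameter return style, the local `div_up`, the constant name `kSpecDefaultMaxWorkgroupsPerDimension`, and the convention that a limit of 0 means "query failed" are all assumptions made for illustration. Only the helper names and the 65535 fallback come from the description.

```cpp
#include <algorithm>
#include <cstdint>

// Fallback named in the PR description: the WebGPU spec-default floor for
// maxComputeWorkgroupsPerDimension. The constant name is hypothetical.
constexpr uint32_t kSpecDefaultMaxWorkgroupsPerDimension = 65535;

// Clamp the shader's preferred workgroup size (the WGSL `wg_size` override)
// to the device's maxComputeInvocationsPerWorkgroup.
// Assumption of this sketch: a limit of 0 means the query failed, in which
// case the preferred size is returned unchanged.
inline uint32_t clamp_workgroup_size(
    uint32_t preferred,
    uint32_t max_invocations) {
  if (max_invocations == 0) {
    return preferred;
  }
  return std::min(preferred, max_invocations);
}

// Ceiling division written so that it cannot overflow uint32_t.
// Requires d != 0; the caller below checks this.
inline uint32_t div_up(uint32_t n, uint32_t d) {
  return n / d + (n % d != 0 ? 1u : 0u);
}

// Compute the number of workgroups for a 1D dispatch and validate it against
// the device limit before any GPU objects are allocated.
// Returns false when the workgroup size is zero, the output pointer is null,
// or the count exceeds the limit. A limit of 0 is treated as a failed query
// and replaced by the spec-default floor (an assumption of this sketch).
inline bool compute_1d_workgroup_count(
    uint32_t num_elements,
    uint32_t wg_size,
    uint32_t max_workgroups_per_dim,
    uint32_t* out_count) {
  if (wg_size == 0 || out_count == nullptr) {
    return false;
  }
  const uint32_t limit = max_workgroups_per_dim != 0
      ? max_workgroups_per_dim
      : kSpecDefaultMaxWorkgroupsPerDimension;
  const uint32_t count = div_up(num_elements, wg_size);
  if (count > limit) {
    return false;
  }
  *out_count = count;
  return true;
}
```

Under these assumptions, a device reporting a 128-invocation limit (the SwiftShader figure quoted above) turns the preferred size of 256 into 128, and a 1000-element tensor then needs 8 workgroups. The host would pass the clamped value as the `wg_size` pipeline constant when it creates the compute pipeline.
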
@exported-using-ghexport

Differential Revision: D107288999

[ghstack-poisoned]

pytorch-bot · 2026-06-02T22:55:50Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19964

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 3 Cancelled Jobs, 2 Unrelated Failures, 2 Unclassified Failures

As of commit a464b37 with merge base 915a82d:

NEW FAILURES - The following jobs have failed:

pull / test-static-llama-qnn-linux (stories_110m) / linux-job (gh)
RuntimeError: Command docker exec -t 8d05ef7faeedee281947509fe1d6106e72a6ac4592feed0cbcd672399bfa865a /exec failed with exit code 92
trunk / test-arm-backend-vkml (test_pytest_models_vkml) / linux-job (gh)
RuntimeError: Command docker exec -t e1be6c697f4408ac9aa77afbfc67ff4d6043c250f8056f37132187a62f3adc40 /exec failed with exit code 1
trunk / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t bd76c7071c20618d480affed120a8bc51971940bf87349e2908589b50d951741 /exec failed with exit code 92

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Test WebGPU Backend / test-webgpu / test-backend-linux (webgpu, models) / linux-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command docker exec -t 442766bd488c5015faf143e61aebf0770f2ae043585c425d7a6992c8ede80664 /exec failed with exit code 1
Test WebGPU Backend / test-webgpu / test-backend-linux (webgpu, operators) / linux-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command docker exec -t fb6630b05dad0262721629498a63d0b5d2acd83e12771c79ef86048166afea15 /exec failed with exit code 1

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / unittest-release / macos / macos-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-02T22:56:59Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Summary: Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter, via `wgpu-native`, minimal dependencies, no GPU). Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full `requiredLimits` at device creation (software adapters default storage-buffer limits to 0), and make the `add` op's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridable `override wg_size: u32 = 256` and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later. The `add` op also validates its 1D dispatch count before allocating any GPU objects, against the device's queried `maxComputeWorkgroupsPerDimension` (falling back to the WebGPU spec-default floor of 65535 only when the limit query fails). Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`) so the other ops can share them rather than re-inlining the logic. The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc. ghstack-source-id: 389222646 exported-using-ghexport Differential Revision: D107288999

[ghstack-poisoned]

Pull Request resolved: #19964 Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter, via `wgpu-native`, minimal dependencies, no GPU). Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full `requiredLimits` at device creation (software adapters default storage-buffer limits to 0), and make the `add` op's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridable `override wg_size: u32 = 256` and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later. The `add` op also validates its 1D dispatch count before allocating any GPU objects, against the device's queried `maxComputeWorkgroupsPerDimension` (falling back to the WebGPU spec-default floor of 65535 only when the limit query fails). Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`) so the other ops can share them rather than re-inlining the logic. The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc. ghstack-source-id: 389636486 @exported-using-ghexport Differential Revision: [D107288999](https://our.internmc.facebook.com/intern/diff/D107288999/)

Update

ca90cb2

[ghstack-poisoned]

JulianCloudNTH requested review from kirklandsign and larryliu0820 as code owners June 2, 2026 22:55

This was referenced Jun 2, 2026

[ExecuTorch][WebGPU] Upload named-data constants in WebGPUGraph #19962

Merged

[ExecuTorch][WebGPU] Add rms_norm op #19963

Merged

meta-cla Bot added the CLA Signed label Jun 2, 2026

SS-JIA approved these changes Jun 3, 2026

Update

a464b37

[ghstack-poisoned]

meta-codesync Bot added fb-exported meta-exported labels Jun 3, 2026

meta-codesync Bot merged commit 66b405f into gh/JulianCloudNTH/9/base Jun 4, 2026
320 of 334 checks passed

meta-codesync Bot deleted the gh/JulianCloudNTH/9/head branch June 4, 2026 01:32

meta-codesync Bot temporarily deployed to cherry-pick-bot June 4, 2026 01:32 Inactive

pytorchbot mentioned this pull request Jun 4, 2026

[ExecuTorch][WebGPU] Enable backend test suite + x86 CI #20007

Merged

meta-codesync[bot] merged 2 commits into gh/JulianCloudNTH/9/base from gh/JulianCloudNTH/9/head
