[ExecuTorch][WebGPU] Enable backend test suite + x86 CI#19964
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19964
Note: Links to docs will display an error until the docs builds have been completed. ❌ 3 New Failures, 3 Cancelled Jobs, 2 Unrelated Failures, 2 Unclassified FailuresAs of commit a464b37 with merge base 915a82d ( NEW FAILURES - The following jobs have failed:
UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
Summary: Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter, via `wgpu-native`, minimal dependencies, no GPU). Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full `requiredLimits` at device creation (software adapters default storage-buffer limits to 0), and make the `add` op's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridable `override wg_size: u32 = 256` and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later. The `add` op also validates its 1D dispatch count before allocating any GPU objects, against the device's queried `maxComputeWorkgroupsPerDimension` (falling back to the WebGPU spec-default floor of 65535 only when the limit query fails). Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`) so the other ops can share them rather than re-inlining the logic. The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc. ghstack-source-id: 389222646 exported-using-ghexport Differential Revision: D107288999
Pull Request resolved: #19964 Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter, via `wgpu-native`, minimal dependencies, no GPU). Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full `requiredLimits` at device creation (software adapters default storage-buffer limits to 0), and make the `add` op's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridable `override wg_size: u32 = 256` and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later. The `add` op also validates its 1D dispatch count before allocating any GPU objects, against the device's queried `maxComputeWorkgroupsPerDimension` (falling back to the WebGPU spec-default floor of 65535 only when the limit query fails). Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`) so the other ops can share them rather than re-inlining the logic. The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc. ghstack-source-id: 389636486 @exported-using-ghexport Differential Revision: [D107288999](https://our.internmc.facebook.com/intern/diff/D107288999/)
Summary: Pull Request resolved: pytorch#19964 Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter, via `wgpu-native`, minimal dependencies, no GPU). Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full `requiredLimits` at device creation (software adapters default storage-buffer limits to 0), and make the `add` op's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridable `override wg_size: u32 = 256` and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later. The `add` op also validates its 1D dispatch count before allocating any GPU objects, against the device's queried `maxComputeWorkgroupsPerDimension` (falling back to the WebGPU spec-default floor of 65535 only when the limit query fails). Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`) so the other ops can share them rather than re-inlining the logic. The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc. ghstack-source-id: 389222646 exported-using-ghexport Differential Revision: D107288999
66b405f
into
gh/JulianCloudNTH/9/base
Pull Request resolved: #19964 Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter, via `wgpu-native`, minimal dependencies, no GPU). Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full `requiredLimits` at device creation (software adapters default storage-buffer limits to 0), and make the `add` op's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridable `override wg_size: u32 = 256` and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later. The `add` op also validates its 1D dispatch count before allocating any GPU objects, against the device's queried `maxComputeWorkgroupsPerDimension` (falling back to the WebGPU spec-default floor of 65535 only when the limit query fails). Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`) so the other ops can share them rather than re-inlining the logic. The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc. ghstack-source-id: 389636486 @exported-using-ghexport Differential Revision: [D107288999](https://our.internmc.facebook.com/intern/diff/D107288999/)
Pull Request resolved: #19964 Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter, via `wgpu-native`, minimal dependencies, no GPU). Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full `requiredLimits` at device creation (software adapters default storage-buffer limits to 0), and make the `add` op's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridable `override wg_size: u32 = 256` and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later. The `add` op also validates its 1D dispatch count before allocating any GPU objects, against the device's queried `maxComputeWorkgroupsPerDimension` (falling back to the WebGPU spec-default floor of 65535 only when the limit query fails). Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`) so the other ops can share them rather than re-inlining the logic. The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc. ghstack-source-id: 389636486 @exported-using-ghexport Differential Revision: [D107288999](https://our.internmc.facebook.com/intern/diff/D107288999/)
Stack from ghstack (oldest at bottom):
Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate:
backends/test/suite/flows/webgpu.pyplus aWebGPUTester, run byoss/.github/workflows/test-backend-webgpu.ymlon SwiftShader (a software Vulkan adapter, viawgpu-native, minimal dependencies, no GPU).Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full
requiredLimitsat device creation (software adapters default storage-buffer limits to 0), and make theaddop's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridableoverride wg_size: u32 = 256and the host clamps it to the device'smaxComputeInvocationsPerWorkgroup(256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later. Theaddop also validates its 1D dispatch count before allocating any GPU objects, against the device's queriedmaxComputeWorkgroupsPerDimension(falling back to the WebGPU spec-default floor of 65535 only when the limit query fails). Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusableinlinehelpers inruntime/WebGPUUtils.h(clamp_workgroup_sizeandcompute_1d_workgroup_count, mirroring the Vulkan delegate'sutils::div_up) so the other ops can share them rather than re-inlining the logic. The editable CMake build additionally marks thevulkan_schemasubdirectoryEXCLUDE_FROM_ALLso the WebGPUALLbuild does not pull in targets that need glslc.@exported-using-ghexport
Differential Revision: D107288999
Differential Revision: D107288999