
feat(graphics): expose WGSL packed_4x8_integer_dot_product as a device cap #8787

Merged
mvaligursky merged 1 commit into main from mv-wgsl-packed-4x8-integer-dot-product
May 27, 2026

@mvaligursky
Contributor

Adds detection and automatic wiring for the WGSL packed_4x8_integer_dot_product language feature, which exposes the DP4a built-in functions for 8-bit packed integer dot products:

  • dot4U8Packed, dot4I8Packed — packed 4×i8/u8 dot product
  • pack4xI8, pack4xU8, pack4xI8Clamp, pack4xU8Clamp — pack helpers
  • unpack4xI8, unpack4xU8 — unpack helpers

These accelerate quantized inference and similar integer-heavy compute workloads on hardware that exposes the DP4a instruction.
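
For reference, here is a CPU-side sketch of what the two dot-product built-ins and `pack4xI8` compute, following their WGSL definitions (lowest byte holds component 0). The JavaScript functions only borrow the WGSL names for readability; they are illustrative emulations, not part of the engine or of this PR.

```javascript
// CPU emulation of three WGSL built-ins, for illustration only.

// Pack the low 8 bits of four integers into one 32-bit word (component 0 in the lowest byte).
function pack4xI8(v) {
    return (
        (v[0] & 0xff) |
        ((v[1] & 0xff) << 8) |
        ((v[2] & 0xff) << 16) |
        ((v[3] & 0xff) << 24)
    ) >>> 0;
}

// Treat each byte of a and b as a signed 8-bit integer and return the dot product.
function dot4I8Packed(a, b) {
    let sum = 0;
    for (let i = 0; i < 4; i++) {
        const ai = (((a >>> (8 * i)) & 0xff) << 24) >> 24; // sign-extend byte i
        const bi = (((b >>> (8 * i)) & 0xff) << 24) >> 24;
        sum += ai * bi;
    }
    return sum | 0;
}

// Same as above, but each byte is treated as an unsigned 8-bit integer.
function dot4U8Packed(a, b) {
    let sum = 0;
    for (let i = 0; i < 4; i++) {
        sum += ((a >>> (8 * i)) & 0xff) * ((b >>> (8 * i)) & 0xff);
    }
    return sum >>> 0;
}
```

For example, `dot4I8Packed(pack4xI8([1, -2, 3, -4]), pack4xI8([5, 6, -7, 8]))` evaluates to `-60`. On supporting hardware the GPU built-in can do the same work in a single DP4a instruction.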

Follows the same pattern as unrestricted_pointer_parameters (#8785) and pointer_composite_access (#8786).

Changes:

  • GraphicsDevice.supportsPacked4x8IntegerDotProduct flag, probed from navigator.gpu.wgslLanguageFeatures in WebgpuGraphicsDevice#initDeviceCaps.
  • CAPS_PACKED_4X8_INTEGER_DOT_PRODUCT shader define for conditional compilation (#ifdef).
  • requires packed_4x8_integer_dot_product; directive automatically injected into WGSL shaders on supporting devices via ShaderDefinitionUtils.getWGSLEnables.
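
The three pieces above can be sketched roughly as follows. This is a simplified illustration, not the engine code: `describePacked4x8Caps` and the shape of its return value are hypothetical, and only `wgslLanguageFeatures.has()` is the real WebGPU API being relied on.

```javascript
const FEATURE = 'packed_4x8_integer_dot_product';
const DEFINE = 'CAPS_PACKED_4X8_INTEGER_DOT_PRODUCT';

// Hypothetical helper: derive the flag, the shader define and the WGSL
// `requires` directive from a GPU-like object (navigator.gpu in a browser).
function describePacked4x8Caps(gpu) {
    // wgslLanguageFeatures is a set-like object; treat a missing gpu or a
    // missing feature set as "not supported".
    const supported = Boolean(gpu?.wgslLanguageFeatures?.has(FEATURE));

    const defines = new Map();
    if (supported) {
        defines.set(DEFINE, ''); // presence is what #ifdef tests for
    }

    return {
        supportsPacked4x8IntegerDotProduct: supported,
        defines,
        // Emitted at the top of the WGSL source only when supported.
        requires: supported ? `requires ${FEATURE};\n` : ''
    };
}
```

In a browser this would be called as `describePacked4x8Caps(navigator.gpu)`; outside one, any object with a `wgslLanguageFeatures` `Set` works for testing.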

Notes:

  • Infrastructure-only — no shader callers yet. The cap is available for future compute shaders that benefit from DP4a (e.g. ML inference, image processing).
  • On devices that don't expose the feature, the cap is false and no requires directive is emitted, so existing portable shaders continue to compile.
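
A future caller could guard the fast path with the define and keep a portable fallback, along these lines. The WGSL function `dot4` and the tiny `resolveIfdef` helper are hypothetical: the helper is a toy stand-in for the engine's shader preprocessor (flat `#ifdef` / `#else` / `#endif` only), included just to show which branch survives on each kind of device.

```javascript
// Hypothetical WGSL snippet: DP4a path when the cap is defined, manual
// signed-byte extraction otherwise (the unpack helpers are part of the same
// language feature, so the fallback cannot use them).
const DOT4_WGSL = `
fn dot4(a: u32, b: u32) -> i32 {
#ifdef CAPS_PACKED_4X8_INTEGER_DOT_PRODUCT
    return dot4I8Packed(a, b);
#else
    var sum = 0i;
    for (var i = 0u; i < 4u; i++) {
        sum += extractBits(bitcast<i32>(a), i * 8u, 8u) * extractBits(bitcast<i32>(b), i * 8u, 8u);
    }
    return sum;
#endif
}
`;

// Toy preprocessor: keeps lines from the active branch of non-nested #ifdef blocks.
function resolveIfdef(source, defines) {
    const out = [];
    let keep = true; // true while the current line belongs to an active branch
    for (const line of source.split('\n')) {
        const t = line.trim();
        if (t.startsWith('#ifdef ')) {
            keep = defines.has(t.slice('#ifdef '.length).trim());
        } else if (t === '#else') {
            keep = !keep;
        } else if (t === '#endif') {
            keep = true;
        } else if (keep) {
            out.push(line);
        }
    }
    return out.join('\n');
}
```

Calling `resolveIfdef(DOT4_WGSL, new Set(['CAPS_PACKED_4X8_INTEGER_DOT_PRODUCT']))` leaves only the `dot4I8Packed` path, while an empty set leaves only the `extractBits` loop, so the same source compiles on both kinds of device.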

…e cap

Adds detection and automatic wiring for the WGSL
`packed_4x8_integer_dot_product` language feature, which exposes the
DP4a-family built-in functions for 8-bit packed integer dot products
(`dot4U8Packed`, `dot4I8Packed`) and the associated pack/unpack helpers
(`pack4x{I,U}8`, `pack4x{I,U}8Clamp`, `unpack4x{I,U}8`). Useful for
accelerating quantized inference and similar integer-heavy compute
workloads on hardware that exposes the DP4a instruction.

Follows the same pattern as `unrestricted_pointer_parameters` and
`pointer_composite_access`:

- `supportsPacked4x8IntegerDotProduct` device flag, probed from
  `navigator.gpu.wgslLanguageFeatures`.
- `CAPS_PACKED_4X8_INTEGER_DOT_PRODUCT` shader define for conditional
  compilation.
- `requires packed_4x8_integer_dot_product;` directive automatically
  injected into WGSL shaders on supporting devices.

See https://developer.chrome.com/blog/new-in-webgpu-123#dp4a_built-in_functions_support_in_wgsl

Labels

area: graphics Graphics related issue
