WebGPU Compatibility Mode #4266
It was asked why we did not consider extending the reach further to include D3D FL10_0. However, the limitations imposed in compute shaders seem overly restrictive, and would prevent many useful compute features. In particular (if I'm reading it correctly), only a single UAV is allowed, meaning only a single storage buffer. Storage textures would be unavailable. There are no atomic instructions.
The idea came up today in our meeting, which can be summed up as 'Make WebGPU work as-is for compat'. The suggested solutions were:
That sounds great! ... if possible. Some points were made:
All that said, any ways in which we can make "compat" more compatible with full WebGPU and keep high performance would be super great IMO. With that in mind, some things that don't seem emulatable without a large perf hit (note: for the points below, assume we're discussing a compat implementation on top of OpenGL ES 3.1):
It seems like most of the remaining issues don't matter (they're emulated already), with the following exceptions:
This is a miscommunication. In one proposal, a bunch of features won't work. In another proposal, those features will work, albeit more slowly than they would in non-compat mode. Any application which is okay with missing features must necessarily be okay if those features exist but are slower than they would in non-compat mode. In both scenarios, the author has to know which features are missing/polyfilled, so they can know to avoid them if they're missing, or decrease their use if polyfilled. So it's totally okay to decrease performance for features which otherwise would be missing. (We'd be totally OK telling applications what is slower than it would be in non-compat mode (and telling them if they are in compat mode or not). Either in the JS console, in the spec, in MDN, or elsewhere.)
I'm kind of bewildered by this idea, for 2 reasons:
Our expectation is that fp16 will become ubiquitous over time. The difference is that usage of fp16 will increase to ubiquity, whereas usage of compat mode will decrease to irrelevance.
Here are some ideas about polyfilling:
This is a situation where late compilation can help.
As a first-order approximation, that's generally right. As long as there is no data loss, and all the information necessary to be stored is, in fact, stored somewhere in the texture, it will always be possible to emulate any kind of lookup (cube map or otherwise) in software. It will be slower - you may need to texelFetch multiple texels and blend them together in software, but that's better than the feature being missing entirely.
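To make that concrete, here is a minimal sketch of the "fetch and blend in software" idea, written as WGSL embedded in a TypeScript string. The function name and the clamp-to-edge addressing are illustrative assumptions, not anything agreed in this thread; a real polyfill would bake the sampler's actual wrap mode and mip selection into the generated code.

```ts
// A sketch of software bilinear filtering via texel fetches.
const bilinearLoadWGSL = /* wgsl */ `
fn bilinear_load(t: texture_2d<f32>, uv: vec2f, level: i32) -> vec4f {
  let size = vec2f(textureDimensions(t, level));
  let coord = uv * size - 0.5;               // continuous texel-space coordinate
  let base = vec2i(floor(coord));
  let w = fract(coord);                      // bilinear weights
  let maxc = vec2i(size) - vec2i(1);
  // Four texelFetch-style loads, clamped to the texture edge.
  let c00 = textureLoad(t, clamp(base,               vec2i(0), maxc), level);
  let c10 = textureLoad(t, clamp(base + vec2i(1, 0), vec2i(0), maxc), level);
  let c01 = textureLoad(t, clamp(base + vec2i(0, 1), vec2i(0), maxc), level);
  let c11 = textureLoad(t, clamp(base + vec2i(1, 1), vec2i(0), maxc), level);
  return mix(mix(c00, c10, w.x), mix(c01, c11, w.x), w.y);
}`;
```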
This seems possible with a compute shader. These operations are already only valid between passes, where the implementation can inject compute shaders. (And, of course, compute shader support is already necessary in general.) (Also, compressed texture format support is already optional. Another option is just to not support them.)
Similar to (1) above. If we can recompile whenever we want, we can emit code that adds a constant to any miplevel accesses or array layers.
This one seems difficult. Depending on the use cases, there may be a few possible paths forward: a) require the GL extension, b) enforce the requirement in WebGPU proper, c) run the render pass N times for N color targets with different states. Maybe there's another way, too.
Again, late compilation and emulating sample operations in software can polyfill this.
It should be possible to emulate loading by crafting sample positions and sampling.
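A minimal sketch of that idea, under the assumption that the translator can bind a non-filtering sampler alongside the depth texture (names are illustrative):

```ts
// Emulating a texel load on a depth texture by sampling at the texel center.
const depthLoadWGSL = /* wgsl */ `
fn depth_load(t: texture_depth_2d, s: sampler, coords: vec2i, level: i32) -> f32 {
  let size = vec2f(textureDimensions(t, level));
  // Craft the sample position: the center of the requested texel.
  let uv = (vec2f(coords) + 0.5) / size;
  return textureSampleLevel(t, s, uv, level);
}`;
```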
This is trivial to polyfill if you know how big the texture is.
There is no spec change necessary for this. Implementations can just do it.
Being able to recompile whenever you want alleviates the need for this too. WebGPU BGRA textures can be represented as OpenGL ES RGBA textures, and at draw call time it's possible to re-compile the shader to do the necessary swizzling. This uses the fact that every texture access in WGSL knows which texture it's accessing. (We are intentionally not commenting on the "Compatibility mode workarounds" section, as they are in the spirit of this proposal we are making.) (As an aside: it strikes me that the above set of polyfills is way smaller than the amount of polyfills required to get WebGL running on D3D on Windows machines.)
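As an illustration of that draw-time swizzle (purely a sketch: a real translator would rewrite its IR rather than pattern-match GLSL text, and every identifier here is made up):

```ts
// Because each WGSL texture access statically names its texture, the emitted
// GLSL can swizzle accesses of textures that are logically BGRA but stored
// as GLES RGBA.
function swizzleBgraAccesses(glsl: string, bgraSamplerNames: string[]): string {
  let out = glsl;
  for (const name of bgraSamplerNames) {
    // Naive textual rewrite, for illustration only.
    out = out.replace(
      new RegExp(`texture\\(\\s*${name}\\b[^;]*\\)`, "g"),
      (call) => `(${call}).bgra`, // undo the BGRA->RGBA storage difference
    );
  }
  return out;
}
```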
Reiterating some of the points I made on the previous call here, just for posterity:
Thank you for all the details. I was in the middle of writing up thoughts but it sounds like you covered most of them above. I personally like this direction. Recompiling shaders doesn't seem bad to me. It will only come up if users go outside the proposed limits, and the browser can warn them "this operation is slower, consider this alternative". The question still comes up of what specifically to do on some features:
I want to add, I still see the compat mode validation limits as a viable solution. In particular, it doesn't require removing features from shipping WebGPU (assuming that ends up being a requirement to take the "more emulation" path). So, I wanted to comment on these points:
This doesn't seem true to me. Any browser that doesn't want to implement compat mode doesn't have to. Devs may set `compatibilityMode: true`, but a browser that ignores it just returns a core adapter, and what the devs get still works with their code. There is no validation that needs to live forever.
I'd say those two are the same thing. Full WebGPU will increase to ubiquity exactly as fp16 will increase to ubiquity. I don't understand the difference given the design.
More things to consider, given 2 possible implementations (validation+hint, and emulation):
**Emulation may be slow for things that would be fast using validation+hint**

It's not clear emulation will always be fast for paths that would be fast under the validation+hint scheme. For example, 6-layer 2d-array vs cube-map: in the validation+hint impl, the user provides a `viewDimension` hint. In the emulation impl, the impl probably guesses? If 6 layers then TEXTURE_CUBE_MAP (because more common?), if not 6 layers then TEXTURE_2D_ARRAY. If the user then uses a 6-layer texture as a 2d-array, the implementation would have to emulate 2d-array logic on top of a TEXTURE_CUBE_MAP. That emulation appears to be pretty slow (I wrote it as an exercise); in particular, it's slow because you need to emulate wrapping and clamping at the edges. So this path, which would be faster with the validation+hint solution, is significantly slower with the emulation solution.

Maybe you could keep a copy of the texture data and generate both TEXTURE_2D_ARRAY and TEXTURE_CUBE_MAP when used. But it seems unacceptable to keep a CPU copy of the texture data to generate the 2d-array or cube-map as needed, because that would use 2x-3x the memory; nor can you convert the texture from cube-map to 2d-array, since if the texture is a non-copyable format (like a compressed texture) then you have no way to do the conversion.

Further, it doesn't appear you can wait until first use to decide what to do on the backend (TEXTURE_2D_ARRAY or TEXTURE_CUBE_MAP). For one, that would require more memory, as any texture yet to be used needs to keep a CPU copy of the data while waiting for first use. A user could easily upload lots of textures first, only use them later, and run out of CPU memory as the backend holds its copies. Further, all of the functions that operate on a texture would have to have 2 paths: one that works on the CPU copy, and the existing ones that work on a real texture (so writeTexture, copyTextureToTexture, copyBufferToTexture, copyTextureToBuffer, etc.). Because the only signal which actually lets the impl know which type of texture to make on the backend is TEXTURE_ATTACHMENT usage.

Another issue with deciding on first use is that first use is often generating mipmaps. So the user loads 6 layers and generates mips; the implementation now has to choose: is it TEXTURE_2D_ARRAY or TEXTURE_CUBE_MAP? Whichever one it picks, if it picks wrong, the user gets the slow path. We could warn the user, but now the user needs some really strange workaround where they use the texture as TEXTURE_ATTACHMENT with the view they really want to see in the end, before actually generating mips, all to hopefully coerce the impl to choose the faster path. It seems better to let the user just tell it their intent via a hint.

Yet another memory issue is shader variations. Doing the emulation requires baking emulated sampler parameters into the shader PER emulated texture. In practice I suspect there wouldn't be that many variations used, but as it is there are 18 bits of info needed per emulated texture, so the space of possible shader variants is large.

**Emulation is potentially a waste of time**

If we go about implementing slow emulation solutions for compat, we'll likely want to tell users they're hitting the slow path. Concrete examples:
So, the user uses a 6-layer texture as a 2d-array, and the browser prints to the dev-console "this is really slow, consider a different solution". It seems strange to spend a bunch of time writing slow emulation just to then tell the user "don't use this". There's an assumption here that emulation is not slow because it has to compile a shader; it's slow because the resulting shader doing the emulation is slow.
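For reference, the core of that 2d-array-on-cube-map exercise looks something like the sketch below: mapping (uv, layer) back to a cube direction. The face-orientation table is the standard GL cube-map layout; the wrap/clamp emulation at face edges, which is the genuinely slow part described above, is omitted.

```ts
// WGSL helper a compat translator might emit when a 2d-array view is
// emulated on top of TEXTURE_CUBE_MAP (names illustrative).
const uvLayerToCubeDirWGSL = /* wgsl */ `
fn uv_layer_to_cube_dir(uv: vec2f, layer: u32) -> vec3f {
  // Map [0,1] uv to [-1,1] face coordinates.
  let c = uv * 2.0 - 1.0;
  switch layer {
    case 0u: { return vec3f( 1.0, -c.y, -c.x); } // +X
    case 1u: { return vec3f(-1.0, -c.y,  c.x); } // -X
    case 2u: { return vec3f( c.x,  1.0,  c.y); } // +Y
    case 3u: { return vec3f( c.x, -1.0, -c.y); } // -Y
    case 4u: { return vec3f( c.x, -c.y,  1.0); } // +Z
    default: { return vec3f(-c.x, -c.y, -1.0); } // -Z
  }
}`;
```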
Thanks to all for your thoughts and ideas! While I do think the Compatibility mode is still the better option, I think there are merits to the proposal to implement as much of WebGPU as possible. If further core functionality can be made to work and the community can agree to unship those features which cannot, it would obviate the need for the compatibility flag. This would make it possible to run the large majority of existing core WebGPU content without modification.

However, one baseline condition I feel we must meet is that any further core functionality implemented would not hinder performance on the common use cases that the Compatibility proposal already allows. If emulation for the rare cases impedes performance in the common case, it's a non-starter. I believe that makes the viewDimension on GPUTextureDescriptor a requirement, even if it is only a hint. Otherwise, in the case of the 6-layer array <> cube map ambiguity, for example, creating a view of non-default dimension would incur emulation, regardless of the default chosen, resulting in more texture samples and more ALU (or texture copying) than would be needed with a specified viewDimension. If I'm correct, then perhaps a first step would be to debate and standardize the viewDimension hint on its own.

Another concern I have is that providing warnings on slow paths in the dev console would have to be implemented on all browsers and all backends, even on those platforms which do not need to incur them. Otherwise, the developer may hit those unexpected performance cliffs. It would be best if those warnings were incorporated in the spec.

That said, I'd like to focus on the issues that we think are difficult to implement efficiently, such as per-attachment blend state and copying of compressed textures. If we cannot agree to unship those features, then a compatibility flag becomes the better option, and we can debate the other issues (such as texture views) knowing that we are heading towards consensus rather than stumbling on possible deal-breakers. Another one I'd like to draw your attention to is the lack of baseVertex/baseInstance in indirect draws. (My fault: I placed it in the list of workarounds, since direct draws can be emulated. Will fix!)
Want to mention a few more restrictions imposed by trying to run on top of OpenGL ES 3.1:
Limits:
Features:
There might be more restrictions but these are the additional ones I found so far. It would be helpful to have a tool similar to https://github.com/kainino0x/gpuinfo-vulkan-query but for OpenGL ES.
Also, has anyone checked if OpenGL ES 3.1+ has the same level of support for the texture formats we currently have in the spec (and more importantly their capabilities)?
GPU Web 2023-08-16 (Atlantic-timed)
As requested, here's a doc which sketches out the spec changes required for a single issue, in this case per-attachment blend state, from both the "unship and make optional Feature" approach and the "compatibilityMode" approach: https://github.com/SenorBlanco/webgpu-per-attachment-blend-state/blob/main/proposal.md (Please forgive me for non-standard or weird formatting or terminology; I'm not a spec editor.)
We discussed internally and would prefer the compat flag approach over unshipping features from v1. Per-attachment blend state is something that cannot easily be polyfilled, so it seems reasonable to move this behind the compat flag.
Note: removed the 6-layer cube map default from the viewDimension defaults (to match the consensus proposal).
We discussed some of the remaining open compat mode issues:
Looking at the specs:
❌ = fn with
It's odd that
The
We are ok with the current proposal and restrictions for compat mode; specifically, restrictions (7), (8), and (10) would be fine to have as restrictions.
@mwyrzykowski in #3838 (comment) asked if we can still support `Apple3` family devices. Quoting my comment here #3838 (comment):
We already covered the lack of cube map texture arrays and agreed to validate those out since OpenGL ES 3.1 also doesn't have those. The remaining item is storage texture support in fragment (and vertex) shaders. @mwyrzykowski it would be worth double-checking that I haven't missed any other capabilities that the core spec requires and `Apple3` doesn't have.
GL_MAX_FRAGMENT_IMAGE_UNIFORMS has a minimum value of 0, as does GL_MAX_VERTEX_IMAGE_UNIFORMS. Only GL_MAX_COMPUTE_IMAGE_UNIFORMS has a non-zero minimum (4). We might have to look at the values on actual devices, but if we were to set the Compat limits based on the ES 3.1 limits, devices lacking storage textures in fragment shaders would be compliant.
@teoxoy I did not find any other capabilities that the core spec requires and Apple3 lacks.
Just to follow up: there are no reports with zero GL_MAX_FRAGMENT_IMAGE_UNIFORMS or GL_MAX_VERTEX_IMAGE_UNIFORMS on ES 3.1 on gpuinfo.org: https://opengles.gpuinfo.org/displaycapability.php?name=GL_MAX_FRAGMENT_IMAGE_UNIFORMS&esversion=31. I'm not sure this is conclusive; I need to look at the spec sheets for some devices to verify that there aren't any other popular devices with zero maximums. The limitation I see above in Apple3 is with respect to read/write storage textures; are we sure that applies to write-only storage textures as well?
It was previously called "Function Texture Read-Writes". Looking at the article below, it talks about writable textures, which include read-write ones.
I think this is another thing that needs to be added to the list. WebGPU supports a stride of 0 for attributes. OpenGL ES does not. Stride of 0 in GL = advance by the size of the attribute. Stride of 0 in WebGPU = a stride of 0, as in don't advance the attribute; just keep reading the same value. I guess in Compat a stride of 0 in an attribute would generate a validation error? Otherwise you could read the value in the buffer, turn off the attribute (gl.disableVertexAttribArray), and set the attribute's constant value (gl.vertexAttrib4f(...valueReadFromBuffer))
I am pretty sure we only ever disallowed storage textures (which at that point were always write-only) in vertex shaders, which would mean Apple3 must have supported write-only in fragment shaders. You can verify this from the Metal Feature Set Tables: in the "texture capabilities by pixel format" table, a lot of formats have the "All" or "Write" capabilities all the way back to Apple2 (Apple1 if you look at older copies of the PDF). I think these apply to both fragment and compute.
I think this means "Before, only fragment functions could write to textures. Now, both vertex and fragment can write to textures."
@kainino0x yes that appears to be correct and is my understanding as well.
I see. It's an interesting choice of wording; I would have expected something along the lines of: "Vertex functions can now write to textures (in addition to fragment and kernel functions)."
It would be worth double-checking with someone from the Metal team, as declaring a write-only texture in a fragment function didn't use to work prior to MSL 1.2 (which came out at the same time as the "Function Texture Read-Writes" capability). See https://shader-playground.timjones.io/f08d3d49dcb231d15b062c614baf6002; found in gfx-rs/naga#2486 (comment). It could be that this was just a compiler limitation and not a hardware one, meaning that if you are using MSL 1.2+ and running on any GPU family (even `Apple2`), it would work.
Checking into this: the CTS tests this here: https://github.com/gpuweb/cts/blob/6e75f19212e3deaebd5bd8542fe10a6fdedc0cdf/src/webgpu/api/operation/vertex_state/correctness.spec.ts#L728

In OpenGL this can be emulated by setting the divisor for the attribute to a high number
Which is what Dawn is currently doing. So I guess no spec changes are needed for this. Just passing on the existing solution.
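A minimal sketch of that divisor trick, in WebGL2-flavored TypeScript (the function and its parameters are illustrative, not Dawn's actual code):

```ts
// Bind an attribute with WebGPU stride-0 semantics on a GL backend: a huge
// divisor means the attribute pointer effectively never advances, so every
// vertex/instance reads the value at offset 0.
function bindStrideZeroAttrib(
  gl: WebGL2RenderingContext,
  loc: number,
  size: number,   // components per attribute, e.g. 4 for a vec4
  type: number,   // e.g. gl.FLOAT
): void {
  gl.enableVertexAttribArray(loc);
  gl.vertexAttribPointer(loc, size, type, false, 0, 0);
  gl.vertexAttribDivisor(loc, 0x7fffffff); // never advances in practice
}
```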
GPU Web 2023-11-29 Atlantic-time
Problem
WebGPU is a good match for modern explicit graphics APIs such as Vulkan, Metal and D3D12. However, there are a large number of devices which do not yet support those APIs. In particular, on Windows, 31% of Chrome users do not have D3D11.1 or higher. On Android, 23% of users do not have Vulkan 1.1 (15% do not have Vulkan at all). On ChromeOS, Vulkan penetration is still quite low, while OpenGL ES 3.1 is ubiquitous.
Goals
The primary goal of WebGPU Compatibility mode is to increase the reach of WebGPU by providing an opt-in, slightly restricted subset of WebGPU which will run on older APIs such as D3D11 and OpenGL ES. This will increase adoption of WebGPU applications via a wider userbase.
Since WebGPU Compatibility mode is a subset of WebGPU, all valid Compatibility mode applications are also valid WebGPU applications. Consequently, Compatibility mode applications will also run on user agents which do not support Compatibility mode. Such user agents will simply ignore the option requesting a Compatibility mode Adapter and return a Core WebGPU Adapter instead.
WebGPU Spec Changes
```webidl
partial dictionary GPURequestAdapterOptions {
    boolean compatibilityMode = false;
};
```
When calling `GPU.RequestAdapter()`, passing `compatibilityMode = true` in the `GPURequestAdapterOptions` will indicate to the User Agent to select the Compatibility subset of WebGPU. Any Devices created from the resulting Adapter on supporting UAs will support only Compatibility mode. Calls to APIs unsupported by Compatibility mode will result in validation errors.

Note that a supporting User Agent may return a `compatibilityMode = true` Adapter which is backed by a fully WebGPU-capable hardware adapter, such as D3D12, Metal or Vulkan, so long as it validates all subsequent API calls made on the Adapter and the objects it vends against the Compatibility subset.

As a convenience to the developer, the Adapter returned will have the `isCompatibilityMode` property set to `true`.

See "Texture view dimension can be specified", below.
Compatibility mode restrictions
1. Texture view dimension may be specified
When specifying a texture, a `textureBindingViewDimension` property determines the views which can be bound from that texture for sampling (see "Proposed IDL changes", above). Binding a view of a different dimension for sampling than specified at texture creation time will cause a validation error. If `textureBindingViewDimension` is unspecified, use the same algorithm as `createView()`.

Justification: OpenGL ES does not support texture views.
Alternatives considered:
- add `viewDimension` to `GPUTextureDescriptor`, as above, but make it mandatory; not specifying a viewDimension is a validation error
- make a view dimension guess at texture creation time, and perform a texture-to-texture copy at bind time if the guess was incorrect
- make a view dimension guess at texture creation time, and perform a texture-to-texture copy on first binding if the guess was incorrect; all subsequent views bound for sampling from that texture must have the same dimension as the first use, else a validation error occurs
- disallow 6-layer 2D arrays (always cube maps)
- disallow cube maps (always create 6-layer 2D arrays)
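For illustration, creating a cube-compatible texture under this proposal might look as follows; `textureBindingViewDimension` is the proposed property and is not part of shipped WebGPU, hence the cast:

```ts
// Hypothetical usage of the proposed hint.
function createCubeCompatible(device: GPUDevice): GPUTexture {
  return device.createTexture({
    size: [256, 256, 6],
    format: "rgba8unorm",
    usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST,
    textureBindingViewDimension: "cube", // lets a GLES backend allocate TEXTURE_CUBE_MAP up front
  } as GPUTextureDescriptor);
}
// texture.createView({ dimension: "cube" })     -> OK, matches the declared dimension
// texture.createView({ dimension: "2d-array" }) -> validation error when bound for sampling
```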
2. Disallow `CommandEncoder.copyTextureToBuffer()` and `CommandEncoder.copyTextureToTexture()` for compressed texture formats

`CommandEncoder.copyTextureToBuffer()` and `CommandEncoder.copyTextureToTexture()` of a compressed texture are disallowed, and will result in a validation error.

Justification: Compressed texture formats are non-renderable in OpenGL ES, and glReadPixels() only works on a framebuffer-complete FBO. Additionally, because ES 3.1 does not support glCopyImageSubData(), texture-to-texture copies must be worked around with glBlitFramebuffer(). Since compressed textures cannot be bound for rendering, they cannot use the glBlitFramebuffer() workaround.
Alternatives considered:
3. Views of the same texture used in a single draw may not differ in mip level or array layer parameters.
A draw call may not reference the same texture with two views differing in `baseMipLevel`, `mipLevelCount`, `baseArrayLayer`, or `arrayLayerCount`. Only a single mip level range and array layer range per texture is supported. This is enforced via validation at encode time.

Justification: OpenGL ES does not support texture views.
Alternatives considered:
4. Color state `alphaBlend`, `colorBlend` and `writeMask` may not differ between color attachments in a single draw

Color state descriptors used in a single draw must have the same `alphaBlend`, `colorBlend` and `writeMask`, or else an encode-time validation error will occur.
Justification: OpenGL ES 3.1 does not support indexed draw buffer state.
Alternatives considered:
- `GL_EXT_draw_buffers_indexed`; however, `GL_EXT_draw_buffers_indexed` has limited support (~42%)

5. Disallow `sample_mask` builtin in WGSL.

Justification: OpenGL ES 3.1 does not support `gl_SampleMask`, `gl_SampleMaskIn`.

Alternatives considered:
- `GL_OES_sample_variables`; however, `GL_OES_sample_variables` has limited support (~48%)

6. Disallow `GPUTextureViewDimension` `"CubeArray"` via validation

Justification: OpenGL ES does not support Cube Array textures.
Alternatives Considered:
7. Disallow `textureLoad()` of depth textures in WGSL via validation.

Justification: OpenGL ES does not support `texelFetch()` of a depth texture.

Alternatives considered:
- `texture()` with quantized texture coordinates; massage the results

8. Disallow `texture*()` of a `texture_depth_2d_array` with an offset

Justification: OpenGL ES does not support `textureOffset()` on a `sampler2DArrayShadow`.

Alternatives considered:
- a `texture()` call and use ALU for the offset

9. Emit `dpdx()` and `dpdy()` for all derivative functions (including Coarse and Fine variants).

Justification: GLSL does not support `dFd*Coarse()` or `dFd*Fine()` functions. However, these variants can be interpreted as a hint in WGSL, and emitted as `dFd*()`.

Alternatives considered:
- disallow `Coarse` and `Fine` variants via validation in WGSL
- `Coarse` is allowed; `Fine` is disallowed via validation

10. Disallow bgra8unorm-srgb textures.

Justification: OpenGL ES does not support sRGB BGRA texture formats.

Alternatives considered:
- swizzle on `copyBufferToTexture()` and the reverse on `copyTextureToBuffer()`
Compatibility mode workarounds
The features below are not supported natively in OpenGL ES, but it is proposed to implement them in the User Agent via workarounds.
1. Emulate `copyTextureToBuffer()` of depth/stencil textures with a compute shader

Justification: OpenGL ES does not support `glReadPixels()` of depth/stencil textures.

Alternatives considered:
- `GL_NV_read_depth_stencil`

2. Emulate `copyTextureToBuffer()` of SNORM textures with a compute shader

Justification: OpenGL ES does not support `glReadPixels()` of SNORM textures.

Alternatives considered:
3. Emulate separate sampler and texture objects with a cache of combined texture/samplers.
Justification: OpenGL ES does not support separate sampler and texture objects.
Alternatives considered:
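To make workaround 3 concrete, one plausible shape of the combined texture/sampler cache is sketched below (identifiers are illustrative): WGSL separates textures from samplers, but GLSL ES uses combined sampler uniforms, so the translator can emit one combined uniform per (texture, sampler) pair and assign each pair its own GL texture unit.

```ts
// Sketch of a combined texture/sampler cache for a GLES backend.
class CombinedSamplerCache {
  private units = new Map<string, number>();

  // Returns a stable texture unit index for a (texture, sampler) pair.
  unitFor(textureId: number, samplerId: number): number {
    const key = `${textureId}:${samplerId}`;
    let unit = this.units.get(key);
    if (unit === undefined) {
      unit = this.units.size; // next free unit; real code would recycle units
      this.units.set(key, unit);
    }
    return unit;
  }
}
```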
4. Inject hidden uniforms for `textureNumLevels()` and `textureNumSamples()` where required.

Justification: OpenGL ES 3.1 does not support textureQueryLevels() (only added to desktop GL in OpenGL 4.3).

Alternatives Considered:
- disallow `textureNumLevels()` and `textureNumSamples()` in WGSL via validation

5. Emulate 1D textures with 2D textures.

Justification: OpenGL ES does not support 1D textures.

Alternatives Considered:

6. Manually pad out GLSL structs and interface blocks to support explicit `@align` or `@size` decorations.

Justification: OpenGL ES does not support offset= interface block decorations on anything but `atomic_uint`.

Alternatives considered:
- disallow `@align` and `@size` on WGSL structs via validation

7. Use `GL_ext_texture_format_BGRA8888` to support BGRA `copyBufferToTexture()`, and swizzle workarounds and RGBA textures where unavailable.

Justification: OpenGL ES does not support BGRA texture formats. `GL_ext_texture_format_BGRA8888` supports texture uploads and the BGRA8888 texture format, and has 99%+ support. The vast majority of devices which do not support it are GLES 3.0 implementations, and so would not support Compatibility mode anyway, but if an important device emerges, a CPU- or GPU-based swizzle workaround and RGBA textures should be implemented.

Alternatives considered:

8. Work around lack of BGRA support in copyTextureToBuffer() via compute or sampling.

Justification: OpenGL ES does not support BGRA texture formats for `glReadPixels()`, even with the `GL_ext_texture_format_BGRA8888` extension.

Alternatives considered:

9. Use emulation workaround to support BaseVertex / BaseInstance in direct draws. Disallow via validation in indirect draws.

Justification: OpenGL ES 3.1 does not support `baseVertex` or `baseInstance` parameters in Draw calls.

Alternatives considered:
- `OES_draw_elements_base_vertex` (21% support) or `EXT_draw_elements_base_vertex` (21%) and `GL_EXT_base_instance` (1.7%)
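For illustration, the direct-draw emulation in workaround 9 could take roughly this shape (a hypothetical sketch, not Dawn's implementation): shift each vertex-buffer attribute pointer by baseVertex * arrayStride before the draw, and upload baseVertex/baseInstance as uniforms so the WGSL builtins still report the right values. Indirect draws can't be handled this way because the values live GPU-side, hence the validation rule.

```ts
// Hypothetical sketch of emulating baseVertex in a direct indexed draw.
interface AttribBinding {
  buffer: WebGLBuffer;
  loc: number;
  size: number;   // components per attribute
  type: number;   // e.g. gl.FLOAT
  stride: number; // bytes per element
}

function emulatedDrawIndexed(
  gl: WebGL2RenderingContext,
  attribs: AttribBinding[],
  indexCount: number,
  firstIndex: number,
  baseVertex: number,
  instanceCount: number,
): void {
  for (const a of attribs) {
    gl.bindBuffer(gl.ARRAY_BUFFER, a.buffer);
    // Re-point the attribute baseVertex elements into its buffer.
    gl.vertexAttribPointer(a.loc, a.size, a.type, false, a.stride,
                           baseVertex * a.stride);
  }
  gl.drawElementsInstanced(gl.TRIANGLES, indexCount, gl.UNSIGNED_INT,
                           firstIndex * 4 /* byte offset of first index */,
                           instanceCount);
}
```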