FP16 support #658
cc #230
Definite +1 from us for fp16/fp64 extensions; we've been expecting them for a long time. Possibly separate extensions for 16-bit (and 8-bit) reads/writes from memory.
+CC @qjia7, who was investigating the very same thing from the TF.js side. Unfortunately I don't think Vulkan requires support for FP16, so it should be an extension. Also, one thing to be careful about is that there are two separate capabilities: being able to load FP16 data from buffers, and having FP16 ALUs. We have to figure out how we want to expose them.
So, if …
Two options IMO:
A third option is GLSL-style minimum precision guarantees. This is particularly valuable for doing basic math on lowp float (~float9) data for colors, and keeping colors packed into 32 bits.
For shader variables that makes sense to me, though technically it does generate a portability issue since it's not possible to know you're testing with the lowest possible precision. WebGL users run into this occasionally, but not often, since hardware is in practice not that inconsistent. |
I agree FP16 is a highly desirable feature. Sadly, support is not universal among Vulkan devices (but growing). So I agree this would have to be an extension. Let's keep it simple and have a single extension to enable the feature. Vulkan split it into a storage (load/store) feature for certain storage classes, and a distinct arithmetic feature. The apparent motivation was that some devices supported one and not the other, and vice versa. Let's avoid that. Let's make FP16 one feature in WebGPU. |
SPIR-V opted to model GLSL lowp and mediump as the RelaxedPrecision decoration on 32-bit operations.
Some implementations do take advantage of this RelaxedPrecision feature to attain better performance and energy usage. If I were king, I'd rather the world adopt FP16 instead. RelaxedPrecision feels like a half-step we should avoid with WebGPU. |
I forgot to mention where to get more info: The Vulkan 16bit float arithmetic feature bit is "shaderFloat16" from "VK_KHR_shader_float16_int8"
The Vulkan 16bit storage features are from "VK_KHR_16bit_storage"
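To make those feature bits concrete, here is a hedged C++ sketch of querying both Vulkan capabilities on an already-selected `VkPhysicalDevice` (assumes the Vulkan headers and the two extensions being available; the combined-check policy at the end reflects the "one WebGPU feature" suggestion above, not anything normative):

```cpp
#include <vulkan/vulkan.h>

// Sketch: query the two capability groups named above.
// shaderFloat16 gates 16-bit arithmetic (VK_KHR_shader_float16_int8);
// the 16-bit storage bits gate f16 loads/stores per storage class
// (VK_KHR_16bit_storage).
bool supportsFp16(VkPhysicalDevice gpu) {
    VkPhysicalDevice16BitStorageFeaturesKHR storage16{};
    storage16.sType =
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_16BIT_STORAGE_FEATURES_KHR;

    VkPhysicalDeviceFloat16Int8FeaturesKHR float16{};
    float16.sType =
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT16_INT8_FEATURES_KHR;
    float16.pNext = &storage16;

    VkPhysicalDeviceFeatures2 features2{};
    features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    features2.pNext = &float16;

    vkGetPhysicalDeviceFeatures2(gpu, &features2);

    // A single WebGPU-style FP16 feature would require both groups,
    // per the "make FP16 one feature" suggestion in this thread.
    return float16.shaderFloat16 == VK_TRUE &&
           storage16.storageBuffer16BitAccess == VK_TRUE;
}
```

A real implementation would also check that the corresponding device extensions are enabled before chaining these structs.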
It would be valuable to enable devs to tag things as lower precision than full float32, whether that's float16 or something more vague (but perhaps more flexible). Does Qualcomm Android really not have a non-float32 arithmetic path for SPIR-V? That seems surprising to me. I had thought I'd seen cases where moving the same GLSL from desktop to mobile caused lack-of-precision artifacts! Maybe this was related to input/output loads/stores, not arithmetic?
If you unroll the loops, you can get much closer to the theoretical maximum: This represents a 44% progression on that same iPhone 11 Pro. |
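The unrolling gain comes from putting more independent operations in flight per iteration. The thread's benchmarks are Metal shaders, but as a hedged CPU-side illustration of the same technique, a hypothetical 4-way unrolled reduction with independent accumulators looks like:

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: 4-way manually unrolled sum with independent
// accumulators. Each iteration issues four adds with no dependency
// chain between them, which helps hide ALU latency -- the same idea
// as unrolling the shader loops described above.
float sumUnrolled4(const std::vector<float>& v) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    std::size_t i = 0, n = v.size();
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    for (; i < n; ++i) a0 += v[i];  // remainder elements
    return (a0 + a1) + (a2 + a3);
}
```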
Recently, I rewrote the dawn/examples/ComputeBoids example with FP16 arithmetic and FP16 data load/store, and also got nearly a 50% performance improvement on the Vulkan backend.
Like @dneto0 points out, as a developer f16 support would be amazing to have. However, support for it is not as universal as we'd like, so we typically would only ship f16-based code on fixed platforms (console/phone) where we can guarantee its availability.
There are a couple of things going on here. If you're only limited by memory bandwidth, then load/store of FP16 values is what you need. So that might explain why there wasn't enough pressure early enough to force this issue in the Android space. But I certainly believe there are also gains when you only do the arithmetic in 16 bits.
I'm just explaining backstory here. I'm not advocating any particular path. Regarding mediump and lowp: as I understand it, mediump and lowp caused a lot of grief due to variability between devices. In the move to Vulkan, they were remapped to SPIR-V RelaxedPrecision. Yes, this still allows some painful variability (I've helped customers through this). A slight mitigation is that you can clamp the precision of a result to 16 bits with the funky OpQuantizeToF16: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpQuantizeToF16
Currently, the TypedArray-for-FP16 proposal is at Stage 1 of the TC39 process, but without active progress. I think it's an important proposal for this issue, so can we work on it?
Thanks for the heads-up, though JS FP16 support is not required (and not directly important) for WebGPU. (WebGL already has FP16 resources.) Indeed, most native work is done in languages (all of them?) without first-class FP16 support. Manual F16<->double conversion is pretty similar in C++ and in JS, which is proof-of-MVP to me.
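As a concrete illustration of that manual conversion, here is a hedged C++ sketch of IEEE 754 binary32 <-> binary16 bit manipulation (the helper names are hypothetical, and rounding is simplified rather than fully round-to-nearest-even, so this is not a drop-in production converter):

```cpp
#include <cstdint>
#include <cstring>

// Convert binary32 -> binary16 by rebiasing the exponent and
// truncating/rounding the mantissa from 23 to 10 bits.
uint16_t floatToHalf(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    int32_t  exp  = static_cast<int32_t>((x >> 23) & 0xFFu) - 127 + 15;
    uint32_t mant = x & 0x7FFFFFu;

    if (((x >> 23) & 0xFFu) == 0xFFu)                   // Inf or NaN
        return static_cast<uint16_t>(sign | 0x7C00u | (mant ? 0x200u : 0u));
    if (exp >= 0x1F)                                    // overflow -> Inf
        return static_cast<uint16_t>(sign | 0x7C00u);
    if (exp <= 0) {                                     // subnormal or zero
        if (exp < -10) return static_cast<uint16_t>(sign);
        mant |= 0x800000u;                              // implicit leading 1
        uint32_t shift = static_cast<uint32_t>(14 - exp);
        uint32_t h = mant >> shift;
        if (mant & (1u << (shift - 1))) h++;            // simple rounding
        return static_cast<uint16_t>(sign | h);
    }
    uint32_t h = sign | (static_cast<uint32_t>(exp) << 10) | (mant >> 13);
    if (mant & 0x1000u) h++;    // round up; carry into exponent is correct
    return static_cast<uint16_t>(h);
}

// Convert binary16 -> binary32, renormalizing half subnormals.
float halfToFloat(uint16_t h) {
    uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0x1Fu) {                                 // Inf or NaN
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp == 0) {
        if (mant == 0) {                                // +/- zero
            bits = sign;
        } else {                                        // subnormal
            exp = 127 - 15 + 1;
            while (!(mant & 0x400u)) { mant <<= 1; exp--; }
            bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
        }
    } else {
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Round-tripping a value through these two helpers also clamps its precision to 16 bits, which is essentially what the OpQuantizeToF16 instruction mentioned earlier does on the SPIR-V side.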
@litherum, your demo did not check whether Metal supports true FP16. I read the Metal spec, and it does not state requirements (for example, a minimum Metal version or a hardware query result) for true FP16 support. Do all Mac devices support true 16-bit floats in Metal?
Discussed at the 2020-04-14 meeting. |
Resolution was FP16 supported as an optional extension covering computation and storage, but with some follow-up issues to be raised (e.g. @dneto0 on quantize). Interpolation is not included.
I said in the meeting that:
Spec, testing, and tooling work will compete for staff time for actually-minimum-viable work. |
We believe this is, actually, part of the minimum set. In our team's experience, many shaders just straight-up won't run at reasonable speeds without FP16 support. Its absence makes many apps unusable.
Do you have a proposal to achieve that? Relaxed precision? |
According to the conversation during this week's call, the group seemed to agree that an extension was the right direction. (I'm not saying FP16 should be part of core. I'm saying FP16 should be usable on iPhones in the first software release of WebGPU) |
Ok, sorry, I thought "minimum set" meant core and you were disagreeing with the result from the meeting. Agreed with having it at release, but I think that's a vendor decision. (I hope we will be able to do it as well.) |
Closing this issue, as the F16 extension against spec PR #2696 has been merged. |
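For readers landing here, a minimal hedged sketch of what the merged extension looks like in use: a WGSL module opts in with `enable f16;` after the application requests the `shader-f16` feature on its GPUDevice. The kernel below is illustrative only (the buffer layout and function names are made up), showing both f16 storage access and f16 arithmetic, the two capabilities this thread resolved to bundle:

```wgsl
// Requires the "shader-f16" feature on the GPUDevice.
enable f16;

@group(0) @binding(0) var<storage, read_write> data : array<f16>;

@compute @workgroup_size(64)
fn scale(@builtin(global_invocation_id) gid : vec3<u32>) {
  // f16 load, f16 arithmetic (note the 'h' literal suffix), f16 store.
  let x : f16 = data[gid.x];
  data[gid.x] = x * 0.5h;
}
```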
FP16 provides significant benefits over FP32:
I wanted to characterize the ALU performance, so I made a small Metal benchmark to execute on iOS. Here are the shaders:
When running on an iPhone 11 Pro, here are the results:
As you can see, FP16 is a demonstrable 24.9% progression. Theoretically, it could be a 50% progression on this device. This, coupled with the significant decrease in memory footprint, indicates the feature is important to include in WGSL.