Buffer indices should be unsigned #1135
I assume that since we have a resolution we don't need to discuss this? We only need a PR?
I don't think we had finalized this. Background: in the meeting I mentioned that SPIR-V / Vulkan treat array indices as signed. Since the meeting, I learned that LLVM's LangRef is clear that indices are treated as signed integers.
...
An implication: for arrays with more than 2**31-1 elements, you can't access the upper elements. If you want something in that range, you have to use a wider int width for the index. That's perfectly reasonable. Also, I dispute the implication that signed index checks are necessarily less efficient.
I think in the long term we should allow either signed or unsigned indices. We would need the limitation in any case that the index values be limited to 0 .. max-signed-int. I'd support limiting it to one of the options (signed or unsigned) as a work-reduction choice for MVP.
Thanks @dneto0 for the elaboration and the info! First, let's talk about the limit. If the user wants to access anything at index 2^31 or higher, what would be the size of the buffer containing that data? Even for half-floats, 2 bytes each, the buffer has to be at least 4GB, and all of it must be visible to shaders. I don't think WebGPU has to support storage buffer bindings that are that large:
With that in mind, WebGPU will have a limit on the storage buffer binding size, with the baseline likely well below the 4GB mark. Therefore, implementations can be confident that index 2^31 and higher is never going to be valid anyway. So this addresses the note that 2^31 will have to be a limit even if unsigned indices are used, because in fact our limit will be even lower. And it makes unsigned indices a logical choice going forward, I think.
I think this argument would convince me if we knew the maximum size of a resource at shader compilation time. GPU memory sizes are increasing; indeed, modern GPUs regularly have > 4GB of memory, so we can't really use the device memory size as a signal that it's okay to cast the signed index to an unsigned and use a single comparison. However, I'd guess that the average resource (view?) size in common applications isn't growing past 4GB any time soon (in accordance with @kvark's comment above). So I think the fundamental point is correct: casting a signed index to an unsigned before running the conditional should work for most resources. If there was a way to know at shader generation time that it was safe to take this shortcut in a future-proof way, I think that would solve this problem.
See also recently filed #1371 |
FYI. The spec already allows you to use either signed or unsigned indices. https://gpuweb.github.io/gpuweb/wgsl/#array-access-expr Maybe it's hard to see because of the "Int" metavariable. |
A quick round-up on what I was saying on the call:
It's no worse than having the addition operator (+) work on both signed operands and unsigned operands. It's going to be very, very common in users' source code. I think supporting both int and uint has a much higher benefit-to-cost ratio compared to #1588 (scalar weight on the mix builtin function). Note: texture sizes (dimensions and levels) are provided or returned as signed integers. It would be weird to disallow signed array indices.
I don't see where pipeline layout includes information about the maximum size a buffer can be; all I can see is the minimum size a buffer can be: https://gpuweb.github.io/gpuweb/#dom-gpubufferbindinglayout-minbindingsize. Regardless, I think this is what the generated code would hold, if we could somehow know:
I'm worried about the "Cast signed index to unsigned index" cell in Metal,
It's not a pipeline layout option; it's two global limits, on uniform and storage buffer binding sizes:
Right. If the "maximum buffer size" concept is per-device rather than per-buffer, then we can't differentiate between the first 3 rows of that table above, so we have to pessimize when we generate code and use the 4th row.
The maximum storage buffer binding size default is only 128MiB. Wouldn't it have to go >2GiB (actually >8GiB when only ≥4-byte loads are supported (the default), >4GiB when only ≥2-byte loads are supported) before we start falling into the 4th row? Sorry if I'm totally missing something.
From a quick bit of research, it appears that signed-to-unsigned conversion is well-defined in C++, and only unsigned-to-signed is implementation-defined. https://stackoverflow.com/a/43336256
2-byte loads are important for us; we'll probably implement them at the same time as implementing 4-byte loads. But your general point is correct - we will hit this when we want larger buffers, which the industry is moving towards. Big graphics cards you can buy today often already have > 10GB of memory. I expect the industry will want to go beyond 128MB quite soon, and I don't think it will be that long before the industry wants to go beyond 4GB. So, the question is: When that happens, and our software runs on a device which has lots of memory, and we want to expose large buffers, what are we going to do? Are we going to:
I did not know this! Good to know.
They'll still be gated on an optional feature, so you can use the device configuration to determine when it's needed.
I haven't been watching this issue closely enough, but my uninformed opinion is that this sounds great.
WGSL meeting minutes 2021-05-11
We're discussing the options around:
Initially:
With the question of: what is the plan for very large arrays, i.e. element count larger than i32 max value. One option is:
The biggest problem I see with this is:
I think trying to make this case smoother starts to get into automatic promotion from smaller scalar types to larger scalar types. I don't think I want to go in that direction.
If the indices can be either i32 or u32, does it mean that we need a
@dneto0, your example seems wrong. Did you mean
It's materially the same, for example, when i = -5:
I meant i32. But it could have been u32 as well.
i32 is never problematic, though. The range for u32 between SIGNED_INT_MAX and UNSIGNED_INT_MAX is the part that needs to be spelled out for expansion.
Thinking about this more, we can make life easier on the programmer, and get rid of the weird "code doesn't compile" case. Modify the proposal to:
This is exactly what both LLVM and SPIR-V do. So, restating completely:
Initially:
Future expansion:
This is simpler for the user, and a closer match to both LLVM and SPIR-V. That meshes well with the compiler stack underlying MSL, DXC/DXIL, and Vulkan.
Well, WGSL doesn't actually have any 64-bit types yet, so we are de facto already limiting ourselves to the range 0 .. MAX_UINT32. (For now! Using 64-bit types at all will require an extension anyway. I don't see any 64-bit integral types on https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar so it looks like D3D wouldn't be able to offer this extension when we get around to spec'ing it.) I believe this discussion is about "what happens when indices are between MAX_INT32 .. MAX_UINT32".
It's unfortunate, but those docs are quite out of date and haven't been updated for what DXC supports. There is a more up-to-date table on this DXC wiki page, though it isn't titled well: https://github.com/microsoft/DirectXShaderCompiler/wiki/16-Bit-Scalar-Types. It includes 64-bit signed and unsigned int types, which are supported under an optional feature flag. It also includes explicit 16-bit int and float types (rather than the fuzzy min-precision ones), also supported under an optional feature flag. This doesn't mean that HLSL supports 64-bit array indexing or buffer offsets, however. Those are still limited, by the API and in HLSL, to 32-bit unsigned sizes/indexes/offsets.
WGSL meeting minutes 2021-06-29
I don't really understand the subtleties of this discussion, but note that WebGPU has a
I tried to understand the behavior that @dneto0 was describing in Vulkan, and came away more confused than when I started. I wrote a Vulkan program that's as simple as possible to test out the behavior of indices > 2B. This is the entire shader:
The API code dispatches 1 thread to execute this, and then runs Running the test app, it appears that unsigned indices > 2B work correctly on Vulkan, which seems to contradict @dneto0's previous statements. I tested on an NVIDIA GeForce RTX 2080 Ti and a Radeon RX 560 Series (though the Radeon doesn't support The shader, when run through glslangValidator, produces this SPIR-V:
This appears like it's passing the expected unsigned value to the
Here is the test program: VulkanLargeArrayTest.zip
@dneto0 Do you think you could take a look and help me understand what's going on?
I also transcribed this program in a bunch of other APIs, too. Here are the results:
Here are the test apps:
⭐: See above
The simple answer is that you've exercised unchecked undefined behaviour. This is akin to the "works on NVIDIA" problem.
I'll reassert that those large arrays should be allowed when the largest signed integer type can express their element count. Here's a scenario:
Some more notes (possibly repeating what I've said verbally):
WGSL meeting minutes 2021-07-13
I'd like to be clear here about what the proposal is. I think this is the same as @jdashg's proposal during the call today, but he can correct me if I'm mistaken. There are 3 (or 4) 'overloads' of array accesses in WGSL:
On Vulkan:
On D3D12, the largest valid index will already have to be 0x3FFFFFFF (1B) because of that "Highest byte offset from view start" error message above. Anything higher would end up with a byte offset higher than the max 32-bit byte offset. So, D3D12 can just cast the
On Metal, the index is just passed directly into the index operation in the generated MSL (after a bounds check, of course). Everything just works.
@litherum thank you for writing this down!
What, exactly, is the proposed WGSL spec change? https://gpuweb.github.io/gpuweb/wgsl/#array-access-expr allows both i32 and u32 as array indices.
- rename "array size" in many places to the more specific term "element count"
- element count may be an unsigned integer literal, as per gpuweb#1135
- describe array type matching in the positive sense, and as an if-and-only-if set of rules
- update example to show array size with unsigned integer literal
- reword array layout to make "element stride" a defined term, and separately write out how its value is determined
During the office hours, we got to a point where (I think!!) we agreed on the following:
Thank you! For clarity, this was an agreement between you and @dneto0 specifically, not a group consensus. But I'm sure it's very close to it.
The limits we have apply to uniform and storage buffers. But what if I have a
+1, thanks for the summary @litherum. Regarding the status of the spec and the necessary changes to implement:
WGSL meeting minutes 2021-07-27
WGSL meeting minutes 2021-08-10
…le (#2031)
* wgsl: array size may be overridable constant identifier. It's still an i32; it was only ever an INT_LITERAL in the first place. Partially addresses #1431 (later commits remove the ability to override that constant)
* Array size can be a module-scope constant, not just pipeline-overridable
* Spell out when array types are different. Also give examples.
* Simplify IO-shareable: remove matrix, array, and nested structs. Fixes: #1877
* Rename "sized array" to "fixed-size array"
* Various updates and cleanups:
  - rename "array size" in many places to the more specific term "element count"
  - element count may be an unsigned integer literal, as per #1135
  - describe array type matching in the positive sense, and as an if-and-only-if set of rules
  - update example to show array size with unsigned integer literal
  - reword array layout to make "element stride" a defined term, and separately write out how its value is determined
* Array element count can't be overridable
* A sized array is host-shareable, even if the element count is a module-scope constant
* Fix the note about IO-shareable types, regarding bool
* Remove stale (and incorrect) note. Change-Id: Ic68a52b994b42ef3e7b12d9eecd8ab6e83cb61eb
* Relax the element-count condition for array type equality: the element-count condition for fixed-size array types depends only on the element-count value. Change-Id: I7452375741160412071b5f0fe2e6e615264a4b11
If indices are unsigned, bounds checks only need to have one conditional.
There was a resolution for this on 2020-10-6