
Buffer indices should be unsigned #1135

Closed
litherum opened this issue Oct 6, 2020 · 64 comments
Labels: wgsl WebGPU Shading Language Issues

@litherum
Contributor

litherum commented Oct 6, 2020

If indices are unsigned, bounds checks only need to have one conditional.

There was a resolution for this on 2020-10-6
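A minimal C++ sketch of the claim (function names are illustrative, not from any real implementation):

    #include <cstdint>

    // With a signed index, the obvious bounds check needs two comparisons.
    bool inBoundsSigned(int32_t i, int32_t n) {
        return i >= 0 && i < n;
    }

    // With an unsigned index, a single comparison suffices.
    bool inBoundsUnsigned(uint32_t i, uint32_t n) {
        return i < n;
    }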

@litherum litherum added the wgsl WebGPU Shading Language Issues label Oct 6, 2020
@grorg grorg added this to Under Discussion in WGSL Oct 13, 2020
@grorg
Contributor

grorg commented Oct 13, 2020

I assume that since we have a resolution we don't need to discuss this? We only need a PR?

@grorg grorg moved this from Under Discussion to Resolved: Needs Specification Work in WGSL Oct 13, 2020
@dneto0
Contributor

dneto0 commented Oct 19, 2020

I don't think we had finalized this.

Background:

In the meeting I mentioned that SPIR-V / Vulkan treat array indices as signed.

Since the meeting, I learned that LLVM's LangRef is explicit that indices are treated as signed integers.
See the getelementptr instruction:

If the inbounds keyword is present ... with infinitely precise signed arithmetic ...

...

If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two’s complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer.

An implication: For arrays with more than 2**31-1 elements, you can't access the upper elements. If you want something in that range, you have to use a wider int type for the index. That's perfectly reasonable.

Also, I dispute the implication that signed index checks are necessarily less efficient.
For example, LLVM IR has an "icmp ule" instruction (unsigned integer less-than-or-equal comparison) that can be used to tell if the index is in the range 0..MAX_INT or 0..MY_ARRAY_SIZE.
That's only one comparison.
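For concreteness, the same fold expressed in C++ (a hedged sketch; the function name is invented):

    #include <cstdint>

    // Casting the signed index to unsigned first makes any negative value
    // wrap to something huge, so one unsigned comparison (LLVM's icmp ult/ule)
    // rejects both negative and too-large indices.
    bool inBoundsOneCompare(int32_t i, uint32_t n) {
        return static_cast<uint32_t>(i) < n;
    }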

@dneto0
Contributor

dneto0 commented Oct 19, 2020

I think in the long term we should allow either signed or unsigned indices.

In any case, we would need the restriction that index values be limited to 0 .. max-signed-int.

I'd support limiting it to one of the options (signed or unsigned), as a work-reduction choice for MVP.

@dj2 dj2 moved this from Resolved: Needs Specification Work to Under Discussion in WGSL Oct 20, 2020
@kvark
Contributor

kvark commented Oct 21, 2020

Thanks @dneto0 for the elaboration and the info!

First, let's talk about the limit. If the user wants to access anything at index 2^31 or higher, what would be the size of the buffer containing that data? Even for half-floats, at 2 bytes each, the buffer has to be at least 4GB, and all of it must be visible to shaders.

I don't think WebGPU has to support storage buffer bindings that large:

  • Vulkan's maxStorageBufferRange limit starts at 128MB. We'll need to add it to our limits, similar to the existing maxUniformBufferBindingSize that we have.
  • Metal has a runtime limit for the maximum buffer length that we have to query. It's been reported that the limit can easily be as low as 256MB on some configurations.

With that in mind, WebGPU will have a limit on the storage buffer binding size, with the baseline likely well below the 4GB mark. Therefore, implementations can be safe in knowing that index 2^31 and higher is never going to be valid anyway.

So this addresses the note that 2^31 would have to be a limit even if unsigned indices were used: in fact, our limit will be even lower anyway. And it makes unsigned indices the logical choice going forward, I think.
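A quick check of that arithmetic (illustrative C++ only):

    #include <cstdint>

    // Index 2^31 into an array of 2-byte half-floats implies a binding of
    // at least 2^31 * 2 bytes = 4GiB.
    static_assert((uint64_t{1} << 31) * 2 == 4ull * 1024 * 1024 * 1024,
                  "index 2^31 at 2 bytes/element needs a 4GiB binding");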

@litherum
Contributor Author

litherum commented Jan 9, 2021

I think this argument would convince me if we knew the maximum size of a resource at shader compilation time.

GPU memory sizes are increasing; indeed, modern GPUs regularly have > 4GB of memory, so we can't really use the device memory size as a signal that it's okay to cast the signed index to unsigned and use a single comparison.

However, I'd guess that the average resource (view?) size in common applications isn't growing past 4GB any time soon (in accordance with @kvark's comment above). So I think the fundamental point is correct: casting a signed index to unsigned before running the conditional should work for most resources. If there was a way to know at shader generation time that it was safe to take this shortcut in a future-proof way, I think that would solve this problem.

@dneto0
Contributor

dneto0 commented Jan 26, 2021

See also the recently filed #1371

@dneto0
Contributor

dneto0 commented May 11, 2021

FYI. The spec already allows you to use either signed or unsigned indices.

https://gpuweb.github.io/gpuweb/wgsl/#array-access-expr

Maybe it's hard to see because of the "Int" metavariable.

@kvark
Contributor

kvark commented May 11, 2021

A quick round-up on what I was saying on the call:

  1. The main problem with uint, which conceptually matches indexing better, is the inconvenience of typing the u suffix: foo[1u]. This should be solved in the planned future (unable to find a proper reference to this consensus).
  2. Current index limit is effectively 128M, coming from maxStorageBufferBindingSize on the host side. The sign of the index doesn't matter for sizes less than 2GB.
  3. A Vulkan implementation that wishes to expose this limit higher than 2GB can do so only with the ShaderInt64 feature. If it sees the limit requested higher than 2GB, and the structure size in the shader would permit indices higher than 2GB, it would internally convert the indices (only for storage run-time arrays, nothing else) to 64-bit before bounds checking and passing them to the driver (a sketch follows this list).
  4. Supporting both int and uint seems to be a strange choice (that nobody asked for?).
  5. Supporting only uint would require a tiny bit more complexity for 32-bit indices of large arrays (2.5 code paths instead of 2), but nothing major.
  6. If we go with signed int only, it seems to be both ergonomic and simple to implement. (<- proposal?)
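A hedged sketch of the widening in point 3 above (the helper name is invented; a real implementation would differ in detail):

    #include <cstdint>

    // Zero-extend a 32-bit index to 64 bits before the bounds check, so a
    // u32 above INT32_MAX can never be misread as negative. Assumes
    // elementCount > 0, which robust-access clamping requires anyway.
    uint64_t boundsCheckedIndex(uint32_t index, uint64_t elementCount) {
        uint64_t i = index;  // zero-extension is always exact
        return i < elementCount ? i : elementCount - 1;  // clamp out-of-bounds
    }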

@dneto0
Contributor

dneto0 commented May 12, 2021

Supporting both int and uint seems to be a strange choice (that nobody asked for?).

It's no worse than having the addition operator (+) work on both signed and unsigned operands. It's going to be very, very common in users' source code.

I think supporting both int and uint has way higher benefit-to-cost ratio compared to #1588 (scalar weight on the mix builtin-function).

Note: texture sizes (dimensions and levels) are provided or returned as signed integers. It would be weird to disallow signed array indices.

@litherum
Contributor Author

This should be solved in the planned future (unable to find a proper reference to this consensus).

#739

@litherum
Contributor Author

litherum commented May 12, 2021

I don't see where pipeline layout includes information about the maximum size a buffer can be; all I can see is the minimum size a buffer can be: https://gpuweb.github.io/gpuweb/#dom-gpubufferbindinglayout-minbindingsize.

Regardless, I think this is what the generated code would hold, if we could somehow know:

| Number of elements | Signed N-bit Index (e.g. i16, i32, i64) | Unsigned N-bit Index |
| --- | --- | --- |
| Known to be < 2^(N-1) | Cast signed index to unsigned index. Single unsigned comparison. | Single unsigned comparison. |
| Known to be between 2^(N-1) and 2^N | A single signed comparison with 0, just to tell if the index is negative. Edit: see below | Single unsigned comparison. |
| Known to be > 2^N | A single signed comparison with 0, just to tell if the index is negative. Edit: see below | No comparison! |
| Unknown | Two comparisons. Edit: see below | Single unsigned comparison. |

I'm worried about the "Cast signed index to unsigned index" cell in Metal, as this is technically undefined in C++. (Edit: Turns out this is defined! See #1135 (comment).)

@kainino0x
Contributor

kainino0x commented May 12, 2021

It's not a pipeline layout option, it's two global limits on uniform and storage buffer binding sizes:
https://gpuweb.github.io/gpuweb/#dom-supported-limits-maxuniformbufferbindingsize
These limits are applied in createBindGroup.

@litherum
Contributor Author

litherum commented May 12, 2021

Right.

If the "maximum buffer size" concept is per-device, rather than per-buffer, then we can't tell which of the first 3 rows of the table above applies, so we have to pessimize when we generate code and use the 4th row.

@kainino0x
Contributor

The maximum storage buffer binding size default is only 128MiB. Wouldn't it have to go >2GiB (actually >8GiB when only ≥4-byte loads are supported (default), >4GiB when only ≥2-byte loads are supported) before we start falling into the 4th row? Sorry if I'm totally missing something.

@kainino0x
Contributor

I'm worried about the "Cast signed index to unsigned index" cell in Metal, as this is technically undefined in C++.

From a quick bit of research it appears that signed-to-unsigned is well-defined in C++, and only unsigned-to-signed is implementation-defined. https://stackoverflow.com/a/43336256
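In C++ terms, an illustrative check of that rule:

    #include <cstdint>

    // Signed-to-unsigned conversion is defined as reduction modulo 2^32,
    // so the "cast signed index to unsigned" cell has a guaranteed result.
    static_assert(static_cast<uint32_t>(int32_t{-1}) == 0xFFFFFFFFu,
                  "signed-to-unsigned conversion is modulo 2^32");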

@litherum
Contributor Author

litherum commented May 12, 2021

 >4GiB when only ≥2-byte loads are supported

2-byte loads are important for us; we'll probably implement them at the same time as implementing 4-byte loads.

But your general point is correct - we will hit this when we want larger buffers, which the industry is moving towards. Big graphics cards you can buy today often already have > 10GB of memory.

I expect the industry will want to go beyond 128MB quite soon, and I don't think it will be that long before the industry wants to go beyond 4GB.

So, the question is: When that happens, and our software runs on a device which has lots of memory, and we want to expose large buffers, what are we going to do? Are we going to:

  1. Start injecting two comparisons instead of one, for every access of every buffer, just because we don't know at compilation time whether or not we're hitting the first row of the table or the second.
    1a. Same as 1, but allow authors to supply unsigned indices if they want, too. Evangelize the fact that unsigned indices are going to be faster than signed indices.
  2. Give developers a way of telling us their (minimum and) maximum buffer size, for each buffer, at compilation time. Also, hope that both the minimum and the maximum fall within the same row of the table above.
  3. Just make array indices unsigned and be done with it.
  4. Maybe there's a 4th option? I'd be interested to hear ideas.

signed-to-unsigned is well-defined in C++

I did not know this! Good to know.

@kainino0x
Contributor

2-byte loads are important for us; we'll probably implement them at the same time as implementing 4-byte loads.

They'll still be gated on an optional feature, so you can use the device configuration to determine when it's needed.

  • Just make array indices unsigned and be done with it

I haven't been watching this issue closely enough but my uninformed opinion is that this sounds great.

@kdashg
Contributor

kdashg commented May 17, 2021

WGSL meeting minutes 2021-05-11
  • MM: One Q: How does this work in VK? Can you use a uint to index into array? (yes) If the value is too large, does it wrap around? (yes, unless the buffer is big enough) If there’s an array of size 3B, and I use uint=2.9B, does that work?
  • DN: Hmm, in LLVM address info is signed, so what I said above is probably wrong.
  • AB: Well, could use i64 if supported, and unlikely to have a system support 3B addressable that doesn’t support i64, so this is probably a perf question.
  • MM: I understand that buffers are usually smaller than 2GB, but it would be nice if we didn’t have to pick between perf and capability up-front.
  • TR: Is this a backend/implementation question? (maybe)
  • AB: VK doesn’t usually care about signedness of ints, but addressing is defined in signed ints, which limits us.
  • MM: So my 2.9B might be interpreted as negative i32? (yes)
  • AB: What’s the current limit? (128MB?) So implementations could choose to offer <=2GB and just not have this problem.
  • TR: Seems preferable not to decide that unsigned ints aren’t interpreted as signed for the purposes of addressing.
  • DM: So in the future, we could offer >2GB limits only if we support the i64 support we need. And since limits are opt-in, we could know when authors expect to need >2GB, and do the right thing.
  • DN: Is there a portability concern?
  • JG: Probably no?
  • DM: We are free to not expose anything above the baseline limits.
  • DN: Do we still need uint and sint overloads for addressing?
  • MM: If I have a buffer with 3.9B indices, and I ask for -0.1B, would that work?
  • MM: So is it, if the author doesn’t claim a max buffer size, they get the 128MB limit. Only if they ask for >2B would we have to emit the extra ops during shader translation.
  • DM: Why are we investigating negative indices? Don’t we not want them?
  • DN: I don’t know how much of a hit it would be to usability to offer i32 support here.
  • JB: Rust always uses uints for indices, but it has good enough ergonomics to support this.
  • DN: Do we really need the full >2GB?
  • JG: <missing>
  • MM: In real-pointer languages, they really are signed, because you can move them up or down.
  • JG: But fundamentally i32 and u32 add are the same bit results
  • <missing>
  • AB: Can we just limit the size?
  • MM: Yeah but what happens with out-of-bounds inputs?

@dneto0
Contributor

dneto0 commented May 18, 2021

We're discussing the options around:

Initially:

  • allow both i32 and u32 indices
  • limit array element count to i32 max value

With the question of: what is the plan for very large arrays, i.e. element count larger than i32 max value.

One option is:

  • only allow very large arrays when the implementation also supports i64 in WGSL
    • Suggest requiring explicit opt-in enable i64_sized_array;.
    • Don't kick in just because i64 is available.
  • Array element count is at most i64 max value.
    • Note: arrayLength returns u32, so we would need a new builtin anyway, e.g. arrayLength64 (ha!)
  • an array is "very large" if and only if:
    • it has a declared size larger than i32 max value, or
    • it is a runtime-sized array
  • Index into a very large array must be either i64 or u64

The biggest problem I see with this is:

  • Code written without i64 in mind no longer compiles. Most troublesome (to me): any code that indexes into a runtime-sized array:

    [[block]] struct Foo { data : array<i32>; }

    [[...]] var<storage> buf: Foo;
    fn ... {
        let i: i32 = ...
        // Won't compile if large arrays are used anywhere.
        ... buf.data[i];
    }
    

I think trying to make this case smoother starts to get into automatic promotion from smaller scalar types to larger scalar types, and I don't want to go in that direction.

@Kangz
Contributor

Kangz commented May 19, 2021

If the indices can be both i32 or u32, does it mean that we need a clamp operation to implement robust access in the shader for i32 accesses, or are we going to rely on min(static_cast<u32>(index), arrayLength-1) instead?

@alan-baker
Contributor

@dneto0, your example seems wrong. Did you mean u32 as the type of i?

@dneto0
Contributor

dneto0 commented May 19, 2021

If the indices can be both i32 or u32, does it mean that we need a clamp operation to implement robust access in the shader for i32 accesses, or are we going to rely on min(static_cast<u32>(index), arrayLength-1) instead?

It's materially the same, for example, when i = -5:

  • clamp( -5, 0, INT_MAX) = min( max(-5,0), arrayLen-1) = 0
  • min(static_cast<u32>(-5), arrayLength-1) = min(4-billion-ish, arrayLength-1) = arrayLength-1
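A hedged C++ transcription of the two strategies (function names are invented; assumes 0 < arrayLength <= INT_MAX):

    #include <algorithm>
    #include <cstdint>

    uint32_t clampedSigned(int32_t i, uint32_t arrayLength) {
        int32_t hi = static_cast<int32_t>(arrayLength) - 1;
        return static_cast<uint32_t>(std::clamp(i, 0, hi));  // -5 -> 0
    }

    uint32_t clampedViaCast(int32_t i, uint32_t arrayLength) {
        // -5 wraps to 4294967291, so min() picks arrayLength - 1.
        return std::min(static_cast<uint32_t>(i), arrayLength - 1);
    }

    // Different clamped indices (0 vs arrayLength - 1), but both are in
    // bounds, which is all robust buffer access requires.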

@dneto0
Contributor

dneto0 commented May 19, 2021

@dneto0, your example seems wrong. Did you mean u32 as the type of i?

I meant i32. But it could have been u32 as well.

@alan-baker
Contributor

i32 is never problematic though. The range for u32 between SIGNED_INT_MAX and UNSIGNED_INT_MAX is the part that needs to be spelled out for expansion.

@dneto0
Contributor

dneto0 commented May 19, 2021

Thinking about this more, we can make life easier on the programmer, and get rid of the weird "code doesn't compile" case. Modify the proposal to:

  • allow any signed integer type that's already valid in the language.
  • allow any unsigned integer type that's already valid in the language. (Assume that when n-bit integers are introduced into the language, that both signed and unsigned forms are valid.)
  • always interpret the indices as signed. (So if it's unsigned, reinterpret the bits as a signed integer)

This is exactly what both LLVM and SPIR-V do.

So restating completely:

Initially:

  • allow both i32 and u32 indices
  • limit array element count to i32 max value

Future expansion:

  • allow both signed and unsigned indices
  • max array size is the max signed integer value (over any signed integer type)
  • always interpret array indices as signed. Bitcast an unsigned value to signed, if needed

This is simpler for the user, and a closer match to both LLVM and SPIR-V. That meshes well with the compiler stack underlying MSL, DXC/DXIL, and Vulkan.
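A sketch of that reinterpretation in C++ (illustrative; std::bit_cast is C++20, and as noted earlier in the thread a plain unsigned-to-signed cast is only implementation-defined):

    #include <bit>
    #include <cstdint>

    // "Always interpret the indices as signed": reinterpret the bits, so a
    // u32 above INT32_MAX comes out negative and fails the bounds check.
    int32_t asSignedIndex(uint32_t u) {
        return std::bit_cast<int32_t>(u);
    }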

@litherum
Contributor Author

litherum commented Jun 23, 2021

I wrote this comment earlier in the issue but it bears repeating in case people missed it. I've clarified the comment to make it less ambiguous.

For D3D12, indices only work for values up to MAX_UINT32. I suggest we limit WebGPU to that range, or a subset of that range if we need to.

Well, WGSL doesn't actually have any 64-bit types yet, so we are de facto already limiting ourselves to the range 0 .. MAX_UINT32. (For now! Using 64-bit types at all will have to require an extension anyway. I don't see any 64-bit integral types on https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar so it looks like D3D wouldn't be able to offer this extension when we get around to spec'ing it.)

I believe this discussion is regarding "what happens when indices are between MAX_INT32 .. MAX_UINT32."

@tex3d

tex3d commented Jun 29, 2021

I don't see any 64-bit integral types on https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar so it looks like D3D wouldn't be able to offer this extension when we get around to spec'ing it.)

It's unfortunate, but these docs are quite out-of-date and have not been updated for what DXC supports. There is a more up-to-date table on this DXC Wiki page, but it isn't titled well: https://github.com/microsoft/DirectXShaderCompiler/wiki/16-Bit-Scalar-Types.

This includes 64-bit signed and unsigned int types, which are supported under an optional feature flag. It also includes explicit 16-bit int and float types (rather than fuzzy min-precision ones), also supported under an optional feature flag.

This doesn't mean that HLSL supports 64-bit array indexing or buffer offsets however. Those are still limited by the API and in HLSL to 32-bit unsigned sizes/indexes/offsets.

@kdashg
Contributor

kdashg commented Jun 29, 2021

WGSL meeting minutes 2021-06-29
  • MM: I’m still working more on this. Two attacks: 1) Try in a number of shading languages; 2) Experimenting with zero-extending vs perf. I found I didn’t fully understand DN’s proposal. Could you elaborate?
  • DN: I think the problem is for large arrays, and I think I want a large array to require a wide int to index it. I want the max size of the array to be no larger than the max signed integer size.
  • MM: Consider, what if I index with -10. Is that “too big”? How should it work?
  • DN: It would wrap around.
  • MM: If the base of the array lies above 4 billion, then that should still work?
  • DN: That’s an implementation detail. The addressing that happens should be 32-bit friendly/well-defined. The fact that it has byte addresses that are very large is an implementation detail.
  • MM: I’m looking at the cost of a zero extension, and an array implementation involves a multiply and an add. On 32-bit pointer device, this is moot, so ignore this. On a 64-bit pointer device (that has enough memory), the multiply and the add would then be on the wider 64-bit types. So the question is what we do to promote the 32-bit signed indexing to our 64-bit internals? Assuming data lies at addresses, and arrays have a base pointer…
  • DN: That’s an implementation question.
  • JG: Is this sign-extend vs zero-extend?
  • MM: On many processors, the 32-bit operations will automatically zero the top half of the 64bit result register. So if that’s the goal, it’s generally free. For sign-extension, sometimes we’d need an extra instruction to get that behavior.
  • DN: I think I’m saying, if you’re thinking byte addressing has to happen a certain way, I’m saying it’s an implementation choice.
  • DN: For example, OpenCL didn’t start out with byte addressable memory, it had to be an extension. It’s often in hardware now, but not everywhere.
  • MM: If I write a program that uses either signed or unsigned for the implementation, it will be the same instructions. Instead, in order to get a perf comparison, I have done all my own pointer math. If I present that, would it be valuable to others?
  • DN: I suspect that the memory access overhead/bandwidth will dominate the pointer math, even with caching. I would start with testing on a CPU first.
  • MM: I have some results, but not ready to share.
  • DN: I also think that given our portability requirements, we shouldn’t admit this comparison.
  • [missing]
  • DN: I think if authors want >2GB they should author using 64bit types
  • MM: I think the math might still be expensive, which would mean that it would be useful to use a 32bit index
  • MM: We could potentially give the authors both options: u32 could be cheaply zero extended if users wanted it, or they could use i32, or i64 if they want
  • DN: Going back to automatic promotion to the widest signed type.
  • MM: If I have an array, I want to access the 3 billionth element, I would then need a wider type. Devices that support a 3 billion array probably do support i64. The compiler could see an unsigned array index and output i64 math. For the devices that don’t support i64, they would have lower max-size limits, and that would be ok.
  • DN: Could have a rule, if 32bit index, we’ll use a 64bit index if the device supports it, and you’ve opted into it.
  • MM: I think what you’re saying is: In WGSL, if the author has not enabled the 64bit extension, max buffer sizes will be artificially low, but only for spir-v devices. If they opt in instead, the max size of buffers would be higher
  • DN: Not quite: No special case for spir-v. If you want to have a 2B+ element array, you’d need to opt in to the i64 extension.
  • MM: I think that’s over-restrictive. Could make it up to implementations, where some implementations could handle u32 indices properly, even without i64.
  • DN: Right now spec says index can be i32 or u32, so we have that already.
  • JG: I think the proposal is that implementations that don’t want to support 3B arrays without i64, could choose to limit the max size to 2B and not have to deal with it. What’s the portability issue exactly?
  • DN: It would be clear that e.g. a device is a spir-v device, because it's limited to 2B if it doesn’t have i64.
  • JG: So this is a problem with non-portable limits? (yes) I think that’s not a philosophical problem/blocker for us.
  • DM: Are we debating whether to put the max array size in the spec?
  • DN: Right now we limit them in WGSL based on the max int size.
  • DM: Is MM unhappy with this? (yes)
  • MM: I just want my u32(3B) to work
  • DN: I want you to have to use i64(3B)
  • MM: I think that’s unreasonable. Is it valuable if other shading language would support this?
  • DN: No, because we still need spir-v to work.
  • JG: I think we’re at the point where we just need to choose qualitatively, and that there’s not more data to be gathered. I think we should take another week to think to ourselves about it before forcing a decision (at worst, voting) next week.
  • GR: As a note, DX doesn’t support 64bit indexing; there are 64bit capability bits, but not for indexing.
  • GR: We may want to reconsider taking a consequential vote so close after the US holiday

@Kangz
Contributor

Kangz commented Jun 29, 2021

I don't really understand the subtleties of this discussion but note that WebGPU has a maxStorageBufferBindingSize limit with a guaranteed value of 128MB only. An implementation could use this value to control the maximum size of arrays indexable by the shader.

@litherum
Contributor Author

litherum commented Jul 10, 2021

I tried to understand the behavior that @dneto0 was describing in Vulkan, and came away more confused than when I started. I wrote a Vulkan program that's as simple as possible to test out the behavior of indices > 2B. This is the entire shader:

#version 460

#extension GL_EXT_shader_explicit_arithmetic_types_int8 : require

layout(std430, binding = 0) buffer BigBuffer {
	uint8_t bigBuffer[];
};

layout(push_constant) uniform PushConstants {
	uint value;
};

layout(local_size_x = 1) in;
void main() {
	uint index = (1u << 31u) + 4u;
	bigBuffer[index] = uint8_t(value);
}

The API code dispatches 1 thread to execute this, and then runs vkCmdCopyBuffer() to copy the result to the CPU to see what was written. I expect the vkCmdCopyBuffer() is copying from the correct place, because the byte offset is represented as a VkDeviceSize, which is a uint64_t. I also enabled VK_LAYER_KHRONOS_validation, RobustBufferAccess, and RobustBufferAccess2 to make sure I wasn't doing anything wrong.

Running the test app, it appears that unsigned indices > 2B work correctly on Vulkan, which seems to contradict @dneto0's previous statements. I tested on an NVIDIA GeForce RTX 2080 Ti and a Radeon RX 560 Series (though the Radeon doesn't support VK_EXT_robustness2), and they both seem to agree.

The shader, when run through glslangValidator, produces this SPIR-V:

...
               6:             TypeInt 32 0 // "0" means unsigned
               7:             TypePointer Function 6(uint)
               9:     6(uint) Constant 2147483652
...
              14:     13(ptr) Variable Uniform
...
              16:     15(int) Constant 0
...
        8(index):      7(ptr) Variable Function
                              Store 8(index) 9
              17:     6(uint) Load 8(index)
...
              26:     25(ptr) AccessChain 14 16 17
                              Store 26 24

This appears like it's passing the expected unsigned value to the AccessChain operation.

Here is the test program: VulkanLargeArrayTest.zip

@dneto0 Do you think you could take a look and help me understand what's going on?

@litherum
Contributor Author

litherum commented Jul 10, 2021

I transcribed this program to a bunch of other APIs, too. Here are the results:

| API | GPU | OS | Result | Notes |
| --- | --- | --- | --- | --- |
| Metal | AMD Radeon Pro Vega 56 | macOS | | |
| Metal | Apple M1 | macOS | | |
| Metal | AMD Radeon RX 570 | macOS | | |
| Metal | AMD Radeon Pro 560 | macOS | | |
| Metal | Intel(R) HD Graphics 630 | macOS | | Cannot create a buffer big enough to test |
| OpenCL | AMD Radeon Pro Vega 56 | macOS | | |
| OpenCL | Apple M1 | macOS | | |
| OpenCL | AMD Radeon RX 570 Compute Engine | macOS | | |
| OpenCL | AMD Radeon Pro 560 Compute Engine | macOS | | |
| OpenCL | Intel(R) HD Graphics 630 | macOS | | Cannot create a buffer big enough to test |
| D3D12 | NVIDIA GeForce RTX 2080 Ti | Windows | ❌✅ | HLSL doesn't have any 1-byte types, so I can't test this on a buffer that's smaller than 4GB. Using a larger buffer, the debug layer outputs Root descriptor access out of bounds (results undefined): ... Highest byte offset from view start accessed: 0xffffffff (max 32-bit offset or overflow). However, it produces the correct result. |
| OpenCL | NVIDIA GeForce RTX 2080 Ti | Windows | | |
| OpenGL | NVIDIA GeForce RTX 2080 Ti/PCIe/SSE2 | Windows | | glLinkProgram() fails with this error: error C1068: array index out of bounds |
| Vulkan | NVIDIA GeForce RTX 2080 Ti | Windows | ✅⭐️ | |
| CUDA | NVIDIA GeForce RTX 2080 Ti | Windows | | |
| D3D12 | Radeon RX 560 Series | Windows | | Debug layer outputs same error as above, and does not produce the correct result. |
| OpenCL | Baffin | Windows | | |
| OpenGL | Radeon RX 560 Series | Windows | | No error messages or debug output, but doesn't produce the correct result |
| Vulkan | Radeon RX 560 Series | Windows | ✅⭐️ | |

Here are the test apps:

⭐: See above

@dneto0
Contributor

dneto0 commented Jul 13, 2021

@dneto0 Do you think you could take a look and help me understand what's going on?

The simple answer is that you've exercised unchecked undefined behaviour. This is akin to the "works on NVIDIA" problem.

@dneto0
Contributor

dneto0 commented Jul 13, 2021

I'll reassert that those large arrays should be allowed when the largest signed integer type can express their element count.

Here's a scenario:

  • we expand the pointer functionality in a future version (or extension) of WGSL
  • we allow pointer-difference within the same array
  • then what's the result type of that pointer diff, i.e. what corresponds to std::ptrdiff_t?
    C++14 says that behaviour is undefined if the resulting difference overflows. (5.7 para 6)

@dneto0
Contributor

dneto0 commented Jul 13, 2021

Some more notes (possibly repeating what I've said verbally):

  • in the OpenCL cases, I think you're exercising implementations that support 64bit integers
  • in the D3D12 NVIDIA case, you're (I think) proving my point that, without 64bit ints, and without a byte type that is addressable in memory, the use case is very niche. A 2-byte scalar type with 2 giga-elements already spans a 4GB space. Beyond that, you're hitting validation errors. These are huge warning signs.

@kdashg
Contributor

kdashg commented Jul 13, 2021

WGSL meeting minutes 2021-07-13
  • DM: What if we have a u32(3B)?
  • DN: OOB access behavior
  • DM: What about i64(3B)?
  • DN: That would be fine
  • [what about large runtime sized arrays as a different type]
  • DN: I don’t have a proposal for that.
  • MM: I think this would be a design mistake
  • MM: I think we should accept i32/u32/i64, and if it’s a valid offset, it’ll work regardless of type.
  • (let’s let gears turn and come back to this)

@litherum
Contributor Author

litherum commented Jul 13, 2021

I'd like to be clear here about what the proposal is. I think this is the same as @jdashg's proposal during the call today, but he can correct me if I'm mistaken.

There are 3 (or 4) 'overloads' of array accesses in WGSL: array[i32], array[u32], array[i64] (and possibly array[u64]).

On Vulkan:

  • For array[i32], the index is passed directly to OpAccessChain. It just works.
  • For array[i64], the index is also passed directly to OpAccessChain. It just works (because the only way for there to be an i64 in the shader is if the i64 extension is enabled, which would automatically trigger the Int64 SPIR-V "Capability").
  • For array[u64], the index is just cast to an i64 and behaves as above, because there is no buffer big enough to be able to tell the difference
  • For array[u32], there are 2 options:
    • If the device doesn't support the Int64 SPIR-V "Capability", then the WebGPU device would set the maxUniformBufferBindingSize and maxStorageBufferBindingSize to at most 0x1FFFFFFFC (~8GB). This way, even if the author attaches the biggest possible binding to the smallest possible WGSL type, the highest valid index is 0x7FFFFFFF. So, the implementation can simply cast the u32 to an i32 and it will be correct.
      • 8GB is a totally reasonable limit for these devices, because if they don't support the Int64 SPIR-V "Capability," they probably support less than 4GB of memory anyway. Also, D3D12 already doesn't support any bindings larger than 4GB, so this probably isn't a problem.
    • On the other hand, if the device does support the Int64 SPIR-V "Capability", then the implementation can choose to either take the above path and restrict maxUniformBufferBindingSize and maxStorageBufferBindingSize, or promote the u32 to an i64 internally before passing it to SPIR-V.
      • On devices with > 4GB of memory, promotion is almost certainly not a problem, because (almost?) all implementations would be doing this promotion under the hood anyway. In order to add the element byte offset to the array's base address, that addition will (almost?) always need to be done on 64-bit values, because the array's base address could totally be higher than UINT_MAX. If the WGSL -> SPIR-V compiler does the promotion, the SPIR-V -> hardware compiler doesn't have to, so there's no performance lost.
      • And, on devices with < 4GB memory, the max binding size will already be less than 8GB, so nothing else needs to be done. The index is just cast to an i32 and used.

On D3D12, the largest valid index will already have to be 0x3FFFFFFF (1B) because of that "Highest byte offset from view start" error message above. Anything higher would end up with a byte offset higher than the max 32-bit byte offset. So, D3D12 can just cast the u32 to an i32 and it will be correct.

On Metal, the index is just passed directly into the index operation in the generated MSL (after a bounds check, of course). Everything just works.
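A hedged sketch of the array[u32] lowering described above (the names are invented, not from any real compiler):

    #include <cstdint>

    int64_t lowerU32Index(uint32_t u, bool deviceHasInt64) {
        if (deviceHasInt64) {
            return static_cast<int64_t>(u);  // zero-extend to i64; always exact
        }
        // Otherwise binding sizes were capped at 0x1FFFFFFFC bytes, so every
        // valid index fits in i32 and this narrowing is exact in practice.
        return static_cast<int32_t>(u);
    }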

@kvark
Contributor

kvark commented Jul 13, 2021

@litherum thank you for writing this down!
I'm wondering though, if we consider the subject of this issue ("Buffer indices should be unsigned"), isn't this logic equally compatible with having unsigned-only indices? I.e. if we wanted to only have unsigned indices, it appears to me that this would work just as well. FWIW, Rust indexing is strictly unsigned, and it works fine :)

@dneto0
Contributor

dneto0 commented Jul 14, 2021

What, exactly is the proposed WGSL spec change?

https://gpuweb.github.io/gpuweb/wgsl/#array-access-expr allows both i32 and u32 as array indices.

dneto0 added a commit to dneto0/gpuweb that referenced this issue Jul 14, 2021
dneto0 added a commit to dneto0/gpuweb that referenced this issue Jul 14, 2021
- rename "array size" in many places to the more specific term
  "element count"
- element count may be an unsigned integer literal, as per gpuweb#1135
- describe array type matching in the positive sense, and as an
  if-and-only-if set of rules.
- update example to show array size with unsigned integer literal.
- Reword array layout to make "element stride" a defined term, and
  separately write out how its value is determined.
@kvark
Contributor

kvark commented Jul 14, 2021

@litherum @dneto0 just a reminder that Office Hours are not official meetings, and are not scribed.
I know this has been discussed, so if there is any change to the story, please write it down loud and clear for everybody.

@litherum
Contributor Author

litherum commented Jul 14, 2021

During the office hours, we got to a point where (I think!!) we agreed on the following:

  1. Indices into array access operations can be both signed or unsigned
  2. Unsigned values > 2B do not get silently implicitly wrapped around to be negative
  3. The type of the maxUniformBufferBindingSize and maxStorageBufferBindingSize limits in the API should be a 64-bit type instead of a 32-bit type
  4. The type declaration of a sized array should be able to have its size parameter be a signed value or an unsigned value
  5. The WGSL spec will not state any explicit limit that sized arrays' size parameter must stay below. Instead, such a limit would be device-specific and shader-specific. (And unqueryable, I think.) If authors want to know if their array size is too big, they can compile the shader to see if it compiled successfully.

@kvark
Contributor

kvark commented Jul 14, 2021

Thank you! For clarity, this was an agreement between you and @dneto0 specifically, not a group consensus. But I'm sure it's very close to it.

If authors want to know if their array size is too big, they can compile the shader to see if it compiled successfully.

The limits we have apply to uniform and storage buffers. But what if I have a big array in the function or private storage class? Would shader compilation fail, and if so, how?

@dneto0
Contributor

dneto0 commented Jul 15, 2021

+1 thanks for the summary @litherum

Regarding status of the spec and necessary changes to implement:

  1. already in the spec, Array Access Expression
  2. regarding wrapping: I think there is no change here. First, there is no implicit conversion of the index type. Second, the out-of-array-bounds condition kicks in. For indexing into an array reference, you get an invalid memory reference. For indexing into an array value you get any valid value for the element type.
  3. That was Use 64bit limits for max uniform/storage buffer binding sizes. #1941 (landed) and I hear it was just an oversight that it was originally only 32-bits.
  4. Is Allow unsigned wgsl array sizes. #1942 (and incorporated into wgsl: array size may be module-scope constant (possibly overridable) #1792 )
  5. Agree. To answer @kvark: I assert shader creation will fail or pipeline creation will fail. There is no reliable way to anticipate that because it's very dependent on the target machine and the software stack beneath the browser. This is an aspect of "can pipeline creation fail in the driver compiler? particularly due to shader compilation failure" #1872

@kdashg
Contributor

kdashg commented Jul 27, 2021

WGSL meeting minutes 2021-07-27
  • Allow unsigned wgsl array sizes. #1942
  • wgsl: array size may be module-scope constant (possibly overridable) #1792
  • JG: We reached agreement on accepting both signed and unsigned. We are confident that we can implement these in a way that works for all of our platforms.
  • JG: I wrote up 1942 so grammatically you could use a uint instead of an int. 1792 includes this and more. Maybe that’s what we actually want.
  • DN: 1792 didn’t reach consensus. I want to land yours instead.
  • DN: For 1792, there was a discussion of a more constrained feature
  • JG: Is there more to talk about here?
  • DN: No. We all agree here.
  • DN: I like 1942 that just updates the grammar. That’s the only change to WGSL that’s required.
  • AB: I want to get back to 1792 later.
  • RESOLVED: Accept Allow unsigned wgsl array sizes. #1942

@kdashg kdashg closed this as completed Aug 10, 2021
WGSL automation moved this from Needs Decision to Done Aug 10, 2021
@kdashg
Contributor

kdashg commented Aug 10, 2021

WGSL meeting minutes 2021-08-10
  • (are we done here?)
  • Yes

dneto0 added a commit to dneto0/gpuweb that referenced this issue Aug 10, 2021
- rename "array size" in many places to the more specific term
  "element count"
- element count may be an unsigned integer literal, as per gpuweb#1135
- describe array type matching in the positive sense, and as an
  if-and-only-if set of rules.
- update example to show array size with unsigned integer literal.
- Reword array layout to make "element stride" a defined term, and
  separately write out how its value is determined.
dneto0 added a commit that referenced this issue Aug 11, 2021
…le (#2031)

* wgsl: array size may be overridable constant identifier

It's still an i32; it was only ever an INT_LITERAL in the first place.

Partially addresses #1431
(Later commits remove ability to override that constant)

* Array size can be a module-scope constant

Not just pipeline-overridable

* Spell out when array types are different

Also give examples.

* Simplify IO-shareable

Remove matrix, array, and nested structs

Fixes: #1877

* Rename "sized array" to "fixed-size array"

* Various updates and cleanups:

- rename "array size" in many places to the more specific term
  "element count"
- element count may be an unsigned integer literal, as per #1135
- describe array type matching in the positive sense, and as an
  if-and-only-if set of rules.
- update example to show array size with unsigned integer literal.
- Reword array layout to make "element stride" a defined term, and
  separately write out how its value is determined.

* Array element count can't be overridable

* A sized array is host-shareable, even if the element count is a module-scope constant

* Fix the note about IO-shareable types, regarding bool

* Remove stale (and incorrect) note.

Change-Id: Ic68a52b994b42ef3e7b12d9eecd8ab6e83cb61eb

* Relax element-count condition for array type equality

The element-count condition for fixed-size array types
depends only on element-count value.

Change-Id: I7452375741160412071b5f0fe2e6e615264a4b11
ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this issue Sep 6, 2022
The current interpolation tests will not attempt to validate an
interpolation with just the type (e.g. `@interpolate(flat)`). This is
due to the `,` always being appended, so `@interpolate(flat, )` is
generated, which is not a valid value.

This PR updates the generation to only add the `,` if there is a
sampling value to be appended. 

Issue: gpuweb#1135