Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add minimumBufferSize to BGL entry #678

Merged
merged 5 commits into from Jun 15, 2020
Merged

Add minimumBufferSize to BGL entry #678

merged 5 commits into from Jun 15, 2020

Conversation

kvark
Copy link
Contributor

@kvark kvark commented Apr 1, 2020

This is a follow-up to #546 (comment) that is meant to start the discussion on how this issue can be addressed, and proposes a strawman solution :)

It's often the case where the buffers visible to a shader are structured, which means the shader sees it as:

type Particles = struct {
  [[offset 0]] particles : array<float, 10>;
  [[offset 40]] other_field: i32;
  [[offset 44]] unbound_array: array<i32, 0>;
};

This structure can be seen as 2 parts:

  1. the bounded portion: it has named fields, some could be fixed-size arrays.
  2. the unbound array of something, indexed dynamically.

This stawman proposal is to require the minimum size of (1) portion to be specified in BGL entry. Any buffer binding created for it has to specify the size that is ">=" that minimum size. When a pipeline is created, we can inspect the structured parts of the buffer bindings of the shader, and validate that the minimum buffer size covers the (1) portion.

What this allows us to do is basically avoiding the access address checks for named non-array fields of buffer bindings (as well as the fields of structures in a arrays). This means more efficient shader code being generated and ran on the GPU. Since uniform buffers are used a lot, having efficient loads from them is essential for performance (although, we don't have concrete benchmarks showing how much it would cost to do a check for every field access, yet).

What this limits the user to do is having a buffer that only partially covers a structured buffer binding. My feeling is that this is a fine compromise, and the users can always work around this by creating a slightly bigger buffer.

Some more details on how the buffer access rules would work in the shader:

  • when accessing arrays within a buffer, we check for index and consider the access to be OOB if the structure by that index doesn't fit into the binding (i.e. no partial structure access).
  • when accessing other fields (either top-level, or within an array by index that is in bounds), there need to be no checks done.

Preview | Diff

@kainino0x
Copy link
Contributor

Would we validate this against the shader at pipeline creation? Or would it just be a "hint" to allow elision of a runtime check?

If the former, we should mention that in the definition for the new field. (can spec it later when we have spec for create*Pipeline)

@kvark
Copy link
Contributor Author

kvark commented Apr 2, 2020

Yes, I described this here:

When a pipeline is created, we can inspect the structured parts of the buffer bindings of the shader, and validate that the minimum buffer size covers the (1) portion.

I agree more details should be provided in the documentation. Will be happy to follow up after the initial discussion, if the group sees the value in pursuing this direction.

@Kangz
Copy link
Contributor

Kangz commented Apr 3, 2020

Thanks for putting this up! This is an interesting proposal and would avoid the need to do runtime checks for this.

My biggest concern is that this reduces flexibility and requires another piece of data that applications were not used to providing. It's unclear how much harder it makes WebGPU to use (from not at all, to OMG I CAN'T PORT MY CONTENT). I think Jasper commented about this in the Matrix chatroom, and hacks that D3D developers were using to do unsized arrays in shaders.

Another small thing to consider is maybe asking for the size of an extra array element to be in the minimumBufferSize so that just clamping the array index is sufficient, since there's always valid data in the first element of the array.

@kvark
Copy link
Contributor Author

kvark commented Apr 3, 2020

@Kangz

I think Jasper commented about this in the Matrix chatroom, and hacks that D3D developers were using to do unsized arrays in shaders

IIRC, we agreed that this wasn't applicable to WGSL. If there is a concrete statement there, let's have it in this issue (or any other).

My biggest concern is that this reduces flexibility and requires another piece of data that applications were not used to providing.

We clearly require more data already, and we clearly require the existing content to go through some transformation (together with another round of testing) before it can ship. Providing an extra value here shouldn't be a problem unless there is a really tricky use-case where the same bindings was used with 2 different shader structures. I haven't seen this in practice, would be open to hear more from ISVs if that's something they intend to do.

Another small thing to consider is maybe asking for the size of an extra array element to be in the minimumBufferSize so that just clamping the array index is sufficient, since there's always valid data in the first element of the array.

It's a valid implementation behavior for this PR, but I don't think it needs to be specified or requires any changes to the PR, does it? The semantics of the "minimumBufferSize" is, well, that it's a lower bound. So the implementation can extract as much benefit as it can from knowing that ahead of time.

@kainino0x
Copy link
Contributor

Not in the PR yet, but wouldn't it change the validation for pipeline creation (when validating the minimumBufferSize against the shader code)?

@kvark
Copy link
Contributor Author

kvark commented Apr 3, 2020

I think it would still be exactly as described:

When a pipeline is created, we can inspect the structured parts of the buffer bindings of the shader, and validate that the minimum buffer size covers the (1) portion.

In other words, the validation only cares about the structured part being covered.

If we want to wiggle it, we also can: e.g. by removing all the validation here. If the user provided the minimum size - use it, otherwise - do something more complex. I do however think that just eliminating the complexity of access-guarding the structured fields as a class is most appealing, and it would require the validation as described.

@kainino0x
Copy link
Contributor

If I understand that and what @Kangz said correctly, this would validate the minimumBufferSize is at least 44, while @Kangz's option would require it to be 48: 44 + one unbounded array element (i32).

@kvark
Copy link
Contributor Author

kvark commented Apr 4, 2020

I asked why people care about clamping in #546 (comment) and got no responses, so I guess I'm asking about the same here. Why do we need to support clamping? Does it have any advantages over zeroing?

@kainino0x
Copy link
Contributor

Got it, and I'm not sure. We'll have to discuss it.

@litherum
Copy link
Contributor

litherum commented Apr 6, 2020

@Kangz said:

requires another piece of data that applications were not used to providing.

There might be a middle ground here, where applications can provide a minimum buffer size, but there are no requirements placed on what that size is. Applications which don’t know any minimum buffer sizes ahead of time can supply a value of “0” which would effectively mean “no minimum.”

When an implementation compiles a shader, these minimum buffer sizes would be provided as an additional input, and any accesses which lie outside of these minimum sizes would be guarded. That way, an application which knows its minimum size can supply it to gain additional speed, but an application which doesn’t know it would omit it, and all accesses would be guarded, reducing speed.

One tenant of this idea is that these minimum buffer size values would be supplied in createShaderModule() instead of createPipeline().

This means that if an application wants to use the same shader module in two different contexts, one where it knows the minimum buffer sizes and one where it doesn’t, the application would have to compile their shader module multiple times. The duplicate compilation makes sense, because the compilation process is the one which produces the OOB guards. The multiple compilations would happen explicitly at the author’s request, not invisibly under-the-hood. Or, if they don’t want to compile multiple times, they could just re-use the shader that includes all the guards, if they are willing to sacrifice shader runtime performance to improve shader compile-time performance.

@Kangz
Copy link
Contributor

Kangz commented Apr 7, 2020

Here's a quote from #33 that's an old issue talking about the same problem. It shows how to do robust access in SPIR-V assuming the following:

  • The top level of image views isn't empty.
  • Buffer views contain enough space for the "sized" part of the structures, and at least one element of the unsized part if present.

That is, the CPU-side validation ensures that the buffer is big enough, either through draw/dispatch-time validation or with a mechanism like minimumBufferSize. We discussed this briefly with @dneto0 and it doesn't seem very tractable to guard all memory accesses in case the CPU-side can't guarantee a minimum size. (plus I think that GPUs often prefetch uniform buffers?)

How about having minimumBufferSize but if it's not present we could do draw-time validation?

@kainino0x
Copy link
Contributor

I'd prefer to start by requiring minimumBufferSize, and consider relaxing it later (making it optional) later, as a nonbreaking change, if early users find that it's necessary.

Given that we can't guard accesses to structured members, I think that a relaxed minimumBufferSize should be either present and large enough for the struct (validated at pipeline creation time), or not present, as I don't think anything in between is useful.

I'm still not sure whether providing space for one element of unsized arrays is needed or not.

@RafaelCintron
Copy link
Contributor

What validation will we do to ensure that minimumBufferSize is a value for which we can actually skip out-of-bounds checks? If we can validate it, why can't we determine the value ourselves?

If we're taking untrusted values from the user and using it to skip out-of-bounds checks, that seems like a bad thing.

@kainino0x
Copy link
Contributor

minimumBufferSize is added to the BindGroupLayout, but it can't be computed until Pipeline (or maybe ShaderModule) creation. We would validate the the two are compatible as part of the Pipeline-PipelineLayout matching rules.

@kdashg
Copy link
Contributor

kdashg commented Apr 11, 2020

Well-formed content won't bind too-small buffers though, will it?

What if we failed operations if the buffers are smaller than the size of part (1)?
This would treat all content as if minimumBufferSize were always size-of-part-1.

Is there a use case for minimumBufferSize smaller or larger than size-of-part-1 that I'm not seeing?

@kainino0x
Copy link
Contributor

This change lets us do that validation at bind group creation time, which means at draw time we only have to validate the layout of the pipeline against the layouts of the bound bind groups (which we have to do anyway)

@kainino0x kainino0x added this to Needs Discussion in Main Apr 14, 2020
@dneto0
Copy link
Contributor

dneto0 commented Apr 23, 2020

I'm in the middle of catching up on this.

First, I strongly agree that the buffer size should at least cover the fixed portion of the arg-buffer. The (1) part. I can't think of a good use case. It would also add complexity to the index-clamping transforms, I think.

I asked why people care about clamping in #546 (comment) and got no responses, so I guess I'm asking about the same here. Why do we need to support clamping? Does it have any advantages over zeroing?

When clamping, the pattern is to still do the access after computing the clamped address:

Before:

    buf[i]

After:

    buf[clamp(0,i,buf.length())]

As @litherum noted in #546 (comment):

Clamping for OOB accesses doesn't work for zero-sized buffers.

If there is no backing store for the 0 element of the buffer, then it's an out of bounds access. I think it's a poor tradeoff to have to inject extra code to deliberately skip the access:

    if (buf.length() > 0) {   
          buf[clamp(0,i,buf.length())];
    }

When you say "zeroing" do you mean having the implementation automatically expand the underlying storage, I think to include at least one element of the unsized array (2), and filling it with zero?

From the perspective of injection of index-clamping shader transform, that's fine. But does it break a contract with the programmer? When they specified a buffer that implies a zero-sized runtime-array, do they mind that the actual length reported in the shader is more than zero? (GLSL .length() method, or SPIR-V OpArrayLength instruction)

@kainino0x
Copy link
Contributor

kainino0x commented Apr 23, 2020

When you say "zeroing" do you mean having the implementation automatically expand the underlying storage, I think to include at least one element of the unsized array (2), and filling it with zero?

I'm pretty sure "zeroing" meant (i + sizeof(buf[i]) <= buf.length()) ? buf[i] : 0.

[EDIT: fixed -- wait still not fixed -- ok now fixed]

@dneto0
Copy link
Contributor

dneto0 commented Jun 1, 2020

I'm pretty sure "zeroing" meant (i + sizeof(buf[i]) <= buf.length()) ? buf[i] : 0.

That introduces control flow at every memory access. It seems like a lot for good code to pay for the bad cases.

@kvark
Copy link
Contributor Author

kvark commented Jun 1, 2020

I thought we could mask out the OOB access based on the index, but now I realize that accessing itself is UB in Metal (for example), so branching is necessary, and I agree with @dneto0 's concern about it being heavy...

@kvark
Copy link
Contributor Author

kvark commented Jun 1, 2020

For the OpArrayLength, we need to talk about how it's implemented on MSL and HLSL/DXIL (and the respected backends). If it needs to be emulated anyway, we might as well emulate it everywhere to have full control over what length the shader sees.

@litherum
Copy link
Contributor

litherum commented Jun 8, 2020

Discussed on the June 8th 2020 call

@kainino0x
Copy link
Contributor

kainino0x commented Jun 8, 2020

Resolved: add minimumBufferSize but make it optional. If specified it must be >= the size required by the shader code. TBD whether unspecified defaults to 0 or undefined.

Default GPUPipelineLayout is generated with BGLs with minimumBufferSize.

@kainino0x kainino0x moved this from Needs Discussion to For Editors in Main Jun 8, 2020
@JusSn JusSn closed this Jun 8, 2020
@JusSn JusSn reopened this Jun 8, 2020
@kainino0x kainino0x moved this from For Editors to Needs Discussion in Main Jun 8, 2020
@kainino0x
Copy link
Contributor

kainino0x commented Jun 8, 2020

We discussed this in the editors meeting but had no strong feelings. I have a slight preference for 0 based on the following:

Argument in favor of 0: Semantically, it really does mean "the minimum buffer size, for a buffer in a bind group with this bind group layout, is 0." So in that sense 0 is not being used as a sentinel value but as a real one, in the interface between GPUBindGroupLayout and GPUBindGroup.

Argument against 0 and in favor of undefined: This is a special case because it implies the validation occurs at draw time (since it can't happen at createBindGroup time).

Argument against undefined: That's not exactly a special case involving GPUBindGroupLayout, but instead a special case of interaction between the bound bind groups and the bound pipeline (if the bound buffers aren't large enough for the pipeline - which, as it so happens, can only occur if minimumBufferSize was zero).

@litherum, thoughts?


EDIT: with #851 (comment) I might prefer a sentinel value instead, since we'd use one across all three of these dictionary members

@kvark kvark requested review from kainino0x and JusSn June 8, 2020 22:39
@kvark
Copy link
Contributor Author

kvark commented Jun 8, 2020

This is now rebased and updated, also linking to #851

spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
Copy link
Contributor

@JusSn JusSn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. some grammar nits.

kvark and others added 2 commits June 15, 2020 17:15
Co-authored-by: Justin Fan <jussnf@gmail.com>
Co-authored-by: Justin Fan <jussnf@gmail.com>
@kvark kvark merged commit 0e437f2 into gpuweb:master Jun 15, 2020
@kvark kvark deleted the min-buf-size branch June 15, 2020 21:27
bors bot added a commit to gfx-rs/wgpu that referenced this pull request Jun 17, 2020
726: Basic support for minBufferBindingSize r=cwfitzgerald a=kvark

**Connections**
Has basic (partial) implementation of gpuweb/gpuweb#678
wgpu-rs update - gfx-rs/wgpu-rs#377

**Description**
This change allows users to optionally specify the expected minimum binding size for buffers. We are then validating this against both the pipelines and bind groups.
If it's not provided, we'll need to validate at draw time - this PR doesn't do this (focus on API changes first).
It also moves out the `read_spirv`, since wgpu-types wasn't the right home for it ever.

**Testing**
Tested on wgpu-rs examples

Co-authored-by: Dzmitry Malyshau <kvarkus@gmail.com>
@kainino0x kainino0x moved this from Needs Discussion to Needs Testing in Main Jul 13, 2020
@kainino0x kainino0x moved this from Needs CTS Issue to Specification Done in Main Jan 15, 2022
ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this pull request Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Main
Specification Done
Development

Successfully merging this pull request may close these issues.

None yet

8 participants