
2nd, 3rd, and 4th components of sampled depth/stencil textures are undefined in Metal #1198

Closed · austinEng opened this issue Oct 30, 2020 · 53 comments · Fixed by #1978
Label: wgsl (WebGPU Shading Language Issues)

austinEng commented Oct 30, 2020

In WGSL, performing a textureSample or textureLoad operation on a texture_sampled_2d<type> yields a vec4<type>. This is true even for textures that have fewer than 4 components, e.g. r8uint, r16float, or rg8unorm. A texture with n components can be bound to texture_sampled_2d<type> as long as the texture's component type matches the type in the shader; in particular, you can bind a depth32float texture to texture_sampled_2d<f32>.

Sampling the "unspecified" components yields default values; for example, accessing the b component of an rg texture yields some default value.

According to the Metal Shading Language Specification (version 2.3, section 6.10; version 2.4, section 6.12):

For unspecified color components in a pixel format, the default values are:
• zero, for components other than alpha.
• one, for the alpha component.
...
...
For a texture with depth or stencil pixel format (such as MTLPixelFormatDepth24Unorm_Stencil8 or MTLPixelFormatStencil8), the default value for an unspecified component is undefined.

By contrast, in all of the other APIs (Vulkan, D3D12, and OpenGL), sampling a depth texture (which has one component) and accessing the "unspecified" components .g, .b, and .a yields 0, 0, and 1, respectively.

In recent versions of macOS (10.15+) and iOS (13.0+), MTLTextureSwizzleChannels can be set at texture-view creation to force the unspecified components to a particular component index, or to 0 or 1. Unfortunately, on older Apple platforms, mitigating the undefined values in Metal would involve binding extra data to the shader indicating at draw time whether a texture is a depth/stencil texture, so that correct values for the unspecified components can be emulated. We could also recompile pipelines and pass the data as MTLFunctionConstants, but that may be a worse solution.

If we want to avoid binding extra data per sampled texture or recompiling pipelines, an alternative solution is to add more binding types and WGSL types: sampled-depth-texture and sampled-stencil-texture. That way, a WGSL compiler knows at compile time whether a texture is a depth/stencil texture and can insert shader code to emulate the unspecified components.

Note: when I say "sampling a depth texture" I mean sampling the values out of a depth texture bound to texture_sampled_2d<f32>. This is not the same as using texture_depth_2d, which is for comparison sampling.
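
To make the distinction concrete, here is a minimal WGSL sketch (written in current WGSL syntax, where the sampled type is spelled texture_2d; binding names are illustrative) contrasting regular sampling, which always yields a vec4, with comparison sampling through texture_depth_2d, which yields a single f32 and so cannot exhibit the problem:

```wgsl
@group(0) @binding(0) var t_float : texture_2d<f32>;   // may actually hold a depth32float texture
@group(0) @binding(1) var t_depth : texture_depth_2d;  // dedicated depth type
@group(0) @binding(2) var s : sampler;
@group(0) @binding(3) var s_cmp : sampler_comparison;

@fragment
fn fs(@builtin(position) pos : vec4<f32>) -> @location(0) vec4<f32> {
    let uv = pos.xy / 256.0;
    // Always a vec4; .gba are the components whose values are at issue here.
    let v : vec4<f32> = textureSample(t_float, s, uv);
    // Comparison sampling returns a single f32, so the problem cannot arise.
    let d : f32 = textureSampleCompare(t_depth, s_cmp, uv, 0.5);
    return vec4<f32>(v.r, d, 0.0, 1.0);
}
```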

austinEng commented Oct 30, 2020

@litherum @grorg do you or the Metal team have more insight into this issue?

i.e. "the default value for an unspecified component is undefined". Does this mean it is actually undefined and may leak information from other GPU memory? Or does it simply mean that it may return a "safe" value that differs depending on the Metal implementation / texture compression?

@dneto0 dneto0 added the wgsl WebGPU Shading Language Issues label Oct 31, 2020
@dneto0 dneto0 added this to the MVP milestone Oct 31, 2020
@grorg grorg added this to Under Discussion in WGSL Nov 3, 2020
kvark commented Nov 3, 2020

A strawman proposal: all depth & stencil textures have to be seen as "texture_depth_xx" by the shader (name up for bikeshedding with regard to stencil presence). The WGSL -> MSL compiler can then figure out that only one component is sampled, and replace the rest with 0, 0, 1.

In addition, we can add a stencilLoad() function specifically to get the stencil value, if any, from a depth texture. The extra benefit is that it can return u32 instead of a float (which is what depth textures return).
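
For illustration, this is roughly what the proposal might look like in WGSL; stencilLoad was never standardized, so the name and signature below are purely speculative:

```wgsl
@group(0) @binding(0) var t_ds : texture_depth_2d;

// Hypothetical builtin from the proposal (not real WGSL):
//   fn stencilLoad(t : texture_depth_2d, coords : vec2<i32>, level : i32) -> u32

fn readStencil(p : vec2<i32>) -> u32 {
    return stencilLoad(t_ds, p, 0); // u32, unlike depth reads, which return f32
}
```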

kvark commented Nov 9, 2020

Talked about this with editors. Here is an expanded suggestion.
First, we modify this bit of the spec:

    // Texture is used with comparison sampling only.
    "depth-comparison"

It's already misleading in saying that these textures can only be compare-sampled. I think we can instead do something like:

    // Textures of depth/stencil formats.
    "depth-stencil"

So this binding type would be the only binding you could use for a texture with a depth/stencil format. The shader compiler would therefore know precisely when it's invalid to get the 2nd, 3rd, or 4th component from a texture.

For stencil, we could add a separate function like stencilLoad, returning u32, that could be called on those depth/stencil textures only.

I believe these measures would fully address the problem @austinEng found.

@austinEng

#970 says that:

we have a situation that texture bindings that are used with comparison samplers have to be used only with comparison sampling

so we may also need a compile-time check that a texture is not used with both a comparison sampler and a non-comparison sampler

@austinEng

Extra data point: this turns out to also be a problem on D3D12 for some GPUs/drivers, @RafaelCintron. See https://bugs.chromium.org/p/dawn/issues/detail?id=574

kainino0x commented Nov 10, 2020

we have a situation that texture bindings that are used with comparison samplers have to be used only with comparison sampling

I'm pretty sure this is not actually true, see archaeology here: #970 (comment)

kvark commented Nov 30, 2020

The latest buzz (which is actually quite old) is that for stencil textures, users will need to create a view with the "stencil" aspect and bind it as a "uint" binding. Given that some platforms don't correctly sample the 2nd/3rd/4th components, WebGPU implementations would have to provide an extra bit of data (for each uint texture binding in the layout) saying whether it's for stencil or not (credit goes to @Kangz for the suggestion). I think this is doable. @austinEng, are there any concerns left unaddressed by this path?
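
At the shader level this path is just an ordinary uint texture read; a sketch (binding names invented) of what binding the stencil aspect as a "uint" binding would look like:

```wgsl
// A view created with the "stencil" aspect, bound as a regular uint texture.
@group(0) @binding(0) var t_stencil : texture_2d<u32>;

fn loadStencil(p : vec2<i32>) -> u32 {
    // .x holds the stencil value. On affected platforms the implementation
    // needs the extra bit of layout data to know it may have to fix up .yzw.
    return textureLoad(t_stencil, p, 0).x;
}
```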

@austinEng

I think such a workaround would work, though it is a bit unfortunate that it needs to be done for every uint texture, regardless of whether it's stencil or not. This seems fine, though, given that this behavior is wrong on D3D12 and should hopefully get fixed in drivers, and that newer versions of macOS can work around the issue with swizzling.

One remaining clarification: is it allowed to bind a depth32float texture, or the depth aspect of a depth32float-stencil8 texture, to an unfilterable float texture binding? It sounds like the answer is "no", and you have to use the newly renamed "depth" binding type. This is inconsistent with how stencil is sampled.

@kainino0x

It's inconsistent, yes, but depth is necessary, while adding a new stencil binding point is more restrictive and isn't technically necessary, since we can do this workaround instead. On the other hand, I don't think requiring the layout to specify whether a binding is uint or stencil is a very problematic restriction. I doubt many pipelines would care to be generic over those two texture types.

@austinEng

Well, depth is only necessary for comparison; for sampling as an unfilterable float, it's not actually necessary. Regardless, I'd also be okay with adding a stencil binding point. Implementations that don't have the issue with the 2nd, 3rd, and 4th components could also deduplicate such pipelines if shader explosion becomes a problem.

@kainino0x

That's true, I guess we could allow binding depth to unfilterable-float with the same trick for the 2nd/3rd/4th components.

Kangz commented Dec 1, 2020

I thought depth was only there to make sure that comparison samplers were used with depth textures, but there's nothing preventing us from just sampling depth as unfilterable-float (however, Vulkan doesn't require that depth can be filtered, and it seems unavailable on a bunch of Android devices; I'm not sure what OpenGL ES allows). Eventually we should have an optional feature (flexibility1?) that allows using it with filterable-float.

Then stencil aspects can just be used as uint. The goal here is not to overcomplicate things because of what's missing today: we can work around it, knowing that in the future the workaround won't be needed.

kvark commented Dec 1, 2020

I felt like there was a reason to force use of the "depth" bind point for depth textures, but I'm not able to recall or find it now.

@kainino0x

I think it was just the (x, 0, 0, 1) issue.

kvark commented Dec 1, 2020

Well, it's still sort of an issue then. Supporting this edge case means we'll have to provide that extra bit of information for every float texture (as well as uint), and we'd need code in the shader to actually check it and fill in the components. This means overhead on pretty much all textures on affected platforms, which is undesirable.

Kangz commented Dec 1, 2020

IMHO, as long as yzw is made of "safe" values and we know that eventually the data will be 0, 0, 1 on all platforms, this is fine. Unfortunately, WebGPU cannot paper over every hardware difference, especially in its first version. However, we can steer the ecosystem toward fixing these differences.

WebGL has a similar thing where some tests are not required to pass for conformance, because of hardware differences or driver bugs.

kvark commented Jun 2, 2021

I think this discussion got dropped but is unfinished. The concern is about the cost of ensuring the 0, 0, 1 values on all "float" textures, given that they may actually be depth textures in disguise. If we allow depth to be bound as "float", then we'd have to 1) provide a bit of information for every texture, and 2) do a run-time check every time yzw is accessed on it.

This is both run-time and implementation-time overhead (on Metal). Notably, we still have to do it for stencil textures, but we expect most shaders not to use any uint textures, so for most shaders this will not apply.

The alternative is to say that depth-format textures (the depth aspect) can only be bound as the depth sample type. I think that would still allow people to do everything they need?

Kangz commented Jun 2, 2021

Another alternative is to eat the slight non-portability, with text saying that sampling depth/stencil textures in disguise SHOULD produce (x, 0, 0, 1). Then, on all platforms that support texture swizzling, make that happen; over time the few macOS versions that don't support it will disappear, at which point we can change SHOULD to MUST.

@austinEng

I thought that one of the reasons we added texture_depth_2d to WGSL was that we were going to disallow using texture_2d<f32> with depth textures. If that's the case, then we don't have a problem here. It's also why we have GPUTextureSampleType "depth".

kvark commented Jun 3, 2021

texture_depth_2d is a type-level thing that allows us to do comparison sampling in WGSL. GPUTextureSampleType accompanies this type, making sure we can check the actual texture type at bind group and pipeline creation, as opposed to draw time.

@austinEng

Comparison sampling, yes, but I thought it was also required for normal sampling of depth textures? If we have type information that it's a depth texture, then we can also prevent loading any component other than depth. In fact, sampling texture_depth_2d returns only a single float.

We determine whether it is used for comparison based on static analysis of whether it is used with a sampler or a sampler_comparison.

litherum commented Jun 15, 2021

Texture swizzling is not compatible with MTLTextureUsageRenderTarget, and for most use cases of stencil textures you'll want them to be render targets. So I suppose option 4 doesn't really work.

litherum commented Jun 16, 2021

@kvark for option (7), why a validation error? Why not just have the implementation set the yzw channels in this case?

With this modification, option (7) seems better than option (3). Compilers targeting WGSL from other languages wouldn't have to care, since there would be no difference in the shading language. Then, when we compile WGSL to MSL, we do it with the pipeline layout in hand, so we know which textures require having their yzw channels clobbered. (It would be yet one more analysis where we have to figure out which texture operations occur on which texture, though.)

That being said, we're leaning toward option 5. There aren't many use cases of stencil buffers that require actually sampling them in a shader. Most use cases just use them as render targets, which would continue to work. And if we discover a use case, we can always add support later.

@kainino0x

@kvark for option (7), why a validation error? Why not just have the implementation set the yzw channels in this case?

That would be a transformation in the shader, but the shader may have already been generated.

My very tentative preference is toward 5 now and 7 later (but maybe any of 1/3/6/7). I would like 3/6 best, but I agree with the issue Myles raised in the meeting: they require WGSL generators to synthesize type information they don't normally have. 7 is a bit fancier than the analysis we already require for sampler/texture pairs, but seems feasible. (We would have to be strict and require that the yzw channels be discarded immediately, though; after that, they become very hard to track through the program.)

kvark commented Jun 16, 2021

Not sure why I hadn't considered this, but yes, @litherum, we should just make the generated shader set the proper yzw channels. I think this makes (7) even better than I anticipated, and we should go with it.
Forbidding stencil reading entirely would be unfortunate, since it's allowed in all of the APIs we target and is nothing new. I've worked on titles that used it in production; it's definitely expected to be available.

@kainino0x

Ah, I forgot this only affects Metal, where we always have the pipeline layout at shader generation time. Option 7b, I'll call it, sounds good to me. (Generate .xxxx or vec4(x,0,0,1) or whatever)

I'm slightly concerned we'll run into this behavior on other backends as well, but if that would indeed be out of spec, then we can just do workarounds.
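
Concretely, the 7b fixup would be equivalent to the following (shown as WGSL for readability; the actual rewrite would happen in the generated MSL, and the names here are made up):

```wgsl
@group(0) @binding(0) var t : texture_2d<f32>;
@group(0) @binding(1) var s : sampler;

fn sampleWithFixup(uv : vec2<f32>) -> vec4<f32> {
    let raw = textureSample(t, s, uv);
    // Clobber yzw with the portable defaults, yielding (x, 0, 0, 1).
    return vec4<f32>(raw.x, 0.0, 0.0, 1.0);
}
```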

Kangz commented Jun 16, 2021

Is 1) completely dismissed? This is a small corner case that developers are not very likely to run into (programs work on Metal today) and that should resolve itself over time. It would be unfortunate to make our binding model even more inflexible because of it. For example, 2), 3), 6), and 7) prevent having pipelines that are agnostic to whether data comes from a stencil texture or an r8uint texture.

Texture swizzling is not compatible with MTLTextureUsageRenderTarget, and for most use cases of stencil textures you'll want it to be a render target. So I suppose option 4 doesn't really work.

Where do you see this? I can't find anything in the Metal docs.

kvark commented Jun 16, 2021

@Kangz the amount of extra work for 7b is rather small, and I don't think the "pipelines agnostic" argument is particularly strong, because pipelines are already very limited for depth textures.
If @litherum's comment about MTLTextureUsageRenderTarget is correct, then there is no hope that this will magically be solved in time by itself.

@litherum

Where do you see this? I can't find anything in the Metal docs.

It’s caught by the validation layer. It’s probably in the docs somewhere, too.

litherum commented Jun 16, 2021

I'm trying to figure out how the analysis would work for 7b. I think it's fundamentally a different kind of analysis than the one for uniformity or filterable textures.

For the uniformity analysis, you gather constraints: you determine all the constraints inside a function, see if any of them conflict, and summarize those constraints at all the call sites of the function.

For filterable textures, you also gather constraints. If both the texture and the sampler being used with it are global variables, you can check for a conflict immediately. If either one of them is a function parameter, you reformulate the constraint in terms of the calling function, and repeat this all the way up to the entry point.

For the analysis in 7b, we're not gathering constraints. Instead, we're trying to figure out, at the site of every sample() and load() call, whether we should clobber the yzw channels of the result.

I think we've allowed authors to take pointers to textures, so one could write something like this:

fn foo(myTexture : ptr<texture_2d<u32>, ...>) {  // hypothetical: pointers to textures
    // A single sample()/load() call site shared by both textures below.
    let result = textureLoad(*myTexture, vec2<i32>(0, 0), 0);
    // ...
}

foo(&myColorTexture);   // e.g. an r8uint texture
foo(&myStencilTexture); // a stencil-aspect view bound as uint

So, when we emit code for the foo function, should we emit code that clobbers the yzw channels of result?

Specializing the function seems unfortunate, because of the exponential blowup.

Another option would be to internally map WGSL's texture_2d type to a struct of an MSL texture and a bool, and add a branch. (You could do it without a branch by using a mask instead of a bool, but the point still stands.) That somewhat defeats the purpose of doing the analysis in the first place.
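
Expressed at the WGSL level (names invented), the texture-plus-bool mapping would behave like the following sketch, with a select standing in for the branch/mask:

```wgsl
@group(0) @binding(0) var t : texture_2d<f32>;
@group(0) @binding(1) var s : sampler;

// The implementation would carry is_depth_stencil alongside the texture internally.
fn sampleMaybeDepthStencil(uv : vec2<f32>, is_depth_stencil : bool) -> vec4<f32> {
    let raw = textureSample(t, s, uv);
    // Branch-free form of "if depth/stencil, force yzw to 0, 0, 1".
    return select(raw, vec4<f32>(raw.x, 0.0, 0.0, 1.0), is_depth_stencil);
}
```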

Another thing we could do is forbid WGSL authors from taking pointers to textures, but being able to do that is pretty useful from an author's perspective. It would be a big shame to lose a large, useful language feature because of a corner case that most authors aren't going to hit anyway.

Philosophically, the way you know what code to emit for a given expression is represented in the type system. So I feel that if we want to statically know, at every call site of sample() or load(), what code to emit, that information has to be included in WGSL's type system.

litherum commented Jun 16, 2021

Regarding option 1, the spec says "the default value for an unspecified component is undefined" (emphasis mine). So, if the WGSL author writes a program with this:

output = myArray[sample(myStencilTexture, ...).y];

The downstream MSL compiler would be totally within its rights to eliminate the bounds check on the array access entirely. Regardless of whether WGSL specifies what the yzw channels are, a compiler that produces MSL must inject some code here to make sure the value being passed into the bounds check is defined.

But it would be really bad if every call to sample() had to carry this extra code, as it's super rare for authors to sample from stencil textures in shaders.

(In reality, there's no way for the MSL compiler to know whether or not the texture is a stencil texture, so it's probably safe. It's unsafe if you follow the letter of the law of C++, though.)

So I think the conclusion is: the stencil-ness (or possibly "scalar-ness", re: option 6) needs to either be part of WGSL's type system, or we forbid stencil textures from ending up as resources in a shader.

Kangz commented Jun 17, 2021

@Kangz the amount of extra work for 7b is rather small, and I don't think the "having pipelines agnostic" argument is particularly strong, because they are already very limited for depth textures.

The argument is that we need to be pragmatic about what to paper over versus what to fix by adding inflexibility to the API. For depth textures we don't have a choice, because MSL requires a different type to be used. For sampling of stencil textures, the only problem is undefined values for the .yzw components.

This is a minuscule portability issue, and we're going to add more garbage to our already baroque binding model to fix it? It doesn't seem worth the effort when there are many other, much bigger portability pitfalls in the API (robust access and invalid-draw behavior, for example, but also just rasterization, filtering, and viewport precision, etc.). Let's be pragmatic and leave this one alone; we've collectively already spent too much time on it when we could have shrugged and moved on to more important things.

(In reality, there's no way for the MSL compiler to know whether or not the texture is a stencil texture or not, so it's probably safe. It's unsafe if you follow the letter of the law of C++, though.)

Yeah, the compiler can't take advantage of that UB, so it really isn't a safety problem, just the tiny portability issue.

kdashg commented Jun 22, 2021

It's Complicated: https://llvm.org/docs/LangRef.html#undefined-values (NB: This is LLVM not C++ per se)

dneto0 commented Jun 25, 2021

It's Complicated: https://llvm.org/docs/LangRef.html#undefined-values (NB: This is LLVM not C++ per se)

I'd emphasize that distinction!

I looked through the C++14 spec.
It has a definition for "undefined", in [defns.undefined], which is for "undefined behaviour".
It uses the word "undefined" 247 times.
In the main body, "undefined" refers only to undefined behavior; C++ does not have the concept of an undefined value.

(There are a couple of close calls. First, 26.6.2.8, the description of valarray::min: "The value returned for an array of length 0 is undefined." Second, for A::fetch_key: "For address types, the result may be an undefined address, but the operations otherwise have no undefined behavior.")

So that chimeric behaviour of LLVM's "undef", where each time it's observed it may have a different value, is very much LLVM-specific.

Since WGSL can't know that a stencil value is bound to the thing being accessed, the MSL compiler can't know either. And so the LLVM "undef"-ness shouldn't kick in.

I'm with @Kangz on this one. Metal isn't even compatible with itself, which to me is a strong argument that there isn't a burning need to cement this particular corner in WebGPU. I think we should note the issue and exhort Apple to fix Metal implementations.

@austinEng

If we choose not to do anything about working around this aspect of Metal, do we even need the texture_depth_2d type in WGSL? An earlier comment by @Kangz suggests it was added to ensure that only depth textures are used with comparison samplers. However, I think the same could be achieved by inspecting the shader for usages of comparison samplers with textures, and then validating that the GPUBindGroupLayoutEntry for the texture has GPUTextureSampleType "depth". We already need this kind of shader inspection for float/unfilterable-float.

kvark commented Jun 25, 2021

Depth textures are in a different league: they have different types in SPIR-V and MSL (versus float).

If you want to consider removing the depth binding type, you might as well remove the whole sampleType from the binding layout; it's all in the same bin, see #851. I think that would make the API more fragile, since more classes of user errors would only show up at draw time. And it would increase the draw validation cost.

kvark commented Jul 13, 2021

Echoing something I said on the call: the appeal of (5) is that we have more time to introduce this nicely (possibly as an optional feature). It's not hard-blocking user workflows, since they can copy from the stencil aspect to an r8uint texture and sample that. The amount of data transferred is just 1/4 of an RGBA blit, so it should be tolerable for applications that need this in the short term.

kdashg commented Jul 13, 2021

WGSL meeting minutes 2021-07-13
  • MM: (listed 3 main directions)
      1. Allow deviant behaviour for those components.
         1b. Allow two distinct behaviours (the ones that are observable today).
      2. Make a stencil texture type in the shader language.
      3. Don’t allow binding stencil textures like this (as a shader resource).
  • JG: Could constrain to a handful of behaviours (like out-of-bounds).
  • AB: How do you know which texture type to do it on?
  • MM: Jeff is saying do it without any extra shader code; running the code on all GPUs out there will result in one of two behaviours.
  • JB: Concern about whether “undefined behaviour” in practice is going to stay stable/safe in the future.
  • DM: It’s compiler undef behaviour. The compiler just doesn’t know what kind of thing is bound.
  • JG: Push the guarantee upstream?
  • DM: It’s definable on non-Metal APIs, and for Metal it’s only a concern for old software versions (??)
  • MM: Texture swizzling is not compatible with render targets, and stencil textures are usually render targets. And don’t alias stenciling with other kinds of bindings.
  • MM: Depth textures are not a problem here because there’s a different WGSL type for them; it’s only stencil textures. And depth texture is a different binding type in the API too.
  • JG: Think a modern API should have sampling from stencil textures.
  • DN: Option 2 is my least favorite.
  • DM: Banning (i.e. option 3) is not the end of the world. Can always copy the texture to an R8 texture and sample that.
  • (out of time; regroup for decision next week)

kvark commented Jul 15, 2021

Interesting note from #1924 (comment): the decision here is not just about stencil, as we thought. It also affects depth textures visible to shaders as regular textures, which may be an even bigger deal than stencil views (e.g. it relates to the ability to resolve MSAA depth).

kainino0x commented Jul 19, 2021

Resolved (I think?): for both depth and stencil, just let the G/B/A channels have unspecified values; don't try to make this 100% portable. Same as WebGL. Try to tighten it up over time with newer devices.

@jimblandy

WGSL meeting notes:

(previously) MM: (listed 3 main directions)

  1. Allow deviant behaviour for those components.
    1b. Allow two distinct behaviours (the ones that are observable today)
  2. Make a stencil texture type in shader language.
  3. Don’t allow binding stencil textures like this (as a shader resource)

MM: Didn’t we resolve this in yesterday’s WebGPU meeting? We agreed to leave this unspecified.
AB: We’re okay with that.
DM: Arguably it’s not a shading language question
MM: Right, because the shading language has no concept of the format inside the texture, so it would be an API issue, instead of a shading language issue
JG: Prior discussion had been in this meeting, so I’m surprised it didn’t stay in this meeting

@kainino0x

FWIW I think it was in the WGSL agenda because of option 2:

  2. Make a stencil texture type in shader language.

@kvark kvark moved this from Needs Discussion to Resolved: Needs Specification Work in WGSL Jul 20, 2021
@kvark kvark removed this from Resolved: Needs Specification Work in WGSL Jul 20, 2021
@kvark kvark added this to Needs Specification in Main Jul 20, 2021
@kainino0x kainino0x moved this from Needs Specification to Specification Done in Main Jan 19, 2022
ben-clayton added a commit to ben-clayton/gpuweb that referenced this issue Sep 6, 2022
We were testing over 2 million cases. This was, unsurprisingly, slow.

Reduce down to 24,000 cases.