Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

D3D11 and Vulkan depth clamp incorrectly without clip distance or geometry shading support #11399

Open
unknownbrackets opened this issue Sep 18, 2018 · 17 comments
Labels

Comments

@unknownbrackets
Copy link
Collaborator

unknownbrackets commented Sep 18, 2018

Based on #11260 and #11393, the behavior of Vulkan's depth clamping is a bit wrong.

Affected games:

On the PSP, you've got the following depth parameters:

  • Viewport: not the clip area, just the scale and offset for vertex z values.
  • Min z and max z: depth range to clip polygons to. Portions of triangles outside this range are not drawn.
  • Depth clamp: if enabled, depth outside [0, 65535.5) (after viewport) causes portions of triangles outside depth to flatten against the near/far plane. Otherwise, such triangles are culled.

Importantly: depth clamp always clamps to the full possible range of Z, regardless of viewport or clipping. Clipping happens after, so if you clip to a narrower range like [10000, 50000], then clamping does nothing useful.

Vulkan does not work the same way. Vulkan clamps depth to the depth clipping range. This is probably causing the depth clamp part of #11216, and also causes an issue in #11260, as noted in #11393.

The simple workaround in #11393 is just to disable clamping when both minz and maxz are set. The troublesome case is when only one direction is clamped, and the other is clipped.

When Vulkan doesn't support depth clamp, we use a "buffer" area around depth instead. This works, but only to a point: after the buffer is exhausted, depth clamp doesn't work. This causes #9545 in Direct3D 11 (and Vulkan without depth clamp.)

To use depth clamp successfully in this case, we'd need to:

  1. Always specify depth range as 0-65535 when clamping, even when one of minz/maxz are narrower.
  2. Continue setting depth clamp enable as we do.
  3. Make sure the depth rounding is getting the right values still.
  4. Clip some other way when depth is outside minz/maxz.
    4.1. Perhaps using two user-defined clip distances, minz and maxz.
    4.2. Perhaps using fragment discard (yuck.)

-[Unknown]

@unknownbrackets
Copy link
Collaborator Author

It's a shame GL_AMD_depth_clamp_separate is not more widely supported outside AMD. It seems like that would allow this to work perfectly. It seems like desktop GL 3.2+ clamping works the same as Vulkan.

I'm not sure if we could use a separate varying for depth - maybe it's an option but wouldn't that deform triangles AND break early z tests?

-[Unknown]

@unknownbrackets unknownbrackets changed the title Vulkan depth clamps incorrectly D3D11 and Vulkan depth clamp incorrectly Sep 21, 2018
@unknownbrackets
Copy link
Collaborator Author

unknownbrackets commented Sep 21, 2018

Added note above that Driver 76 is affected by this: it uses minz=00000, maxz=0012c, depth clamp=on.

We could change the heuristic to gstate.getDepthRangeMin() == 0 && gstate.getDepthRangeMax() == 65535 which would at least help that game, but probably trading bugs in other games (might be a good trade, not sure.)

OpenGL draws it wrong too, but because it doesn't even really handle minz/maxz at all (skips "accurate depth".) If we used depth clamping on desktop GL3+, it would have this same bug.

-[Unknown]

@unknownbrackets
Copy link
Collaborator Author

I guess the Right Way to implement this is using a geometry shader. It should happen before clipping, so we can clamp and let the rest be clipped. This of course wouldn't work on anything below ES 3.1, or in most cases ES 3.2.

In theory, we could generate up to 64 triangles per input. And more devices support geometry shaders than depth clamping, as far as I can tell...

The other benefit is, we could implement culling here, which might work more consistently..

-[Unknown]

@Anuskuss
Copy link
Contributor

Anuskuss commented Oct 7, 2019

I'm back with more Pursuit Force issues 😉 This time it only happens in D3D11 and Vulkan so I'm pretty sure this is the right issue to report that.

First of all, take a look at these gifs:

OpenGL

Vulkan

I'm holding the stick down and at the 23th frame, part of the helicopter (I'm assuming) covers the whole scene (minus the HUD).

GE dump: UCES00019.ppdmp at 0x09799B7C (1109/1348)

P.S. I can move this to #11216 if you want.

Edit: Welp, I reread the thread and it seems to be this again. Something between 06340bfa9..025e478d2 happend which fixed the yellow roadblock (and most importantly the Lost the Vulkan device! error) but did not fix this unfortunately.

@Anuskuss
Copy link
Contributor

Anuskuss commented Jan 31, 2020

@unknownbrackets FYI, the purposed workaround does fix my issue as well. Can we get that into master until a proper fix is created?

Edit: Just checked master, it's already done but with an OR instead? Why's that?

if (gstate.getDepthRangeMin() == 0 || gstate.getDepthRangeMax() == 65535) {

@unknownbrackets
Copy link
Collaborator Author

Well, && and || are both wrong, it's just a question of which is wrong in more cases. Since you're trying to fix a specific issue, it's obvious there are cases where the || is wrong, but won't know where the && is wrong without testing it out in a bunch of games.

I think if we changed master to &&, there are some games which would show a black screen instead of videos, have triangle cutouts in some scenes with missing background parts, etc. But I'm not sure the extent of problems it would cause.

To state it more clearly:

  • Games can essentially clamp either the low or high ends of Z (clamp min when minz=0, clamp max when maz=65535.)
  • Vulkan can either clamp both ends, or neither. It's not capable of only clamping one side like the PSP is.
  • || means, if we need to clamp either side, do it. It views clamping as the less risky operation.
  • && means, only clamp if both sides need it. It views clamping as the more risky operation.

The danger of not clamping something that needs to be clamped is that it won't be drawn or parts of it will not be. This could be missing UI elements, videos, objects, parts of scenery, etc. That's the danger of &&.

The danger of clamping incorrectly (the danger of ||) is that objects may be deformed. For example, if I'm drawing a tree, it may be squashed or something if it's clamped incorrectly. But they may also cover the screen like in your example. It's because the wrong side is being clamped too.

I think the only way to not be wrong is a geometry shader as mentioned above. That would do the clamping manually, working around Vulkan's lack of support.

-[Unknown]

@hrydgard
Copy link
Owner

hrydgard commented Feb 29, 2020

I'm not sure, but possibly https://github.com/KhronosGroup/Vulkan-Docs/blob/master/appendices/VK_EXT_depth_range_unrestricted.txt would let us do more of this in the fragment shader instead. It's unfortunately less widely supported than geometry shaders.

Geometry shader support is actually appearing on many recent mobile devices (and nearly all desktop devices), so might finally be worth implementing a path for that. I think we are seldomly vertex limited so even though geometry shaders are known to be pretty slow it might be okay.

@hrydgard
Copy link
Owner

hrydgard commented Nov 11, 2020

EDIT: I missed that you already posted this above unknown! Had completely forgot about it. Still, a good reminder!

It would in fact not deform triangles I believe, though yes of course, early z test breaks.

Old post:

I was recently made aware of the following trick, which lets us arbitrarily emulate clamping on any hardware that can write to Z in the fragment shader, which is pretty much everything we support,

https://stackoverflow.com/questions/5960757/how-to-emulate-gl-depth-clamp-nv

Since you say you're using OpenGL ES, but also mentioned trying to clamp gl_FragDepth, I'm assuming you're using OpenGL ES 2.0, so here's a shader trick:
You can emulate ARB_depth_clamp by using a separate varying for the z-component.
Vertex Shader:

varying float z;
void main()
{
    gl_Position = ftransform();

    // transform z to window coordinates
    z = gl_Position.z / gl_Position.w;
    z = (gl_DepthRange.diff * z + gl_DepthRange.near + gl_DepthRange.far) * 0.5;

    // prevent automatic z-clipping between vertex and pixel shader
    gl_Position.z = 0.0;
}

Fragment shader:

varying float z;
void main()
{
    gl_FragColor = vec4(vec3(z), 1.0);  // or whatever
    gl_FragDepth = clamp(z, 0.0, 1.0);  // can replace with only clamping in one direction, etc.
}

Now, writing to Z has the unfortunate side effect of turning off depth buffer optimization on a lot of hardware, but still better than rendering wrong. We should be able to use this to emulate everything the PSP does - although it won't be easy to get it exactly right.

@Anuskuss
Copy link
Contributor

@hrydgard Do you know why D11/VK is different from OGL in this case? Is it a limitation of VK and/or any different in D12? Solving this with shaders (or at all really) would be nice but what's the root cause here?

@unknownbrackets
Copy link
Collaborator Author

unknownbrackets commented Nov 14, 2020

It would in fact not deform triangles I believe, though yes of course, early z test breaks.

Depending on the values of z and w, for a non-flat triangle (i.e. where its vertices have different z values), wouldn't forcing z to a static (flat) value in the vertex shader deform the triangle? Well, maybe nothing after projection can matter for z?

hrydgard Do you know why D11/VK is different from OGL in this case? Is it a limitation of VK and/or any different in D12? Solving this with shaders (or at all really) would be nice but what's the root cause here?

We have two ways of handling depth: the original way, and the more accurate way. They both have problems. OpenGL uses the original way by default. The original way doesn't work in other APIs because they expect depth ranges to be strictly ordered.

In the original (GL) method:

  • We "apply" the depth viewport Z offset/scale using depth range.
    • GL actually clamps this range to [-1, 1]. This effectively clamps the viewport transform (which is weird and wrong, and both causes and works around other depth bugs.)
    • Other APIs fail if you send an invalid depth range, and won't flip it for you automatically. In theory, we could use the original depth path in other APIs by flipping depth in the shader only.
    • Luckily most games use a viewport that fits within valid depth, so this only causes weird problems in games or scenes that don't.
  • We ignore minz/maxz, which are supposed to clip the near and far.
  • We ignore the depth clamping flag and don't even try to apply it.

That implementation is wrong in a bunch of ways, but it accidentally gets some things right.

For example, imagine you and I were both trying to emulate "x = a * b". My formula for this ends up actually being "x = a * b + b". And your formula is the simpler "x = a + b".

Now, if a and b were both 2, I'd calculate 6. Darn it. And you, even though you think multiplication and addition are the same thing, would get 4 - the correct value. This might seem to prove to you that, indeed, multiplication and addition are identical operations. Instead, you just got lucky with the values of a and b.

If a and b were both 100, I'd get 10100, which is only 1% off. But you'd get 200, which is wildly wrong.

The fact is that games most often use values that both formulas get right, or close enough. And maybe just as many use ones that are wrong in the one as wrong in the other. It's annoying.

We could enable "accurate depth" on GL. However, we don't support depth clamping there (and many implementations don't either), so we use a fallback way of emulating depth clamping there, which is almost a third way. If you imagine depth as a line:

 \/ values clamped to [                           \/ clamp to ]
     [========================================]
     ^ 0.0                                    ^ 1.0

That's real PSP depth, which always performs any clamping, if enabled, to [0, 1] (something none of our graphics backends can do accurately.) Any value outside the bar gets clamped when enabled.

When we don't have native depth clamping, the workaround we use is this:

      \/ clamp to [                \/ clamp to ]
     ---------------[==========]---------------
 ^ clipped (lost)   ^ 0.0      ^ 1.0             ^ clipped (lost)

So we scale down the bar. We lose values outside the bar (we would no matter what, that's because native depth clamping is not available), but we reserve some of the hardware's bar to fake clamping.

We try our best to treat the values below 0.0 the same as 0.0, and the values above 1.0 the same as 1.0. Effectively, this means if a depth value is within 150% of the normal depth range, it'll "clamp" correctly. Values outside this range will be clipped, which is wrong.

Since I'm illustrating, what this issue (#11399) is about is how native clamping interacts with clipping. PSP clamping works like this:

\/ clip    \/ minz clip                             \/ clamp to ]
     [XXXXXX==================================]
     ^ 0.0                                    ^ 1.0

Basically, clamping doesn't matter on the side you enable clipping. It still clamps the other side. X here represents clipped values.

But here's what we end up with:

   \/ clamp to [                                  \/ clamp to ]
           [==================================]
     ^ 0.0                                    ^ 1.0

Oops. We're just clamping everything. There's no clipping happening at all. It works fine if both sides clip, but hardware depth clipping/clamping don't let us have both sides work in this scenario. We either clip both sides, or clamp both sides. The PSP lets games control those independently, effectively.

I'm pretty sure we could sacrifice early z tests (potentially on a lot of draws, which would probably motivate a compat ini flag) to dodge this, i.e. by clipping with discard or finagling w to avoid the deform. I do still think a geometry shader could avoid that, i.e. by producing triangles clipped properly, which could then safely use hardware depth clamp.

See #6660 for more detail. See #6660 (comment) specifically.

-[Unknown]

@hrydgard
Copy link
Owner

hrydgard commented Nov 15, 2020

That's a really great summary, thank you! I think I'm gonna copy paste it into GPUStateUtils.cpp or maybe a broken out DepthUtils.cpp one, if you don't mind?

The problem with geometry shaders is that on mobile it's only supported on the latest couple of generations, and ARM strongly recommends not using them. On the other hand our vertex counts are usually not that high compared to modern games so might not be an issue.

I actually think the mentioned technique might work without distorting triangles, without any further finagling. Let's think about how the coordinates of gl_Position are used.

  • First in the VS we apply the projection matrix, save Z in a varying and set gl_Position.z to 0.
  • After the VS, the GPU performs clipping. Since we messed with Z, it'll no longer clip to the Z=0 and Z=1 planes (Z=-1, 1 in GL), but the w=0 clip will still be intact I think, I so I don't think we'll have an issue with triangles crossing the w=0 plane.
  • Then, the X,Y and Z are divided by W and scaled/offset by the viewport. Note that us zapping Z did NOT affect the final X and Y screen coordinates, and the perspective varying interpolation only uses X, Y and W. Our Z is now 0, but that doesn't matter - in the fragment shader, we read our saved Z and recompute exactly what its value would have been. Here we can discard to simulate minz/maxz if we want or modify Z to clamp it before it's written to the Z buffer.

Would have to try it though to be sure :) And the use of discard would make it even worse on mobile chips so geoshaders might still win.

@unknownbrackets
Copy link
Collaborator Author

unknownbrackets commented Aug 14, 2021

I think that might work... just worried about the number of games that use depth clamp kinda all the time, and the early depth tests.

That said, I had another idea that might be able to preserve early Z tests and fix this issue: gl_ClipDistance (combined with depth clamp.)

So suppose:

  • Depth clamp is enabled, only on one side - let's say [0.5, 1.0].
  • Normally, this would disable clipping below 0.5, and is very detectable.
  • We set a uniform flag to indicate if zero-direction or one-direction user clipping is needed.
  • If the flag is set, we set gl_ClipDistance[0] to the z value, or 1.0 - z.
  • We already adjust the projection matrix for the depth range, so in theory 0.0 in our example is already 0.5.

For draws where early depth tests would apply, they should still apply (provided they still work with depth clamp enabled on the device.) This would just add an additional clip plane, effectively re-enabling Z clipping.

I think Direct3D 11 has this too for SV_ClipDistance (as a vertex out.)

Potentially we could combine this with the non-depth-clamp example described above, too. If we mess with z, we could still apply user clipping on one side to clip appropriately.

-[Unknown]

@hrydgard
Copy link
Owner

That's an interesting idea. Unfortunately, gl_ClipDistance is another thing that's not universally supported - according to vulkan.gpuinfo.org, ARM GPUs seems to have a maxClipDistances of 0.

Doesn't mean the idea isn't useful, of course - could still be an alternative on other GPUs.

@Anuskuss
Copy link
Contributor

I think this can be closed as of #14833.

@unknownbrackets
Copy link
Collaborator Author

unknownbrackets commented Oct 19, 2021

Well, that doesn't work everywhere, and also depth is still being clamped incorrectly. That change only alters some of the more visible effects of this, but the ASCII art I did above is still the case.

#14833 doesn't change the way depth clamp is handled, it just changes the behavior of guardband cull when clamping. Anything that was noted as a depth clamp issue (this issue) that was fixed by that was actually a guardband issue (or clipping issue, which in my mind is related to the guardband - but not strictly the clamping itself.)

-[Unknown]

unknownbrackets added a commit to unknownbrackets/ppsspp that referenced this issue Sep 18, 2022
Helps situations like hrydgard#11216, where only one side should be clamped.
Keeps depth clamp (i.e. hrydgard#7932) working.  See hrydgard#11399.
@ghost
Copy link

ghost commented Sep 21, 2022

How about this now after 3ff400e

@unknownbrackets
Copy link
Collaborator Author

unknownbrackets commented Sep 21, 2022

It should be fixed for devices that support clip distance, which not all do. There's also a note that Test Drive Unlimited is still having problems in #16069.

-[Unknown]

@unknownbrackets unknownbrackets changed the title D3D11 and Vulkan depth clamp incorrectly D3D11 and Vulkan depth clamp incorrectly without clip distance or geometry shading support Oct 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants