Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Pack/Unpack for HLSL #2353

Merged
merged 21 commits into from
Jun 23, 2023
Merged

Implement Pack/Unpack for HLSL #2353

merged 21 commits into from
Jun 23, 2023

Conversation

Elabajaba
Copy link
Contributor

@Elabajaba Elabajaba commented May 25, 2023

These are currently untested, and I'm not sure how to do the signed versions.

edit: These have now all been tested against a re-implementation of the CTS tests here (set WGPU_BACKEND env variable to dx12 and run the naga-valid example, it'll print out any failed tests).

Status:

  • extractBits
  • insertBits
  • pack4x8snorm
  • pack4x8unorm
  • pack2x16snorm
  • pack2x16unorm
  • pack2x16float
  • unpack4x8snorm
  • unpack4x8unorm
  • unpack2x16snorm
  • unpack2x16unorm
  • unpack2x16float (was already implemented)

Related: #1449
Closes #1929

@Elabajaba
Copy link
Contributor Author

Elabajaba commented May 29, 2023

The HLSL output for what I've implemented is now valid (I converted a wgsl file containing all the functions to hlsl, then copied that into https://shader-playground.timjones.io/ to make sure that it'd compile with DXC and FXC), but I'm not sure how to test that the output is correct.

@teoxoy
Copy link
Member

teoxoy commented May 30, 2023

Please add HLSL here

Targets::SPIRV | Targets::METAL | Targets::GLSL | Targets::WGSL,

so that we have and check the HLSL snapshots. CI will only make sure the output is valid HLSL but not run the file.

but I'm not sure how to test that the output is correct.

We aren't currently testing behavior in naga, what I'd do is create a compute shader to test the behavior and open a PR on wgpu (that temporarily links to your rev of naga).

@Elabajaba
Copy link
Contributor Author

I hacked up a compute shader to test with (though I'm currently running it as an example and not a test, see the naga-valid example https://github.com/Elabajaba/wgpu/tree/mynagatest).

I've run into an issue where my pack4x8snorm and pack2x16snorm implementations are off by 1 compared to the spec when given -0.5 (for pack2x16snorm([-0.5, -0.5]), spec: 0xc001c001, mine: 0xc000c000), but I get the same off by 1 result when running it with Vulkan (Windows + AMD).

packing spec: https://github.com/gpuweb/cts/blob/main/src/unittests/conversion.spec.ts
unpacking spec: https://github.com/gpuweb/cts/blob/main/src/unittests/floating_point.spec.ts

@teoxoy
Copy link
Member

teoxoy commented Jun 1, 2023

The issue is that the WGSL spec defines the behavior of pack2x16snorm as follows:

Component e[i] of the input is converted to a 16-bit twos complement integer value ⌊ 0.5 + 32767 × min(1, max(-1, e[i])) ⌋ which is then placed in bits 16 × i through 16 × i + 15 of the result.

We need to floor instead of round which is the behavior of SPV and what the PR is currently implementing.

Having to polyfill this for SPV as well is unfortunate but I guess necessary since the rounding behavior might not always be the same across implementations of SPV.

The fraction 0.5 rounds in a direction chosen by the implementation, presumably the direction that is fastest.

@teoxoy
Copy link
Member

teoxoy commented Jun 1, 2023

I wonder how MSL behaves since they don't specify this in their spec.

src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
@teoxoy
Copy link
Member

teoxoy commented Jun 1, 2023

I filed gpuweb/gpuweb#4159 to hopefully iron out the behavior. Maybe we can wait a bit and see how to proceed.

@Elabajaba
Copy link
Contributor Author

I could've saved so much time had I known about gpuweb/gpuweb#1189 (comment)...

@teoxoy
Copy link
Member

teoxoy commented Jun 2, 2023

I feel you 😅 - the spec repo usually contains a lot of info in PRs/issues since the decisions are taken based on what's in the APIs/shading languages it targets. I find myself using git log -S on the spec repo quite often.

@Elabajaba
Copy link
Contributor Author

insertBits is passing every test except the 32 count tests, and I'm not sure how to fix it.

insertBits(e: 0, newbits: 0b11111111111111111111111111111111, offset: 0, count: 32)
result:   0b00000000000000000000000000000000,
expected: 0b11111111111111111111111111111111

readable version of what I'm doing:

    let mask = ((((1u << count) - 1u) << offset) & 0xffffffffu);
    return = ((newbits << offset) & mask) | (e & ~mask);

@magcius
Copy link
Collaborator

magcius commented Jun 7, 2023

Could you do let mask = (0xFFFFFFFFu >> (32 - count)) << offset instead?

@Elabajaba
Copy link
Contributor Author

Elabajaba commented Jun 7, 2023

Could you do let mask = (0xFFFFFFFFu >> (32 - count)) << offset instead?

Fails the count = 0 tests :(

edit: Added a ternary to guard against count == 0, now it works.

@Elabajaba
Copy link
Contributor Author

The spec also calls for us to error on insertbits if we try and insert too much https://www.w3.org/TR/WGSL/#insertBits-builtin

If count + offset is greater than w (w is the bitwidth of T, where T is i32, u32, vecN<i32>, or vecN<u32>), then:

It is a shader-creation error if count and offset are const-expressions.

It is a pipeline-creation error if count and offset are override-expressions.

We don't currently do that for HLSL or Vulkan, and I don't know how to implement that.

@teoxoy
Copy link
Member

teoxoy commented Jun 7, 2023

We can take care of that validation later once we got const-expressions fully working.

src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
@Elabajaba Elabajaba marked this pull request as ready for review June 8, 2023 04:27
Copy link
Member

@teoxoy teoxoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not yet looked at the behavior but still left some comments.

Also, it seems the resolution of gpuweb/gpuweb#4159 is that the rounding mode should have been unspecified so, we can use round (same as here gpuweb/gpuweb#1189 (comment)) instead of floor(0.5 + x) for the packing functions.

src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
github-merge-queue bot pushed a commit to bevyengine/bevy that referenced this pull request Jun 18, 2023
![image](https://github.com/bevyengine/bevy/assets/47158642/dbb62645-f639-4f2b-b84b-26fd915c186d)

# Objective

- Add Screen space ambient occlusion (SSAO). SSAO approximates
small-scale, local occlusion of _indirect_ diffuse light between
objects. SSAO does not apply to direct lighting, such as point or
directional lights.
- This darkens creases, e.g. on staircases, and gives nice contact
shadows where objects meet, giving entities a more "grounded" feel.
- Closes #3632.

## Solution

- Implement the GTAO algorithm.
-
https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf
-
https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf
- Source code heavily based on [Intel's
XeGTAO](https://github.com/GameTechDev/XeGTAO/blob/0d177ce06bfa642f64d8af4de1197ad1bcb862d4/Source/Rendering/Shaders/XeGTAO.hlsli).
- Add an SSAO bevy example.

## Algorithm Overview
* Run a depth and normal prepass
* Create downscaled mips of the depth texture (preprocess_depths pass)
* GTAO pass - for each pixel, take several random samples from the
depth+normal buffers, reconstruct world position, raytrace in screen
space to estimate occlusion. Rather then doing completely random samples
on a hemisphere, you choose random _slices_ of the hemisphere, and then
can analytically compute the full occlusion of that slice. Also compute
edges based on depth differences here.
* Spatial denoise pass - bilateral blur, using edge detection to not
blur over edges. This is the final SSAO result.
* Main pass - if SSAO exists, sample the SSAO texture, and set occlusion
to be the minimum of ssao/material occlusion. This then feeds into the
rest of the PBR shader as normal.

---

## Future Improvements
- Maybe remove the low quality preset for now (too noisy)
- WebGPU fallback (see below)
- Faster depth->world position (see reverted code)
- Bent normals 
- Try interleaved gradient noise or spatiotemporal blue noise
- Replace the spatial denoiser with a combined spatial+temporal denoiser
- Render at half resolution and use a bilateral upsample
- Better multibounce approximation
(https://drive.google.com/file/d/1SyagcEVplIm2KkRD3WQYSO9O0Iyi1hfy/view)

## Far-Future Performance Improvements
- F16 math (missing naga-wgsl support
https://github.com/gfx-rs/naga/issues/1884)
- Faster coordinate space conversion for normals
- Faster depth mipchain creation
(https://github.com/GPUOpen-Effects/FidelityFX-SPD) (wgpu/naga does not
currently support subgroup ops)
- Deinterleaved SSAO for better cache efficiency
(https://developer.nvidia.com/sites/default/files/akamai/gameworks/samples/DeinterleavedTexturing.pdf)

## Other Interesting Papers
- Visibility bitmask
(https://link.springer.com/article/10.1007/s00371-022-02703-y,
https://cdrinmatane.github.io/posts/cgspotlight-slides/)
- Screen space diffuse lighting
(https://github.com/Patapom/GodComplex/blob/master/Tests/TestHBIL/2018%20Mayaux%20-%20Horizon-Based%20Indirect%20Lighting%20(HBIL).pdf)

## Platform Support
* SSAO currently does not work on DirectX12 due to issues with wgpu and
naga:
  * gfx-rs/wgpu#3798
  * gfx-rs/naga#2353
* SSAO currently does not work on WebGPU because r16float is not a valid
storage texture format
https://gpuweb.github.io/gpuweb/wgsl/#storage-texel-formats. We can fix
this with a fallback to r32float.

---

## Changelog

- Added ScreenSpaceAmbientOcclusionSettings,
ScreenSpaceAmbientOcclusionQualityLevel, and
ScreenSpaceAmbientOcclusionBundle

---------

Co-authored-by: IceSentry <c.giguere42@gmail.com>
Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
Co-authored-by: Daniel Chia <danstryder@gmail.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Brandon Dyer <brandondyer64@gmail.com>
Co-authored-by: Edgar Geier <geieredgar@gmail.com>
Co-authored-by: Nicola Papale <nicopap@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
NoahShomette pushed a commit to NoahShomette/bevy that referenced this pull request Jun 19, 2023
![image](https://github.com/bevyengine/bevy/assets/47158642/dbb62645-f639-4f2b-b84b-26fd915c186d)

# Objective

- Add Screen space ambient occlusion (SSAO). SSAO approximates
small-scale, local occlusion of _indirect_ diffuse light between
objects. SSAO does not apply to direct lighting, such as point or
directional lights.
- This darkens creases, e.g. on staircases, and gives nice contact
shadows where objects meet, giving entities a more "grounded" feel.
- Closes bevyengine#3632.

## Solution

- Implement the GTAO algorithm.
-
https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf
-
https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf
- Source code heavily based on [Intel's
XeGTAO](https://github.com/GameTechDev/XeGTAO/blob/0d177ce06bfa642f64d8af4de1197ad1bcb862d4/Source/Rendering/Shaders/XeGTAO.hlsli).
- Add an SSAO bevy example.

## Algorithm Overview
* Run a depth and normal prepass
* Create downscaled mips of the depth texture (preprocess_depths pass)
* GTAO pass - for each pixel, take several random samples from the
depth+normal buffers, reconstruct world position, raytrace in screen
space to estimate occlusion. Rather then doing completely random samples
on a hemisphere, you choose random _slices_ of the hemisphere, and then
can analytically compute the full occlusion of that slice. Also compute
edges based on depth differences here.
* Spatial denoise pass - bilateral blur, using edge detection to not
blur over edges. This is the final SSAO result.
* Main pass - if SSAO exists, sample the SSAO texture, and set occlusion
to be the minimum of ssao/material occlusion. This then feeds into the
rest of the PBR shader as normal.

---

## Future Improvements
- Maybe remove the low quality preset for now (too noisy)
- WebGPU fallback (see below)
- Faster depth->world position (see reverted code)
- Bent normals 
- Try interleaved gradient noise or spatiotemporal blue noise
- Replace the spatial denoiser with a combined spatial+temporal denoiser
- Render at half resolution and use a bilateral upsample
- Better multibounce approximation
(https://drive.google.com/file/d/1SyagcEVplIm2KkRD3WQYSO9O0Iyi1hfy/view)

## Far-Future Performance Improvements
- F16 math (missing naga-wgsl support
https://github.com/gfx-rs/naga/issues/1884)
- Faster coordinate space conversion for normals
- Faster depth mipchain creation
(https://github.com/GPUOpen-Effects/FidelityFX-SPD) (wgpu/naga does not
currently support subgroup ops)
- Deinterleaved SSAO for better cache efficiency
(https://developer.nvidia.com/sites/default/files/akamai/gameworks/samples/DeinterleavedTexturing.pdf)

## Other Interesting Papers
- Visibility bitmask
(https://link.springer.com/article/10.1007/s00371-022-02703-y,
https://cdrinmatane.github.io/posts/cgspotlight-slides/)
- Screen space diffuse lighting
(https://github.com/Patapom/GodComplex/blob/master/Tests/TestHBIL/2018%20Mayaux%20-%20Horizon-Based%20Indirect%20Lighting%20(HBIL).pdf)

## Platform Support
* SSAO currently does not work on DirectX12 due to issues with wgpu and
naga:
  * gfx-rs/wgpu#3798
  * gfx-rs/naga#2353
* SSAO currently does not work on WebGPU because r16float is not a valid
storage texture format
https://gpuweb.github.io/gpuweb/wgsl/#storage-texel-formats. We can fix
this with a fallback to r32float.

---

## Changelog

- Added ScreenSpaceAmbientOcclusionSettings,
ScreenSpaceAmbientOcclusionQualityLevel, and
ScreenSpaceAmbientOcclusionBundle

---------

Co-authored-by: IceSentry <c.giguere42@gmail.com>
Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
Co-authored-by: Daniel Chia <danstryder@gmail.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Brandon Dyer <brandondyer64@gmail.com>
Co-authored-by: Edgar Geier <geieredgar@gmail.com>
Co-authored-by: Nicola Papale <nicopap@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
james7132 pushed a commit to james7132/bevy that referenced this pull request Jun 19, 2023
![image](https://github.com/bevyengine/bevy/assets/47158642/dbb62645-f639-4f2b-b84b-26fd915c186d)

# Objective

- Add Screen space ambient occlusion (SSAO). SSAO approximates
small-scale, local occlusion of _indirect_ diffuse light between
objects. SSAO does not apply to direct lighting, such as point or
directional lights.
- This darkens creases, e.g. on staircases, and gives nice contact
shadows where objects meet, giving entities a more "grounded" feel.
- Closes bevyengine#3632.

## Solution

- Implement the GTAO algorithm.
-
https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf
-
https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf
- Source code heavily based on [Intel's
XeGTAO](https://github.com/GameTechDev/XeGTAO/blob/0d177ce06bfa642f64d8af4de1197ad1bcb862d4/Source/Rendering/Shaders/XeGTAO.hlsli).
- Add an SSAO bevy example.

## Algorithm Overview
* Run a depth and normal prepass
* Create downscaled mips of the depth texture (preprocess_depths pass)
* GTAO pass - for each pixel, take several random samples from the
depth+normal buffers, reconstruct world position, raytrace in screen
space to estimate occlusion. Rather then doing completely random samples
on a hemisphere, you choose random _slices_ of the hemisphere, and then
can analytically compute the full occlusion of that slice. Also compute
edges based on depth differences here.
* Spatial denoise pass - bilateral blur, using edge detection to not
blur over edges. This is the final SSAO result.
* Main pass - if SSAO exists, sample the SSAO texture, and set occlusion
to be the minimum of ssao/material occlusion. This then feeds into the
rest of the PBR shader as normal.

---

## Future Improvements
- Maybe remove the low quality preset for now (too noisy)
- WebGPU fallback (see below)
- Faster depth->world position (see reverted code)
- Bent normals 
- Try interleaved gradient noise or spatiotemporal blue noise
- Replace the spatial denoiser with a combined spatial+temporal denoiser
- Render at half resolution and use a bilateral upsample
- Better multibounce approximation
(https://drive.google.com/file/d/1SyagcEVplIm2KkRD3WQYSO9O0Iyi1hfy/view)

## Far-Future Performance Improvements
- F16 math (missing naga-wgsl support
https://github.com/gfx-rs/naga/issues/1884)
- Faster coordinate space conversion for normals
- Faster depth mipchain creation
(https://github.com/GPUOpen-Effects/FidelityFX-SPD) (wgpu/naga does not
currently support subgroup ops)
- Deinterleaved SSAO for better cache efficiency
(https://developer.nvidia.com/sites/default/files/akamai/gameworks/samples/DeinterleavedTexturing.pdf)

## Other Interesting Papers
- Visibility bitmask
(https://link.springer.com/article/10.1007/s00371-022-02703-y,
https://cdrinmatane.github.io/posts/cgspotlight-slides/)
- Screen space diffuse lighting
(https://github.com/Patapom/GodComplex/blob/master/Tests/TestHBIL/2018%20Mayaux%20-%20Horizon-Based%20Indirect%20Lighting%20(HBIL).pdf)

## Platform Support
* SSAO currently does not work on DirectX12 due to issues with wgpu and
naga:
  * gfx-rs/wgpu#3798
  * gfx-rs/naga#2353
* SSAO currently does not work on WebGPU because r16float is not a valid
storage texture format
https://gpuweb.github.io/gpuweb/wgsl/#storage-texel-formats. We can fix
this with a fallback to r32float.

---

## Changelog

- Added ScreenSpaceAmbientOcclusionSettings,
ScreenSpaceAmbientOcclusionQualityLevel, and
ScreenSpaceAmbientOcclusionBundle

---------

Co-authored-by: IceSentry <c.giguere42@gmail.com>
Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
Co-authored-by: Daniel Chia <danstryder@gmail.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Brandon Dyer <brandondyer64@gmail.com>
Co-authored-by: Edgar Geier <geieredgar@gmail.com>
Co-authored-by: Nicola Papale <nicopap@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Copy link
Member

@teoxoy teoxoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left a few minor comments.

src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
src/back/hlsl/writer.rs Outdated Show resolved Hide resolved
@teoxoy
Copy link
Member

teoxoy commented Jun 20, 2023

image

I wanted to point out that for insertBits and extractBits we are not capping the offset and count.
Considering the MSL and SPIR-V backends also require this it might be better for the WGSL frontend to inject those.

We'll also end up with a lot of duplication if we want to do it in-line as opposed to calling a function we injected previously (this is how we deal with constructors currently).

@teoxoy
Copy link
Member

teoxoy commented Jun 20, 2023

Note for a future reader (including myself): HLSL seems to use arithmetic right shifts for signed integers even if not specified in their docs - found this out by looking at the output of FXC and DXC.

Elabajaba and others added 5 commits June 20, 2023 16:58
Co-authored-by: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com>
Co-authored-by: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com>
Copy link
Member

@teoxoy teoxoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for implementing this, it looks great!✨

@teoxoy teoxoy merged commit adf1cca into gfx-rs:master Jun 23, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing extractBits and insertBits implementations for HLSL
3 participants