Add an optional component that generates hierarchical Z-buffers, in
preparation for GPU occlusion culling.

Two-phase occlusion culling [1] is generally considered the
state-of-the-art occlusion culling technique. We already use two-phase
occlusion culling for meshlets, but we don't for other 3D objects.
Two-phase occlusion culling requires the construction of a *hierarchical
Z-buffer*. This patch implements an opt-in set of passes that generate
one, and so is a step toward implementing two-phase occlusion culling
alongside GPU frustum culling (bevyengine#12889).

This commit copies the hierarchical Z-buffer building code from meshlets
into `bevy_core_pipeline`. Adding the new `HierarchicalDepthBuffer`
component to a camera enables the feature. This code should be usable
as-is for third-party plugins that might want to implement two-phase
occlusion culling, but of course we would like to have two-phase
occlusion culling implemented directly in Bevy in the near future.

Two-phase occlusion culling will be implemented using the following
procedure:

1. Render all meshes that would have been visible in the previous frame
   to the depth buffer (with no fragment shader), using the previous
   frame's hierarchical Z-buffer, the previous frame's view matrix (cf.
   bevyengine#12902), and each model's previous view input uniform.

2. Downsample the Z-buffer to produce a hierarchical Z-buffer ("early",
   in the language of this patch).

3. Perform occlusion culling of all meshes against the Hi-Z buffer,
   using a screen space AABB test.

4. If a prepass is in use, render it now, using the occlusion culling
   results from (3). Note that if *only* a depth prepass is in use, then
   we can avoid rendering meshes that we rendered in phase (1), since
   they're already in the depth buffer.

5. Render main passes, using the occlusion culling results from (3).

6. Downsample the Z-buffer to produce a hierarchical Z-buffer again
   ("late", in the language of this patch). This readies the Z-buffer
   for step (1) of the next frame. It differs from the hierarchical
   Z-buffer produced in (2) because it includes meshes that weren't
   visible last frame, but became visible this frame.
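
Steps (2) and (3) can be illustrated with a minimal CPU-side sketch. It
is not the code in this patch, which runs on the GPU. The names `Mip`,
`build_hi_z`, `ScreenAabb`, and `is_occluded` are hypothetical, and the
sketch assumes a reversed-Z convention (larger depth = nearer), under
which `min` keeps the farthest depth, plus a square power-of-two depth
buffer:

```rust
/// One mip level of a CPU-side hierarchical Z-buffer (hypothetical type).
pub struct Mip {
    pub width: usize,
    pub height: usize,
    pub depth: Vec<f32>,
}

/// Step (2): builds the mip chain by repeated 2x2 min-downsampling.
/// Assumes reversed-Z (larger = nearer), so `min` keeps the farthest depth,
/// and a square power-of-two base level.
pub fn build_hi_z(base: Mip) -> Vec<Mip> {
    assert!(base.width == base.height && base.width.is_power_of_two());
    assert_eq!(base.depth.len(), base.width * base.height);
    let mut mips = vec![base];
    while mips.last().unwrap().width > 1 {
        let prev = mips.last().unwrap();
        let (w, h) = (prev.width / 2, prev.height / 2);
        let mut depth = Vec::with_capacity(w * h);
        for y in 0..h {
            for x in 0..w {
                // Read one texel of the 2x2 footprint in the previous level.
                let at = |dx: usize, dy: usize| prev.depth[(2 * y + dy) * prev.width + 2 * x + dx];
                depth.push(at(0, 0).min(at(1, 0)).min(at(0, 1)).min(at(1, 1)));
            }
        }
        mips.push(Mip { width: w, height: h, depth });
    }
    mips
}

/// Screen-space bounds of a mesh in base-level texels (inclusive), plus the
/// depth of its nearest point.
pub struct ScreenAabb {
    pub min: (usize, usize),
    pub max: (usize, usize),
    pub nearest_depth: f32,
}

/// Step (3): returns true if every Hi-Z texel under the box holds a depth
/// nearer than the nearest point of the box.
pub fn is_occluded(mips: &[Mip], aabb: &ScreenAabb) -> bool {
    // Climb to the first level where the box covers at most 2 texels per axis.
    let mut level = 0;
    while level + 1 < mips.len()
        && ((aabb.max.0 >> level) - (aabb.min.0 >> level) > 1
            || (aabb.max.1 >> level) - (aabb.min.1 >> level) > 1)
    {
        level += 1;
    }
    let mip = &mips[level];
    let mut farthest_occluder = f32::INFINITY;
    for y in (aabb.min.1 >> level)..=(aabb.max.1 >> level) {
        for x in (aabb.min.0 >> level)..=(aabb.max.0 >> level) {
            farthest_occluder = farthest_occluder.min(mip.depth[y * mip.width + x]);
        }
    }
    aabb.nearest_depth < farthest_occluder
}
```

The test is conservative: a single far texel under the box is enough to
keep the mesh visible, so a mesh is only culled when the whole box is
covered by nearer geometry.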

This commit adds steps (1), (2), and (6) to the pipeline, when the
`HierarchicalDepthBuffer` component is present. It doesn't add step (3),
because step (3) depends on bevyengine#12889 which in turn depends on bevyengine#12773, and
both of those patches are still in review.

Unlike meshlets, we have to handle the case in which the depth buffer is
multisampled. This is the source of most of the extra complexity, since
we can't use the Vulkan extension [2] that allows us to easily resolve
multisampled depth buffers using the min operation.
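
For reference, a min resolve reduces the samples of each pixel to a
single depth. The sketch below shows that reduction on the CPU. The
function name `resolve_depth_min` and the sample layout (all samples of
a pixel stored contiguously) are assumptions for illustration, not the
way this patch stores or resolves its depth buffer:

```rust
/// Resolves a multisampled depth buffer to one depth per pixel by taking the
/// minimum over each pixel's samples.
/// Assumes (hypothetically) that a pixel's samples are stored contiguously.
pub fn resolve_depth_min(samples: &[f32], samples_per_pixel: usize) -> Vec<f32> {
    assert!(samples_per_pixel > 0 && samples.len() % samples_per_pixel == 0);
    samples
        .chunks_exact(samples_per_pixel)
        .map(|pixel| pixel.iter().copied().fold(f32::INFINITY, f32::min))
        .collect()
}
```

With 4x MSAA, for example, each group of four samples collapses to its
smallest value, which matches the `min` used when downsampling.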

At Jasmine's request, I haven't touched the meshlet code except to do
some very minor refactoring; the code is generally copied in.

[1]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

[2]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSubpassDescriptionDepthStencilResolveKHR.html
pcwalton committed Apr 7, 2024
1 parent 6c485c8 commit fb0c8bd
Showing 12 changed files with 1,008 additions and 91 deletions.
53 changes: 42 additions & 11 deletions crates/bevy_core_pipeline/src/core_3d/mod.rs
@@ -16,9 +16,11 @@ pub mod graph {
     #[derive(Debug, Hash, PartialEq, Eq, Clone, RenderLabel)]
     pub enum Node3d {
         MsaaWriteback,
+        OcclusionCullingDepthPrepass,
         Prepass,
         DeferredPrepass,
         CopyDeferredLightingId,
+        EarlyDownsampleDepthBuffer,
         EndPrepasses,
         StartMainPass,
         MainOpaquePass,
@@ -31,6 +33,7 @@ pub mod graph {
         Fxaa,
         Upscaling,
         ContrastAdaptiveSharpening,
+        LateDownsampleDepthBuffer,
         EndMainPassPostProcessing,
     }
 }
@@ -73,6 +76,7 @@ use nonmax::NonMaxU32;
 
 use crate::{
     core_3d::main_transmissive_pass_3d_node::MainTransmissivePass3dNode,
+    culling::HierarchicalDepthBuffer,
     deferred::{
         copy_lighting_id::CopyDeferredLightingIdNode, node::DeferredGBufferPrepassNode,
         AlphaMask3dDeferred, Opaque3dDeferred, DEFERRED_LIGHTING_PASS_ID_FORMAT,
@@ -495,18 +499,27 @@ pub fn extract_camera_prepass_phase(
                 Has<NormalPrepass>,
                 Has<MotionVectorPrepass>,
                 Has<DeferredPrepass>,
+                Has<HierarchicalDepthBuffer>,
             ),
             With<Camera3d>,
         >,
     >,
 ) {
-    for (entity, camera, depth_prepass, normal_prepass, motion_vector_prepass, deferred_prepass) in
-        cameras_3d.iter()
+    for (
+        entity,
+        camera,
+        depth_prepass,
+        normal_prepass,
+        motion_vector_prepass,
+        deferred_prepass,
+        hierarchical_depth_buffer,
+    ) in cameras_3d.iter()
     {
         if camera.is_active {
             let mut entity = commands.get_or_spawn(entity);
 
-            if depth_prepass || normal_prepass || motion_vector_prepass {
+            if depth_prepass || normal_prepass || motion_vector_prepass || hierarchical_depth_buffer
+            {
                 entity.insert((
                     BinnedRenderPhase::<Opaque3dPrepass>::default(),
                     BinnedRenderPhase::<AlphaMask3dPrepass>::default(),
@@ -542,7 +555,13 @@ pub fn prepare_core_3d_depth_textures(
     msaa: Res<Msaa>,
     render_device: Res<RenderDevice>,
     views_3d: Query<
-        (Entity, &ExtractedCamera, Option<&DepthPrepass>, &Camera3d),
+        (
+            Entity,
+            &ExtractedCamera,
+            Has<DepthPrepass>,
+            Has<HierarchicalDepthBuffer>,
+            &Camera3d,
+        ),
         (
             With<BinnedRenderPhase<Opaque3d>>,
             With<BinnedRenderPhase<AlphaMask3d>>,
@@ -552,21 +571,25 @@ pub fn prepare_core_3d_depth_textures(
     >,
 ) {
     let mut render_target_usage = HashMap::default();
-    for (_, camera, depth_prepass, camera_3d) in &views_3d {
+    for (_, camera, depth_prepass, hierarchical_depth_buffer, camera_3d) in &views_3d {
         // Default usage required to write to the depth texture
         let mut usage: TextureUsages = camera_3d.depth_texture_usages.into();
-        if depth_prepass.is_some() {
-            // Required to read the output of the prepass
+        // Required to read the output of the prepass
+        if depth_prepass {
             usage |= TextureUsages::COPY_SRC;
         }
+        // Required to build a hierarchical Z-buffer
+        if hierarchical_depth_buffer {
+            usage |= TextureUsages::COPY_SRC | TextureUsages::TEXTURE_BINDING;
+        }
         render_target_usage
             .entry(camera.target.clone())
             .and_modify(|u| *u |= usage)
             .or_insert_with(|| usage);
     }
 
     let mut textures = HashMap::default();
-    for (entity, camera, _, camera_3d) in &views_3d {
+    for (entity, camera, _, _, camera_3d) in &views_3d {
         let Some(physical_target_size) = camera.physical_target_size else {
             continue;
         };
@@ -730,6 +753,7 @@ pub fn prepare_prepass_textures(
             Has<NormalPrepass>,
             Has<MotionVectorPrepass>,
             Has<DeferredPrepass>,
+            Has<HierarchicalDepthBuffer>,
         ),
         Or<(
             With<BinnedRenderPhase<Opaque3dPrepass>>,
@@ -744,8 +768,15 @@ pub fn prepare_prepass_textures(
     let mut deferred_textures = HashMap::default();
     let mut deferred_lighting_id_textures = HashMap::default();
     let mut motion_vectors_textures = HashMap::default();
-    for (entity, camera, depth_prepass, normal_prepass, motion_vector_prepass, deferred_prepass) in
-        &views_3d
+    for (
+        entity,
+        camera,
+        depth_prepass,
+        normal_prepass,
+        motion_vector_prepass,
+        deferred_prepass,
+        hierarchical_depth_buffer,
+    ) in &views_3d
     {
         let Some(physical_target_size) = camera.physical_target_size else {
             continue;
@@ -757,7 +788,7 @@ pub fn prepare_prepass_textures(
             height: physical_target_size.y,
         };
 
-        let cached_depth_texture = depth_prepass.then(|| {
+        let cached_depth_texture = (depth_prepass || hierarchical_depth_buffer).then(|| {
             depth_textures
                 .entry(camera.target.clone())
                 .or_insert_with(|| {
16 changes: 16 additions & 0 deletions crates/bevy_core_pipeline/src/culling/downsample_depth.wgsl
@@ -0,0 +1,16 @@
+#import bevy_core_pipeline::fullscreen_vertex_shader::FullscreenVertexOutput
+
+@group(0) @binding(0) var input_depth: texture_2d<f32>;
+@group(0) @binding(1) var samplr: sampler;
+
+/// Performs a 2x2 downsample on a depth texture to generate the next mip level of a hierarchical depth buffer.
+
+@fragment
+fn downsample_depth(in: FullscreenVertexOutput) -> @location(0) vec4<f32> {
+    let depth_quad = textureGather(0, input_depth, samplr, in.uv);
+    let downsampled_depth = min(
+        min(depth_quad.x, depth_quad.y),
+        min(depth_quad.z, depth_quad.w),
+    );
+    return vec4(downsampled_depth, 0.0, 0.0, 0.0);
+}
