Add an optional component that generates hierarchical Z-buffers, in preparation for GPU occlusion culling.

Two-phase occlusion culling [1] is generally considered the state-of-the-art occlusion culling technique. We already use two-phase occlusion culling for meshlets, but not for other 3D objects. Two-phase occlusion culling requires the construction of a *hierarchical Z-buffer*. This patch implements an opt-in set of passes to generate one, and so is a step along the way to implementing two-phase occlusion culling, alongside GPU frustum culling (bevyengine#12889).

This commit copies the hierarchical Z-buffer building code from meshlets into `bevy_core_pipeline`. Adding the new `HierarchicalDepthBuffer` component to a camera enables the feature. This code should be usable as-is for third-party plugins that might want to implement two-phase occlusion culling, but of course we would like to have two-phase occlusion culling implemented directly in Bevy in the near future.

Two-phase occlusion culling will be implemented using the following procedure:

1. Render all meshes that would have been visible in the previous frame to the depth buffer (with no fragment shader), using the previous frame's hierarchical Z-buffer, the previous frame's view matrix (cf. bevyengine#12902), and each model's previous view input uniform.

2. Downsample the Z-buffer to produce a hierarchical Z-buffer ("early", in the language of this patch).

3. Perform occlusion culling of all meshes against the Hi-Z buffer, using a screen space AABB test.

4. If a prepass is in use, render it now, using the occlusion culling results from (3). Note that if *only* a depth prepass is in use, then we can avoid rendering meshes that we rendered in step (1), since they're already in the depth buffer.

5. Render main passes, using the occlusion culling results from (3).

6. Downsample the Z-buffer to produce a hierarchical Z-buffer again ("late", in the language of this patch). This readies the hierarchical Z-buffer for step (1) of the next frame. It differs from the hierarchical Z-buffer produced in (2) because it includes meshes that weren't visible last frame, but became visible this frame.

This commit adds steps (1), (2), and (6) to the pipeline when the `HierarchicalDepthBuffer` component is present. It doesn't add step (3), because step (3) depends on bevyengine#12889, which in turn depends on bevyengine#12773, and both of those patches are still in review.

Unlike meshlets, we have to handle the case in which the depth buffer is multisampled. This is the source of most of the extra complexity, since we can't use the Vulkan extension [2] that allows us to easily resolve multisampled depth buffers using the min operation.

At Jasmine's request, I haven't touched the meshlet code except to do some very minor refactoring; the code is generally copied in.

[1]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501
[2]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSubpassDescriptionDepthStencilResolveKHR.html
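
Step (3) of the procedure above is not part of this commit. As a rough illustration only, the following CPU-side Rust sketch shows what a screen-space AABB test against one level of a hierarchical Z-buffer could look like. The names `HiZLevel`, `ScreenAabb`, and `is_occluded` are hypothetical and do not appear in Bevy. The sketch assumes reverse-Z depth (1.0 at the near plane, values shrinking with distance), which is consistent with the `min` reduction in `downsample_depth.wgsl`, and it assumes the caller has already chosen a mip level and projected the mesh's bounds into that level's texel coordinates.

```rust
/// One mip level of a hierarchical Z-buffer, stored row-major.
/// Assumption: reverse-Z, so each texel holds the farthest (minimum)
/// depth of the screen region it covers.
pub struct HiZLevel {
    pub width: usize,
    pub height: usize,
    pub depths: Vec<f32>,
}

/// Hypothetical screen-space bounds of a mesh, in texels of the chosen
/// level (inclusive), plus the depth of the AABB's point nearest the camera.
pub struct ScreenAabb {
    pub min_x: usize,
    pub min_y: usize,
    pub max_x: usize,
    pub max_y: usize,
    pub nearest_depth: f32,
}

/// Returns `true` only if every covered texel's farthest occluder is
/// strictly closer to the camera than the mesh's nearest point.
pub fn is_occluded(level: &HiZLevel, aabb: &ScreenAabb) -> bool {
    if level.width == 0 || level.height == 0 {
        return false;
    }
    // Clamp the rectangle to the level; an empty overlap is treated as
    // "not occluded" so that the test stays conservative.
    let max_x = aabb.max_x.min(level.width - 1);
    let max_y = aabb.max_y.min(level.height - 1);
    if aabb.min_x > max_x || aabb.min_y > max_y {
        return false;
    }
    for y in aabb.min_y..=max_y {
        for x in aabb.min_x..=max_x {
            let farthest_occluder = level.depths[y * level.width + x];
            // Under reverse-Z, a larger value is closer. If the mesh is at
            // least as close as the farthest occluder, it may be visible.
            if aabb.nearest_depth >= farthest_occluder {
                return false;
            }
        }
    }
    true
}
```

A GPU implementation would typically pick the mip level at which the rectangle covers only a few texels, so that the loop collapses to a handful of texture reads. That level selection is left out of this sketch.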
pcwalton · Apr 7, 2024 · fb0c8bd
1 parent 6c485c8
Showing 12 changed files with 1,008 additions and 91 deletions.
diff --git a/crates/bevy_core_pipeline/src/core_3d/mod.rs b/crates/bevy_core_pipeline/src/core_3d/mod.rs
@@ -16,9 +16,11 @@ pub mod graph {
     #[derive(Debug, Hash, PartialEq, Eq, Clone, RenderLabel)]
     pub enum Node3d {
         MsaaWriteback,
+        OcclusionCullingDepthPrepass,
         Prepass,
         DeferredPrepass,
         CopyDeferredLightingId,
+        EarlyDownsampleDepthBuffer,
         EndPrepasses,
         StartMainPass,
         MainOpaquePass,
@@ -31,6 +33,7 @@ pub mod graph {
         Fxaa,
         Upscaling,
         ContrastAdaptiveSharpening,
+        LateDownsampleDepthBuffer,
         EndMainPassPostProcessing,
     }
 }
@@ -73,6 +76,7 @@ use nonmax::NonMaxU32;
 
 use crate::{
     core_3d::main_transmissive_pass_3d_node::MainTransmissivePass3dNode,
+    culling::HierarchicalDepthBuffer,
     deferred::{
         copy_lighting_id::CopyDeferredLightingIdNode, node::DeferredGBufferPrepassNode,
         AlphaMask3dDeferred, Opaque3dDeferred, DEFERRED_LIGHTING_PASS_ID_FORMAT,
@@ -495,18 +499,27 @@ pub fn extract_camera_prepass_phase(
                 Has<NormalPrepass>,
                 Has<MotionVectorPrepass>,
                 Has<DeferredPrepass>,
+                Has<HierarchicalDepthBuffer>,
             ),
             With<Camera3d>,
         >,
     >,
 ) {
-    for (entity, camera, depth_prepass, normal_prepass, motion_vector_prepass, deferred_prepass) in
-        cameras_3d.iter()
+    for (
+        entity,
+        camera,
+        depth_prepass,
+        normal_prepass,
+        motion_vector_prepass,
+        deferred_prepass,
+        hierarchical_depth_buffer,
+    ) in cameras_3d.iter()
     {
         if camera.is_active {
             let mut entity = commands.get_or_spawn(entity);
 
-            if depth_prepass || normal_prepass || motion_vector_prepass {
+            if depth_prepass || normal_prepass || motion_vector_prepass || hierarchical_depth_buffer
+            {
                 entity.insert((
                     BinnedRenderPhase::<Opaque3dPrepass>::default(),
                     BinnedRenderPhase::<AlphaMask3dPrepass>::default(),
@@ -542,7 +555,13 @@ pub fn prepare_core_3d_depth_textures(
     msaa: Res<Msaa>,
     render_device: Res<RenderDevice>,
     views_3d: Query<
-        (Entity, &ExtractedCamera, Option<&DepthPrepass>, &Camera3d),
+        (
+            Entity,
+            &ExtractedCamera,
+            Has<DepthPrepass>,
+            Has<HierarchicalDepthBuffer>,
+            &Camera3d,
+        ),
         (
             With<BinnedRenderPhase<Opaque3d>>,
             With<BinnedRenderPhase<AlphaMask3d>>,
@@ -552,21 +571,25 @@ pub fn prepare_core_3d_depth_textures(
     >,
 ) {
     let mut render_target_usage = HashMap::default();
-    for (_, camera, depth_prepass, camera_3d) in &views_3d {
+    for (_, camera, depth_prepass, hierarchical_depth_buffer, camera_3d) in &views_3d {
         // Default usage required to write to the depth texture
         let mut usage: TextureUsages = camera_3d.depth_texture_usages.into();
-        if depth_prepass.is_some() {
-            // Required to read the output of the prepass
+        // Required to read the output of the prepass
+        if depth_prepass {
             usage |= TextureUsages::COPY_SRC;
         }
+        // Required to build a hierarchical Z-buffer
+        if hierarchical_depth_buffer {
+            usage |= TextureUsages::COPY_SRC | TextureUsages::TEXTURE_BINDING;
+        }
         render_target_usage
             .entry(camera.target.clone())
             .and_modify(|u| *u |= usage)
             .or_insert_with(|| usage);
     }
 
     let mut textures = HashMap::default();
-    for (entity, camera, _, camera_3d) in &views_3d {
+    for (entity, camera, _, _, camera_3d) in &views_3d {
         let Some(physical_target_size) = camera.physical_target_size else {
             continue;
         };
@@ -730,6 +753,7 @@ pub fn prepare_prepass_textures(
             Has<NormalPrepass>,
             Has<MotionVectorPrepass>,
             Has<DeferredPrepass>,
+            Has<HierarchicalDepthBuffer>,
         ),
         Or<(
             With<BinnedRenderPhase<Opaque3dPrepass>>,
@@ -744,8 +768,15 @@ pub fn prepare_prepass_textures(
     let mut deferred_textures = HashMap::default();
     let mut deferred_lighting_id_textures = HashMap::default();
     let mut motion_vectors_textures = HashMap::default();
-    for (entity, camera, depth_prepass, normal_prepass, motion_vector_prepass, deferred_prepass) in
-        &views_3d
+    for (
+        entity,
+        camera,
+        depth_prepass,
+        normal_prepass,
+        motion_vector_prepass,
+        deferred_prepass,
+        hierarchical_depth_buffer,
+    ) in &views_3d
     {
         let Some(physical_target_size) = camera.physical_target_size else {
             continue;
@@ -757,7 +788,7 @@ pub fn prepare_prepass_textures(
             height: physical_target_size.y,
         };
 
-        let cached_depth_texture = depth_prepass.then(|| {
+        let cached_depth_texture = (depth_prepass || hierarchical_depth_buffer).then(|| {
             depth_textures
                 .entry(camera.target.clone())
                 .or_insert_with(|| {

diff --git a/crates/bevy_core_pipeline/src/culling/downsample_depth.wgsl b/crates/bevy_core_pipeline/src/culling/downsample_depth.wgsl
@@ -0,0 +1,16 @@
+#import bevy_core_pipeline::fullscreen_vertex_shader::FullscreenVertexOutput
+
+@group(0) @binding(0) var input_depth: texture_2d<f32>;
+@group(0) @binding(1) var samplr: sampler;
+
+/// Performs a 2x2 downsample on a depth texture to generate the next mip level of a hierarchical depth buffer.
+
+@fragment
+fn downsample_depth(in: FullscreenVertexOutput) -> @location(0) vec4<f32> {
+    let depth_quad = textureGather(0, input_depth, samplr, in.uv);
+    let downsampled_depth = min(
+        min(depth_quad.x, depth_quad.y),
+        min(depth_quad.z, depth_quad.w),
+    );
+    return vec4(downsampled_depth, 0.0, 0.0, 0.0);
+}
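
For readers who want to check the reduction on the CPU, here is a minimal Rust sketch of the same idea as the shader above: each texel of the next mip level takes the minimum of the 2x2 block it covers, and the process repeats down to 1x1. The function names `downsample_min` and `build_hi_z` are hypothetical and are not part of this commit. Folding the leftover row or column of an odd-sized level into the last output texel is an assumption made here to keep the result conservative; it is not a claim about how the shader or the render node handles odd sizes.

```rust
/// Reduces one depth level to the next by taking the minimum over each
/// 2x2 block (the farthest depth under reverse-Z).
/// Assumption: for odd sizes, the last output texel also absorbs the
/// leftover row or column so that no source texel is skipped.
pub fn downsample_min(src: &[f32], width: usize, height: usize) -> (Vec<f32>, usize, usize) {
    assert!(width > 0 && height > 0 && src.len() == width * height);
    let dst_w = (width / 2).max(1);
    let dst_h = (height / 2).max(1);
    let mut dst = Vec::with_capacity(dst_w * dst_h);
    for y in 0..dst_h {
        let y_end = if y + 1 == dst_h { height } else { 2 * y + 2 };
        for x in 0..dst_w {
            let x_end = if x + 1 == dst_w { width } else { 2 * x + 2 };
            let mut farthest = f32::INFINITY;
            for sy in 2 * y..y_end {
                for sx in 2 * x..x_end {
                    farthest = farthest.min(src[sy * width + sx]);
                }
            }
            dst.push(farthest);
        }
    }
    (dst, dst_w, dst_h)
}

/// Builds the full mip chain, from the full-resolution depth buffer
/// (level 0) down to a single texel. Each entry is (depths, width, height).
pub fn build_hi_z(depth: Vec<f32>, width: usize, height: usize) -> Vec<(Vec<f32>, usize, usize)> {
    let mut levels = vec![(depth, width, height)];
    loop {
        let (src, w, h) = levels.last().expect("level 0 is always present");
        if *w == 1 && *h == 1 {
            break;
        }
        let next = downsample_min(src, *w, *h);
        levels.push(next);
    }
    levels
}
```

Taking the minimum rather than an average is what makes the pyramid safe to cull against: a coarse texel never reports an occluder closer than the farthest one actually present in its footprint.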