Merge frustum culling into compute, convert to storage buffers, fix >16M splat dispatch #8561

Merged: mvaligursky merged 2 commits into main from mv-culling-compute-storage on Mar 31, 2026

@mvaligursky mvaligursky commented Mar 30, 2026

Merge frustum culling from a separate render pass into the interval compaction compute
shader, convert culling data from textures to storage buffers, and fix a dispatch overflow
for scenes with more than 16M visible splats.

Changes:

  • Merge frustum culling directly into the interval cull compute pass. Now that the
    non-compute renderers that used culling have been removed, culling no longer needs to
    run as a render pass and can be fully compute-based. This eliminates the separate
    GSplatNodeCullRenderPass and the intermediate bit-packed nodeVisibilityTexture. Frustum
    planes are computed on the CPU and passed as uniforms; the sphere-vs-frustum test runs
    inline per interval.
  • Convert bounding sphere, transform index, and world matrix data from GPU textures to
    storage buffers (StorageBuffer), simplifying GPU addressing and reducing overhead.
  • Fix dispatch overflow for >16M visible splats: the write-indirect-args shader now
    computes 2D dispatch layouts to stay within maxComputeWorkgroupsPerDimension (65535).
    Previously, the keygen workgroup count was packed entirely into the X dimension, causing
    silent clamping above ~16.7M splats.
  • Extract calcDispatch2D into a reusable WGSL chunk (dispatch-core.js), replacing
    duplicated inline math in 3 shaders.
  • Delete obsolete GSplatNodeCullRenderPass class and its GLSL/WGSL fragment shaders.
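
The inline sphere-vs-frustum test from the first bullet can be sketched on the CPU as follows. This is a minimal JavaScript illustration, not the PR's WGSL. The plane layout (six planes packed as [nx, ny, nz, d] with normals pointing into the frustum) and the names `sphereInFrustum` and `planes` are assumptions made for the example.

```javascript
// Hypothetical sketch: six planes packed as [nx, ny, nz, d], normals pointing
// into the frustum, so a point p is on the inner side when dot(n, p) + d >= 0.
function sphereInFrustum(planes, center, radius) {
    for (let i = 0; i < 6; i++) {
        const o = i * 4;
        const dist = planes[o] * center[0] +
                     planes[o + 1] * center[1] +
                     planes[o + 2] * center[2] +
                     planes[o + 3];
        // The sphere lies entirely on the outer side of this plane: cull it.
        if (dist < -radius) return false;
    }
    // Not rejected by any plane: treat as visible (conservative test).
    return true;
}

// Illustrative "frustum": a box from -1 to 1 on each axis.
const planes = new Float32Array([
    1, 0, 0, 1,    // x >= -1
    -1, 0, 0, 1,   // x <= 1
    0, 1, 0, 1,    // y >= -1
    0, -1, 0, 1,   // y <= 1
    0, 0, 1, 1,    // z >= -1
    0, 0, -1, 1    // z <= 1
]);

console.log(sphereInFrustum(planes, [0, 0, 0], 0.5)); // true
console.log(sphereInFrustum(planes, [3, 0, 0], 0.5)); // false
```

The test is conservative: a sphere near a frustum corner can pass all six plane checks while still being outside, which costs some extra work but never drops a visible interval.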

Performance:

  • Eliminates one GPU render pass and one intermediate texture for frustum culling
  • Storage buffers use direct array indexing instead of 2D texture coordinate math
  • No regression for scenes under 16M splats; fixes incorrect rendering above that threshold
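
To illustrate the addressing difference mentioned in the second bullet, the sketch below contrasts the two lookups. The helper names, the row-major texture layout, and the element sizes are assumptions for the example, not taken from the PR's shaders.

```javascript
// Hypothetical: element `index` stored row-major in a texture `width` texels wide.
// The lookup needs a 2D coordinate derived from the linear index.
function texelCoord(index, width) {
    return { x: index % width, y: Math.floor(index / width) };
}

// With a storage buffer the linear index is used directly; only the
// per-element stride (in floats) is needed to find the data.
function bufferOffset(index, floatsPerElement) {
    return index * floatsPerElement;
}

console.log(texelCoord(1000, 256)); // { x: 232, y: 3 }
console.log(bufferOffset(1000, 4)); // 4000
```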

…16M splat dispatch

With the removal of the non-compute (global) renderer, culling no longer
needs to run as a render pass and can be fully compute-based. Merge the
frustum test into the interval cull compute shader, convert culling data
to storage buffers, fix dispatch overflow for >16M splats, and extract a
reusable calcDispatch2D WGSL chunk.

Made-with: Cursor

Copilot AI left a comment

Pull request overview

This PR moves GSplat frustum culling fully into the compute-driven interval compaction path, replaces several GPU texture-based data feeds with storage buffers, and fixes indirect compute dispatch overflow when visible splat counts exceed the per-dimension workgroup limit.

Changes:

  • Merge frustum culling into compute-gsplat-interval-cull (compute-only path) and remove the dedicated node-cull render pass + its shaders.
  • Convert bounds/transform data from textures to storage buffers and pass frustum planes via uniforms.
  • Fix >16M visible splat indirect dispatch by writing 2D dispatch dimensions and extracting calcDispatch2D into a shared WGSL chunk.
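
The 2D dispatch split can be sketched in JavaScript as below. This is an illustrative analogue only: the body of the WGSL `calcDispatch2D` chunk is not shown in this conversation, so the splitting strategy here is an assumption. The ~16.7M threshold is consistent with 65535 × 256 = 16,776,960 if each workgroup handles 256 splats, which is also an assumption.

```javascript
// Limit named in the PR description (maxComputeWorkgroupsPerDimension).
const MAX_WORKGROUPS_PER_DIM = 65535;

// Hypothetical JS analogue of a 2D dispatch split. Returns [x, y] such that
// x <= max and x * y >= count. y also stays within max while count <= max * max.
function calcDispatch2D(count, max = MAX_WORKGROUPS_PER_DIM) {
    if (count <= max) return [count, 1];
    const y = Math.ceil(count / max);
    const x = Math.ceil(count / y);
    return [x, y];
}

// Assumed workgroup size of 256, used only for this illustration.
const workgroups = Math.ceil(20000000 / 256); // 78125
console.log(calcDispatch2D(workgroups));      // [ 39063, 2 ]
```

Because x * y can exceed the requested count, a shader using such a layout would typically rebuild a flat workgroup index (for example `id.y * x + id.x`) and return early when it is out of range.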

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/scene/shader-lib/wgsl/chunks/gsplat/frag/gsplatNodeCulling.js | Removed obsolete WGSL fragment shader for node visibility render-pass culling. |
| src/scene/shader-lib/glsl/chunks/gsplat/frag/gsplatNodeCulling.js | Removed obsolete GLSL fragment shader for node visibility render-pass culling. |
| src/scene/shader-lib/wgsl/chunks/gsplat/compute-gsplat-write-indirect-args.js | Writes 2D indirect dispatch dimensions using shared calcDispatch2D to avoid workgroup clamping. |
| src/scene/shader-lib/wgsl/chunks/gsplat/compute-gsplat-local-copy.js | Reuses shared calcDispatch2D for chunk-sort indirect args. |
| src/scene/shader-lib/wgsl/chunks/gsplat/compute-gsplat-local-classify.js | Reuses shared calcDispatch2D for local renderer indirect dispatch args. |
| src/scene/shader-lib/wgsl/chunks/gsplat/compute-gsplat-interval-cull.js | Inlines sphere-vs-frustum test per interval and reads bounds/transforms from storage buffers. |
| src/scene/shader-lib/wgsl/chunks/common/comp/dispatch-core.js | New shared WGSL helper to compute safe 2D dispatch sizes. |
| src/scene/gsplat-unified/gsplat-node-cull-render-pass.js | Removed render-pass class that generated the node visibility texture. |
| src/scene/gsplat-unified/gsplat-manager.js | Switches culling setup/compaction API from visibility texture to frustum culler + plane computation. |
| src/scene/gsplat-unified/gsplat-interval-compaction.js | Updates bind groups/uniforms for buffer-based culling and 2D indirect dispatch sizing. |
| src/scene/gsplat-unified/gsplat-frustum-culler.js | Replaces culling textures/render pass with storage buffers and CPU frustum-plane computation. |
| examples/src/examples/gaussian-splatting/lod-streaming.controls.mjs | Expands splat budget UI max from 20 to 40. |

Comment on lines 827 to 828
// Bounds and transforms textures are needed for frustum culling.
// These index splats sequentially, so always use the full splats array.

Copilot AI Mar 30, 2026

The comment still refers to “Bounds and transforms textures”, but culling data is now stored in storage buffers (boundsBuffer/transformsBuffer). Update the comment to avoid misleading future changes/debugging.

Suggested change
- // Bounds and transforms textures are needed for frustum culling.
- // These index splats sequentially, so always use the full splats array.
+ // Bounds and transforms culling data (stored in bounds/transforms storage buffers)
+ // index splats sequentially, so always update them using the full boundsGroups set.

  /**
-  * Runs GPU frustum culling: updates the transforms texture and renders the
-  * node visibility pass, producing the bit-packed nodeVisibilityTexture.
+  * Prepares frustum culling data: updates the transforms texture and computes

Copilot AI Mar 30, 2026

This JSDoc says the frustum culler “updates the transforms texture”, but the implementation now updates storage buffers via updateTransformsData. Please update wording to match the new buffer-based pipeline.

Suggested change
-  * Prepares frustum culling data: updates the transforms texture and computes
+  * Prepares frustum culling data: updates the GPU transform buffers and computes

   * Runs the full interval compaction pipeline: cull+count, prefix sum, scatter.
   *
-  * @param {Texture|null} nodeVisibilityTexture - Bit-packed visibility texture (when culling).
+  * @param {GSplatFrustumCuller|null} frustumCuller - Frustum culler with textures and planes (when culling).

Copilot AI Mar 30, 2026

The dispatchCompact parameter doc says “textures and planes”, but GSplatFrustumCuller now exposes storage buffers (boundsBuffer/transformsBuffer) plus frustumPlanes. Update the description to avoid confusion about required resources.

Suggested change
-  * @param {GSplatFrustumCuller|null} frustumCuller - Frustum culler with textures and planes (when culling).
+  * @param {GSplatFrustumCuller|null} frustumCuller - Frustum culler providing bounds/transforms storage buffers and frustum planes (when culling).

Comment on lines 413 to +416
  if (cullingEnabled) {
-     cullCompute.setParameter('nodeVisibilityTexture', nodeVisibilityTexture);
+     cullCompute.setParameter('boundsBuffer', frustumCuller.boundsBuffer);
+     cullCompute.setParameter('transformsBuffer', frustumCuller.transformsBuffer);
+     cullCompute.setParameter('frustumPlanes[0]', frustumCuller.frustumPlanes);

Copilot AI Mar 30, 2026

When cullingEnabled is true, this code dereferences frustumCuller without validating it. Since the method signature allows null, add a Debug.assert / early return (or tighten the signature) to prevent a runtime crash if a caller passes (null, true).

Comment on lines +142 to 150
if (numMatrices > this._allocatedTransformCount) {
this.transformsBuffer?.destroy();
this._allocatedTransformCount = numMatrices;
// 3 vec4f per matrix = 12 floats = 48 bytes
this.transformsBuffer = new StorageBuffer(this.device, numMatrices * 12 * 4, BUFFERUSAGE_COPY_DST);
}

- const data = this.transformsTexture.lock();
+ const data = new Float32Array(numMatrices * 12);


Copilot AI Mar 30, 2026

updateTransformsData allocates a new Float32Array every call. Since this runs whenever frustum planes are updated (and can be frequent), consider caching/reusing a typed array sized to _allocatedTransformCount * 12 to reduce GC pressure and improve frame stability.
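
One way to reuse the scratch array is sketched below. The `ScratchFloats` class is hypothetical and only illustrates the grow-only caching pattern; it is not the code that was eventually committed.

```javascript
// Hypothetical helper: keeps one Float32Array and reallocates only when the
// requested length exceeds the current capacity.
class ScratchFloats {
    constructor() {
        this._data = new Float32Array(0);
    }

    // Returns a view of exactly `length` floats backed by the cached array.
    // The view may contain stale values, so callers must overwrite every element.
    get(length) {
        if (length > this._data.length) {
            this._data = new Float32Array(length);
        }
        return this._data.subarray(0, length);
    }
}

const scratch = new ScratchFloats();
const first = scratch.get(2 * 12);   // 2 matrices, 12 floats each: allocates
const second = scratch.get(1 * 12);  // fits in the existing allocation
console.log(first.buffer === second.buffer); // true
```

`subarray` still creates a small view object per call, but the backing buffer, which is the large allocation, is reused across frames.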

…ped arrays

- Update comments/JSDoc referencing "textures" to "storage buffers"
- Add Debug.assert for frustumCuller when cullingEnabled is true
- Cache Float32Array/Uint32Array allocations in updateBoundsData and
  updateTransformsData to reduce per-frame GC pressure

Made-with: Cursor
@mvaligursky mvaligursky merged commit 1d87b0e into main Mar 31, 2026
8 checks passed
@mvaligursky mvaligursky deleted the mv-culling-compute-storage branch March 31, 2026 08:32
Labels

area: graphics Graphics related issue
