InstancedMesh: Proposal to support sorting, frustum culling #27170

Open
gkjohnson opened this issue Nov 10, 2023 · 24 comments

Comments

@gkjohnson
Collaborator

gkjohnson commented Nov 10, 2023

Description

With the current InstancedMesh implementation, it is not possible to easily or quickly sort individual instances, either to reduce overdraw for opaque objects or to sort transparent ones correctly.

Here are a couple posts referring to the transparency sorting issues with InstancedMesh: ref1, ref2

Solution

Similar to how the WEBGL_multi_draw extension is used for BatchedMesh, there are "instanced" variants of the multidraw functions that allow for sorting the order in which instanced draws are performed while retaining a single draw call.

If desired we could switch InstancedMesh over to use the multidraw extension, as well, and provide support for sorting objects within a single InstancedMesh. See #27168 (adds sorting objects for BatchedMesh) for a reference on how this might be implemented for InstancedMesh.

The drawback is still that WEBGL_multi_draw is not supported in Firefox, so fully switching over to multidraw would remove InstancedMesh support for Firefox.

Alternatives

  • Add a separate MultiDrawInstancedMesh
  • Do nothing 🤷

Additional context

cc @WestLangley @donmccurdy

Repository owner deleted a comment from jabri62018 Nov 10, 2023
@donmccurdy
Collaborator

...there are "instanced" variants of the multidraw functions that allow for sorting the order in which instanced draws are performed while retaining a single draw call.

We can technically do this with InstancedMesh, by sorting the instance attributes. Not sure if I like that solution — either we break user's indices, or require some indirection between the user-facing index and the index on GPU. But for the sake of discussion... how would that differ from using WEBGL_multi_draw?

The drawback is still that WEBGL_multi_draw is not supported in Firefox, so fully switching over to multidraw would remove InstancedMesh support for Firefox.

Ouch, that's a pretty high cost... 😕

Alternatives

Add a separate MultiDrawInstancedMesh

A similar alternative ... could we allow BatchedMesh to reuse a geometry without duplication? One thing I like about the THREE.BatchedMesh name and API is that we aren't committed to a specific method of implementation.

@gkjohnson
Collaborator Author

gkjohnson commented Nov 13, 2023

We can technically do this with InstancedMesh, by sorting the instance attributes. Not sure if I like that solution — either we break user's indices, or require some indirection between the user-facing index and the index on GPU. But for the sake of discussion... how would that differ from using WEBGL_multi_draw?

This would require sorting all per-instance attribute data, including colors, matrices, and (in the long term) others, which would make it more cost prohibitive. A fully featured instanced mesh implementation would allow any material attribute to be defined per instance, and all of those would have to be rearranged. And in order to sort, we'd need to double-store all sortable data in addition to breaking the GPU index. For simpler cases this might work, but it's not ideal. The multi draw range approach is much cleaner.
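To make that cost concrete, here is a minimal sketch of the attribute-sorting approach under discussion. It is not three.js code: the function name and all parameter names are hypothetical, matrices are assumed to be column-major 4x4 arrays (16 floats each), and colors are assumed to be 3 floats each. The user-facing data is kept in its own copy so that user indices stay stable, and the GPU-facing arrays are rewritten on every sort.

```javascript
// Sketch: sort instances back-to-front by rewriting the GPU-facing arrays.
// `source*` arrays keep the user's original order (the double-stored copy);
// `gpu*` arrays are what would be uploaded. All names here are hypothetical.
function sortInstanceAttributes( sourceMatrices, sourceColors, gpuMatrices, gpuColors, cameraPosition ) {

	const count = sourceMatrices.length / 16;
	const order = new Array( count );
	const distances = new Float32Array( count );

	for ( let i = 0; i < count; i ++ ) {

		// Translation lives in elements 12, 13, 14 of a column-major 4x4 matrix.
		const dx = sourceMatrices[ i * 16 + 12 ] - cameraPosition[ 0 ];
		const dy = sourceMatrices[ i * 16 + 13 ] - cameraPosition[ 1 ];
		const dz = sourceMatrices[ i * 16 + 14 ] - cameraPosition[ 2 ];

		distances[ i ] = dx * dx + dy * dy + dz * dz;
		order[ i ] = i;

	}

	// Farthest first, as needed for transparency.
	order.sort( ( a, b ) => distances[ b ] - distances[ a ] );

	// Every per-instance attribute has to be copied into its new slot.
	for ( let slot = 0; slot < count; slot ++ ) {

		const id = order[ slot ];
		gpuMatrices.set( sourceMatrices.subarray( id * 16, id * 16 + 16 ), slot * 16 );
		gpuColors.set( sourceColors.subarray( id * 3, id * 3 + 3 ), slot * 3 );

	}

	// order[ slot ] is the user-facing index now stored at GPU slot `slot`.
	return order;

}
```

Each additional per-instance attribute adds another copy loop and another duplicated buffer, which is the scaling problem described above.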

Ouch, that's a pretty high cost... 😕

BatchedMesh will only work with the extension in the next release, as well, meaning FF will not be supported. Firefox is the only platform that doesn't support the extension so I'm hoping there's some pressure that can be applied to get it added.

A similar alternative ... could we allow BatchedMesh to reuse a geometry without duplication? One thing I like about the THREE.BatchedMesh name and API is that we aren't committed to a specific method of implementation.

Some of the implementation could be reused in a common class but overall I think trying to make instancing and batching work in a single API will be more convoluted than it's worth.

@gkjohnson
Collaborator Author

gkjohnson commented Nov 13, 2023

Alternatives

Add a separate MultiDrawInstancedMesh

A similar alternative ...

I guess another alternative is to support both code paths - the current implementation that doesn't support sorting or frustum culling and then a multi draw one that does. Until FF supports the extension, that is.

@RenaudRohlinger
Collaborator

If it helps, this is how I handle the multidraw fallback for Firefox. Then, in the BatchedMesh class, a simple drawID BufferAttribute emulates gl_DrawID when it is not available.

	function renderMultiDraw( starts, counts, drawCount ) {

		const extension = extensions.get( 'WEBGL_multi_draw' );

		if ( extension === null ) {

			// Fallback when WEBGL_multi_draw is unavailable: one draw call per range.
			for ( let i = 0; i < starts.length; i ++ ) {

				this.render( starts[ i ] / bytesPerElement, counts[ i ] );

				info.update( counts[ i ], mode, 1 );

			}

		} else {

			// Extension available: submit every range in a single call.
			extension.multiDrawElementsWEBGL( mode, counts, 0, type, starts, 0, drawCount );

			const sum = counts.reduce( ( partialSum, a ) => partialSum + a, 0 );

			info.update( sum, mode, 1 );

		}

	}

	function renderInstancesMultiDraw( starts, counts, instanceCounts, primCount ) {

		const extension = extensions.get( 'WEBGL_multi_draw' );

		if ( extension === null ) {

			for ( let i = 0; i < starts.length; i ++ ) {

				this.renderInstances( starts[ i ] / bytesPerElement, counts[ i ], instanceCounts[ i ] );

				info.update( counts[ i ], mode, instanceCounts[ i ] );

			}

		} else {

			extension.multiDrawElementsInstancedWEBGL( mode, counts, 0, type, starts, 0, instanceCounts, 0, primCount );

			const sum = counts.reduce( ( b, a ) => b + a, 0 );
			const sumInstances = instanceCounts.reduce( ( b, a ) => b + a, 0 );

			info.update( sum, mode, sumInstances );

		}

	}

@gkjohnson
Collaborator Author

gkjohnson commented Nov 13, 2023

If it helps, this is how I handle the multidraw fallback for Firefox.

Looks like this removes a lot of the benefit of current instancing (it makes a new draw call for each mesh), but it would still do away with a lot of the overhead from updating shader uniforms compared to the naive approach of a bunch of meshes. This seems like a good middle ground to me while keeping the code simple. If @mrdoob, @Mugen87, and / or @donmccurdy agree, maybe this can be added to InstancedMesh.

Then, in the BatchedMesh class, a simple drawID BufferAttribute emulates gl_DrawID when it is not available.

BatchedMesh is already using a batch id attribute - I expect that with sorting and filtering of individual meshes the gl_DrawID value wouldn't be correct, anyway (I don't know how it could be). I may be missing something but for some reason I can't get the extensions working in the shader to check, though 🤔

It's unclear what the behavior of InstanceID will be with multi draw, as well.

@donmccurdy
Collaborator

@gkjohnson I'm happy with whatever you prefer here -- thank you!

@mrdoob
Owner

mrdoob commented Nov 17, 2023

If the goal is to sort instances by transparency...
Could this be done in THREE.InstancedMesh instead by adding a .sortInstances( camera )?

@gkjohnson
Collaborator Author

If the goal is to sort instances by transparency...
Could this be done in THREE.InstancedMesh instead by adding a .sortInstances( camera )?

The goal was to enable quickly and transparently sorting instances. If we just add it to the existing implementation, then a bunch of data has to be rearranged in the vertex buffers and the user indices would no longer line up, either.

It looks like I misunderstood the capabilities of multidraw's instanced draw function, though. It looks like it's not possible to specify subranges of the InstancedBufferAttributes to draw, meaning my plan for sorting similar to BatchedMesh won't work as easily. Instead, the multiDrawArraysInstancedWEBGL and multiDrawElementsInstancedWEBGL functions would basically let you draw multiple meshes as instanced within a single draw call. From a user perspective, you could merge multiple InstancedMeshes that draw multiple objects in different places into a single object and therefore a single draw call. I think the returns are diminishing at this point. @RenaudRohlinger how are you using these functions? And how are you determining which transform matrices to use for different objects?

In order to get the kind of sorting and filtering I'm suggesting I think we'd have to change the structure of InstancedMesh to use a DataTexture to store transform matrices and a separate one for colors then we can use a sorted instanced buffer with indices into those data textures to adjust which objects we draw and in what order. This is more aligned with how BatchedMesh works, as well. The downside is that we have to use texture slots for different material properties like colors which becomes prohibitive at some point. TextureArrays could help with this but still. The Node material system may help make this less complicated.

I'm not exactly sure what we should do here..

@gkjohnson changed the title from "InstancedMesh: Proposal to use multidraw" to "InstancedMesh: Proposal to sorting, frustum culling" Nov 21, 2023
@gkjohnson
Collaborator Author

gkjohnson commented Nov 21, 2023

For another data point - it looks as though Babylon is performing sorting on the CPU by rearranging the matrices in a buffer. The matrix data is double-stored, though. Frustum culling could be implemented in a similar way.

https://github.com/BabylonJS/Babylon.js/blob/dc2c856f5b028d59ed34200993c55fb4c3b68415/packages/dev/core/src/Meshes/mesh.ts#L1942-L1971
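As a rough illustration of how frustum culling could work with double-stored matrices, the sketch below copies only the visible instances to the front of the GPU-facing buffer and returns how many to draw. It is not Babylon's or three.js's code. The function names are hypothetical, planes are assumed to be given as `[ nx, ny, nz, constant ]` with normals pointing into the frustum, and `radius` is assumed to be a bounding-sphere radius that already accounts for instance scale.

```javascript
// A sphere is outside the frustum as soon as it lies fully behind any one plane.
function sphereInFrustum( planes, center, radius ) {

	for ( const [ nx, ny, nz, constant ] of planes ) {

		const distance = nx * center[ 0 ] + ny * center[ 1 ] + nz * center[ 2 ] + constant;
		if ( distance < - radius ) return false;

	}

	return true;

}

// Copies visible matrices to the front of `gpuMatrices` and returns the draw count.
// `sourceMatrices` keeps the user's order and is never modified.
function cullInstances( sourceMatrices, gpuMatrices, planes, radius ) {

	const total = sourceMatrices.length / 16;
	let visibleCount = 0;

	for ( let i = 0; i < total; i ++ ) {

		const o = i * 16;

		// Instance center is the translation of a column-major 4x4 matrix.
		const center = [ sourceMatrices[ o + 12 ], sourceMatrices[ o + 13 ], sourceMatrices[ o + 14 ] ];

		if ( sphereInFrustum( planes, center, radius ) ) {

			gpuMatrices.set( sourceMatrices.subarray( o, o + 16 ), visibleCount * 16 );
			visibleCount ++;

		}

	}

	return visibleCount;

}
```

The returned count would then be used as the instance count for the draw, so entries past it in `gpuMatrices` are simply never rendered.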

@gkjohnson changed the title from "InstancedMesh: Proposal to sorting, frustum culling" to "InstancedMesh: Proposal to support sorting, frustum culling" Nov 21, 2023
@gkjohnson
Collaborator Author

we'd have to change the structure of InstancedMesh to use a DataTexture to store transform matrices and a separate one for colors then we can use a sorted instanced buffer with indices into those data textures to adjust which objects we draw and in what order.

Here's an example of storing material information in a data texture and referencing it based on instance id (or batchId in this case). Here everything is rendered with one draw call while still retaining a unique color, emission, roughness, and metalness per object. This is based on BatchedMesh but the theory is the same for InstancedMesh.

If this is how InstancedMesh stored colors and matrices then we could just sort the InstancedMesh.ids buffer attribute which would afford unique, per-object materials and transforms while retaining a reasonable sort / filter speed.
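A minimal sketch of what sorting and filtering only the id buffer could look like, assuming the matrices stay in user order (for example in a data texture) and the shader looks them up by id. The function name `updateDrawList`, the `isVisible` callback, and the buffer layout are assumptions for illustration. `InstancedMesh.ids` does not exist in three.js today.

```javascript
// Sketch: only the small id buffer is reordered; matrices stay where the user wrote them.
// `ids` is a Uint32Array with room for one entry per instance.
function updateDrawList( ids, matrices, cameraPosition, isVisible ) {

	const total = matrices.length / 16;
	let visibleCount = 0;

	// Filter: keep only the ids that pass the visibility test.
	for ( let id = 0; id < total; id ++ ) {

		if ( isVisible( id ) ) ids[ visibleCount ++ ] = id;

	}

	const distanceSq = ( id ) => {

		// Translation lives in elements 12, 13, 14 of a column-major 4x4 matrix.
		const dx = matrices[ id * 16 + 12 ] - cameraPosition[ 0 ];
		const dy = matrices[ id * 16 + 13 ] - cameraPosition[ 1 ];
		const dz = matrices[ id * 16 + 14 ] - cameraPosition[ 2 ];
		return dx * dx + dy * dy + dz * dz;

	};

	// Sort: subarray() is a view on the same memory, so this sorts `ids` in place.
	ids.subarray( 0, visibleCount ).sort( ( a, b ) => distanceSq( b ) - distanceSq( a ) );

	// Number of instances to draw, farthest first.
	return visibleCount;

}
```

Compared with rearranging every attribute, only one 4-byte id per instance moves here, and user-facing indices remain valid because the per-instance data itself is never reordered.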

From a technical perspective - this is done by packing the four above material properties for each object into a texture ordered by id and then modifying the MeshStandardMaterial shader in onBeforeCompile to read from the data texture values rather than the material uniforms. Using a TextureArray to store normal, roughness, metalness, etc textures and then packing the texture index into the material properties texture you could afford unique maps per object, as well.
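The packing step might be sketched as follows. The texel layout is an assumption made for this example and is not taken from the demo: two RGBA float texels per object, with roughness and metalness stored in the alpha channels. `packMaterialProperties` and the shape of the `materials` array are likewise hypothetical.

```javascript
// Assumed layout (not taken from the demo): two RGBA float texels per object.
//   texel 0: color.r, color.g, color.b, roughness
//   texel 1: emissive.r, emissive.g, emissive.b, metalness
const TEXELS_PER_OBJECT = 2;

function packMaterialProperties( materials ) {

	// Smallest square texture that has room for every texel.
	const texelCount = materials.length * TEXELS_PER_OBJECT;
	const size = Math.max( 1, Math.ceil( Math.sqrt( texelCount ) ) );
	const data = new Float32Array( size * size * 4 );

	materials.forEach( ( m, id ) => {

		// Objects are stored in id order, 8 floats each.
		const offset = id * TEXELS_PER_OBJECT * 4;
		data.set( [ ...m.color, m.roughness, ...m.emissive, m.metalness ], offset );

	} );

	// `data` could then back a THREE.DataTexture( data, size, size, THREE.RGBAFormat, THREE.FloatType ).
	return { data, size };

}
```

In the shader, texel `k` of object `id` would sit at linear index `id * 2 + k`, which maps to column `index % size` and row `index / size` (integer division) of the texture.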

Demo here

Repo here

image

Using a technique like this with InstancedMesh and BatchedMesh you could render a whole, fairly complex scene with hundreds of different materials in 1 draw call as long as they all use the same shader.

@mrdoob
Owner

mrdoob commented Nov 28, 2023

That demo runs super slow here (MacBook M2 Pro)... 300ms per frame 🤔

@gkjohnson
Collaborator Author

gkjohnson commented Nov 28, 2023

That demo runs super slow here (MacBook M2 Pro)... 300ms per frame 🤔

How fun 😅 It's running at 120fps on my older 2021 M1 Pro machine...

Are there any hints at what's causing it in the profiler? CPU? GPU? I've updated the demo to include toggles to disable each of the fields mapped to datatexture values. Disabling all of them should fall back to a normal, unmodified MeshStandardMaterial shader. Does toggling some of those fields fix the framerate at all?

image

@mrdoob
Owner

mrdoob commented Dec 8, 2023

Sorry for the delay..

Screenshot 2023-12-08 at 11 17 22 AM

If I disable all the settings it runs fast:

Screenshot 2023-12-08 at 11 09 41 AM

And then if I enable one the frame rate tanks again:

Screenshot 2023-12-08 at 11 12 11 AM

Maybe this helps...?

Screenshot 2023-12-08 at 11 18 18 AM

@mrdoob
Owner

mrdoob commented Dec 8, 2023

I made a recording of enabling/disabling color:

Screenshot 2023-12-08 at 11 25 12 AM

@mrdoob
Owner

mrdoob commented Dec 8, 2023

Here are some recordings:

@gkjohnson
Collaborator Author

O_O

4 seconds of GPU time for both just the color field and all fields...

image

This must be some kind of compiler or hardware idiosyncrasy. I'll try one more thing in a bit to see if it addresses this. Currently the shader works by declaring local variables in the "main" function with the same names as the material uniforms to "override" the values with the texture values. Perhaps the local variables masking the uniform variables are the cause of this odd performance?

@gkjohnson
Collaborator Author

gkjohnson commented Dec 8, 2023

Okay, stunning... I just updated Chrome to the latest version and now it runs extremely slowly on my M1 MacBook, as well... So this is just a browser regression...

@CodyJasonBennett
Contributor

I have an M2 14" to test on, but what jumps out at me is the use of FloatType. I wonder if { type: HalfFloatType } or { format: 'RGB', type: FloatType, internalFormat: 'R11F_G11F_B10F' } would fare better or if there's a better packing strategy. I vaguely remember @0beqz efficiently packing material properties into a single texture g-buffer for https://github.com/0beqz/realism-effects. I can't speak to sampling performance though.

@gkjohnson
Collaborator Author

gkjohnson commented Dec 8, 2023

It turns out this was related to using the flat modifier with a varying field. Removing it fixes the issue entirely... It seems I can only recreate this using BatchedMesh 🤔 (and therefore the multi draw extension), though. Anyway here's a fiddle - adding and removing the "flat" modifier causes the performance to change dramatically:

https://jsfiddle.net/g9q4cswr/3/

flat varying float vBatchId; // bad

//

varying float vBatchId; // good

I've also updated the demo to remove the "flat" modifier and now performance is better again, so you can see the proof of concept for rendering an arbitrary number of material properties in a single draw call:

https://gkjohnson.github.io/batched-material-properties-demo/

I have an M2 14" to test on, but what jumps out at me is the use of FloatType.

At least it didn't turn out to be something as critical as a Float texture that caused the problem 😬

@mrdoob
Copy link
Owner

mrdoob commented Dec 8, 2023

Solved the issue here too yep!

@RenaudRohlinger
Collaborator

It turns out this was related to using the flat modifier with a varying field. Removing it fixes the issue entirely... It seems I can only recreate this using BatchedMesh 🤔 (and therefore the multi draw extension), though. Anyway here's a fiddle - adding and removing the "flat" modifier causes the performance to change dramatically:

https://jsfiddle.net/g9q4cswr/3/

flat varying float vBatchId; // bad

//

varying float vBatchId; // good

I've also updated the demo to remove the "flat" modifier and now performance is better again, so you can see the proof of concept for rendering an arbitrary number of material properties in a single draw call:

https://gkjohnson.github.io/batched-material-properties-demo/

I have an M2 14" to test on, but what jumps out at me is the use of FloatType.

At least it didn't turn out to be something as critical as a Float texture that caused the problem 😬

Does using a flat varying integer instead of a flat varying float fix the issue, though? I never experienced that issue with flat int AFAIK.

@gkjohnson
Collaborator Author

Does using a flat varying integer instead of a flat varying float fix the issue, though?

The issue still happens with flat varying int. But overall this issue only happens when using BatchedMesh (so likely related to multi draw).

@CodyJasonBennett
Contributor

I saw this recently in a private conversation, and I wonder whether it is related, since the Chrome update switched to using Metal by default:

On Metal, ANGLE has to rewrite the index buffer if you're using flat attributes in your shader, because Metal's and GL's provoking vertex conventions differ (first vs. last). This is why we were seeing such high mem use. Using the WEBGL_provoking_vertex extension to change to the first vertex convention resolves the issue (our flat attributes are constant across all verts, so first vs. last doesn't matter to us).
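For reference, opting into the first-vertex convention on a raw WebGL2 context could look like the sketch below. The helper name `preferFirstProvokingVertex` is made up for this example. The extension name, `provokingVertexWEBGL`, and `FIRST_VERTEX_CONVENTION_WEBGL` come from the WEBGL_provoking_vertex extension. Whether this resolves the slowdown discussed in this thread has not been verified here.

```javascript
// Hypothetical helper: switches flat-shaded attributes to the first-vertex convention.
// Returns true when the convention was applied, false when the extension is missing.
function preferFirstProvokingVertex( gl ) {

	const ext = gl.getExtension( 'WEBGL_provoking_vertex' );

	if ( ext === null ) return false;

	ext.provokingVertexWEBGL( ext.FIRST_VERTEX_CONVENTION_WEBGL );
	return true;

}
```

Changing the convention changes which vertex of a primitive supplies the `flat` value, so it is only a drop-in change when that value is the same for every vertex of a primitive, as with a per-object batch id.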

@agargaro
Contributor

Hello, I have written a class that extends InstancedMesh and handles frustum culling for each instance.
It creates Object3D-like objects, one per instance up to count, which makes them easier to handle and simplifies calculations: there is no need to read the matrices from the array because position, rotation, and scale are already available.
It manages visibility by moving the matrices of the objects to be hidden to the end of the instanceMatrix array, trying to avoid cache-miss problems.

https://discourse.threejs.org/t/instancedmesh2-easy-handling-and-frustum-culling/58622

I would like to help if possible.


7 participants