
Suggestion: spatial index for better performance when culling and sorting #5571

Closed
Usnul opened this issue Nov 4, 2014 · 9 comments

@Usnul (Contributor) commented Nov 4, 2014

I've been looking through the renderer code and wondering about the performance overhead of using a plethora of individual meshes. Looking at the code, there are two major improvements to be had:

  • culling
  • depth sorting

If we look at culling, we can see a full traversal at work:

function projectObject( scene, object ) {

	if ( object.visible === false ) return;

	if ( object instanceof THREE.Scene || object instanceof THREE.Group ) {

		// skip

	} else {

		initObject( object, scene );

		if ( object instanceof THREE.Light ) {

			lights.push( object );

		} else if ( object instanceof THREE.Sprite ) {

			sprites.push( object );

		} else if ( object instanceof THREE.LensFlare ) {

			lensFlares.push( object );

		} else {

			var webglObjects = _webglObjects[ object.id ];

			if ( webglObjects && ( object.frustumCulled === false || _frustum.intersectsObject( object ) === true ) ) {

				updateObject( object, scene );

				for ( var i = 0, l = webglObjects.length; i < l; i ++ ) {

					var webglObject = webglObjects[ i ];

					unrollBufferMaterial( webglObject );

					webglObject.render = true;

					if ( _this.sortObjects === true ) {

						if ( object.renderDepth !== null ) {

							webglObject.z = object.renderDepth;

						} else {

							_vector3.setFromMatrixPosition( object.matrixWorld );
							_vector3.applyProjection( _projScreenMatrix );

							webglObject.z = _vector3.z;

						}

					}

				}

			}

		}

	}

	for ( var i = 0, l = object.children.length; i < l; i ++ ) {

		projectObject( scene, object.children[ i ] );

	}

}

And similarly, if we look at sorting:

if ( _this.sortObjects === true ) {

	opaqueObjects.sort( painterSortStable );
	transparentObjects.sort( reversePainterSortStable );

}

If we had a spatial index such as an octree with bounding boxes, we could traverse it to get objects in almost sorted order (local swaps would still be required).

For culling, we could similarly drop a large amount of iteration.
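To make that concrete, here is a minimal sketch of culling against such an index. The node shape (bounds, children, object) is a hypothetical stand-in; THREE.Frustum.intersectsBox is an existing method:

// Sketch: collect visible objects by walking a bounding-volume tree
// instead of the full scene graph; a subtree whose bounds miss the
// frustum is rejected with a single test for the entire region.
function cullTree( node, frustum, visible ) {

	if ( frustum.intersectsBox( node.bounds ) === false ) return;

	if ( node.object !== undefined ) {

		visible.push( node.object );

	} else {

		for ( var i = 0, l = node.children.length; i < l; i ++ ) {

			cullTree( node.children[ i ], frustum, visible );

		}

	}

}

Visiting children nearest-to-camera first at each node is what would yield the almost-sorted order mentioned above.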

@mrdoob (Owner) commented Nov 5, 2014

> If we had a spatial index such as an octree with bounding boxes, we could traverse it to get objects in almost sorted order (local swaps would still be required).

But then we need to update that octree when something changes. Also, I was told years ago that BVH is much more performant for this stuff.

@Usnul (Contributor, Author) commented Nov 5, 2014

@mrdoob
I think it boils down to implementation. A BVH is more intelligent than a basic octree: you can associate a volume with a meaningful object or group, whereas for an octree it's subdivision, pure and simple. Because of that, an octree can make valid assumptions about space partitioning: if x is past the middle of a cube you can already ignore half of the children, and if y is past the middle you can ignore half of what remains. With a BVH you still have to consider all children.

With a BVH we could mark updated leaves as dirty and resize their ancestors accordingly; that's a log(n) penalty per bounding-volume update, which is pretty cheap to begin with. Considering that sorting N things every frame already costs O(n log n), replacing it with occasional updates that have n log n worst-case overhead for a full rebuild is a pretty decent trade.
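As a minimal sketch of that refit, assuming each node stores a THREE.Box3 in bounds and a parent pointer (both hypothetical fields):

// Sketch: after a leaf's box changes, grow ancestor boxes upward.
// Propagation stops as soon as an ancestor already encloses the new
// box, so the worst case is O(log n) but it typically ends much sooner.
function refitUp( leaf ) {

	for ( var node = leaf.parent; node !== null; node = node.parent ) {

		if ( node.bounds.containsBox( leaf.bounds ) ) break;

		node.bounds.union( leaf.bounds );

	}

}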

As side benefits:

  • If we had a spatial index, the raytracer would be a lot faster too, essentially for free (see the sketch after this list).
  • To implement signed distance fields we'd need a spatial index, and having a bounding-volume one for meshes would be a start (link: Better shadow maps #5554)
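A minimal sketch of that raytracing win, reusing the hypothetical node shape (bounds, children, object) from the culling sketch above; THREE.Ray.isIntersectionBox is the boolean box test three provides:

// Sketch: the same tree prunes ray queries. Subtrees whose bounds the
// ray misses are skipped, so a hit test no longer touches every object.
function raycastTree( node, ray, candidates ) {

	if ( ray.isIntersectionBox( node.bounds ) === false ) return;

	if ( node.object !== undefined ) {

		candidates.push( node.object ); // exact per-face test happens later

	} else {

		for ( var i = 0, l = node.children.length; i < l; i ++ ) {

			raycastTree( node.children[ i ], ray, candidates );

		}

	}

}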

@joeedh commented Nov 5, 2014

@mrdoob that's what DAG solvers are for. :) They're dead simple, too (academia needlessly overcomplicates 'em):

# returns a flat list, where parents precede children (topological order)
def sort_dag(nodes):
    result = []
    visited = set()

    def visit(node):
        if node in visited:
            return

        visited.add(node)

        # walk up the parent chain first, so parents land before us
        for parent in node.parents:
            visit(parent)

        result.append(node)

    for node in nodes:
        visit(node)

    return result


# execute the scene graph in dependency order
for ob in sort_dag(scene.objects):
    ob.graph_execute()

@Usnul (Contributor, Author) commented Nov 6, 2014

I've been playing around with a BVH as per @mrdoob's suggestion. Filling it in efficiently is a real nightmare in terms of performance; I'm only at the stage of having a static BVH so far. Here are some results:
[screenshot]

Generating a BVH for the 100,000 faces of Lucy takes 3385.362ms on my machine, which is nothing to write home about. This is done with no re-balancing, using a cost function to recursively find the tightest-fitting volume for each new element being inserted.
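For illustration, the cost function I mean works along these lines (helper names hypothetical, and surface area standing in as the growth metric): at each internal node, descend into the child whose bounds would grow the least when the new box is merged in:

// Sketch: cost-guided descent during insertion. At each internal node,
// pick the child whose bounds grow the least (measured by surface
// area) after absorbing the new box.
var _merged = new THREE.Box3();

function surfaceArea( box ) {

	var dx = box.max.x - box.min.x,
		dy = box.max.y - box.min.y,
		dz = box.max.z - box.min.z;

	return 2 * ( dx * dy + dy * dz + dz * dx );

}

function cheapestChild( node, box ) {

	var best = null, bestCost = Infinity;

	for ( var i = 0, l = node.children.length; i < l; i ++ ) {

		var child = node.children[ i ];

		_merged.copy( child.bounds ).union( box );

		var cost = surfaceArea( _merged ) - surfaceArea( child.bounds );

		if ( cost < bestCost ) { bestCost = cost; best = child; }

	}

	return best;

}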

In terms of construction, a BVH appears to be a lot more delicate than a k-d tree or octree, since those partition space rather than working as clustering mechanisms. The real joy comes when building a balanced BVH with the surface area heuristic (SAH): the naive algorithm has O(n^3) complexity, so it takes absolutely forever. Adding octree-inspired splitting, respecting bounding volumes and adjusting bounds, yields better results and takes 7517.297ms on my machine for the same problem.

To demonstrate the difference, here's a 2D projection of a set of 1000 random boxes being inserted into the tree:
[screenshot: incremental insert (left) vs. clustered insert (right)]
The left is the incremental insert, one element at a time. The cost is very low, but you end up with rather bad clustering: there are too many large volumes, and they don't fit very tightly. On the right you have much better clustering and a nicely balanced tree. Performance-wise, here are some numbers (boxes / incremental insert / good insert):

  • 1000 / 6.805ms / 9.224ms
  • 10,000 / 55.391ms / 361.883ms
  • 100,000 / 893.285ms / 37,143.153ms

This is a somewhat poor representation of actual numbers, though, since my tests intentionally generate a very dense collection of boxes that is hard to cluster. Interestingly, sparser sets yield the opposite result, making the complete clustering build run fast and incremental inserts go slow:
[screenshot]

  • 100,000 / 4849.039ms / 1154.688ms
  • For another metric, summing the volumes of all nodes in the tree: the fast method uses 56.89 times more volume than the "good" one.

The dynamic use case doesn't seem very difficult in terms of adjusting bounding boxes, since every adjustment is only proportional to the node's depth in the tree and will most likely terminate propagation only a node or two up. Having a volume change so much that every ancestor up to the root needs to be adjusted should be pretty rare.

I'm thinking of working in the incremental direction, with partial optimization routines, so we can improve the tree little by little over time. In highly dynamic scenes you would otherwise end up with a tree that's almost no better than a flat list for searching.

Future work:

  • Get the dynamic use case working, allowing bounds to be modified after insertion.
  • Do some benchmarking against other existing implementations (such as the octree written for THREE).
  • Incremental tree optimization.
  • Better tests that better represent actual use cases.
  • Implement frustum traversal on the tree (for culling).

@mrdoob (Owner) commented Nov 7, 2014

@Usnul Great stuff! Have a look at @BenediktS's work here too, btw: #5444 (comment)

@Usnul (Contributor, Author) commented Nov 7, 2014

Thanks, I'll try the z-curve sorting. It looks promising; sorting is where I currently spend up to 50% of the time.

@Usnul (Contributor, Author) commented Nov 7, 2014

Using the z-curve does help a little. I'm not sure about the quality of the tree for searches, but the new performance figures for complete builds are as follows:

  • 10K boxes - 19.371ms
  • 100K boxes - 240.267ms
  • 1M boxes - 3.134s

Lucy takes 368.394ms to generate a BVH for (all numbers in Chrome).
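For context, the z-curve ordering amounts to interleaving the quantized bits of each box centroid into a Morton key and sorting boxes by it; spatial neighbours end up adjacent, so the tree falls out of recursively splitting the sorted run instead of descending per element. A sketch with 10 bits per axis (the masks are the standard 30-bit Morton bit-spreading pattern; centroids assumed normalized to [0,1]):

// Sketch: 30-bit Morton (z-curve) key for a point in the unit cube.
// expandBits inserts two zero bits between each of the low 10 bits.
function expandBits( v ) {

	v = ( v | ( v << 16 ) ) & 0x030000FF;
	v = ( v | ( v << 8 ) ) & 0x0300F00F;
	v = ( v | ( v << 4 ) ) & 0x030C30C3;
	v = ( v | ( v << 2 ) ) & 0x09249249;

	return v;

}

function mortonCode( x, y, z ) {

	// x, y, z are centroid coordinates normalized to [0,1]
	var xi = expandBits( Math.min( 1023, ( x * 1024 ) | 0 ) );
	var yi = expandBits( Math.min( 1023, ( y * 1024 ) | 0 ) );
	var zi = expandBits( Math.min( 1023, ( z * 1024 ) | 0 ) );

	return ( xi << 2 ) | ( yi << 1 ) | zi;

}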

@Doidel (Contributor) commented Jan 20, 2015

You can also find a k-d tree implementation in the three.js examples; I created an in-memory version of it. And here are the theoretical values of operations on k-d trees: http://en.wikipedia.org/wiki/K-d_tree#Complexity (I haven't read or tested whether it suits your needs, I just happened to stumble over this issue)

@Usnul (Contributor, Author) commented Jan 22, 2015

A k-d tree is very bad for resizing; the reason the BVH seems to be the go-to solution for dynamic scenes is its hierarchical structure. In a BVH, to translate 100,000 vertices that are clustered together you may simply move the volume bounding them; in a k-d tree this is not as simple. The k-d tree is a great solution for fast build times, and it has better pruning performance for good trees compared to a BVH; however, for dynamic scenes this means very little. There is a k-d implementation for WebGL that allows queries on the GPU, but the tree still has to be built on the CPU. In the same vein, using binary trees you can store a BVH on the GPU for querying.
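A minimal sketch of that cluster move, assuming hypothetical node fields: a THREE.Box3 in bounds, a parent pointer, and a pendingOffset that gets pushed down to the children lazily on the next traversal:

// Sketch: translate a whole cluster by offsetting its subtree root.
// The leaves underneath are never touched at move time; the offset is
// applied to children lazily, and ancestors are refit on the way up
// (boxes stay conservative, they are only grown here).
function translateSubtree( node, offset ) {

	node.bounds.translate( offset );
	node.pendingOffset.add( offset ); // hypothetical lazy field

	for ( var p = node.parent; p !== null; p = p.parent ) {

		if ( p.bounds.containsBox( node.bounds ) ) break; // already fits

		p.bounds.union( node.bounds );

	}

}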

Hope this helps to clarify things a bit.
