
Suggestion: spatial index for better performance when culling and sorting #5571

Closed
Usnul opened this issue Nov 4, 2014 · 9 comments

@Usnul (Contributor) commented Nov 4, 2014

I've been looking through the renderer code and wondering about the performance overhead of using a plethora of individual meshes. Looking at the code, there are two major improvements to be had:

  • culling
  • depth sorting

If we look at culling, we can see a full traversal at work:

function projectObject( scene, object ) {

	if ( object.visible === false ) return;

	if ( object instanceof THREE.Scene || object instanceof THREE.Group ) {

		// skip

	} else {

		initObject( object, scene );

		if ( object instanceof THREE.Light ) {

			lights.push( object );

		} else if ( object instanceof THREE.Sprite ) {

			sprites.push( object );

		} else if ( object instanceof THREE.LensFlare ) {

			lensFlares.push( object );

		} else {

			var webglObjects = _webglObjects[ object.id ];

			if ( webglObjects && ( object.frustumCulled === false || _frustum.intersectsObject( object ) === true ) ) {

				updateObject( object, scene );

				for ( var i = 0, l = webglObjects.length; i < l; i ++ ) {

					var webglObject = webglObjects[ i ];

					unrollBufferMaterial( webglObject );

					webglObject.render = true;

					if ( _this.sortObjects === true ) {

						if ( object.renderDepth !== null ) {

							webglObject.z = object.renderDepth;

						} else {

							_vector3.setFromMatrixPosition( object.matrixWorld );
							_vector3.applyProjection( _projScreenMatrix );

							webglObject.z = _vector3.z;

						}

					}

				}

			}

		}

	}

	for ( var i = 0, l = object.children.length; i < l; i ++ ) {

		projectObject( scene, object.children[ i ] );

	}

}

And similarly, if we look at sorting:

if ( _this.sortObjects === true ) {

	opaqueObjects.sort( painterSortStable );
	transparentObjects.sort( reversePainterSortStable );

}

If we had a spatial index such as an octree with bounding boxes, we could traverse it to get objects in almost sorted order (local swaps would still be required).

For culling, we could similarly drop a large amount of iteration.
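To make that concrete, here is a minimal sketch of culling against such an index. The node shape (bounds, children, object) is a hypothetical stand-in; THREE.Frustum.intersectsBox is an existing method:

// Sketch: collect visible objects by walking a bounding-volume tree
// instead of the full scene graph; a subtree whose bounds miss the
// frustum is rejected with a single test for the entire region.
function cullTree( node, frustum, visible ) {

	if ( frustum.intersectsBox( node.bounds ) === false ) return;

	if ( node.object !== undefined ) {

		visible.push( node.object );

	} else {

		for ( var i = 0, l = node.children.length; i < l; i ++ ) {

			cullTree( node.children[ i ], frustum, visible );

		}

	}

}

Visiting children nearest-to-camera first at each node is what would yield the almost-sorted order mentioned above.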

@mrdoob (Owner) commented Nov 5, 2014

> If we had a spatial index such as an octree with bounding boxes, we could traverse it to get objects in almost sorted order (local swaps would still be required).

But then we need to update that octree when something changes. Also, I was told years ago that BVH is much more performant for this stuff.

@Usnul (Contributor, Author) commented Nov 5, 2014

@mrdoob
I think it boils down to implementation. A BVH is more intelligent than a basic octree: you can associate a volume with a meaningful object or group, whereas for an octree it's subdivision, pure and simple. Because of that, an octree can make valid assumptions about space partitioning: if x is past the middle of a cube you can already ignore half of the children, and if y is past the middle you can ignore half of what remains. With a BVH you still have to consider all children.

With a BVH we could mark updated leaves as dirty and resize their ancestors accordingly; that's a log(n) penalty per bounding-volume update, which is pretty cheap to begin with. Considering that sorting N things every frame already costs O(n log n), replacing it with occasional updates that have n log n worst-case overhead for a full rebuild is a pretty decent trade.
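As a minimal sketch of that refit, assuming each node stores a THREE.Box3 in bounds and a parent pointer (both hypothetical fields):

// Sketch: after a leaf's box changes, grow ancestor boxes upward.
// Propagation stops as soon as an ancestor already encloses the new
// box, so the worst case is O(log n) but it typically ends much sooner.
function refitUp( leaf ) {

	for ( var node = leaf.parent; node !== null; node = node.parent ) {

		if ( node.bounds.containsBox( leaf.bounds ) ) break;

		node.bounds.union( leaf.bounds );

	}

}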

As side benefits:

  • If we had a spatial index, the raytracer would be a lot faster too, essentially for free (see the sketch after this list).
  • To implement signed distance fields we'd need a spatial index, and having a bounding-volume one for meshes would be a start (link: Better shadow maps #5554)
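A minimal sketch of that raytracing win, reusing the hypothetical node shape (bounds, children, object) from the culling sketch above; THREE.Ray.isIntersectionBox is the boolean box test three provides:

// Sketch: the same tree prunes ray queries. Subtrees whose bounds the
// ray misses are skipped, so a hit test no longer touches every object.
function raycastTree( node, ray, candidates ) {

	if ( ray.isIntersectionBox( node.bounds ) === false ) return;

	if ( node.object !== undefined ) {

		candidates.push( node.object ); // exact per-face test happens later

	} else {

		for ( var i = 0, l = node.children.length; i < l; i ++ ) {

			raycastTree( node.children[ i ], ray, candidates );

		}

	}

}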

@joeedh commented Nov 5, 2014

@mrdoob that's what DAG solvers are for. :) They're dead simple, too (academia needlessly overcomplicates 'em):

# returns a flat list, where parents precede children (topological order)
def sort_dag(nodes):
    result = []
    visited = set()

    def visit(node):
        if node in visited:
            return

        visited.add(node)

        # walk up the parent chain first, so parents land before us
        for parent in node.parents:
            visit(parent)

        result.append(node)

    for node in nodes:
        visit(node)

    return result


# execute the scene graph in dependency order
for ob in sort_dag(scene.objects):
    ob.graph_execute()

@Usnul (Contributor, Author) commented Nov 6, 2014

I've been playing around with a BVH as per @mrdoob's suggestion. Filling it in efficiently is a real nightmare in terms of performance; I'm only at the stage of having a static BVH so far. Here are some results:
[screenshot]

Generating a BVH for the 100,000 faces of Lucy takes 3385.362ms on my machine, which is nothing to write home about. This is done with no re-balancing, using a cost function to recursively find the tightest-fitting volume for each new element being inserted.
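For illustration, the cost function I mean works along these lines (helper names hypothetical, and surface area standing in as the growth metric): at each internal node, descend into the child whose bounds would grow the least when the new box is merged in:

// Sketch: cost-guided descent during insertion. At each internal node,
// pick the child whose bounds grow the least (measured by surface
// area) after absorbing the new box.
var _merged = new THREE.Box3();

function surfaceArea( box ) {

	var dx = box.max.x - box.min.x,
		dy = box.max.y - box.min.y,
		dz = box.max.z - box.min.z;

	return 2 * ( dx * dy + dy * dz + dz * dx );

}

function cheapestChild( node, box ) {

	var best = null, bestCost = Infinity;

	for ( var i = 0, l = node.children.length; i < l; i ++ ) {

		var child = node.children[ i ];

		_merged.copy( child.bounds ).union( box );

		var cost = surfaceArea( _merged ) - surfaceArea( child.bounds );

		if ( cost < bestCost ) { bestCost = cost; best = child; }

	}

	return best;

}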

In terms of construction, a BVH appears to be a lot more delicate than a k-d tree or octree, since those partition space rather than working as clustering mechanisms. The real joy comes when building a balanced BVH with the surface area heuristic (SAH): the naive algorithm has O(n^3) complexity, so it takes absolutely forever. Adding octree-inspired splitting, respecting bounding volumes and adjusting bounds, yields better results and takes 7517.297ms on my machine for the same problem.

To demonstrate the difference, here's a 2D projection of a set of 1000 random boxes being inserted into the tree:
[screenshot: incremental insert (left) vs. clustered insert (right)]
The left is the incremental insert, one element at a time. The cost is very low, but you end up with rather bad clustering: there are too many large volumes, and they don't fit very tightly. On the right you have much better clustering and a nicely balanced tree. Performance-wise, here are some numbers (boxes / incremental insert / good insert):

  • 1000 / 6.805ms / 9.224ms
  • 10,000 / 55.391ms / 361.883ms
  • 100,000 / 893.285ms / 37,143.153ms

This is a somewhat poor representation of actual numbers, though, since my tests intentionally generate a very dense collection of boxes that is hard to cluster. Interestingly, sparser sets yield the opposite result, making the complete clustering build run fast and incremental inserts go slow:
[screenshot]

  • 100,000 / 4849.039ms / 1154.688ms
  • For another metric, summing the volumes of all nodes in the tree: the fast method uses 56.89 times more volume than the "good" one.

The dynamic use case doesn't seem very difficult in terms of adjusting bounding boxes, since every adjustment is only proportional to the node's depth in the tree and will most likely terminate propagation only a node or two up. Having a volume change so much that every ancestor up to the root needs to be adjusted should be pretty rare.

I'm thinking of working in the incremental direction, with partial optimization routines, so we can improve the tree little by little over time. In highly dynamic scenes you would otherwise end up with a tree that's almost no better than a flat list for searching.

Future work:

  • Get the dynamic use case working, allowing bounds to be modified after insertion.
  • Do some benchmarking against other existing implementations (such as the octree written for THREE).
  • Incremental tree optimization.
  • Better tests that better represent actual use cases.
  • Implement frustum traversal on the tree (for culling).

@mrdoob (Owner) commented Nov 7, 2014

@Usnul Great stuff! Have a look at @BenediktS's work here too, btw: #5444 (comment)

@Usnul (Contributor, Author) commented Nov 7, 2014

Thanks, I'll try the z-curve sorting. It looks promising; sorting is where I currently spend up to 50% of the time.

@Usnul (Contributor, Author) commented Nov 7, 2014

Using the z-curve does help a little. I'm not sure about the quality of the tree for searches, but the new performance figures for complete builds are as follows:

  • 10K boxes - 19.371ms
  • 100K boxes - 240.267ms
  • 1M boxes - 3.134s

Lucy takes 368.394ms to generate a BVH for (all numbers in Chrome).
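For context, the z-curve ordering amounts to interleaving the quantized bits of each box centroid into a Morton key and sorting boxes by it; spatial neighbours end up adjacent, so the tree falls out of recursively splitting the sorted run instead of descending per element. A sketch with 10 bits per axis (the masks are the standard 30-bit Morton bit-spreading pattern; centroids assumed normalized to [0,1]):

// Sketch: 30-bit Morton (z-curve) key for a point in the unit cube.
// expandBits inserts two zero bits between each of the low 10 bits.
function expandBits( v ) {

	v = ( v | ( v << 16 ) ) & 0x030000FF;
	v = ( v | ( v << 8 ) ) & 0x0300F00F;
	v = ( v | ( v << 4 ) ) & 0x030C30C3;
	v = ( v | ( v << 2 ) ) & 0x09249249;

	return v;

}

function mortonCode( x, y, z ) {

	// x, y, z are centroid coordinates normalized to [0,1]
	var xi = expandBits( Math.min( 1023, ( x * 1024 ) | 0 ) );
	var yi = expandBits( Math.min( 1023, ( y * 1024 ) | 0 ) );
	var zi = expandBits( Math.min( 1023, ( z * 1024 ) | 0 ) );

	return ( xi << 2 ) | ( yi << 1 ) | zi;

}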

@Doidel (Contributor) commented Jan 20, 2015

You can also find a k-d tree implementation in the three.js examples; I created an in-memory version of it. And here are the theoretical values of operations on k-d trees: http://en.wikipedia.org/wiki/K-d_tree#Complexity (I haven't read or tested whether it suits your needs, I just happened to stumble over this issue)

@Usnul (Contributor, Author) commented Jan 22, 2015

A k-d tree is very bad for resizing; the reason the BVH seems to be the go-to solution for dynamic scenes is its hierarchical structure. In a BVH, to translate 100,000 vertices that are clustered together you may simply move the volume bounding them; in a k-d tree this is not as simple. The k-d tree is a great solution for fast build times, and it has better pruning performance for good trees compared to a BVH; however, for dynamic scenes this means very little. There is a k-d implementation for WebGL that allows queries on the GPU, but the tree still has to be built on the CPU. In the same vein, using binary trees you can store a BVH on the GPU for querying.
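A minimal sketch of that cluster move, assuming hypothetical node fields: a THREE.Box3 in bounds, a parent pointer, and a pendingOffset that gets pushed down to the children lazily on the next traversal:

// Sketch: translate a whole cluster by offsetting its subtree root.
// The leaves underneath are never touched at move time; the offset is
// applied to children lazily, and ancestors are refit on the way up
// (boxes stay conservative, they are only grown here).
function translateSubtree( node, offset ) {

	node.bounds.translate( offset );
	node.pendingOffset.add( offset ); // hypothetical lazy field

	for ( var p = node.parent; p !== null; p = p.parent ) {

		if ( p.bounds.containsBox( node.bounds ) ) break; // already fits

		p.bounds.union( node.bounds );

	}

}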

Hope this helps to clarify things a bit.
