
Working version using swaps #44

Merged: 19 commits, Dec 20, 2018
Conversation

@mqp (Contributor) commented Dec 13, 2018

I worked toward an implementation that reorders the list of tris, as suggested in #42 and as we discussed earlier.

The code as written does this by leaving the original geometry object alone, and by reordering and keeping a reference to the origTris array (now BVHConstructionContext.tris). This seems slightly faster during construction, but in this implementation it actually uses more memory in the resulting tree -- each node holds a reference to the one big array plus an offset and count, whereas in the old version each node referred to a small slice of the big array. So really this is just a start.

The obvious next step is to reorder the actual geometry object per the reordered tris array, so that we don't have to keep this "array of pointers" representation around next to the final tree. That will really be a nice improvement over the old version.

Other changes in this implementation:

  • I went back to recursing directly in createNode instead of using a queue of nodes. I like doing this because it means we're doing the tree construction left-to-right, depth-first, making it more comprehensible to e.g. reorder geometry incrementally when we have an edge node to construct, because we'll never have to look at the triangles in that edge node again later in tree construction. I'm not concerned about a stack overflow because we control the maximum tree depth anyway. It's also nice not to construct garbage closures in each createNode invocation.

  • I split some MeshBVH construction functionality into a BVHConstructionContext object, and moved some of the GeometryUtilities stuff which required precomputed pieces of construction data (e.g. centroids, bounds) into class methods on that. I'm not sure I can really say this is better; it just made it easier for me to keep everything organized while I was making this change. I'm open to changing it back or further reorganizing.

Open question: How serious are we about supporting non-BufferGeometry geometries? It will make this reordering idea rather more complicated if we have to care about multiple geometry representations. I personally would not bother supporting them, at least as an output. (In the case of outputting an indexed BufferGeometry, we have it especially easy, because we can just change the index and leave all the attributes alone; otherwise, we need to reorder all the attribute data as well.)
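To illustrate the easy indexed case, roughly this (a sketch only -- the helper name and shape are illustrative, not code from this PR):

// Build a new index buffer in BVH order; the position/normal/uv
// attributes are left untouched because only the index changes.
function buildReorderedIndex( geometry, reorderedTris ) {

	const srcIndex = geometry.index.array;
	const dstIndex = new Uint32Array( srcIndex.length );
	for ( let i = 0; i < reorderedTris.length; i ++ ) {

		const tri = reorderedTris[ i ];
		dstIndex[ 3 * i + 0 ] = srcIndex[ 3 * tri + 0 ];
		dstIndex[ 3 * i + 1 ] = srcIndex[ 3 * tri + 1 ];
		dstIndex[ 3 * i + 2 ] = srcIndex[ 3 * tri + 2 ];

	}

	geometry.setIndex( new THREE.BufferAttribute( dstIndex, 1 ) );

}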

@gkjohnson (Owner)

This pull request introduces 2 alerts when merging b47fcfd into 96a159b - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

@gkjohnson (Owner)

This pull request introduces 2 alerts when merging e8a7622 into 96a159b - view on LGTM.com

new alerts:

  • 2 for Unused variable, import, function or class

Comment posted by LGTM.com

@gkjohnson (Owner)

Hey @mquander!

This is great. I know it's still a work in progress but I'll go through the code in a bit and add a few comments.

To answer a few of your questions:

... but actually results in using more memory in the resulting tree in this implementation...

I think this is okay for now -- enabling the re-ordering of the geometry indices directly is a good long-run plan.

I split some MeshBVH construction functionality into a BVHConstructionContext object... I'm open to changing it back or further reorganizing.

Separating the build logic out into a separate file is a good change, in my opinion. The tree constructor was getting pretty unwieldy. I do find keeping the construction operations as functional as possible to be a bit more traceable, though, especially if there's no internal caching going on -- getCentroids(tris, geo, getValFunc) vs BVHConstructionContext.computeCentroids(), for example. It's definitely a matter of personal preference, though.

How serious are we about supporting non-BufferGeometry geometries?

I'm comfortable removing support for regular THREE Geometry. There's been talk of deprecating THREE.Geometry and I think it's generally a fine assumption that BufferGeometry should be used for anything moderately complicated.

... we need to reorder all the attribute data as well.

I would think generating a triangle index if one doesn't exist rather than re-ordering the attributes is an okay plan, for now. That sounds like a lot of complexity that would be great to avoid.
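Something along these lines, perhaps (just a sketch; the helper name is illustrative):

// Generate the trivial sequential index for non-indexed geometry so
// the BVH can re-order triangles by rewriting the index alone.
function ensureIndex( geo ) {

	if ( ! geo.index ) {

		const vertexCount = geo.attributes.position.count;
		const ArrayType = vertexCount > 65535 ? Uint32Array : Uint16Array;
		const index = new ArrayType( vertexCount );
		for ( let i = 0; i < vertexCount; i ++ ) index[ i ] = i;
		geo.setIndex( new THREE.BufferAttribute( index, 1 ) );

	}

}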

lib/MeshBVH.js Outdated

  // create the two new child nodes
- if ( ! left.length || ! right.length ) {
+ if ( index === offset || index === offset + count ) {
@gkjohnson (Owner):

This is basically the case where all of the triangles wind up on one side of the partition, right? Shouldn't it be possible to know this ahead of time and avoid partitioning? Granted I'm not sure if there's a lot to gain from checking that.

@mqp (Contributor, Author):

I agree, it feels like there is some duplication of concerns between this and the splitting code (which also can choose to return no-split-required.) Perhaps I will try to clean it up.

@mqp (Contributor, Author) commented Dec 14, 2018:

On reflection, I think this is actually not just awkward, but wrong. Consider the AVERAGE and CENTER strategies. If it happens to be that all of the triangle centroids wind up on one side of the average of the longest edge -- which seems strange but possible given that the longest edge computation is using the full bounding box, but partitioning is using the centroid -- this code will terminate tree construction, even though maybe if you did a split again you would continue to make useful splits. So something should be different here.
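Concretely, the two split positions come from different data (a sketch; it assumes the precomputed centroids live in a flat xyz-per-triangle array, and boundsMin, boundsMax, axis, offset, count, and tris are the obvious node-local values):

// CENTER: midpoint of the node's bounding box along the longest axis.
// The box extent comes from triangle vertices, so every centroid can
// legitimately land on one side of this position.
const splitCenter = ( boundsMin[ axis ] + boundsMax[ axis ] ) / 2;

// AVERAGE: mean of the triangle centroids along that axis. Unless all
// centroids are equal, at least one must land on each side of the mean.
let splitAverage = 0;
for ( let i = offset; i < offset + count; i ++ ) {

	splitAverage += centroids[ tris[ i ] * 3 + axis ];

}
splitAverage /= count;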

@gkjohnson (Owner):

Would this be a problem in the AVERAGE split? In that case the average of all centroids is taken, which would have to mean that some centroids land on either side of the split. In the case of CENTER it would seem that this is just an artifact of choosing a less smart split strategy.

Ultimately I think this is okay.

@mqp (Contributor, Author) commented Dec 14, 2018

This is getting closer to something I endorse. I made some other small fixes that were easy and hopefully uncontroversial.

The failing test looks like it's because our test data has an "ambiguous" result where there are two equally close triangles that intersect the ray? I will look at it further.

This version now soundly cleans up on all the benchmarks while also using less memory for the tree structure (especially if the geometry was already indexed.) For whatever reason, it speeds up the in-browser example bounds tree creation a lot more than it speeds up the benchmark bounds tree creation -- in the example in Firefox it seems to go from 110 ms to 60 ms, whereas on the benchmark it goes from about 30 ms to 25 ms.

@gkjohnson (Owner)

This pull request introduces 3 alerts when merging ff1abdf into 96a159b - view on LGTM.com

new alerts:

  • 3 for Unused variable, import, function or class

Comment posted by LGTM.com

@gkjohnson (Owner)

This pull request introduces 1 alert when merging 305013c into 96a159b - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

Comment posted by LGTM.com

@mqp force-pushed the reorder-tris branch 2 times, most recently from 008d3de to 1d2e3dd (December 14, 2018 09:44)
@gkjohnson (Owner)

I'm gonna go through and see how everything looks but here are the benchmarks I ran for master against this branch so far:

Benchmark Before

Compute Bounds Tree      : 28.12 ms
Default Raycast          : 1.452785 ms
BVH Raycast              : 0.16852 ms
First Hit Raycast        : 0.08895 ms

Total Memory Usage       : 2633.052 kb
Geometry Memory Usage    : 1134.216 kb
Bounds Tree Memory Usage : 1498.836 kb

Benchmark After

Compute Bounds Tree      : 24.84 ms
Default Raycast          : 1.525407 ms
BVH Raycast              : 0.099526 ms
First Hit Raycast        : 0.10233 ms

Index Memory Usage       : 300.184 kb
BVH Memory Usage         : 758.626 kb

The memory improvements look great but I'm a little confused by the discrepancy between BVH Raycast and First Hit Raycast -- "BVH Raycast" seems to consistently be faster than "First Hit", which makes me feel like something is wrong. "First Hit" also seems to have gotten slower and "BVH" significantly faster, which I wouldn't have expected. Are you seeing similar results?


- node.tris = tris;
+ ctx.writeReorderedIndices( offset, count, indices );
+ node.offset = offset;
@gkjohnson (Owner):

This is probably minimal savings but is it necessary to store the offset and count fields for anything other than leaf nodes?

@mqp (Contributor, Author) commented Dec 19, 2018:

No, but this branch is making a leaf node, so it seems to me it should have them. (Although as I mentioned above, I'm not a fan of this branch.)

@gkjohnson (Owner):

Got it 👍

I see now that the branch below is the internal node case and does not have these fields added.

this.bounds = computeBounds( geo );

// a list of every available triangle index
this.tris = new Array( getTriangleCount( geo ) );
@gkjohnson (Owner):
The reason will be displayed to describe this comment to others. Learn more.

What's the value of the tris array? I don't see it being changed anywhere so it seems to always hold the current index for that position in the array.

@gkjohnson (Owner):
The reason will be displayed to describe this comment to others. Learn more.

Nevermind I'm seeing that partition() updates the tris array

@gkjohnson (Owner):
The reason will be displayed to describe this comment to others. Learn more.

Next related question: Why not just operate directly on the index array rather than creating a new tris array and then writing the new order to the index buffer?

@mqp (Contributor, Author):
The reason will be displayed to describe this comment to others. Learn more.

We need each invocation of createNode to not only refer to the correct vertices, but also the correct precomputed bounding boxes and precomputed centroids. tris is an index into all three of those things at once. If we didn't have tris, each swap operation would have to not only swap three elements of the index (one per vertex), but six elements in the list of triangle bounding boxes and three elements in the list of centroids. I didn't measure it, but I suspected this was likely to be slower and was sure to be uglier than keeping this extra level of indirection.
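In code, the per-swap difference looks roughly like this (a sketch, assuming the flat layouts: 3 index entries, 6 bounds floats, and 3 centroid floats per triangle):

function swap( arr, i, j ) {

	const tmp = arr[ i ];
	arr[ i ] = arr[ j ];
	arr[ j ] = tmp;

}

// With the tris indirection: one two-element swap.
function swapTris( tris, a, b ) {

	swap( tris, a, b );

}

// Without it: every partition swap has to touch three arrays.
function swapDirect( index, bounds, centroids, a, b ) {

	for ( let i = 0; i < 3; i ++ ) swap( index, 3 * a + i, 3 * b + i );
	for ( let i = 0; i < 6; i ++ ) swap( bounds, 6 * a + i, 6 * b + i );
	for ( let i = 0; i < 3; i ++ ) swap( centroids, 3 * a + i, 3 * b + i );

}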

@gkjohnson (Owner):
The reason will be displayed to describe this comment to others. Learn more.

Yeah I see, now. This is more of an index into the index array.

@mqp (Contributor, Author):
The reason will be displayed to describe this comment to others. Learn more.

Yes, it's tricky to think about.

lib/MeshBVH.js Outdated
- const n = createNode( origTris, ( new THREE.Box3() ).copy( geo.boundingBox ), this );
- while ( queue.length ) queue.pop()();
+ const n = createNode( boundsToArray( geo.boundingBox ), this, 0, ctx.tris.length );
+ n.index = new THREE.BufferAttribute( indices, 1 );
@gkjohnson (Owner):
The reason will be displayed to describe this comment to others. Learn more.

Why not just take over the existing index array or generate one if it doesn't exist and use that?

@mqp (Contributor, Author):
The reason will be displayed to describe this comment to others. Learn more.

Well, the way we construct the new index array is by incrementally filling it up from left to right as we bottom out at leaf nodes. If we modified the old array instead, we might be overwriting indices that we still need to read from for future tree construction (because the old array isn't in sorted-triangle-order -- see the above explanation of how we're using tris to keep track of the sort order instead of modifying the index directly.)

It's a good point that we wouldn't have to allocate a new index if we swapped the index data instead of using tris.

- this.tris = null;
- this.children = null;
- this.splitAxis = - 1;
+ // internal nodes have boundingData, children, and splitAxis
@gkjohnson (Owner):

I'm wondering if there's a performance trade-off here by not defining all the fields on every node. Here's a benchmark that looks into the performance implications of using differently shaped objects:
https://github.com/davidmarkclements/v8-perf#polymorphic-vs-monomorphic-code
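In miniature, the concern is something like this (illustrative only, not our actual node construction):

// Monomorphic: every node has the same field set, so the engine can
// use a single hidden class and keep property accesses fast.
function makeNodeMono() {

	return { boundingData: null, offset: - 1, count: - 1, children: null, splitAxis: - 1 };

}

// Polymorphic: leaf and internal nodes get different hidden classes,
// which can make call sites that handle both kinds slower.
function makeLeaf() {

	return { boundingData: null, offset: 0, count: 0 };

}

function makeInternal() {

	return { boundingData: null, children: null, splitAxis: - 1 };

}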

@mqp (Contributor, Author) commented Dec 19, 2018:

Yeah, having an extra type in the inline caches must cost something, but changing this is basically cutting the tree size in half, which is so much that I suspect it must be faster (i.e. due to more nodes fitting in a page.) I didn't carefully measure it, though, because even if it were a little slower I would still want to do it to cut the memory usage.

@gkjohnson (Owner):

I did a little bit of testing and didn't see any significant changes so this seems okay.

If you really wanted to save more memory you could store count and offset in a more optimally sized array (Uint16Array, Uint32Array, etc.) chosen based on the length of the index array.
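e.g. something like this (a sketch; nodeCount stands in for however many nodes the tree ends up with):

// Size the storage to the index length: 16-bit offsets suffice when
// the index is short enough, halving the memory for these fields.
const indexCount = geo.index.count;
const OffsetArray = indexCount > 65535 ? Uint32Array : Uint16Array;
const offsets = new OffsetArray( nodeCount );
const counts = new OffsetArray( nodeCount );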

@mqp (Contributor, Author):

True. But the long-term approach for making things more efficient from here is to serialize the whole output tree into one big typed array, so this intermediate step isn't super appealing to me.

lib/GeometryUtilities.js Outdated (thread resolved)
@gkjohnson (Owner)

I was digging a bit more into the performance discrepancy between raycastFirst and raycast and it seems that this line may be incorrect:

const leftToRight = ray.direction[ xyzFields[ this.splitAxis ] ] <= 0;

If the "left" side is assumed to be negative and the "right" positive then a positive value should be considered moving left the right. Changing the line to use >= makes raycastFirst faster.

It's a little concerning that the tests pass in either case here, but it makes sense: if the farther child is checked first, the closer bounds will always be checked next, and because that surface is closer, its intersection will be returned.
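For reference, the traversal this feeds into is roughly the following (a sketch, not the literal code; nearChild/farChild are illustrative names):

// With >=, a ray pointing in the +axis direction visits the child on
// the negative ("left") side first, i.e. the side it enters first.
const leftToRight = ray.direction[ xyzFields[ this.splitAxis ] ] >= 0;
const nearChild = leftToRight ? this.children[ 0 ] : this.children[ 1 ];
const farChild = leftToRight ? this.children[ 1 ] : this.children[ 0 ];

// raycastFirst can return as soon as the near child yields a hit; the
// far child only needs checking when the near one misses (the real
// code must also guard the case where the far child could still hold
// a closer hit, since sibling bounds can overlap).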

For whatever reason, it speeds up the in-browser example bounds tree creation a lot more than it speeds up the benchmark bounds tree creation

The knot geometry used in the benchmark is different, so it's possible that the structure of the geometry causes it to hit max depth or the max-triangles-per-leaf limit sooner, in a way that doesn't show the speed improvements as much. I wouldn't be against changing the geometry in the benchmark.

@mqp (Contributor, Author) commented Dec 19, 2018

Wow, thanks for finding that! Talk about defeating the whole purpose of raycastFirst.

@mqp (Contributor, Author) commented Dec 19, 2018

Now I get benchmark results that make more sense, with regard to the proportion between full raycast and first-hit-only:

Compute Bounds Tree      : 22.94 ms
Default Raycast          : 1.635422 ms
BVH Raycast              : 0.126518 ms
First Hit Raycast        : 0.063944 ms

Index Memory Usage       : 300.184 kb
BVH Memory Usage         : 758.626 kb

(Remember when this benchmark looked like this? 😮)

@mqp (Contributor, Author) commented Dec 19, 2018

So, now that this is approaching its final form, do you still like the distinction between MeshBVH.createNode and the separate BVHConstructionContext class?

In my judgment, I'm tempted to make BVHConstructionContext a value type that just hangs onto tris, centroids, bounds, and sahplanes (the big things that we will throw out after tree construction), move functions like partition, split, and shrinkBoundsTo back onto MeshBVH, and have them take the context as an argument. I like that structure because I care about clarifying the distinction between runtime-tree-data and construction-time-tree-data, but it seems weird to have createNode be in a different module than all the subroutines it relies on.

@mqp (Contributor, Author) commented Dec 20, 2018

OK, this branch seems like a Pareto improvement over the existing code now, so I'm happy to merge whenever you are satisfied with it. Once it's merged I might actually put this version in Hubs because I think it's probably fast and memory-efficient enough to be better than nothing for typical medium-sized models. (Of course, it will still be a very appealing improvement to web-workerize it, and to serialize the whole tree. I think the benchmark drastically underestimates the actual size of the tree in memory because it's not taking into account the kind of overhead that the JS interpreter actually has per object, like prototype chains and stuff -- for example, you can check out the Chrome memory profiler's opinion on the example page.)

@gkjohnson (Owner)

This looks great! The benchmarks have come a long way.

In my judgment, I'm tempted to make BVHConstructionContext a value type that just hangs onto tris

This sounds like a fine change to me. I think it might make things easier to follow, too. It's a little unclear atm when a function belongs on the context and when it belongs on the MeshBVH class. We can re-org in a separate effort, though.

I might actually put this version in Hubs because I think it's probably fast and memory-efficient enough to be better than nothing for typical medium-sized models

I'd be excited to see this. I'd like to get this up on NPM as well -- there are a few things that have to happen before it's ready for that, I think.

Of course, it will still be a very appealing improvement to web-workerize it, and to serialize the whole tree.

I was thinking about this a bit more and it occurred to me that there might be some complexity in sending the position and index buffers over to the worker. We'd have to either have both indices stored as SharedArrayBuffers or copy an ArrayBuffer to send it over, right? It's workable but the copy is a bit of an annoyance.
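Roughly the two options, with a hypothetical worker (a sketch):

// Option 1: copy, then transfer the copies. The main thread keeps its
// buffers; the worker takes ownership of the clones without a second
// structured-clone copy.
const posBuffer = geometry.attributes.position.array.buffer.slice( 0 );
const indexBuffer = geometry.index.array.buffer.slice( 0 );

worker.postMessage(
	{ position: posBuffer, index: indexBuffer },
	[ posBuffer, indexBuffer ] // transfer list: buffers are moved, not cloned
);

// Option 2: allocate the attribute storage on SharedArrayBuffers up
// front, so both threads can see the data with no copy at all.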

I think the benchmark drastically underestimates the actual size of the tree in memory because it's not taking into account the kind of overhead that the JS interpreter actually has per object...

I'm wondering if we can do anything to improve the estimate.

@mqp (Contributor, Author) commented Dec 20, 2018

I was thinking about this a bit more and it occurred to me that there might be some complexity in sending the position and index buffers over to the worker. We'd have to either have both indices stored as SharedArrayBuffers or copy an ArrayBuffer to send it over, right? It's workable but the copy is a bit of an annoyance.

That may be accurate. The dream would be to put the model loader also into a worker together with the BVH generation, in which case we can just transfer it all over when that worker is done. I see no reason why that shouldn't be possible. Right now, though, our model loading isn't done in a worker because the three.js mechanism to workerize that is super confusing to me.
