Working version using swaps #44
Conversation
This pull request introduces 2 alerts when merging b47fcfd into 96a159b - view on LGTM.com new alerts:
Comment posted by LGTM.com
This pull request introduces 2 alerts when merging e8a7622 into 96a159b - view on LGTM.com new alerts:
Comment posted by LGTM.com
Hey @mquander! This is great. I know it's still a work in progress but I'll go through the code in a bit and add a few comments. To answer a few of your questions:
I think this is okay for now -- enabling the re-ordering of the geometry indices directly is a good plan in the long run.
Separating the build logic out into a separate file is a good change, in my opinion. The tree constructor was getting pretty unwieldy. I do find keeping the construction operations as functional as possible a bit more traceable, especially if there's no internal caching going on.
I'm comfortable removing support for regular THREE Geometry. There's been talk of deprecating it, anyway.
I would think generating a triangle index if one doesn't exist, rather than re-ordering the attributes, is an okay plan for now. That sounds like a lot of complexity that would be great to avoid.
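For reference, generating a trivial index for a non-indexed geometry is cheap. This is a minimal sketch (the helper name and shape are my own, not code from this PR), assuming the geometry stores one vertex triple per triangle:

```javascript
// Sketch: build a trivial triangle index for a non-indexed geometry.
// `generateTrivialIndex` is a hypothetical helper, not part of the PR.
function generateTrivialIndex( vertexCount ) {

	// 16-bit indices can only address vertices 0..65535
	const IndexArray = vertexCount > 65535 ? Uint32Array : Uint16Array;
	const index = new IndexArray( vertexCount );
	for ( let i = 0; i < vertexCount; i ++ ) index[ i ] = i;
	return index;

}
```

The result could then be wrapped in a `THREE.BufferAttribute` and assigned via `geometry.setIndex`.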
lib/MeshBVH.js (Outdated)

```diff
  // create the two new child nodes
- if ( ! left.length || ! right.length ) {
+ if ( index === offset || index === offset + count ) {
```
This is basically the case where all of the triangles wind up on one side of the partition, right? Shouldn't it be possible to know this ahead of time and avoid partitioning? Granted I'm not sure if there's a lot to gain from checking that.
I agree; it feels like there is some duplication of concerns between this and the splitting code (which can also choose to return no-split-required). Perhaps I will try to clean it up.
On reflection, I think this is actually not just awkward, but wrong. Consider the AVERAGE and CENTER strategies. If all of the triangle centroids happen to wind up on one side of the average of the longest edge -- which seems strange but possible, given that the longest-edge computation uses the full bounding box while the partitioning uses the centroids -- this code will terminate tree construction, even though further splits might still be useful. So something should be different here.
Would this be a problem in the AVERAGE split? In that case the average of all centroids is taken, which would have to mean that at least one centroid lands on either side of the split. In the case of CENTER it would seem that this is just an artifact of choosing a less smart split strategy.
Ultimately I think this is okay.
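To make the degenerate case concrete, here is roughly how a swap-based partition produces it. This is an illustrative sketch (names and data layout assumed), not the PR's actual `partition()`; a return value equal to `offset` or `offset + count` means every centroid fell on one side of the split position:

```javascript
// Illustrative sketch (assumed layout): partition tris[offset..offset+count)
// in place so triangles whose centroid is below `pos` on `axis` come first.
// `centroids` holds 3 floats (x, y, z) per triangle id.
function partition( tris, centroids, offset, count, axis, pos ) {

	let left = offset;
	let right = offset + count - 1;
	while ( true ) {

		while ( left <= right && centroids[ tris[ left ] * 3 + axis ] < pos ) left ++;
		while ( left <= right && centroids[ tris[ right ] * 3 + axis ] >= pos ) right --;
		if ( left < right ) {

			// swap the two triangle ids so both halves stay contiguous
			const tmp = tris[ left ];
			tris[ left ] = tris[ right ];
			tris[ right ] = tmp;

		} else {

			return left; // equals offset or offset + count in the degenerate case

		}

	}

}
```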
This is getting closer to something I endorse. I made some other small fixes that were easy and hopefully uncontroversial. The failing test looks like it's because our test data has an "ambiguous" result where there are two equally close triangles that intersect the ray? I will look at it further. This version now soundly cleans up on all the benchmarks while also using less memory for the tree structure (especially if the geometry was already indexed.) For whatever reason, it speeds up the in-browser example bounds tree creation a lot more than it speeds up the benchmark bounds tree creation -- in the example in Firefox it seems to go from 110 ms to 60 ms, whereas on the benchmark it goes from about 30 ms to 25 ms. |
This pull request introduces 3 alerts when merging ff1abdf into 96a159b - view on LGTM.com new alerts:
Comment posted by LGTM.com
This pull request introduces 1 alert when merging 305013c into 96a159b - view on LGTM.com new alerts:
Comment posted by LGTM.com
Force-pushed from 008d3de to 1d2e3dd
I'm gonna go through and see how everything looks, but here are the benchmarks I ran for master against this branch so far:

[Benchmark tables: Before / After]

The memory improvements look great, but I'm a little confused by the discrepancy between …
```diff
- node.tris = tris;
+ ctx.writeReorderedIndices( offset, count, indices );
+ node.offset = offset;
```
This is probably minimal savings but is it necessary to store the offset and count fields for anything other than leaf nodes?
No, but this branch is making a leaf node, so it seems to me it should have them. (Although as I mentioned above, I'm not a fan of this branch.)
Got it 👍
I see now that the branch below is the internal node case and does not have these fields added.
lib/BVHConstructionContext.js (Outdated)

```js
this.bounds = computeBounds( geo );

// a list of every available triangle index
this.tris = new Array( getTriangleCount( geo ) );
```
What's the value of the `tris` array? I don't see it being changed anywhere, so it seems to always hold the current index for that position in the array.
Nevermind, I'm seeing that `partition()` updates the `tris` array.
Next related question: Why not just operate directly on the `index` array rather than creating a new `tris` array and then writing the new order to the index buffer?
We need each invocation of `createNode` to not only refer to the correct vertices, but also the correct precomputed bounding boxes and precomputed centroids. `tris` is an index into all three of those things at once. If we didn't have `tris`, each swap operation would have to swap not only three elements of the index (one per vertex), but also six elements in the list of triangle bounding boxes and three elements in the list of centroids. I didn't measure it, but I suspected this was likely to be slower and was sure to be uglier than keeping this extra level of indirection.
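To make the indirection concrete, here is a small sketch (data layout assumed, not the PR's code) of how one `tris` entry addresses all three per-triangle datasets at once, so a single swap in `tris` logically reorders vertex indices, bounds, and centroid together:

```javascript
// Assumed layout: `index` has 3 vertex indices per triangle, `triBounds`
// has 6 floats (min/max per axis) per triangle, `centroids` has 3 floats.
function getTriangle( i, tris, index, triBounds, centroids ) {

	const t = tris[ i ]; // logical position i -> original triangle id t
	return {
		vertices: [ index[ t * 3 ], index[ t * 3 + 1 ], index[ t * 3 + 2 ] ],
		bounds: triBounds.slice( t * 6, t * 6 + 6 ),
		centroid: centroids.slice( t * 3, t * 3 + 3 ),
	};

}
```

Swapping `tris[ a ]` and `tris[ b ]` reorders what `getTriangle` returns without touching `index`, `triBounds`, or `centroids`.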
Yeah I see, now. This is more of an index into the index array.
Yes, it's tricky to think about.
lib/MeshBVH.js (Outdated)

```diff
- const n = createNode( origTris, ( new THREE.Box3() ).copy( geo.boundingBox ), this );
- while ( queue.length ) queue.pop()();
+ const n = createNode( boundsToArray( geo.boundingBox ), this, 0, ctx.tris.length );
+ n.index = new THREE.BufferAttribute( indices, 1 );
```
Why not just take over the existing index array or generate one if it doesn't exist and use that?
Well, the way we construct the new index array is by incrementally filling it up from left to right as we bottom out at leaf nodes. If we modified the old array instead, we might be overwriting indices that we still need to read for future tree construction (because the old array isn't in sorted-triangle order -- see the above explanation of how we're using `tris` to keep track of the sort order instead of modifying the index directly).

It's a good point that we wouldn't have to allocate a new index if we swapped the index data instead of using `tris`.
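A sketch of what that left-to-right leaf write could look like (the signature is assumed; the PR's `ctx.writeReorderedIndices` may differ): each leaf copies its triangles' vertex indices from the untouched source index into a fresh destination buffer in `tris` order.

```javascript
// Assumed sketch: copy the vertex indices for tris[offset..offset+count)
// from srcIndex into dstIndex at the same triangle positions, so leaves
// fill the new index buffer left to right as the build bottoms out.
function writeReorderedIndices( tris, srcIndex, dstIndex, offset, count ) {

	for ( let i = 0; i < count; i ++ ) {

		const t = tris[ offset + i ];
		dstIndex[ ( offset + i ) * 3 + 0 ] = srcIndex[ t * 3 + 0 ];
		dstIndex[ ( offset + i ) * 3 + 1 ] = srcIndex[ t * 3 + 1 ];
		dstIndex[ ( offset + i ) * 3 + 2 ] = srcIndex[ t * 3 + 2 ];

	}

}
```

Because `srcIndex` is never written, no leaf can clobber indices a later subtree still needs to read.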
```diff
- this.tris = null;
- this.children = null;
- this.splitAxis = - 1;
+ // internal nodes have boundingData, children, and splitAxis
```
I'm wondering if there's a performance trade-off here by not defining all the fields on every node. Here's a benchmark that looks into the performance implications of using differently shaped objects:
https://github.com/davidmarkclements/v8-perf#polymorphic-vs-monomorphic-code
Yeah, having an extra type in the inline caches must cost something, but changing this is basically cutting the tree size in half, which is so much that I suspect it must be faster (i.e. due to more nodes fitting in a page.) I didn't carefully measure it, though, because even if it were a little slower I would still want to do it to cut the memory usage.
I did a little bit of testing and didn't see any significant changes, so this seems okay.

If you really wanted to save more memory you could store `count` and `offset` in a more optimally sized array (`Uint16Array`, `Uint32Array`, etc.) based on the length of the index array.
True. But the long-term approach for making things more efficient from here is to serialize the whole output tree into one big typed array, so this intermediate step isn't super appealing to me.
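For illustration, here is one hedged sketch of that serialize-everything idea. The 8-slot node layout, the internal-node marker, and the depth-first packing order are all invented for this example, not a design from this thread:

```javascript
// Toy layout, 8 float slots per node: 6 bounds floats, then two uint32
// fields aliased over the same buffer. Internal node: [ rightChildIndex,
// 0xffffffff marker ]; leaf: [ offset, count ]. The left child is always
// the next node in the buffer, so it needs no pointer.
const FLOATS_PER_NODE = 8;

function countNodes( node ) {

	return node.children
		? 1 + countNodes( node.children[ 0 ] ) + countNodes( node.children[ 1 ] )
		: 1;

}

function flattenTree( root ) {

	const float32 = new Float32Array( countNodes( root ) * FLOATS_PER_NODE );
	const uint32 = new Uint32Array( float32.buffer );
	let nextFree = 0;

	( function write( node ) {

		const o = nextFree ++ * FLOATS_PER_NODE;
		float32.set( node.boundingData, o ); // 6 bounds floats
		if ( node.children ) {

			write( node.children[ 0 ] );
			uint32[ o + 6 ] = nextFree; // right child starts here
			uint32[ o + 7 ] = 0xffffffff; // internal-node marker
			write( node.children[ 1 ] );

		} else {

			uint32[ o + 6 ] = node.offset;
			uint32[ o + 7 ] = node.count;

		}

	} )( root );

	return float32;

}
```

A single `Float32Array` like this would also be trivially transferable to a worker, which ties into the workerization idea below.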
I was digging in a bit more into the performance discrepancy and came across this line:

```js
const leftToRight = ray.direction[ xyzFields[ this.splitAxis ] ] <= 0;
```

If the "left" side is assumed to be negative and the "right" positive, then a positive direction value should be considered moving left to right, so this comparison is backwards. Flipping it gives the intended near-to-far traversal order.

It's a little concerning that the tests pass in either case here, but it makes sense: if the farther child is checked first, the closer bounds will always be checked next, because the surface is closer and that intersection will be returned.
The knot geometry used in the benchmark is different, so it's possible that the structure of the geometry causes it to reach max depth or max triangles per leaf faster, in a way that doesn't show the speed improvements as much. I wouldn't be against changing the geometry in the benchmark.
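For context, the traversal-order choice being discussed looks roughly like this. This is a sketch with assumed node and ray shapes (not the library's exact code), using the corrected comparison so the nearer child is visited first:

```javascript
const xyzFields = [ 'x', 'y', 'z' ];

// Sketch (assumed shapes): choose which child to raycast first so the
// nearer one is visited before the farther one.
function childOrder( node, rayDirection ) {

	// A non-negative direction component along the split axis means the
	// ray moves from the "left" (negative) child toward the "right" one,
	// so the left child is nearer and should be checked first.
	const leftToRight = rayDirection[ xyzFields[ node.splitAxis ] ] >= 0;
	return leftToRight
		? [ node.children[ 0 ], node.children[ 1 ] ]
		: [ node.children[ 1 ], node.children[ 0 ] ];

}
```

With the original `<= 0` comparison the order is inverted, which still finds the correct hit but throws away the early-out benefit of near-to-far traversal.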
Wow, thanks for finding that! Talk about defeating the whole purpose!
Now I get benchmark results that make more sense, with regard to the proportion between full raycast and first-hit-only:
(Remember when this benchmark looked like this? 😮)
So, now that this is approaching its final form, do you still like the distinction between …? In my judgment, I'm tempted to make …
OK, this branch seems like a Pareto improvement over the existing code now, so I'm happy to merge whenever you are satisfied with it. Once it's merged I might actually put this version in Hubs, because I think it's probably fast and memory-efficient enough to be better than nothing for typical medium-sized models.

(Of course, it will still be a very appealing improvement to web-workerize it, and to serialize the whole tree. I think the benchmark drastically underestimates the actual size of the tree in memory because it's not taking into account the kind of overhead that the JS interpreter actually has per object, like prototype chains and such -- for example, you can check out the Chrome memory profiler's opinion on the example page.)
This looks great! The benchmarks have come a long way.
This sounds like a fine change to me. I think it might make things easier to follow, too. It's a little unclear atm when a function belongs on the context and when it belongs on the MeshBVH class. We can re-org in a separate effort, though.
I'd be excited to see this. I'd like to get this up on NPM as well -- there are a few things that have to happen before it's ready for that, though, I think.
I was thinking about this a bit more and it occurred to me that there might be some complexity in sending the position and index buffers over to the worker. We'd have to either have both indices stored as …
I'm wondering if we can do anything to improve the estimate.
That may be accurate. The dream would be to put the model loader into a worker as well, together with the BVH generation, in which case we can just transfer it all over when that worker is done. I see no reason why that shouldn't be possible. Right now, though, our model loading isn't done in a worker, because the three.js mechanism for workerizing it is super confusing to me.
I worked toward making an implementation of reordering the list of tris, as suggested in #42 and as we discussed earlier.

The code as written does this by leaving the original geometry object alone, and reordering and keeping a reference to the `origTris` (now `BVHConstructionContext.tris`) array. This seems slightly faster during construction, but actually results in using more memory in the resulting tree in this implementation -- it's referring to this one big array in each node, plus an offset and count, as opposed to the old one, in which each node referred to a small slice of the big array. So really this is just a start. The obvious next step is to reorder the actual geometry object per the reordered `tris` array, so that we don't have to keep this "array of pointers" representation around next to the final tree. That will really be a nice improvement over the old version.

Other changes in this implementation:

- I went back to recursing directly in `createNode` instead of using a queue of nodes. I like doing this because it means we're doing the tree construction left-to-right, depth-first, making it more comprehensible to e.g. reorder geometry incrementally when we have an edge node to construct, because we'll never have to look at the triangles in that edge node again later in tree construction. I'm not concerned about a stack overflow because we control the maximum tree depth anyway. It's also nice not to construct garbage closures in each `createNode` invocation.
- I split some `MeshBVH` construction functionality into a `BVHConstructionContext` object, and moved some of the `GeometryUtilities` stuff which required precomputed pieces of construction data (e.g. centroids, bounds) into class methods on that. I'm not sure whether I can really say this is better; it just made it easier for me to keep everything organized when I was making this change. I'm open to changing it back or further reorganizing.

Open question: How serious are we about supporting non-`BufferGeometry` geometries? It will make this reordering idea rather more complicated if we have to care about multiple geometry representations. I personally would not bother supporting them, at least as an output. (In the case of outputting an indexed `BufferGeometry`, we have it especially easy, because we can just change the index and leave all the attributes alone; otherwise, we need to reorder all the attribute data as well.)
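The depth-first construction described above might be skeletonized like this. Everything here (the `ctx` interface, field names, the degenerate-split fallback) is an assumed sketch for illustration, not the branch's actual code:

```javascript
// Assumed sketch of recursive, depth-first node construction. `ctx` is a
// hypothetical construction context exposing partitioning helpers.
function createNode( ctx, boundingData, offset, count, depth ) {

	const node = { boundingData };
	if ( count <= ctx.maxLeafTris || depth >= ctx.maxDepth ) {

		// leaf: finalize this index range before moving right
		node.offset = offset;
		node.count = count;
		ctx.writeReorderedIndices( offset, count );
		return node;

	}

	const split = ctx.getOptimalSplit( offset, count );
	const mid = ctx.partition( offset, count, split );
	if ( mid === offset || mid === offset + count ) {

		// degenerate partition: fall back to a leaf
		node.offset = offset;
		node.count = count;
		ctx.writeReorderedIndices( offset, count );
		return node;

	}

	node.splitAxis = split.axis;
	node.children = [
		createNode( ctx, ctx.computeBounds( offset, mid - offset ), offset, mid - offset, depth + 1 ),
		createNode( ctx, ctx.computeBounds( mid, offset + count - mid ), mid, offset + count - mid, depth + 1 ),
	];
	return node;

}
```

Because recursion goes left subtree first, leaves finalize their index ranges strictly left to right, which is what makes the incremental reorder safe.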