Fast-BVH Updates #1
Comments
Will have a look, thank you for the PR. |
I cannot get Fast-BVH to render the correct picture. See attached renderings. The correct one is from this library, the other one is from Fast-BVH. This might be due to backface culling on your end, but I'm not really sure. It could also come from an incorrect BVH or an incorrect traversal routine. Before we go any further, please make sure to get the same picture with the same scene (I suggest https://github.com/jimmiebergmann/Sponza --- this is not the version I used in the pictures below or in the table on the front page but I cannot find the sources of that anymore, as I'm not on my work machine ATM, and that will last until May in the best case scenario). The camera settings are exactly the same as in the modified benchmark in the |
Thanks for the response! I'll fix this in a short bit |
Fixed! It was a problem with the intersection code. I have a benchmark program at https://github.com/tay10r/Fast-BVH-benchmark. The intersection code in the original repository needs a PR to work, so you can use my benchmark program or my fork of the project until it gets merged to upstream. Here's a picture of the result. If this ends up being a mistake on my part and your code still performs faster, I may end up just porting your code to Fast-BVH. Since PR 15, Fast-BVH supports multiple build algorithms. I was already planning on adding at least one other build algorithm, so yours could be added as well if it works better in some scenarios. |
So, here are the numbers for 8000x8000, updated for the scene I mentioned in the post above, same POV:
I suspect that you were running the benchmark tool without the If you plan on working on Fast-BVH to improve its performance, allow me to give you a bit of advice:
Regarding the traversal implementation, you can also get more performance by dropping |
I did use the Here's a screenshot, showing the compiler flags as well. It could be that I only have a 4-core CPU, and that the benchmark on the test is 8-core. Regarding your notes about Fast-BVH: currently, work is being done to implement LBVH as well as LDC BVHs, as described here, both with and without Morton coding (I suspect that without Morton codes the BVH quality might be higher, though the build time would be much slower).
Fast-BVH is a library meant for more than one application. I believe having multiple algorithms to choose from would be best. Full sweep SAH may be too expensive for real-time rendering. I plan on adding the algorithm that best suits real-time rendering for my own needs, and anyone requiring higher-quality BVHs for static scenes can implement their own. |
Well, this must then mean that your construction algorithm does not scale well with higher core counts. A simple LBVH should give you the same performance as splitting in the middle, with faster build times when properly implemented, and more importantly, much better scaling due to its bottom-up nature. As an alternative, there are also algorithms that are adaptive, i.e., that can generate BVHs of varying quality given a control parameter. This allows you to keep using the same construction algorithm for real-time vs offline rendering. For instance, there's Parallel BVH Construction using Progressive Hierarchical Refinement. I think PLOC has a control parameter as well, but increasing it was not guaranteed to increase performance reliably across the tested scenes in the paper, if I recall correctly. |
Also note that your current construction algorithm only considers the largest axis. This is not the case in this library, because it has a significant impact on traversal performance. If you disable that, you should observe similar times, even on your machine, because binning construction is essentially similar to a top-down middle-split strategy: algorithm-wise, it's just a linear partitioning step at every level of the recursion. |
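To make the axis-selection point concrete, here is a minimal sketch comparing "pick the largest axis" against "try all three axes and keep the cheapest split". It is not code from either library: the `Box` type, `largest_axis`, and `best_axis_by_sah` are hypothetical names, and a single middle split per axis stands in for a real binning pass to keep the example short.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical axis-aligned box; not the type used by either library.
struct Box {
    std::array<float, 3> lo, hi;

    static Box empty() {
        constexpr float inf = std::numeric_limits<float>::infinity();
        return Box{{inf, inf, inf}, {-inf, -inf, -inf}};
    }
    void extend(const Box& b) {
        for (int i = 0; i < 3; ++i) {
            lo[i] = std::min(lo[i], b.lo[i]);
            hi[i] = std::max(hi[i], b.hi[i]);
        }
    }
    float extent(int axis) const { return hi[axis] - lo[axis]; }
    float center(int axis) const { return 0.5f * (lo[axis] + hi[axis]); }
    float half_area() const {
        const float dx = extent(0), dy = extent(1), dz = extent(2);
        return dx * dy + dy * dz + dz * dx;
    }
};

// Strategy 1: only look at the axis with the largest extent.
int largest_axis(const Box& b) {
    int axis = 0;
    for (int i = 1; i < 3; ++i)
        if (b.extent(i) > b.extent(axis)) axis = i;
    return axis;
}

// Strategy 2: split at the centroid midpoint on every axis and keep the axis
// with the lowest SAH-style cost (half area times primitive count per side).
// Returns -1 if no axis separates the primitives.
int best_axis_by_sah(const std::vector<Box>& prims) {
    assert(!prims.empty());
    Box centers = Box::empty();
    for (const Box& p : prims) {
        for (int i = 0; i < 3; ++i) {
            centers.lo[i] = std::min(centers.lo[i], p.center(i));
            centers.hi[i] = std::max(centers.hi[i], p.center(i));
        }
    }
    int best_axis = -1;
    float best_cost = std::numeric_limits<float>::infinity();
    for (int axis = 0; axis < 3; ++axis) {
        const float mid = 0.5f * (centers.lo[axis] + centers.hi[axis]);
        Box left = Box::empty(), right = Box::empty();
        std::size_t nl = 0, nr = 0;
        for (const Box& p : prims) {
            if (p.center(axis) < mid) { left.extend(p); ++nl; }
            else                      { right.extend(p); ++nr; }
        }
        if (nl == 0 || nr == 0) continue;  // this axis does not split anything
        const float cost = left.half_area() * static_cast<float>(nl)
                         + right.half_area() * static_cast<float>(nr);
        if (cost < best_cost) { best_cost = cost; best_axis = axis; }
    }
    return best_axis;
}
```

For long, thin primitives stacked along a short axis, the two strategies can disagree: the overall bounds are widest along x, but only a split along y actually separates the primitives.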
Not my construction algorithm, but you're right it only uses one core.
Interesting, I haven't seen that before. I don't plan on doing much more work on Fast-BVH once I have a BVH algorithm that suits my needs, so I may not get around to implementing a BVH like the one you're mentioning. I'll keep it in mind if I ever end up needing it though. At this point, I think the "issue" has been resolved. The discrepancy seems to be due to a difference in threading. |
Alright. Closing the issue then. |
Also, thank you for the advice. I'll have a look at your lecture and try to keep your points in mind going forward. |
You're welcome. Let me know if you want to compare the two again, by re-opening the issue for instance. I'll be interested to have a look at what you come up with! |
I'm still making comparisons between an LBVH build algorithm I made and your library.
The BVH build time of the LBVH is at 25 ms, but that's without any post build optimizations. For some reason, the traversal time is around four seconds for the same resolution. I don't think it's the LBVH build quality, but I'm still debugging it. |
Please let me know when it's ready. Also note that I have changed the camera code a little to have a horizontal FOV, set the proper screen limits |
I can't seem to fix the issue. In a branch of Fast-BVH, which never got merged, the LBVH algorithm had a traversal performance of about 1.1 seconds for the Sponza scene. Now it's at about 3.5 seconds and the build time is around 35 ms. Feel free to look at it and critique. It doesn't really have comparable performance to this library unfortunately. I took the advice you gave on ray octant ordering and removing triangle indirection. I couldn't seem to find the paper that talks about it (Garanzha and Loop 2010?) so I just referenced your implementation. |
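For readers unfamiliar with the ray octant idea mentioned above, the following is a rough sketch of one common use of it, not the code of either library or of the paper: the sign of each direction component is computed once per ray, so the slab test knows in advance which bound is the near one on each axis and needs no per-axis swap. The `Ray`, `Node`, and `PreparedRay` layouts and the `safe_inverse` helper are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

struct Ray {
    float org[3];
    float dir[3];
    float tmin, tmax;
};

// Hypothetical node layout: {min_x, max_x, min_y, max_y, min_z, max_z}.
struct Node {
    float bounds[6];
};

// Per-ray data computed once, before traversal starts.
struct PreparedRay {
    float inv_dir[3];
    int near_idx[3];  // index into Node::bounds of the bound hit first
    int far_idx[3];   // index of the bound hit last
    int octant;       // bit i is set when the direction is negative on axis i
};

// Avoids dividing by zero for axis-parallel rays (an assumption of this sketch).
inline float safe_inverse(float d) {
    const float eps = std::numeric_limits<float>::epsilon();
    return 1.0f / (std::fabs(d) < eps ? std::copysign(eps, d) : d);
}

PreparedRay prepare(const Ray& ray) {
    assert(ray.tmin <= ray.tmax);
    PreparedRay p{};
    for (int axis = 0; axis < 3; ++axis) {
        p.inv_dir[axis] = safe_inverse(ray.dir[axis]);
        // Use the sign of the inverse so that -0.0f is handled consistently.
        const int neg = p.inv_dir[axis] < 0.0f ? 1 : 0;
        p.near_idx[axis] = 2 * axis + neg;
        p.far_idx[axis]  = 2 * axis + 1 - neg;
        p.octant |= neg << axis;
    }
    return p;
}

// Slab test with the near/far bounds pre-selected by the octant.
bool intersect_node(const Node& node, const Ray& ray, const PreparedRay& p,
                    float& t_entry) {
    float t_near = ray.tmin, t_far = ray.tmax;
    for (int axis = 0; axis < 3; ++axis) {
        const float t0 = (node.bounds[p.near_idx[axis]] - ray.org[axis]) * p.inv_dir[axis];
        const float t1 = (node.bounds[p.far_idx[axis]]  - ray.org[axis]) * p.inv_dir[axis];
        t_near = std::max(t_near, t0);
        t_far  = std::min(t_far, t1);
    }
    t_entry = t_near;
    return t_near <= t_far;
}
```

During traversal, the entry distances returned for the two children can then be compared to decide which child to visit first.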
So, I just had a quick look with
As you can see, the code itself is pretty lean. Very few spills, and a lot of computation (a rule of thumb is that good code on x86/x86_64 should have more add/sub/mul/... than movs, in general). What I got out of all this is that the culprit is your BVH itself. A very easy way to determine that a BVH is awful (and I mean awful as a result of a bug, not just bad or of poor quality) is to check that the maximum number of primitives per leaf is below a small constant (e.g. 16). If you have a leaf with 100 primitives, no matter how good your traversal algorithm is, it will perform poorly. One possible explanation for this is if your Morton codes are only 32-bit and there is not enough precision (an LBVH with this bit width is the equivalent of building a 1024x1024x1024 grid). In this case, just switch to 64-bit Morton codes. Edit: Another cool trick for BVH construction debugging (also holds for other data structures), is to display the number of traversal steps + intersections per ray instead of coloring by the normal. That will help you spot the problem visually. |
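The grid-resolution remark can be illustrated with a small sketch. A 32-bit code leaves 10 bits per axis (a 1024x1024x1024 grid), while a 64-bit code leaves 21 bits per axis. The code below is a deliberately simple, loop-based encoder written for this example (the names `spread_bits`, `quantize`, `morton32`, and `morton64` are hypothetical, and coordinates are assumed to be already normalized to [0, 1]); production code would normally use a faster bit-spreading routine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Spreads the low `bits` bits of v so that bit i lands at position 3 * i.
template <typename Code>
Code spread_bits(Code v, int bits) {
    Code out = 0;
    for (int i = 0; i < bits; ++i)
        out |= ((v >> i) & Code(1)) << (3 * i);
    return out;
}

// Maps a coordinate in [0, 1] onto a grid with 2^bits cells per axis.
template <typename Code>
Code quantize(double v, int bits) {
    assert(bits > 0);
    const Code max_cell = static_cast<Code>((Code(1) << bits) - 1);
    const double cells = static_cast<double>(Code(1) << bits);
    const double clamped = std::min(std::max(v, 0.0), 1.0);
    return std::min(static_cast<Code>(clamped * cells), max_cell);
}

// Interleaves the quantized x, y, z cells: x in bits 0, 3, 6, ..., y in
// bits 1, 4, 7, ..., z in bits 2, 5, 8, ...
template <typename Code, int Bits>
Code morton(double x, double y, double z) {
    static_assert(3 * Bits <= static_cast<int>(sizeof(Code) * 8),
                  "code type too narrow for the requested bits per axis");
    return static_cast<Code>(
        spread_bits(quantize<Code>(x, Bits), Bits)
        | (spread_bits(quantize<Code>(y, Bits), Bits) << 1)
        | (spread_bits(quantize<Code>(z, Bits), Bits) << 2));
}

// 10 bits per axis: equivalent to a 1024^3 grid.
inline std::uint32_t morton32(double x, double y, double z) {
    return morton<std::uint32_t, 10>(x, y, z);
}

// 21 bits per axis: equivalent to a 2097152^3 grid.
inline std::uint64_t morton64(double x, double y, double z) {
    return morton<std::uint64_t, 21>(x, y, z);
}
```

Two primitives whose centers are 1/4096 of the scene extent apart can fall into the same cell with `morton32` and therefore receive the same code, whereas `morton64` still separates them. Many identical codes in a row are one way an LBVH can end up with oversized leaves.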
@madmann91 That is extremely helpful. I was just about to start profiling the code but I've never been able to get an instruction breakdown as you did. I have it set up so that 64-bit Morton codes can be used with 64-bit floating point types, but I haven't checked to see if this pairing is useful to the compiler. Maybe I'll end up switching to 64-bit codes for all types. I'll give the trick you mentioned a shot. That sounds useful! It does seem like two primitives per leaf is a bit low (no pun intended.) I'm going to see if there's a way to change this in the Thanks for your feedback! Extremely helpful |
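The leaf-size sanity check suggested earlier in the thread could look roughly like the sketch below. The flat `Node` layout is an assumption made for the example and matches neither library; only the idea (scan all leaves, report the maximum and average primitive count, flag anything above a small constant such as 16) comes from the discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat node layout: prim_count == 0 marks an inner node.
struct Node {
    std::size_t first_child_or_prim;
    std::size_t prim_count;
    bool is_leaf() const { return prim_count > 0; }
};

struct LeafStats {
    std::size_t leaf_count = 0;
    std::size_t max_prims = 0;
    double avg_prims = 0.0;
};

// Scans every node once and gathers per-leaf primitive counts.
LeafStats leaf_stats(const std::vector<Node>& nodes) {
    LeafStats stats;
    std::size_t total = 0;
    for (const Node& node : nodes) {
        if (!node.is_leaf()) continue;
        ++stats.leaf_count;
        total += node.prim_count;
        stats.max_prims = std::max(stats.max_prims, node.prim_count);
    }
    if (stats.leaf_count > 0)
        stats.avg_prims = static_cast<double>(total) / static_cast<double>(stats.leaf_count);
    return stats;
}

// A leaf far above the threshold usually points at a construction bug
// rather than at a merely low-quality tree.
bool leaves_look_sane(const LeafStats& stats, std::size_t max_allowed = 16) {
    assert(max_allowed > 0);
    return stats.max_prims <= max_allowed;
}
```

Running this right after construction and printing `max_prims` is usually enough to tell a broken build from a slow traversal.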
I have implemented a simple mechanism to collect traversal statistics in the benchmarking tool. Use |
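As an illustration of the "color by traversal cost" debugging trick discussed in this thread, here is a minimal sketch of per-ray counters and a blue-to-red heat mapping. This is not the mechanism implemented in the benchmarking tool; `RayStats` and `heat_color` are hypothetical names, and the traversal loop that would increment the counters is left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counters a traversal routine would increment for each ray.
struct RayStats {
    std::size_t node_steps = 0;  // nodes visited
    std::size_t prim_tests = 0;  // ray-primitive intersection tests
    std::size_t total() const { return node_steps + prim_tests; }
};

// Maps a cost to an RGB color: blue for cheap rays, red for expensive ones.
// Costs above max_cost are clamped to pure red.
std::array<std::uint8_t, 3> heat_color(std::size_t cost, std::size_t max_cost) {
    assert(max_cost > 0);
    const float t = std::min(1.0f, static_cast<float>(cost) / static_cast<float>(max_cost));
    const auto r = static_cast<std::uint8_t>(t * 255.0f + 0.5f);
    const auto b = static_cast<std::uint8_t>((1.0f - t) * 255.0f + 0.5f);
    return std::array<std::uint8_t, 3>{r, 0, b};
}
```

Writing `heat_color(stats.total(), max_cost)` into the image instead of the shaded normal makes regions with oversized leaves or heavily overlapping nodes stand out as red patches.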
That's awesome! I can't wait to give this a shot |
I ran your benchmark using a Sponza model and compared it to the latest Fast-BVH after a PR I made there. I think the results are a bit different at this point. Fast-BVH is now a template library and built the Sponza model I had in 70 ms. In your library, it built the same model in 133 ms. Maybe you could verify the results. In my patch to Fast-BVH, I also included an .obj file renderer.