Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparison with fcpw #26

Closed
Janos95 opened this issue Jan 4, 2021 · 4 comments
Closed

Comparison with fcpw #26

Janos95 opened this issue Jan 4, 2021 · 4 comments

Comments

@Janos95
Copy link

Janos95 commented Jan 4, 2021

Hey,

first, great work with this lib, really like using it!

I think it would be great to have a benchmark comparing with fcpw,
which claims to be a very fast BVH implementation. It claims to achieve very fast query times,
by (presumably) vectorizing the slab test with enoki. My feeling is that the tree quality is much more
important, since queries should be latency bound.

@cignox1
Copy link

cignox1 commented Jan 4, 2021

Hello, great library!
I just got back to my old raytracing project (abandoned 10 years ago) and decided that it would have been nice to work on the performances, horrible at best. I've started replacing my quick BIH implementation with a BVH but decided to switch to an external lib and found this one. I could make it work, but performances were exactly the same as with my BIH:

bvh::LocallyOrderedClusteringBuilder<bvh::Bvh, int> builder(bvh);
builder.build(bvh::BoundingBox(MasterBVHPrimitive::toVector3(worldbb.min), MasterBVHPrimitive::toVector3(worldbb.max)), bboxes, centers, shapes.size());
bvh::LeafCollapser<bvh::Bvh> collapser(bvh);
collapser.collapse();

I've started profiling the execution and found that 35% of the time was being spent inside std::fill, and I could track it back to
struct Vector {
Scalar values[N];

Vector() = default;
bvh__always_inline__ Vector(Scalar s) { std::fill(values, values + N, s); }

As I'm working with 3-dimensional vectors, I then replaced the call to fill with:
values[0] = values[1] = values[2] = s;

My test scene rendering time went from 8 seconds to 3!

Am I using the lib the wrong way? Is there an option that would disable/improve that initialization? Would my change break something somewhere?

Thank you!

@madmann91
Copy link
Owner

Hello! @Janos95 I'll look into this. The library you link seems to be focused on closest point search, so chances are it is going to be slower than this library for ray-tracing. The numbers they give on their page at least seems to confirm that. @cignox1 Please create a separate issue for this, so as not to pollute this one. I suspect that you forgot to compile with optimizations on, or that your compiler is too old and effectively terrible at optimizing very simple code (see this for an example of how a decent compiler optimizes std::fill). Consider switching to gcc (Mingw64 if you're on Windows) or clang. If you use CMake, set CMAKE_BUILD_TYPE to Release.

@madmann91
Copy link
Owner

madmann91 commented Jan 4, 2021

So, having a quick look at fcpw, plugging it into my path tracer, I get the following trace:

Scene loaded in 1380 ms (651259 vertices, 781184 triangles).
BVH constructed in 623 ms (0 nodes).
Average frame time: 85 ms.
Average frame time: 86 ms.
Average frame time: 87 ms.
Average frame time: 88 ms.
Average frame time: 88 ms.

For reference, the combination bvh::LocallyOrderedClustering + bvh::LeafCollapser of this library gives:

Scene loaded in 1355 ms (651259 vertices, 781184 triangles).
BVH constructed in 174 ms (889959 nodes).
Average frame time: 59 ms.
Average frame time: 58 ms.
Average frame time: 59 ms.
Average frame time: 59 ms.
Average frame time: 59 ms.

Embree (avx2) gives:

Scene loaded in 1405 ms (651259 vertices, 781184 triangles).
BVH constructed in 68 ms (0 nodes).
Average frame time: 42 ms.
Average frame time: 41 ms.
Average frame time: 41 ms.
Average frame time: 42 ms.
Average frame time: 42 ms.

This is of course with optimizations on and machine-specific flags (-O3 -march=native with gcc). As you can see, this library is faster. The claim that fcpw is "only" 0.8x slower than Embree seems to also be incorrect, in this scenario at least. Maybe they only benchmarked coherent rays, or maybe something was wrong in their configuration. In a realistic usage scenario like running a path tracer to generate the picture of the front page, fcpw is in fact twice as slow as Embree for ray intersections, and 10 times as slow for data structure construction.

Because this library can also generate higher-quality BVHs at a slight cost in build times, here's the result of running bvh::ParallelReinsertionOptimization in between bvh::LocallyOrderedClustering and bvh::LeafCollapser:

Scene loaded in 1354 ms (651259 vertices, 781184 triangles).
BVH constructed in 523 ms (739621 nodes).
Average frame time: 54 ms.
Average frame time: 54 ms.
Average frame time: 54 ms.
Average frame time: 54 ms.
Average frame time: 54 ms.

So as a conclusion, this library is faster than fcpw for ray intersection queries and build times. For single-ray traversal, vectorization does not give you so much (it's essentially the difference between this library and Embree: around 30%). Better BVH layouts and careful traversal loop design are more important. I'm not planning to add this library to the chart. In general, and as a rule for future benchmark submissions, I'll only consider benchmarking a library if:

  • I have the time (which is getting more and more difficult),
  • The library to compare with is within 30% of Embree,
  • The library to compare with generates correct results.

Edit: Please ignore the 0 nodes for Embree and fcpw. I just changed that number to 0 for those libraries since they do not expose this information.

@Janos95
Copy link
Author

Janos95 commented Jan 4, 2021

Thanks for the super fast response time and detailed benchmark + analysis.
I totally agree that allocating your free time to this project is totally up to you.
Personally, I think you should factor in the popularity of a library for making
the decision to add a benchmark to the plot or not (which was probably the reason
why you have Fast-BVH in there and a good argument against fcpw, at least for now ;)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants