improve parallel scalability #36

mmp opened this Issue Oct 4, 2015 · 2 comments



mmp commented Oct 4, 2015

I've done some benchmarks of parallel scalability on a 16 core machine, using 1, 2, 4, 8, and 16 threads (for all of the integrators besides Whitted and direct lighting). This spreadsheet summarizes the results (for the moderately complex "breakfast" scene, to come in the pbrt-v3 scenes distribution). For the benchmarks, I modified Film::WriteImage() to return immediately, so that the time measured in the Render() methods didn't include that time.

Interestingly enough, scalability for everything but SPPM is very nearly the same: a not impressive 1.9x with two cores, up to ~12.7x with 16 cores. SPPM is broken into the camera pass, photon pass, and statistics update pass; there, scalability is slightly worse. (Though for SPPM there is some construction and updating of shared data structures, so it's reasonable that it's a bit worse...)
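For context, one way to sanity-check numbers like these is Amdahl's law: with a serial (non-scaling) fraction s, the best speedup on n cores is 1 / (s + (1 - s) / n). A minimal sketch (my own illustration, not code from pbrt or the benchmark):

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: with serial fraction s, the ideal speedup on n cores is
// 1 / (s + (1 - s) / n).
double AmdahlSpeedup(double s, int n) {
    return 1.0 / (s + (1.0 - s) / n);
}

// Inverted: the serial fraction implied by an observed speedup on n cores.
// Derivation: 1/speedup = s + (1-s)/n  =>  n/speedup = s(n-1) + 1.
double ImpliedSerialFraction(double speedup, int n) {
    return (n / speedup - 1.0) / (n - 1.0);
}
```

By this model, the observed ~12.7x on 16 cores corresponds to roughly a 1.7% serial (or otherwise non-scaling) fraction — small in absolute terms, but enough to cost several cores' worth of throughput.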

wjakob commented Oct 5, 2015

Hi Matt,

Why did you write "not impressive" for path tracing etc.? I think 1.9 is pretty decent given other factors like memory traffic, job distribution, etc.

The issue with SPPM resembles my own scalability benchmarks with Mitsuba. A better way to parallelize things would be to use the Knaus & Zwicker-style iterations, where each photon map pass uses a globally uniform photon radius. This removes some interdependencies so that each thread can do its own photon map pass. Maybe something for v4? (The way things are going now, parallelism should be an even bigger deal a few years down the road.)
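A minimal sketch of the global radius schedule this refers to. One common formulation of the Knaus & Zwicker approach shrinks a single shared radius between passes via r_{i+1}^2 = r_i^2 * (i + alpha) / (i + 1), with alpha in (0, 1); the function names here are illustrative, not pbrt code:

```cpp
#include <cassert>
#include <cmath>

// One pass's radius update in the style of Knaus & Zwicker (2011):
// every pass i uses one globally uniform radius r_i, and between passes
// the squared radius shrinks by the factor (i + alpha) / (i + 1).
double NextRadiusSquared(double r2, int i, double alpha) {
    return r2 * (i + alpha) / (i + 1);
}

// Apply the schedule for a number of passes, starting from radius r0.
double RadiusAfterPasses(double r0, int passes, double alpha) {
    double r2 = r0 * r0;
    for (int i = 1; i <= passes; ++i)
        r2 = NextRadiusSquared(r2, i, alpha);
    return std::sqrt(r2);
}
```

Because the radius for a pass is fixed before the pass begins, threads doing that pass don't need to coordinate on per-pixel radius state, which is what removes the interdependency mentioned above.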


mmp commented Dec 8, 2015

Commit 74cab9e, which I'd hoped would help a bit with this, barely moved the needle. SPPM grid construction is now 1.98x faster with 16 cores than 1 core, which is an improvement from the 1.40x before, but there is still a ways to go!

@wjakob I dunno, maybe this is reasonable, but I feel like it could be better. The data is almost entirely read-only, there's a lot of compute and a lot of independent jobs, etc. (Or, coming at it from a different angle, no one has yet run it carefully through a profiler to look for unexpected false sharing or other things that can meaningfully degrade scalability, so it's likely there are other unknown issues that could be tuned up...)
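One concrete thing such a profiler pass might turn up is false sharing on per-thread counters: if each thread's counter sits on the same cache line, every increment invalidates the line for the other threads. A small illustrative pattern (not pbrt's actual code) that avoids this by padding each counter out to its own cache line:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Pad each thread's counter to a full (assumed 64-byte) cache line so
// threads incrementing their own slot don't invalidate each other's lines.
struct alignas(64) PaddedCounter {
    std::atomic<int64_t> value{0};
};

// Each of nThreads threads increments only its own padded counter;
// the totals are summed once after all threads join.
int64_t CountInParallel(int nThreads, int64_t perThread) {
    std::vector<PaddedCounter> counters(nThreads);
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t)
        workers.emplace_back([&, t] {
            for (int64_t i = 0; i < perThread; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto &w : workers) w.join();
    int64_t total = 0;
    for (auto &c : counters) total += c.value.load();
    return total;
}
```

The result is identical with or without the alignas(64); only the scaling behavior under contention differs, which is exactly why this kind of issue tends to hide until someone profiles for it.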
