
exploit embarrassingly parallel options in c++ code #20

Closed · njtierney opened this issue Nov 30, 2016 · 12 comments

@njtierney (Owner)

The following functions contain embarrassingly parallel row-wise and column-wise operations:

  • distance_matrix_cpp
  • binary_matrix_cpp

Perhaps RcppParallel can do something here
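
To illustrate why these are embarrassingly parallel, here is a minimal pure-R sketch (toy Euclidean distances stand in for the real haversine code; mclapply forks on Unix, on Windows you would use parLapply instead):

library(parallel)

x <- cbind(runif(1000), runif(1000))  # users (lon, lat)
y <- cbind(runif(20), runif(20))      # sites (lon, lat)

# each row of the distance matrix depends only on x[i, ], so rows can
# be computed independently on separate cores with no coordination
row_dist <- function(i) sqrt(colSums((t(y) - x[i, ])^2))

dist_rows <- mclapply(seq_len(nrow(x)), row_dist, mc.cores = 2)
dist_mat  <- do.call(rbind, dist_rows)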

@njtierney (Owner, Author)

Check out loop unrolling, from this blog post: https://privefl.github.io/blog/Tip-Optimize-your-Rcpp-loops/
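
The gist of the unrolling tip, sketched here (not the post's exact code): keeping four independent accumulators lets the CPU pipeline additions instead of serializing on a single sum.

Rcpp::cppFunction('
double sum_unrolled(NumericVector x) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    R_xlen_t n = x.size(), i = 0;
    for (; i + 4 <= n; i += 4) {  // four elements per iteration
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i) s0 += x[i];  // remainder
    return s0 + s1 + s2 + s3;
}')

sum_unrolled(runif(1e6))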

@njtierney (Owner, Author)

Good guide here for parallel computing: http://gallery.rcpp.org/articles/parallel-distance-matrix/
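
Following that article's Worker pattern, a hedged sketch of what an RcppParallel version could look like here (plain Euclidean distance for brevity where distance_matrix_cpp uses haversine, and par_dist is a made-up name):

Rcpp::sourceCpp(code = '
// [[Rcpp::depends(RcppParallel)]]
#include <Rcpp.h>
#include <RcppParallel.h>
#include <cmath>
using namespace RcppParallel;

struct DistWorker : public Worker {
    const RMatrix<double> x;  // users (lon, lat)
    const RMatrix<double> y;  // sites (lon, lat)
    RMatrix<double> out;      // n_users x n_sites distances

    DistWorker(const Rcpp::NumericMatrix x, const Rcpp::NumericMatrix y,
               Rcpp::NumericMatrix out) : x(x), y(y), out(out) {}

    // each thread fills an independent block of rows -- no locking needed
    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; i++) {
            for (std::size_t j = 0; j < y.nrow(); j++) {
                double dx = x(i, 0) - y(j, 0);
                double dy = x(i, 1) - y(j, 1);
                out(i, j) = std::sqrt(dx * dx + dy * dy);
            }
        }
    }
};

// [[Rcpp::export]]
Rcpp::NumericMatrix par_dist(Rcpp::NumericMatrix x, Rcpp::NumericMatrix y) {
    Rcpp::NumericMatrix out(x.nrow(), y.nrow());
    DistWorker worker(x, y, out);
    parallelFor(0, x.nrow(), worker);
    return out;
}')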

@njtierney (Owner, Author)

Heya @mpadge, does geodist take advantage of parallelization?

@mpadge (Contributor)

mpadge commented Jul 5, 2018

Nup, coz it's all in C and dependency-free, but what you could do is pilfer code from dodgr, which has loadsa pure RcppParallel, especially here. The OneDist worker is a pretty simple example; the OneFlow worker is much more complicated (the threads all aggregate to the same object, so they write binary chunks to disk which are later reassembled).

@njtierney (Owner, Author)

njtierney commented Jul 5, 2018

Ah fair enough! Thanks for the links, I'll dig more into this when I get into the 0.4.0 release.

@mpadge (Contributor)

mpadge commented Jun 5, 2019

You could close this, no? Non-parallel geodist is plenty fast enough for any operations you might like; you wouldn't notice any real difference out to so-big-it-don't-fit-in-memory sizes.

@njtierney (Owner, Author)

Yeah, let's close this, but perhaps it would be useful to have a "wishlist" of things; speed could be one of them? Although to be honest, I'm pretty sure the biggest bottleneck is actually building the design matrix.

@mkuehn10

mkuehn10 commented Jul 20, 2019

I see you closed this issue, but I have written implementations very similar to your distance_matrix_cpp and nearest_facility_dist that use RcppParallel. My nearest-"facility" function only returns the identifier and not the distance, but I could most likely modify it to return both. I also designed it to take another parameter specifying a search radius, so it ignores any sites that are not within X miles. The parallel version is about 5.8 times faster on my machine, but I'm not well versed in high-performance computing, so who knows if it could be improved beyond that.

library(maxcovr)

set.seed(10)
n_users <- 5000
n_sites <- 25
x <- cbind (-10 + 20 * runif (n_users), -10 + 20 * runif (n_users))
y <- cbind (-10 + 20 * runif (2 * n_sites), -10 + 20 * runif (2 * n_sites))
colnames (x) <- colnames (y) <- c ("x", "y")

# rcpp_parallel_distm_C() and rcpp_parallel_distm_C_min() come from the
# proof-of-concept repo linked below; * 1609.34 converts miles to metres
microbenchmark::microbenchmark(
  maxcovr = maxcovr::distance_matrix_cpp(y, x),
  parallel = rcpp_parallel_distm_C(x, y) * 1609.34)

microbenchmark::microbenchmark(
  maxcovr = maxcovr::nearest_facility_dist(y, x),
  parallel = rcpp_parallel_distm_C_min(x, y, 5000)
)
Unit: milliseconds
     expr     min       lq      mean   median       uq     max neval cld
  maxcovr 49.4812 50.76415 52.076795 51.56360 52.69165 59.2069   100   b
 parallel  8.7636  8.81665  9.133918  8.90215  9.03285 14.4423   100  a 

Unit: milliseconds
     expr     min       lq      mean   median       uq     max neval cld
  maxcovr 50.8327 51.61255 53.182549 52.24655 54.16985 60.6313   100   b
 parallel  8.2759  8.32910  8.504298  8.42095  8.46105 11.6456   100  a 

set.seed(10)
n_users <- 5
n_sites <- 10
x <- cbind (-10 + 20 * runif (n_users), -10 + 20 * runif (n_users))
y <- cbind (-10 + 20 * runif (2 * n_sites), -10 + 20 * runif (2 * n_sites))
colnames (x) <- colnames (y) <- c ("x", "y")

maxcovr::nearest_facility_dist(y, x)
     [,1] [,2]      [,3]
[1,]    1   15 232454.32
[2,]    2   15 227046.58
[3,]    3   15  68395.58
[4,]    4    1 200541.27
[5,]    5   18 365391.14

rcpp_parallel_distm_C_min(x, y, 5000)
[1] 15 15 15  1 18
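
For clarity, here is the threshold behaviour described above in plain R (nearest_within is a hypothetical helper, not the actual rcpp_parallel_distm_C_min):

# given a precomputed user x site distance matrix, return the index of
# the nearest site per user, or NA when no site lies within the cutoff
nearest_within <- function(dist_mat, cutoff) {
    apply(dist_mat, 1, function(d) {
        d[d > cutoff] <- NA
        if (all(is.na(d))) NA_integer_ else which.min(d)
    })
}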

@mkuehn10

Here is a proof of concept with the C++ code included: https://github.com/mkuehn10/pargeodist

@njtierney (Owner, Author)

Wow @mkuehn10 - that looks amazing, thank you so much for writing and sharing that!

It would be great to include this in maxcovr, but in the future I'm looking to incorporate geodist to calculate distances, to make that part of the package easier to maintain, plus faster, since there are options such as cheap ruler, haversine, and so on.

That said, I feel like a parallel version of geodist would be a very valuable contribution, but I'm not sure if it would be its own package, or if it should be rolled into geodist - that would be up to @mpadge to decide.

But, looking at the benchmarks you have provided, there is roughly a 5-6x speedup using parallel, which seems pretty amazing.

Cross ref: hypertidy/geodist#16
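
For reference, geodist already exposes those measures through its measure argument (geodist() and its measure values are the real API; the coordinates here are made up):

library(geodist)

x <- cbind(lon = -10 + 20 * runif(5), lat = -10 + 20 * runif(5))
y <- cbind(lon = -10 + 20 * runif(10), lat = -10 + 20 * runif(10))

d_cheap <- geodist(x, y, measure = "cheap")      # Mapbox cheap ruler
d_hav   <- geodist(x, y, measure = "haversine")  # great-circle distance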

@mkuehn10

Thank you. I'm surprised I hadn't found these packages until yesterday; in retrospect, the whole reason I went down the parallel path is that I couldn't find anything fast enough for my use case. I have access to a fairly beefy computer (something like 72 cores and a ton of memory), so it can melt through some fairly large distance calculations. I like having options, and it wasn't really premature optimization in this case: I was waiting 20-30 minutes for some calculations to complete (using a solution implemented entirely in R), and after I implemented the parallel version it took something like 45 seconds.

I'm happy to (try to) make some pull requests, or to just inspire someone else to write better parallel code.

@mkuehn10

mkuehn10 commented Jul 23, 2019

Also, regarding the speedup: it should scale with whatever system you are using. I see about an 8-9x difference running it on a much beefier machine with much larger matrices.

This was with n = 15000:

Unit: seconds
     expr      min        lq      mean    median        uq       max neval
  geodist 46.40521 46.981085 46.940734 46.990139 47.143812 47.183421     5
 parallel  5.28246  5.381774  5.575951  5.390658  5.825183  5.999681     5
