
exploit embarrassingly parallel options in c++ code #20

Closed · njtierney opened this issue Nov 30, 2016 · 12 comments

@njtierney (Owner)

The following functions contain embarrassingly parallel row-wise and column-wise operations:

  • distance_matrix_cpp
  • binary_matrix_cpp

Perhaps RcppParallel can do something here
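
To illustrate why these are embarrassingly parallel, here is a minimal pure-R sketch (toy Euclidean distances stand in for the real haversine code; mclapply forks on Unix, on Windows you would use parLapply instead):

library(parallel)

x <- cbind(runif(1000), runif(1000))  # users (lon, lat)
y <- cbind(runif(20), runif(20))      # sites (lon, lat)

# each row of the distance matrix depends only on x[i, ], so rows can
# be computed independently on separate cores with no coordination
row_dist <- function(i) sqrt(colSums((t(y) - x[i, ])^2))

dist_rows <- mclapply(seq_len(nrow(x)), row_dist, mc.cores = 2)
dist_mat  <- do.call(rbind, dist_rows)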

@njtierney (Owner, Author)

Check out loop unrolling, from this blog post: https://privefl.github.io/blog/Tip-Optimize-your-Rcpp-loops/
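
The gist of the unrolling tip, sketched here (not the post's exact code): keeping four independent accumulators lets the CPU pipeline additions instead of serializing on a single sum.

Rcpp::cppFunction('
double sum_unrolled(NumericVector x) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    R_xlen_t n = x.size(), i = 0;
    for (; i + 4 <= n; i += 4) {  // four elements per iteration
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i) s0 += x[i];  // remainder
    return s0 + s1 + s2 + s3;
}')

sum_unrolled(runif(1e6))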

@njtierney (Owner, Author)

Good guide here for parallel computing: http://gallery.rcpp.org/articles/parallel-distance-matrix/
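
Following that article's Worker pattern, a hedged sketch of what an RcppParallel version could look like here (plain Euclidean distance for brevity where distance_matrix_cpp uses haversine, and par_dist is a made-up name):

Rcpp::sourceCpp(code = '
// [[Rcpp::depends(RcppParallel)]]
#include <Rcpp.h>
#include <RcppParallel.h>
#include <cmath>
using namespace RcppParallel;

struct DistWorker : public Worker {
    const RMatrix<double> x;  // users (lon, lat)
    const RMatrix<double> y;  // sites (lon, lat)
    RMatrix<double> out;      // n_users x n_sites distances

    DistWorker(const Rcpp::NumericMatrix x, const Rcpp::NumericMatrix y,
               Rcpp::NumericMatrix out) : x(x), y(y), out(out) {}

    // each thread fills an independent block of rows -- no locking needed
    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; i++) {
            for (std::size_t j = 0; j < y.nrow(); j++) {
                double dx = x(i, 0) - y(j, 0);
                double dy = x(i, 1) - y(j, 1);
                out(i, j) = std::sqrt(dx * dx + dy * dy);
            }
        }
    }
};

// [[Rcpp::export]]
Rcpp::NumericMatrix par_dist(Rcpp::NumericMatrix x, Rcpp::NumericMatrix y) {
    Rcpp::NumericMatrix out(x.nrow(), y.nrow());
    DistWorker worker(x, y, out);
    parallelFor(0, x.nrow(), worker);
    return out;
}')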

@njtierney (Owner, Author)

Heya @mpadge, does geodist take advantage of parallelization?

@mpadge (Contributor)

mpadge commented Jul 5, 2018

Nup, coz it's all in C and dependency-free, but what you could do is pilfer code from dodgr, which has loadsa pure RcppParallel, especially here. The OneDist worker is a pretty simple example; the OneFlow worker is much more complicated (the threads all aggregate to the same object, so they write binary chunks to disk which are later reassembled).

@njtierney (Owner, Author)

njtierney commented Jul 5, 2018

Ah fair enough! Thanks for the links, I'll dig more into this when I get into the 0.4.0 release.

@mpadge (Contributor)

mpadge commented Jun 5, 2019

You could close this, no? Non-parallel geodist is plenty fast enough for any operations you might like; you wouldn't notice any real difference out to so-big-it-don't-fit-in-memory sizes.

@njtierney (Owner, Author)

Yeah, let's close this, but perhaps it would be useful to have a "wishlist" of things; speed could be one of them? Although to be honest, I'm pretty sure the biggest bottleneck is actually building the design matrix.

@mkuehn10

mkuehn10 commented Jul 20, 2019

I see you closed this issue, but I have written implementations very similar to your distance_matrix_cpp and nearest_facility_dist that use RcppParallel. My nearest-"facility" function only returns the identifier and not the distance, but I could most likely modify it to return both. I also designed it to take another parameter specifying a search radius, so it ignores any sites that are not within X miles. The parallel version is about 5.8 times faster on my machine, but I'm not well versed in high-performance computing, so who knows if it could be improved beyond that.

library(maxcovr)

set.seed(10)
n_users <- 5000
n_sites <- 25
x <- cbind (-10 + 20 * runif (n_users), -10 + 20 * runif (n_users))
y <- cbind (-10 + 20 * runif (2 * n_sites), -10 + 20 * runif (2 * n_sites))
colnames (x) <- colnames (y) <- c ("x", "y")

# rcpp_parallel_distm_C() and rcpp_parallel_distm_C_min() come from the
# proof-of-concept repo linked below; * 1609.34 converts miles to metres
microbenchmark::microbenchmark(
  maxcovr = maxcovr::distance_matrix_cpp(y, x),
  parallel = rcpp_parallel_distm_C(x, y) * 1609.34)

microbenchmark::microbenchmark(
  maxcovr = maxcovr::nearest_facility_dist(y, x),
  parallel = rcpp_parallel_distm_C_min(x, y, 5000)
)
Unit: milliseconds
     expr     min       lq      mean   median       uq     max neval cld
  maxcovr 49.4812 50.76415 52.076795 51.56360 52.69165 59.2069   100   b
 parallel  8.7636  8.81665  9.133918  8.90215  9.03285 14.4423   100  a 

Unit: milliseconds
     expr     min       lq      mean   median       uq     max neval cld
  maxcovr 50.8327 51.61255 53.182549 52.24655 54.16985 60.6313   100   b
 parallel  8.2759  8.32910  8.504298  8.42095  8.46105 11.6456   100  a 

set.seed(10)
n_users <- 5
n_sites <- 10
x <- cbind (-10 + 20 * runif (n_users), -10 + 20 * runif (n_users))
y <- cbind (-10 + 20 * runif (2 * n_sites), -10 + 20 * runif (2 * n_sites))
colnames (x) <- colnames (y) <- c ("x", "y")

maxcovr::nearest_facility_dist(y, x)
     [,1] [,2]      [,3]
[1,]    1   15 232454.32
[2,]    2   15 227046.58
[3,]    3   15  68395.58
[4,]    4    1 200541.27
[5,]    5   18 365391.14

rcpp_parallel_distm_C_min(x, y, 5000)
[1] 15 15 15  1 18
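
For clarity, here is the threshold behaviour described above in plain R (nearest_within is a hypothetical helper, not the actual rcpp_parallel_distm_C_min):

# given a precomputed user x site distance matrix, return the index of
# the nearest site per user, or NA when no site lies within the cutoff
nearest_within <- function(dist_mat, cutoff) {
    apply(dist_mat, 1, function(d) {
        d[d > cutoff] <- NA
        if (all(is.na(d))) NA_integer_ else which.min(d)
    })
}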

@mkuehn10

Here is a proof of concept with the C++ code included: https://github.com/mkuehn10/pargeodist

@njtierney (Owner, Author)

Wow @mkuehn10 - that looks amazing, thank you so much for writing and sharing that!

It would be great to include this in maxcovr, but in the future I'm looking to incorporate geodist to calculate distances, to make that part of the package easier to maintain, plus faster, since there are options such as cheap ruler, haversine, and so on.

That said, I feel like a parallel version of geodist would be a very valuable contribution, but I'm not sure if it would be its own package, or if it should be rolled into geodist - that would be up to @mpadge to decide.

But, looking at the benchmarks you have provided, there is roughly a 5-6x speedup using parallel, which seems pretty amazing.

Cross ref: hypertidy/geodist#16
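
For reference, geodist already exposes those measures through its measure argument (geodist() and its measure values are the real API; the coordinates here are made up):

library(geodist)

x <- cbind(lon = -10 + 20 * runif(5), lat = -10 + 20 * runif(5))
y <- cbind(lon = -10 + 20 * runif(10), lat = -10 + 20 * runif(10))

d_cheap <- geodist(x, y, measure = "cheap")      # Mapbox cheap ruler
d_hav   <- geodist(x, y, measure = "haversine")  # great-circle distance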

@mkuehn10

Thank you. I'm surprised I hadn't found these packages until yesterday; in retrospect, the whole reason I went down the parallel path is that I couldn't find anything fast enough for my use case. I have access to a fairly beefy computer (something like 72 cores and a ton of memory), so it can melt through some fairly large distance calculations. I like having options, and it wasn't really premature optimization in this case: I was waiting 20-30 minutes for some calculations to complete (using a solution implemented entirely in R), and after I implemented the parallel version it took something like 45 seconds.

I'm happy to (try to) make some pull requests, or to just inspire someone else to write better parallel code.

@mkuehn10

mkuehn10 commented Jul 23, 2019

Also, regarding the speedup: it should scale with whatever system you are using. I see about an 8-9x difference running it on a much beefier machine with much larger matrices.

This was with n = 15000:

Unit: seconds
     expr      min        lq      mean    median        uq       max neval
  geodist 46.40521 46.981085 46.940734 46.990139 47.143812 47.183421     5
 parallel  5.28246  5.381774  5.575951  5.390658  5.825183  5.999681     5
