Use structs/pairs instead of NumericVectors #23
Hadley, thanks for the tip! Any speed improvement is very much appreciated, and this was an easy one. I see about a 2.5X speed improvement with C++ data structures (Point, Box) in place of R data structures (NumericVector, NumericMatrix).
I'm using microbenchmark with this code:

```r
library(microbenchmark)
library(ggrepel)
microbenchmark({
  set.seed(42)
  d <- mtcars
  d$name <- rownames(mtcars)
  d <- as.data.frame(rbind(d, d))
  d$wt[33:64] <- d$wt[33:64] + 1
  d$mpg[33:64] <- d$mpg[33:64] + 5
  p <- ggplot(d, aes(wt, mpg, label = name)) +
    geom_point(color = 'red') +
    geom_text_repel() +
    theme_classic(base_size = 16)
  print(p)
}, times = 10L)
```

I would consider this to be a very large dataset for ggrepel.
Instead of using R data structures like `NumericVector` and `NumericMatrix`, use C++ structures like:

```cpp
typedef struct { double x, y; } Point;
typedef struct { double x1, y1, x2, y2; } Box;
```

The code runs about 2.5X faster on my laptop with this optimization. This commit addresses issue #23.
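To illustrate how coordinates might be moved into the structs above, here is a standalone sketch (not ggrepel's actual code; the `to_points` helper and the raw-pointer input layout are assumptions for the example):

```cpp
#include <cassert>
#include <vector>

// Stand-ins for the structs described in the commit message.
struct Point { double x, y; };

// Sketch: copy x/y coordinate arrays (the kind of contiguous columns an
// R NumericMatrix exposes to C++) into a vector of plain structs, so the
// inner loops touch cache-friendly C++ data instead of R objects.
std::vector<Point> to_points(const double* xs, const double* ys, int n) {
  std::vector<Point> pts;
  pts.reserve(n);
  for (int i = 0; i < n; ++i) pts.push_back(Point{xs[i], ys[i]});
  return pts;
}
```

After this one-time conversion, the repulsion iterations can run entirely on `std::vector<Point>`.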
I think the easiest way to avoid the pairwise comparisons would be to use a grid. Another thing you could do to make life a little easier would be to parameterise by iteration time (i.e. number of seconds) rather than number of iterations.
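A minimal sketch of the grid idea (hypothetical types and names, not ggrepel code; the cell-key scheme is a simplification that assumes moderate coordinate ranges): points are bucketed into square cells sized to the repulsion radius, so each point is compared only against points in its own cell and the eight adjacent ones.

```cpp
#include <cassert>
#include <cmath>
#include <unordered_map>
#include <vector>

struct Point { double x, y; };

struct Grid {
  double cell;  // cell side length, e.g. the repulsion radius
  std::unordered_map<long long, std::vector<int>> cells;

  long long key(double x, double y) const {
    long long cx = (long long)std::floor(x / cell);
    long long cy = (long long)std::floor(y / cell);
    return cx * 1000003LL + cy;  // simple combined key; fine for a sketch
  }

  void insert(int i, const Point& p) { cells[key(p.x, p.y)].push_back(i); }

  // Indices of candidate neighbours of p (including p itself): only the
  // 3x3 block of cells around p is examined, not every point.
  std::vector<int> candidates(const Point& p) const {
    std::vector<int> out;
    for (int dx = -1; dx <= 1; ++dx)
      for (int dy = -1; dy <= 1; ++dy) {
        auto it = cells.find(key(p.x + dx * cell, p.y + dy * cell));
        if (it != cells.end())
          out.insert(out.end(), it->second.begin(), it->second.end());
      }
    return out;
  }
};
```

With roughly uniform point density this turns the all-pairs comparison into near-linear work, at the cost of rebuilding the grid as labels move.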
Thanks for the tip about using a grid. I think that is indeed the simplest approach to achieve faster neighbour comparison. Since I expect a small number of labels, I'm skeptical that the performance gain will be worth the implementation complexity. In the future, this may become more important. I tried using squared distances; however, I'm not seeing any further improvement on top of the struct optimization when I avoid the square root. I'd be very happy to review pull requests that improve performance! If anyone reading this cares to give it a shot, that would be delightful.
That said, ordering by x might be just as simple, and would probably be faster. I'm likely to come back to this in a couple of months: I'm going to be working on gggeom, which will provide the data structures and efficient C++ code that will power ggvis.
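A sketch of what ordering by x could look like (the `count_overlaps` helper is hypothetical, not ggrepel code): sort the label boxes by their left edge, then for each box scan forward only while the next box can still overlap in x. Pairs whose x-extents are disjoint are never examined.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Box { double x1, y1, x2, y2; };

// Sweep over boxes sorted by left edge; the inner loop stops as soon as
// a candidate's left edge passes the current box's right edge.
int count_overlaps(std::vector<Box> boxes) {
  std::sort(boxes.begin(), boxes.end(),
            [](const Box& a, const Box& b) { return a.x1 < b.x1; });
  int n = 0;
  for (std::size_t i = 0; i < boxes.size(); ++i) {
    for (std::size_t j = i + 1;
         j < boxes.size() && boxes[j].x1 <= boxes[i].x2; ++j) {
      // x-extents overlap by construction; check the y-extents too.
      if (boxes[i].y1 <= boxes[j].y2 && boxes[j].y1 <= boxes[i].y2) ++n;
    }
  }
  return n;
}
```

Unlike a grid, this needs no tuning parameter and only an O(n log n) sort, which fits the "just as simple" point above.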
I suspect you could get a considerable performance improvement by switching from (e.g.) `NumericVector` to a plain C++ struct or `std::pair`.
Note that you might also be able to drop the square root by instead squaring the distance that you're comparing with. That would definitely be more efficient than re-squaring later on, as in `repel_force()`. Generally you'll get better performance with C++ data structures than with R ones.
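A small standalone sketch of the squared-distance trick (illustrative only, not ggrepel's `repel_force()`): since both sides of the comparison are non-negative, comparing squared values gives the same result as comparing the values themselves, with no `sqrt()` call in the inner loop.

```cpp
#include <cassert>

struct Point { double x, y; };

// True if a and b are closer than radius, compared in squared terms so
// the inner loop never calls sqrt().
bool within(const Point& a, const Point& b, double radius) {
  double dx = a.x - b.x, dy = a.y - b.y;
  return dx * dx + dy * dy < radius * radius;
}
```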