Benchmark Against Rcpp #11
I'm not sure that's very useful. The results will depend hugely on what exactly you do. In particular, are you calling back from the compiled language to R or not? In any case, extendr is following the design principle of safety and convenience over performance, so in many applications Rcpp code may be faster, in particular if you're going back-and-forth frequently between R and Rust. On the other hand, for numerically intensive code that does many operations in compiled code, the results are comparable. (In my example below, Rust is slightly faster, but it's basically a wash.)
Created on 2021-01-06 by the reprex package (v0.3.0) |
However, when calling the function many times but with a small computational workload, Rust is almost twice as fast as Rcpp. I hadn't quite expected this. @andy-thomason Looks like starting a new thread for each Rust function is not a big deal.
|
This is good news. It all depends on the target architecture and how vectorisable the loop is. Godbolt gives:
Which is not vectorised, but the condition has at least been folded into a conditional move. Most of the problem is the running product which, by IEEE rules, prevents re-ordering of the multiplications. Certainly, LLVM's code generation is streets ahead of GCC's. I was planning to add fast summation that allows loops to vectorise to RobjItertools. |
This is a slightly better version of the loop:
It allows the loop to unroll as the trip count is more predictable. |
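As a side note on the re-ordering point above, here is a minimal, extendr-free Rust sketch (not from the thread) of why a single floating-point accumulator blocks vectorisation: each add depends on the previous one, and IEEE semantics forbid the compiler from re-associating. Splitting the reduction into independent lanes breaks the dependency chain and gives the optimiser room to unroll and use SIMD.

```rust
// Single accumulator: a strict serial dependency chain.
fn sum_serial(xs: &[f64]) -> f64 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

// Four independent accumulators: the adds in one iteration do not
// depend on each other, so the loop can unroll and vectorise.
fn sum_lanes(xs: &[f64]) -> f64 {
    let mut acc = [0.0f64; 4];
    for chunk in xs.chunks(4) {
        for (i, &x) in chunk.iter().enumerate() {
            acc[i] += x;
        }
    }
    acc.iter().sum()
}

fn main() {
    let xs: Vec<f64> = (0..1024).map(|i| i as f64).collect();
    // For integer-valued data both orders are exact, so they agree.
    assert_eq!(sum_serial(&xs), sum_lanes(&xs));
    println!("{}", sum_serial(&xs));
}
```

In general the two functions can differ in the last bits for arbitrary floats, which is exactly why the compiler refuses to make this transformation on its own.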
I had tried the for loop but on my machine it was slower than the C++ code. :-) |
It's been a long while since this post was updated (sorry to necropost), but I'd like to share some findings on the overhead of using Rcpp vs. Rust for returning a basic zero-filled vector and matrix from both... This gist of an R script compiles the Rust and Rcpp versions as similarly as I could make them and iterates N from 10 to 640 for length on the vectors, and each dim 10x10 to 640x640 per matrix. For example, the simple vector...

```cpp
IntegerVector rcpp_zeros_intvec(int n) {
  IntegerVector my_vec(n);
  return my_vec;
}
```

```rust
fn rust_zeros_intvec(n: i32) -> Robj {
    let my_vec = vec!(0; n as usize);
    r!(my_vec)
}
```

It shows how there is a significant negative impact from the extendr handling...
but it doesn't take long for the impact to be very noticeable.
Whether that's from copies or from handling the owners, I don't have the knowledge to say, so the folks on Discord recommended posting it here.
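To get a rough feel for what an extra copy costs in isolation, here is a small extendr-free Rust sketch (my addition, not from the thread) that times building a vector and then making one additional full copy of it, which is roughly what a wrapper layer would add if it copies into R-managed memory:

```rust
use std::time::Instant;

fn main() {
    let n = 1_000_000usize;

    // Build the vector once.
    let t0 = Instant::now();
    let v = vec![0i32; n];
    let build = t0.elapsed();

    // One extra full copy, similar to what a conversion layer might add.
    let t1 = Instant::now();
    let w = v.clone();
    let copy = t1.elapsed();

    println!("build: {:?}, extra copy: {:?}", build, copy);
    assert_eq!(w.len(), n);
}
```

For a zero-filled vector the copy can cost as much as the original allocation, so an avoidable copy in the wrapper is a plausible source of a near-2x slowdown at these sizes.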
|
Which version of extendr is this using? The one currently released to crates.io is completely out of date. Unfortunately the next one is not quite ready yet for release. In any case, it would be useful to try again with the latest version and using the new wrappers that have been written. In particular in the vector example, I'm quite certain that the rust code makes an additional copy that the C++ code doesn't make, and I think this copy can be avoided with the development version of extendr. Matrix code probably needs more work, but it's low priority until the basic features are settled. |
The latest via GitHub. I can see where an additional copy might need to be made considering the 'safety' aspect of ownership, but would one expect a copy to incur that much expense? Granted, Rcpp is much more mature and has had many iterations of performance improvements over time. I don't have the Rust experience to grok what I read without a glossary yet, so all I can offer is the results of the exceedingly simple test. |
Ok. In this case, for the numeric vector, you should try something like the following:

```rust
fn rust_zeros_intvec(n: i32) -> Doubles {
    let my_vec: Doubles = (0..n).map(|_| (0.).into()).collect();
    my_vec
}
```

I haven't tested this, so there may be some problem with this code. Example from here: https://github.com/extendr/extendr/blob/1264ad5ae3c491247878470aceed584ba9a87844/extendr-api/src/wrapper/doubles.rs#L40 |
Sorry, that's making a vector of doubles, not integers. But there's comparable code for integers also. See e.g. here: |
Ok, so this is a typical case. Compare:

```r
cpp11::cpp_function(
  "writable::doubles make_cpp(int n) {
     writable::doubles x(n);
     for(int i = 0; i < n; i++)
       x[i] = i;
     return x;
   }"
)

rextendr::rust_function(
  "fn make_rust(n : i32) -> Doubles {
     (0..n).map(|x| (x as f64).into()).collect::<Doubles>()
   }",
  extendr_deps = list(
    `extendr-api` = list(git = "https://github.com/extendr/extendr")
  ),
  profile = "release"
)
```
#> i build directory: 'C:/Users/[redacted]/AppData/Local/Temp/RtmpUH51JB/file4278337c5e69'
#> v Writing 'C:/Users/[redacted]/AppData/Local/Temp/RtmpUH51JB/file4278337c5e69/target/extendr_wrappers.R'.
```r
c(10, 100, 1000, 10000, 100000) |>
  purrr::map_dfr(
    \(n) bench::mark(make_rust(n), make_cpp(n)) |>
      dplyr::mutate(N = n)
  ) |>
  dplyr::select(expression, N, median, `itr/sec`, dplyr::everything())
#> # A tibble: 10 x 7
#>    expression        N   median `itr/sec`      min mem_alloc `gc/sec`
#>    <bch:expr>    <dbl> <bch:tm>     <dbl> <bch:tm> <bch:byt>    <dbl>
#>  1 make_rust(n)     10    700ns  1248003.    500ns        0B      0
#>  2 make_cpp(n)      10    500ns  1527708.    400ns        0B    153.
#>  3 make_rust(n)    100    900ns   973179.    700ns      848B      0
#>  4 make_cpp(n)     100    700ns  1122990.    500ns      848B      0
#>  5 make_rust(n)   1000    1.8us   409342.    1.5us    7.86KB     40.9
#>  6 make_cpp(n)    1000      1us   853239.    900ns    7.86KB    171.
#>  7 make_rust(n)  10000   17.9us    49989.   10.9us   78.17KB     75.1
#>  8 make_cpp(n)   10000    5.6us   158774.      5us   78.17KB    254.
#>  9 make_rust(n) 100000  112.8us     7039.  106.4us   781.3KB    116.
#> 10 make_cpp(n)  100000   52.6us    16082.   48.1us   781.3KB    263.
```

Created on 2021-12-21 by the reprex package (v2.0.1) |
I added that:

```r
rextendr::rust_source(
  profile = "release",
  extendr_deps = list(
    `extendr-api` = list(git = "https://github.com/extendr/extendr")
  ),
```
|
@Thell, I responded to my (now hidden) message where I ran the comparison. Speaking of your gist, here are things that may affect the results.
It is possible that we may never reach raw C++ performance, simply because we have additional overhead from type safety and idiomatic Rust. |
Also, I think you are missing this function https://gist.github.com/Thell/861d464c7c85ffd4c7285a894b7715f4#file-rcpp_vs_rust_zeros-r-L53-L57. |
Yes, it initializes the buffer in the R memory pool. Earlier I mentioned Rcpp's having gone through performance improvements over time, and this is one of the things that looks to have improved, because I just ran a comparison initializing and passing back a std::vector and there was no extra overhead... there used to be...
I will look into that and add those to the gist. Regarding the missing function: I was hoping to find an equivalent to the Rcpp vector with dim attributes set, and that missing function was a false start whose only purpose is to remind me I still need to find an equivalent extendr function. |
Ok, I see. Still, very useful results -- we clearly need to improve matrix performance, and likely before the next release. |
So... here's what I think the C++ and Rust versions of the dimmed vector -> matrix are (even though I'm not sure about returning the Result<> like that):

```cpp
IntegerVector rcpp_0s_intmat_dimmedvec(int n) {
  IntegerVector my_vec(n * n);
  my_vec.attr("dim") = Dimension(n, n);
  return my_vec;
}
```

```rust
fn rust_0s_intmat_dimmedvec(n: i32) -> Result<Robj> {
    let my_vec = vec!(0; n as usize * n as usize);
    let my_mat = r!(my_vec).set_attrib(dim_symbol(), [n as usize, n as usize]);
    my_mat
}
```

Given the vector results from the previous benchmarks, I would anticipate that since we are only setting a dim attribute, this Rust function should perform competitively with the C++ function.
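For readers less familiar with R internals, the "dimmed vector" trick works because R represents a matrix as nothing more than a flat, column-major vector with a `dim` attribute. The same idea in plain, extendr-free Rust (a sketch of mine, not code from the thread) looks like this:

```rust
// A matrix as a flat buffer plus dimensions, column-major like R.
struct IntMatrix {
    data: Vec<i32>,
    nrow: usize,
    ncol: usize,
}

impl IntMatrix {
    fn zeros(nrow: usize, ncol: usize) -> Self {
        IntMatrix { data: vec![0; nrow * ncol], nrow, ncol }
    }

    // Column-major indexing, matching R's memory layout.
    fn get(&self, i: usize, j: usize) -> i32 {
        self.data[j * self.nrow + i]
    }
}

fn main() {
    let m = IntMatrix::zeros(3, 4);
    assert_eq!(m.data.len(), 12);
    assert_eq!(m.get(2, 3), 0);
    println!("{}x{} matrix, {} elements", m.nrow, m.ncol, m.data.len());
}
```

Since attaching the `dim` attribute is O(1), any extra cost seen in the benchmark has to come from the wrapper layer, not from the matrix representation itself.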
As we can see looking at the results, it doesn't. As a sidebar test I made an R function that calls the Rust vector function and sets the dims on the R side:

```r
r_dimmed_rust_intvec_m <- function(n) {
  res <- .Call("wrap__rust_zeros_intvec", n*n, PACKAGE = "librextendr27")
  dim(res) <- c(n,n)
  return(res)
}
```

and, given these results, I'm inclined to think the issue isn't pointing directly at matrix handling. |
Is this the way to use it?

```r
rextendr::rust_source(
  profile = "release",
  extendr_deps = list(
    `extendr-api` = list(git = "https://github.com/extendr/extendr")
  ),
  code = '
    /// @export
    #[extendr(use_try_from = true)]
    fn rust_zeros_stdvec_tf(n: i32) -> Vec<i32> {
        let my_vec = vec!(0; n as usize);
        my_vec
    }
    extendr_module! {
        mod rust_wrap;
        fn rust_zeros_stdvec_tf;
    }
  ')
```

[edit]
|
Bear in mind that converting from a Rust vector to an R vector involves a copy. Using an iterator and collecting should avoid the intermediate. For better results - and I haven't tried it - try passing a vector to the function to fill. An alternative is also to use ALTVEC and convert an iterator to a lazy vector. Note that Rust's default values are very "safe", i.e. very slow. To get the best result on a platform, target-specific build settings may be needed. |
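The "pass a vector to the function to fill" idea can be sketched in plain Rust without extendr (my illustration, not thread code): instead of allocating a `Vec` and converting it (which implies a copy into R memory), the function writes into a buffer the caller already owns. With extendr the slice would come from an R-allocated vector; here an ordinary `Vec` stands in to show the shape of the API.

```rust
// Write into a caller-provided buffer instead of returning a new Vec.
// No allocation and no conversion copy happen inside this function.
fn fill_zeros(out: &mut [i32]) {
    for x in out.iter_mut() {
        *x = 0;
    }
}

fn main() {
    // Stand-in for an R-allocated integer vector.
    let mut buf = vec![7i32; 8];
    fill_zeros(&mut buf);
    assert!(buf.iter().all(|&x| x == 0));
    println!("{:?}", buf);
}
```

The design trade-off is that the caller controls allocation, which maps well onto R's memory pool but makes the function signature less "drop-in" than simply returning a `Vec<i32>`.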
@andy-thomason, can you comment on the matrix performance? While a 1.5x performance decrease in vector creation is a 'reasonable' tradeoff, the numbers we get when creating matrices are... discouraging. |
I would have to look at the generated code to comment. The current Matrix implementation has constructors that take a closure over the indices. Much depends on whether the loop will vectorise; there is usually about a 10:1 difference between vectorised and non-vectorised loops. The version above should be done using an iterator.
I might make an integers! macro along the lines of: integers![0; 1024]. For short vectors (<1k) the R calls will dominate. |
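For illustration, here is one way the proposed `integers!` macro could look. This is a hypothetical, extendr-free sketch of mine: in extendr it would presumably build an R integer vector directly, whereas here it just expands to a `Vec<i32>` with `vec!`-style repeat syntax to show the intended shape.

```rust
// Hypothetical `integers!` macro: vec!-style `[value; count]` syntax.
// A real extendr version would allocate in R's memory pool instead.
macro_rules! integers {
    ($elem:expr; $n:expr) => {
        vec![($elem) as i32; $n]
    };
}

fn main() {
    let v = integers![0; 1024];
    assert_eq!(v.len(), 1024);
    assert!(v.iter().all(|&x| x == 0));
    println!("{} zeros", v.len());
}
```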
It's unclear that this is the right choice, as it makes compilation noticeably slower. For a function that is most likely going to be used interactively, that may not be the right tradeoff. |
That used to be the case but not anymore. I know I've posted quite a few results from quite a few different functions, so let's cut this down to size with what I believe to be the most direct comparisons.

```r
# Very basic Native -> R
# The idea behind each test is to have in-hand a native integer vector
# obtained via the result of some native operations on `n` that should be
# returned to R as an integer vector.

# our base line
r_fn <- function(n) {
  vector(mode="integer", length=n)
}

Rcpp::cppFunction('
std::vector<int> rcpp_fn(int n) {
  std::vector<int> my_vec(n, 0);
  return my_vec;
}')

rextendr::rust_function('
fn rextendr_defaults(n: i32) -> Vec<i32> {
    let my_vec = vec!(0; n as usize);
    my_vec
}
')

rextendr::rust_function('
fn rextendr_release(n: i32) -> Vec<i32> {
    let my_vec = vec!(0; n as usize);
    my_vec
}
',
profile = "release")

rextendr::rust_function('
fn rextendr_release_devapi(n: i32) -> Vec<i32> {
    let my_vec = vec!(0; n as usize);
    my_vec
}
',
profile = "release",
extendr_deps = list(
  `extendr-api` = list(git = "https://github.com/extendr/extendr")
))

# cargo with modification of the function required.
# Afaik it doesn't have any automatic type conversion handling.
cargo_fn <- cargo::rust_fn(n, '
    let rust_n = n.as_usize();
    let my_vec = vec!(0; rust_n);
    let (r_vec, xr_vec) = Rval::new_vector_integer(rust_n, &mut pc);
    for (relem, rustelem) in xr_vec.iter_mut().zip(my_vec) {
        *relem = rustelem;
    };
    r_vec
')
```
Perhaps it's just me not understanding what all is required for 'safety' and what a reasonable trade-off in time is, but when I used PyO3 I was so impressed with Rust's improvement over my Python solution that I had to try it with R and... well, ouch. I think a big part of that 'ouch' is that I am familiar with Rcpp, and that's my go-to when I need more performance out of my R, so that is what I compare to. The cargo benchmark shows that Rust can, indeed, be just as performant at getting the values back to R, but I don't know what safety checks are in place that account for all the time between that cargo benchmark and the rextendr times. This also supports, in my opinion, having release be the default, because that is going to be the first impression. |
So, I've been doing some reading to further understand what @andy-thomason stated in his earlier post, along with what the overhead of safety is. From looking at cargo's new_vector_integer, it is allocating, incrementing the protect count, and returning `-> (Self, &'static mut [i32])` (src: https://docs.rs/roxido/0.4.2/src/roxido/r.rs.html#194-196), which, if I am understanding what I've been reading, is inherently unsafe in multiple ways; a primary goal of extendr is to ensure safety by protecting the SEXP behind the Robj interface and providing safe methods of interaction. From the Rust docs:

> Note that all interaction with a static mut is unsafe, both reading and writing. Dealing with global mutable state requires a great deal of care.

So, if that very basic explanation is correct, the overhead between cargo's static mut reference methodology and extendr's Robj methodology is the 'cost' of safety, extendability and ease of use (i.e. drop-in Rust function usage), rather than being the cost of a particular method of wrapping values. Does that sound right? |
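To make the `static mut` point concrete, here is a small illustration of mine (not extendr or cargo code): touching a `static mut` requires an `unsafe` block at every call site, because nothing prevents aliased mutable access, whereas a type that owns its state and mutates through `&mut self` - the idea behind the Robj interface - needs no `unsafe` at all.

```rust
// Global mutable state: every access must be wrapped in `unsafe`,
// and the compiler can no longer verify exclusive access.
static mut PROTECT_COUNT: usize = 0;

fn bump_unchecked() -> usize {
    unsafe {
        PROTECT_COUNT += 1;
        PROTECT_COUNT
    }
}

// Safe alternative: the state is owned, and mutation goes through
// `&mut self`, so the borrow checker enforces exclusive access.
struct Protector {
    count: usize,
}

impl Protector {
    fn new() -> Self {
        Protector { count: 0 }
    }
    fn bump(&mut self) -> usize {
        self.count += 1;
        self.count
    }
}

fn main() {
    let _ = bump_unchecked();
    let mut p = Protector::new();
    assert_eq!(p.bump(), 1);
    assert_eq!(p.bump(), 2);
    println!("ok");
}
```

The safe version costs a little ergonomics (you must thread the owner around), which is one way to see where the "cost of safety" in the extendr design comes from.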
Also. Vectors over a certain size are currently built as Altrep. We should
probably make this optional until we can do this at zero cost.
|
While this is an important issue, I believe this particular conversation has gone stale. We should explore a benchmark suite, if only internally, to keep track of potential performance regressions. These things tend to grow into their own projects/repos, so for now we'll leave it at that. |
Note this PR: extendr/extendr#401 |
It would be great if there could be a comparison between extendr and Rcpp in terms of performance.