-
Notifications
You must be signed in to change notification settings - Fork 6
Cpp optimisation - passing by reference, using const, avoiding Rcpp functions in cpp code #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Parsing longitude was slightly slowed down by making the same changes: passing arguments by reference
tests were not updated and fail, not the good time This reverts commit e013812.
In addition to Rcpp::List, calls to Rcpp::toString and to the is_negative function using regular expressions were deleted. The new version of split_decimal_degree is a little faster: > microbenchmark::microbenchmark( + times = 10^6, setup = {x <- 45.235687}, + new = parzer:::split_decimal_degree(x), + old = parzer:::split_decimal_degree_old(x) + ) Unit: microseconds expr min lq mean median uq max neval new 4.469 4.797 5.423814 4.920 5.084 56692.83 1e+06 old 5.125 5.494 6.093964 5.617 5.781 10816.62 1e+06
A Rcpp::as<std::string> in a loop was also deleted. No speed increase for pz_parse_parts_lon() or pz_parse_parts_lat(). More replacements needed such as in extract_floats_from_string Tests and checks passed
local check passed
execution of pz_parse_lat and pz_parse_lon faster by a couple %! Rcpp objects are still used to build data.frames and to handle warnings and NAs closes ropensci#43
src/RcppExports.cpp
Outdated
@@ -55,12 +55,12 @@ BEGIN_RCPP | |||
END_RCPP | |||
} | |||
// pz_parse_lat | |||
Rcpp::NumericVector pz_parse_lat(const Rcpp::CharacterVector& x); | |||
std::vector<float> pz_parse_lat(std::vector<std::string>& x); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlbanSagouis I generally default to <double>
rather than <float>
, although that may not be necessary here, because the main advantage is the numeric limits of <float>
can often accidentally be reached, and cause undefined behaviour. The limits of <double>
are much higher (see here and the bottom of this page). The limits in this case are well defined, so that's not likely to be issue, but there is no performance effect anyway, so maybe worth thinking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with <float>
and used it because we mentioned it orally :s
Sticking to <double>
sounds good, I'll make the modifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried replacing <float>
by <double>
but got several test errors, for example:
Failure (test-parse_lon.R:46): parse_lon works: run through test_lons
round(parse_lon(test_lons[i]), 5) (`actual`) not equal to -74.64111 (`expected`).
`actual`: -2330764414375590775143610959527936
`expected`: -75
So this will be for later on a dedicated branch
Ready to merge @mpadge Issues to solve when switching to cpp11:
|
In this branch I kept old and new versions of many functions side by side in scripts (eg |
Not necessarily. More importantly: It'd be good to document the benchmarking somewhere, so the old and new versions should be kept until that has been done. I suggest documenting in an issue, rather than a PR, because it's then easier to find at a later stage. The documentation should also be a full The old code should then only be deleted once that's been done. And to make it easier to reproduce the results in the future, I'd suggest merging first to ensure the "old" versions are retained in the protected ("master") branch, rather than in a branch which we might later inadvertendly delete. Does that make sense? |
I create the issue #46 where I pasted reproducible code and results documenting the benchmarking. I did not use gert but directly built the package from the commit we are interested in:
@mpadge do you also think that this is reproducible? |
Yep, i was able to reproduce it. Thanks! |
tests added for abnormal values handling my convert_lat, convert_lon and pz_hemisphere
Reformatted the R/parzer-package.R script Added missing ORCIDs in the DESCRIPTION
Fixing issue ropensci#48 at: ropensci#48
Description
This is a general attempt at speeding up parzer by passing objects by reference and now preparing work with @mpadge, decreasing Rcpp functions use in cpp code and moving toward replacing Rcpp by cpp11