New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
33% runtime reduction for $get_hash() in RDS storrs #98
Conversation
One concern: in order to avoid |
Codecov Report
@@ Coverage Diff @@
## master #98 +/- ##
==========================================
+ Coverage 99.91% 99.91% +<.01%
==========================================
Files 15 16 +1
Lines 1179 1203 +24
==========================================
+ Hits 1178 1202 +24
Misses 1 1
Continue to review full report at Codecov.
|
Confirmed: when I tried to install e9251f9 on R-3.1.0 on Linux, the > install.packages("~/projects/storr", type = "source", repos = NULL)
* installing *source* package ‘storr’ ...
** libs
g++ -I/home/landau/R/R-3.1.0/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c read_text_file.cpp -o read_text_file.o
read_text_file.cpp:17:14: error: ‘R_CallMethodDef’ does not name a type; did you mean ‘R_dot_Method’?
static const R_CallMethodDef callMethods[] = {
^~~~~~~~~~~~~~~
R_dot_Method
read_text_file.cpp:22:19: error: variable or field ‘R_init_storr’ declared void
void R_init_storr(DllInfo *info) {
^~~~~~~
read_text_file.cpp:22:19: error: ‘DllInfo’ was not declared in this scope
read_text_file.cpp:22:28: error: ‘info’ was not declared in this scope
void R_init_storr(DllInfo *info) {
^~~~
read_text_file.cpp:22:28: note: suggested alternative: ‘ynf’
void R_init_storr(DllInfo *info) {
^~~~
ynf
/home/landau/R/R-3.1.0/etc/Makeconf:137: recipe for target 'read_text_file.o' failed
make: *** [read_text_file.o] Error 1
ERROR: compilation failed for package ‘storr’
* removing ‘/home/landau/R/R-3.1.0/library/storr’
* restoring previous ‘/home/landau/R/R-3.1.0/library/storr’
Warning message:
In install.packages("~/projects/storr", type = "source", repos = NULL) :
installation of package ‘/home/landau/projects/storr’ had non-zero exit status After commenting out these lines, N checking compiled code ...
File ‘storr/libs/storr.so’:
Found no calls to: ‘R_registerRoutines’, ‘R_useDynamicSymbols’
It is good practice to register native routines and to disable symbol
search.
See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual. I suppose we could use a package @richfitz, please let me know what tradeoff you would like to make. |
Related: r-rust/gifski#3. I believe the cutoff is between R-3.3.x and R-3.4.x. |
Hi Will - thanks for this: this looks like something that does look worth pursuing, even at the cost of adding compiled code and the complexity that causes. There's a bunch of stuff to think about here, and I think we can do slightly better than this.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mix of big and small bits here. I'm happy to do the larger change if you'd prefer
R/utils.R
Outdated
#' @useDynLib storr | ||
# Read RDS keys fast | ||
read_text_file <- function(path) { | ||
.Call("read_text_file", PACKAGE = "storr", path, nchar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will become
.Call(Cread_text_file, path, nchar)
with the changes further down the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 4f06d03.
src/read_text_file.cpp
Outdated
@@ -0,0 +1,25 @@ | |||
#include <fstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a plain C version if you're ok to port this this C. Otherwise I can take care of this myself later on. Using C strings has the potential to be ever so slightly faster too because we might avoid one allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this being the sole compiled file in storr, and given it contains the registration code, please rename to storr.cpp
(or storr.c
if you port it to C)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Porting to C is fine with me. I actually did have a pure C version in an earlier commit, which I can easily go back to. I became hesitant at the last minute because we would need to add extra guard rails against segfaults for key files that are empty, nonexistent, or too small.
By the way, do you have a preference between fread()
or fscanf()
? Maybe something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 4f06d03. Uses fgets()
.
src/read_text_file.cpp
Outdated
{NULL, NULL, 0} | ||
}; | ||
|
||
void R_init_storr(DllInfo *info) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/richfitz/ring/blob/master/src/registration.c#L58-L65 for how I deal with this usually - do #include <Rversion.h>
earlier up the file and then make R_useDynamicSymbols
and R_forceSymbols
conditional on version. Please use call_methods
rather than callMethods
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad I know this now. Maybe we can shoot for compatibility with R-3.3.x after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 4f06d03.
R/utils.R
Outdated
@@ -179,3 +179,9 @@ file_size <- function(...) { | |||
prompt_ask_yes_no <- function(reason) { | |||
utils::menu(c("no", "yes"), FALSE, title = reason) == 2 # nocov | |||
} | |||
|
|||
#' @useDynLib storr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be @useDynLib storr, .registration = TRUE
in order for registration to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 4f06d03.
tests/testthat/test-util.R
Outdated
|
||
test_that("read_text_file() works", { | ||
expect_false(file.exists("does_not_exist")) | ||
expect_equal(read_text_file("does_not_exist"), "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behaviour should be changed to throw an error (see comment on main discussion thread)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4f06d03 is a start.
src/read_text_file.cpp
Outdated
} | ||
|
||
static const R_CallMethodDef callMethods[] = { | ||
{"read_text_file", (DL_FUNC) &read_text_file, 2}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the first element to "Cread_text_file"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 4f06d03.
- Use pure C. - Use fgets() instead of fread(). Much safer. - Rename stuff: - read_text_file() => Cread_text_file() - read_text_file.cpp => storr.c - callMethods => call_methods - Register dynamic symbols conditional on R version - @useDynLib storr, .registration = TRUE - .Call(Cread_text_file, path, nchar) cc @richfitz
I believe 4f06d03 takes care of most of the small stuff. Just a couple things:
test-driver.R:127: warning: traits: throw_missing
cannot open compressed file '/tmp/RtmpFa1TGX/storr_5ce4d310937/data/596352599eccbef4ea033cda6ef3fbd9.rds', probable reason 'No such file or directory' |
Should be 3.4.0 at minimum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In get_object, please change to:
get_object = function(hash) {
path <- self$name_hash(hash)
if (!file.exists(path)) {
stop("rds file missing")
}
readRDS(path)
}
and I think that will allow use with the trait. It might be worth checking timings both ways though!
src/storr.c
Outdated
FILE *fp; | ||
fp = fopen(CHAR(asChar(path)), "rb"); | ||
if (fp == NULL) { | ||
return R_NilValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw with Rf_error()
to satisfy the storr trait. All we need is
Rf_error("File %s does not exist", path);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aa11b12. I'm learning a lot about R internals here.
src/storr.c
Outdated
char *buf = (char*) malloc(n * sizeof(char)); | ||
fgets(buf, n, fp); | ||
fclose(fp); | ||
SEXP out = PROTECT(allocVector(STRSXP, 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that out = PROTECT(mkString(buf));
here will work and be replace these two lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/storr.c
Outdated
return R_NilValue; | ||
} | ||
int n = asInteger(nchar) + 1; // Need an extra character for '\0'. | ||
char *buf = (char*) malloc(n * sizeof(char)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If used a stack allocated array this would be a bit simpler. In general one should use R_alloc()
here otherwise as R will clean that up.
So something like
#define MAX_HASH_LENGTH 128
...
char buf[MAX_HASH_LENGTH + 1];
char * res = fgets(buf, n, fp);
if (res == NULL) {
Rf_error("empty file"); // please test!
}
fclose(fp);
then the free
can come out too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe faster too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/storr.c
Outdated
|
||
void R_init_storr(DllInfo *dll) { | ||
R_registerRoutines(dll, NULL, call_methods, NULL, NULL); | ||
#if defined(R_VERSION) && R_VERSION >= R_Version(3, 3, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please de-indent these 4 lines by 2 spaces (see the ring
example I posted before). Sorry, I know this is picky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/testthat/test-util.R
Outdated
@@ -95,3 +95,20 @@ test_that("write_lines recovers on error", { | |||
expect_silent(write_lines(value, filename)) | |||
expect_identical(readLines(filename), value) | |||
}) | |||
|
|||
|
|||
test_that("read_text_file() works", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test for:
- error on missing file
- error on empty file
- read only first line of multiline file(!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 1f3c05e. For multi-line files, read_text_file("file_name", too_many_characters)
includes '\n'
. Is that okay? For key files, we are going to know how many characters we want, and it would take a little overhead to sub out '\n'
.
R/utils.R
Outdated
# Read RDS keys fast | ||
read_text_file <- function(path, nchar) { | ||
out <- .Call(Cread_text_file, path, nchar) | ||
if (is.null(out)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can come out if you throw in C
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Still throws warnings in tests
Also test on empty key files
With the new |
This just moves things a little closer to the pattern I usually use for C code in packges (which will help maintainability) and splits the tests up a little (which I'm still working on getting better at myself).
I had time to look at this locally today, but need to iron out the traits bit properly. I'm 2 commits ahead of you (see
My laptop is old! but 232.367 -> 58.126 is a really nice increase, and the increased speed with the trait seems worth pursuing while we're here. Could you please double check the benchmarks on your system? |
Sure! Here are the benchmarks on a 2-year-old mid-range Linux desktop. 2750823 ( s <- storr::storr_rds(tempfile())
s$set("a", 1)
microbenchmark::microbenchmark(s$get_hash("a"))
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> s$get_hash("a") 41.454 42.115 43.89025 42.567 42.956 113.922 100 Created on 2019-01-10 by the reprex package (v0.2.1) 1f3c05e ( s <- storr::storr_rds(tempfile())
s$set("a", 1)
microbenchmark::microbenchmark(s$get_hash("a"))
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> s$get_hash("a") 27.687 28.1255 30.69001 28.421 28.7575 100.97 100 Created on 2019-01-10 by the reprex package (v0.2.1) 9394fd3 ( s <- storr::storr_rds(tempfile())
s$set("a", 1)
microbenchmark::microbenchmark(s$get_hash("a"))
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> s$get_hash("a") 20.716 21.2955 22.55306 21.5225 21.828 89.591 100 Created on 2019-01-10 by the reprex package (v0.2.1) Somehow the speedup from traits is not quite as pronounced for me, but 2X speed is still a huge help for |
Hmm... tests are still throwing warnings for me locally. Loading storr
Testing storr
✔ | OK F W S | Context
✔ | 59 | drivers [environment] [0.1 s]
✔ | 42 | export [environment] [0.1 s]
✔ | 17 | external [environment]
✔ | 90 | storr [environment]
✔ | 60 | drivers [rds]nment]
✔ | 42 | export [rds]
✔ | 17 | external [rds]
✔ | 90 | storr [rds] [0.1 s]
✔ | 83 | drivers [DBI/SQLiteConnection] [0.3 s]
✔ | 42 | export [DBI/SQLiteConnection] [0.2 s]
✔ | 17 | external [DBI/SQLiteConnection]
✔ | 90 | storr [DBI/SQLiteConnection] [0.3 s]
✔ | 59 | drivers [multistorr (keys: environment, data: rds)]
✔ | 42 | export [multistorr (keys: environment, data: rds)]
✔ | 17 | external [multistorr (keys: environment, data: rds)]
✔ | 90 | storr [multistorr (keys: environment, data: rds)] [0.1 s]
✔ | 13 | base64[multistorr (keys: environment, data: rds)]
✔ | 35 | copy
✔ | 2 | defunct
✔ | 55 1 | DBI [0.2 s]
──────────────────────────────────────────────────────────────────
test-driver-dbi.R:156: skip: postgres version
Can't make postgres connection
──────────────────────────────────────────────────────────────────
✔ | 7 | environment driver
✔ | 7 | driver multistorr details
✔ | 82 1 | driver rds details [1.7 s]
──────────────────────────────────────────────────────────────────
test-driver-rds.R:114: skip: large vector support
Skipping long running test
──────────────────────────────────────────────────────────────────
✔ | 60 2 | drivers [remote/fake] [0.1 s]
──────────────────────────────────────────────────────────────────
test-driver.R:126: warning: traits: throw_missing
cannot open file '/tmp/RtmpNd8Eeo/filee8f2624edbc/keys/objects/ljwcbuihdaktnsqvezrpfmogxy': No such file or directory
test-driver.R:127: warning: traits: throw_missing
problem copying /tmp/RtmpNd8Eeo/filee8f2624edbc/data/e2052aeb929158382d67fbffb781cb66.rds to/tmp/RtmpNd8Eeo/filee8f1afb85f6/data/e2052aeb929158382d67fbffb781cb66.rds: No such file or directory
──────────────────────────────────────────────────────────────────
✔ | 42 | export [remote/fake] [0.1 s]
✔ | 17 | external [remote/fake]
✔ | 90 5 | storr [remote/fake] [0.2 s]
──────────────────────────────────────────────────────────────────
test-storr.R:36: warning: basic
cannot open file '/tmp/RtmpNd8Eeo/filee8f59571070/keys/objects/aaa': No such file or directory
test-storr.R:243: warning: get_value
problem copying /tmp/RtmpNd8Eeo/filee8f316d15cd/data/nosuchhash.rds to /tmp/RtmpNd8Eeo/filee8f1e568c95/data/nosuchhash.rds: No such file or directory
test-storr.R:263: warning: mget
cannot open file '/tmp/RtmpNd8Eeo/filee8f382f9b96/keys/objects/baz': No such file or directory
test-storr.R:268: warning: mget
cannot open file '/tmp/RtmpNd8Eeo/filee8f382f9b96/keys/objects/baz': No such file or directory
test-storr.R:269: warning: mget
cannot open file '/tmp/RtmpNd8Eeo/filee8f382f9b96/keys/objects/baz': No such file or directory
──────────────────────────────────────────────────────────────────
✔ | 100 5 | storr [remote/fake] [0.2 s]
──────────────────────────────────────────────────────────────────
test-storr.R:36: warning: basic
cannot open file '/tmp/RtmpNd8Eeo/filee8f59571070/keys/objects/aaa': No such file or directory
test-storr.R:243: warning: get_value
problem copying /tmp/RtmpNd8Eeo/filee8f316d15cd/data/nosuchhash.rds to /tmp/RtmpNd8Eeo/filee8f1e568c95/data/nosuchhash.rds: No such file or directory
test-storr.R:263: warning: mget
cannot open file '/tmp/RtmpNd8Eeo/filee8f382f9b96/keys/objects/baz': No such file or directory
test-storr.R:268: warning: mget
cannot open file '/tmp/RtmpNd8Eeo/filee8f382f9b96/keys/objects/baz': No such file or directory
test-storr.R:269: warning: mget
cannot open file '/tmp/RtmpNd8Eeo/filee8f382f9b96/keys/objects/baz': No such file or directory
──────────────────────────────────────────────────────────────────
✔ | 13 | hash
✔ | 8 | spec [0.3 s]
✔ | 45 | storr
✔ | 35 | utils
══ Results ═══════════════════════════════════════════════════════
Duration: 5.0 s
OK: 1378
Failed: 0
Warnings: 7
Skipped: 7 |
The issue seems to only affect remote |
Yes - I'll fix that in the next couple of days :) |
Thanks. |
This seems like a reasonable thing, and something that would have likely been required to implement s3 support properly
Please pull 8304594 into your branch and the test warnings will go away. Could you please also add yourself to the I have one final thought on this: Given the behaviour of
(not including tests). If that seems worthwhile in your performance calculations it would seem a reasonable change. I would be interested in seeing the overall impact on your flame graph if that's easy to do I believe #97 can be closed now? I'm very happy to merge this in - thanks for your work on this |
Great ideas. I have merged 8304594, added myself as a contributor, and bumped the version to 1.2.2. I think passing test-storr.R:284: failure: mset
st$mget_hash(c("foo", "bar")) not equal to `h`.
2/2 mismatches
x[1]: "6717f2823d3202449301145073ab8719\n"
y[1]: "6717f2823d3202449301145073ab8719"
x[2]: "db8e490a925a60e62212cefc7674ca02\n"
y[2]: "db8e490a925a60e62212cefc7674ca02" We could > s <- storr::storr_rds(tempfile())
> s$set("a", 1)
> microbenchmark::microbenchmark(s$get_hash("a"))
Unit: microseconds
expr min lq mean median uq max neval
s$get_hash("a") 20.874 21.219 22.63096 21.478 21.703 103.445 100 With > s <- storr::storr_rds(tempfile())
> s$set("a", 1)
> microbenchmark::microbenchmark(s$get_hash("a"))
Unit: microseconds
expr min lq mean median uq max neval
s$get_hash("a") 35.628 36.1725 39.1697 36.53 36.8735 223.459 100 |
Hmm... the performance improvement for For completeness, here are flame graphs of this test case with and with 2797fd9 |
Thanks! |
Changes
Use C++ code to read key files. Does not require pre-computing the hash length or including
Rcpp
as a package dependency.Benchmarks
2750823:
Created on 2019-01-08 by the reprex package (v0.2.1)
e9251f9:
Created on 2019-01-08 by the reprex package (v0.2.1)