33% runtime reduction for $get_hash() in RDS storrs #98

Merged
merged 26 commits into from Jan 14, 2019

wlandau
Contributor

@wlandau wlandau commented Jan 8, 2019

Changes

Use C++ code to read key files. Does not require pre-computing the hash length or including Rcpp as a package dependency.

Benchmarks

2750823:

s <- storr::storr_rds(tempfile())
s$set("a", 1)  
microbenchmark::microbenchmark(s$get_hash("a"))
#> Unit: microseconds
#>             expr   min      lq     mean  median      uq     max neval
#>  s$get_hash("a") 41.85 42.8365 45.01356 43.2865 44.0005 120.076   100

Created on 2019-01-08 by the reprex package (v0.2.1)

e9251f9:

s <- storr::storr_rds(tempfile())
s$set("a", 1)  
microbenchmark::microbenchmark(s$get_hash("a"))
#> Unit: microseconds
#>             expr    min      lq     mean  median     uq     max neval
#>  s$get_hash("a") 28.738 29.1825 31.27551 29.4455 29.818 149.485   100

Created on 2019-01-08 by the reprex package (v0.2.1)

@wlandau
Contributor Author

wlandau commented Jan 8, 2019

One concern: in order to avoid R CMD check notes, I registered the C++ routines in the DLL. In my experience, not all versions of R are able to handle this, so we may no longer be compatible with R 3.1.0.

@wlandau wlandau changed the title 33% speed improvement to $get_hash() in RDS storrs 33% runtime reduction for $get_hash() in RDS storrs Jan 8, 2019
@codecov-io

codecov-io commented Jan 8, 2019

Codecov Report

Merging #98 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #98      +/-   ##
==========================================
+ Coverage   99.91%   99.91%   +<.01%     
==========================================
  Files          15       16       +1     
  Lines        1179     1203      +24     
==========================================
+ Hits         1178     1202      +24     
  Misses          1        1

Impacted Files    Coverage Δ
R/driver_rds.R    100% <100%> (ø) ⬆️
R/utils.R         100% <100%> (ø) ⬆️
src/storr.c       100% <100%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2750823...392a22c.

@wlandau
Contributor Author

wlandau commented Jan 9, 2019

Confirmed: when I tried to install e9251f9 on R-3.1.0 on Linux, the registration code (R_registerRoutines, R_useDynamicSymbols) caused problems.

> install.packages("~/projects/storr", type = "source", repos = NULL)
* installing *source* package ‘storr’ ...
** libs
g++ -I/home/landau/R/R-3.1.0/include -DNDEBUG  -I/usr/local/include    -fpic  -g -O2  -c read_text_file.cpp -o read_text_file.o
read_text_file.cpp:17:14: error: ‘R_CallMethodDef’ does not name a type; did you mean ‘R_dot_Method’?
 static const R_CallMethodDef callMethods[] = {
              ^~~~~~~~~~~~~~~
              R_dot_Method
read_text_file.cpp:22:19: error: variable or field ‘R_init_storr’ declared void
 void R_init_storr(DllInfo *info) {
                   ^~~~~~~
read_text_file.cpp:22:19: error: ‘DllInfo’ was not declared in this scope
read_text_file.cpp:22:28: error: ‘info’ was not declared in this scope
 void R_init_storr(DllInfo *info) {
                            ^~~~
read_text_file.cpp:22:28: note: suggested alternative: ‘ynf’
 void R_init_storr(DllInfo *info) {
                            ^~~~
                            ynf
/home/landau/R/R-3.1.0/etc/Makeconf:137: recipe for target 'read_text_file.o' failed
make: *** [read_text_file.o] Error 1
ERROR: compilation failed for package ‘storr’
* removing ‘/home/landau/R/R-3.1.0/library/storr’
* restoring previous ‘/home/landau/R/R-3.1.0/library/storr’
Warning message:
In install.packages("~/projects/storr", type = "source", repos = NULL) :
  installation of package ‘/home/landau/projects/storr’ had non-zero exit status

After commenting out these lines, storr installed just fine on 3.1.0, but then I got the following R CMD check note in R-3.5.2.

N  checking compiled code ...
   File ‘storr/libs/storr.so’:
     Found no calls to: ‘R_registerRoutines’, ‘R_useDynamicSymbols’
   
   It is good practice to register native routines and to disable symbol
   search.
   
   See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual.

I suppose we could use a package Makevars to try to grab the user's R version and make a decision, but that seems like a hack and prone to error.

@richfitz, please let me know what tradeoff you would like to make.

@wlandau
Contributor Author

wlandau commented Jan 9, 2019

Related: r-rust/gifski#3. I believe the cutoff is between R-3.3.x and R-3.4.x.

@richfitz
Owner

richfitz commented Jan 9, 2019

Hi Will - thanks for this. It looks worth pursuing, even at the cost of adding compiled code and the complexity that brings.

There's a bunch of stuff to think about here, and I think we can do slightly better than this.

storr has the concept of "traits" - in particular throw_missing exists to avoid the call to self$exists if the act of reading a missing file will throw an error. So provided we make a little extra change to the rds reader too, we can avoid going through file exists if you check that the file exists in the C function. Does that make sense? traits for the driver then becomes traits = list(accept = "raw", throw_missing = TRUE) and the speedup should be more pronounced. I'll quickly go through the PR now and will return to it later in the week/next week

Owner

@richfitz richfitz left a comment

Mix of big and small bits here. I'm happy to do the larger change if you'd prefer

R/utils.R Outdated
#' @useDynLib storr
# Read RDS keys fast
read_text_file <- function(path) {
.Call("read_text_file", PACKAGE = "storr", path, nchar)
Owner

This will become

.Call(Cread_text_file, path, nchar)

with the changes further down the PR

Contributor Author

See 4f06d03.

@@ -0,0 +1,25 @@
#include <fstream>
Owner

I would prefer a plain C version if you're ok to port this to C. Otherwise I can take care of this myself later on. Using C strings has the potential to be ever so slightly faster too because we might avoid one allocation

Owner

With this being the sole compiled file in storr, and given it contains the registration code, please rename to storr.cpp (or storr.c if you port it to C)

Contributor Author

Porting to C is fine with me. I actually did have a pure C version in an earlier commit, which I can easily go back to. I became hesitant at the last minute because we would need to add extra guard rails against segfaults for key files that are empty, nonexistent, or too small.

By the way, do you have a preference between fread() or fscanf()? Maybe something else?

Contributor Author

See 4f06d03. Uses fgets().
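
The difference between the two candidates can be seen in a small standalone sketch. It is not the code from 4f06d03: the helper names `read_with_fread()` and `read_with_fgets()` are made up for illustration, and it uses only the C standard library rather than R's headers. `fread()` copies raw bytes, crosses line boundaries, and leaves termination to the caller, while `fgets()` stops at the first newline and always terminates what it stored.

```c
#include <stdio.h>

/* Hypothetical helper: fread() copies raw bytes and never writes a
   terminator, so the caller has to add one using the returned count. */
size_t read_with_fread(FILE *fp, char *buf, size_t size) {
  size_t got = fread(buf, 1, size - 1, fp);
  buf[got] = '\0';
  return got;
}

/* Hypothetical helper: fgets() stops after the first newline or after
   size - 1 characters and terminates the string itself.
   Returns 1 on success, 0 on failure or empty input. */
int read_with_fgets(FILE *fp, char *buf, int size) {
  return fgets(buf, size, fp) != NULL;
}
```

On a two-line file, the `fread()` variant returns both lines in one string, whereas the `fgets()` variant returns only the first line (including its newline if the buffer is large enough).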

{NULL, NULL, 0}
};

void R_init_storr(DllInfo *info) {
Owner

See https://github.com/richfitz/ring/blob/master/src/registration.c#L58-L65 for how I deal with this usually - do #include <Rversion.h> earlier up the file and then make R_useDynamicSymbols and R_forceSymbols conditional on version. Please use call_methods rather than callMethods.
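
The version-conditional pattern described in this comment can be sketched without R's headers. Everything named `DEMO_*` or `demo_*` below is a hypothetical stand-in: `DEMO_VERSION_CODE` plays the role of `R_Version()` and `DEMO_CURRENT_VERSION` the role of `R_VERSION` from `<Rversion.h>`, and the packing formula is assumed to mirror that header. In the real file the guarded lines would be the `R_useDynamicSymbols()` and `R_forceSymbols()` calls.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the macros in <Rversion.h>; the packing
   (major * 65536 + minor * 256 + patch) is assumed to match R's. */
#define DEMO_VERSION_CODE(v, p, s) (((v) * 65536) + ((p) * 256) + (s))

#ifndef DEMO_CURRENT_VERSION
#define DEMO_CURRENT_VERSION DEMO_VERSION_CODE(3, 5, 2)
#endif

/* Returns 1 if the guarded block was compiled in, 0 otherwise. */
int demo_init(void) {
  int strict_lookup = 0;
  /* Unconditional setup (routine registration in the real file) goes here. */
#if defined(DEMO_CURRENT_VERSION) && DEMO_CURRENT_VERSION >= DEMO_VERSION_CODE(3, 3, 0)
  /* Version-dependent calls belong inside the guard. */
  strict_lookup = 1;
#endif
  return strict_lookup;
}
```

Because the check happens in the preprocessor, the guarded calls are simply absent from builds against an older version, so no Makevars logic is needed.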

Contributor Author

Glad I know this now. Maybe we can shoot for compatibility with R-3.3.x after all.

Contributor Author

See 4f06d03.

R/utils.R Outdated
@@ -179,3 +179,9 @@ file_size <- function(...) {
prompt_ask_yes_no <- function(reason) {
utils::menu(c("no", "yes"), FALSE, title = reason) == 2 # nocov
}

#' @useDynLib storr
Owner

this should be @useDynLib storr, .registration = TRUE in order for registration to work

Contributor Author

See 4f06d03.


test_that("read_text_file() works", {
expect_false(file.exists("does_not_exist"))
expect_equal(read_text_file("does_not_exist"), "")
Owner

This behaviour should be changed to throw an error (see comment on main discussion thread)

Contributor Author

4f06d03 is a start.

}

static const R_CallMethodDef callMethods[] = {
{"read_text_file", (DL_FUNC) &read_text_file, 2},
Owner

Change the first element to "Cread_text_file"

Contributor Author

See 4f06d03.

- Use pure C.
- Use fgets() instead of fread(). Much safer.
- Rename stuff:
    - read_text_file() => Cread_text_file()
    - read_text_file.cpp => storr.c
    - callMethods => call_methods
- Register dynamic symbols conditional on R version
- @useDynLib storr, .registration = TRUE
- .Call(Cread_text_file, path, nchar)

cc @richfitz
@wlandau
Contributor Author

wlandau commented Jan 9, 2019

I believe 4f06d03 takes care of most of the small stuff. Just a couple things:

  1. C has no error handling, so I just return NULL from Cread_text_file() and check it in read_text_file(). Is that good enough? I was hesitant to use a special exception like KeyError for a function this general.
  2. When I tried including self$traits <- list(accept = "raw", throw_missing = TRUE) in the initialize() method of the RDS driver, the tests threw a bunch of warnings: for example,
test-driver.R:127: warning: traits: throw_missing
cannot open compressed file '/tmp/RtmpFa1TGX/storr_5ce4d310937/data/596352599eccbef4ea033cda6ef3fbd9.rds', probable reason 'No such file or directory'

Should be 3.4.0 at minimum
Owner

@richfitz richfitz left a comment

In get_object, please change to:

    get_object = function(hash) {
      path <- self$name_hash(hash)
      if (!file.exists(path)) {
        stop("rds file missing")
      }
      readRDS(path)
    }

and I think that will allow use with the trait. It might be worth checking timings both ways though!

src/storr.c Outdated
FILE *fp;
fp = fopen(CHAR(asChar(path)), "rb");
if (fp == NULL) {
return R_NilValue;
Owner

Throw with Rf_error() to satisfy the storr trait. All we need is

Rf_error("File %s does not exist", path);

Contributor Author

aa11b12. I'm learning a lot about R internals here.

src/storr.c Outdated
char *buf = (char*) malloc(n * sizeof(char));
fgets(buf, n, fp);
fclose(fp);
SEXP out = PROTECT(allocVector(STRSXP, 1));
Owner

I think that out = PROTECT(mkString(buf)); here will work and replace these two lines

Contributor Author

src/storr.c Outdated
return R_NilValue;
}
int n = asInteger(nchar) + 1; // Need an extra character for '\0'.
char *buf = (char*) malloc(n * sizeof(char));
Owner

If you used a stack-allocated array, this would be a bit simpler. Otherwise, one should generally use R_alloc() here, as R will clean that up.

So something like

#define MAX_HASH_LENGTH 128
...

char buf[MAX_HASH_LENGTH + 1];
char * res = fgets(buf, n, fp);
if (res == NULL) {
  Rf_error("empty file"); // please test!
}
fclose(fp);

then the free can come out too
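
A standalone version of the stack-buffer idea might look like the sketch below. It is an illustration only, not the code that was merged: the function name `read_first_line()` and the status codes are hypothetical, and plain return values stand in for `Rf_error()` so that it compiles without R. Two details are worth noting: the requested length is clamped so `fgets()` can never write past the fixed buffer, and the file is closed before the failure is reported, which matters when the error path does not return (as with `Rf_error()`, which jumps out of the function).

```c
#include <stdio.h>
#include <string.h>

#define MAX_HASH_LENGTH 128

/* Hypothetical status codes; the R version would raise an error instead. */
enum read_status { READ_OK = 0, READ_MISSING = 1, READ_EMPTY = 2 };

/* Copy at most `nchar` characters of the first line of `path` into `out`,
   which must hold at least MAX_HASH_LENGTH + 1 bytes. */
enum read_status read_first_line(const char *path, int nchar, char *out) {
  char buf[MAX_HASH_LENGTH + 1];  /* stack storage: nothing to free */
  FILE *fp = fopen(path, "rb");
  if (fp == NULL) {
    return READ_MISSING;
  }
  /* Clamp the request so fgets() stays inside the buffer. */
  if (nchar < 1) nchar = 1;
  if (nchar > MAX_HASH_LENGTH) nchar = MAX_HASH_LENGTH;
  char *res = fgets(buf, nchar + 1, fp);
  fclose(fp);  /* close before reporting, so no handle leaks on failure */
  if (res == NULL) {
    return READ_EMPTY;
  }
  strcpy(out, buf);
  return READ_OK;
}
```

A caller would declare `char out[MAX_HASH_LENGTH + 1];` and check the returned status before using the contents.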

Contributor Author

Maybe faster too.

Contributor Author

src/storr.c Outdated

void R_init_storr(DllInfo *dll) {
R_registerRoutines(dll, NULL, call_methods, NULL, NULL);
#if defined(R_VERSION) && R_VERSION >= R_Version(3, 3, 0)
Owner

please de-indent these 4 lines by 2 spaces (see the ring example I posted before). Sorry, I know this is picky

Contributor Author

@@ -95,3 +95,20 @@ test_that("write_lines recovers on error", {
expect_silent(write_lines(value, filename))
expect_identical(readLines(filename), value)
})


test_that("read_text_file() works", {
Owner

Test for:

  • error on missing file
  • error on empty file
  • read only first line of multiline file(!)

Contributor Author

See 1f3c05e. For multi-line files, read_text_file("file_name", too_many_characters) includes '\n'. Is that okay? For key files, we are going to know how many characters we want, and it would take a little overhead to sub out '\n'.
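
The newline behaviour mentioned here follows from how `fgets()` counts characters, and a small standalone sketch can show it. The helper name `ends_with_newline()` is hypothetical and the code uses only the C standard library. With a buffer size of exactly the hash length plus one, `fgets()` stops before the newline; with a larger size, the newline is stored as part of the result.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets() using a buffer size of
   `size` (2 to 256 here) and report whether the result ends in '\n'.
   Returns 1 or 0, or -1 if nothing could be read. */
int ends_with_newline(const char *path, int size) {
  char buf[256];
  if (size < 2 || size > (int) sizeof(buf)) {
    return -1;
  }
  FILE *fp = fopen(path, "rb");
  if (fp == NULL) {
    return -1;
  }
  char *res = fgets(buf, size, fp);
  fclose(fp);
  if (res == NULL) {
    return -1;
  }
  size_t len = strlen(buf);
  return len > 0 && buf[len - 1] == '\n';
}
```

For a file holding a 32-character hash followed by a newline, a size of 33 yields the bare hash, while any larger size also captures the trailing `'\n'`.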

R/utils.R Outdated
# Read RDS keys fast
read_text_file <- function(path, nchar) {
out <- .Call(Cread_text_file, path, nchar)
if (is.null(out)) {
Owner

this can come out if you throw in C

Contributor Author

Done.

@wlandau
Contributor Author

wlandau commented Jan 9, 2019

With the new get_object() in driver_rds.R, some of the test warnings go away, but some remain. I kept the line that adds the traits commented here. Anything else I could try? (Or would it be preferable to take care of traits outside this PR?)

This just moves things a little closer to the pattern I usually use
for C code in packages (which will help maintainability) and splits
the tests up a little (which I'm still working on getting better
at myself).
@richfitz
Owner

I had time to look at this locally today, but need to iron out the traits bit properly. I'm 2 commits ahead of you (see wlandau-96-cpp) at the moment, but here are the timings on my laptop:

2750823 (1.2.1)

> s <- storr::storr_rds(tempfile())
> s$set("a", 1)  
> microbenchmark::microbenchmark(s$get_hash("a"))
Unit: microseconds
            expr     min      lq     mean   median      uq     max neval
 s$get_hash("a") 232.367 244.857 294.8718 256.1145 293.521 784.262   100

a0f106b

> s <- storr::storr_rds(tempfile())
> s$set("a", 1)  
> microbenchmark::microbenchmark(s$get_hash("a"))
Unit: microseconds
            expr    min     lq     mean median     uq    max neval
 s$get_hash("a") 77.773 79.545 98.80603 85.692 90.064 257.34   100

9394fd3

> s <- storr::storr_rds(tempfile())
> s$set("a", 1)  
> microbenchmark::microbenchmark(s$get_hash("a"))
Unit: microseconds
            expr    min     lq     mean  median     uq     max neval
 s$get_hash("a") 58.126 59.904 76.31919 63.9045 73.237 341.906   100

My laptop is old, but 232.367 -> 58.126 is a really nice improvement, and the extra speed from the trait seems worth pursuing while we're here. Could you please double check the benchmarks on your system?

@wlandau
Contributor Author

wlandau commented Jan 10, 2019

Sure! Here are the benchmarks on a 2-year-old mid-range Linux desktop.

2750823 (master)

s <- storr::storr_rds(tempfile())
s$set("a", 1)  
microbenchmark::microbenchmark(s$get_hash("a"))
#> Unit: microseconds
#>             expr    min     lq     mean median     uq     max neval
#>  s$get_hash("a") 41.454 42.115 43.89025 42.567 42.956 113.922   100

Created on 2019-01-10 by the reprex package (v0.2.1)

1f3c05e (wlandau/storr@96-cpp)

s <- storr::storr_rds(tempfile())
s$set("a", 1)  
microbenchmark::microbenchmark(s$get_hash("a"))
#> Unit: microseconds
#>             expr    min      lq     mean median      uq    max neval
#>  s$get_hash("a") 27.687 28.1255 30.69001 28.421 28.7575 100.97   100

Created on 2019-01-10 by the reprex package (v0.2.1)

9394fd3 (richfitz/storr@wlandau-96-cpp)

s <- storr::storr_rds(tempfile())
s$set("a", 1)  
microbenchmark::microbenchmark(s$get_hash("a"))
#> Unit: microseconds
#>             expr    min      lq     mean  median     uq    max neval
#>  s$get_hash("a") 20.716 21.2955 22.55306 21.5225 21.828 89.591   100

Created on 2019-01-10 by the reprex package (v0.2.1)

Somehow the speedup from traits is not quite as pronounced for me, but 2X speed is still a huge help for drake. I will incorporate your commits into this PR.

@wlandau
Contributor Author

wlandau commented Jan 10, 2019

Hmm... tests are still throwing warnings for me locally.

Loading storr
Testing storr
✔ | OK F W S | Context
✔ | 59       | drivers [environment] [0.1 s]
✔ | 42       | export [environment] [0.1 s]
✔ | 17       | external [environment]
✔ | 90       | storr [environment]
✔ | 60       | drivers [rds]
✔ | 42       | export [rds]
✔ | 17       | external [rds]
✔ | 90       | storr [rds] [0.1 s]
✔ | 83       | drivers [DBI/SQLiteConnection] [0.3 s]
✔ | 42       | export [DBI/SQLiteConnection] [0.2 s]
✔ | 17       | external [DBI/SQLiteConnection]
✔ | 90       | storr [DBI/SQLiteConnection] [0.3 s]
✔ | 59       | drivers [multistorr (keys: environment, data: rds)]
✔ | 42       | export [multistorr (keys: environment, data: rds)]
✔ | 17       | external [multistorr (keys: environment, data: rds)]
✔ | 90       | storr [multistorr (keys: environment, data: rds)] [0.1 s]
✔ | 13       | base64
✔ | 35       | copy
✔ |  2       | defunct
✔ | 55     1 | DBI [0.2 s]
──────────────────────────────────────────────────────────────────
test-driver-dbi.R:156: skip: postgres version
Can't make postgres connection
──────────────────────────────────────────────────────────────────
✔ |  7       | environment driver
✔ |  7       | driver multistorr details
✔ | 82     1 | driver rds details [1.7 s]
──────────────────────────────────────────────────────────────────
test-driver-rds.R:114: skip: large vector support
Skipping long running test
──────────────────────────────────────────────────────────────────
✔ | 60   2   | drivers [remote/fake] [0.1 s]
──────────────────────────────────────────────────────────────────
test-driver.R:126: warning: traits: throw_missing
cannot open file '/tmp/RtmpNd8Eeo/filee8f2624edbc/keys/objects/ljwcbuihdaktnsqvezrpfmogxy': No such file or directory

test-driver.R:127: warning: traits: throw_missing
problem copying /tmp/RtmpNd8Eeo/filee8f2624edbc/data/e2052aeb929158382d67fbffb781cb66.rds to /tmp/RtmpNd8Eeo/filee8f1afb85f6/data/e2052aeb929158382d67fbffb781cb66.rds: No such file or directory
──────────────────────────────────────────────────────────────────
✔ | 42       | export [remote/fake] [0.1 s]
✔ | 17       | external [remote/fake]
✔ | 90   5   | storr [remote/fake] [0.2 s]
──────────────────────────────────────────────────────────────────
test-storr.R:36: warning: basic
cannot open file '/tmp/RtmpNd8Eeo/filee8f59571070/keys/objects/aaa': No such file or directory

test-storr.R:243: warning: get_value
problem copying /tmp/RtmpNd8Eeo/filee8f316d15cd/data/nosuchhash.rds to /tmp/RtmpNd8Eeo/filee8f1e568c95/data/nosuchhash.rds: No such file or directory

test-storr.R:263: warning: mget
cannot open file '/tmp/RtmpNd8Eeo/filee8f382f9b96/keys/objects/baz': No such file or directory

test-storr.R:268: warning: mget
cannot open file '/tmp/RtmpNd8Eeo/filee8f382f9b96/keys/objects/baz': No such file or directory

test-storr.R:269: warning: mget
cannot open file '/tmp/RtmpNd8Eeo/filee8f382f9b96/keys/objects/baz': No such file or directory
──────────────────────────────────────────────────────────────────
✔ | 13       | hash
✔ |  8       | spec [0.3 s]
✔ | 45       | storr
✔ | 35       | utils

══ Results ═══════════════════════════════════════════════════════
Duration: 5.0 s

OK:       1378
Failed:   0
Warnings: 7
Skipped:  7

@wlandau
Contributor Author

wlandau commented Jan 10, 2019

The issue seems to only affect remote storr drivers.

@richfitz
Owner

> Hmm... tests are still throwing warnings for me locally.

Yes - I'll fix that in the next couple of days :)

@wlandau
Contributor Author

wlandau commented Jan 10, 2019

Thanks.

This seems like a reasonable thing, and something that would have
likely been required to implement s3 support properly
@richfitz
Owner

Please pull 8304594 into your branch and the test warnings will go away.

Could you please also add yourself to the DESCRIPTION as a ctb? Please also bump the version to reflect the number you used in the NEWS (1.2.2, which seems reasonable to me).

I have one final thought on this:

Given the behaviour of fgets, it is possible to pass MAX_HASH_LENGTH rather than a computed hash length to fgets with the same behaviour and result, while avoiding the complexity of computing the hash length and the one extra R6 field lookup. A diff for that set of changes is

diff --git a/R/driver_rds.R b/R/driver_rds.R
index 5a30f59..b2d5887 100644
--- a/R/driver_rds.R
+++ b/R/driver_rds.R
@@ -203,3 +203,3 @@ R6_driver_rds <- R6::R6Class(
     get_hash = function(key, namespace) {
-      read_text_file(self$name_key(key, namespace), self$hash_length)
+      read_text_file(self$name_key(key, namespace))
     },
diff --git a/R/utils.R b/R/utils.R
index 256f157..24bc4d9 100644
--- a/R/utils.R
+++ b/R/utils.R
@@ -184,4 +184,4 @@ prompt_ask_yes_no <- function(reason) {
 # Read RDS keys fast
-read_text_file <- function(path, nchar) {
-  .Call(Cread_text_file, path, nchar)
+read_text_file <- function(path) {
+  .Call(Cread_text_file, path)
 }
diff --git a/src/storr.c b/src/storr.c
index f9a2455..efc84c6 100644
--- a/src/storr.c
+++ b/src/storr.c
@@ -9,5 +9,5 @@
 
-SEXP read_text_file(SEXP r_path, SEXP r_nchar) {
+SEXP read_text_file(SEXP r_path) {
   const char * path = CHAR(STRING_ELT(r_path, 0));
-  const int nchar = asInteger(r_nchar);
+  const int nchar = MAX_HASH_LENGTH;
 
@@ -29,3 +29,3 @@ SEXP read_text_file(SEXP r_path, SEXP r_nchar) {
 static const R_CallMethodDef call_methods[] = {
-  {"Cread_text_file", (DL_FUNC) &read_text_file, 2},
+  {"Cread_text_file", (DL_FUNC) &read_text_file, 1},
   {NULL, NULL, 0}

(not including tests). If that seems worthwhile in your performance calculations it would seem a reasonable change.

I would be interested in seeing the overall impact on your flame graph if that's easy to do

I believe #97 can be closed now? I'm very happy to merge this in - thanks for your work on this

@wlandau
Contributor Author

wlandau commented Jan 11, 2019

Great ideas. I have merged 8304594, added myself as a contributor, and bumped the version to 1.2.2.

I think passing MAX_HASH_LENGTH is a good idea in principle, but the problem is that it captures newline characters.

test-storr.R:284: failure: mset
st$mget_hash(c("foo", "bar")) not equal to `h`.
2/2 mismatches
x[1]: "6717f2823d3202449301145073ab8719\n"
y[1]: "6717f2823d3202449301145073ab8719"

x[2]: "db8e490a925a60e62212cefc7674ca02\n"
y[2]: "db8e490a925a60e62212cefc7674ca02"

We could trimws() them out, but there is a speed penalty. Without trimws(which = "right"):

> s <- storr::storr_rds(tempfile())
> s$set("a", 1)  
> microbenchmark::microbenchmark(s$get_hash("a"))
Unit: microseconds
            expr    min     lq     mean median     uq     max neval
 s$get_hash("a") 20.874 21.219 22.63096 21.478 21.703 103.445   100

With trimws(which = "right")

> s <- storr::storr_rds(tempfile())
> s$set("a", 1)  
> microbenchmark::microbenchmark(s$get_hash("a"))
Unit: microseconds
            expr    min      lq    mean median      uq     max neval
 s$get_hash("a") 35.628 36.1725 39.1697  36.53 36.8735 223.459   100
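
One alternative to `trimws()` that was not tried in this thread would be to drop the line ending on the C side, before the string is handed back to R. The sketch below is a standalone illustration under that assumption: `strip_line_ending()` is a hypothetical helper, and whether it would actually recover the lost microseconds in storr is untested.

```c
#include <string.h>

/* Hypothetical helper: truncate `s` in place at the first '\n' or '\r'.
   strcspn() returns the length of the leading run that contains neither
   character, so a string without a line ending is left unchanged. */
void strip_line_ending(char *s) {
  s[strcspn(s, "\r\n")] = '\0';
}
```

Applied to the buffer filled by `fgets()`, this would turn "6717f2823d3202449301145073ab8719\n" into the bare 32-character hash.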

@wlandau
Contributor Author

wlandau commented Jan 11, 2019

Also, may I bump Depends: to R (>= 3.3.0)? That would allow us to close #99, #100, and #101.

@wlandau
Contributor Author

wlandau commented Jan 11, 2019

Hmm... the performance improvement for drake is not actually that much. dependency_hash(), which calls $get_hash() repeatedly, now takes 8.46% of the computation instead of 10.58%. I think this is because development drake is now memoizing those hashes in memory while make() is running: ropensci/drake#660. But hopefully someday drake will be fast enough that these time savings will make more of a difference. I suspect they have more of an impact on drake version <= 6.2.1.

For completeness, here are flame graphs of this test case with storr 1.2.1 (CRAN):

[flame graph image: old]

and with 2797fd9:

[flame graph image: new]

@richfitz
Owner

> Also, may I bump Depends: to R (>= 3.3.0)? That would allow us to close #99, #100, and #101.

Yes, I think 3.3.0 is a reasonable minimum version at this point, go for it

@wlandau
Contributor Author

wlandau commented Jan 14, 2019

Thanks!
