cache vec_proxy results as stopgap for r-lib/vctrs#1411 #179
This is a stop-gap measure to improve performance on some operations when `rvar`s are used with `vctrs` functions, which generally comes up when `rvar`s are put into `tibble`s. It should not affect the output of any operations, just speed. No need to merge before CRAN if that ship has already sailed, but I figured I would submit this now since it gets it off my plate :).

For more info see r-lib/vctrs#1411

Basically: several `vctrs` functions call `vec_proxy()`, which must return a "proxy" for the vector as an "elementary" vector (some kind of object consisting of base lists, vectors, or data.frames) that can be sliced in a manner respecting the semantics of the object. In our case that basically means returning a list indexed by the first index of the `rvar` (i.e. the second index of the internal draws array). Generating this proxy takes time proportional to the vector size. This is problematic when the proxy is then used by operations that should be constant time (like `vec_slice()`), as it increases the computational complexity of algorithms using those functions, sometimes to disastrous effect. E.g. in the example I gave in r-lib/vctrs#1411, `split()` on a tibble with an `rvar` column can be orders of magnitude slower than `split()` on a data.frame with an `rvar` column.

This PR adds a simple cache of the proxy to an `rvar` so that operations that repeatedly call `vec_proxy()` only incur the cost of calculating the proxy once. I believe I identified all cases where the cache should be invalidated (basically, any use of `draws_of()<-` or operations that change the number of chains should invalidate this cache).
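
To make the idea concrete, here is a minimal base-R sketch of a cached proxy with invalidation. It is not the code in this PR: `new_toy_rvar()`, `toy_proxy()`, `set_toy_draws()` and the `proxy_builds` counter are hypothetical names made up for illustration, and the toy object stores its draws as a plain matrix (rows are draws, columns are elements) rather than using the real `rvar` internals.

```r
# Toy stand-in for an rvar (hypothetical, not the posterior implementation):
# a draws matrix plus an environment that holds the cached proxy.
new_toy_rvar <- function(draws) {
  structure(
    list(draws = draws, cache = new.env(parent = emptyenv())),
    class = "toy_rvar"
  )
}

# Counts how often the O(n) proxy is actually rebuilt.
proxy_builds <- 0L

# Return a list with one entry per element (column of the draws matrix).
# The list is built only if the cache is empty; otherwise it is reused.
toy_proxy <- function(x) {
  cache <- x$cache
  if (is.null(cache$proxy)) {
    proxy_builds <<- proxy_builds + 1L
    cache$proxy <- lapply(seq_len(ncol(x$draws)), function(i) x$draws[, i])
  }
  cache$proxy
}

# Replacing the draws invalidates the cache by attaching a fresh environment.
set_toy_draws <- function(x, value) {
  x$draws <- value
  x$cache <- new.env(parent = emptyenv())
  x
}

x <- new_toy_rvar(matrix(seq_len(4 * 1000), nrow = 4))

# 100 element lookups through the proxy; the proxy is built only once.
for (i in 1:100) toy_proxy(x)[[i]]
proxy_builds  # 1

# Changing the draws drops the cache, so the next call rebuilds the proxy.
x <- set_toy_draws(x, matrix(0, nrow = 4, ncol = 10))
length(toy_proxy(x))  # 10
proxy_builds  # 2
```

The sketch keeps the cache in an environment because environments are modified in place, so a function that only reads the object can still fill the cache. The setter attaches a fresh environment instead of clearing the old one, so other copies of the object that still hold the old draws keep a proxy that matches them.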