Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache vec_proxy results as stopgap for r-lib/vctrs#1411 #179

Merged
merged 1 commit into from
Jul 14, 2021

Conversation

mjskay
Copy link
Collaborator

@mjskay mjskay commented Jul 13, 2021

This is a stop-gap measure to improve performance on some operations when rvars are used with vctrs functions, which generally comes up when rvars are put into tibbles. It should not affect the output of any operations, just speed. No need to merge before CRAN if that ship has already sailed, but I figured I would submit this now since it gets it off my plate :).

For more info see r-lib/vctrs#1411

Basically: several vctrs functions call vec_proxy(), which must return a "proxy" for the vector as an "elementary" vector (some kind of object consisting of base lists, vectors, or data.frames) that can be sliced in a manner respecting the semantics of the object. Which in our case basically means returning a list indexed by the first index of the rvar (i.e. the second index of the internal draws array). Generating this proxy is an operation with time proportional to the vector size. This is problematic when that proxy is then used by operations that should be constant time (like vec_slice()), as this increases the computational complexity of algorithms using those functions, sometimes to disastrous effect. E.g. in the example I gave in r-lib/vctrs#1411, split() on a tibble with an rvar column can be orders of magnitude slower than split() on a data.frame with an rvar column.

This PR adds a simple cache of the proxy to an rvar so that operations that repeatedly call vec_proxy() only incur the cost of calculating the proxy once. I believe I identified all cases where the cache should be invalidated (basically, any use of draws_of()<- or operations that change the number of chains should invalidate this cache).

@paul-buerkner paul-buerkner mentioned this pull request Jul 14, 2021
Copy link
Collaborator

@paul-buerkner paul-buerkner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you!

@paul-buerkner
Copy link
Collaborator

paul-buerkner commented Jul 14, 2021

Merging this now.

@paul-buerkner paul-buerkner merged commit befdfbd into master Jul 14, 2021
@mjskay mjskay deleted the rvar-proxy-cache branch July 14, 2021 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants