Custom integer64 slicing support #813
Merged
Some proof that I am not wrecking performance.

Large factor:

library(vctrs)
set.seed(123)
x <- factor(sample(letters, 1e4, replace = TRUE))
idx <- 1:100 + 0L
# before
bench::mark(vec_slice(x, idx), iterations = 100000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_slice(x, idx) 7.72µs 11.4µs 78034. 17.6KB 72.6
# after
bench::mark(vec_slice(x, idx), iterations = 100000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_slice(x, idx) 7.7µs 10.9µs 82997. 17.6KB 77.3
# before
bench::mark(vec_chop(x), iterations = 1000)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_chop(x) 33.3ms 90.9ms 11.8 82.2KB 21.5
# after
bench::mark(vec_chop(x), iterations = 1000)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_chop(x) 32.5ms 90.7ms 11.8 82.2KB 21.6

Small factor:

library(vctrs)
set.seed(123)
x <- factor(c("a", "b", "c"))
idx <- 1:2 + 0L
# before
bench::mark(vec_slice(x, idx), iterations = 100000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_slice(x, idx) 6.83µs 9.62µs 93748. 17.1KB 87.3
# after
bench::mark(vec_slice(x, idx), iterations = 100000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_slice(x, idx) 6.49µs 9.19µs 99387. 17.1KB 92.5
# before
bench::mark(vec_chop(x), iterations = 100000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_chop(x) 11.1µs 13.6µs 70837. 4.01KB 29.8
# after
bench::mark(vec_chop(x), iterations = 100000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_chop(x) 11.6µs 14.1µs 69094. 4.01KB 29.0
lionel- approved these changes on Feb 13, 2020
Can you add a NEWS item please? Also check whether tidyverse/tidyr#846 is solved, and if so close it from here.
db96671 to 70895f2
Confirmed!

library(tidyr)
library(vctrs)
library(bit64)
fish_encounters$seen <- as.integer64(fish_encounters$seen)
fish_encounters <- fish_encounters[1:30,]
# CRAN
fish_encounters %>%
pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 3 x 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
#> <fct> <int64> <int6> <int64> <int> <int64> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 … … …
#> 2 4843 1 1 1 1 1 1 1 1 … … …
#> 3 4844 1 1 1 1 1 1 1 1 9218… 9218… 9218…
# This PR
fish_encounters %>%
pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 3 x 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
#> <fct> <int64> <int6> <int64> <int> <int64> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 NA NA NA

Created on 2020-02-13 by the reprex package (v0.3.0)
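The difference between the two outputs comes down to what happens when an integer64 vector is sliced with a missing index. The following is a minimal sketch of the repair idea, assuming bit64 is installed. `slice_integer64_sketch()` is a hypothetical name used only for illustration; it is not the function added in this PR and may not match its implementation.

```r
library(bit64)

x <- as.integer64(c(10, 20, 30))
idx <- c(1L, NA_integer_, 3L)

# Hypothetical helper (not the vctrs implementation): slice with `[`, then
# overwrite every element selected by a missing index with a proper
# integer64 NA, whatever `[` happened to return there.
slice_integer64_sketch <- function(x, i) {
  out <- x[i]
  missing <- is.na(i)
  if (any(missing)) {
    out[missing] <- NA_integer64_
  }
  out
}

res <- slice_integer64_sketch(x, idx)
is.na(res)
```

With this approach the repair only touches the positions where the index itself is missing, so slices without `NA` indices take the plain `[` path unchanged.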
Closes tidyverse/tidyr#846

This PR adds integer64 support for `vec_chop()` and `vec_slice()`, repairing the object whenever it is sliced with `NA_integer_`, which is unsupported by the CRAN version of bit64.

By extension this allows us to use `vec_init()` with these objects, freeing us to hard-code casting of `unspecified()` objects to any `to` type with `vec_init(to, vec_size(x))` in #812.

This adds two new R functions:

- `vec_slice_fallback_integer64()` for use with shaped integer64 objects
- `vec_slice_dispatch_integer64()` for use with 1D integer64 objects

When slicing with shaped objects, the C level `vec_slice_fallback()` will switch to calling the R level `vec_slice_fallback_integer64()` if the input is integer64.

When slicing 1D objects with `[`, I've added a new C level `vec_slice_dispatch()` that either calls out to `vec_slice_dispatch_integer64()` or does our original behavior of calling to `[`.

The rationale for doing the `inherits(x, "integer64")` check at the C level is for speed with `vec_chop()`. We don't want to slow down `vec_chop()` on things like factors by making it repeatedly call `inherits(x, "integer64")` to decide which slicing path to take. We can determine whether we need `[` or `vec_slice_dispatch_integer64()` once up front, and then construct the slice call using that function. It feels a little invasive because now `chop_fallback()` has to "know" about integer64 objects, but the benefits seem worth it.

This makes the assumption that if `inherits(x, "integer64")` is true, then the user has bit64 installed. I can add an `is_installed("bit64")` check to `vec_slice_fallback_integer64()` and `vec_slice_dispatch_integer64()` if you feel it is worth it.
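
The "decide once up front" idea described above can be illustrated at the R level. This is only a sketch in base R: the PR does this in C, and `slice_integer64_standin()`, `choose_slicer()`, and `chop_sketch()` are hypothetical names that do not exist in vctrs.

```r
# Hypothetical stand-in for the integer64-specific slicer named in the PR;
# a real version would also repair elements selected by NA indices.
slice_integer64_standin <- function(x, i) {
  x[i]
}

# Run the class check a single time and return the slicing function to use.
choose_slicer <- function(x) {
  if (inherits(x, "integer64")) slice_integer64_standin else `[`
}

# Chop `x` into one-element pieces, reusing the slicer chosen up front
# rather than calling inherits() again for every element.
chop_sketch <- function(x) {
  slice <- choose_slicer(x)
  lapply(seq_along(x), function(i) slice(x, i))
}

pieces <- chop_sketch(factor(c("a", "b", "c")))
length(pieces)
```

For a factor, the sketch never touches the integer64 path after the single `inherits()` call, which mirrors the stated goal of not slowing down `vec_chop()` on other classes.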