Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom integer64 slicing support #813

Merged
merged 5 commits into from Feb 13, 2020

Conversation

DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Feb 13, 2020

Closes tidyverse/tidyr#846

This PR adds integer64 support for vec_chop() and vec_slice(), repairing the object whenever it is sliced with NA_integer_, which is unsupported by the cran version of bit64.

By extension this allows us to use vec_init() with these objects, freeing us to hard-code casting of unspecified() objects to any to type with vec_init(to, vec_size(x)) in #812

This adds two new R functions:

  • vec_slice_fallback_integer64() for use with shaped integer64 objects
  • vec_slice_dispatch_integer64() for use with 1D integer64 objects

When slicing with shaped objects, the C level vec_slice_fallback() will switch to calling the R level vec_slice_fallback_integer64() if the input is integer64.

When slicing 1D objects with [, I've added a new C level vec_slice_dispatch() that either calls out to vec_slice_dispatch_integer64() or does our original behavior of calling to [.

The rationale for doing the inherits(x, "integer64") check at the C level is for speed with vec_chop(). We don't want to slow down vec_chop() on things like factors by making it repeatedly call inherits(x, "integer64") to decide which slicing path to take. We can determine whether we need [ or vec_slice_dispatch_integer64() once up front, and then construct the slice call using that function. It feels a little invasive because now chop_fallback() has to "know" about integer64 objects, but the benefits seem worth it

This makes the assumption that if inherits(x, "integer64") is true, then the user has bit64 installed. I can add an is_installed("bit64") check to vec_slice_fallback_integer64() and vec_slice_dispatch_integer64() if you feel it is worth it.

@DavisVaughan
Copy link
Member Author

Some proof that I am not wrecking vec_slice() and vec_chop() performance with factors

Large factor:

library(vctrs)
set.seed(123)

x <- factor(sample(letters, 1e4, replace = TRUE))
idx <- 1:100 + 0L

# before
bench::mark(vec_slice(x, idx), iterations = 100000)
#> # A tibble: 1 x 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_slice(x, idx)   7.72µs   11.4µs    78034.    17.6KB     72.6

# after
bench::mark(vec_slice(x, idx), iterations = 100000)
#> # A tibble: 1 x 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_slice(x, idx)    7.7µs   10.9µs    82997.    17.6KB     77.3



# before
bench::mark(vec_chop(x), iterations = 1000)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_chop(x)   33.3ms   90.9ms      11.8    82.2KB     21.5

# after
bench::mark(vec_chop(x), iterations = 1000)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_chop(x)   32.5ms   90.7ms      11.8    82.2KB     21.6

Small factor:

library(vctrs)
set.seed(123)

x <- factor(c("a", "b", "c"))
idx <- 1:2 + 0L

# before
bench::mark(vec_slice(x, idx), iterations = 100000)
#> # A tibble: 1 x 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_slice(x, idx)   6.83µs   9.62µs    93748.    17.1KB     87.3

# after
bench::mark(vec_slice(x, idx), iterations = 100000)
#> # A tibble: 1 x 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_slice(x, idx)   6.49µs   9.19µs    99387.    17.1KB     92.5



# before
bench::mark(vec_chop(x), iterations = 100000)
#> # A tibble: 1 x 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_chop(x)   11.1µs   13.6µs    70837.    4.01KB     29.8

# after
bench::mark(vec_chop(x), iterations = 100000)
#> # A tibble: 1 x 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_chop(x)   11.6µs   14.1µs    69094.    4.01KB     29.0

@DavisVaughan DavisVaughan changed the title Slice integer64 Custom integer64 slicing support Feb 13, 2020
src/slice-chop.c Outdated Show resolved Hide resolved
@lionel-
Copy link
Member

lionel- commented Feb 13, 2020

Can you add a NEWS item please? Also check if that tidyverse/tidyr#846 is sloved and close it from here in this case.

@DavisVaughan
Copy link
Member Author

DavisVaughan commented Feb 13, 2020

Confirmed!

library(tidyr)
library(vctrs)
library(bit64)

fish_encounters$seen <- as.integer64(fish_encounters$seen)
fish_encounters <- fish_encounters[1:30,]

# CRAN
fish_encounters %>%
  pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 3 x 12
#>   fish  Release I80_1  Lisbon  Rstr  Base_TD BCE   BCW   BCE2  BCW2  MAE   MAW  
#>   <fct> <int64> <int6> <int64> <int> <int64> <int> <int> <int> <int> <int> <int>
#> 1 4842  1       1      1       1     1       1     1     1         …     …     …
#> 2 4843  1       1      1       1     1       1     1     1         …     …     …
#> 3 4844  1       1      1       1     1       1     1     1     9218… 9218… 9218…

# This PR
fish_encounters %>%
  pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 3 x 12
#>   fish  Release I80_1  Lisbon  Rstr  Base_TD BCE   BCW   BCE2  BCW2  MAE   MAW  
#>   <fct> <int64> <int6> <int64> <int> <int64> <int> <int> <int> <int> <int> <int>
#> 1 4842  1       1      1       1     1       1     1     1      1     1     1   
#> 2 4843  1       1      1       1     1       1     1     1      1     1     1   
#> 3 4844  1       1      1       1     1       1     1     1     NA    NA    NA

Created on 2020-02-13 by the reprex package (v0.3.0)

@DavisVaughan DavisVaughan merged commit c984ddd into r-lib:master Feb 13, 2020
@DavisVaughan DavisVaughan deleted the slice-integer64 branch February 13, 2020 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pivot_wider generates incorrect values when values_from is a integer64 column
2 participants