Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Altvec objects in vec_slice #696

Merged
merged 7 commits into from Dec 9, 2019
Merged

Support Altvec objects in vec_slice #696

merged 7 commits into from Dec 9, 2019

Conversation

jimhester
Copy link
Member

This dispatches to ALTVEC_EXTRACT_SUBSET for altvec objects,
falling back to the normal slicing code if the Altvec object does not
declare a extract_subset method.

For versions of R prior to the introduction of Altvec (< 3.5.0) stub
macros are defined resulting in a dead conditional, which should be
removed by dead code optimizer in the compiler.

@jimhester
Copy link
Member Author

Adding a test is unfortunately challenging, as AFAIK base R does not include an altrep class that has a custom extract_subset method. However I did confirm this works well with vroom altrep vectors and the devel versions of tibble and dplyr, which use vec_slice().

library(vroom)
x <- vroom(vroom_example("mtcars.csv"), altrep = TRUE)[1:3]
#> Observations: 32
#> Variables: 12
#> chr [ 1]: model
#> dbl [11]: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#> 
#> Call `spec()` for a copy-pastable column specification
#> Specify the column types with `col_types` to quiet this message
vroom_str(x)
#> 'tbl_df', 'tbl', and 'data.frame': 32 obs., 3 vars.:
#> $model:  altrep:true type:vroom::vroom_chr   length:32   materialized:false
#> $mpg:    altrep:true type:vroom::vroom_dbl   length:32   materialized:false
#> $cyl:    altrep:true type:vroom::vroom_dbl   length:32   materialized:false

# calling directly
vroom_str(vctrs::vec_slice(x[[1]], 1:5))
#> altrep:true  type:vroom::vroom_chr   length:5    materialized:false

# implicitly by dev tibble
vroom_str(x[x$cyl == 4, ])
#> 'tbl_df', 'tbl', and 'data.frame': 11 obs., 3 vars.:
#> $model:  altrep:true type:vroom::vroom_chr   length:11   materialized:false
#> $mpg:    altrep:true type:vroom::vroom_dbl   length:11   materialized:false
#> $cyl:    altrep:false    type: double    length:11

# implicitly by dev dplyr
vroom_str(dplyr::filter(x, cyl == 4))
#> 'tbl_df', 'tbl', and 'data.frame': 11 obs., 3 vars.:
#> $model:  altrep:true type:vroom::vroom_chr   length:11   materialized:false
#> $mpg:    altrep:true type:vroom::vroom_dbl   length:11   materialized:false
#> $cyl:    altrep:false    type: double    length:11

Created on 2019-12-06 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       macOS Mojave 10.14.6        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2019-12-06                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version      date       lib source                           
#>  assertthat    0.2.1        2019-03-21 [1] CRAN (R 3.6.0)                   
#>  backports     1.1.5        2019-10-02 [1] CRAN (R 3.6.0)                   
#>  callr         3.3.2        2019-09-22 [1] CRAN (R 3.6.0)                   
#>  cli           1.1.0        2019-03-19 [1] CRAN (R 3.6.0)                   
#>  crayon        1.3.4        2017-09-16 [1] CRAN (R 3.6.0)                   
#>  desc          1.2.0        2018-05-01 [1] CRAN (R 3.6.0)                   
#>  devtools      2.2.1.9000   2019-10-03 [1] local                            
#>  digest        0.6.23       2019-11-23 [1] CRAN (R 3.6.0)                   
#>  dplyr         0.8.99.9000  2019-12-06 [1] Github (tidyverse/dplyr@4304c13) 
#>  ellipsis      0.3.0.9000   2019-12-02 [1] Github (r-lib/ellipsis@6f84df4)  
#>  evaluate      0.14         2019-05-28 [1] CRAN (R 3.6.0)                   
#>  fs            1.3.1        2019-05-06 [1] CRAN (R 3.6.0)                   
#>  glue          1.3.1        2019-03-12 [1] CRAN (R 3.6.0)                   
#>  highr         0.8          2019-03-20 [1] CRAN (R 3.6.0)                   
#>  htmltools     0.4.0        2019-10-04 [1] CRAN (R 3.6.0)                   
#>  knitr         1.26         2019-11-12 [1] CRAN (R 3.6.0)                   
#>  lobstr        1.1.1        2019-07-02 [1] CRAN (R 3.6.0)                   
#>  magrittr      1.5          2014-11-22 [1] CRAN (R 3.6.0)                   
#>  memoise       1.1.0        2017-04-21 [1] CRAN (R 3.6.0)                   
#>  nvimcom     * 0.9-75       2019-10-16 [1] local                            
#>  pillar        1.4.2.9001   2019-12-06 [1] Github (r-lib/pillar@82370d7)    
#>  pkgbuild      1.0.6        2019-10-09 [1] CRAN (R 3.6.0)                   
#>  pkgconfig     2.0.3        2019-09-22 [1] CRAN (R 3.6.0)                   
#>  pkgload       1.0.2        2018-10-29 [1] CRAN (R 3.6.0)                   
#>  prettyunits   1.0.2        2015-07-13 [1] CRAN (R 3.6.0)                   
#>  processx      3.4.1        2019-07-18 [1] CRAN (R 3.6.0)                   
#>  ps            1.3.0        2018-12-21 [1] CRAN (R 3.6.0)                   
#>  purrr         0.3.3        2019-10-18 [1] CRAN (R 3.6.0)                   
#>  R6            2.4.1        2019-11-12 [1] CRAN (R 3.6.0)                   
#>  Rcpp          1.0.3        2019-11-08 [1] CRAN (R 3.6.0)                   
#>  remotes       2.1.0        2019-06-24 [1] CRAN (R 3.6.0)                   
#>  rlang         0.4.2        2019-11-23 [1] CRAN (R 3.6.0)                   
#>  rmarkdown     1.18         2019-11-27 [1] CRAN (R 3.6.1)                   
#>  rprojroot     1.3-2        2018-01-03 [1] CRAN (R 3.6.0)                   
#>  sessioninfo   1.1.1        2018-11-05 [1] CRAN (R 3.6.0)                   
#>  stringi       1.4.3        2019-03-12 [1] CRAN (R 3.6.0)                   
#>  stringr       1.4.0        2019-02-10 [1] CRAN (R 3.6.0)                   
#>  testthat      2.3.1        2019-12-01 [1] CRAN (R 3.6.1)                   
#>  tibble        2.99.99.9010 2019-12-06 [1] Github (tidyverse/tibble@f4365f7)
#>  tidyselect    0.2.5        2018-10-11 [1] CRAN (R 3.6.0)                   
#>  usethis       1.5.1.9000   2019-12-02 [1] Github (r-lib/usethis@c7314cf)   
#>  vctrs         0.2.0.9007   2019-12-06 [1] local                            
#>  vroom       * 1.0.2        2019-06-28 [1] CRAN (R 3.6.0)                   
#>  withr         2.1.2        2018-03-15 [1] CRAN (R 3.6.0)                   
#>  xfun          0.11         2019-11-12 [1] CRAN (R 3.6.0)                   
#>  yaml          2.2.0        2018-07-25 [1] CRAN (R 3.6.0)                   
#>  zeallot       0.1.0        2018-01-28 [1] CRAN (R 3.6.0)                   
#> 
#> [1] /Users/jhester/Library/R/3.6/library
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

@jimhester
Copy link
Member Author

This is failing on linux because ALTVEC_EXTRACT_SUBSET() has attribute_hidden (https://github.com/wch/r-source/blob/bfe73ecd848198cb9b68427cec7e70c40f96bd72/src/main/altrep.c#L361), I am not sure why as the other methods have this commented out...

@jimhester
Copy link
Member Author

Ok I have added a workaround by defining a proxy ALTVEC_EXTRACT_SUBSET() method. Not ideal, and we will have to keep the definitions in altrep.h in sync with R-core, but it works.

I may also post on R-devel asking why this function in particular is hidden, when others are not.

@lionel-
Copy link
Member

lionel- commented Dec 6, 2019

Do we need checks against upstream changes to the header? I guess they can't really make changes to the struct because that would break ABI compatibility (across versions but also with serialised objects)?

@jimhester
Copy link
Member Author

They might be able to add fields to the struct without breaking compatibility, but even if they do this code should still work. It would be safest to have a check of some sort, but I am not sure what we could do, did you have something in mind?

@lionel-
Copy link
Member

lionel- commented Dec 9, 2019

I agree the code should be forward-compatible if they only add fields to the method table struct.

I was thinking about creating a small altrep class and check that it's sliced properly. But eventually we'll have altrep sequences, repetitions, and RLE vectors in the package, so slicing of altrep vectors should be covered by unit tests sooner or later. No worries if you don't have time to write a test class.

@jimhester
Copy link
Member Author

I have a RLE implementation in vroom that I could move over here, I have to convert it to C, but all the logic is already C anyway, so it shouldn't take too long.

@lionel-
Copy link
Member

lionel- commented Dec 9, 2019

oh that'd be great! It'd give us a good basis for experimenting with RLE classes (#332).

This dispatches to ALTVEC_EXTRACT_SUBSET for altvec objects,
falling back to the normal slicing code if the Altvec object does not
declare a extract_subset method.

For versions of R prior to the introduction of Altvec (< 3.5.0) stub
macros are defined resulting in a dead conditional, which should be
removed by dead code optimizer in the compiler.
Since this method is hidden on linux :/
Use this class to test slicing of altrep objects
@jimhester
Copy link
Member Author

Ok I added the RLE I used in vroom, the constructor takes a named integer vector to create a RLE character vector. e.g.

.Call(vctrs_rle, c(foo = 10L, bar = 5L))

Will create a length 15 character vector, with the first 10 elements as foo and the last 5 as bar.

The subset method is not very sophisticated, it just returns a normal vector for the given indices. One could imagine a subset that returned a new RLE, but I think this is enough for the test.

@lionel- lionel- merged commit a3c4b54 into master Dec 9, 2019
@lionel-
Copy link
Member

lionel- commented Dec 9, 2019

Thanks!

@lionel- lionel- deleted the vec_slice-altvec branch December 9, 2019 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants