On retaining class data.table (or not) #58

Closed
Henrik-P opened this issue Mar 21, 2018 · 2 comments

Henrik-P commented Mar 21, 2018

After answering "How to retain data.table class when piped through dplyr::filter?" on SO, I thought some observations might be worth mentioning here.

Create a data.table:

library(data.table)
d <- data.table(mtcars)
class(d)
# "data.table" "data.frame"

(1) Apply dplyr verbs directly on the data.table

  • class data.table is retained when using select

    library(magrittr)
    d %>% dplyr::select(hp, mpg) %>% class
    # "data.table" "data.frame"
    
  • class data.table is not retained when using filter

    d %>% dplyr::filter(hp > 100) %>% class
    # "data.frame"
    

First, I realize that this is most likely not 'the correct' way of using dplyr verbs on a data.table (hence the dtplyr package), but this was the premise of the SO question. Still, the .data argument and Value are described in the same way in ?filter and ?select, so from this information alone it's hard to tell why .data of class data.table is treated differently by the two functions.
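
One way to probe this kind of difference is to ask R whether a class-specific S3 method is registered for a generic. Below is a minimal base-R sketch, where `has_s3_method` is a hypothetical helper name and the `print` calls only stand in for `filter`/`select`. With dplyr attached, the same helper could presumably be pointed at "filter" and "data.table", and the answer would depend on which packages are loaded:

```r
# Hypothetical helper (not part of dplyr or dtplyr): TRUE if an S3 method
# for `generic` exists for exactly this `class`.
has_s3_method <- function(generic, class) {
  # optional = TRUE makes getS3method() return NULL instead of raising an error
  !is.null(utils::getS3method(generic, class, optional = TRUE))
}

found   <- has_s3_method("print", "data.frame")     # base R ships print.data.frame
missing <- has_s3_method("print", "no_such_class")  # nothing registered for this class

found    # TRUE
missing  # FALSE
```

If no class-specific method exists, dispatch falls through to the next class in the object's class vector, here "data.frame".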

(2) I suppose the correct dtplyr way would be to explicitly convert the data.table to a "data table tbl" using tbl_dt to retain class data.table:

library(dtplyr)
d %>% tbl_dt() %>% dplyr::select(hp, mpg) %>% class
# [1] "tbl_dt"     "tbl"        "data.table" "data.frame"

d %>% tbl_dt() %>% dplyr::filter(hp > 100) %>% class
# [1] "tbl_dt"     "tbl"        "data.table" "data.frame"

(3) However, merely loading dtplyr is enough to retain class data.table, which was a bit unexpected.

d %>% dplyr::select(hp, mpg) %>% class
# [1] "data.table" "data.frame"

d %>% dplyr::filter(hp > 100) %>% class
# [1] "data.table" "data.frame"

hadley commented May 14, 2018

I don't think 3) should be too unexpected - where better for the filter.data.table method to live?
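
The mechanism can be illustrated with a base-R sketch of S3 dispatch. Everything here is hypothetical (the generic `keep_rows`, the class `my_table`). It is not dplyr's or dtplyr's actual code, only an illustration of how defining a more specific method, as loading a package can do, changes the class of the result:

```r
# Hypothetical generic standing in for a verb such as filter()
keep_rows <- function(.data, cond) UseMethod("keep_rows")

# Fallback method: returns a plain data.frame, so any subclass is lost
keep_rows.data.frame <- function(.data, cond) {
  out <- .data[cond, , drop = FALSE]
  class(out) <- "data.frame"
  out
}

# Hypothetical subclass standing in for data.table
d <- structure(data.frame(hp = c(90, 110, 150)),
               class = c("my_table", "data.frame"))

# Only the data.frame method exists, so dispatch falls through to it
before <- class(keep_rows(d, d$hp > 100))
before  # "data.frame"

# Defining a class-specific method mimics what loading a package can do
keep_rows.my_table <- function(.data, cond) {
  out <- NextMethod()        # reuse the data.frame method
  class(out) <- class(.data) # then restore the original class vector
  out
}

after <- class(keep_rows(d, d$hp > 100))
after  # "my_table" "data.frame"
```

The call itself is unchanged between the two runs; only the set of available methods differs.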

Henrik-P commented

I assume you are right :) For some reason I thought an explicit conversion to a "data table tbl" was necessary to retain the data.table class.
