dplyr creates deep copies of data.tables #614

Closed
matthieugomez opened this issue Sep 20, 2014 · 6 comments

@matthieugomez

From the documentation of dplyr, "mutate() never copies columns, except when you modify an existing column". But this is not true for data.tables.

library(pryr)
library(devtools)
install_github("hadley/lazyeval")
install_github("hadley/dplyr")
install_github("Rdatatable/data.table", build_vignettes = FALSE)
library(dplyr)
library(data.table)

N <- 1e8; K <- 100
set.seed(1)
DF <- data.frame(
  id = 1:N,
  v1 = sample(round(runif(100, max = 100), 4), N, TRUE)
)
object_size(DF)
#> 1.2 GB
DF1 <- DF %>% mutate(y = mean(v1))
object_size(DF, DF1)
#> 2 GB

For a data.table object, however, the same operation duplicates the existing columns:

N <- 1e8; K <- 100
set.seed(1)
DT <- data.table(
  id = 1:N,
  v1 = sample(round(runif(100, max = 100), 4), N, TRUE)
)
object_size(DT)
#> 1.2 GB
DT1 <- DT %>% mutate(y = mean(v1))
object_size(DT, DT1)
#> 3.2 GB

The function copy() called inside the code for mutate seems to make a deep copy rather than a shallow copy.
Maybe a shallow copy could be used instead. I don't know the right way to do it, but this seems to work:

setDF(DT)
DT1 <- DT
setDT(DT)
setDT(DT1)
DT1[, y := mean(v1)]
object_size(DT, DT1)
#> 2 GB
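
A cheaper way to check the claim above than measuring object sizes at N = 1e8 is to compare column memory addresses on a small table. The following is only a minimal sketch, not the dplyr implementation: it calls data.table's copy() directly, which is what the report says mutate uses internally, and `shares_column` is a hypothetical helper name introduced here for illustration.

```r
library(data.table)

# Small table so the check runs instantly; the report above uses N = 1e8.
DT <- data.table(id = 1:5, v1 = c(0.5, 1.5, 2.5, 3.5, 4.5))

# copy() makes a deep copy: every column vector is duplicated.
deep <- copy(DT)
deep[, y := mean(v1)]  # add a column by reference to the copy only

# Hypothetical helper: TRUE if column `col` lives at the same
# memory address in both tables, i.e. the column is shared.
shares_column <- function(a, b, col) {
  identical(address(a[[col]]), address(b[[col]]))
}

# FALSE here means the existing column was duplicated, not shared.
shares_column(DT, deep, "v1")
```

If a shallow copy were used instead, the same check on an untouched column would be expected to return TRUE, since only the list of column pointers would be duplicated.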
@matthieugomez matthieugomez changed the title Mutate creates deep copies of data.tables Mutate and select create deep copies of data.tables Sep 21, 2014
@matthieugomez matthieugomez changed the title Mutate and select create deep copies of data.tables dplyr creates deep copies of data.tables Sep 21, 2014
@hadley
Member

hadley commented Sep 22, 2014

Oh hmmm, it seems to me like data.table::copy() should make a shallow copy, not a deep copy.

@arunsrinivasan any thoughts on the best way to do a shallow copy in dt?

@hadley hadley added the bug an unexpected problem or unintended behavior label Sep 22, 2014
@hadley hadley added this to the 0.3.1 milestone Sep 22, 2014
@hadley hadley self-assigned this Sep 22, 2014
@arunsrinivasan
Contributor

@hadley, we have a shallow() function in data.table, but it isn't exported yet. We'll need to add some extra handling so that sub-assignment by reference works properly on shallow-copied data.tables. Once that's done and the function is exported, I'll write back here (or as a pull request?).

@hadley
Member

hadley commented Sep 23, 2014

@arunsrinivasan a pull request would be awesome!

@hadley
Member

hadley commented Nov 18, 2014

@arunsrinivasan any change on this? If not, I'll put off until the next release.

@arunsrinivasan
Contributor

Not yet, sorry. Likely in 1.9.8. Will write back.

@hadley hadley modified the milestones: 0.4, 0.3.1 Nov 18, 2014
@krlmlr
Member

krlmlr commented Nov 12, 2015

@arunsrinivasan shallow() doesn't seem to be exported yet in the CRAN version (1.9.6). Any prospects?

Sorry: You said 1.9.8 -- I guess it's in the pipeline then?

@hadley hadley closed this as completed Mar 8, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018