Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checking of data.table in ... vs setclass #336

Closed
chainsawriot opened this issue Sep 3, 2023 · 9 comments
Closed

checking of data.table in ... vs setclass #336

chainsawriot opened this issue Sep 3, 2023 · 9 comments
Assignees

Comments

@chainsawriot
Copy link
Collaborator

rio/R/import.R

Lines 141 to 158 in 067a0ae

if (missing(setclass) || is.null(setclass)) {
if ("data.table" %in% names(args_list) && isTRUE(args_list[["data.table"]])) {
return(set_class(x, class = "data.table"))
} else {
return(set_class(x, class = "data.frame"))
}
} else {
if ("data.table" %in% names(args_list) && isTRUE(args_list[["data.table"]])) {
if (setclass != "data.table") {
warning(sprintf("'data.table = TRUE' argument overruled. Using setclass = '%s'", setclass))
return(set_class(x, class = setclass))
} else {
return(set_class(x, class = "data.table"))
}
} else {
return(set_class(x, class = setclass))
}
}

rio/R/import_list.R

Lines 128 to 147 in 067a0ae

a <- list(...)
if (is.null(setclass)) {
if ("data.table" %in% names(a) && isTRUE(a[["data.table"]])) {
x <- set_class(x, class = "data.table")
} else {
x <- set_class(x, class = "data.frame")
}
} else {
if ("data.table" %in% names(a) && isTRUE(a[["data.table"]])) {
if (setclass != "data.table") {
warning(sprintf("'data.table = TRUE' argument overruled. Using setclass = '%s'", setclass))
x <- set_class(x, class = setclass)
} else {
x <- set_class(x, class = "data.table")
}
} else {
x <- set_class(x, class = setclass)
}
}
}

It's kind of out of context to do this inside import() and import_list()

@chainsawriot chainsawriot changed the title Put the checking of data.table in ... to setclass Put the checking of data.table in ... into setclass Sep 3, 2023
@chainsawriot
Copy link
Collaborator Author

As a background, data.table is actually a parameter of data.table:fread() and default to TRUE (at least when one doesn't set the options). BTW, readODS::read_ods() v2.0 now has as_tibble and default to TRUE. If data.table is respected, so should as_tibble.

The checking in import() and import_list() here basically is making data.table a de facto undocumented feature even for the importing function not using data.table::fread(), see the ods case (which use readODS::read_ods(), and the warning is confusing)

x1 <- tempfile(fileext = ".csv")
rio::export(mtcars, file = x1)
y1 <- rio::import(x1, data.table = TRUE)
class(y1)
#> [1] "data.table" "data.frame"

x2 <- tempfile(fileext = ".ods")
rio::export(mtcars, file = x2)
y2 <- rio::import(x2, data.table = TRUE)
#> Warning in .import.rio_ods(file = file, ...): The following arguments were ignored for read_ods:
#> data.table
class(y2)
#> [1] "data.table" "data.frame"

Created on 2023-09-04 with reprex v2.0.2

What I wanted to argue here is that it is almost impossible to chase and override all of them. How about we just let setclass to always have the last say.

@chainsawriot
Copy link
Collaborator Author

#326

@chainsawriot
Copy link
Collaborator Author

chainsawriot commented Sep 4, 2023

Another option is to ditch setclass altogether and make data.table and tibble formal arguments of import() and import_form(). If both FALSE, get a vanilla data.frame. Probably can't be both TRUE.

I can imagine a lot of code has been written with setclass

e.g.

datalorax/equatiomatic#39 (comment)

paul-buerkner/brms#847 (comment)

And it's probably a better idea to get rid of an undocumented feature (the overridding data.table) than to get rid of a documented feature (setclass).

@chainsawriot chainsawriot changed the title Put the checking of data.table in ... into setclass checking of data.table in ... into setclass Sep 4, 2023
@chainsawriot chainsawriot changed the title checking of data.table in ... into setclass checking of data.table in ... vs setclass Sep 4, 2023
@schochastics
Copy link
Member

Here is my opinion after looking at the code (but without knowing how people tend to use the function, and not knowing much of the history of rio yet):
The return type should be explicitly specified with the setclass parameter which should default to data.frame (could also be a tibble) and it should not be encouraged to change the class with special parameters in ..., i.e. setclass is authoritative. Not sure how much this would break for people, but I think it would be good to have a parameter dedicated to the returned class.

@chainsawriot
Copy link
Collaborator Author

setclass = getOption("rio.import.class", "data.frame") is probably a better default. Missing as default to export "data.frame" is not expressive. (NULL to bypass setclass is not documented too)

So that one can set the default class: options(rio.import.class = "data.table") or option(rio.import.class = "tibble"). If one wants to be explicit: setclass = "tibble".

And ditch the checking of data.table (or as_tibble or whatever) in ... Say it explicitly in the doc that setclass always has the final say. Demo it in the examples.

@schochastics
Copy link
Member

yes getOption is even better. I definitely like this approach

@schochastics schochastics self-assigned this Sep 6, 2023
@chainsawriot
Copy link
Collaborator Author

43d544b

rio/R/import.R

Lines 134 to 136 in 81c0423

if (inherits(file, c("rio_rdata", "rio_rds", "rio_json"))) {
return(x)
}

This check is weird (rds, rdata, and json to skip setclass en masse). If the output is data.frame, I think it should still go through setclass.

What should be checked is x instead.

@chainsawriot
Copy link
Collaborator Author

chainsawriot commented Sep 8, 2023

#183

rdata, rds, json, gs, r, dump, yaml can be arbitrary R objects.

@schochastics
Copy link
Member

schochastics commented Sep 8, 2023

43d544b

rio/R/import.R

Lines 134 to 136 in 81c0423

if (inherits(file, c("rio_rdata", "rio_rds", "rio_json"))) {
return(x)
}

This check is weird (rds, rdata, and json to skip setclass en masse). If the output is data.frame, I think it should still go through setclass.

What should be checked is x instead.

Another place where I forgot qs?

schochastics added a commit to schochastics/rio that referenced this issue Sep 8, 2023
chainsawriot pushed a commit that referenced this issue Sep 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants