Skip to content

Commit

Permalink
add tweaks for #409 (#411)
Browse files Browse the repository at this point in the history
* Documentation (in import() as well as not using serialization
formats in the overview vignette)
* Correct the deprecating warning
* Add an undocumented option: rio.import.trust. We don't need to
advertise this to encourage users to override this.
  • Loading branch information
chainsawriot committed May 13, 2024
1 parent 4b99765 commit c86db70
Show file tree
Hide file tree
Showing 9 changed files with 106 additions and 66 deletions.
18 changes: 15 additions & 3 deletions R/import.R
Original file line number Diff line number Diff line change
Expand Up @@ -28,9 +28,9 @@
#' \item Stata (.dta), using [haven::read_dta()]
#' \item SPSS Portable Files (.por), using [haven::read_por()].
#' \item Excel (.xls and .xlsx), using [readxl::read_xlsx()] or [readxl::read_xls()]. Use `which` to specify a sheet number.
#' \item R syntax object (.R), using [base::dget()]
#' \item Saved R objects (.RData,.rda), using [base::load()] for single-object .Rdata files. Use `which` to specify an object name for multi-object .Rdata files. This can be any R object (not just a data frame).
#' \item Serialized R objects (.rds), using [base::readRDS()]. This can be any R object (not just a data frame).
#' \item R syntax object (.R), using [base::dget()], see `trust` below.
#' \item Saved R objects (.RData,.rda), using [base::load()] for single-object .Rdata files. Use `which` to specify an object name for multi-object .Rdata files. This can be any R object (not just a data frame), see `trust` below.
#' \item Serialized R objects (.rds), using [base::readRDS()]. This can be any R object (not just a data frame), see `trust` below.
#' \item Serialized R objects (.qs), using [qs::qread()], which is
#' significantly faster than .rds. This can be any R
#' object (not just a data frame).
Expand Down Expand Up @@ -62,7 +62,11 @@
#'
#' After importing metadata-rich file formats (e.g., from Stata or SPSS), it may be helpful to recode labelled variables to character or factor using [characterize()] or [factorize()] respectively.
#'
#' For serialization formats (.R, .RDS, and .RData), please note that you should only load these files from trusted sources. It is because these formats are not necessarily for storing rectangular data and can also be used to store many things, e.g. code. Importing these files could lead to arbitary code execution. Please read the security principles by the R Project (Plummer, 2024). When importing these files via `rio`, you should affirm that you trust these files, i.e. `trust = TRUE`. See example below. If this affirmation is missing, the current version assumes `trust` to be true for backward compatibility and a deprecation notice will be printed. In the next major release (2.0.0), you must explicitly affirm your trust when importing these files.
#'
#' @note For csv and txt files with row names exported from [export()], it may be helpful to specify `row.names` as the column of the table which contain row names. See example below.
#' @references
#' Plummer, M (2024). Statement on CVE-2024-27322. [https://blog.r-project.org/2024/05/10/statement-on-cve-2024-27322/](https://blog.r-project.org/2024/05/10/statement-on-cve-2024-27322/)
#' @examples
#' ## For demo, a temp. file path is created with the file extension .csv
#' csv_file <- tempfile(fileext = ".csv")
Expand Down Expand Up @@ -99,6 +103,14 @@
#'
#' ## the default import class can be set with options(rio.import.class = "data.table")
#' ## options(rio.import.class = "tibble"), or options(rio.import.class = "arrow")
#'
#' ## Security
#' rds_file <- tempfile(fileext = ".rds")
#' export(iris, rds_file)
#'
#' ## You should only import serialized formats from trusted sources
#' ## In this case, you can trust it because it's generated by you.
#' import(rds_file, trust = TRUE)
#' @seealso [import_list()], [characterize()], [gather_attrs()], [export()], [convert()]
#' @export
import <- function(file, format, setclass = getOption("rio.import.class", "data.frame"), which, ...) {
Expand Down
10 changes: 5 additions & 5 deletions R/import_methods.R
Original file line number Diff line number Diff line change
Expand Up @@ -124,13 +124,13 @@ import_delim <- function(file, which = 1, sep = "auto", header = "auto", strings
}

#' @export
.import.rio_r <- function(file, which = 1, trust = TRUE, ...) {
.import.rio_r <- function(file, which = 1, trust = getOption("rio.import.trust", default = NULL), ...) {
.check_trust(trust, format = ".R")
.docall(dget, ..., args = list(file = file))
}

#' @export
.import.rio_dump <- function(file, which = 1, envir = new.env(), trust = TRUE, ...) {
.import.rio_dump <- function(file, which = 1, envir = new.env(), trust = getOption("rio.import.trust", default = NULL), ...) {
.check_trust(trust, format = "dump")
source(file = file, local = envir)
if (missing(which)) {
Expand All @@ -147,13 +147,13 @@ import_delim <- function(file, which = 1, sep = "auto", header = "auto", strings
}

#' @export
.import.rio_rds <- function(file, which = 1, trust = TRUE, ...) {
.check_trust(trust, format = "Rds")
.import.rio_rds <- function(file, which = 1, trust = getOption("rio.import.trust", default = NULL), ...) {
.check_trust(trust, format = "RDS")
readRDS(file = file)
}

#' @export
.import.rio_rdata <- function(file, which = 1, envir = new.env(), trust = TRUE, ...) {
.import.rio_rdata <- function(file, which = 1, envir = new.env(), trust = getOption("rio.import.trust", default = NULL), ...) {
.check_trust(trust, format = "RData")
load(file = file, envir = envir)
if (missing(which)) {
Expand Down
15 changes: 8 additions & 7 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -138,14 +138,15 @@ escape_xml <- function(x, replacement = c("&amp;", "&quot;", "&lt;", "&gt;", "&a
}

.check_trust <- function(trust, format) {
lifecycle::deprecate_warn(
when = "2.0.0",
what = "import(trust)",
details = paste0("Trust will be set to FALSE by default for ", format, ".")
)
if (is.null(trust)) {
lifecycle::deprecate_warn(
when = "1.0.3",
what = "import(trust = 'should be explicit for serialization formats')",
details = paste0("Missing `trust` will be set to FALSE by default for ", format, " in 2.0.0."))
trust <- TRUE ## Change this for version 2.0.0
}
if (isFALSE(trust)) {
stop(format, "files may execute arbitary code. Only load", format, "files that you personally generated or can trust the origin.", call. = FALSE)
stop(format, " files may execute arbitary code. Only load ", format, " files that you personally generated or can trust the origin.", call. = FALSE)
}
NULL
}

19 changes: 16 additions & 3 deletions man/import.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 0 additions & 10 deletions tests/testthat/test_format_R.R
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,3 @@ test_that("Export / Import to .R dump file", {
expect_true(is.data.frame(import(iris_file)))
})
})

test_that("Deprecation of untrusted dump", {
withr::with_tempfile("iris_file", fileext = ".dump", code = {
export(iris, iris_file)
## expect deprecation to work
lifecycle::expect_deprecated(import(iris_file), regexp = "set to FALSE by default")
## expect false to error
expect_error(import(iris_file, trust = FALSE))
})
})
7 changes: 0 additions & 7 deletions tests/testthat/test_format_rdata.R
Original file line number Diff line number Diff line change
Expand Up @@ -26,13 +26,6 @@ test_that("Export to and import from Rdata", {
## expect error otherwise
expect_error(export(iris$Species, iris_file))
})
withr::with_tempfile("iris_file", fileext = ".Rdata", code = {
export(iris, iris_file)
## expect deprecation to work
lifecycle::expect_deprecated(import(iris_file), regexp = "set to FALSE by default")
## expect false to error
expect_error(import(iris_file, trust = FALSE))
})
})

test_that("Export to and import from rda", {
Expand Down
11 changes: 0 additions & 11 deletions tests/testthat/test_format_rds.R
Original file line number Diff line number Diff line change
Expand Up @@ -14,14 +14,3 @@ test_that("Export to rds (non-data frame)", {
expect_true(length(import(list_file)) == 2L)
})
})

test_that("Deprecation of untrusted rds", {
withr::with_tempfile("iris_file", fileext = ".rds", code = {
export(iris, iris_file)
## expect deprecation to work
lifecycle::expect_deprecated(import(iris_file), regexp = "set to FALSE by default")
## expect false to error
expect_error(import(iris_file, trust = FALSE))
})
})

61 changes: 61 additions & 0 deletions tests/testthat/test_trust.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
test_that("Deprecation of untrusted dump", {
withr::with_tempfile("iris_file", fileext = ".dump", code = {
export(iris, iris_file)
## expect deprecation to work
lifecycle::expect_deprecated(import(iris_file), regexp = "set to FALSE by default")
## expect false to error
expect_error(import(iris_file, trust = FALSE))
})
})

test_that("Deprecation of untrusted Rdata", {
withr::with_tempfile("iris_file", fileext = ".Rdata", code = {
export(iris, iris_file)
## expect deprecation to work
lifecycle::expect_deprecated(import(iris_file), regexp = "set to FALSE by default")
## expect false to error
expect_error(import(iris_file, trust = FALSE))
})
})

test_that("Deprecation of untrusted rds", {
withr::with_tempfile("iris_file", fileext = ".rds", code = {
export(iris, iris_file)
## expect deprecation to work
lifecycle::expect_deprecated(import(iris_file), regexp = "set to FALSE by default")
## expect false to error
expect_error(import(iris_file, trust = FALSE))
})
})

test_that("No deprecation warning if `trust` is explicit", {
withr::with_tempfile("iris_file", fileext = ".rds", code = {
export(iris, iris_file)
expect_silent(import(iris_file, trust = TRUE))
expect_error(import(iris_file, trust = FALSE)) ## no warning right?
})
})

test_that("Undocumented feature, options", {
withr::with_options(list(rio.import.trust = TRUE), {
withr::with_tempfile("iris_file", fileext = ".rds", code = {
export(iris, iris_file)
expect_silent(import(iris_file))
expect_error(import(iris_file), NA)
})
})
withr::with_options(list(rio.import.trust = FALSE), {
withr::with_tempfile("iris_file", fileext = ".rds", code = {
export(iris, iris_file)
expect_error(import(iris_file))
})
})
})

test_that("`trust` wont cause problems for other import methods", {
withr::with_tempfile("iris_file", fileext = ".xlsx", code = {
export(iris, iris_file)
expect_silent(import(iris_file, trust = TRUE))
expect_error(import(iris_file, trust = FALSE), NA)
})
})
21 changes: 1 addition & 20 deletions vignettes/rio.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,6 @@ Additionally, any format that is not supported by **rio** but that has a known R
library("rio")
export(mtcars, "mtcars.csv")
export(mtcars, "mtcars.rds")
export(mtcars, "mtcars.dta")
export(mtcars, "mtcars_noext", format = "csv")
```
Expand All @@ -78,12 +77,10 @@ export(mtcars, "mtcars_noext", format = "csv")
library("rio")
x <- import("mtcars.csv")
y <- import("mtcars.rds")
z <- import("mtcars.dta")
y <- import("mtcars.dta")
# confirm identical
all.equal(x, y, check.attributes = FALSE)
all.equal(x, z, check.attributes = FALSE)
```

If for some reason a file does not have an extension, or has a file extension that does not match its actual type, you can manually specify a file format to override the format inference step. For example, we can read in a CSV file that does not have a file extension by specifying `csv`:
Expand All @@ -95,7 +92,6 @@ head(import("mtcars_noext", format = "csv"))

```{r, echo=FALSE, results='hide'}
unlink("mtcars.csv")
unlink("mtcars.rds")
unlink("mtcars.dta")
unlink("mtcars_noext")
```
Expand All @@ -110,7 +106,6 @@ str(import_list(dir()), 1)

Similarly, some single-file formats (e.g. Excel Workbooks, Zip directories, HTML files, etc.) can contain multiple data sets. Because `import()` is type safe, always returning a data frame, importing from these formats requires specifying a `which` argument to `import()` to dictate which data set (worksheet, file, table, etc.) to import (the default being `which = 1`). But `import_list()` can be used to import all (or only a specified subset, again via `which`) of data objects from these types of files.


## Data Export

The export capabilities of **rio** are somewhat more limited than the import capabilities, given the availability of different functions in various R packages and because import functions are often written to make use of data from other applications and it never seems to be a development priority to have functions to export to the formats used by other applications. That said, **rio** currently supports the following formats:
Expand All @@ -120,7 +115,6 @@ The export capabilities of **rio** are somewhat more limited than the import cap
library("rio")
export(mtcars, "mtcars.csv")
export(mtcars, "mtcars.rds")
export(mtcars, "mtcars.dta")
```

Expand All @@ -141,22 +135,12 @@ Some file formats (e.g., Excel workbooks, Rdata files) can support multiple data
export(list(mtcars = mtcars, iris = iris), "multi.xlsx")
```

```{r}
# export to an .Rdata file
## as a named list
export(list(mtcars = mtcars, iris = iris), "multi.rdata")
## as a character vector
export(c("mtcars", "iris"), "multi.rdata")
```

It is also possible to use the new (as of v0.6.0) function `export_list()` to write a list of data frames to multiple files using either a vector of file names or a file pattern:

```{r}
export_list(list(mtcars = mtcars, iris = iris), "%s.tsv")
```


## File Conversion

The `convert()` function links `import()` and `export()` by constructing a dataframe from the imported file and immediately writing it back to disk. `convert()` invisibly returns the file name of the exported file, so that it can be used to programmatically access the new file.
Expand Down Expand Up @@ -203,11 +187,8 @@ Rscript -e "rio::convert('mtcars.dta', 'mtcars.csv')"

```{r, echo=FALSE, results='hide'}
unlink("mtcars.csv")
unlink("mtcars.rds")
unlink("mtcars.rdata")
unlink("mtcars.dta")
unlink("multi.xlsx")
unlink("multi.rdata")
unlink("mtcars2.dta")
unlink("mtcars.tsv")
unlink("iris.tsv")
Expand Down

0 comments on commit c86db70

Please sign in to comment.