Conflicting which for compressed formats with multiple sheets #412

Closed
chainsawriot opened this issue May 14, 2024 · 1 comment
@chainsawriot
Collaborator

chainsawriot commented May 14, 2024

Of course, one could ask why anyone would use a compressed format with multiple sheets in the first place, e.g. xlsx.zip. But a bug is a bug.

The issue is that the which parameter of import() is used twice: first for selecting a file in the archive, and second for selecting a sheet.

rio/R/import.R

Line 131 in c86db70

file <- parse_archive(file, which = which, file_type = "zip")

rio/R/import.R

Line 156 in c86db70

x <- .import(file = file, which = which, ...)

To avoid making things more complicated (such as introducing new parameters for such an edge case), my suggestion is simply to define some precedence rules.
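To make the idea concrete, here is a minimal sketch of what such precedence rules could look like. `route_which()` and its return value are hypothetical and not part of rio. The rules themselves are only an assumption about one sensible ordering: a `which` that names a file in the archive selects that file; otherwise, if the archive holds a single file, `which` selects the sheet; otherwise a numeric `which` selects the file.

```r
# Hypothetical helper (not rio's API): route a single `which` value either to
# archive-member selection or to sheet selection.
# `members` is the character vector of file names inside the archive.
route_which <- function(which, members) {
  stopifnot(length(which) == 1L, length(members) >= 1L)
  # Rule 1: a name that matches a file in the archive selects that file.
  if (is.character(which) && which %in% members) {
    return(list(member = which, sheet = 1))
  }
  # Rule 2: with only one file there is nothing to choose between,
  # so `which` is taken to mean the sheet.
  if (length(members) == 1L) {
    return(list(member = members[1], sheet = which))
  }
  # Rule 3: several files and a numeric `which` selects the file by position.
  if (is.numeric(which)) {
    return(list(member = members[which], sheet = 1))
  }
  # Fallback: an unmatched name is treated as a sheet name in the first file.
  list(member = members[1], sheet = which)
}

route_which(2, "data.xlsx")             # single file: second sheet
route_which("data.xlsx", "data.xlsx")   # file name: that file, first sheet
route_which(2, c("a.csv", "b.csv"))     # several files: second file
```

Under these assumed rules, `which = 2` on a single-file xlsx.zip would reach the second sheet, and passing the archived file name would no longer be interpreted as a sheet name.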

zip_file <- tempfile(fileext = ".xlsx.zip")

rio::export(head(iris), zip_file)

raw_file <- utils::unzip(zip_file, list = TRUE)$Name[1]

rio::import(zip_file)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

## this is fine-ish, I guess?
rio::import(zip_file, which = "aaaa.xlsx")
#> Warning in extract_func(file, files = file_list[grep(which2, file_list)[1]], :
#> requested file not found in the zip file
#> Error: `path` does not exist: '/tmp/RtmpH9K6ta/file831fb50f53589/aaaa.xlsx'

rio::import(zip_file, which = raw_file)
#> Error: Sheet 'file831fb5a3e85e.xlsx' not found

## a more illustrative example

zip_file2 <- tempfile(fileext = ".xlsx.zip")

rio::export(list(first_sheet = head(iris), second_sheet = tail(iris)), zip_file2)

xlsx_file <- tempfile(fileext = ".xlsx")

rio::export(list(first_sheet = head(iris), second_sheet = tail(iris)), xlsx_file)

rio::import(zip_file2, which = 1)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
rio::import(zip_file2, which = 2)
#> Warning in extract_func(file, files = file_list[which], exdir = d): requested
#> file not found in the zip file
#> Error: 'file' has no extension

rio::import(xlsx_file, which = 1)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
rio::import(xlsx_file, which = 2)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 1          6.7         3.3          5.7         2.5 virginica
#> 2          6.7         3.0          5.2         2.3 virginica
#> 3          6.3         2.5          5.0         1.9 virginica
#> 4          6.5         3.0          5.2         2.0 virginica
#> 5          6.2         3.4          5.4         2.3 virginica
#> 6          5.9         3.0          5.1         1.8 virginica
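Since importing the plain xlsx file with `which = 2` works, a possible workaround in the meantime is to extract the archive manually and import the extracted file, so that `which` is only ever interpreted as a sheet. This is a sketch that assumes the archive contains exactly one xlsx file and that rio and its xlsx backends are installed; the object names are made up for illustration.

```r
# Workaround sketch: unzip first, then let `which` select the sheet only.
zip_path <- tempfile(fileext = ".xlsx.zip")
rio::export(list(first_sheet = head(iris), second_sheet = tail(iris)), zip_path)

# Extract into a fresh directory; utils::unzip() returns the extracted paths.
extract_dir <- tempfile("unzipped_")
dir.create(extract_dir)
extracted <- utils::unzip(zip_path, exdir = extract_dir)

# Assumption: the archive holds a single xlsx file, so take the first path.
second_sheet <- rio::import(extracted[1], which = 2)
```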

Created on 2024-05-14 with reprex v2.1.0

@chainsawriot
Collaborator Author

ref #400
