use_zip error in unzipping directory with a different name from the zipfile #1961

@burnsal

I am running into a missing value error trying to download and unzip large files with the use_zip function.

I traced the issue to the top_directory function, which the helper function tidy_unzip calls to find the top-level directory of the zip file. Here is the reprex with output included:

## usethis::use_zip() issue reprex

# Goal: download and unzip large files stored on a Figshare platform

# long link: https://agdatacommons.nal.usda.gov/ndownloader/files/44576230
# bitly link: https://bit.ly/ctdv2sp

# attempt with existing function
> usethis::use_zip("https://bit.ly/ctdv2sp",
                 destdir = "C:/temp")
✔ Downloading from 'https://bit.ly/ctdv2sp'
Downloaded: 312.74 MB  (100%)
✔ Download stored in 'C:/temp/species_v2_3.zip'
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed

# separate process into its helper functions to source the error
> usethis:::tidy_download("https://bit.ly/ctdv2sp",
                        destdir = "C:/temp")
Downloaded: 312.74 MB  (100%)

> usethis:::tidy_unzip("C:/temp/species_v2_3.zip")
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed


## Dig deeper into `tidy_unzip`

# ABnote: first, define zipfile downloaded with `tidy_download`
> zipfile <- "C:/temp/species_v2_3.zip"
> file.exists(zipfile)
[1] TRUE

> base_path <- path_dir(zipfile)
> print(base_path)
[1] "C:/temp"

> filenames <- utils::unzip(zipfile, list = TRUE)[["Name"]]
> print(filenames) # you can see that the zipfile has a single-folder directory with a different name than the zip file
[1] "species_v2/label_encoder.txt" "species_v2/model_arch.pt"     "species_v2/model_weights.pth"

# Dropbox particularities do not apply here, skip next two lines

> td <- top_directory(filenames)
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed

## AH HA! Look at `top_directory` function
> in_top <- path_dir(filenames) == "."
[1] FALSE FALSE FALSE
> unique_top <- unique(filenames[in_top])
character(0)
# I think the code should extract the dir `species_v2` from my zipfile here
> is_directory <- grepl("/$", unique_top)
logical(0)

# address conditionals one at a time
> length(unique_top) > 1 
[1] FALSE
> !is_directory
logical(0)

# the condition evaluates to NA because `!is_directory` is logical(0) (no top directory was found), so `if` throws the error
> length(unique_top) > 1 || !is_directory
[1] NA
> if (length(unique_top) > 1 || !is_directory) {
+   NA_character_
+ } else {
+   unique_top
+ }
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed

The top_directory function only looks for entries at the root of the archive. My zip file has no entry for the directory itself, only for the files inside it, so the single directory that contains everything is never detected.

# current structure
> in_top <- path_dir(filenames) == "."
> all(in_top) == FALSE
[1] TRUE
> unique_top <- unique(filenames[in_top])
character(0)

# the folder name at the top of the directory inside the zip file should be extracted
> path_dir(filenames)
[1] "species_v2" "species_v2" "species_v2"
> unique(path_dir(filenames))
[1] "species_v2"
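
The failure can be reproduced without downloading anything. The following is a minimal sketch, assuming base R's dirname() behaves like path_dir() for these relative paths (both return "." for a file at the archive root); the file names are copied from the listing above.

```r
# File listing copied from the zip file; note there is no "species_v2/" entry
filenames <- c(
  "species_v2/label_encoder.txt",
  "species_v2/model_arch.pt",
  "species_v2/model_weights.pth"
)

# dirname() stands in for fs::path_dir() here (an assumption of this sketch)
in_top <- dirname(filenames) == "."
unique_top <- unique(filenames[in_top])   # zero-length character vector
is_directory <- grepl("/$", unique_top)   # zero-length logical vector

# FALSE || <zero-length logical> evaluates to NA, which if() cannot use
cond <- length(unique_top) > 1 || !is_directory
is.na(cond)

# Wrapping the same condition in if() raises an error; capture its message
result <- tryCatch(
  if (cond) NA_character_ else unique_top,
  error = function(e) conditionMessage(e)
)
result
```

This shows that the error comes from the zero-length `is_directory`, not from anything specific to the download.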

A revised top_directory function makes this work:

top_directory <- function(filenames) {
  in_top <- path_dir(filenames) == "."
  unique_top <- unique(filenames[in_top])
  is_directory <- grepl("/$", unique_top)
  if (length(unique(path_dir(filenames))) == 1 && length(unique_top) == 0) {
    # no entry at the archive root, but all files share one parent directory
    unique(path_dir(filenames))
  } else {
    if (length(unique_top) > 1 || !is_directory) {
      NA_character_
    } else {
      unique_top
    }
  }
}

# run use_zip with the revised helper
> use_zip(url = "https://bit.ly/ctdv2sp",
+         destdir = "C:/temp", cleanup = TRUE)
✔ Downloading from <https://bit.ly/ctdv2sp>.
Downloaded: 312.74 MB  (100%)
✔ Download stored in C:/temp/species_v2_3.zip.
Unpacking ZIP file into species_v2/ (3 files extracted).
Deleting species_v2_3.zip.
Opening species_v2/ in the file manager.

I have made this change on a new branch in my fork of the repository here, and it passes devtools::check(). Please let me know if this change needs to be more generalized or if it is ready for a pull request. Thank you!
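
If a more general version is wanted, one possible direction is to look at the first path component of every entry instead of relying on an explicit directory entry at the root. The following is only a sketch under that assumption: `top_directory_sketch` is a hypothetical name, not part of usethis, and it uses base R only. It returns the directory with a trailing slash, matching how an explicit directory entry appears in a zip listing.

```r
# Hypothetical helper, not part of usethis: return the single top-level
# directory shared by every entry in a zip listing, or NA if there is none
top_directory_sketch <- function(filenames) {
  # first path component of each entry, e.g. "species_v2"
  first_component <- sub("/.*$", "", filenames)
  # entries without any "/" are loose files at the archive root
  is_nested <- grepl("/", filenames, fixed = TRUE)
  top <- unique(first_component)
  if (length(top) == 1 && all(is_nested)) {
    paste0(top, "/")
  } else {
    NA_character_
  }
}

# Listing without a directory entry (the case in this report)
top_directory_sketch(c("species_v2/label_encoder.txt", "species_v2/model_arch.pt"))
# Listing with an explicit directory entry
top_directory_sketch(c("foo/", "foo/a.txt"))
# Loose file next to a directory: no single top-level directory
top_directory_sketch(c("a.txt", "foo/b.txt"))
```

Because the result is always length one, a caller can test it with is.na() and never passes a zero-length value to `if`.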
