Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safer file saving (check md5 sums) #207

Closed
jmbarbone opened this issue Oct 12, 2023 · 1 comment · Fixed by #223
Closed

safer file saving (check md5 sums) #207

jmbarbone opened this issue Oct 12, 2023 · 1 comment · Fixed by #223
Labels
new feature 🎁 New feature request
Milestone

Comments

@jmbarbone
Copy link
Owner

fs_file_copy <- function(path, new_path, overwrite = NA, quiet = FALSE) {
  msg <- if (quiet) function(...) invisible() else message
  new_exists <- fs::file_exists(new_path)

  if (new_exists && isFALSE(overwrite)) {
    msg("File already exists, skipping: ", new_path)
    return(invisible(new_path))
  }

  if (!new_exists || isTRUE(overwrite)) {
    msg("Copying file: ", new_path)
    return(fs::file_copy(path, new_path, overwrite = isTRUE(overwrite)))
  }

  old <- tools::md5sum(path)
  new <- tools::md5sum(new_path)

  if (old == new) {
    msg("md5 hashes match, skipping: ", new_path)
    return(invisible(new_path))
  }

  msg("md5 hashes do not match, updating: ", new_path)
  fs::file_copy(path, new_path, overwrite = TRUE)
}

and making for writing, too

write_file <- function(
    data,
    file = stdout(),
    method = NULL,
    overwrite = NA,
    quiet = FALSE,
    ...
) {
  if (is.null(method)) {
    method <- fs::path_ext(path)
  }

  method <- switch(
    method,
    delim = "table",
    txt = "lines",
    md = "lines",
    method
  )

  write_function <- switch(
    method,
    csv = utils::write.csv,
    csv2 = utils::write.csv2,
    table = utils::write.table,
    rds = function(x, file, ...) saveRDS(object = x, file = file, ...),
    tsv = function(x, file, sep = "\t", ...) utils::write.table(x = x, file = file, sep = sep, ...),
    lines = function(x, file, ...) writeLines(text = x, con = file, ...)
  )

  params <- rlang::list2(...)
  params$file <- fs::file_temp(ext = fs::path_ext(file))
  params$x <- data
  do.call(write_function, params)
  fs_file_copy(params$file, file, overwrite = overwrite, quiet = quiet)
}
@jmbarbone
Copy link
Owner Author

Potentially more robust solution, but requires additional dependencies. Maybe a separate package would do better.

fs_file_copy <- function(path, new_path, overwrite = NA, quiet = FALSE, archive = NA) {
  msg <- if (quiet) function(...) invisible() else message
  new_exists <- fs::file_exists(new_path)

  if (new_exists && isFALSE(overwrite)) {
    msg("File already exists, skipping: ", new_path)
    return(invisible(new_path))
  }

  if (!new_exists || isTRUE(overwrite)) {
    msg("Copying file: ", new_path)
    return(fs::file_copy(path, new_path, overwrite = isTRUE(overwrite)))
  }

  if (compare_md5sum(path, new_path, archive = archive)) {
    msg("md5 hashes match, skipping: ", new_path)
    return(invisible(new_path))
  }

  msg("md5 hashes do not match, updating: ", new_path)
  fs::file_copy(path, new_path, overwrite = TRUE)
}

compare_md5sum <- function(path, new_path, archive = NA) {
  if (is.na(archive)) {
    archive <- fs::path_ext(new_path) %in% c("zip", "docx", "xlsx", "7z")
  }

  # handle special cases ... eventually move to a switch() statement
  if (fs::path_ext(new_path) == "docx") {
    # require officer
    old <- as_md5sum(officer::docx_summary(officer::read_docx(path)))
    new <- as_md5sum(officer::docx_summary(officer::read_docx(new_path)))
  } else if (archive) {
    tmp_path <- fs::file_temp("old_path_", ext = fs::path_ext(path))

    fs::file_copy(path, tmp_path)
    on.exit(fs::file_delete(tmp_path), add = TRUE)

    old_archive <- fs::file_temp("old_files_")
    new_archive <- fs::file_temp("new_files_")

    fs::dir_create(old_archive)
    fs::dir_create(new_archive)

    on.exit({
      fs::dir_delete(old_archive)
      fs::dir_delete(new_archive)
    }, add = TRUE)

    utils::unzip(tmp_path, exdir = old_archive)
    utils::unzip(new_path, exdir = new_archive)

    old <- fs::dir_ls(old_archive, type = "file", recurse = TRUE)
    new <- fs::dir_ls(new_archive, type = "file", recurse = TRUE)

    # for ms stuff
    #> old <- grep("[trash]", old, value = TRUE, fixed = TRUE, invert = TRUE)
    #> new <- grep("[trash]", new, value = TRUE, fixed = TRUE, invert = TRUE)

    old <- vapply(old, tools::md5sum, NA_character_, USE.NAMES = FALSE)
    new <- vapply(new, tools::md5sum, NA_character_, USE.NAMES = FALSE)
  } else {
    old <- unname(tools::md5sum(path))
    new <- unname(tools::md5sum(new_path))
  }

  identical(old, new)
}

as_md5sum <- function(x) {
  temp <- fs::file_temp()
  on.exit(fs::file_delete(temp))
  jsonlite::write_json(x, temp)
  unname(tools::md5sum(temp))
}

@jmbarbone jmbarbone added this to the 0.8.0 milestone Oct 19, 2023
@jmbarbone jmbarbone added the new feature 🎁 New feature request label Nov 16, 2023
jmbarbone added a commit that referenced this issue Nov 19, 2023
jmbarbone added a commit that referenced this issue Nov 19, 2023
jmbarbone added a commit that referenced this issue Nov 19, 2023
jmbarbone added a commit that referenced this issue Nov 19, 2023
jmbarbone added a commit that referenced this issue Nov 19, 2023
jmbarbone added a commit that referenced this issue Nov 19, 2023
jmbarbone added a commit that referenced this issue Nov 19, 2023
jmbarbone added a commit that referenced this issue Nov 19, 2023
jmbarbone added a commit that referenced this issue Nov 19, 2023
jmbarbone added a commit that referenced this issue Nov 25, 2023
jmbarbone added a commit that referenced this issue Nov 25, 2023
@jmbarbone jmbarbone mentioned this issue Nov 25, 2023
jmbarbone added a commit that referenced this issue Nov 25, 2023
jmbarbone added a commit that referenced this issue Nov 25, 2023
jmbarbone added a commit that referenced this issue Nov 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature 🎁 New feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant