-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
safer file saving (check md5 sums) #207
Labels
new feature 🎁
New feature request
Milestone
Comments
Potentially more robust solution, but requires additional dependencies. Maybe a separate package would do better. fs_file_copy <- function(path, new_path, overwrite = NA, quiet = FALSE, archive = NA) {
msg <- if (quiet) function(...) invisible() else message
new_exists <- fs::file_exists(new_path)
if (new_exists && isFALSE(overwrite)) {
msg("File already exists, skipping: ", new_path)
return(invisible(new_path))
}
if (!new_exists || isTRUE(overwrite)) {
msg("Copying file: ", new_path)
return(fs::file_copy(path, new_path, overwrite = isTRUE(overwrite)))
}
if (compare_md5sum(path, new_path, archive = archive)) {
msg("md5 hashes match, skipping: ", new_path)
return(invisible(new_path))
}
msg("md5 hashes do not match, updating: ", new_path)
fs::file_copy(path, new_path, overwrite = TRUE)
}
compare_md5sum <- function(path, new_path, archive = NA) {
if (is.na(archive)) {
archive <- fs::path_ext(new_path) %in% c("zip", "docx", "xlsx", "7z")
}
# handle special cases ... eventually move to a switch() statement
if (fs::path_ext(new_path) == "docx") {
# require officer
old <- as_md5sum(officer::docx_summary(officer::read_docx(path)))
new <- as_md5sum(officer::docx_summary(officer::read_docx(new_path)))
} else if (archive) {
tmp_path <- fs::file_temp("old_path_", ext = fs::path_ext(path))
fs::file_copy(path, tmp_path)
on.exit(fs::file_delete(tmp_path), add = TRUE)
old_archive <- fs::file_temp("old_files_")
new_archive <- fs::file_temp("new_files_")
fs::dir_create(old_archive)
fs::dir_create(new_archive)
on.exit({
fs::dir_delete(old_archive)
fs::dir_delete(new_archive)
}, add = TRUE)
utils::unzip(tmp_path, exdir = old_archive)
utils::unzip(new_path, exdir = new_archive)
old <- fs::dir_ls(old_archive, type = "file", recurse = TRUE)
new <- fs::dir_ls(new_archive, type = "file", recurse = TRUE)
# for ms stuff
#> old <- grep("[trash]", old, value = TRUE, fixed = TRUE, invert = TRUE)
#> new <- grep("[trash]", new, value = TRUE, fixed = TRUE, invert = TRUE)
old <- vapply(old, tools::md5sum, NA_character_, USE.NAMES = FALSE)
new <- vapply(new, tools::md5sum, NA_character_, USE.NAMES = FALSE)
} else {
old <- unname(tools::md5sum(path))
new <- unname(tools::md5sum(new_path))
}
identical(old, new)
}
as_md5sum <- function(x) {
temp <- fs::file_temp()
on.exit(fs::file_delete(temp))
jsonlite::write_json(x, temp)
unname(tools::md5sum(temp))
} |
jmbarbone
added a commit
that referenced
this issue
Nov 19, 2023
jmbarbone
added a commit
that referenced
this issue
Nov 19, 2023
jmbarbone
added a commit
that referenced
this issue
Nov 25, 2023
Merged
jmbarbone
added a commit
that referenced
this issue
Nov 25, 2023
jmbarbone
added a commit
that referenced
this issue
Nov 25, 2023
jmbarbone
added a commit
that referenced
this issue
Nov 25, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
and making for writing, too
The text was updated successfully, but these errors were encountered: