Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote_to_local requires refactoring #403

Closed
chainsawriot opened this issue Apr 29, 2024 · 4 comments
Closed

remote_to_local requires refactoring #403

chainsawriot opened this issue Apr 29, 2024 · 4 comments

Comments

@chainsawriot
Copy link
Collaborator

The two large if blocks and the repeated handling of google urls scream refactoring.

@chainsawriot
Copy link
Collaborator Author

This bunch of code is not tested

rio/R/remote_to_local.R

Lines 34 to 52 in 144ebca

if (format == "TMP") {
# try to extract format from curl's final URL
format <- try(get_info(u$url)$format, silent = TRUE)
if (inherits(format, "try-error")) {
# try to extract format from headers
h1 <- curl::parse_headers(u$headers)
# check `Content-Disposition` header
if (any(grepl("^Content-Disposition", h1))) {
h <- h1[grep("filename", h1)]
if (length(h)) {
f <- regmatches(h, regexpr("(?<=\")(.*)(?<!\")", h, perl = TRUE))
if (!length(f)) {
f <- regmatches(h, regexpr("(?<=filename=)(.*)", h, perl = TRUE))
}
f <- paste0(dirname(temp_file), "/", f)
file.copy(from = temp_file, to = f)
unlink(temp_file)
temp_file <- f
}

@chainsawriot
Copy link
Collaborator Author

The logic in this bunch of code looks incorrect to me. But I don't have a URL that can test this code.

rio/R/remote_to_local.R

Lines 43 to 52 in 019f0e4

if (length(h)) {
f <- regmatches(h, regexpr("(?<=\")(.*)(?<!\")", h, perl = TRUE))
if (!length(f)) {
f <- regmatches(h, regexpr("(?<=filename=)(.*)", h, perl = TRUE))
}
f <- paste0(dirname(temp_file), "/", f)
file.copy(from = temp_file, to = f)
unlink(temp_file)
temp_file <- f
}

@chainsawriot
Copy link
Collaborator Author

#36 archeological dig. And the commit closing this 2172307 has no tests.

I am asking myself: is it popular to have a URL of a file without a file extension, even with the final redirected url, but with the Content Disposition? If it is really the case, is it still rio's responsibility to stiff the curl HEADER for it's file format? It appears to go back to #238.

@chainsawriot
Copy link
Collaborator Author

A colleague pointed me to this MDN article.

The Content-Disposition header is defined in the larger context of MIME messages for email, but only a subset of the possible parameters apply to HTTP forms and POST requests. Only the value form-data, as well as the optional directive name and filename, can be used in the HTTP context.

Probably, it is only for a URL that is like a HTTP form request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant