Vectorized curl_download? #166
I have thought about this, but I am not sure how to do proper exception handling that way. What should happen if one of the downloads fails but the other ones are OK? Should it raise an error and delete all the files? Or just return FALSE for the files that failed?
It should be possible to design a good API. E.g. for failures you would want to return an error object for that download, not just FALSE.
What about the destfile argument? Should it be a vector of the same length as the input url? Or should the user be able to specify a directory, so that curl automatically guesses the filenames and saves them to that directory?
I think that's a good start. We can add the directory approach later. That one is tricky, because you need to sanitize the output file names.
And a lot of URLs don't have an obvious filename, e.g. when they are a REST endpoint to some oid.
Yeah, another good reason. I think just a vector of output file names is fine.
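To make the API discussion above concrete, here is a minimal sketch of what a vectorized wrapper could look like, using `Map` and `tryCatch` so that a failure on one URL yields an error object for that file instead of aborting the whole batch. The function name `download_all` and its cleanup behavior are assumptions for illustration, not part of the curl package:

```r
# Sketch only: download each url to the matching destfile. Returns a list
# where each element is either the destination path (on success) or the
# condition object (on failure), as suggested above.
download_all <- function(urls, destfiles) {
  stopifnot(length(urls) == length(destfiles))
  Map(function(url, dest) {
    tryCatch(
      curl::curl_download(url, dest),  # returns the destination path
      error = function(e) {
        unlink(dest)  # clean up the partial file for this url only
        e             # hand back the error object instead of raising
      }
    )
  }, urls, destfiles)
}
```

Callers could then check which downloads failed with something like `vapply(res, inherits, logical(1), "condition")`.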
Sorry to reactivate this issue, but I have a question related to that topic (it seems)! You mention that we can use map with curl_download, but that will download files one by one, right?
@lemairev You could do the parallelism at the process level by creating a cluster and using `parallel::clusterMap`:

```r
clus <- parallel::makeCluster(10) # or whatever number you want
parallel::clusterMap(clus, function(src, dest) {
  curl::curl_fetch_disk(src, dest) # don't forget to check the result
}, srcfiles, destfiles)
```

This isn't technically asynchronous, but for large numbers of small files it'll still be much faster than downloading sequentially. The number of processes in the cluster isn't so important, since you'll generally be constrained by your network bandwidth more than by memory or CPU. You can kill the cluster afterwards, or keep it around if you know you're going to be using it again.
@Hong-Revo Thank you for the suggestion! I'll try that :) cheers

Have a look at the multi_download interface.
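For reference, `multi_download` (added in later versions of curl) accepts vectors of URLs and destination files directly and returns a data frame with one row per URL, so per-file failures show up as rows rather than errors. A rough sketch, with placeholder URLs:

```r
library(curl)

# Placeholder URLs; substitute your own file list.
urls <- c("https://example.com/a.csv", "https://example.com/b.csv")
res <- multi_download(urls, destfiles = file.path(tempdir(), basename(urls)))

# res is a data frame with per-file outcome columns (e.g. success,
# status_code, url), so failures can be inspected after the fact:
res[!res$success %in% TRUE, ]
```

This addresses the error-handling question from the start of the thread: nothing is raised mid-batch, and the caller decides what to do with the failed rows.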
I often have a character vector of URLs that represent data files that I'd like to download to a local directory. I can use `purrr::map2` with `curl_download` pretty easily to grab these files; however, I'm a little sad that I can't just pass vectors to `curl_download`. I know I should probably use `curl_fetch_multi` for large vectors, but for the usual case I am getting a few dozen files from a reliable server and don't want to go to the trouble of writing callback handlers for an async API.