no way to download files directly to the disk with httr #44

Closed
ajdamico opened this Issue Jun 27, 2013 · 10 comments

ajdamico commented Jun 27, 2013

regarding this SO post on httr and this SO post on RCurl: it's currently impossible to download a file that is bigger than available RAM, because the whole response gets pulled into memory.

i have 64GB of RAM, so these two lines work for me in R x64 and do not work in R 32-bit (both on windows) --

require(httr)
x <- GET( "http://www2.census.gov/acs2011_5yr/pums/csv_pus.zip" )

Member

hadley commented Feb 28, 2014

I don't see any obvious way to enable this. Perhaps request functions could also be parameterised by their output type. By default, data would be stored in R, but alternatively you could request that it be saved to disk. But maybe this is outside of the scope of httr.

randyzwitch commented Mar 28, 2014

I'd love to have this functionality as well. Here's a link describing a beta feature of the API that RSiteCatalyst provides access to:

https://marketing-beta.adobe.com/developer/en_US/documentation/analytics-firehose/overview-1

This functionality will let users stream the web analytics data being captured by Adobe Analytics (AA is a competitor to Google Analytics) directly via the API, rather than wait for it to be collected and processed by Adobe. I'm not sure R is really the right tool for this operations-type use case, but I've already written a lot of functionality against their REST API, so it seems natural to extend my package to include this as well.

streamR seems to use a combination of RCurl and writeLines from base to accomplish writing directly to file from the Twitter API.

Contributor

Ironholds commented Aug 23, 2014

Well, an easier way of doing this would just be using, say, the downloader package, which does what it says on the tin.[1] The problem would be having httr recognise when something is to be downloaded to disk versus read in as a page. Personally I'm not seeing the argument for this to be built into httr - again, there are other libraries that handle it perfectly happily.

[1] I appreciate base R has download.file, but downloader's implementation handles HTTPS by default.

ajdamico commented Aug 23, 2014

@Ironholds how would you solve the issue that i've presented? as i state in my original links, i need to authenticate first and then download a file that is too big for RAM. downloader and download.file do not have the authentication functionality but httr and RCurl pull everything into RAM. thanks!

Contributor

Ironholds commented Aug 23, 2014

Ah, point; I missed the authentication-is-necessary bit. Moral of the story: no commenting on things before 2pm ;p. I suspect the answer is "don't use R" :/. As a general principle, R tends not to be a language oriented around streaming - it's oriented around "I have this one big pile of data all at once". streamR is an outlier for a reason.

ajdamico commented Aug 23, 2014

all of these scripts authenticate, download, import, and clean public-use survey data sets directly into R. the "quick start blocks" at the top of each script make the download automation a cinch. if i require users to run programs in multiple languages, the time-savings is gone: i might as well just ask them to log in and download the files by hand. if you can answer either of my SO posts, i'd appreciate it. thanks!

hadley added a commit that referenced this issue Aug 23, 2014

hadley closed this in a48c1f5 Aug 23, 2014

Member

hadley commented Aug 23, 2014

I had an epiphany and realised that this is really pretty simple to implement. Enjoy :)

Contributor

Ironholds commented Aug 23, 2014

NICE.

randyzwitch commented Aug 23, 2014

Looking forward to checking this out, thanks Hadley!

ajdamico commented Aug 24, 2014

excellent, hadley. when i write up the instruction manual to work with international census data in the r language, i will use write_disk() for those giant microdata files. thank you for making this possible!!
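
For readers landing on this thread, here is a minimal sketch of how write_disk() might be used to stream a response to a file instead of holding it in RAM. The helper name `download_to_disk` and its arguments are hypothetical (not part of httr), and HTTP basic authentication is an assumption; sites that use a form-based login would need a different authentication step.

```r
library(httr)

# Hypothetical helper (not part of httr): send a GET request and write the
# response body straight to `path` rather than keeping it in memory.
# Assumes HTTP basic authentication when `user` is supplied.
download_to_disk <- function(url, path, user = NULL, password = NULL) {
  if (is.null(user)) {
    resp <- GET(url, write_disk(path, overwrite = TRUE))
  } else {
    resp <- GET(
      url,
      authenticate(user, password),
      write_disk(path, overwrite = TRUE)
    )
  }
  # Turn HTTP errors (4xx/5xx) into R errors.
  stop_for_status(resp)
  invisible(path)
}

# Example call (not run here), using the file from the original report:
# download_to_disk(
#   "http://www2.census.gov/acs2011_5yr/pums/csv_pus.zip",
#   tempfile(fileext = ".zip")
# )
```

Setting `overwrite = TRUE` is a choice made for this sketch; write_disk() defaults to `overwrite = FALSE` and stops if the target file already exists.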
