upload_file fails with files larger than ~2.15 GB #257

Closed
zachmayer opened this Issue Jun 30, 2015 · 17 comments


zachmayer commented Jun 30, 2015

I have a largish file (~4.9GB) I am trying to upload with POST. I can upload the first few lines no problem with:

POST(
    url = url_string,
    body = upload_file(short_file),
    verbose()
)

Which yields:

Response [http://myserver:port/path/to/api]
  Date: 2015-06-30 11:11
  Status: 200
  Content-Type: application/json;charset=UTF-8
  Size: 4 B

However, the larger version of the file fails to upload:

POST(
    url = url_string,
    body = upload_file(long_file),
    verbose()
)

Which yields:

Response [http://myserver:port/path/to/api]
  Date: 2015-06-30 11:11
  Status: 400
  Content-Type: <unknown>
<EMPTY BODY>
Warning message:
In curl::handle_setopt(handle, .list = req$options) :
  NAs introduced by coercion

The verbose output from the upload is:

-> POST http://myserver:port/path/to/api
-> User-Agent: libcurl/7.35.0 r-curl/0.9 httr/1.0.0
-> Host: http://myserver::port
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: text/csv
-> Content-Length: -2147483648
-> Expect: 100-continue
-> 
<- HTTP/1.1 400 Bad Request
<- Server: Apache-Coyote/1.1
<- Transfer-Encoding: chunked
<- Date: Tue, 30 Jun 2015 11:11:11 GMT
<- Connection: close

And my sessionInfo() is:

R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] httr_1.0.0

loaded via a namespace (and not attached):
[1] magrittr_1.5  R6_2.0.1      stringi_0.5-5 stringr_1.0.0 tools_3.1.1  

Content-Length: -2147483648 looks a lot like a 32-bit integer overflow. I strongly suspect this function will always fail with files over ~2.15 GB (2^31 bytes), but I haven't tested other cases.
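
A quick way to see where that number could come from, as a sketch in base R rather than the actual httr or curl code path: file sizes come back as doubles, and coercing a double above `.Machine$integer.max` to integer yields `NA` with a coercion warning much like the one shown above. R stores `NA_integer_` internally as the smallest 32-bit value, -2147483648, which would explain the header if the NA reaches C code unchecked. The 4.9e9 value is only a stand-in for the size of the file in question.

```r
# Stand-in for the size of a ~4.9 GB file; file.info()$size returns a double.
size <- 4.9e9

# Largest value an R integer can hold (2^31 - 1).
int_max <- .Machine$integer.max

# Coercing past that limit gives NA plus a coercion warning.
# The handler prints the warning text and then muffles it.
coerced <- withCallingHandlers(
  as.integer(size),
  warning = function(w) {
    message("warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(size > int_max)   # is the size beyond the integer range?
print(is.na(coerced))   # did the coercion produce NA?
```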

Member

hadley commented Jul 1, 2015

@jeroenooms this is probably a curl issue?

zachmayer commented Jul 1, 2015

It may well be an issue with the curl package. I can post an issue there, if you'd like.

My system curl handles the file just fine, so it might be a problem specific to the R library.

Member

jeroen commented Jul 1, 2015

Hum. Content-Length: -2147483648 looks suspicious :)

zachmayer commented Jul 1, 2015

@jeroenooms Hah, yes it does.

Member

jeroen commented Jul 1, 2015

This seems to work fine:

library(curl)
h <- new_handle(verbose = TRUE)
handle_setform(h, description = form_file("~/Desktop/3gb.bin"))
req <- curl_fetch_memory("http://httpbin.org/post", handle = h)
* Hostname was NOT found in DNS cache
*   Trying 54.175.222.246...
* Connected to httpbin.org (54.175.222.246) port 80 (#0)
> POST /post HTTP/1.1
User-Agent: r/curl/jeroen
Host: httpbin.org
Accept: */*
Accept-Encoding: gzip, deflate
Content-Length: 3221225678
Expect: 100-continue
Content-Type: multipart/form-data; boundary=------------------------4ffba8824c7fbf91

< HTTP/1.1 100 Continue

zachmayer commented Jul 1, 2015

Ok, let me see if I can reproduce the bug.

Member

jeroen commented Jul 1, 2015

It does not happen when we use multipart:

POST(
    url = url_string,
    body = list(test = upload_file(short_file)),
    verbose()
)

It only happens when upload_file is directly used as the payload.

zachmayer commented Jul 1, 2015

Aha! So that's what I've been doing (the API I'm working with requires it).

Member

jeroen commented Jul 1, 2015

I have a suspicion. When posting raw data, httr sets the postfieldsize option, which might get converted to an integer...

Member

jeroen commented Jul 1, 2015

I pushed a fix to keep numbers as doubles and then cast them to long long instead of using regular integers. Can you test if this solves your problem?

library(devtools)
install_github("jeroenooms/curl")

We should make sure this does not introduce any side effects though.

zachmayer commented Jul 1, 2015

I'll check. Thanks!

Member

jeroen commented Jul 1, 2015

@hadley we should fix this in httr as well. The curl doc states:

If you post more than 2GB, use CURLOPT_POSTFIELDSIZE_LARGE.

Member

jeroen commented Jul 1, 2015

Actually I'm reverting the curl fix, it should be done in httr by setting CURLOPT_POSTFIELDSIZE_LARGE instead of CURLOPT_POSTFIELDSIZE in body_config.
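
For illustration, setting the large variant directly on a curl handle might look like the sketch below. This is not the actual change made to body_config in httr. It assumes the curl R package is installed and accepts the option under the name `postfieldsize_large`; `upload_size` is a hypothetical stand-in for the real file size.

```r
# Hypothetical upload size in bytes, kept as a double so it is not
# squeezed into a 32-bit integer.
upload_size <- 4.9e9

# Only touch the curl package if it is available.
if (requireNamespace("curl", quietly = TRUE)) {
  h <- curl::new_handle()
  # Assumption: this option name mirrors libcurl's CURLOPT_POSTFIELDSIZE_LARGE.
  curl::handle_setopt(h, postfieldsize_large = upload_size)
}
```

No request is made here; the handle would still need a body source and a URL before it could upload anything.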

Member

jeroen commented Jul 2, 2015

@zachmayer can you confirm that the problems have been resolved with the latest versions?

library(devtools)
install_github("jeroenooms/curl")
install_github("hadley/httr")

zachmayer commented Jul 2, 2015

That works. Thank you. When do you think this change will be on CRAN?

Member

hadley commented Jul 4, 2015

The bigquery-breaking bug is a bit of a pain, so a new release is moderately high priority, but I'm doing a lot of travel in the next few weeks, so I wouldn't expect anything before July 20.

Member

jeroen commented Jul 4, 2015

The new curl release is on CRAN now.
