utf-8 encoding issue when uploading a string #266

Closed
schorsch opened this Issue Sep 9, 2013 · 9 comments

schorsch commented Sep 9, 2013

I am reading a file's contents, putting them into the request body as a string, and getting this error: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)

In connection.rb (https://github.com/geemus/excon/blob/master/lib/excon/connection.rb#L176):

```ruby
if datum[:body].is_a?(String) # write out string body
  socket.write(request << datum[:body]) # write out request + headers + body
```

You concatenate the request string (probably UTF-8 encoded) with whatever encoding the body has, in my case binary (ASCII-8BIT), and this is probably what raises the error.
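Here is a minimal Ruby sketch that reproduces the error outside Excon. The request and body strings are made-up examples, not what Excon or fog actually send. It also shows a detail worth knowing: `String#<<` only raises when *both* strings contain non-ASCII bytes. A pure-ASCII request absorbs a binary body without complaint, so the failing request presumably contained at least one non-ASCII character (for example in the path or a header value).

```ruby
# Returns the exception class raised by concatenating, or nil if it succeeds.
# Works on a copy so the caller's request string is left untouched.
def concat_error(request, body)
  request.dup << body
  nil
rescue Encoding::CompatibilityError => e
  e.class
end

# Binary file content (ASCII-8BIT), like what File.binread returns.
binary_body = "\xFF\xD8\xFF\xE0".b

# Request text that is pure ASCII.
ascii_request = "PUT /photo.jpg HTTP/1.1\r\nHost: example.com\r\n\r\n"

# Request text with one non-ASCII UTF-8 character in the (made-up) path.
utf8_request = "PUT /caf\u00e9.jpg HTTP/1.1\r\nHost: example.com\r\n\r\n"

p concat_error(ascii_request, binary_body) # prints nil
p concat_error(utf8_request, binary_body)  # prints Encoding::CompatibilityError
```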

I think making two separate calls to socket.write (one for the request and headers, one for the body) could solve the problem. For now I am saving the string to a tempfile and passing the file handle to Excon, which is not very elegant.
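A sketch of the two-write idea, under the assumption that the socket is an ordinary Ruby IO: `IO#write` sends the string's raw bytes, so the headers and body never have to share an encoding. An `IO.pipe` stands in for the socket, and `write_request` is a hypothetical helper, not Excon's code.

```ruby
# Hypothetical helper: write headers and body separately instead of concatenating.
def write_request(socket, request, body)
  socket.write(request) # request line + headers (UTF-8 string)
  socket.write(body)    # body bytes (ASCII-8BIT string)
end

reader, writer = IO.pipe
writer.binmode # write bytes as-is, no transcoding
reader.binmode # read back as ASCII-8BIT

request = "PUT /caf\u00e9.jpg HTTP/1.1\r\nHost: example.com\r\n\r\n"
body    = "\xFF\xD8\xFF\xE0".b

write_request(writer, request, body) # no Encoding::CompatibilityError here
writer.close
received = reader.read
reader.close

p received == request.b + body # prints true
```

The cost is one extra write call per request, which is the trade-off discussed further down in the thread.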

geemus (Contributor) commented Sep 10, 2013

@schorsch - I believe that if you provide a file handle (i.e. File.open('path')) as the body value you pass in, Excon should make the proper calls for the encoding to work. Could you try that and see if you have any better luck?
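For illustration, a sketch of how an IO body such as `File.open(path, "rb")` can be sent without ever building one large String: read fixed-size chunks and write each one. Opening the file in `"rb"` mode makes every chunk ASCII-8BIT. `stream_body`, the 1 MiB default chunk size, and the 5-byte chunk size in the demo are illustrative assumptions; Excon's own loop and defaults may differ. A binary `StringIO` stands in for the socket.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: copy an IO body to the socket chunk by chunk.
# Returns the number of chunks written.
def stream_body(socket, io, chunk_size = 1_048_576)
  chunks = 0
  while (chunk = io.read(chunk_size)) # IO#read(n) returns nil at EOF
    socket.write(chunk)
    chunks += 1
  end
  chunks
end

payload = "\xFF\xD8\xFF\xE0".b * 3 # 12 bytes of binary data

sent = nil
chunk_count = nil
Tempfile.create("excon-body") do |tmp|
  tmp.binmode
  tmp.write(payload)
  tmp.flush

  # Binary StringIO as a stand-in for a TCP socket.
  socket = StringIO.new(String.new(encoding: Encoding::BINARY))
  chunk_count = File.open(tmp.path, "rb") { |file| stream_body(socket, file, 5) }
  sent = socket.string
end

p chunk_count     # prints 3 (5 + 5 + 2 bytes)
p sent == payload # prints true
```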

ghost commented Sep 10, 2013

@schorsch Instead of using a tempfile, you could use StringIO, e.g. Excon.put(url, :body => StringIO.new(my_str))
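A sketch of why a StringIO body sidesteps the error, assuming the body is consumed with byte-count reads: `StringIO#read(n)` returns ASCII-8BIT chunks whatever the encoding of the wrapped string. Chunks may even split a multi-byte character, which is harmless because they are handled as raw bytes. The contents of `my_str` are made up; only the variable name comes from the comment above.

```ruby
require "stringio"

my_str = "caf\u00e9 na\u00efve" # UTF-8 text, 12 bytes (two 2-byte characters)
io = StringIO.new(my_str)

chunks = []
while (chunk = io.read(4)) # nil once the end of the string is reached
  chunks << chunk
end

p chunks.size                                      # prints 3
p chunks.all? { |c| c.encoding == Encoding::BINARY } # prints true
p chunks.join.bytes == my_str.bytes                  # prints true
```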

schorsch commented

@geemus Yes, this is what I am doing and it works, but I already have the binary string and don't want to create another tempfile.
@burns Thanks, I'll give it a try.

I still think this is a bad workaround, one that could be avoided by not concatenating the two strings. By the way, I am using Excon together with fog for S3.

geemus (Contributor) commented Sep 11, 2013

@schorsch - Will it "just work" if the body is written separately from the rest? The current implementation was chosen for performance reasons (one write of the concatenated string is faster than an extra write to the socket). But if it solves this problem, it seems worth the small overhead it imposes.
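For comparison, a sketch of an alternative that keeps the single write: copy the request into a binary (ASCII-8BIT) buffer before appending the body, so `String#<<` never has to reconcile UTF-8 with binary. `build_payload` is a hypothetical helper and not what Excon adopted. Note that `body.b` makes an extra copy of the body, which can matter for large uploads.

```ruby
# Hypothetical helper: encoding-safe concatenation for a single socket.write.
def build_payload(request, body)
  buffer = request.b # binary copy of request line + headers
  buffer << body.b   # both operands are ASCII-8BIT, so no CompatibilityError
end

request = "PUT /caf\u00e9.jpg HTTP/1.1\r\nHost: example.com\r\n\r\n"
body    = "\xFF\xD8\xFF\xE0".b

payload = build_payload(request, body)
p payload.encoding == Encoding::BINARY                  # prints true
p payload.bytesize == request.bytesize + body.bytesize  # prints true
```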

ghost commented Sep 14, 2013

@geemus I don't think the overhead of a separate write would be an issue, but I'll let @schorsch confirm for his use case.
However, I wanted to point out another solution: instead of treating a String body differently from an IO-like body, Excon could convert a String body to a StringIO. Note that StringIO.new(str) does not copy the string; it simply holds a reference to str. I mention this because I ran into a separate problem with String bodies under ruby-1.8.7: even with a relatively small string, every request timed out (see backup/backup@f6a8408). I didn't bring it up at the time, since ruby-1.8.7 is EOL, but given this issue I thought it worth mentioning.
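A sketch of the suggested normalization: wrap a String body in a StringIO so String and IO bodies can share one read-and-write path. `normalize_body` is a hypothetical helper. The check on `io.string` confirms the point above that `StringIO.new` wraps the original string object rather than copying it.

```ruby
require "stringio"

# Hypothetical helper: Strings become StringIOs; IO-like bodies pass through.
def normalize_body(body)
  body.is_a?(String) ? StringIO.new(body) : body
end

data = "\xFF\xD8\xFF\xE0".b * 1024 # 4 KiB binary string
io   = normalize_body(data)

p io.string.equal?(data)        # prints true: same object, no copy
p normalize_body(io).equal?(io) # prints true: already IO-like
```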

geemus (Contributor) commented Sep 16, 2013

@burns Interesting. Treating both more similarly seems desirable in its own right. As long as that takes care of the encoding issue, it certainly seems worth doing.

nelhage added a commit to nelhage/excon that referenced this issue Mar 4, 2016
Write the first body chunk in the initial packet.
When #266 was fixed, it caused a regression in the behavior from #233,
which deliberately merged the initial writes to the socket, for network
efficiency and to avoid triggering the pathological behavior caused by
the combination of Nagle's Algorithm and TCP delayed ACKs.

Restore that optimization, in a slightly more general way: Do a
nonblocking read of an initial chunk of data off the provided body, and
merge that with the headers before sending.
8fc63ae
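A sketch of the approach the commit message describes, not the commit's actual code: do a nonblocking read of whatever part of the body is available, merge it with the headers into one initial write, then stream the rest. `send_request`, `INITIAL_CHUNK`, the chunk sizes, and the `RecordingSocket` stand-in are all hypothetical.

```ruby
require "stringio"

INITIAL_CHUNK = 16_384 # assumed size of the body prefix merged with the headers

def send_request(socket, headers, body, chunk_size = 1_048_576)
  # :wait_readable (nothing available yet) or nil (EOF) both mean "no data".
  first = body.read_nonblock(INITIAL_CHUNK, exception: false)
  first = "" unless first.is_a?(String)

  # Binary copies so the merge cannot raise Encoding::CompatibilityError.
  socket.write(headers.b << first.b)

  # Stream the remainder of the body.
  while (chunk = body.read(chunk_size))
    socket.write(chunk)
  end
end

# Hypothetical stand-in socket that records each write.
class RecordingSocket
  attr_reader :writes

  def initialize
    @writes = []
  end

  def write(data)
    @writes << data.dup
    data.bytesize
  end
end

headers = "PUT /caf\u00e9.bin HTTP/1.1\r\nHost: example.com\r\n\r\n"
body    = StringIO.new("\xAB".b * 40_000)
socket  = RecordingSocket.new
send_request(socket, headers, body, 10_000)

# 1 merged write, then 23_616 remaining bytes in 10_000-byte chunks (3 writes).
p socket.writes.size # prints 4
```

Merging the first chunk keeps the headers and the start of the body in one packet, which is the Nagle/delayed-ACK concern the commit message raises, while the remaining writes avoid concatenating the whole body.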
geemus (Contributor) commented Mar 15, 2016

I think this was fixed, but do let me know if you are still seeing issues.

geemus closed this Mar 15, 2016

schorsch commented

I recently re-checked my workaround and removed it, so the issue is resolved for me. Thanks for closing.

geemus (Contributor) commented Mar 16, 2016

Great, thanks for confirming.
