Differences in response body encoding on JRuby #413

Closed
janko opened this issue May 23, 2017 · 1 comment

janko commented May 23, 2017

When I run the following:

require "http"

response = HTTP.get("http://httpbin.org/image/png")
response.body.each do |chunk|
  puts chunk.encoding
end

MRI outputs ASCII-8BIT, whereas JRuby outputs UTF-8. The same thing happens when I swap the endpoint for http://httpbin.org/encoding/utf8.

I'm not sure what the correct general behaviour should be: whether chunks should always be in binary encoding, or only when charset isn't specified. But it seems to me that in the latter case the encoding should always be binary, both in MRI and JRuby.

I'm not sure yet whether this cross-Ruby inconsistency comes from HTTP.rb or http_parser.rb, but just wanted to report it here. I think a good solution would be to call force_encoding on the result of Response::Body#readpartial (just like we're doing in Response::Body#to_s).
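
To illustrate the idea, here is a minimal sketch of forcing the encoding on every chunk. It is not HTTP.rb's actual code: `Utf8TaggedSource` and `EncodedBody` are hypothetical names, and the fake source only imitates a runtime that hands back UTF-8-tagged strings, as reported for JRuby above.

```ruby
# Hypothetical stand-in for a socket whose reads come back tagged as UTF-8.
class Utf8TaggedSource
  def initialize(chunks)
    @chunks = chunks.dup
  end

  def readpartial(_size = nil)
    raise EOFError if @chunks.empty?

    @chunks.shift.dup.force_encoding(Encoding::UTF_8)
  end
end

# Hypothetical body wrapper (an assumption, not HTTP.rb's Response::Body).
class EncodedBody
  def initialize(source, encoding = Encoding::BINARY)
    @source = source
    @encoding = encoding
  end

  # Retags the chunk with the wanted encoding; the bytes stay untouched.
  def readpartial(size = 16_384)
    @source.readpartial(size).force_encoding(@encoding)
  end

  # Yields chunks until the source signals end of input.
  def each
    loop { yield readpartial }
  rescue EOFError
    nil
  end
end

png_header = "\x89PNG\r\n\x1A\n".b
source = Utf8TaggedSource.new([png_header[0, 4], png_header[4..-1]])

chunks = []
EncodedBody.new(source).each { |chunk| chunks << chunk }
chunks.each { |chunk| puts chunk.encoding }
```

Since `force_encoding` only changes the encoding tag, the result is the same whichever encoding the runtime originally assigned to the chunk.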

ixti commented May 23, 2017

It's definitely on our end (not http_parser.rb, as it's only responsible for parsing headers). And I agree that chunks must be either in the encoding specified in the headers, or binary (I believe we had that somewhere).
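
A rough sketch of that rule (charset from the headers if present, otherwise binary) could look like the following. `chunk_encoding` is a hypothetical helper, not something taken from HTTP.rb, and the regular expression is a simplified assumption about how the `charset` parameter appears in `Content-Type`.

```ruby
# Hypothetical helper (not HTTP.rb API): choose the encoding for body chunks
# from a Content-Type header value, falling back to binary.
def chunk_encoding(content_type)
  # Simplified charset extraction; tolerates optional quotes around the value.
  charset = content_type.to_s[/charset\s*=\s*"?([^";\s]+)"?/i, 1]
  return Encoding::BINARY unless charset

  Encoding.find(charset)
rescue ArgumentError
  # Encoding.find raises for unknown names; treat those as binary too.
  Encoding::BINARY
end

utf8_case    = chunk_encoding("text/html; charset=utf-8")
png_case     = chunk_encoding("image/png")
missing_case = chunk_encoding(nil)
bogus_case   = chunk_encoding("text/plain; charset=bogus")

[utf8_case, png_case, missing_case, bogus_case].each { |enc| puts enc }
```

Falling back to binary for an unknown charset is a design choice made for this sketch; raising an error instead would also be defensible.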

@ixti ixti self-assigned this May 23, 2017
janko added a commit to janko/http that referenced this issue May 24, 2017
MRI will return content read from the socket in the ASCII-8BIT (binary)
encoding, whereas JRuby will return it in the UTF-8 encoding. In
whichever encoding the body is retrieved, we want to force its encoding
to the one specified (charset response header if present, otherwise
binary). This is already the behaviour in Response#to_s, we just extend
it to Response#readpartial as well.

Fixes httprb#413
@ixti ixti closed this as completed in #414 May 29, 2017
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Dec 9, 2017
pkgsrc changes:
- sort DEPENDS

Upstream changes (from CHANGES.md):

## 3.0.0 (2017-10-01)

* Drop support of Ruby `2.0` and Ruby `2.1`.
  ([@ixti])

* [#410](httprb/http#410)
  Infer `Host` header upon redirects.
  ([@janko-m])

* [#409](httprb/http#409)
  Enables request body streaming on any IO object.
  ([@janko-m])

* [#413](httprb/http#413),
  [#414](httprb/http#414)
  Fix encoding of body chunks.
  ([@janko-m])

* [#368](httprb/http#368),
  [#357](httprb/http#357)
  Fix timeout issue.
  ([@HoneyryderChuck])