Encoding::UndefinedConversionError: "\xF0" from ASCII-8BIT to UTF-8 #786

Closed
elado opened this issue Oct 21, 2014 · 14 comments

@elado
Contributor

elado commented Oct 21, 2014

I get the following error when I POST a file together with an extra body field that contains non-ASCII UTF-8 characters and then try to convert those fields to JSON:

Encoding::UndefinedConversionError: "\xF0" from ASCII-8BIT to UTF-8
    .rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/hashie-3.3.1/lib/hashie/hash.rb:42:in `encode'
    .rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/hashie-3.3.1/lib/hashie/hash.rb:42:in `to_json'
    .rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/hashie-3.3.1/lib/hashie/hash.rb:42:in `to_json'

Example app:

require 'grape'
require 'json'

class API < Grape::API
  format :json

  post do
    puts params[:body].to_json
  end
end

run API

Run with bundle exec rackup -p3000

Call with

curl -X POST --form "body=😈&file=@file.jpg" http://localhost:3000 --trace-ascii dump.txt

In this case the 😈 is an emoji (not sure it's visible here).

Since it's a multipart/form-data request, I can't set a charset on the Content-Type myself, because the client generates the header with a random boundary. I'm not even sure it would help.

Any suggestions?

@dblock
Member

dblock commented Oct 21, 2014

Does setting LANG to en_US.UTF-8 help? I think this is jruby/jruby#290 (or similar).

@dblock dblock added the bug? label Oct 21, 2014
@elado
Contributor Author

elado commented Oct 21, 2014

Setting LANG doesn't help.
It's not only emoji but any non-ASCII UTF-8 character.
This is MRI 2.1.3, not JRuby.

@elado
Contributor Author

elado commented Oct 21, 2014

Calling force_encoding helps:

class API < Grape::API
  post do
    params[:body].force_encoding(Encoding::UTF_8)
    { body: params[:body] }.to_json
  end
end

Maybe it should be called on all rack.input on every request?

Something like:

before { params.each { |k, v| v.force_encoding(Encoding::UTF_8) if v.is_a?(String) } }
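A slightly more cautious variant of the hook above, as a minimal sketch: it relabels only strings that are tagged ASCII-8BIT and whose bytes are valid UTF-8, so binary values are left alone. The helper name `relabel_utf8` and the sample hash are hypothetical and not part of Grape or Rack.

```ruby
# Hypothetical helper (not a Grape or Rack API): returns a UTF-8 tagged copy
# of binary strings whose bytes form valid UTF-8, and the original otherwise.
def relabel_utf8(value)
  return value unless value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT

  candidate = value.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : value
end

# Sample values standing in for request params (assumed for illustration).
params = {
  "body"  => "\xF0\x9F\x98\x88".b, # UTF-8 bytes of the emoji, tagged binary
  "blob"  => "\xFF\xFE".b,         # not valid UTF-8, stays binary
  "extra" => "string"              # already UTF-8, untouched
}

cleaned = params.map { |k, v| [k, relabel_utf8(v)] }.to_h
cleaned.each { |k, v| puts "#{k}: #{v.encoding}" }
```

Inside a Grape API this could be called from a `before` block in place of the unconditional `force_encoding`, but that wiring is left out here because it depends on how the params hash is exposed.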

@dblock
Member

dblock commented Oct 21, 2014

I definitely don't think you should be doing force_encoding, and neither should we. Maybe you need to POST with the correct encoding? Try -H "Content-Type: application/json; charset=UTF-8"

@dblock
Member

dblock commented Oct 21, 2014

I just read that you're saying you can't generate a content type. This has to be possible, somehow?

@elado
Contributor Author

elado commented Oct 21, 2014

Even when I force charset (I used Charles to do it) it doesn't use the right encoding:

POST / HTTP/1.1
Host: localhost:3000
Cache-Control: no-cache
Content-Type: multipart/form-data; charset=UTF-8; boundary=----WebKitFormBoundarydOP0PG1LSkoOoLaw
Accept: */*
Content-Length: 137

------WebKitFormBoundarydOP0PG1LSkoOoLaw
Content-Disposition: form-data; name="body"

😈
------WebKitFormBoundarydOP0PG1LSkoOoLaw--

Gives the same error.

Isn't this what Rails' hidden utf8 input is meant to solve?

@elado
Contributor Author

elado commented Oct 21, 2014

What's also weird is that only strings that actually contain non-ASCII characters are tagged as ASCII-8BIT:

before { params.each { |k, v| ap [k, v, v.encoding] if v.is_a?(String) } }

[
    [0] "body",
    [1] "\xF0\x9F\x98\x88",
    [2] #<Encoding:ASCII-8BIT>
]
[
    [0] "extra",
    [1] "string",
    [2] #<Encoding:UTF-8>
]

force_encoding seems to work just fine, even though it sounds 'risky'...
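The behaviour in the dump above can be reproduced without Grape. The sketch below assumes only core Ruby: it shows that transcoding the binary-tagged bytes to UTF-8 raises the error from the title, while `force_encoding` changes only the encoding label and leaves the bytes untouched. The helper name `utf8_transcode_error` is made up for this example.

```ruby
# Hypothetical helper: returns the exception raised when transcoding str
# to UTF-8, or nil if the conversion succeeds.
def utf8_transcode_error(str)
  str.encode(Encoding::UTF_8)
  nil
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError => e
  e
end

# Bytes of U+1F608 tagged ASCII-8BIT, like params[:body] in the dump above.
raw = "\xF0\x9F\x98\x88".b
puts raw.encoding
puts utf8_transcode_error(raw).class

# force_encoding does not convert anything; it only relabels the same bytes.
relabeled = raw.dup.force_encoding(Encoding::UTF_8)
puts relabeled.valid_encoding?
puts relabeled.bytes == raw.bytes
puts utf8_transcode_error(relabeled).inspect
```

This is why `force_encoding` "works" here: the bytes were valid UTF-8 all along and only the tag was wrong. It would silently mislabel data if the client had sent some other encoding.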

@dblock
Member

dblock commented Oct 22, 2014

I think this is a legit bug. Maybe you can try to turn it into a spec?

@elado
Contributor Author

elado commented Oct 22, 2014

dblock added a commit to dblock/grape that referenced this issue Oct 27, 2014
@jimhj

jimhj commented Oct 30, 2014

I had the same error when I posted some Chinese characters.

@sbounmy

sbounmy commented Nov 24, 2014

OK, I just had the same issue here.
I believe the Content-Type should be set on the client side for each field, but I couldn't find out how. Something like:

POST / HTTP/1.1
Host: localhost:3000
Cache-Control: no-cache
Content-Type: multipart/form-data; charset=UTF-8; boundary=----WebKitFormBoundarydOP0PG1LSkoOoLaw
Accept: */*
Content-Length: 137

------WebKitFormBoundarydOP0PG1LSkoOoLaw
Content-Disposition: form-data; name="body"
Content-Type: text/plain; charset=UTF-8

😈
------WebKitFormBoundarydOP0PG1LSkoOoLaw--

@chrisdebruin

I have the same issue. Has anybody resolved this problem without the force_encoding hack?

@dm1try
Member

dm1try commented May 8, 2015

@chrisdebruin, try updating rack to version ~> 1.6.0.
It seems this encoding problem is fixed there.

BTW, @sbounmy is right, RFC 2388 says:

As with all multipart MIME types, each part has an optional
"Content-Type", which defaults to text/plain. If the contents of a
file are returned via filling out a form, then the file input is
identified as the appropriate media type, if known, or
"application/octet-stream". If multiple files are to be returned as
the result of a single form entry, they should be represented as a
"multipart/mixed" part embedded within the "multipart/form-data".

You can check this with curl, which has a type option for form fields:
curl -X POST -F "body=😈;type=text/plain; charset=UTF-8" http://localhost:3000
(Try another encoding, because UTF-8 is set by default; see the commit above.)
It will work if rack ~> 1.6.0 is used.

I'll close this.
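For clients other than curl, the per-part Content-Type from the RFC 2388 excerpt above can be sketched in plain Ruby. This is an illustrative sketch only: `text_part` and the boundary value are hypothetical names chosen for the example, and whether the server honours the part charset depends on the rack version, as noted above.

```ruby
# Hypothetical helper: renders one text field of a multipart/form-data body
# with an explicit per-part Content-Type, using CRLF line endings.
def text_part(boundary, name, value, charset = "UTF-8")
  [
    "--#{boundary}",
    %(Content-Disposition: form-data; name="#{name}"),
    "Content-Type: text/plain; charset=#{charset}",
    "", # blank line separates the part headers from the part body
    value
  ].join("\r\n") + "\r\n"
end

# Assumption: any token that does not occur in the field data works here.
boundary = "example-boundary"

# One field followed by the closing delimiter.
body = text_part(boundary, "body", "\u{1F608}") + "--#{boundary}--\r\n"
puts body
```

The request itself would then be sent with `Content-Type: multipart/form-data; boundary=example-boundary`, matching the boundary used in the body.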

@dm1try dm1try closed this as completed May 8, 2015
@chrisdebruin

@dm1try thanks, works for me.
