Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding::UndefinedConversionError #732

Closed
bsv9 opened this issue Sep 8, 2014 · 11 comments
Closed

Encoding::UndefinedConversionError #732

bsv9 opened this issue Sep 8, 2014 · 11 comments

Comments

@bsv9
Copy link

bsv9 commented Sep 8, 2014

Found this errors in my error.log

2014-09-08 21:16:58 +0000: Read error: #<Encoding::UndefinedConversionError: "\xC3" to UTF-8 in conversion from ASCII-8BIT to UTF-8 to US-ASCII>
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/rack-1.4.5/lib/rack/commonlogger.rb:46:in `write'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/rack-1.4.5/lib/rack/commonlogger.rb:46:in `log'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/puma-2.8.0/lib/puma/rack_patch.rb:20:in `block in call'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/puma-2.8.0/lib/puma/server.rb:642:in `call'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/puma-2.8.0/lib/puma/server.rb:642:in `block in handle_request'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/puma-2.8.0/lib/puma/server.rb:642:in `each'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/puma-2.8.0/lib/puma/server.rb:642:in `handle_request'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/puma-2.8.0/lib/puma/server.rb:361:in `process_client'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/puma-2.8.0/lib/puma/server.rb:254:in `block in run'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/puma-2.8.0/lib/puma/thread_pool.rb:92:in `call'
/usr/local/rvm/gems/ruby-1.9.3-p484@global/gems/puma-2.8.0/lib/puma/thread_pool.rb:92:in `block in spawn_thread'

option stdout_redirect

stdout_redirect 'log/access.log', 'log/error.log', true

ENV: ruby 1.9.3-p484, puma-2.7.1
OS: Ubuntu 12.10 x86_64

@raggi
Copy link
Member

raggi commented Sep 9, 2014

Looks like your platform has locale set to ascii, not utf-8, so default_external is trying to downconvert on write, but it's been a while so I'm partially guessing.

@raggi
Copy link
Member

raggi commented Sep 9, 2014

In case that means nothing to you, try:

export LANG=en_US.UTF-8

Before launching your server.

@bsv9
Copy link
Author

bsv9 commented Sep 9, 2014

Setting UTF8 locale doesn't help me

shaman@d4436:~/someprj$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
shaman@d4436:~/someprj$ RACK_ENV=staging bundle exec puma -p 11012 config.ru
Puma starting in single mode...
* Version 2.8.0 (ruby 1.9.3-p392), codename: Sir Edmund Percival Hillary
* Min threads: 0, max threads: 16
* Environment: staging
* Listening on tcp://0.0.0.0:11012
Use Ctrl-C to stop
2014-09-09 22:07:11 +0000: Read error: #<Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8>
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rack-1.4.5/lib/rack/commonlogger.rb:46:in `write'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rack-1.4.5/lib/rack/commonlogger.rb:46:in `log'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/puma-2.8.0/lib/puma/rack_patch.rb:20:in `block in call'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/puma-2.8.0/lib/puma/server.rb:642:in `call'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/puma-2.8.0/lib/puma/server.rb:642:in `block in handle_request'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/puma-2.8.0/lib/puma/server.rb:642:in `each'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/puma-2.8.0/lib/puma/server.rb:642:in `handle_request'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/puma-2.8.0/lib/puma/server.rb:361:in `process_client'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/puma-2.8.0/lib/puma/server.rb:254:in `block in run'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/puma-2.8.0/lib/puma/thread_pool.rb:92:in `call'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/puma-2.8.0/lib/puma/thread_pool.rb:92:in `block in spawn_thread'
^C- Gracefully stopping, waiting for requests to finish
- Goodbye!

UTF8 request

[shaman@bofh ~]$ cat ttt | nc -i 1 -w 2 dev.local 80
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 09 Sep 2014 22:09:41 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 145
Connection: keep-alive
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
P3P: CP='IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT'

<html><body><h1 style="width:270px;height:150px;position:absolute;left:50%;top:50%;margin:-75px 0 0 -135px;">Yaaaaarrrrr!!!111</h1></body></html>
[shaman@bofh ~]$
[shaman@bofh ~]$ cat ttt
GET /yarr?À=À\xC3\x80  HTTP/1.1
Host: dev.local
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.77.4 (KHTML, like Gecko) Version/7.0.5 Safari/537.77.4
Accept-Language: en-us
Accept-Encoding: gzip, deflate

@raggi
Copy link
Member

raggi commented Sep 9, 2014

Ah i see, it's path info escaping issues. so this is actually an illegal URI per 2396.

Setting the locale is changing things, but it's not valid utf-8, so setting to utf-8 doesn't actually change anything.

We should probably open the log in binary mode and put data in it, but there are potential security implications to logging raw user binary to files.

@jeremy was talking about this stuff the other day too, I forget which issue/PR it was on.

In the meantime, you could fix this by providing a custom logger as the second parameter to the initializer, that logs in binary.

e.g.

logfile = open('log/binary.log', 'a+')
logfile.binmode
use Rack::CommonLogger, logfile
run App.new

@dre-hh
Copy link

dre-hh commented Dec 19, 2014

We had some similar issues on passenger rails, which lead us to submit the following issue

phusion/passenger#1328

Shouldn't the rack interface respect ruby's Encoding.default_external ?

One of the author's of passenger says "The spec really should clarify this" and i'm sharing his opinion.

@raggi
Copy link
Member

raggi commented Dec 19, 2014

No. As I said on the other issue, default_external does NOT specify anything about data over sockets.

@raggi
Copy link
Member

raggi commented Dec 19, 2014

@dre-hh
Copy link

dre-hh commented Dec 19, 2014

I've checked some rails default app servers. Puma and webrick encode them in UTF-8, thin and unicorn into ASCII-8BIT.

LC_ALL setting has no effect on it, however people suggest to set this on several issue lists or posts.

For those who stumblling upon it, just use force_encoding('utf-8'). If it's already utf-8, it will have no effect and if not any ascii string should be losslessly forced into utf-8.

Your are right, Encoding.default_external has nothing to with it. The fragmented situation is just bad, imho.

@raggi
Copy link
Member

raggi commented Dec 19, 2014

To be really really clear: LC_ALL should NOT affect any data coming from HTTP requests, period.

@raggi
Copy link
Member

raggi commented Dec 19, 2014

Yes, the fragmented situation is bad. Here's where the bad comes from:

  1. The HTTP spec, not specifying encodings. The new RFCs have some shoulds, but that's not really sufficient for Rack (it is for Sinatra and Rails, Sinatra always puts the rack-utf8sanitizer in the middleware stack to force everything to utf8, and rails has something similar IIRC).
  2. The URI spec, for not specifying encodings. Much the same story as above.
  3. Ruby for confusing users with it's lack of opinion about encodings. While it's cool to be encoding agnostic in a theoretical sense, and this is probably really useful in some places (e.g. Japan) it's a real pain for most users. Most users would be quite happy for most use cases if all of the data inside their app was converted forcefully to UTF-8 on the way in, and only encoded differently on the way out. Unfortunately this doesn't work for all use cases, but it is what Rails intends to do.
  4. Rack for choosing to be agnostic here. We try to support all use cases (e.g. hard interop problems in Japan), rather than say "sorry, if you have a client that sends us a funky encoding you have to write your own HTTP layer".

There may yet be a real problem in the things you describe, but this situation is subtle.

If you can describe only what you observe, not what you "expect" that would help with diagnosis, as I can't easily determine in between expectations what's conjecture and what isn't. Unfortunately these issues are subtle and it's too easy for me to make a mistake if I'm not focusing clearly.

To be clear here too: this does mean that I'm suggesting the logger should also not follow Encoding.default_external, as transcoding in a logger in this setting is bad.

The logger in particular can only be fixed as I describe above. We don't really have a good choice alternative. Clients can send illegal data, and if you want a log of that you must accept to log it in it's raw binary form. Doing any transcoding would be a lie and prevent you from diagnosing root causes, should you mistakenly (and this is common) trust the logger.

@jeremyevans
Copy link
Contributor

So it seems this issue is related to how Rack::CommonLogger's logger object (which is passed as an argument) handles encoded data. That doesn't seem to be a bug in Rack itself, and I don't think Rack::CommonLogger can fix the issue, because it can only assume the logger object supports write or <<.

I think Rack's SPEC's handling of encoding (treating input as binary, since you don't know the encoding) is fine and should not be changed.

Basically, this issue can be fixed by the software using Rack::CommonLogger providing an argument that supports write or << with strings in any encoding. As this isn't a bug in Rack, I'm going to close the issue now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants