Default lang/locale should be UTF-8, not ASCII. #156

Closed
GrahamDumpleton opened this issue Nov 21, 2016 · 7 comments

@GrahamDumpleton
Contributor

People are used to the default lang/locale on their operating system's command line being some variant of UTF-8. As a consequence, they often unknowingly output strings, for example with print() from a Python web application or module, that contain characters which are valid under UTF-8 but which fail to encode when the default lang/locale is ASCII.
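
A minimal sketch of the underlying failure, independent of any web server: the same character encodes fine as UTF-8 but cannot be represented in ASCII. The variable names are only for illustration; the character is the one that shows up in the traceback further down.

```python
# The character used in the failing print() call.
text = u'\u292e'

# Encoding to UTF-8 succeeds and yields three bytes.
utf8_bytes = text.encode('utf-8')

# Encoding to ASCII is what effectively happens when stdout is ASCII.
try:
    text.encode('ascii')
    ascii_failed = False
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_failed = True
    ascii_error = str(exc)

if __name__ == "__main__":
    print(repr(utf8_bytes))
    print(ascii_failed, ascii_error)
```

print() hits the same error whenever the stream it writes to has been set up with the ASCII codec.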

When they try to deploy code which does this to OpenShift, it fails, whether they use the recommended gunicorn or roll their own web application as an app.py. For example:

[2016-11-21 02:28:10 +0000] [1] [INFO] Starting gunicorn 19.6.0
[2016-11-21 02:28:10 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2016-11-21 02:28:10 +0000] [1] [INFO] Using worker: sync
[2016-11-21 02:28:10 +0000] [28] [INFO] Booting worker with pid: 28
[2016-11-21 02:28:10 +0000] [29] [INFO] Booting worker with pid: 29
[2016-11-21 02:28:10 +0000] [30] [INFO] Booting worker with pid: 30
Internal Server Error: /
Traceback (most recent call last):
  File "/opt/app-root/src/.local/lib/python3.5/site-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/opt/app-root/src/.local/lib/python3.5/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response
    response = self._get_response(request)
  File "/opt/app-root/src/.local/lib/python3.5/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/opt/app-root/src/.local/lib/python3.5/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/app-root/src/demo/views.py", line 4, in index
    print(u'\u292e')
UnicodeEncodeError: 'ascii' codec can't encode character '\u292e' in position 0: ordinal not in range(128)
10.1.7.1 - - [21/Nov/2016:02:28:13 +0000] "GET / HTTP/1.1" 500 27 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36"

This problem doesn't actually arise with mod_wsgi-express. Apache/mod_wsgi has been susceptible to this sort of problem for a long time: when Apache is started by Linux from system startup scripts, it is often stripped of the default UTF-8 lang/locale the system otherwise runs with, and it runs with ASCII instead. As a consequence, mod_wsgi-express corrects the situation to a saner default, so that users' code doesn't keep blowing up with the user not knowing why and then having to research what configuration changes they need to make.

It would be much more friendly to developers if the default lang/locale for their deployed web applications were en_US.UTF-8. This should be hardwired into the Docker image itself by setting both the LANG and LC_ALL environment variables in the Dockerfile.

ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8

They can still override this via the .s2i/environment file if they need to change it to some other variant of UTF-8 or another language.

@GrahamDumpleton
Contributor Author

I have written a blog about this problem before:

@GrahamDumpleton
Contributor Author

Hmmm, Gunicorn seems to fail to do the right thing even when these environment variables are set.

-- GLOBAL --
('en_US', 'UTF-8')
UTF-8
------------
-- REQUEST --
('en_US', 'UTF-8')
UTF-8
------------
[2016-12-06 20:20:30,923] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
  File "/opt/app-root/src/.local/lib/python2.7/site-packages/flask/app.py", line 1988, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/app-root/src/.local/lib/python2.7/site-packages/flask/app.py", line 1641, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/app-root/src/.local/lib/python2.7/site-packages/flask/app.py", line 1544, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/app-root/src/.local/lib/python2.7/site-packages/flask/app.py", line 1639, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/app-root/src/.local/lib/python2.7/site-packages/flask/app.py", line 1625, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/app-root/src/wsgi.py", line 13, in hello
    print(u'\u292e')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u292e' in position 0: ordinal not in range(128)

Additionally setting LC_LANG doesn't help.

Spinning up python on the command line works okay, whereas it didn't before the environment variables were set. When I have run a Python 3.5 app.py with an aiohttp application before, the problem was solved by the environment variables. So either Gunicorn is doing something nasty in its fiddling with stdout or stderr, or other options need to be provided to gunicorn when the run script executes it.
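
The GLOBAL/REQUEST output above looks like the values returned by locale.getlocale() and locale.getpreferredencoding(), though that is an assumption, since the script that produced it isn't shown. A rough sketch of a diagnostic along those lines, which also reports the encoding actually attached to stdout (the value print() depends on) and whether stdout is a tty; describe_encodings is a hypothetical helper name:

```python
import locale
import sys


def describe_encodings():
    """Collect the locale settings and the encoding bound to stdout."""
    return {
        # Locale as seen by the locale module, e.g. ('en_US', 'UTF-8').
        "locale": locale.getlocale(),
        # Encoding the locale module would pick for text data.
        "preferred_encoding": locale.getpreferredencoding(),
        # Encoding print() really uses; may be None or differ from the above.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # False when stdout is a pipe or file, as in a container log stream.
        "stdout_isatty": sys.stdout.isatty(),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("%s: %r" % (name, value))
```

Comparing the locale values with stdout_encoding would show whether the locale is correct while stdout still ends up as ASCII.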

@GrahamDumpleton
Contributor Author

This problem only occurs with gunicorn when it is run from the run script triggered on container start. If you get an interactive shell using oc rsh and, in that shell, run gunicorn against a wsgi.py file which prints out a problematic string on import, it works fine. Similarly, if you use oc debug and run the run script manually, it works fine.

@GrahamDumpleton
Contributor Author

This goes beyond gunicorn and seems to be a general issue with Python 2.7 in a container: when the process is not associated with a tty/interactive shell, the default encoding doesn't get applied to stdout and stderr.

@GrahamDumpleton
Contributor Author

The solution is that you also need to set the environment variable:

PYTHONIOENCODING=UTF-8

This is in addition to LANG and LC_ALL.
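
A sketch of how the effect of PYTHONIOENCODING could be checked without a tty, by running a child interpreter with stdout attached to a pipe. It is written for Python 3 (subprocess.run is not available on 2.7), and run_child and CHILD are hypothetical names; it only demonstrates the variable's effect, not the exact container setup.

```python
import os
import subprocess
import sys

# Child program: print the character from the tracebacks above.
CHILD = r"print(u'\u292e')"


def run_child(io_encoding):
    """Run CHILD with stdout on a pipe (no tty) and a given PYTHONIOENCODING."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    for encoding in ("ascii", "UTF-8"):
        code, out, err = run_child(encoding)
        print(encoding, "exit code:", code, "stdout bytes:", repr(out))
```

With ascii the child exits with a UnicodeEncodeError; with UTF-8 it exits cleanly and writes the UTF-8 bytes for the character.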

@pkubatrh
Member

pkubatrh commented Feb 6, 2017

Closed via #167

@pkubatrh closed this as completed Feb 6, 2017

@lily524

lily524 commented Jun 19, 2019

For supervisor + gunicorn, this can be configured as below.

# other program config

[program:gunicorn]
command = gunicorn wsgi  --bind 0.0.0.0:8000 --log-level error ...
environment = LANG=en_US.UTF-8,LC_ALL=en_US.UTF-8,PYTHONIOENCODING=UTF-8

# other program config
