UnicodeEncodeError when no locale is set #309

andreasWallner · 2014-03-12T01:39:18Z

I'm running Jinja2 on Python 3.3.
Normally, rendering templates with Unicode characters works as expected, but if I execute my scripts in a shell without a proper locale, it fails with a UnicodeEncodeError.

Testcase:

from jinja2 import Template
t = Template('{{s}}')
t.stream(s='ë').dump('testfile')

In my 'special' shell, this produces a "UnicodeEncodeError: 'ascii' codec can't encode character..."

The reason for the error is that dump(...) just calls open('testfile', 'w') if no encoding is specified. In my locale-less shell this opens the file with the ASCII encoding, since Python uses the locale's encoding as the default. It works as expected when calling dump with an explicit encoding.
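
The effect described above can be sketched with the standard library alone. This is a minimal illustration, not Jinja2's code: the helper name try_write is hypothetical, and passing encoding='ascii' explicitly only simulates what a locale-less shell would select as the default.

```python
import locale
import os
import tempfile

# The encoding that open() in text mode falls back to when none is given
# (unless Python's UTF-8 mode is enabled). The value depends on the environment.
print(locale.getpreferredencoding(False))


def try_write(path, text, encoding):
    """Hypothetical helper: write text with the given encoding.

    Returns True on success and False if the text cannot be encoded.
    """
    try:
        with open(path, "w", encoding=encoding) as fp:
            fp.write(text)
        return True
    except UnicodeEncodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "testfile")
    # 'ascii' stands in for the default picked up in a shell without a locale.
    ascii_ok = try_write(target, "ë", "ascii")
    # An explicit UTF-8 encoding does not depend on the locale.
    utf8_ok = try_write(target, "ë", "utf-8")
    with open(target, "rb") as fp:
        raw = fp.read()

print(ascii_ok, utf8_ok, raw)
```

The ASCII write fails for the same reason as in the report, while the explicit UTF-8 write succeeds regardless of how the shell is configured.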

I do not think that behaviour is incorrect per se, but the 'Unicode' chapter of the documentation states that the 'default encoding' used for templates is UTF-8. A note there about the locale would be nice if you choose to keep the behaviour the same.
I would be willing to write a patch, but I wanted to ask first what the intended behaviour is.

berkerpeksag · 2014-03-12T02:16:43Z

What is your Jinja2 version? I can't reproduce this with Python 3.3.4 and Jinja2 2.8-dev.

>>> from jinja2 import Template
>>> t = Template('{{s}}')
>>> t.stream(s='ë').dump('testfile2')
>>> open('testfile2').read()
'ë'

You can also use the optional encoding parameter of the dump method:

t.stream(s='ë').dump('testfile2', encoding='utf-8')

andreasWallner · 2014-03-12T22:47:18Z

My Jinja2 version is 2.7.
As I said in my original message, it works as expected when I execute the code in a normal shell. The problem arises only when no environment is set up. My script is executed in a shell that has, for example, no locale configured, and I have no control over that shell, so I cannot set a locale there. The problem also arises in environments where the locale is set to something other than a Unicode encoding.

If you want to test it, you can just remove your environment variables and try:

# bash
# unset `env | awk -F= '/^\w/ {print $1}' | xargs`
# ./test.py
Traceback (most recent call last):
  File "./test.py", line 4, in <module>
    Template('{{s}}').stream(s='\xeb').dump('test.file')
  File "/usr/lib64/python3.3/site-packages/jinja2/environment.py", line 1142, in dump
    fp.writelines(iterable)
UnicodeEncodeError: 'ascii' codec can't encode character '\xeb' in position 0: ordinal not in range(128)

With test.py being

#!/usr/bin/python3
from jinja2 import Template
Template('{{s}}').stream(s='ë').dump('test.file')

I knew about the encoding parameter; I said that it works with an explicit encoding. That is also why I said I am not sure this is an error. It is just unexpected that the system locale makes the Jinja output fail if no explicit encoding is specified. I read the documentation as saying that it would default to Unicode, not to the system locale. Defaulting to the locale does not make much sense on Python 3, where all strings are Unicode anyhow, so without a suitable locale it simply will not work unless the encoding parameter is set.

If this is the intended behaviour, a short note in the Unicode section of the docs would be nice; otherwise, defaulting to UTF-8 could be a solution.

If no encoding parameter is given, the file is opened in 'w' mode, which defaults to locale.getpreferredencoding(False), not to a Unicode encoding as hinted at in the old documentation.

mitsuhiko · 2014-06-06T16:42:24Z

This is a Python 3 bug. There is a workaround for this in place now but please file this against Python itself.

andreasWallner · 2014-09-23T20:56:22Z

@mitsuhiko

Thanks for the fix. I looked into it because I wanted to file it as a Python bug, but it seems that this is the intended behaviour.
From the Python 3.3 docs for open() (3.5 is the same):

In text mode, if encoding is not specified the encoding used is platform dependent:
locale.getpreferredencoding(False) is called to get the current locale encoding.

From the getpreferredencoding() docs:

Return the encoding used for text data, according to user preferences.

So you basically depend on the user's locale being set to UTF-8 (or to some other encoding that can represent all Unicode characters). With your fix this should now be OK anyway. I just wanted to report back that I had intended to file this against Python, and that this is the behaviour Python intends.
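
One way to avoid depending on the locale is to choose the encoding yourself before the file is opened. The following is a minimal sketch under that assumption; dump_chunks is a hypothetical helper written for illustration, and it is not meant to reproduce the change made in Jinja2.

```python
import os
import tempfile


def dump_chunks(chunks, path, encoding=None, errors="strict"):
    """Hypothetical helper: write an iterable of str chunks to path.

    Falls back to UTF-8 rather than the locale encoding when no
    encoding is given, so the result does not depend on the shell.
    """
    if encoding is None:
        encoding = "utf-8"  # assumption: UTF-8 is the desired default
    # Binary mode: the bytes are produced here, not by open()'s text layer.
    with open(path, "wb") as fp:
        fp.writelines(chunk.encode(encoding, errors) for chunk in chunks)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "test.file")
    dump_chunks(["caf", "ë"], target)
    with open(target, "rb") as fp:
        written = fp.read()

print(written)
```

Opening the file in binary mode and encoding each chunk explicitly means locale.getpreferredencoding(False) is never consulted.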

mitsuhiko added a commit that referenced this issue Jun 6, 2014

Fixed issue #309

e7086db

mitsuhiko closed this as completed Jun 6, 2014

Wrzlprmft mentioned this issue Mar 10, 2018

Problem trying the first example in mac os x neurophysik/jitcdde#10

Closed

github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2020
