unicode(exception) and str(exception) should return the same message on Py2.6 #50358

ezio-melotti · 2009-05-26T04:53:13Z

BPO	6108
Nosy	@warsaw, @ncoghlan, @pitrou, @rbtcollins, @ezio-melotti
Files	issue6108_testcase.diff: Testcase that checks if str() and unicode() return the same message output_on_py26.txt: Output of unicode_exceptions.py on Python 2.6 unicode_exceptions.py: Example script that shows the str() and unicode() of some exceptions issue6108.diff: Proof of concept that fixes UnicodeDecodeException issue6108-2.patch: Patch that makes all the tests in issue6108_testcase pass (except for KeyError). issue6108-3.patch: Patch that makes all the tests in issue6108_testcase pass. issue6108-4.patch: Patch + unittests against trunk issue6108-5.patch: Patch + unittests against trunk issue6108-6.patch: Patch + unittests against trunk

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/ezio-melotti'
closed_at = <Date 2009-12-24.23:03:08.792>
created_at = <Date 2009-05-26.04:53:13.017>
labels = ['interpreter-core', 'type-bug', 'release-blocker']
title = 'unicode(exception) and str(exception) should return the same message on Py2.6'
updated_at = <Date 2009-12-24.23:03:08.789>
user = 'https://github.com/ezio-melotti'

bugs.python.org fields:

activity = <Date 2009-12-24.23:03:08.789>
actor = 'ezio.melotti'
assignee = 'ezio.melotti'
closed = True
closed_date = <Date 2009-12-24.23:03:08.792>
closer = 'ezio.melotti'
components = ['Interpreter Core']
creation = <Date 2009-05-26.04:53:13.017>
creator = 'ezio.melotti'
dependencies = []
files = ['15314', '15315', '15316', '15529', '15534', '15535', '15632', '15650', '15651']
hgrepos = []
issue_num = 6108
keywords = ['patch']
message_count = 25.0
messages = ['88330', '92561', '92568', '93313', '95158', '96281', '96297', '96313', '96314', '96315', '96318', '96319', '96321', '96322', '96323', '96324', '96325', '96331', '96717', '96742', '96755', '96756', '96761', '96762', '96872']
nosy_count = 7.0
nosy_names = ['barry', 'exarkun', 'ncoghlan', 'pitrou', 'rbcollins', 'ezio.melotti', 'cvrebert']
pr_nums = []
priority = 'release blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue6108'
versions = ['Python 2.6', 'Python 2.7']

ezio-melotti · 2009-05-26T04:52:58Z

On Python 2.5 str(exception) and unicode(exception) return the same text:
>>> err
UnicodeDecodeError('ascii', '\xc3\xa0', 0, 1, 'ordinal not in range(128)')
>>> str(err)
"'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in
range(128)"
>>> unicode(err)
u"'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in
range(128)"

On Python 2.6 unicode(exception) returns unicode(exception.args):
>>> err
UnicodeDecodeError('ascii', '\xc3\xa0', 0, 1, 'ordinal not in range(128)')
>>> str(err)
"'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in
range(128)"
>>> unicode(err)
u"('ascii', '\\xc3\\xa0', 0, 1, 'ordinal not in range(128)')"

This seems to affect only exceptions with more than 1 arg (e.g.
UnicodeErrors and SyntaxErrors). KeyError is also different (the '' are
missing with unicode()).

Note that when an exception like ValueError() is instantiated with more
than 1 arg even str() returns str(exception.args) on both Py2.5 and Py2.6.

Probably __str__() checks the number of args before returning a specific
message and if it doesn't match it returns str(self.args). __unicode__()
instead seems to always return unicode(self.args) on Py2.6.

Attached there's a script that prints the repr(), str() and unicode() of
some exceptions, run it on Py2.5 and Py2.6 to see the differences.

exarkun · 2009-09-13T05:00:40Z

Perhaps also worth noting is that in Python 2.4 as well, str(exception)
and unicode(exception) returned the same thing. Unlike some other
exception changes in 2.6, this doesn't seem to be a return to older
behavior, but just a new behavior. (Or maybe no one cares about that;
just wanted to point it out, though.)

pitrou · 2009-09-13T14:36:34Z

Looks like a potentially annoying bug to me.

warsaw · 2009-09-29T18:31:08Z

Since we do not yet have a patch for this, I'm knocking it off the list
for 2.6.3. It seems like an annoying loss of compatibility, but do we
have any reports of it breaking real-world code?

ezio-melotti · 2009-11-12T05:22:49Z

I added the output of unicode_exceptions.py on Py2.6 and a testcase
(against the trunk) that fails for 5 different exceptions, including the
IOError mentioned in bpo-6890 (also added to unicode_exceptions.py).
The problem has been introduced by bpo-2517.

ezio-melotti · 2009-12-12T03:21:57Z

In r64791, BaseException gained a new __unicode__ method that does the
equivalent of the following things:

if the number of args is 0, returns u''
if it's 1 returns unicode(self.args[0])
if it's >1 returns unicode(self.args)

Before this, BaseException only had a __str__ method, so unicode(e)
(with e being an exception derived from BaseException) called:

e.__str__().decode(), if e didn't implement __unicode__
e.__unicode__(), if e implemented an __unicode__ method

Now, all the derived exceptions that don't implement their own
__unicode__ method inherit the "generic" __unicode__ of BaseException,
and they use that instead of falling back on __str__.
This is generally ok if the numbers of args is 0 or 1, but if there are
more args, there's usually some specific formatting in the __str__
method that is lost when BaseException.__unicode__ returns
unicode(self.args).

Possible solutions:

implement a __unicode__ method that does the equivalent of calling
unicode(str(self)) (i.e. converting to unicode the message returned by
__str__ instead of converting self.args);
implement a __unicode__ method that formats the message as __str__
for all the exceptions with a __str__ that does some specific formatting;

Attached there's a proof of concept (bpo-6108.diff) where I tried to
implement the first method with UnicodeDecodeError. This method can be
used as long as __str__ always returns only ascii.

The patch seems to work fine for me (note: this is my first attempt to
use the C API). If the approach is correct I can do the same for the
other exceptions too and submit a proper patch.

pitrou · 2009-12-12T15:53:07Z

In r64791, BaseException gained a new __unicode__ method that does the
equivalent of the following things:

It remains to be seen why that behaviour was chosen. Apparently Nick
implemented it.
IMO __unicode__ should have the same behaviour as __str__. There's no
reason to implement two different formatting algorithms.

ncoghlan · 2009-12-13T01:09:44Z

Following this down the rabbit hole a little further: Issue bpo-2517 (the
origin of my checkin) was just a restoration of the __unicode__ slot
implementation that had been ripped out in r51837 due to Issue bpo-1551432.

At the time of the r64791 checkin, BaseException_str and
BaseException_unicode were identical aside from the type of object
returned (checking SVN head shows they're actually still identical).

However, it looks like several exceptions with __str__ overrides (i.e.
Unicode[Encode/Decode/Translate]Error_str, EnvironmentError_str,
WindowsError_str. SyntaxError_str, KeyError_str) are missing
corresponding __unicode__ overrides, so invoking unicode() on them falls
back to the BaseException_unicode implementation instead of using the
custom formatting behaviour of the subclass.

ezio-melotti · 2009-12-13T01:39:56Z

IMO __unicode__ should have the same behaviour as __str__. There's no
reason to implement two different formatting algorithms.

If BaseException has both the methods they have to be both overridden by
derived exceptions in order to have the same behaviour. The simplest way
to do it is to convert the string returned by __str__ to unicode, as I
did in bpo-6108.diff.
If you have better suggestions let me know.

pitrou · 2009-12-13T01:52:48Z

Well the obvious problem with this approach is that it won't work if
__str__() returns a non-ascii string. The only working solution would be
to replicate the functioning of __str__() in each __unicode__()
implementation.

ncoghlan · 2009-12-13T04:24:37Z

As Antoine said, there's a reason BaseException now implements both
__str__ and __unicode__ and doesn't implement the latter in terms of the
former - it's the only way to consistently support Unicode arguments
that can't be encoded to an 8-bit ASCII string:

>>> str(Exception(u"\xc3\xa0"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-1: ordinal not in range(128)
>>> unicode(Exception(u"\xc3\xa0"))
u'\xc3\xa0'

For some of the exception subclasses that will always return ASCII (e.g.
KeyError, which calls repr() on its arguments) then defining __unicode__
in terms of __str__ as Ezio suggests will work.

For others (as happened with BaseException itself), the __unicode__
method will need to be a reimplementation that avoids trying to encode
potentially non-ASCII characters into an 8-bit ASCII string.

ezio-melotti · 2009-12-13T04:39:30Z

What you said is only a special case, and I agree that the solution
introduced with r64791 is correct for that. However, that fix has the
side effect of breaking the code in other situations.

To summarize the possible cases and the behaviours I prepared the
following list (odd numbers -> BaseException; even numbers -> any
exception with overridden __str__ and no __unicode__.):

0 args, e = Exception():
py2.5 : str(e) -> ''; unicode(e) -> u''
py2.6 : str(e) -> ''; unicode(e) -> u''
desired: str(e) -> ''; unicode(e) -> u''
Note: this is OK
0 args, e = MyException(), with overridden __str__:
py2.5 : str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
py2.6 : str(e) -> 'ascii' or error; unicode(e) -> u''
desired: str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
Note: py2.5 behaviour is better: if __str__ returns an ascii string
(including ''), unicode(e) should return the same string decoded, if
__str__ returns a non-ascii string, both should raise an error.

3a) 1 str arg, e = Exception('foo'):
py2.5 : str(e) -> 'foo'; unicode(e) -> u'foo'
py2.6 : str(e) -> 'foo'; unicode(e) -> u'foo'
desired: str(e) -> 'foo'; unicode(e) -> u'foo'
Note: this is OK

3b) 1 non-ascii unicode arg, e = Exception(u'föö'):
py2.5 : str(e) -> error; unicode(e) -> error
py2.6 : str(e) -> error; unicode(e) -> u'föö'
desired: str(e) -> error; unicode(e) -> u'föö'
Note: py2.6 behaviour is better: unicode(e) should return u'föö'

1 unicode arg, e = MyException(u'föö'), with overridden __str__:
py2.5 : str(e) -> error or 'ascii'; unicode(e) -> error or u'ascii'
py2.6 : str(e) -> error or 'ascii'; unicode(e) -> u'föö'
desired: str(e) -> error or 'ascii'; unicode(e) -> error or u'ascii'
Note: py2.5 behaviour is better: if __str__ returns an ascii string
str(e) should work, otherwise it should raise an error. unicode(e)
should return the ascii string decoded or an error, not the arg.
>1 args of any type, e = Exception('foo', u'föö', 5):
py2.5 : str(e) -> "('foo', u'f\\xf6\\xf6', 5)";
unicode(e) -> u"('foo', u'f\\xf6\\xf6', 5)";
py2.6 : str(e) -> "('foo', u'f\\xf6\\xf6', 5)";
unicode(e) -> u"('foo', u'f\\xf6\\xf6', 5)";
desired: str(e) -> "('foo', u'f\\xf6\\xf6', 5)";
unicode(e) -> u"('foo', u'f\\xf6\\xf6', 5)";
Note: this is OK
>1 args of any type, e = MyException('foo', u'föö', 5), with
overridden __str__:
py2.5 : str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
py2.6 : str(e) -> 'ascii' or error; unicode(e) -> u"('foo',
u'f\\xf6\\xf6', 5)";
desired: str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
Note: py2.5 behaviour is better: if __str__ returns an ascii string,
unicode(e) should return the same string decoded, if __str__ returns a
non-ascii string, both should raise an error.

As you can see, your example corresponds just to the case 3b) (now
fixed), but cases 2, 4, 6 are now broken.

Making this list allowed me to come out with a new patch, that seems to
solve all the problems (2, 4 and 6 while leaving 3b as it is now). The
only exception is for KeyError, if we want it to print the repr, then
KeyError_unicode should be implemented, but I think that Python only
calls str() so it's probably not necessary.

Attached new patch that passes all the tests in issue6108_testcase
except for KeyError. Unless you disagree with the 'desired behaviours'
that I listed, this patch should fix the issue.

rbtcollins · 2009-12-13T04:44:57Z

"2) 0 args, e = MyException(), with overridden __str__:
py2.5 : str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
py2.6 : str(e) -> 'ascii' or error; unicode(e) -> u''
desired: str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
Note: py2.5 behaviour is better: if __str__ returns an ascii string
(including ''), unicode(e) should return the same string decoded, if
__str__ returns a non-ascii string, both should raise an error.
"

I'm not sure how you justify raising an unnecessary error when trying to
stringify an exception as being 'better'.

__str__ should not decode its arguments if they are already strings:
they may be valid data for the user even if they are not decodable (and
note that an implicit decode may try to decode('ascii') which is totally
useless.

__str__ and __unicode__ are /different/ things, claiming they have to
behave the same is equivalent to claiming either that we don't need
unicode, or that we don't need binary data.

Surely there is space for both things, which does imply that
unicode(str(e)) != unicode(e).

Why _should_ that be the same anyway?

ncoghlan · 2009-12-13T04:49:12Z

I agree the 2.6 implementation creates backwards compatibility problems
with subclasses that only override __str__ that we didn't recognise at
the time.

An alternative approach that should work even for the KeyError case is
for BaseException_unicode to check explicitly for the situation where
the __str__ slot has been overridden but __unicode__ is still the
BaseException version and invoke "PyObject_Unicode(PyObject_Str(self))"
when it detects that situation.

That way subclasses that only override __str__ would continue to see the
old behaviour, while subclasses that don't override either would
continue to benefit from the new behaviour.

ezio-melotti · 2009-12-13T04:51:42Z

Assume the case of e = MyException() (note: 0 args) with a __str__ that
returns a default message. Now, if the message is ascii, str(e) works
and the user see the default message but unicode(e) returns a
not-so-useful empty string.
On the other hand, if __str__ returns a non-ascii string, then it's
wrong in the first place, because str(e) will fail and returning an
empty string with unicode(e) is not going to help.

ezio-melotti · 2009-12-13T04:56:45Z

An alternative approach that should work even for the KeyError case is
for BaseException_unicode to check explicitly for the situation where
the __str__ slot has been overridden but __unicode__ is still the
BaseException version and invoke "PyObject_Unicode(PyObject_Str(self))"
when it detects that situation.

This is even better, I'll try to do it.

ezio-melotti · 2009-12-13T06:01:50Z

Here is a new patch (bpo-6108-3.patch) that checks if __str__ has been
overridden and calls PyObject_Unicode(PyObject_Str(self)).

All the tests (including the one with KeyError) in
issue6108_testcase.diff now pass.

If the patch is OK I'll make sure that the tests cover all the possible
cases that I listed and possibly add a few more before the commit.

pitrou · 2009-12-13T12:52:52Z

You should check the return value from PyObject_Str().

ezio-melotti · 2009-12-20T19:29:51Z

I created a comprehensive set of tests to check all the possibilities
that I listed in msg96319 and updated the patch for Object/exceptions.c.
Without patch all the test_*with_overridden__str_ and
test_builtin_exceptions fail, both on 2.6 and on trunk, with the patch
all the tests pass.
The code in exceptions.c now does the equivalent of unicode(e.__str__())
instead of unicode(str(e)). If e.__str__() returns a non-ascii unicode
string, unicode() now shows the message instead of raising an error.

pitrou · 2009-12-21T11:54:50Z

I created a comprehensive set of tests to check all the possibilities
that I listed in msg96319 and updated the patch for Object/exceptions.c.

Great!
Small thing: in tests, you should use setUp() to initialize test data
rather than __init__().

ezio-melotti · 2009-12-21T16:00:29Z

I updated the patch and moved the helper class outside the __init__.

pitrou · 2009-12-21T16:05:58Z

This looks fine, module the slight style issue mentioned on IRC. Please
commit after you fix it.
(this is assuming all tests pass, of course!)

ezio-melotti · 2009-12-21T17:40:36Z

This should be the final patch (bpo-6108-6.patch). I update the
comments, checked that (some of) the tests fail without the patch, that
they (all) pass with it and that there are no leaks.
I plan to backport this on 2.6 and possibly port the tests to py3k and 3.1.

pitrou · 2009-12-21T18:34:06Z

It's ok for me.

ezio-melotti · 2009-12-24T23:03:08Z

Fixed in r77045 (trunk) and r77046 (release26-maint). No need to port it
to py3k since unicode() is gone.

ezio-melotti added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels May 26, 2009

birkenfeld added the release-blocker label Sep 16, 2009

warsaw removed the release-blocker label Sep 29, 2009

ezio-melotti changed the title ~~unicode(exception) behaves differently on Py2.6 when len(exception.args) > 1~~ unicode(exception) and str(exception) should return the same message on Py2.6 Nov 12, 2009

ezio-melotti added the release-blocker label Nov 12, 2009

ezio-melotti self-assigned this Dec 12, 2009

ezio-melotti closed this as completed Dec 24, 2009

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unicode(exception) and str(exception) should return the same message on Py2.6 #50358

unicode(exception) and str(exception) should return the same message on Py2.6 #50358

ezio-melotti commented May 26, 2009

ezio-melotti commented May 26, 2009

exarkun mannequin commented Sep 13, 2009

pitrou commented Sep 13, 2009

warsaw commented Sep 29, 2009

ezio-melotti commented Nov 12, 2009

ezio-melotti commented Dec 12, 2009

pitrou commented Dec 12, 2009

ncoghlan commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

pitrou commented Dec 13, 2009

ncoghlan commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

rbtcollins commented Dec 13, 2009

ncoghlan commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

pitrou commented Dec 13, 2009

ezio-melotti commented Dec 20, 2009

pitrou commented Dec 21, 2009

ezio-melotti commented Dec 21, 2009

pitrou commented Dec 21, 2009

ezio-melotti commented Dec 21, 2009

pitrou commented Dec 21, 2009

ezio-melotti commented Dec 24, 2009

unicode(exception) and str(exception) should return the same message on Py2.6 #50358

unicode(exception) and str(exception) should return the same message on Py2.6 #50358

Comments

ezio-melotti commented May 26, 2009

ezio-melotti commented May 26, 2009

exarkun mannequin commented Sep 13, 2009

pitrou commented Sep 13, 2009

warsaw commented Sep 29, 2009

ezio-melotti commented Nov 12, 2009

ezio-melotti commented Dec 12, 2009

pitrou commented Dec 12, 2009

ncoghlan commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

pitrou commented Dec 13, 2009

ncoghlan commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

rbtcollins commented Dec 13, 2009

ncoghlan commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

ezio-melotti commented Dec 13, 2009

pitrou commented Dec 13, 2009

ezio-melotti commented Dec 20, 2009

pitrou commented Dec 21, 2009

ezio-melotti commented Dec 21, 2009

pitrou commented Dec 21, 2009

ezio-melotti commented Dec 21, 2009

pitrou commented Dec 21, 2009

ezio-melotti commented Dec 24, 2009