int(u"\u1234") raises UnicodeEncodeError #37601

Closed
gvanrossum opened this issue Dec 11, 2002 · 4 comments

@gvanrossum
Member

BPO 652104
Nosy @gvanrossum, @loewis, @doerwalter

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/loewis'
closed_at = <Date 2002-12-12.17:35:35.000>
created_at = <Date 2002-12-11.16:07:21.000>
labels = ['invalid', 'expert-unicode']
title = 'int(u"\\u1234") raises UnicodeEncodeError'
updated_at = <Date 2002-12-12.17:35:35.000>
user = 'https://github.com/gvanrossum'

bugs.python.org fields:

activity = <Date 2002-12-12.17:35:35.000>
actor = 'doerwalter'
assignee = 'loewis'
closed = True
closed_date = None
closer = None
components = ['Unicode']
creation = <Date 2002-12-11.16:07:21.000>
creator = 'gvanrossum'
dependencies = []
files = []
hgrepos = []
issue_num = 652104
keywords = []
message_count = 4.0
messages = ['13594', '13595', '13596', '13597']
nosy_count = 3.0
nosy_names = ['gvanrossum', 'loewis', 'doerwalter']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue652104'
versions = ['Python 2.3']

@gvanrossum
Member Author

In Python 2.2, int() of a unicode string containing
non-digit characters raises ValueError, like all other
attempts to convert an invalid string or unicode to
int. But in Python 2.3, it appears that int() of a
unicode string is implemented differently and can now
raise UnicodeEncodeError:

>>> int(u"\u1234")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'decimal' codec can't encode
character '\u1234' in position 0: invalid decimal
Unicode string
>>> 

I think it's important that int() of a string or
unicode argument only raises ValueError to indicate
invalid inputs -- otherwise one ends up writing bare
excepts for conversions from string (as it is too much
trouble to keep track of which Python versions can
raise which exceptions).
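
For illustration, a minimal sketch of the pattern this request is trying to protect: a conversion wrapper with a single `except ValueError` clause and no bare `except`. It is written in Python 3 syntax rather than the 2.x syntax used in this thread, and the helper name `parse_int_or_none` is hypothetical, not something from the standard library.

```python
def parse_int_or_none(text):
    """Return int(text), or None if text is not a valid integer literal.

    Hypothetical helper for illustration only.
    """
    try:
        return int(text)
    except ValueError:
        # One clause is enough: UnicodeError and UnicodeEncodeError are
        # subclasses of ValueError, so they would land here as well.
        return None


print(parse_int_or_none("42"))      # 42
print(parse_int_or_none("\u1234"))  # None
print(parse_int_or_none("12abc"))   # None
```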

@loewis
Mannequin

loewis mannequin commented Dec 11, 2002

Logged In: YES
user_id=21627

I don't see the problem:

>>> try:
...   int(u"\u1234")
... except ValueError:
...   print "caught"
...
caught
>>> issubclass(UnicodeEncodeError,ValueError)
True
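
The same relationship can be checked without triggering the error at all, by walking the class's method resolution order. This is a small sketch in Python 3 syntax (an assumption; the session above is Python 2.3), using only built-in exception classes.

```python
# List the inheritance chain of UnicodeEncodeError, most specific first.
chain = [cls.__name__ for cls in UnicodeEncodeError.__mro__]
print(" -> ".join(chain))

# ValueError appears in the chain, so "except ValueError" catches
# UnicodeEncodeError too.
print(issubclass(UnicodeEncodeError, ValueError))  # True
```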

@gvanrossum
Member Author

Logged In: YES
user_id=6380

Ah, thanks. Sorry.

@doerwalter
Contributor

Logged In: YES
user_id=89016

PyUnicode_EncodeDecimal() is responsible for this change.
This function was changed due to the PEP-293 implementation.
In Python 2.2 it raised a ValueError, which IMHO is a bug,
because as an encoding function that encodes unicode to str,
it should raise a UnicodeError in case of an unencodable
character.
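
To illustrate the convention described here, the sketch below uses the ordinary `str.encode()` method with the `ascii` codec as a stand-in; it does not call `PyUnicode_EncodeDecimal()` or the internal 'decimal' codec, which are not exposed at the Python level. It is written in Python 3 syntax. An encoder that meets an unencodable character raises UnicodeEncodeError, which carries details about the failure and is still a ValueError.

```python
err = None
try:
    # U+1234 has no ASCII representation, so the encoder must fail.
    "\u1234".encode("ascii")
except UnicodeError as exc:
    err = exc

# The exception records which codec failed and where in the input.
print(type(err).__name__)        # UnicodeEncodeError
print(err.encoding)              # ascii
print(err.start, err.end)        # 0 1
print(isinstance(err, ValueError))  # True
```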

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022