UnicodeDecodeError that cannot be caught in narrow unicode builds #45818
Comments
The following error is uncatchable:
>>> try: ur'\U0010FFFF'
... except UnicodeDecodeError: pass
...
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c
in position 0: \Uxxxxxxxx out of range
This is in a narrow unicode build:
>>> sys.version_info, hex(sys.maxunicode)
((2, 5, 1, 'final', 0), '0xffff')
Of course the r in ur'...' is redundant in the test case above, but
>>> ur'\U0010FFFF\test'
u'\U0010ffff\\test'
- from a wide unicode build
>>> ur'\U0010FFFF\test'
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c
in position 0: \Uxxxxxxxx out of range
- from the narrow unicode build
The problem occurs with .decode('raw-unicode-escape') too.
>>> '\U0010FFFF\test'.decode('raw-unicode-escape')
Traceback (most recent call last):
[&c.]
Most surprisingly of all, however, this problem doesn't occur when you
>>> u'\U0010ffff\\test'
u'\U0010ffff\\test'
So there is at least a workaround for all cases, which is why this bug |
Can someone comment on this, or bring it up on python-dev if it needs |
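The equivalence that the workaround relies on can be checked in a small sketch. It assumes Python 3, where the narrow/wide build distinction no longer exists, so it only illustrates what the two spellings are expected to produce; the names `raw`, `decoded` and `workaround` are made up for this example.

```python
# Python 3 sketch: the bytes below spell the issue's test string \U0010FFFF\test.
raw = b"\\U0010FFFF\\test"

# raw-unicode-escape expands \uXXXX and \UXXXXXXXX only; "\t" stays a
# backslash followed by the letter t.
decoded = raw.decode("raw-unicode-escape")

# The workaround from the report: an ordinary literal with the backslash doubled.
workaround = "\U0010ffff\\test"

print(decoded == workaround)
print(len(decoded))  # one code point for U+10FFFF, one backslash, four letters
```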
The error is not uncatchable; but it is generated while compiling, like
OTOH, there is a bug in PyUnicode_DecodeRawUnicodeEscape(): it should
>>> ur'\U00010000'
u'\x00'
I join a patch to make raw-unicode-escape similar to unicode-escape: |
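The point about the error being raised while compiling can be illustrated with a rough Python 3 analogue. This is a sketch under assumptions: Python 3 has no ur'' literals and reports a bad escape in a string literal as a SyntaxError rather than the UnicodeDecodeError seen in the 2.5 session, and `SOURCE` and `compile_error` are hypothetical names. The try/except inside the source never runs, because the whole unit fails to compile; the error is only catchable around the compile step.

```python
# Source text containing a literal with an out-of-range escape (U+110000),
# wrapped in a try/except that looks as if it should catch the problem.
SOURCE = (
    "try:\n"
    "    '\\U00110000'\n"
    "except Exception:\n"
    "    pass\n"
)

def compile_error(source):
    """Return the SyntaxError raised while compiling `source`, or None."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        return exc
    return None

err = compile_error(SOURCE)
print(type(err).__name__)
```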
For a wide build, the code
if (x <= 0xffff)
*p++ = (Py_UNICODE) x;
else {
*p++ = (Py_UNICODE) x;
looks strange. Furthermore with the patch applied Python no longer complains about
>>> ur'\U11111111'
u'\u1c04\udd11' |
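The two odd results in this thread, u'\x00' for \U00010000 and u'\u1c04\udd11' for \U11111111, are consistent with simple 16-bit arithmetic. The sketch below is one plausible reconstruction, not the actual C code: it assumes that a narrow build stores each unit in 16 bits, that the unpatched decoder simply truncated the value, and that the first patch applied the UTF-16 surrogate formula without a range check. The helper names are invented for illustration.

```python
def to_uint16(value):
    """Model a cast to a 16-bit Py_UNICODE by keeping the low 16 bits."""
    return value & 0xFFFF

def unchecked_surrogates(x):
    """Apply the UTF-16 surrogate formula with no range check on x."""
    x -= 0x10000
    high = to_uint16(0xD800 + (x >> 10))
    low = to_uint16(0xDC00 + (x & 0x3FF))
    return high, low

# Truncating 0x10000 to 16 bits leaves 0, matching u'\x00'.
print(hex(to_uint16(0x10000)))  # 0x0

# An out-of-range value still yields two units, matching u'\u1c04\udd11'.
print([hex(u) for u in unchecked_surrogates(0x11111111)])  # ['0x1c04', '0xdd11']

# A valid code point gives a proper surrogate pair.
print([hex(u) for u in unchecked_surrogates(0x10FFFF)])  # ['0xdbff', '0xdfff']
```

Under these assumptions, a range check against 0x10FFFF before splitting the value is what separates the valid case from the bogus pair.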
The "strange" code is a copy of PyUnicode_DecodeUnicodeEscape. I find it
Here is a new version of the patch which:
in python2.5, the end position was completely bogus:
>>> try: '\U11111111'.decode("raw-unicode-escape")
... except Exception, e: print repr(e)
UnicodeDecodeError('rawunicodeescape', '\\U11111111', 0, 504955452,
'\\Uxxxxxxxx out of range') |
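The start and end fields shown in that repr are ordinary attributes of the exception, so they can be sanity-checked directly. The following is a Python 3 sketch (bytes literal and `except ... as` syntax instead of the 2.5 forms above); `decode_error` is a hypothetical helper, and the only property checked is that the reported range lies inside the input, which the 2.5 value 504955452 clearly did not.

```python
def decode_error(data):
    """Return the UnicodeDecodeError from raw-unicode-escape decoding, or None."""
    try:
        data.decode("raw-unicode-escape")
    except UnicodeDecodeError as exc:
        return exc
    return None

data = b"\\U11111111"  # the ten bytes \U11111111, beyond U+10FFFF
exc = decode_error(data)
if exc is not None:
    # A well-formed error points at a range inside the input.
    print(exc.encoding, exc.start, exc.end, exc.reason)
    print(0 <= exc.start < exc.end <= len(data))
```

Because .decode() runs at run time, this error can be caught with a normal try/except, unlike the literal case discussed earlier in the thread.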
The patch looks goog to me now. Go ahead and check it in. |
s/goog/good/g ;) |
Committed r61793. Will backport. |
backported to 2.5 branch as r61854 |