codecs error handler is called with a UnicodeDecodeError with the same args #58038

Open
amauryfa opened this issue Jan 19, 2012 · 4 comments
Labels
topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@amauryfa
Member

BPO 13830
Nosy @malemburg, @doerwalter, @amauryfa, @vstinner, @ezio-melotti, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2012-01-19.19:56:36.834>
labels = ['type-bug', 'expert-unicode']
title = 'codecs error handler is called with a UnicodeDecodeError with the same args'
updated_at = <Date 2018-02-28.18:24:13.140>
user = 'https://github.com/amauryfa'

bugs.python.org fields:

activity = <Date 2018-02-28.18:24:13.140>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Unicode']
creation = <Date 2012-01-19.19:56:36.834>
creator = 'amaury.forgeotdarc'
dependencies = []
files = []
hgrepos = []
issue_num = 13830
keywords = []
message_count = 4.0
messages = ['151650', '152528', '152573', '313062']
nosy_count = 6.0
nosy_names = ['lemburg', 'doerwalter', 'amaury.forgeotdarc', 'vstinner', 'ezio.melotti', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue13830'
versions = []

@amauryfa
Member Author

The script below shows that the error handler is always called with the same error object. The 'start', 'end', and 'reason' attributes are correctly updated, but 'args' is always the same tuple and holds the values used for the first call.

It's a bit weird that error.args[2] is not equal to error.start, for example. All versions are affected: 2.7, 3.2, 3.3.
And by the way, I could not find where these attributes are documented.

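import codecs

def custom_handler(error):
    # Print the args tuple next to the live attributes for comparison.
    print(error.args,
          (error.start, error.end, error.reason))
    # Substitute '?' and resume decoding after the offending bytes.
    return b'?'.decode(), error.end

codecs.register_error('custom', custom_handler)
# Both bytes are invalid UTF-8, so the handler is called twice.
b'\x80\xd0'.decode('utf-8', 'custom')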
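
A variant of the script that records each call instead of printing it may make the mismatch easier to check programmatically. This is only a sketch: the handler name `recording` and the `calls` list are made up for illustration, and whether `args` actually lags behind the attributes depends on the interpreter version, so the comparison is computed at run time rather than assumed.

```python
import codecs

calls = []

def recording_handler(error):
    # Snapshot both views of the error position on every call.
    # args is (encoding, object, start, end, reason), so args[2:4] is (start, end).
    calls.append({
        "args_start_end": error.args[2:4],
        "attr_start_end": (error.start, error.end),
        "reason": error.reason,
    })
    # Substitute '?' and resume decoding after the offending bytes.
    return "?", error.end

# "recording" is a hypothetical handler name chosen for this example.
codecs.register_error("recording", recording_handler)
result = b"\x80\xd0".decode("utf-8", "recording")

for call in calls:
    stale = call["args_start_end"] != call["attr_start_end"]
    print(call, "stale args" if stale else "args in sync")
```

On an affected interpreter, the second call should report stale args, because the tuple still carries the positions from the first call.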

@amauryfa amauryfa added topic-unicode type-bug An unexpected behavior, bug, or error labels Jan 19, 2012
@doerwalter
Contributor

See this ancient posting about this problem:

http://mail.python.org/pipermail/python-dev/2002-August/027661.html

(see point 4.). So I guess somebody did finally complain! ;)

The error attributes are documented in PEP-293. The existence of the attributes is documented in Doc/c-api/exceptions.rst, but not their meaning.
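
For readers who have not looked at PEP-293, the attributes in question can be inspected by constructing the exception by hand. The sketch below only uses the public `UnicodeDecodeError` constructor; the sample bytes and reason string are arbitrary values picked for illustration.

```python
# Build a UnicodeDecodeError directly; the constructor takes
# (encoding, object, start, end, reason).
err = UnicodeDecodeError("utf-8", b"\x80\xd0", 0, 1, "invalid start byte")

print(err.encoding)  # name of the codec that failed
print(err.object)    # the bytes object being decoded
print(err.start)     # index of the first undecodable byte
print(err.end)       # index just past the last undecodable byte
print(err.reason)    # human-readable description of the failure

# The attributes are writable, which is what lets a codec update
# an existing exception object between handler calls.
err.start, err.end = 1, 2
print(err.start, err.end)
```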

@vstinner
Member

vstinner commented Feb 4, 2012

Codec encoders reuse the same exception object for speed, but only set some attributes (start, end and reason). One fix would be to recreate the args tuple each time an attribute is set. Alternatively, UnicodeEncodeError and UnicodeDecodeError could override the args getter to create a new tuple on each access.
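
The second idea, rebuilding the tuple in the args getter, can be modelled in pure Python. This is a rough sketch and not how CPython implements these exceptions: `PositionError` is a hypothetical stand-in class, and the real change would have to be made in the C implementation of UnicodeEncodeError and UnicodeDecodeError.

```python
class PositionError(ValueError):
    """Hypothetical stand-in for UnicodeDecodeError (illustration only)."""

    def __init__(self, encoding, obj, start, end, reason):
        super().__init__(encoding, obj, start, end, reason)
        self.encoding = encoding
        self.object = obj
        self.start = start
        self.end = end
        self.reason = reason

    @property
    def args(self):
        # Rebuild the tuple on every access so it always reflects
        # the current attribute values.
        return (self.encoding, self.object, self.start, self.end, self.reason)


err = PositionError("utf-8", b"\x80\xd0", 0, 1, "invalid start byte")

# Simulate a codec reusing the object for a second error.
err.start, err.end, err.reason = 1, 2, "unexpected end of data"
print(err.args)
```

The cost of this design is a tuple allocation on every access to args, in exchange for no extra work on the codec's fast path when attributes are updated.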

@serhiy-storchaka
Member

For reference, this behavior has been present from the beginning, since PEP-293 was implemented in bpo-432401.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022