-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
Better start and end position for unicodeerror in unicode_encode_ucs1 #72960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
unicode_encode_ucs1 now recognizes as many characters as it can one time instead of one character a time. But the unicodeerror positions still only count 1(the second time). A similar problem reported in bpo-28561. |
If I understood correctly, the patch fix the ASCII encoder to handle correctly error handlers which return non-ASCII text replacement strings. Right? I am not aware of such error handler, so I guess that it's a more a theorical fix? I really hate the code (in each encoder) which handles non-ASCII replacement strings. The code in the charmap encoder is just a mess: it uses a reentrant call to the encoder... I never understood this crazy behaviour. I guess that nobody relies on the behaviour. I hesitate to simply raise an error instead of using different rules depending on the code. Ah yes, by the way, each codec behaves differently on non-ASCII replacement strings... |
LGTM. But it is too late for beta 4. I'll commit the patch either after releasing 3.6.0 or in the 3.7 branch only. And while we are here I noticed that handling non-ASCII replacement string could be simpler. |
Right now, I suggest to only commit into 3.7. Such minor bug can wait for Python 3.6.1.
I also suggest to first commit unicode_encode_ucs1_error_pos.patch and then commit the other part of unicode_encode_ucs1_error_pos2.patch in a separated commit. I will be easy to backport the fix to the 3.6 branch later. Serhiy: Xiang became a core developer, are you ok if he push himself unicode_encode_ucs1_error_pos.patch to default tomorrow, and later you rebase your patch on top of that? I'm not super confident because the fix doesn't come with an unit test, but it's ok if Serhiy reviewed it :-) |
This is not a bug itself. It seems to me that at worst case the current code is less efficient with non-standard error handler than it can be. I would commit the path to the 3.6 branch before beta 4 as it is nice and simple additional to already added optimization. But it is too late now, at last beta. Xiang can commit his patch to 3.7. No need to backport it to 3.6 (if I didn't miss something). |
Sorry, I misunderstood the issue. It's an enhancement, so for 3.7 only. Right, 3.6 is now almost frozen, only major bug fixes blocking the release are accepted now (in short). Regular bugfixes should wait for 3.6.1. |
New changeset 3d660ed2a60e by Xiang Zhang in branch 'default': |
Thanks Serhiy and Victor. Finished my first commit. :-) Now assign back to Serhiy and pos2 LGTM. |
Congratulations, Xiang! |
New changeset 3addf93f4111 by Serhiy Storchaka in branch 'default': |
Misc/NEWS
so that it is managed by towncrier #552Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: