Porting guide: disabling & warning on implicit unicode conversions #72589
Comments
Some of the hardest compatibility issues to track down in Python 3 migrations are those where existing code depends on an implicit str->unicode promotion somewhere in the depths of a support library (or sometimes even the standard library; the context where this came up relates to some apparent misbehaviour in the standard library). In other cases, just being able to rule implicit conversions out as a possible contributing factor can be helpful in finding the real problem.

It's technically already possible to hook implicit conversions by adjusting (or shadowing) the site.py module and replacing the default "ascii" encoding with one that emits a warning whenever you rely on it: http://washort.twistedmatrix.com/2010/11/unicode-in-python-and-how-to-prevent-it.html

However, actually setting that up is a bit tricky, since we deliberately drop "sys.setdefaultencoding" from the sys module in the default site module. That means requesting warnings for implicit conversions requires doing the following:
2a. Run with "-S" and call sys.setdefaultencoding post-startup
If we wanted to make that easier for folks migrating, the first step would be to provide the "ascii_with_warnings" codec by default in Python 2.7 (perhaps as "_ascii_with_warnings", since it isn't intended for general use; it's just a migration helper). The second would be to provide a way to turn it on that doesn't require fiddling with the site module. The simplest option there would be to always enable it under

The argument against the simple option is that I'm not sure how noisy it would be by default: there are some standard library modules (e.g. URL processing) where we still rely on implicit encoding and decoding in Python 2, but have separate code paths in Python 3. Since we don't have -X options in Python 2, the second simplest alternative would be to leave
(Correction to the above: the case where this came up turned out to be due to consuming code monkeypatching things when it really shouldn't have been, so it fell into the second category of "It would have been helpful to be able to more easily rule this out as a contributing factor")
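A warning codec of the kind proposed above could be sketched roughly as follows. This is a minimal illustration, not the code from the linked blog post and not anything shipped with Python: only the codec name "ascii_with_warnings" comes from the proposal, while the helper names (`_encode`, `_decode`, `_search`) are assumptions made for the example. It wraps the built-in ASCII codec and emits a `UnicodeWarning` each time it is used. Under Python 2 it would still have to be selected as the default encoding; here it is only exercised by explicit name.

```python
import codecs
import warnings

# Name taken from the proposal; everything else here is illustrative.
CODEC_NAME = "ascii_with_warnings"

# Reuse the built-in ASCII codec for the actual conversion work.
_ASCII = codecs.lookup("ascii")


def _encode(text, errors="strict"):
    # Warn, then delegate to the plain ASCII encoder.
    warnings.warn("encoding via %s" % CODEC_NAME, UnicodeWarning, stacklevel=2)
    return _ASCII.encode(text, errors)


def _decode(data, errors="strict"):
    # Warn, then delegate to the plain ASCII decoder.
    warnings.warn("decoding via %s" % CODEC_NAME, UnicodeWarning, stacklevel=2)
    return _ASCII.decode(data, errors)


def _search(name):
    # Codec search function: answer only for our own codec name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)
```

Because the codec reports through the `warnings` module, the usual filters apply, so noise from modules known to rely on implicit conversions could be silenced with `warnings.filterwarnings` while the rest of the application stays visible.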
Nick, I think you've missed the "undefined" encoding that we've had for this ever since Unicode was added to Python. You put the needed code into your sitecustomize.py file and Python2 will then behave just like Python3, i.e. raise an exception instead of coercing to Unicode: sitecustomize.py: There's no need to hack this into site.py or to make sys.setdefaultencoding() available outside sitecustomize.py. If you want an OS environ switch, you can put the necessary logic into sitecustomize.py as well.
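A sitecustomize.py along these lines might look like the sketch below. The original snippet from this comment did not survive, so this is a reconstruction under assumptions rather than the author's code: the environment variable name `PY2_DEFAULT_ENCODING` and the function `configure_default_encoding` are hypothetical, chosen only to illustrate the "OS environ switch" idea. The `getattr` guard makes the module a harmless no-op on Python 3, where `sys.setdefaultencoding` does not exist.

```python
# sitecustomize.py (sketch; imported by site.py at interpreter startup)
import os
import sys

# Hypothetical switch: set it to "undefined" to forbid implicit conversions.
ENV_VAR = "PY2_DEFAULT_ENCODING"


def configure_default_encoding(environ=os.environ):
    """Apply the requested default encoding; return True if it was applied."""
    encoding = environ.get(ENV_VAR)
    if not encoding:
        return False  # switch not set: leave the interpreter untouched
    # On Python 2, site.py removes sys.setdefaultencoding only after
    # sitecustomize has been imported, so it is still available here.
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        return False  # Python 3: nothing to configure
    setter(encoding)
    return True


configure_default_encoding()
```

With the switch set to `undefined`, any implicit str/unicode coercion on Python 2 would raise an exception instead of silently succeeding.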
The main problem with the "undefined" encoding is that it actually *fails* the application, rather than allowing it to continue while providing a warning at each new point where it encounters implicit encoding or decoding. This means the parts of the standard library that actually rely on implicit coercion fail outright, rather than just generating warning noise that you can filter out as irrelevant to your particular application. You raise a good point about The existing "undefined" option also at least allows you to categorically ensure you're not relying on implicit conversions at all, so the Python 3 porting guide could be updated to explicitly cover:
import sys
sys.setdefaultencoding('undefined')

Giving folks the following tiered path to Python 3 support:
Brett, does the above approach sound reasonable to you? If so, then I'll do that as a pure documentation change in the Py3k porting guide with a "See Also" to the above blog post, and then mark this as closed/postponed (given the
Adding Petr to the nosy list, as I'd like to get his perspective on this once I have a draft docs patch to review. I also realised it made more sense to just repurpose this issue to cover the proposed docs updates.
In portingguide [0] I could only recommend sitecustomize with a (possibly third-party) codec that emits warnings; not 'undefined'. The things that aren't ported yet are generally either non-Python applications with Python bindings or plugins (Gimp, Samba, ...), projects that are very large relative to the number of available maintainers (VCSs, Sugar, wxPython, ...), or code that depends on those. If sys.setdefaultencoding('undefined') breaks parts of the standard library, it might be OK for smaller scripts, but I fear it won't help big projects much.
On 10.10.2016 15:08, Petr Viktorin wrote:
That's true. It does break the stdlib (the codec was originally A new codec "ascii-warn" could easily be added, based on the
If a new codec gets added to 2.7 then I'm fine with the proposed change.
Python 2.7 is no longer supported.