Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Porting guide: disabling & warning on implicit unicode conversions #72589

Closed
ncoghlan opened this issue Oct 10, 2016 · 9 comments
Closed

Porting guide: disabling & warning on implicit unicode conversions #72589

ncoghlan opened this issue Oct 10, 2016 · 9 comments
Labels
docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@ncoghlan
Copy link
Contributor

BPO 28403
Nosy @malemburg, @brettcannon, @ncoghlan, @encukou, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2020-05-31.13:24:31.605>
created_at = <Date 2016-10-10.03:52:12.303>
labels = ['type-feature', 'docs']
title = 'Porting guide: disabling & warning on implicit unicode conversions'
updated_at = <Date 2020-05-31.13:24:31.604>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2020-05-31.13:24:31.604>
actor = 'serhiy.storchaka'
assignee = 'docs@python'
closed = True
closed_date = <Date 2020-05-31.13:24:31.605>
closer = 'serhiy.storchaka'
components = ['Documentation']
creation = <Date 2016-10-10.03:52:12.303>
creator = 'ncoghlan'
dependencies = []
files = []
hgrepos = []
issue_num = 28403
keywords = []
message_count = 9.0
messages = ['278402', '278403', '278405', '278411', '278413', '278415', '278420', '278665', '370445']
nosy_count = 6.0
nosy_names = ['lemburg', 'brett.cannon', 'ncoghlan', 'petr.viktorin', 'docs@python', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue28403'
versions = ['Python 2.7']

@ncoghlan
Copy link
Contributor Author

Some of the hardest compatibility issues to track down in Python 3 migrations are those where existing code is depending on an implicit str->unicode promotion something in the depths of a support library (or sometimes even the standard library - the context where this came up relates to some apparent misbehaviour in the standard library). In other cases, just being able to rule implicit conversions out as a possible contributing factor can be helpful in finding the real problem.

It's technically already possible to hook implicit conversions by adjusting (or shadowing) the site.py module and replacing the default "ascii" encoding with one that emits a warning whenever you rely on it: http://washort.twistedmatrix.com/2010/11/unicode-in-python-and-how-to-prevent-it.html

However, actually setting that up is a bit tricky, since we deliberately drop "sys.setdefaultencoding" from the sys module in the default site module. That means requesting warnings for implicit conversions requires doing the following:

  1. Finding the "ascii_with_warnings" codec above (or writing your own)
  2. Learning one of the following 3 tricks for overriding the default encoding:

2a. Run with "-S" and call sys.setdefaultencoding post-startup
2b. Edit the actual system site.py in a container or other test environment
2c. Shadow site.py with your own modified copy

  1. Run your tests or application with the modified default encoding

If we wanted to make that easier for folks migrating, the first step would be to provide the "ascii_with_warnings" codec by default in Python 2.7 (perhaps as "_ascii_with_warnings", since it isn't intended for general use, it's just a migration helper)

The second would be to provide a way to turn it on that doesn't require fiddling with the site module. The simplest option there would be to always enable it under -3.

The argument against the simple option is that I'm not sure how noisy it would be by default - there are some standard library modules (e.g. URL processing) where we still rely on implicit encoding and decoding in Python 2, but have separate code paths in Python 3.

Since we don't have -X options in Python 2, the second simplest alternative would be to leave sys.setdefaultencoding available when running under -3: that way folks could more easily opt in to enabling the "ascii_with_warnings" codec selectively (e.g. via a context manager), rather than always having it enabled.

@ncoghlan ncoghlan added the type-feature A feature request or enhancement label Oct 10, 2016
@ncoghlan
Copy link
Contributor Author

(Correction to the above: the case where this came up turned out to be due to consuming code monkeypatching things when it really shouldn't have been, so it fell into the second category of "It would have been helpful to be able to more easily rule this out as a contributing factor")

@malemburg
Copy link
Member

Nick, I think you've missed the "undefined" encoding that we've had for this ever since Unicode was added to Python.

You put the needed code into your sitecustomize.py file and Python2 will then behave just like Python3, i.e. raise an exception instead of coercing to Unicode:

sitecustomize.py:
import sys
sys.setdefaultencoding('undefined')

There's no need to hack this into site.py or to make sys.setdefaultencoding() available outside sitecustomize.py.

If you want an OS environ switch, you can put the necessary logic into sitecustomize.py as well.

@ncoghlan
Copy link
Contributor Author

The main problem with the "undefined" encoding is that it actually *fails* the application, rather than allowing it to continue, but providing a warning at each new point where it encounters implicit encoding or decoding. This means the parts of the standard library that actually rely on implicit coercion fail outright, rather than just generate warning noise that you can filter out as irrelevant to your particular application.

You raise a good point about sitecustomize.py though - I always forget about that feature myself, and it didn't come up in any of the Google results I looked at either.

The existing "undefined" option also at least allows you to categorically ensure you're not relying on implicit conversions at all, so the Python 3 porting guide could be updated to explicitly cover:

  1. Finding the site customization path for your active virtual environment:

    python -c 'import os.path, sysconfig; print(os.path.join(sysconfig.get_path("purelib"), "sitecustomize.py"))'

  2. What to write to that location to disable implicit Unicode conversions:

    import sys
    sys.setdefaultencoding('undefined')

Giving folks the following tiered path to Python 3 support:

  • get "pylint --py3k" passing (e.g. via python-modernize)
  • eliminate "python -3" warnings under Python 2
  • (optional) support running with the above site customizations
  • actually run under Python 3

Brett, does the above approach sound reasonable to you? If so, then I'll do that as a pure documentation change in the Py3k porting guide with a "See Also" to the above blog post, and then mark this as closed/postponed (given the sitecustomize approach to enable it, the 3rd party codec should be fine for folks that want the warning behaviour instead)

@ncoghlan
Copy link
Contributor Author

Adding Petr to the nosy list, as I'd like to get his perspective on this once I have a draft docs patch to review.

I also realised it made more sense to just repurpose this issue to cover the proposed docs updates.

@ncoghlan ncoghlan added the docs Documentation in the Doc dir label Oct 10, 2016
@ncoghlan ncoghlan changed the title Migration RFE: optional warning for implicit unicode conversions Porting guide: disabling & warning on implicit unicode conversions Oct 10, 2016
@encukou
Copy link
Member

encukou commented Oct 10, 2016

In portingguide [0] I could only recommend sitecustomize with a (possibly third-party) codec that emits warnings; not 'undefined'.

The things that aren't ported yet are generally either Non-Python applications with Python bindings or plugins (Gimp, Samba, ...), projects that are very large relative to the count of available maintainers (VCSs, Sugar, wxPython, ...), or code that depends on those.

If sys.setdefaultencoding('undefined') breaks parts of the standard library, it might be OK for smaller scripts but I fear it won't help big projects much.

[0] http://portingguide.readthedocs.io/en/latest/

@malemburg
Copy link
Member

On 10.10.2016 15:08, Petr Viktorin wrote:

If sys.setdefaultencoding('undefined') breaks parts of the standard library, it might be OK for smaller scripts but I fear it won't help big projects much.

That's true. It does break the stdlib (the codec was originally
added in order to test exactly this scenario).

A new codec "ascii-warn" could easily be added, based on the
code used for "undefined".

@brettcannon
Copy link
Member

If a new codec gets added to 2.7 then I'm fine with the proposed change.

@serhiy-storchaka
Copy link
Member

Python 2.7 is no longer supported.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants