Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes do not work on sys.path #91181

Closed
graingert mannequin opened this issue Mar 15, 2022 · 18 comments
Closed

bytes do not work on sys.path #91181

graingert mannequin opened this issue Mar 15, 2022 · 18 comments
Assignees
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir topic-importlib

Comments

@graingert
Copy link
Mannequin

graingert mannequin commented Mar 15, 2022

BPO 47025
Nosy @brettcannon, @jaraco, @ericvsmith, @eryksun, @graingert
PRs
  • gh-91181: restore support for bytes on sys.path in FileFinder #31897
  • gh-91181: drop support for bytes on sys.path #31934
  • Files
  • demo.py: reproducer of the failure
  • zipfile_demo.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2022-03-15.10:32:43.256>
    labels = ['library', '3.9', '3.10', '3.11']
    title = 'bytes do not work on sys.path'
    updated_at = <Date 2022-03-17.19:22:31.783>
    user = 'https://github.com/graingert'

    bugs.python.org fields:

    activity = <Date 2022-03-17.19:22:31.783>
    actor = 'brett.cannon'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2022-03-15.10:32:43.256>
    creator = 'graingert'
    dependencies = []
    files = ['50678', '50679']
    hgrepos = []
    issue_num = 47025
    keywords = ['patch']
    message_count = 10.0
    messages = ['415233', '415234', '415235', '415236', '415237', '415238', '415239', '415323', '415340', '415443']
    nosy_count = 5.0
    nosy_names = ['brett.cannon', 'jaraco', 'eric.smith', 'eryksun', 'graingert']
    pr_nums = ['31897', '31934']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue47025'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @graingert
    Copy link
    Mannequin Author

    graingert mannequin commented Mar 15, 2022

    importing a module with bytes in sys.path fails with:

    File "<frozen importlib._bootstrap_external>", line 182, in _path_isabs
    TypeError: startswith first arg must be bytes or a tuple of bytes, not str

    (see reproducer in attached demo.py)

    however sys.path is documented as supporting bytes "Only strings and bytes should be added to sys.path; all other data types are ignored during import." https://docs.python.org/3/library/sys.html?highlight=Only%20strings%20and%20bytes#sys.path

    bytes are allowed in PathFinder._find_spec

    if not isinstance(entry, (str, bytes)):
    continue
    finder = cls._path_importer_cache(entry)
    but perhaps they should be ignored or explicitly fsdecoded ?

    see also:
    https://bugs.python.org/issue32642
    python/importlib_metadata#372 (comment)

    @graingert graingert mannequin added stdlib Python modules in the Lib dir 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes labels Mar 15, 2022
    @ericvsmith
    Copy link
    Member

    In case it helps anyone:

    On Windows 3.11.0a5+ the full traceback is:

    $ ./python.bat demo.py
    Running Debug|x64 interpreter...
    Traceback (most recent call last):
      File "...\demo.py", line 23, in <module>
        sys.exit(main())
                 ^^^^^^
      File "...\Lib\contextlib.py", line 155, in __exit__
        self.gen.throw(typ, value, traceback)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "...\demo.py", line 11, in _tmp_path
        yield pathlib.Path(tmp_dir)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "...\demo.py", line 18, in main
        import module
        ^^^^^^^^^^^^^
      File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
      File "<frozen importlib._bootstrap>", line 1140, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 1080, in _find_spec
      File "<frozen importlib._bootstrap_external>", line 1487, in find_spec
      File "<frozen importlib._bootstrap_external>", line 1459, in _get_spec
      File "<frozen importlib._bootstrap_external>", line 1596, in find_spec
      File "<frozen importlib._bootstrap_external>", line 1656, in _fill_cache
    TypeError: a bytes-like object is required, not 'str'

    And on cygwin 3.8.12:

    $ python demo.py
    Traceback (most recent call last):
      File "<frozen importlib._bootstrap_external>", line 1346, in _path_importer_cache
    KeyError: b'/tmp/tmprpymgive'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "demo.py", line 23, in <module>
        sys.exit(main())
      File "demo.py", line 18, in main
        import module
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 971, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 914, in _find_spec
      File "<frozen importlib._bootstrap_external>", line 1407, in find_spec
      File "<frozen importlib._bootstrap_external>", line 1376, in _get_spec
      File "<frozen importlib._bootstrap_external>", line 1348, in _path_importer_cache
      File "<frozen importlib._bootstrap_external>", line 1324, in _path_hooks
      File "<frozen importlib._bootstrap_external>", line 1594, in path_hook_for_FileFinder
      File "<frozen importlib._bootstrap_external>", line 1469, in __init__
      File "<frozen importlib._bootstrap_external>", line 177, in _path_isabs
    TypeError: startswith first arg must be bytes or a tuple of bytes, not str

    @graingert
    Copy link
    Mannequin Author

    graingert mannequin commented Mar 15, 2022

    this is a regression from 3.2:

    Python 3.2.6 (default, Jan 18 2016, 19:21:14) 
    [GCC 4.9.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tempfile
    >>> tempfile.TemporaryDirectory()
    <TemporaryDirectory '/tmp/tmpd4jzut'>
    >>> v = _
    >>> tmp_dir = str(v.__enter__())
    >>> tmp_dir
    '/tmp/tmpd4jzut'
    >>> f = open(tmp_dir + "/module.py", "w")
    >>> f.write("def function():\n    return 1\n")
    29
    >>> f.close()
    >>> import sys
    >>> sys.path.append(tmp_dir.encode())
    >>> import module
    >>> module
    <module 'module' from '/tmp/tmpd4jzut/module.py'>
    >>> 
    

    @graingert
    Copy link
    Mannequin Author

    graingert mannequin commented Mar 15, 2022

    https://docs.python.org/3/reference/import.html#path-entry-finders says "The encoding of bytes entries is determined by the individual path entry finders." see 82c1c78

    @graingert
    Copy link
    Mannequin Author

    graingert mannequin commented Mar 15, 2022

    interestingly bytes filenames pointing to zip files on sys.path do support bytes (see zipfile_demo.py)

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 15, 2022

    this is a regression from 3.2

    In Windows, bytes paths in sys.path do not work in 3.2+. I didn't test 3.0 and 3.1, but practically one can say that bytes paths were never supported in Python 3 on Windows.

    @graingert
    Copy link
    Mannequin Author

    graingert mannequin commented Mar 15, 2022

    zipimporter.zipimporter handles non-bytes paths here:

    cpython/Lib/zipimport.py

    Lines 65 to 67 in 2cf7f86

    if not isinstance(path, str):
    import os
    path = os.fsdecode(path)
    I think FileFinder should do the same

    @jaraco
    Copy link
    Member

    jaraco commented Mar 16, 2022

    I'd advocate for not supporting bytes paths and instead updating the documentation to require strings.

    @graingert
    Copy link
    Mannequin Author

    graingert mannequin commented Mar 16, 2022

    I'd advocate for not supporting bytes paths and instead updating the documentation to require strings.

    I've got PR #76115 started to do this

    @brettcannon
    Copy link
    Member

    I think it depends on whether we want to say the standard/included/built-in import mechanisms don't support byte paths, or byte paths are entirely not supported even for 3rd-party code? I definitely think we should at least do the former, but I'm hesitant to close the door for those that need it by doing the latter.

    @brettcannon
    Copy link
    Member

    Looking at the tracebacks that Eric provided, these are caused by:

    name, dot, suffix = item.partition('.')
    if dot:
    new_name = '{}.{}'.format(name, suffix.lower())

    root = _os._path_splitroot(path)[0].replace('/', '\\')
    return len(root) > 1 and (root.startswith('\\\\') or root.endswith('\\'))

    return path.startswith(path_separators)

    It's basically bad assumptions in importlib where string constants are used instead of string or bytes constants based on what's being compared against.

    As for decoding, https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding suggests it's the right way to go like @graingert has in his PR. It probably isn't fully backwards-compatible as I bet we used to not care about the bytes involved and just did a straight comparison, but in this instance you have bigger problems if your file paths are not decoding appropriately based on what Python believes is your file system encoding.

    But, this has been broken since at least Python 3.6 based on Eric's demo.py file and Python 3.6.15 that I have on my machine. That also suggests people don't seem too bothered by this considering this only came up in 2022 and it's been broken since the end of 2016. As such, I think the PR that @graingert wrote where we do the encoding is the best option. I'm going to think about this a little bit, though, before acting on the PR (and to see if there are any other opinions).

    @graingert
    Copy link
    Contributor

    graingert commented Jun 24, 2022

    As for decoding, https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding suggests it's the right way to go like @graingert has in his PR.

    Which PR? There's two: one to drop support in general, and one to wedge support back into FileFinder?

    @brettcannon
    Copy link
    Member

    Which PR?

    #31897

    @brettcannon
    Copy link
    Member

    After thinking about it, I think we should add support back, so I'm reviewing #31897 and I'm going to close the PR to drop bytes support.

    @brettcannon
    Copy link
    Member

    https://discuss.python.org/t/drop-supporting-bytes-on-sys-path/17225

    I'm now suggesting dropping bytes support in sys.path. We will see how that conversation goes.

    And for bookkeeping, the PR to drop bytes support is #31934 (which I closed, so it can be reopened if things go towards dropping support).

    @brettcannon
    Copy link
    Member

    FYI it's looking good to drop bytes support! I will give the topic on discuss.python.org a little while longer, but my expectation is we will go with #31934 and backport it 3.11 and 3.10 since it's already broken in those versions. I will then add a What's New entry in 3.11 to help get the word out.

    brettcannon added a commit that referenced this issue Jul 17, 2022
    Support for bytes broke sometime between Python 3.2 and 3.6 and has been broken ever since. Trying to bring back supports is surprisingly difficult in the face of -b and checking for keys in sys.path_importer_cache. Since the support was broken for so long, trying to overcome the difficulty of bringing back the support has been deemed not worth it.
    
    Co-authored-by: Eryk Sun <eryksun@gmail.com>
    Co-authored-by: Brett Cannon <brett@python.org>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 17, 2022
    Support for bytes broke sometime between Python 3.2 and 3.6 and has been broken ever since. Trying to bring back supports is surprisingly difficult in the face of -b and checking for keys in sys.path_importer_cache. Since the support was broken for so long, trying to overcome the difficulty of bringing back the support has been deemed not worth it.
    
    Co-authored-by: Eryk Sun <eryksun@gmail.com>
    Co-authored-by: Brett Cannon <brett@python.org>
    (cherry picked from commit 6da988a)
    
    Co-authored-by: Thomas Grainger <tagrain@gmail.com>
    miss-islington added a commit that referenced this issue Jul 17, 2022
    Support for bytes broke sometime between Python 3.2 and 3.6 and has been broken ever since. Trying to bring back supports is surprisingly difficult in the face of -b and checking for keys in sys.path_importer_cache. Since the support was broken for so long, trying to overcome the difficulty of bringing back the support has been deemed not worth it.
    
    Co-authored-by: Eryk Sun <eryksun@gmail.com>
    Co-authored-by: Brett Cannon <brett@python.org>
    (cherry picked from commit 6da988a)
    
    Co-authored-by: Thomas Grainger <tagrain@gmail.com>
    @brettcannon
    Copy link
    Member

    The fix for main is merged and #94914 should auto-merge for 3.11 once CI is finished. I decided not to backport to 3.10 because we are changing the wording of the import reference, even if the functionality is broken in Python 3.10.

    I'm going to leave this issue open to remind me to write up a What's New entry for 3.11 (which still needs to go into main first).

    @brettcannon
    Copy link
    Member

    #94919 is the What's New entry for 3.11 and it's set to auto-merge, so I consider this issue done! Thanks everyone!

    miss-islington added a commit that referenced this issue Jul 17, 2022
    …path` (GH-94918)
    
    (cherry picked from commit ec4745b)
    
    Co-authored-by: Brett Cannon <brett@python.org>
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir topic-importlib
    Projects
    None yet
    Development

    Successfully merging a pull request may close this issue.

    5 participants