Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible duplicate entries in sys.path if .pth files are used with zip's #70774

Closed
tds333 mannequin opened this issue Mar 18, 2016 · 13 comments
Closed

Possible duplicate entries in sys.path if .pth files are used with zip's #70774

tds333 mannequin opened this issue Mar 18, 2016 · 13 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@tds333
Copy link
Mannequin

tds333 mannequin commented Mar 18, 2016

BPO 26587
Nosy @brettcannon, @tds333
Files
  • issue26587.diff
  • site.patch: Patch for site with test
  • site2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/brettcannon'
    closed_at = <Date 2016-04-08.22:10:20.298>
    created_at = <Date 2016-03-18.10:23:00.832>
    labels = ['type-feature', 'library']
    title = "Possible duplicate entries in sys.path if .pth files are used with zip's"
    updated_at = <Date 2016-04-08.22:10:20.296>
    user = 'https://github.com/tds333'

    bugs.python.org fields:

    activity = <Date 2016-04-08.22:10:20.296>
    actor = 'brett.cannon'
    assignee = 'brett.cannon'
    closed = True
    closed_date = <Date 2016-04-08.22:10:20.298>
    closer = 'brett.cannon'
    components = ['Library (Lib)']
    creation = <Date 2016-03-18.10:23:00.832>
    creator = 'tds333'
    dependencies = []
    files = ['42222', '42225', '42296']
    hgrepos = []
    issue_num = 26587
    keywords = ['patch']
    message_count = 13.0
    messages = ['261958', '261974', '262059', '262070', '262078', '262193', '262195', '262196', '262200', '262493', '263045', '263046', '263047']
    nosy_count = 5.0
    nosy_names = ['brett.cannon', 'tds333', 'SilentGhost', 'python-dev', 'mgrang']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue26587'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

    @tds333
    Copy link
    Mannequin Author

    tds333 mannequin commented Mar 18, 2016

    In site.py there is the internal function _init_pathinfo() This function builds a set of path entries from sys.path. This is used to avoid duplicate entries in sys.path.
    But this function has a check if it is a directory with os.path.isdir(...). All this is fine as long as someone has a .zip file in sys.path or a zipfile subpath. Then the path entry is not part of the set. With this duplicate detection with none directories does not work.

    The fix is as simple as removing the os.path.isdir(...) line and fixing the indent. Also the docstring should be modified.

    Detected by using this function in a project reusing addsitedir(...) functionality to add another path with .pth processing.

    @tds333 tds333 mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Mar 18, 2016
    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Mar 18, 2016

    Could you provide a code example of your using addsitedir that results in duplicates?

    @mgrang
    Copy link
    Mannequin

    mgrang mannequin commented Mar 19, 2016

    Here is a testcase to reproduce the issue:

    cat test.py
    import site, sys
    site.addsitedir('/foo/bar')
    print (sys.path)

    This prints just a single instance of '/foo/bar':

    ['/local/mnt/workspace/python/tst', '/foo/bar', '/usr/local/lib/python36.zip', '/local/mnt/workspace/python/src/Lib', '/local/mnt/workspace/python/src/Lib/plat-linux', '/local/mnt/workspace/python/build/build/lib.linux-x86_64-3.6-pydebug']

    Now if we explicitly set PYTHONPATH to include '/foo/bar'

    export PYTHONPATH=/foo/bar

    and then run test.py here is the output:

    ['/local/mnt/workspace/python/tst', '/foo/bar', '/usr/local/lib/python36.zip', '/local/mnt/workspace/python/src/Lib', '/local/mnt/workspace/python/src/Lib/plat-linux', '/local/mnt/workspace/python/build/build/lib.linux-x86_64-3.6-pydebug', '/foo/bar']

    We see that there are duplicate entries for '/foo/bar'.

    As Wolfgang rightly said the issue comes from the check for
    "if os.path.isdir(dir)" inside _init_pathinfo() in site.py.
    On removing this check I no longer see the duplicate entry for '/foo/bar'.

    But since this is the first bug I am looking at I am not sure of the implications of removing this check. Can someone please confirm that what I see is indeed a failing test case, or is this the intended behavior?

    Thanks,
    Mandeep

    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Mar 20, 2016

    Thanks for this, Mandeep. I don't think it is entirely the same issue that Wolfgang is describing. He's particularly concerned about the clash of .zip files from the sys.path with ones coming from .pth files for example. And while the proposed fix would resolve the issue, I don't feel it would be entirely correct. For the record, I'm not even sure sys.path is guaranteed to be contain no duplicates, at least I wasn't able to find a statement to that effect in the docs. In any case, I'm adding Brett here who was seemingly the last one to touch that bit of code.

    I'm also attaching a path with what I think is a more appropriate and smaller fix.

    @tds333
    Copy link
    Mannequin Author

    tds333 mannequin commented Mar 20, 2016

    Extended unit test for the issue and patch for site.py.

    @tds333
    Copy link
    Mannequin Author

    tds333 mannequin commented Mar 22, 2016

    I think a fix for 3.6 only is ok, because it changes behaviour.
    But this is only an internal function with a "_".

    Should I add a test with a temporary created pth file?

    @brettcannon
    Copy link
    Member

    brettcannon commented Mar 22, 2016

    Unfortunately you can't simply remove that directory check because you don't want to blindly normalize case. If someone put some token value on sys.path for their use that was case-sensitive even on a case-insensitive OS then the proposed change would break those semantics (e.g. if someone put a URL in sys.path for a REST-based importer).

    The possibilities I see are:

    1. Don't change anything; duplicate entries don't really hurt anything
    2. Remove duplicate entries, but only normalize case for directories
    3. Remove duplicate entries, but normalize case for anything that points to something on the filesystem (i.e. both directories and files)

    @brettcannon
    Copy link
    Member

    brettcannon commented Mar 22, 2016

    And the code under discussion can be found at https://hg.python.org/cpython/file/default/Lib/site.py#l133

    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Mar 22, 2016

    I still think my fix is more appropriate as it ensures that known_paths and sys.path stay connected somehow.

    @tds333
    Copy link
    Mannequin Author

    tds333 mannequin commented Mar 26, 2016

    Ok, I implemented point 3.
    Check if it is a dir or file and makepath only in this case.
    All other entries are added unmodified to the set.

    Added a test case also for an URL path.

    I think duplicate detection is now improved and it should break nothing.

    @brettcannon brettcannon self-assigned this Mar 26, 2016
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 8, 2016

    New changeset bd1af1a97c2e by Brett Cannon in branch 'default':
    Issue bpo-26587: Allow .pth files to specify file paths as well as
    https://hg.python.org/cpython/rev/bd1af1a97c2e

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 8, 2016

    New changeset 09dc97edf454 by Brett Cannon in branch '3.5':
    Issue bpo-26587: Remove an incorrect statement from the docs
    https://hg.python.org/cpython/rev/09dc97edf454

    New changeset 94d5c57ee835 by Brett Cannon in branch 'default':
    Merge w/ 3.5 for issue bpo-26587
    https://hg.python.org/cpython/rev/94d5c57ee835

    @brettcannon
    Copy link
    Member

    brettcannon commented Apr 8, 2016

    I simplified Wolfgang's patch by simply using os.path.exists(). That eliminated the one place where os.path.isdir() was in use that was too specific to directories where files were reasonable to expect.

    I also quickly tweaked the docs for the site module in 3.5 to not promise that files would work.

    If you think there is still an issue with keeping things tied together, SilentGhost, feel free to open another issue to track it.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant