Skip to content

pkg_resources: do not call stat() and access() #1135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 3, 2017
Merged

Conversation

jd
Copy link
Contributor

@jd jd commented Aug 25, 2017

The current code in find_on_path is doing a lot of stat() calls which are
actually useless and prone to race conditions.

As described in Python documentation
(https://docs.python.org/3/library/os.html#os.access), os.access must not be
used before opening a file. Same goes for a directory.

This patch removes those checks by handling exceptions correctly when using
os.listdir() instead, which improves pkg_resources import time.

path_item, metadata=PathMetadata(
path_item, os.path.join(path_item, 'EGG-INFO')
)
if _is_unpacked_egg(path_item):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the code heads down this path now, but path_item is not a dir or is not readable, you'll get a different result than before. That seems to be a backward-incompatible change. Is this code path not expected to be impacted by this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it probably needs some more work to, I'll dig into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually… the code check that path_item/EGG-INFO/PKG-INFO is a file, so if that's the case, path_item has to be a directory. So the check seems useless.

The permission is not that useful either: the fact that the directory is readable is only useful when calling os.listdir, but no code in that path uses os.listdir – and if it does, it'd needs to learn to handle errors. What is interesting is to be sure the directory has +x permission, and that is done by using os.path.isfile in _is_unpacked_egg: it'll return False if it can't traverse the directory.

So… I can't see of a case where things would go wrong. If there's any we should definitely add a test!

@jaraco
Copy link
Member

jaraco commented Aug 26, 2017

I like your instinct here. I'm concerned about the test failure on Windows, which does seem implicated in the change. What's your thought?

@jd
Copy link
Contributor Author

jd commented Aug 28, 2017

Interesting… The Windows failure is exactly why Python 3 has a NotADirectoryError. There seems to be no smart way to catch that in Python 2. I've updated the patch to catch that too, it's ugly, but if you know any better way let me know.

The current code in find_on_path is doing a lot of stat() calls which are
actually useless and prone to race conditions.

As described in Python documentation
(https://docs.python.org/3/library/os.html#os.access), os.access must not be
used before opening a file. Same goes for a directory.

This patch removes those checks by handling exceptions correctly when using
os.listdir() instead, which improves pkg_resources import time.
@jd
Copy link
Contributor Author

jd commented Aug 28, 2017

Pull-request updated, HEAD is now df23966

@jaraco jaraco merged commit 6d8381a into pypa:master Sep 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants