New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg_resources: filter entries before parsing them #1134

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
2 participants
@jd
Contributor

jd commented Aug 24, 2017

_by_version_descending is quite heavy as it tries to parse versioning and
sort ALL files listed. This is one of the heaviest method.

Actually, there might not be any need to parse those file names it there will
be ignored in the loop just after, so let filter them before sorting.

On my system, this brings down the loading of time of pkg_resources by 20%:

$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 1.11 sec per loop

$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 889 msec per loop

entries = (
entry
for entry in os.listdir(path_item)
if (entry.lower().endswith('.egg-info')

This comment has been minimized.

@jaraco

jaraco Aug 26, 2017

Member

Would it be possible to apply this test as an argument to filter, such that entries = filter(of_interest, os.listdir(path_item))? That function of_interest could then have a docstring describing its purpose. My only worry is that the additional n function calls might obviate the performance advantages.

This comment has been minimized.

@jd

jd Aug 28, 2017

Contributor

Considering how long is the parsing of version (50% cost of exec time IIRC), I don't think it can be worse. I'll change that.

@jd jd force-pushed the jd:pre-filter-find-on-path branch from f5b8d4d to cb989a2 Aug 28, 2017

@jd

This comment has been minimized.

Contributor

jd commented Aug 28, 2017

Pull-request updated, HEAD is now cb989a2

@jaraco

This comment has been minimized.

Member

jaraco commented Sep 3, 2017

Thanks for this. I was going to accept this change, but then I went to extract the of_interest function and realized that the 'only' parameter has to be included. This fact makes all of this code quite complicated. I'm not even quite sure what 'only' is for. Blaming the code, I find it's very old (c721485).

I think I want to mull over this one a bit more before rolling it in because it does add complication to the already complicated code, adds redundant operations, and it's not clear to me if the benefit outweighs the cost.

@jd

This comment has been minimized.

Contributor

jd commented Sep 4, 2017

I feel you @jaraco. Run cProfile and you'll see how heavy the parsing is. I've 400 packages installed on my system, so the time pkg_resources lose trying to parse things it does not want in the end has a real cost. That being said, if you have a better way to refactor this, I'm all for it.

The branch is in conflict for now, I'll update it.

pkg_resources: filter entries before parsing them
`_by_version_descending` is quite heavy as it tries to parse versioning and
sort ALL files listed. This is one of the heaviest method.

Actually, there might not be any need to parse those file names it there will
be ignored in the loop just after, so let filter them before sorting.

On my system, this brings down the loading of time of pkg_resources by 20%:

$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 1.11 sec per loop

$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 889 msec per loop
@jd

This comment has been minimized.

Contributor

jd commented Sep 4, 2017

Pull-request updated, HEAD is now 4481608

@jd jd force-pushed the jd:pre-filter-find-on-path branch from cb989a2 to 4481608 Sep 4, 2017

jaraco added a commit that referenced this pull request Sep 9, 2017

@jaraco

This comment has been minimized.

Member

jaraco commented Sep 9, 2017

@jd Please have a look at the latest commit (referenced above). I believe it accomplishes the same thing you were after.

@jaraco jaraco closed this Sep 9, 2017

@jd

This comment has been minimized.

Contributor

jd commented Sep 12, 2017

@jaraco looks great, thanks for taking care!

@jd jd deleted the jd:pre-filter-find-on-path branch Sep 12, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment