iter_entry_point often spending a lot of time on parsing a dummy requirements string #1132

Closed
tmmorin opened this Issue Aug 11, 2017 · 3 comments

tmmorin commented Aug 11, 2017

When investigating a performance issue in a tool that makes use of setuptools entry points [1], profiling pointed at iter_entry_points as a hot spot (see the flamegraph attached to [1]), with iter_entry_points surprisingly spending a lot of time in pyparsing.

Digging further, it appears that when an EntryPoint is created, even with nothing in "extras", Requirement.parse is still called (see [2]), which ultimately parses a "dummy" (because mostly empty) requirement string: "x[]".

[1] https://bugs.launchpad.net/python-openstackclient/+bug/1702483
[2] https://github.com/pypa/setuptools/blob/master/pkg_resources/__init__.py#L2289
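
For anyone wanting to collect a similar profile locally, here is a minimal sketch using only the standard library. `profile_top` and `stand_in_workload` are hypothetical names introduced for illustration; the workload is a placeholder, not the code that was actually profiled in [1].

```python
import cProfile
import io
import pstats


def profile_top(func, limit=10):
    """Run func() under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


def stand_in_workload():
    # Hypothetical placeholder. Swap in the real loop, for example one that
    # iterates pkg_resources.iter_entry_points("openstack.cli.extension").
    return sum(len(str(i)) for i in range(10000))


report = profile_top(stand_in_workload)
print(report)
```

If pyparsing frames dominate the cumulative-time column when the real loop is substituted, that would match the behavior described above.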

tmmorin commented Aug 11, 2017

Avoiding the call to Requirement.parse when extras is empty (patch below) significantly improves iter_entry_points performance.

--- /usr/lib/python2.7/dist-packages/pkg_resources/__init__.py	2017-08-02 00:40:42.000000000 +0200
+++ /var/tmp/alt__init__.py	2017-08-11 15:05:16.319781794 +0200
@@ -2286,7 +2286,10 @@
         self.name = name
         self.module_name = module_name
         self.attrs = tuple(attrs)
-        self.extras = Requirement.parse(("x[%s]" % ','.join(extras))).extras
+        if extras:
+            self.extras = Requirement.parse(("x[%s]" % ','.join(extras))).extras
+        else:
+            self.extras = ()
         self.dist = dist
 
     def __str__(self):

test:

import timeit

timeit.timeit(
    setup='import pkg_resources',
    stmt='for ep in pkg_resources.iter_entry_points("openstack.cli.extension"): pass',
    number=100)

(with 17 entry points in the "openstack.cli.extension" group)

time with original code: 0.969233036041
time with the patch above: 0.0956060886383
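
The effect of the guard can also be illustrated in isolation, without pkg_resources. This is only a sketch: `slow_parse_extras` is a hypothetical stand-in for an expensive parser such as Requirement.parse, and the `time.sleep` call simulates a cost rather than reproducing the measured one.

```python
import time


def slow_parse_extras(spec):
    """Hypothetical stand-in for an expensive parser such as Requirement.parse."""
    time.sleep(0.001)  # simulated parsing cost, not a measured value
    inner = spec[spec.index("[") + 1 : spec.rindex("]")]
    return tuple(part.strip() for part in inner.split(",") if part.strip())


def extras_unguarded(extras):
    # Mirrors the original pattern: always build and parse "x[...]".
    return slow_parse_extras("x[%s]" % ",".join(extras))


def extras_guarded(extras):
    # Mirrors the patched pattern: skip the parser when there is nothing to parse.
    if extras:
        return slow_parse_extras("x[%s]" % ",".join(extras))
    return ()


def elapsed(func, arg, repeat=50):
    """Total wall-clock seconds for `repeat` calls of func(arg)."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(arg)
    return time.perf_counter() - start


print("unguarded:", elapsed(extras_unguarded, ()))
print("guarded:  ", elapsed(extras_guarded, ()))
```

Both variants return the same value for every input; they differ only in whether the parser runs when extras is empty.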

tmmorin commented Aug 11, 2017

(A possibly better fix would be, in pkg_resources._vendor.packaging.requirements, to avoid calling pyparsing for a trivial requirement string.)
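
One way such a fast path might look, as a rough sketch only: `parse_with_fast_path`, `_TRIVIAL`, and `recording_parser` are hypothetical names, not part of packaging or pkg_resources, and the regular expression is a simplification that does not implement the full requirement grammar.

```python
import re

# Matches a bare project name with optional empty brackets, e.g. "x" or "x[]".
# Simplified on purpose; this is not the full requirement grammar.
_TRIVIAL = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[\s*\])?\s*$")


def parse_with_fast_path(spec, full_parser):
    """Return (name, extras), calling full_parser only for non-trivial input."""
    match = _TRIVIAL.match(spec)
    if match:
        return match.group(1), ()
    return full_parser(spec)


calls = []


def recording_parser(spec):
    # Hypothetical stand-in for the expensive pyparsing-based parser.
    # It records each call so the fast path can be observed.
    calls.append(spec)
    name, _, rest = spec.partition("[")
    extras = tuple(e.strip() for e in rest.rstrip("]").split(",") if e.strip())
    return name.strip(), extras


print(parse_with_fast_path("x[]", recording_parser))
print(parse_with_fast_path("x[test,docs]", recording_parser))
print(calls)  # only the non-trivial string reached the full parser
```

Compared with the guard in EntryPoint, a check at this level would benefit every caller that passes a trivial string, at the cost of keeping a second, simplified matcher consistent with the real grammar.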

tmmorin added a commit to tmmorin/setuptools that referenced this issue Sep 8, 2017

EntryPoint: avoid costly pyparsing for dummy requirement string
In the case where an EntryPoint is created with no extras, a Requirement
object is instantiated with a dummy requirement string 'x[]', which incurs
a significant, needless cost because pyparsing is used to parse it.

This change skips the instantiation of a Requirement object in the
case where extras is empty.

See github issue pypa#1132 for more details, including information on the
performance improvement brought by this change.

jaraco commented Oct 12, 2017

I've been unable to replicate your findings. When I try running timeit on iter_entry_points, I'm seeing results in the ~0.09 ballpark:

$ python -m timeit -s "import pkg_resources" -n 100 "for ep in pkg_resources.iter_entry_points('distutils.commands'): pass"
100 loops, best of 3: 95 usec per loop
:0: UserWarning: The test results are likely unreliable. The worst
time (680 usec) was more than four times slower than the best time.

Is there a test that can replicate the undesirable behavior without a third-party package? If not, can you describe steps to create an environment that does readily show the slow behavior?

jaraco closed this in 4ee5c65 Oct 12, 2017

jaraco added a commit that referenced this issue Oct 12, 2017
