pkg_resources causes a slow import; any way to avoid? #291

Closed
jason-s opened this issue Oct 23, 2017 · 6 comments

@jason-s

jason-s commented Oct 23, 2017

The use of pkg_resources.parse_version (see #162) causes the import time of numexpr to slow down significantly, from approx 0.4 seconds without pkg_resources.parse_version to 2.1 seconds with pkg_resources.parse_version.

(perhaps I should note that I am not a direct consumer of numexpr and have never used it, but pandas does and I use pandas)

Please consider alternate methods... if there are any.

The culprit is in expressions.py

import numpy
from pkg_resources import parse_version
_np_version = parse_version(numpy.__version__)

(my "speedup" was to comment out those last two lines referencing parse_version and substitute _np_version = '1.12.1' just to see how this impacts import time)


(see also pypa/setuptools#510)

@robbmcleod
Member

Replacing this with a tuple shouldn't be a problem.

@robbmcleod
Member

Comparing tuples built by splitting the version string on '.' seems to work fine. Technically, setuptools isn't a requirement of numexpr, so pkg_resources shouldn't be in the code base anyway.

Test fragments:

>>> ('1','12','1') > ('1','12','0','dev0+2342345435')
True
>>> ('1','12','0') > ('1','12','0b1')
False
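One case the fragments above do not cover: string components compare lexicographically, so '9' sorts after '12' and a 1.9.x release would compare as newer than 1.12.x. A minimal sketch of an integer-based alternative follows. The helper name `version_tuple` and its `width` parameter are hypothetical, not taken from the numexpr patch, and the sketch deliberately drops pre-release suffixes such as `b1` or `dev0` rather than ordering them.

```python
import re


def version_tuple(version, width=3):
    """Return the leading numeric components of a version string as ints.

    Hypothetical helper for illustration only: suffixes such as 'b1' or
    'dev0+abc' are discarded, so '1.12.0b1' and '1.12.0' compare equal.
    """
    parts = []
    for piece in version.split(".")[:width]:
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    # Pad short versions such as '1.12' so tuples have a uniform length.
    parts += [0] * (width - len(parts))
    return tuple(parts)


# Tuples of strings misorder multi-digit components ...
strings_misorder = ("1", "9", "0") > ("1", "12", "0")
# ... while tuples of ints order them numerically.
ints_order = version_tuple("1.9.0") < version_tuple("1.12.0")
```

If only a minimum-version check such as "numpy at least 1.x" is needed, ignoring the pre-release suffix is usually acceptable; a full PEP 440 ordering would still require a dedicated parser.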

I can't think of any other weird numbered variations that should break the tuple of strings comparison. Testing import times:

from time import perf_counter
t0 = perf_counter()
import numexpr
t1 = perf_counter()
import numexpr3
t2 = perf_counter()
import numba
t3 = perf_counter()
print(f'Import time for NumExpr 2.6: {t1-t0} s')
print(f'Import time for NumExpr 3.0: {t2-t1} s') 
print(f'Import time for Numba: {t3-t2} s')

Before patch:

Import time for NumExpr 2.6: 0.328 s
Import time for NumExpr 3.0: 0.026 s
Import time for Numba: 0.277 s

After patch:

Import time for NumExpr 2.6: 0.162 s

So something is still slow about NumExpr 2.6 compared to the dev branch. cpu_info.py?
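A caveat on the in-process timing script: modules already in `sys.modules` are not imported again, so shared dependencies such as numpy are charged to whichever package is imported first. That may account for part of the gap between the two NumExpr numbers. A rough sketch that times each import in a fresh interpreter instead is below; the function name `cold_import_seconds` is made up for this example.

```python
import subprocess
import sys


def cold_import_seconds(module_name):
    """Time `import module_name` in a fresh interpreter and return seconds.

    Each call starts a new Python process, so nothing imported by an
    earlier measurement is cached in sys.modules.
    """
    code = (
        "import time; t0 = time.perf_counter(); "
        f"import {module_name}; "
        "print(time.perf_counter() - t0)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise if the import fails in the child process
    )
    # Use the last line in case the imported module prints something itself.
    return float(result.stdout.strip().splitlines()[-1])
```

Calling it as `cold_import_seconds("numexpr")` and `cold_import_seconds("numpy")` separately would show how much of the total is numpy itself, assuming both packages are installed.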

@robbmcleod
Member

I moved the imports for cpuinfo.cpu into print_versions in the test submodule, and this shaved another 50 ms off, bringing the import time down to ~110 ms. I'm really not sure where the remaining slow-down relative to dev is coming from; it doesn't seem to be any of the imports. That said, it's 3x faster now.
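The general pattern of moving an import into the function that needs it can be sketched as follows. This is an illustration only, not the actual numexpr change: the standard-library `platform` module stands in for a heavier dependency such as cpuinfo, and this `print_versions` body is invented for the example.

```python
def print_versions():
    """Print and return a short environment summary.

    The import lives inside the function, so its cost is paid on the
    first call rather than when the enclosing package is imported.
    """
    import platform  # stand-in for a heavier module such as cpuinfo

    info = f"Python {platform.python_version()} on {platform.machine()}"
    print(info)
    return info
```

The trade-off is that an import error surfaces at call time instead of at package import time, which is usually acceptable for diagnostic helpers like this one.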

@robbmcleod
Member

Suggested changes:

d4b5b6e

The only remaining cause of the slow-down that I can think of is the circular imports between necompiler.py and expressions.py, which I'm not willing to deal with (it would be better to get the 3.0 branch pushed out to beta). Maybe Jason can clone and build from the latest and see how this does on his machine?
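One way to check where the remaining time goes, rather than guessing, is CPython's `-X importtime` option (available from Python 3.7), which writes a per-module breakdown to stderr. The sketch below runs it in a child process and returns the entries with the largest cumulative time. The function name `slowest_imports` is hypothetical, and the parsing assumes the `import time: self | cumulative | name` line layout that CPython emits.

```python
import subprocess
import sys


def slowest_imports(module_name, top=5):
    """Return up to `top` (cumulative_us, self_us, name) tuples for an import.

    Runs `python -X importtime -c "import <module_name>"` in a fresh
    interpreter and parses the report it writes to stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        check=True,
    )
    prefix = "import time:"
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        try:
            self_us = int(fields[0])
            cumulative_us = int(fields[1])
        except ValueError:
            continue  # header row: "self [us] | cumulative | imported package"
        rows.append((cumulative_us, self_us, fields[2].strip()))
    rows.sort(reverse=True)  # largest cumulative time first
    return rows[:top]
```

Running `slowest_imports("numexpr")` on an installed build would list the submodules that dominate the import, which could confirm or rule out necompiler and expressions as the source.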

@FrancescAlted
Contributor

Agreed. A 3x improvement in numexpr 2 is good enough already 👍

@robbmcleod
Member

Apparently resolved? Maybe we could get a heads-up on when the next pandas release is due, so we can time our own release?
