Performance improvements - mostly in curry
#57
Conversation
This makes getting elements from tuples and lists much faster but significantly slows down getting elements from iterators. I believe that `first(indexable)` occurs more frequently in common operations.
Benchmarking with `bench/test_curry.py`:
Before: Ran 1 test in 0.246s
After: Ran 1 test in 0.183s
Follows the "ask forgiveness, not permission" principle. Benchmarking with `bench/test_curry.py`:
Before: Ran 1 test in 0.183s
After: Ran 1 test in 0.122s
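The memoize fix can be sketched like this (a simplified stand-in, not toolz's actual implementation): instead of checking whether the arguments are hashable before every call, attempt the cache lookup directly and catch `TypeError` for unhashable inputs.

```python
def memoize(func):
    """Sketch of forgiveness-based memoization (simplified)."""
    cache = {}

    def wrapper(*args):
        # Ask forgiveness: try the cache lookup and let unhashable
        # arguments surface as TypeError, rather than paying for a
        # hashability check on every single call.
        try:
            return cache[args]
        except KeyError:
            result = cache[args] = func(*args)
            return result
        except TypeError:  # unhashable args: compute without caching
            return func(*args)

    return wrapper
```

The common case (hashable arguments, warm cache) now costs one dictionary lookup with no up-front type inspection.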
If we know that we only need one more argument then we return a `partial` object rather than a `curry`. This is done for performance reasons.
For example, the following no longer works:

```python
>>> from toolz.curried import get
>>> get(1)(default=None)(data)
```

But curried `get` *is* about 20% faster.
Before: Ran 1 test in 0.085s
After: Ran 1 test in 0.067s
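The trade-off can be illustrated with a sketch (simplified, not the actual toolz `curry` implementation): once exactly one required argument remains, hand off to `functools.partial`, which has lower call overhead but no further keyword chaining.

```python
from functools import partial
from inspect import signature


class curry:
    """Simplified curry sketch: when only one positional argument
    remains, return a functools.partial instead of another curry."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self._nargs = len(signature(func).parameters)

    def __call__(self, *new_args):
        args = self.args + new_args
        remaining = self._nargs - len(args)
        if remaining <= 0:
            return self.func(*args)       # all arguments supplied
        if remaining == 1:
            return partial(self.func, *args)  # cheap final stage
        return curry(self.func, *args)
```

This sketch ignores keyword arguments and defaults entirely, which is exactly why the chained `get(1)(default=None)` pattern stops working under this strategy.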
I've updated this. Performance of …

```python
try:
    return seq[1]
except TypeError:
    return nth(1, seq)
```
We know this will raise an exception in `nth` that will need to be caught, which is relatively costly, so we might as well do the following here instead of calling `nth(1, seq)`:

```python
return next(itertools.islice(iter(x), 1, None))
```
Looks like I've really just edited `first` and `second` to be copies of `nth`. Maybe both should be partials on `nth`.
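That suggestion might look like the following sketch, with a simplified `nth` (no default-value handling, unlike the real toolz version):

```python
from functools import partial
from itertools import islice


def nth(n, seq):
    """Return the nth element of seq: index directly when possible,
    fall back to iteration for non-indexable sequences."""
    try:
        return seq[n]
    except TypeError:  # seq is an iterator or otherwise not indexable
        return next(islice(iter(seq), n, None))


# first and second as partials on nth, as suggested above
first = partial(nth, 0)
second = partial(nth, 1)
```

This keeps a single implementation of the indexing-with-fallback logic instead of three copies.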
This patch looks pretty good, but I offer four points to consider:
Regarding changing …
In my usage of
I hadn't thought of this. This is particularly an issue with indexable objects like
I doubt that this type checking will be an issue. I like your suggestions below about making
`get` is presumably most often used with a single input, where the desired operation of `get(ind, val)` is just `val[ind]`. We do this up front with a try-except block. This hurts performance on the list syntax, `get(indices, val) -> [val[i] for i in indices] if isinstance(indices, list)`, but performance there wasn't so great to begin with, and it doesn't degrade as significantly as the single-index case improves.

Single index
Before: Ran 1 test in 0.058s
After: Ran 1 test in 0.025s

Multi-index
Before: Ran 1 test in 0.307s
After: Ran 1 test in 0.408s
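The fast path described above can be sketched as follows (simplified, without the `default` handling the real `get` supports):

```python
def get(ind, seq):
    """Sketch of get with the single-index fast path up front."""
    try:
        return seq[ind]          # common case: one index, direct lookup
    except TypeError:            # ind may be a list of indices
        if isinstance(ind, list):
            return tuple(seq[i] for i in ind)
        raise
```

The single-index case pays no `isinstance` check at all; the list case pays for one raised and caught `TypeError` before doing its work.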
This degrades performance but maintains correctness (see tests). Benchmarked with `bench/test_first.py`.
Before: Ran 2 tests in 0.285s
After: Ran 2 tests in 0.705s
This is no longer necessary because we now usually only call this function once per seen argument
OK, I've reverted the behavior of … The … I also sped up …
Excellent changes. Regarding `isinstance(seq, collections.Sequence)` … Regarding:

```python
try:
    return seq[ind]
except TypeError:  # `ind` may be a list
    if isinstance(ind, list):
        return tuple(get(i, seq, default) for i in ind)
    elif default is not no_default:
        return default
    else:
        raise
except (KeyError, IndexError):  # we know `ind` is not a list
    if default is no_default:
        raise
    else:
        return default
```
I lied. I benchmarked some variations of `get`:

```python
def _get(ind, seq, default):
    try:
        return seq[ind]
    except (KeyError, IndexError):
        return default


# "New" in benchmarks
def get(ind, seq, default=no_default):
    try:
        return seq[ind]
    except TypeError:  # `ind` may be a list
        if isinstance(ind, list):
            if default is no_default:
                return tuple(seq[i] for i in ind)
            else:
                return tuple(_get(i, seq, default) for i in ind)
        elif default is not no_default:
            return default
        else:
            raise
    except (KeyError, IndexError):  # we know `ind` is not a list
        if default is no_default:
            raise
        else:
            return default


# "Alt" in benchmarks
def alt_get(ind, seq, default=no_default):
    try:
        return seq[ind]
    except:
        pass
    if isinstance(ind, list):
        if default is no_default:
            return tuple(seq[i] for i in ind)
        else:
            return tuple(_get(i, seq, default) for i in ind)
    if default is no_default:
        return seq[ind]
    else:
        try:
            return seq[ind]
        except (KeyError, IndexError):
            return default
```

Below are the results of the benchmarks. "Old" is master, "Cur" is the current PR, "New" is my code from above, and "Alt" is a variation of the current PR:

```python
tuples = [(1, 2, 3) for i in range(100000)]
large_tuples = [range(1000) for i in range(1000)]
```
```
# test single
%timeit for tup in tuples: get(1, tup)
 Old: 10 loops, best of 3: 120 ms per loop
*Cur: 10 loops, best of 3: 60.2 ms per loop
*New: 10 loops, best of 3: 60.5 ms per loop
*Alt: 10 loops, best of 3: 60.3 ms per loop

# test list
%timeit for tup in tuples: get([1, 2], tup)
 Old: 1 loops, best of 3: 668 ms per loop
 Cur: 1 loops, best of 3: 866 ms per loop
 New: 1 loops, best of 3: 789 ms per loop
 Alt: 1 loops, best of 3: 756 ms per loop

# test long list
%timeit for tup in large_tuples: get(range(100), tup)
 Old: 10 loops, best of 3: 145 ms per loop
 Cur: 10 loops, best of 3: 82.7 ms per loop
*New: 10 loops, best of 3: 28.2 ms per loop
*Alt: 10 loops, best of 3: 27.4 ms per loop

# test longer list
%timeit for tup in large_tuples: get(range(1000), tup)
 Old: 1 loops, best of 3: 1.41 s per loop
 Cur: 1 loops, best of 3: 735 ms per loop
*New: 1 loops, best of 3: 189 ms per loop
*Alt: 10 loops, best of 3: 188 ms per loop

# test single with default
%timeit for tup in tuples: get(3, tup, 1)
 Old: 1 loops, best of 3: 430 ms per loop
 Cur: 1 loops, best of 3: 591 ms per loop
*New: 1 loops, best of 3: 393 ms per loop
 Alt: 1 loops, best of 3: 591 ms per loop

# test short list with default
%timeit for tup in tuples: get([10, 11], tup, 1)
 Old: 1 loops, best of 3: 1.39 s per loop
 Cur: 1 loops, best of 3: 2.02 s per loop
*New: 1 loops, best of 3: 1.45 s per loop
*Alt: 1 loops, best of 3: 1.45 s per loop

# test list with default
%timeit for tup in tuples: get(range(10, 20), tup, 1)
 Old: 1 loops, best of 3: 5.11 s per loop
 Cur: 1 loops, best of 3: 6.77 s per loop
*New: 1 loops, best of 3: 4.2 s per loop
*Alt: 1 loops, best of 3: 4.15 s per loop
```

The only benchmark above in which "New" and "Alt" differ significantly on my machine is "test single with default".
Awesome. Should I include the …
You may go ahead.
This PR is getting large. It's also mostly helpful. I've started doing work off of it rather than master. I'm going to merge it and open an issue for …
Performance improvements - mostly in `curry`
PR pytoolz#57 put an emphasis on the performance of `get` because it is so frequently used. Some timings for `get` were given in that thread. This commit improves those benchmarks as follows: "test list" is 50% faster (using a two-element list of indices). "test long list" is 100% faster (using a 100-element list of indices). This is achieved because `operator.itemgetter(*ind)(seq)` is significantly faster than `tuple(seq[i] for i in ind)`. For me, it is 3x faster.
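The `itemgetter` approach can be sketched like this (hypothetical helper name; note that `itemgetter` with a single argument returns a bare element rather than a tuple, so one-element index lists need a special case):

```python
from operator import itemgetter


def get_list(ind, seq):
    """Multi-index lookup via itemgetter, which indexes in C rather
    than in a Python-level generator expression."""
    if len(ind) == 1:
        # itemgetter(i) returns seq[i] itself, not a 1-tuple
        return (seq[ind[0]],)
    return itemgetter(*ind)(seq)
```

Building one `itemgetter` and calling it once replaces a Python-level loop of `__getitem__` calls, which is where the quoted 3x speedup on the raw indexing comes from.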
Edit: This PR has grown (sorry) to include various performance improvements. At times they sacrifice clarity and in rare cases correctness (see mrocklin@6ba6737), but small operations like `first`, `second`, and `curry(get)` run significantly faster in the common case. I often use these functions inside of `map` on large datasets so this is, I think, worth looking into.

I list performance improvements in some of the commits. I also have included benchmarks in `bench/`. I use `nosetests` on individual files while benchmarking.

Original description follows.
In #52 I leveraged the `inspect` module to catch more errors in curried functions before execution. Unfortunately, when the same curried function is called many times (for example if it is mapped onto a list) this results in very many inspect calls. These are both expensive and needless.

A simple fix is to memoize the function that calls `inspect`. This performs well in practice but introduces a potential memory leak. Alternatives might include introducing some additional state into `curry` or implementing a fixed-sized dictionary for use as an LRU cache in memoize.

Edit: I've extended this PR with a small fix to memoize. We now ask forgiveness, not permission, when dealing with non-hashable inputs to memoized functions. This results in a significant speed improvement for memoized functions.
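The fixed-sized-dictionary alternative mentioned above could be sketched with an `OrderedDict` used as an LRU cache (a hypothetical sketch, not what toolz actually does), which bounds memory at the cost of occasional recomputation:

```python
from collections import OrderedDict


def lru_memoize(func, maxsize=128):
    """Memoize func with a bounded cache: evict the least recently
    used entry once maxsize is exceeded (positional args only)."""
    cache = OrderedDict()

    def wrapper(*args):
        try:
            result = cache.pop(args)   # hit: remove so we can re-insert
        except KeyError:
            result = func(*args)       # miss: compute
        except TypeError:              # unhashable args: skip caching
            return func(*args)
        cache[args] = result           # (re)insert as most recently used
        if len(cache) > maxsize:
            cache.popitem(last=False)  # evict least recently used
        return result

    return wrapper
```

This caps the memory leak that plain memoization introduces, in exchange for an extra pop/insert per call.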
In general this PR reduces the overhead of using transformed functions. This is significant when transforming lightweight functions like `get`.