Performance improvements - mostly in curry #57

Merged 23 commits into pytoolz:master on Nov 3, 2013

Conversation

mrocklin (Member)

Edit: This PR has grown (sorry) to include various performance improvements. At times they sacrifice clarity and, in rare cases, correctness (see mrocklin@6ba6737), but small operations like first, second, and curry(get) run significantly faster in the common case. I often use these functions inside of map on large datasets, so this is, I think, worth looking into.

I list performance improvements in some of the commit messages. I have also included benchmarks in bench/. I run nosetests on individual files while benchmarking.

Original description follows.

In #52 I leveraged the inspect module to catch more errors in curried functions before execution. Unfortunately, when the same curried function is called many times (for example, when it is mapped over a list), this results in very many inspect calls. These are both expensive and needless.

A simple fix is to memoize the function that calls inspect. This performs well in practice but introduces a potential memory leak. Alternatives might include introducing some additional state into curry or implementing a fixed-size dictionary for use as an LRU cache in memoize.

Edit: I've extended this PR with a small fix to memoize. We now ask forgiveness, not permission, when dealing with non-hashable inputs to memoized functions. This results in a significant speed improvement for memoized functions.
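As a rough illustration, the ask-forgiveness version of memoize looks something like the sketch below (simplified; not toolz's exact code):

def memoize(func):
    cache = {}
    def memof(*args):
        try:
            return cache[args]
        except KeyError:            # not cached yet: compute and store
            result = cache[args] = func(*args)
            return result
        except TypeError:           # unhashable args: skip the cache entirely
            return func(*args)
    return memof

The win is that the common case, a cache hit, costs a single dictionary lookup with no up-front hashability check.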

In general this PR reduces the overhead of using transformed functions. This is significant when transforming lightweight functions like get.

This makes getting elements from tuples and lists much faster,
but significantly slows down getting elements from iterators.
I believe that first(indexable) occurs more frequently in common
operations.
benchmarking with bench/test_curry.py
Before: Ran 1 test in 0.246s
After:  Ran 1 test in 0.183s
Follows the "ask forgiveness, not permission" principle

benchmarking with bench/test_curry.py
Before: Ran 1 test in 0.183s
After:  Ran 1 test in 0.122s
If we know that we only need one more argument, then we return a
`partial` object rather than a `curry`. This is done for performance
reasons. For example, the following no longer works:

>>> from toolz.curried import get
>>> get(1)(default=None)(data)

But curried get *is* about 20% faster

Before: Ran 1 test in 0.085s
After:  Ran 1 test in 0.067s
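For illustration, the trick looks roughly like the sketch below. This is simplified and not toolz's actual implementation; num_required_args is a hypothetical stand-in for toolz's _num_required_args helper.

from functools import partial

def num_required_args(func):
    # simplified: count positional parameters that lack defaults
    return func.__code__.co_argcount - len(func.__defaults__ or ())

class curry(object):
    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.keywords = kwargs or None

    def __call__(self, *args, **kwargs):
        args = self.args + args
        if self.keywords:
            kwargs = dict(self.keywords, **kwargs)
        n = num_required_args(self.func)
        if len(args) >= n:
            return self.func(*args, **kwargs)
        if len(args) == n - 1:
            # exactly one argument missing: a plain partial is cheaper
            # to construct and to call than another curry object
            return partial(self.func, *args, **kwargs)
        return curry(self.func, *args, **kwargs)

Under this scheme get(1) returns a functools.partial, which is why chaining further keyword arguments, as in the get(1)(default=None)(data) example above, no longer works.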
mrocklin commented Nov 2, 2013

I've updated this. Performance of first, second, memoize, and curry(foo) is substantially improved. Some ugliness was added.

try:
    return seq[1]
except TypeError:
    return nth(1, seq)
eriknw commented:

We know this will raise an exception in nth that will need to be caught, which is relatively costly, so we might as well do the following here instead of calling nth(1, seq):

        return next(itertools.islice(iter(seq), 1, None))

mrocklin replied:


Looks like I've really just edited first and second to be copies of nth. Maybe both should be partials on nth.
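That refactor might look like the following sketch (hypothetical; not what this PR actually does):

from functools import partial
from itertools import islice

def nth(n, seq):
    try:
        return seq[n]                   # fast path for indexable sequences
    except TypeError:                   # not indexable: fall back to iteration
        return next(islice(iter(seq), n, None))

# first and second as partials on nth
first = partial(nth, 0)
second = partial(nth, 1)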


eriknw commented Nov 2, 2013

This patch looks pretty good, but I offer four points to consider:

  1. On my computer, first(seq) seems to be about four times slower on iterators but twice as fast on sequences. Although my benchmarking of this function is limited, it's clear there's a tradeoff between input types and optimal performance.
  2. Regarding first, second, and nth, their behaviors have changed for some objects, including OrderedDict, dict, and set. Although using these functions on dict and set may be questionable, using them on OrderedDict is perfectly reasonable, and it's even more reasonable to expect iteration, not indexing, on a user-defined type.
  3. Changing a curry object to a partial when one argument is left is an interesting trick. The only possible downside is that it changes the type of the curried object from toolz.functoolz.core.curry to functools.partial, so a user who ever checks the object's type, whether or not that's a good idea, may need to check for both. Not a show stopper at all, especially if the benchmarks are better.
  4. I agree that memoization of _num_required_args would be a great place for an LRU cache to limit memory usage; a sketch of one follows this list.
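Such a bounded cache might look like the sketch below (hypothetical; the maxsize value and eviction policy are illustrative choices, not toolz code):

from collections import OrderedDict

def memoize_lru(func, maxsize=128):
    cache = OrderedDict()
    def memof(*args):
        try:
            result = cache.pop(args)        # hit: re-inserted below as most recent
        except KeyError:                    # miss: compute, evicting if full
            result = func(*args)
            if len(cache) >= maxsize:
                cache.popitem(last=False)   # drop the least recently used entry
        cache[args] = result
        return result
    return memof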


eriknw commented Nov 2, 2013

Regarding changing curry objects to partial objects, their members should be made consistent. Specifically, curried.kwargs should be changed to curried.keywords, and keywords should be None if no keywords were curried. Note that args is always a tuple even if no args were curried.


mrocklin commented Nov 2, 2013

> This patch looks pretty good, but I offer four points to consider:

> On my computer, first(seq) seems to be about four times slower on iterators but twice as fast on sequences. Although my benchmarking of this function is limited, it's clear there's a tradeoff between input types and optimal performance.

In my usage of toolz, when I call first on an iterator I call it once, but when I call first on a tuple I am often actually mapping it across a large collection of tuples. I have yet to run into an application where the delay of first on an iterator is relevant. I struggle with the delays of first and get on tuples on a daily basis. Of course, this is just my application. It'd be nice to see what other hypothetical users are doing.

> Regarding first, second, and nth, their behaviors have changed for some objects, including OrderedDict, dict, and set. Although using these functions on dict and set may be questionable, using them on OrderedDict is perfectly reasonable, and it's even more reasonable to expect iteration, not indexing, on a user-defined type.

I hadn't thought of this. This is particularly an issue with indexable objects like dicts, where instead of returning the "first" key we might return a value. In general I'm fine with calling first on sets.

> Changing a curry object to a partial when one argument is left is an interesting trick. The only possible downside is that it changes the type of the curried object from toolz.functoolz.core.curry to functools.partial, so a user who ever checks the object's type, whether or not that's a good idea, may need to check for both. Not a show stopper at all, especially if the benchmarks are better.

I doubt that this type checking will be an issue. I like your suggestion about making curry and partial have similar interfaces. They'll have similar duck types.

> I agree that memoization of _num_required_args would be a great place for an LRU cache to limit memory usage.

_num_required_args is less important now that curry.__call__ tries to call the function before much else. I may remove the memoization.
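The call-first strategy amounts to something like the sketch below (simplified, reusing the hypothetical curry class sketched earlier in this thread):

def call_or_curry(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)        # just try the call first
    except TypeError:
        # assume the TypeError means "too few arguments" and curry further;
        # this conflates it with TypeErrors raised inside func, which is the
        # rare correctness sacrifice mentioned at the top of this PR
        return curry(func, *args, **kwargs)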

Get is presumably most often used with a single index, where the desired
operation of get(ind, val) is just val[ind]. We do this up front with a
try-except block.

This hurts performance on the list syntax

    get(indices, val) -> [val[i] for i in indices]
                         if isinstance(indices, list)

But performance there wasn't so great to begin with, and it doesn't
degrade as significantly as the single-index case improves.

Single index
Before: Ran 1 test in 0.058s
After:  Ran 1 test in 0.025s

Multi-index
Before: Ran 1 test in 0.307s
After:  Ran 1 test in 0.408s
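The shape of the change, ignoring default handling (which the refined versions later in this thread address), is roughly:

def get(ind, seq):
    try:
        return seq[ind]                     # common case: a single index
    except TypeError:                       # ind is presumably a list of indices
        return tuple(seq[i] for i in ind)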
This degrades performance but maintains correctness. See the tests.

Benchmarked with bench/test_first.py
Before: Ran 2 tests in 0.285s
After:  Ran 2 tests in 0.705s
This is no longer necessary because we now usually call this
function only once per seen argument.

mrocklin commented Nov 2, 2013

OK, I've reverted the behavior of first and second. I've kept nth the way it is for now.

The curry interface now mimics partial.

I also sped up get in the common case, following my newfound obsession with ask-forgiveness-not-permission.


eriknw commented Nov 2, 2013

Excellent changes.

Regarding nth and last, perhaps consider using the following, which works for Python built-ins (and types derived from them) like list, tuple, and str, but not for dict. On non-sequence objects, this allows iteration to proceed as expected, as is done in first and second. There is a constant overhead for calling isinstance, but the payoff can be well worth it for large indices:

isinstance(seq, collections.Sequence)
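Applied to nth, the check might look like this sketch (hypothetical; note that on Python 3 the ABC lives in collections.abc):

try:
    from collections.abc import Sequence    # Python 3
except ImportError:
    from collections import Sequence        # Python 2, as in the snippet above
from itertools import islice

def nth(n, seq):
    if isinstance(seq, Sequence):           # list, tuple, str, ...
        return seq[n]                       # constant-time indexing, even for large n
    return next(islice(seq, n, None))       # otherwise iterate as expected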

Regarding get, I'm not about to start benchmarking various alternatives, but I'll offer one alternative as food for thought:

try:
    return seq[ind]
except TypeError:  # `ind` may be a list
    if isinstance(ind, list):
        return tuple(get(i, seq, default) for i in ind)
    elif default is not no_default:
        return default
    else:
        raise
except (KeyError, IndexError):  # we know `ind` is not a list
    if default is no_default:
        raise
    else:
        return default


eriknw commented Nov 3, 2013

I lied. I benchmarked some variations of get and came up with the below:

def _get(ind, seq, default):
    try:
        return seq[ind]
    except (KeyError, IndexError):
        return default

# "New" in benchmarks
def get(ind, seq, default=no_default):
    try:
        return seq[ind]
    except TypeError:  # `ind` may be a list
        if isinstance(ind, list):
            if default is no_default:
                return tuple(seq[i] for i in ind)
            else:
                return tuple(_get(i, seq, default) for i in ind)
        elif default is not no_default:
            return default
        else:
            raise
    except (KeyError, IndexError):  # we know `ind` is not a list
        if default is no_default:
            raise
        else:
            return default

# "Alt" in benchmarks
def alt_get(ind, seq, default=no_default):
    try:
        return seq[ind]
    except:
        pass
    if isinstance(ind, list):
        if default is no_default:
            return tuple(seq[i] for i in ind)
        else:
            return tuple(_get(i, seq, default) for i in ind)
    if default is no_default:
        return seq[ind]
    else:
        try:
            return seq[ind]
        except (KeyError, IndexError):
            return default

Below are the results of the benchmarks. "Old" is master, "Cur" is the current PR, "New" is my code from above, and "Alt" is a variation of the current PR:

tuples = [(1, 2, 3) for i in range(100000)]
large_tuples = [range(1000) for i in range(1000)]

# test single
%timeit for tup in tuples: get(1, tup)
 Old: 10 loops, best of 3: 120 ms per loop
*Cur: 10 loops, best of 3: 60.2 ms per loop
*New: 10 loops, best of 3: 60.5 ms per loop
*Alt: 10 loops, best of 3: 60.3 ms per loop

# test list
%timeit for tup in tuples: get([1, 2], tup)
 Old: 1 loops, best of 3: 668 ms per loop
 Cur: 1 loops, best of 3: 866 ms per loop
 New: 1 loops, best of 3: 789 ms per loop
 Alt: 1 loops, best of 3: 756 ms per loop

# test long list
%timeit for tup in large_tuples: get(range(100), tup)
 Old: 10 loops, best of 3: 145 ms per loop
 Cur: 10 loops, best of 3: 82.7 ms per loop
*New: 10 loops, best of 3: 28.2 ms per loop
*Alt: 10 loops, best of 3: 27.4 ms per loop

# test longer list
%timeit for tup in large_tuples: get(range(1000), tup)
 Old: 1 loops, best of 3: 1.41 s per loop
 Cur: 1 loops, best of 3: 735 ms per loop
*New: 1 loops, best of 3: 189 ms per loop
*Alt: 10 loops, best of 3: 188 ms per loop

# test single with default
%timeit for tup in tuples: get(3, tup, 1)
 Old: 1 loops, best of 3: 430 ms per loop
 Cur: 1 loops, best of 3: 591 ms per loop
*New: 1 loops, best of 3: 393 ms per loop
 Alt: 1 loops, best of 3: 591 ms per loop

# test short list with default
%timeit for tup in tuples: get([10, 11], tup, 1)
 Old: 1 loops, best of 3: 1.39 s per loop
 Cur: 1 loops, best of 3: 2.02 s per loop
*New: 1 loops, best of 3: 1.45 s per loop
*Alt: 1 loops, best of 3: 1.45 s per loop

# test list with default
%timeit for tup in tuples: get(range(10, 20), tup, 1)
 Old: 1 loops, best of 3: 5.11 s per loop
 Cur: 1 loops, best of 3: 6.77 s per loop
*New: 1 loops, best of 3: 4.2 s per loop
*Alt: 1 loops, best of 3: 4.15 s per loop

The only benchmark above in which "New" and "Alt" differ significantly on my machine is "test single with default".


mrocklin commented Nov 3, 2013

Awesome. Should I include the new implementation or do you want to?


eriknw commented Nov 3, 2013

You may go ahead.


mrocklin commented Nov 3, 2013

This PR is getting large. It's also mostly helpful. I've started doing work off of it rather than master.

I'm going to merge it and open an issue for nth.

mrocklin added a commit that referenced this pull request on Nov 3, 2013:
Performance improvements - mostly in `curry`

mrocklin merged commit 8df10a8 into pytoolz:master on Nov 3, 2013
eriknw added a commit to eriknw/toolz that referenced this pull request Nov 10, 2013
PR pytoolz#57 put an emphasis on the performance of `get` because it is
so frequently used.  Some timings for `get` were given in that thread.
This commit improves those benchmarks as follows:

"test list" is 50% faster (using a two-element list of indices).
"test long list" is 100% faster (using a 100-element list of indices).

This is achieved because `operator.itemgetter(*ind)(seq)` is significantly
faster than `tuple(seq[i] for i in ind)`.  For me, it is 3x faster.
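For reference, the two forms compare as follows; note that itemgetter called with a single index returns the bare element rather than a 1-tuple, so the one-index case needs special handling:

from operator import itemgetter

seq = (10, 20, 30, 40)
ind = [1, 2]

itemgetter(*ind)(seq)            # (20, 30): one call into C
tuple(seq[i] for i in ind)       # (20, 30): a Python-level loop

itemgetter(1)(seq)               # 20, not (20,): the one-index caveat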