Performance improvements - mostly in curry #57

Merged 23 commits into pytoolz:master on Nov 3, 2013

Conversation

mrocklin (Member)

Edit: This PR has grown (sorry) to include various performance improvements. At times they sacrifice clarity and, in rare cases, correctness (see mrocklin@6ba6737), but small operations like first, second, and curry(get) run significantly faster in the common case. I often use these functions inside of map on large datasets, so this is, I think, worth looking into.

I list performance improvements in some of the commit messages. I have also included benchmarks in bench/. I run nosetests on individual files while benchmarking.

Original description follows.

In #52 I leveraged the inspect module to catch more errors in curried functions before execution. Unfortunately, when the same curried function is called many times (for example, when it is mapped over a list), this results in very many inspect calls. These are both expensive and needless.

A simple fix is to memoize the function that calls inspect. This performs well in practice but introduces a potential memory leak. Alternatives might include introducing some additional state into curry or implementing a fixed-size dictionary for use as an LRU cache in memoize.

Edit: I've extended this PR with a small fix to memoize. We now ask forgiveness, not permission, when dealing with non-hashable inputs to memoized functions. This results in a significant speed improvement for memoized functions.
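As a rough illustration, the ask-forgiveness version of memoize looks something like the sketch below (simplified; not toolz's exact code):

def memoize(func):
    cache = {}
    def memof(*args):
        try:
            return cache[args]
        except KeyError:            # not cached yet: compute and store
            result = cache[args] = func(*args)
            return result
        except TypeError:           # unhashable args: skip the cache entirely
            return func(*args)
    return memof

The win is that the common case, a cache hit, costs a single dictionary lookup with no up-front hashability check.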

In general this PR reduces the overhead of using transformed functions. This is significant when transforming lightweight functions like get.

This makes getting elements from tuples and lists much faster,
but significantly slows down getting elements from iterators.
I believe that first(indexable) occurs more frequently in common
operations.
benchmarking with bench/test_curry.py
Before: Ran 1 test in 0.246s
After:  Ran 1 test in 0.183s
Follows the "ask forgiveness, not permission" principle

benchmarking with bench/test_curry.py
Before: Ran 1 test in 0.183s
After:  Ran 1 test in 0.122s
If we know that we only need one more argument, then we return a
`partial` object rather than a `curry`. This is done for performance
reasons. For example, the following no longer works:

>>> from toolz.curried import get
>>> get(1)(default=None)(data)

But curried get *is* about 20% faster

Before: Ran 1 test in 0.085s
After:  Ran 1 test in 0.067s
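For illustration, the trick looks roughly like the sketch below. This is simplified and not toolz's actual implementation; num_required_args is a hypothetical stand-in for toolz's _num_required_args helper.

from functools import partial

def num_required_args(func):
    # simplified: count positional parameters that lack defaults
    return func.__code__.co_argcount - len(func.__defaults__ or ())

class curry(object):
    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.keywords = kwargs or None

    def __call__(self, *args, **kwargs):
        args = self.args + args
        if self.keywords:
            kwargs = dict(self.keywords, **kwargs)
        n = num_required_args(self.func)
        if len(args) >= n:
            return self.func(*args, **kwargs)
        if len(args) == n - 1:
            # exactly one argument missing: a plain partial is cheaper
            # to construct and to call than another curry object
            return partial(self.func, *args, **kwargs)
        return curry(self.func, *args, **kwargs)

Under this scheme get(1) returns a functools.partial, which is why chaining further keyword arguments, as in the get(1)(default=None)(data) example above, no longer works.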
mrocklin commented Nov 2, 2013

I've updated this. Performance of first, second, memoize, and curry(foo) is substantially improved. Some ugliness was added.

try:
    return seq[1]
except TypeError:
    return nth(1, seq)
eriknw commented:

We know this will raise an exception in nth that will need to be caught, which is relatively costly, so we might as well do the following here instead of calling nth(1, seq):

        return next(itertools.islice(iter(seq), 1, None))

mrocklin replied:


Looks like I've really just edited first and second to be copies of nth. Maybe both should be partials on nth.
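That refactor might look like the following sketch (hypothetical; not what this PR actually does):

from functools import partial
from itertools import islice

def nth(n, seq):
    try:
        return seq[n]                   # fast path for indexable sequences
    except TypeError:                   # not indexable: fall back to iteration
        return next(islice(iter(seq), n, None))

# first and second as partials on nth
first = partial(nth, 0)
second = partial(nth, 1)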


eriknw commented Nov 2, 2013

This patch looks pretty good, but I offer four points to consider:

  1. On my computer, first(seq) seems to be about four times slower on iterators but twice as fast on sequences. Although my benchmarking of this function is limited, it's clear there's a tradeoff between input types and optimal performance.
  2. Regarding first, second, and nth, their behaviors have changed for some objects, including OrderedDict, dict, and set. Although using these functions on dict and set may be questionable, using them on OrderedDict is perfectly reasonable, and it's even more reasonable to expect iteration, not indexing, on a user-defined type.
  3. Changing a curry object to a partial when one argument is left is an interesting trick. The only possible downside is that it changes the type of the curried object from toolz.functoolz.core.curry to functools.partial, so a user who ever checks the object's type, whether or not that's a good idea, may need to check for both. Not a show stopper at all, especially if the benchmarks are better.
  4. I agree that memoization of _num_required_args would be a great place for an LRU cache to limit memory usage; a sketch of one follows this list.
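Such a bounded cache might look like the sketch below (hypothetical; the maxsize value and eviction policy are illustrative choices, not toolz code):

from collections import OrderedDict

def memoize_lru(func, maxsize=128):
    cache = OrderedDict()
    def memof(*args):
        try:
            result = cache.pop(args)        # hit: re-inserted below as most recent
        except KeyError:                    # miss: compute, evicting if full
            result = func(*args)
            if len(cache) >= maxsize:
                cache.popitem(last=False)   # drop the least recently used entry
        cache[args] = result
        return result
    return memof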


eriknw commented Nov 2, 2013

Regarding changing curry objects to partial objects, their members should be made consistent. Specifically, curried.kwargs should be changed to curried.keywords, and keywords should be None if no keywords were curried. Note that args is always a tuple even if no args were curried.


mrocklin commented Nov 2, 2013

> This patch looks pretty good, but I offer four points to consider:

> On my computer, first(seq) seems to be about four times slower on iterators but twice as fast on sequences. Although my benchmarking of this function is limited, it's clear there's a tradeoff between input types and optimal performance.

In my usage of toolz, when I call first on an iterator I call it once, but when I call first on a tuple I am often actually mapping it across a large collection of tuples. I have yet to run into an application where the delay of first on an iterator is relevant. I struggle with the delays of first and get on tuples on a daily basis. Of course, this is just my application. It'd be nice to see what other hypothetical users are doing.

> Regarding first, second, and nth, their behaviors have changed for some objects, including OrderedDict, dict, and set. Although using these functions on dict and set may be questionable, using them on OrderedDict is perfectly reasonable, and it's even more reasonable to expect iteration, not indexing, on a user-defined type.

I hadn't thought of this. This is particularly an issue with indexable objects like dicts, where instead of returning the "first" key we might return a value. In general I'm fine with calling first on sets.

> Changing a curry object to a partial when one argument is left is an interesting trick. The only possible downside is that it changes the type of the curried object from toolz.functoolz.core.curry to functools.partial, so a user who ever checks the object's type, whether or not that's a good idea, may need to check for both. Not a show stopper at all, especially if the benchmarks are better.

I doubt that this type checking will be an issue. I like your suggestion about making curry and partial have similar interfaces. They'll have similar duck types.

> I agree that memoization of _num_required_args would be a great place for an LRU cache to limit memory usage.

_num_required_args is less important now that curry.__call__ tries to call the function before much else. I may remove the memoization.
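The call-first strategy amounts to something like the sketch below (simplified, reusing the hypothetical curry class sketched earlier in this thread):

def call_or_curry(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)        # just try the call first
    except TypeError:
        # assume the TypeError means "too few arguments" and curry further;
        # this conflates it with TypeErrors raised inside func, which is the
        # rare correctness sacrifice mentioned at the top of this PR
        return curry(func, *args, **kwargs)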

Get is presumably most often used with a single index, where the desired
operation of get(ind, val) is just val[ind]. We do this up front with a
try-except block.

This hurts performance on the list syntax

    get(indices, val) -> [val[i] for i in indices]
                         if isinstance(indices, list)

But performance there wasn't so great to begin with, and it doesn't
degrade as significantly as the single-index case improves.

Single index
Before: Ran 1 test in 0.058s
After:  Ran 1 test in 0.025s

Multi-index
Before: Ran 1 test in 0.307s
After:  Ran 1 test in 0.408s
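The shape of the change, ignoring default handling (which the refined versions later in this thread address), is roughly:

def get(ind, seq):
    try:
        return seq[ind]                     # common case: a single index
    except TypeError:                       # ind is presumably a list of indices
        return tuple(seq[i] for i in ind)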
This degrades performance but maintains correctness. See the tests.

Benchmarked with bench/test_first.py
Before: Ran 2 tests in 0.285s
After:  Ran 2 tests in 0.705s
This is no longer necessary because we now usually call this
function only once per seen argument.

mrocklin commented Nov 2, 2013

OK, I've reverted the behavior of first and second. I've kept nth the way it is for now.

The curry interface now mimics partial.

I also sped up get in the common case, following my newfound obsession with ask-forgiveness-not-permission.


eriknw commented Nov 2, 2013

Excellent changes.

Regarding nth and last, perhaps consider using the following, which works for Python built-ins (and types derived from them) like list, tuple, and str, but not for dict. On non-sequence objects, this allows iteration to proceed as expected, as is done in first and second. There is a constant overhead for calling isinstance, but the payoff can be well worth it for large indices:

isinstance(seq, collections.Sequence)
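Applied to nth, the check might look like this sketch (hypothetical; note that on Python 3 the ABC lives in collections.abc):

try:
    from collections.abc import Sequence    # Python 3
except ImportError:
    from collections import Sequence        # Python 2, as in the snippet above
from itertools import islice

def nth(n, seq):
    if isinstance(seq, Sequence):           # list, tuple, str, ...
        return seq[n]                       # constant-time indexing, even for large n
    return next(islice(seq, n, None))       # otherwise iterate as expected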

Regarding get, I'm not about to start benchmarking various alternatives, but I'll offer one alternative as food for thought:

try:
    return seq[ind]
except TypeError:  # `ind` may be a list
    if isinstance(ind, list):
        return tuple(get(i, seq, default) for i in ind)
    elif default is not no_default:
        return default
    else:
        raise
except (KeyError, IndexError):  # we know `ind` is not a list
    if default is no_default:
        raise
    else:
        return default


eriknw commented Nov 3, 2013

I lied. I benchmarked some variations of get and came up with the below:

def _get(ind, seq, default):
    try:
        return seq[ind]
    except (KeyError, IndexError):
        return default

# "New" in benchmarks
def get(ind, seq, default=no_default):
    try:
        return seq[ind]
    except TypeError:  # `ind` may be a list
        if isinstance(ind, list):
            if default is no_default:
                return tuple(seq[i] for i in ind)
            else:
                return tuple(_get(i, seq, default) for i in ind)
        elif default is not no_default:
            return default
        else:
            raise
    except (KeyError, IndexError):  # we know `ind` is not a list
        if default is no_default:
            raise
        else:
            return default

# "Alt" in benchmarks
def alt_get(ind, seq, default=no_default):
    try:
        return seq[ind]
    except:
        pass
    if isinstance(ind, list):
        if default is no_default:
            return tuple(seq[i] for i in ind)
        else:
            return tuple(_get(i, seq, default) for i in ind)
    if default is no_default:
        return seq[ind]
    else:
        try:
            return seq[ind]
        except (KeyError, IndexError):
            return default

Below are the results of the benchmarks. "Old" is master, "Cur" is the current PR, "New" is my code from above, and "Alt" is a variation of the current PR:

tuples = [(1, 2, 3) for i in range(100000)]
large_tuples = [range(1000) for i in range(1000)]

# test single
%timeit for tup in tuples: get(1, tup)
 Old: 10 loops, best of 3: 120 ms per loop
*Cur: 10 loops, best of 3: 60.2 ms per loop
*New: 10 loops, best of 3: 60.5 ms per loop
*Alt: 10 loops, best of 3: 60.3 ms per loop

# test list
%timeit for tup in tuples: get([1, 2], tup)
 Old: 1 loops, best of 3: 668 ms per loop
 Cur: 1 loops, best of 3: 866 ms per loop
 New: 1 loops, best of 3: 789 ms per loop
 Alt: 1 loops, best of 3: 756 ms per loop

# test long list
%timeit for tup in large_tuples: get(range(100), tup)
 Old: 10 loops, best of 3: 145 ms per loop
 Cur: 10 loops, best of 3: 82.7 ms per loop
*New: 10 loops, best of 3: 28.2 ms per loop
*Alt: 10 loops, best of 3: 27.4 ms per loop

# test longer list
%timeit for tup in large_tuples: get(range(1000), tup)
 Old: 1 loops, best of 3: 1.41 s per loop
 Cur: 1 loops, best of 3: 735 ms per loop
*New: 1 loops, best of 3: 189 ms per loop
*Alt: 10 loops, best of 3: 188 ms per loop

# test single with default
%timeit for tup in tuples: get(3, tup, 1)
 Old: 1 loops, best of 3: 430 ms per loop
 Cur: 1 loops, best of 3: 591 ms per loop
*New: 1 loops, best of 3: 393 ms per loop
 Alt: 1 loops, best of 3: 591 ms per loop

# test short list with default
%timeit for tup in tuples: get([10, 11], tup, 1)
 Old: 1 loops, best of 3: 1.39 s per loop
 Cur: 1 loops, best of 3: 2.02 s per loop
*New: 1 loops, best of 3: 1.45 s per loop
*Alt: 1 loops, best of 3: 1.45 s per loop

# test list with default
%timeit for tup in tuples: get(range(10, 20), tup, 1)
 Old: 1 loops, best of 3: 5.11 s per loop
 Cur: 1 loops, best of 3: 6.77 s per loop
*New: 1 loops, best of 3: 4.2 s per loop
*Alt: 1 loops, best of 3: 4.15 s per loop

The only benchmark above in which "New" and "Alt" differ significantly on my machine is "test single with default".


mrocklin commented Nov 3, 2013

Awesome. Should I include the new implementation or do you want to?


eriknw commented Nov 3, 2013

You may go ahead.


mrocklin commented Nov 3, 2013

This PR is getting large. It's also mostly helpful. I've started doing work off of it rather than master.

I'm going to merge it and open an issue for nth.

mrocklin added a commit that referenced this pull request on Nov 3, 2013:
Performance improvements - mostly in `curry`

mrocklin merged commit 8df10a8 into pytoolz:master on Nov 3, 2013
eriknw added a commit to eriknw/toolz that referenced this pull request Nov 10, 2013
PR pytoolz#57 put an emphasis on the performance of `get` because it is
so frequently used.  Some timings for `get` were given in that thread.
This commit improves those benchmarks as follows:

"test list" is 50% faster (using a two-element list of indices).
"test long list" is 100% faster (using a 100-element list of indices).

This is achieved because `operator.itemgetter(*ind)(seq)` is significantly
faster than `tuple(seq[i] for i in ind)`.  For me, it is 3x faster.
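For reference, the two forms compare as follows; note that itemgetter called with a single index returns the bare element rather than a 1-tuple, so the one-index case needs special handling:

from operator import itemgetter

seq = (10, 20, 30, 40)
ind = [1, 2]

itemgetter(*ind)(seq)            # (20, 30): one call into C
tuple(seq[i] for i in ind)       # (20, 30): a Python-level loop

itemgetter(1)(seq)               # 20, not (20,): the one-index caveat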