Avoid using .index() in partition_all #399

groutr · 2018-06-05T19:41:12Z

This PR fixes #387.

There is a slight performance hit; however, for large sequences it is relatively small:

In [31]: seq = list(range(10000000))
In [32]: %timeit list(partition_all_old(3, seq))
411 ms ± 6.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [33]: %timeit list(partition_all(3, seq))
409 ms ± 1.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The overhead is a little more noticeable for small sequences:

In [40]: seq = list(range(11))
In [41]: %timeit list(partition_all_old(2, seq))
1.87 µs ± 5.79 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [42]: %timeit list(partition_all(2, seq))
2.17 µs ± 21.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

eriknw · 2018-06-05T19:52:46Z

Cool, thanks @groutr!

TravisCI isn't happy about the pep8 check. Need to add spaces around the - operator.

Also, any chance of adding a regression test?

groutr · 2018-06-05T19:57:40Z

@eriknw I can add some more tests. I think I might be able to be smarter about calculating the index of no_pad and avoid looping altogether.
Also, my performance tests are a little flawed because I misunderstood the role of n. In the worst case, performance can actually drop quite a bit.

# seq = list(range(10000001))
In [23]: %timeit list(partition_all_old(len(seq)//2, seq))
381 ms ± 2.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [24]: %timeit list(partition_all(len(seq)//2, seq))
684 ms ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I'm going to think about this a bit more tonight.

eriknw · 2018-06-05T19:58:52Z

Thanks, I appreciate your attention to detail.

groutr · 2018-06-05T21:18:16Z

In [75]: seq = list(range(1000001))
In [76]: %timeit list(partition_all(len(seq)//2, seq))
34.6 ms ± 75.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [77]: %timeit list(partition_all_old(len(seq)//2, seq))
34.7 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [78]: %timeit list(partition_all_old(len(seq)//2, iter(seq)))
34.7 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [79]: %timeit list(partition_all(len(seq)//2, iter(seq)))
35.3 ms ± 655 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Ahhh, better. (These times are not comparable to the previous ones; they were taken on a different machine.)
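The timings above pass both a list and iter(seq), which presumably exercise two different code paths. A minimal sketch of how such a split might look is shown here. It is an illustration under stated assumptions, not the merged toolz code: `partition_all_sketch` and `_PAD` are hypothetical names, and the binary search in the fallback branch is an assumption about how the first padding index could be located without `.index()`.

```python
from itertools import zip_longest

_PAD = object()  # hypothetical sentinel standing in for no_pad


def partition_all_sketch(n, seq):
    """Yield tuples of length n; the final tuple may be shorter."""
    args = [iter(seq)] * n  # n references to one shared iterator
    chunks = zip_longest(*args, fillvalue=_PAD)
    try:
        prev = next(chunks)
    except StopIteration:
        return  # empty input yields nothing
    for chunk in chunks:
        yield prev
        prev = chunk
    if prev[-1] is not _PAD:
        yield prev  # last chunk is full, nothing to strip
        return
    try:
        # Fast path: a sized input gives the final chunk length directly.
        keep = len(seq) % n
    except TypeError:
        # Slow path (assumed): binary search for the first padding slot,
        # using identity tests only, since padding is always a suffix.
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi) // 2
            if prev[mid] is _PAD:
                hi = mid
            else:
                lo = mid + 1
        keep = lo
    yield prev[:keep]


seq = list(range(7))
from_list = list(partition_all_sketch(3, seq))        # sized input
from_iter = list(partition_all_sketch(3, iter(seq)))  # unsized input
```

Both calls should produce the same chunks; only the way the final chunk is trimmed differs.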

groutr · 2018-06-05T21:39:17Z

@eriknw I'll add a regression test and the pep8 fixes later tonight.

groutr · 2018-06-06T04:01:23Z

@eriknw, one nice benefit of this PR is that we no longer do any actual equality tests here. Only identity testing is done.
One pathological case for the old version of partition_all would be a list of these objects:

import time

class SlowCompare(object):
    def __eq__(self, other):
        # Simulate an expensive equality check.
        time.sleep(1)
        return self.__class__ == other.__class__

And the numbers:

In [5]: %timeit list(partition_all_old(11, [SlowCompare()]*21)) # <--- *very* slow
10 s ± 753 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Use iter(...) to trigger slow path.
In [6]: %timeit list(partition_all(11, iter([SlowCompare()]*21)))
4.8 µs ± 55.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
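The difference between equality and identity testing can also be shown without sleeping, by counting `__eq__` calls instead. The sketch below is only illustrative: `CountingCompare`, `PAD`, and `first_pad_by_identity` are hypothetical names, and the chunk is built by hand to mimic the last padded tuple for 21 items with n = 11 (10 real items followed by one padding value).

```python
PAD = object()  # hypothetical sentinel standing in for no_pad


class CountingCompare:
    """Counts equality checks instead of sleeping through them."""
    calls = 0

    def __eq__(self, other):
        CountingCompare.calls += 1
        return self.__class__ == other.__class__


def first_pad_by_identity(chunk):
    """Index of the first PAD in chunk, found with `is` only."""
    for i, item in enumerate(chunk):
        if item is PAD:
            return i
    return len(chunk)


# Hand-built stand-in for the final padded chunk.
chunk = tuple([CountingCompare()] * 10) + (PAD,)

index_result = chunk.index(PAD)          # equality-based lookup
eq_calls_with_index = CountingCompare.calls

CountingCompare.calls = 0
identity_result = first_pad_by_identity(chunk)
eq_calls_with_identity = CountingCompare.calls
```

Here `.index()` invokes `__eq__` once for each real item before it reaches the padding, while the identity scan never invokes it.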

groutr · 2018-06-12T05:17:53Z

@eriknw does this look good to you?

eriknw · 2018-06-13T15:22:54Z

LGTM, thanks again @groutr! Merging.

Avoid using .index()

a33b578

Try to find first index of no_pad more intelligently.

90ea648

len(prev) == n

c323698

groutr added 2 commits June 5, 2018 22:40

Add regression test.

0b24bb1

Fix comments to make pep8 happy.

0373145

groutr added 2 commits June 6, 2018 09:00

Remove added whitespace from tests.

7534eef

Test both fast and slow paths and make sure they give same result.

86953ab

eriknw merged commit 2bd9139 into pytoolz:master Jun 13, 2018
