@groutr
Contributor

@groutr groutr commented Nov 3, 2018

Make join much faster by avoiding the exception handling.

# Python 2.7 with len(leftseq) = 1000 and len(rightseq) = 10000
# 30% of the keys in the left side appear in the right side.
# first run is old join, second run is new join
Inner Join
100 loops, best of 3: 12.6 ms per loop
100 loops, best of 3: 5.44 ms per loop
Left Join
100 loops, best of 3: 17 ms per loop
100 loops, best of 3: 6.68 ms per loop
Right Join
100 loops, best of 3: 12.8 ms per loop
100 loops, best of 3: 5.74 ms per loop
Full Join
100 loops, best of 3: 17.5 ms per loop
100 loops, best of 3: 6.94 ms per loop

Also use iteritems(d) instead of d.items() to avoid creating a copy of left items on Python 2.
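
For illustration, the switch from exception handling to a membership test can be sketched as below. This is not the toolz source: `group_left`, `inner_join_try`, and `inner_join_if` are hypothetical names, and only the inner-join case is shown. The assumption is that raising and catching `KeyError` for every right-side key that has no match costs more than a plain `in` check.

```python
from collections import defaultdict


def group_left(leftkey, leftseq):
    """Group left items by key; return a plain dict so lookups never insert."""
    groups = defaultdict(list)
    for item in leftseq:
        groups[leftkey(item)].append(item)
    return dict(groups)


def inner_join_try(leftkey, leftseq, rightkey, rightseq):
    """Hypothetical 'old style': a miss is signalled by catching KeyError."""
    d = group_left(leftkey, leftseq)
    for item in rightseq:
        try:
            matches = d[rightkey(item)]
        except KeyError:
            continue  # no left item shares this key
        for match in matches:
            yield (match, item)


def inner_join_if(leftkey, leftseq, rightkey, rightseq):
    """Hypothetical 'new style': test membership first, no exception raised."""
    d = group_left(leftkey, leftseq)
    for item in rightseq:
        key = rightkey(item)
        if key in d:
            for match in d[key]:
                yield (match, item)
```

Both generators yield the same pairs in the same order; only the handling of missing keys differs.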

@groutr
Contributor Author

groutr commented Nov 3, 2018

@eriknw

@groutr
Contributor Author

groutr commented Nov 7, 2018

I'm working with 4 variations of join:
join1 is the original implementation of join.
join2 replaces the try/except with an if statement.
join3 adds a special case for inner/right joins.
join4 detects which of the 4 join cases applies, defines a generator purpose-built for that join, and returns the result of that generator. (Inspired by the cytoolz implementation.)

Implementations here: https://gist.github.com/groutr/19b9f6ba71686e683a30af3eecd3f895
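
As a rough sketch of the join4 idea (written independently of the linked gist, so details may differ), each of the four cases gets its own nested generator and the outer function only picks one. The names `join_sketch` and `_MISSING` are hypothetical. The sketch assumes that `left_default` fills the left slot for unmatched right items and `right_default` fills the right slot for unmatched left items. Note that only the two cases using `right_default` need to track which keys were seen.

```python
from collections import defaultdict

_MISSING = object()  # hypothetical sentinel meaning "no default supplied"


def join_sketch(leftkey, leftseq, rightkey, rightseq,
                left_default=_MISSING, right_default=_MISSING):
    groups = defaultdict(list)
    for item in leftseq:
        groups[leftkey(item)].append(item)
    d = dict(groups)  # plain dict: lookups never create entries

    def inner_join():
        for item in rightseq:
            key = rightkey(item)
            if key in d:
                for match in d[key]:
                    yield (match, item)

    def right_outer_join():
        # Every right item appears; unmatched ones are paired with left_default.
        for item in rightseq:
            key = rightkey(item)
            if key in d:
                for match in d[key]:
                    yield (match, item)
            else:
                yield (left_default, item)

    def left_outer_join():
        # Every left item appears, so matched keys must be remembered.
        seen = set()
        for item in rightseq:
            key = rightkey(item)
            if key in d:
                seen.add(key)
                for match in d[key]:
                    yield (match, item)
        for key, matches in d.items():
            if key not in seen:
                for match in matches:
                    yield (match, right_default)

    def full_outer_join():
        seen = set()
        for item in rightseq:
            key = rightkey(item)
            if key in d:
                seen.add(key)
                for match in d[key]:
                    yield (match, item)
            else:
                yield (left_default, item)
        for key, matches in d.items():
            if key not in seen:
                for match in matches:
                    yield (match, right_default)

    has_left = left_default is not _MISSING
    has_right = right_default is not _MISSING
    if has_left and has_right:
        return full_outer_join()
    if has_left:
        return right_outer_join()
    if has_right:
        return left_outer_join()
    return inner_join()
```

The loop bodies are deliberately duplicated so that each case runs only the checks it needs.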

In [15]: test(1000, 1000000) # len(leftseq) = 1000, len(rightseq)=1000000
Inner Join (20% overlap)
665 ms ± 2.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
424 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
219 ms ± 4.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
213 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Right Join (20% overlap)
775 ms ± 3.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
543 ms ± 9.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
327 ms ± 640 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
328 ms ± 5.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Left Join (20% overlap)
665 ms ± 7.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
427 ms ± 6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
429 ms ± 6.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
426 ms ± 4.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Full Join (20% overlap)
775 ms ± 4.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
550 ms ± 19.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
577 ms ± 84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
539 ms ± 5.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Inner Join (50% overlap)
662 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
430 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
216 ms ± 461 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
213 ms ± 919 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Right Join (50% overlap)
788 ms ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
537 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
327 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
325 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Left Join (50% overlap)
741 ms ± 24.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
448 ms ± 9.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
474 ms ± 9.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
433 ms ± 9.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Full Join (50% overlap)
802 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
567 ms ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
598 ms ± 21.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
559 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Inner Join (99% overlap)
691 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
444 ms ± 27.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
231 ms ± 5.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
226 ms ± 4.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Right Join (99% overlap)
826 ms ± 25.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
549 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
336 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
354 ms ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Left Join (99% overlap)
682 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
453 ms ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
451 ms ± 25.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
432 ms ± 7.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Full Join (99% overlap)
825 ms ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
570 ms ± 27.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
561 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
566 ms ± 20.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
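
For anyone wanting to reproduce this kind of measurement, here is one possible way to build inputs with a controlled overlap and time a join over them. It is not the `test` function used above: `make_inputs`, `hash_join`, and `bench` are hypothetical helpers, and "overlap" is assumed to mean the fraction of left keys that also appear on the right.

```python
import random
import timeit


def make_inputs(n_left, n_right, overlap, seed=0):
    """Return (left, right) key lists; `overlap` of the left keys occur in right.

    Assumes n_right >= int(n_left * overlap).
    """
    rng = random.Random(seed)
    left = list(range(n_left))
    n_shared = int(n_left * overlap)
    # Keys starting at n_left cannot collide with any left key.
    right = list(range(n_left, n_left + n_right - n_shared))
    # Add the shared keys, then shuffle so they are spread through the sequence.
    right.extend(left[:n_shared])
    rng.shuffle(right)
    return left, right


def hash_join(left, right):
    """Stand-in for the join under test: a simple inner join on identity keys."""
    lookup = set(left)
    return [(k, k) for k in right if k in lookup]


def bench(n_left, n_right, overlap, repeat=3):
    """Best wall-clock time in seconds over `repeat` single runs."""
    left, right = make_inputs(n_left, n_right, overlap)
    return min(timeit.repeat(lambda: hash_join(left, right),
                             number=1, repeat=repeat))
```

Swapping `hash_join` for each join variant would give timings comparable in spirit, though not in absolute value, to the numbers above.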

@eriknw
Member

eriknw commented Nov 8, 2018

This looks great, thanks again for all your attention recently @groutr! I'm back home now and more or less caught up with things (it's amazing how disruptive >18000 miles of road trips are), so I'll be giving toolz much more attention than I have been.

@groutr
Contributor Author

groutr commented Nov 8, 2018

@eriknw Glad to hear you're back safe. I should probably clean up these commits and open a new PR. The commit history in this PR is somewhat messy.

I would like to know your thoughts on the line between clear/easy-to-read and performance. I wasn't sure if I was crossing that line by treating individual cases separately (and having some duplicated code). I settled on a variation of join4. I think I like having each case wrapped in its own generator, even though it duplicates some code. When profiling/debugging, it becomes very clear what join is happening because all the action happens in a function named after the type of join. We could move the generators out of the join function so they are defined at import time and can be called explicitly, but still not part of the public API.
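
To make the last option concrete, here is a minimal sketch of module-level, underscore-prefixed generators with a thin public dispatcher. It covers only two of the four cases for brevity, and `_group`, `_inner_join`, `_right_outer_join`, `join_two_cases`, and `_MISSING` are all hypothetical names rather than anything in toolz.

```python
from collections import defaultdict

_MISSING = object()  # hypothetical sentinel meaning "no default supplied"


def _group(key, seq):
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)


def _inner_join(d, rightkey, rightseq):
    for item in rightseq:
        key = rightkey(item)
        if key in d:
            for match in d[key]:
                yield (match, item)


def _right_outer_join(d, rightkey, rightseq, left_default):
    for item in rightseq:
        key = rightkey(item)
        if key in d:
            for match in d[key]:
                yield (match, item)
        else:
            yield (left_default, item)


def join_two_cases(leftkey, leftseq, rightkey, rightseq, left_default=_MISSING):
    """Public entry point: group the left side, then hand off to one helper."""
    d = _group(leftkey, leftseq)
    if left_default is _MISSING:
        return _inner_join(d, rightkey, rightseq)
    return _right_outer_join(d, rightkey, rightseq, left_default)
```

The returned generator carries the helper's name (for example, `join_two_cases(...).__name__` is `'_inner_join'`), which is what makes the active join type visible in profiler output and tracebacks. The helpers are defined once at import time, and the leading underscore signals that they are not part of the public API.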

@groutr groutr mentioned this pull request Nov 13, 2018
@eriknw
Member

eriknw commented Nov 13, 2018

I think this looks pretty good.

Generally speaking, cytoolz is where the dirtiest, ugliest optimizations go, but we still care about performance of toolz. We care about readability as well, so I prefer this version over #429, because having all the code in a single function body is much more readable imho (b/c one can do toolz.join?? in IPython). I actually don't mind having these four cases split out explicitly.

@groutr
Contributor Author

groutr commented Dec 18, 2018

@eriknw I think this is ready for another review.

@eriknw
Member

eriknw commented Jun 22, 2019

LGTM. Thanks again @groutr, and sorry again for the long delay. Pretty nice performance gain here!

@eriknw eriknw merged commit 2ac03ce into pytoolz:master Jun 22, 2019