
Use zip_longest for more readable and faster code in the paired-end iterator #21

Merged
marcelm merged 3 commits into marcelm:main from rhpvorderman:pairediter on Dec 16, 2021

rhpvorderman (Collaborator)

Sorry for the PR spam. I suddenly find so much to optimize in this library. Even more is coming, but that depends on the still-pending PRs #14 and #20 (I want to make writing even faster by avoiding the creation of new bytes objects).

This is an easy one to review. I know you are busy!

Ideally we would use zip(it1, it2, strict=True), but the strict argument is only available in Python 3.10 and higher (my Debian 11 machine has 3.9).

Since try ... except has some overhead, I tried using next differently. next accepts a default argument, so next(it1, None) returns None instead of raising StopIteration when the iterator is exhausted. This makes the exhaustion check cleaner.
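The next()-with-default approach can be sketched as follows. iter_pairs_next is a hypothetical name, ValueError is a placeholder for whatever error the real reader raises, and the sketch assumes a record is never None, because None doubles as the end-of-input marker.

```python
def iter_pairs_next(it1, it2):
    """Yield (record1, record2) pairs using next() with a default.

    Hypothetical sketch: assumes records are never None.
    """
    it1, it2 = iter(it1), iter(it2)
    while True:
        # next(iterator, default) returns the default instead of
        # raising StopIteration once the iterator is exhausted.
        r1 = next(it1, None)
        r2 = next(it2, None)
        if r1 is None and r2 is None:
            return  # both inputs ended together: normal termination
        if r1 is None or r2 is None:
            # Placeholder exception type for a length mismatch.
            raise ValueError("paired inputs have a different number of records")
        yield r1, r2
```

If None could ever be a legitimate item, a private sentinel such as `_END = object()` passed as the default would be the safer choice.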

I then looked in itertools for something in the standard library, and there was: zip_longest. It keeps yielding tuples until the longest iterator is exhausted, filling in None for the iterator that ran out first.
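A minimal sketch of both points: first the padding behaviour of zip_longest, then a paired iterator built on it. iter_pairs and the ValueError are illustrative assumptions, not the actual __iter__ in src/dnaio/pairedend.py, and records are again assumed never to be None.

```python
from itertools import zip_longest

# zip_longest keeps going until the *longest* input is exhausted and pads
# the shorter one with fillvalue (None by default).
print(list(zip_longest(["a", "b", "c"], ["x"])))
# [('a', 'x'), ('b', None), ('c', None)]


def iter_pairs(reader1, reader2):
    """Yield (record1, record2) pairs from two iterables of records.

    Hypothetical helper: reader1/reader2 stand in for the two FASTQ readers.
    """
    for r1, r2 in zip_longest(reader1, reader2):
        # A None means one input ran out before the other.
        if r1 is None or r2 is None:
            # Placeholder exception type; the real reader may use its own.
            raise ValueError("paired inputs have a different number of records")
        yield r1, r2
```

All exhaustion checking collapses into a single None test per pair, while the iteration itself runs inside zip_longest's C implementation.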

The code looks cleaner now that all the exhaustion checking happens in zip_longest instead of in try ... except blocks. As an added bonus, itertools is implemented in C, so it is faster as well:

Before:

Benchmark #1: python dnaio_read_paired.py ~/test/big2.fastq
  Time (mean ± σ):      4.884 s ±  0.031 s    [User: 4.667 s, System: 0.216 s]
  Range (min … max):    4.826 s …  4.919 s    10 runs

After:

Benchmark #1: python dnaio_read_paired.py ~/test/big2.fastq
  Time (mean ± σ):      4.746 s ±  0.049 s    [User: 4.540 s, System: 0.206 s]
  Range (min … max):    4.691 s …  4.835 s    10 runs

That is about 2.8% faster (mean 4.884 s before, 4.746 s after). Not bad for such a small change.
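To get a feel for the difference without dnaio or its test file, a toy micro-benchmark can compare the two styles on plain integer lists. Both helper functions are hypothetical, timings depend on the machine and the Python version (CPython 3.11 made entering a try block nearly free, so the gap may differ there), and this does not reproduce the hyperfine figures above.

```python
import timeit
from itertools import zip_longest

N = 100_000
left = list(range(N))
right = list(range(N))


def with_try_except(a, b):
    """Pair items by calling next() inside try ... except StopIteration."""
    it1, it2 = iter(a), iter(b)
    out = []
    while True:
        try:
            r1 = next(it1)
        except StopIteration:
            # First input is exhausted; the second must be exhausted too.
            try:
                next(it2)
            except StopIteration:
                return out
            raise ValueError("second input is longer") from None
        try:
            r2 = next(it2)
        except StopIteration:
            raise ValueError("first input is longer") from None
        out.append((r1, r2))


def with_zip_longest(a, b):
    """Pair items with itertools.zip_longest, checking for None padding."""
    out = []
    for r1, r2 in zip_longest(a, b):
        if r1 is None or r2 is None:
            raise ValueError("inputs have different lengths")
        out.append((r1, r2))
    return out


for func in (with_try_except, with_zip_longest):
    # Each lambda is executed immediately by timeit, so it uses the current func.
    seconds = timeit.timeit(lambda: func(left, right), number=5)
    print(f"{func.__name__}: {seconds:.3f} s")
```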

Review comment threads on src/dnaio/pairedend.py (2, marked outdated)
marcelm (Owner) commented Dec 16, 2021

Good improvement, thanks once again! I think the comment at the top of __iter__ is a bit verbose, but let’s leave it as it is for now.

marcelm merged commit f4773dc into marcelm:main on Dec 16, 2021
rhpvorderman deleted the pairediter branch on December 16, 2021 at 16:15