Add records_are_mates function #79
marcelm left a comment
This looks mostly OK, although I don't like the code duplication between records_are_mates and record_ids_match that much. In theory, SequenceRecord.is_mate could just call records_are_mates, but I guess that adds some overhead, so I'm willing to accept the duplication if you think it's worth it.
I have refactored to remove the duplication. Adding strcspn calls is quite expensive, as shown here: #22. I will run some benchmarks on records_are_mates now and add the results to this post later. EDIT: Hmm. records_are_mates barely justifies itself. There seems to be quite a lot of overhead in Cython's argument parsing. It does seem that for a 4-way comparison records_are_mates will be the better choice, but that is not a use case right now. I will check whether I can shave off some of the overhead.
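A benchmark along these lines would compare n - 1 is_mate calls against a single records_are_mates call for 2, 3 and 4 records. This is a hypothetical harness, not the script used in this PR: the records are pure-Python stand-ins that assume the ID is the part of the name before the first whitespace, so the absolute numbers say nothing about the Cython version. Swap in real dnaio records and functions to measure that.

```python
import timeit
from typing import Tuple


def _record_id(name: str) -> str:
    # Assumed ID definition: the part of the name before the first whitespace.
    parts = name.split(maxsplit=1)
    return parts[0] if parts else ""


class SequenceRecord:
    """Hypothetical stand-in for dnaio.SequenceRecord; only the name is modelled."""

    def __init__(self, name: str) -> None:
        self.name = name

    def is_mate(self, other: "SequenceRecord") -> bool:
        return _record_id(self.name) == _record_id(other.name)


def records_are_mates(*records: SequenceRecord) -> bool:
    first_id = _record_id(records[0].name)
    return all(_record_id(r.name) == first_id for r in records[1:])


def bench(n_records: int, number: int = 100_000) -> Tuple[float, float]:
    """Time n_records - 1 is_mate calls against one records_are_mates call."""
    records = [SequenceRecord(f"read1 {i}:N:0") for i in range(1, n_records + 1)]
    first, rest = records[0], records[1:]

    def pairwise_check() -> bool:
        for r in rest:
            if not first.is_mate(r):
                return False
        return True

    pairwise = timeit.timeit(pairwise_check, number=number)
    varargs = timeit.timeit(lambda: records_are_mates(*records), number=number)
    return pairwise, varargs


if __name__ == "__main__":
    for n in (2, 3, 4):
        pairwise, varargs = bench(n)
        print(f"{n} records: is_mate x{n - 1}: {pairwise:.3f}s, "
              f"records_are_mates: {varargs:.3f}s")
```

Timing several record counts matters here because a fixed per-call cost (such as argument parsing) is amortised over more comparisons as n grows.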
Trimmed some of the fat, and the results are satisfactory. I benchmarked with this script. I also did a reference leak check with sys.getrefcount, and everything works as expected.
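A reference leak check with sys.getrefcount can be as simple as calling the function many times and confirming that each argument's reference count has not changed. The sketch below is an assumption about how such a check might look, not the check used in this PR; refcounts_stable, well_behaved and leaky are hypothetical names, and leaky exists only to show that the check does fail when references pile up.

```python
import sys
from typing import Any, Callable, List


def refcounts_stable(func: Callable[..., Any], *args: Any, repeats: int = 10_000) -> bool:
    """True if calling func(*args) repeatedly leaves every argument's refcount unchanged.

    A C function that increments an argument's reference count without a
    matching Py_DECREF makes the count grow by one per call.
    """
    before = [sys.getrefcount(a) for a in args]
    for _ in range(repeats):
        func(*args)
    after = [sys.getrefcount(a) for a in args]
    return before == after


_kept: List[Any] = []


def leaky(record: Any) -> bool:
    # Deliberately keeps one extra reference per call.
    _kept.append(record)
    return True


def well_behaved(r1: Any, r2: Any) -> bool:
    # Hypothetical stand-in for records_are_mates: keeps no references.
    return r1 is not r2
```

Pointing refcounts_stable at the compiled function with real records performs the same check against the Cython code.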
Super, thanks for the improvements, I'm happy with this now :-). Let me know if we should make a new release.
Thanks!
Version 0.9.0 is now available.
Thank you! Always a pleasure working with you!
I think it works well in practice. Thanks again for merging. |
Fixes #78
I simply added records_are_mates(*args) because it is more suitable for the use case. I noted the most typical use case (in my opinion) in the example in the docstring. Having a SequenceRecord.are_mates(self, *args) method seemed a bit more clunky in the typical use case: records_are_mates(r1, r2, r3) vs r1.are_mates(r2, r3).
This function accepts any number of SequenceRecords, as long as there are at least two. That makes it immediately future-proof for other types of FASTQ files that might pop up later.
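The typical use case, walking several FASTQ readers in lockstep (for example UMI, R1 and R2) and checking every tuple of records, could look roughly like this. The readers are plain lists of a hypothetical SequenceRecord stand-in, iter_mates is a hypothetical helper, and records_are_mates is a pure-Python sketch assuming the ID is the part of the name before the first whitespace; it is not the Cython implementation from this PR, and the exception types are my own choice.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class SequenceRecord:
    """Hypothetical stand-in for dnaio.SequenceRecord."""
    name: str
    sequence: str


def _record_id(name: str) -> str:
    # Assumed ID definition: the part of the name before the first whitespace.
    parts = name.split(maxsplit=1)
    return parts[0] if parts else ""


def records_are_mates(*records: SequenceRecord) -> bool:
    """Sketch: True if all records (at least two) share the same ID."""
    if len(records) < 2:
        raise TypeError("at least two records are required")
    first_id = _record_id(records[0].name)
    return all(_record_id(r.name) == first_id for r in records[1:])


def iter_mates(*readers: Iterable[SequenceRecord]) -> Iterator[Tuple[SequenceRecord, ...]]:
    """Yield one tuple of records per position, checking that they are mates."""
    for records in zip(*readers):
        if not records_are_mates(*records):
            raise ValueError(f"record IDs do not match: {[r.name for r in records]}")
        yield records
```

Note that zip stops silently at the shortest input; on Python 3.10+, zip(*readers, strict=True) would turn a truncated file into a ValueError instead.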
Cython does generate less optimal code here.
There are no keywords! METH_VARARGS is sufficient. And then I also cannot simply index the tuple and expect this to become PyTuple_GET_ITEM. No... the index has to be converted to a Python integer first. So I try to use PyTuple_GET_ITEM myself, and I get these complaints about object vs PyObject *. There is no difference, Cython! Internally it is all PyObject *; you made that up yourself, and now you are complaining to me?! So I do some casts, but then I cannot see clearly whether Cython handles the references properly. Then I gave up. This should have been pretty straightforward, as Python guarantees that every object passed as an argument stays referenced by the argument tuple for the duration of the call. So we can simply borrow the reference with PyTuple_GET_ITEM and not worry about it.
So the code is not as optimized as it could be (and as I wanted it to be). But it should still be faster than two separate is_mate calls (although I do not have time to benchmark that now).
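The borrowing argument rests on the argument tuple keeping each argument alive until the call returns. At the Python level that extra reference is visible with sys.getrefcount, as in this small sketch; it only illustrates the idea and does not show the C-level PyTuple_GET_ITEM code. refcount_inside_call and demo are hypothetical names, and exact counts differ between CPython versions, so only the difference between the two numbers is meaningful.

```python
import sys
from typing import Any, Tuple


def refcount_inside_call(*args: Any) -> int:
    # While this call runs, the args tuple owns a strong reference to each
    # argument, in addition to whatever references the caller holds.
    return sys.getrefcount(args[0])


def demo() -> Tuple[int, int]:
    record = object()
    outside = sys.getrefcount(record)
    inside = refcount_inside_call(record)
    return outside, inside
```

In C, PyTuple_GET_ITEM(args, i) returns a borrowed reference that stays valid as long as the tuple is alive, i.e. for the whole call, so a read-only comparison needs no Py_INCREF/Py_DECREF pair. The macro does no bounds checking, so the length of args has to be checked first.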