Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement reverse complement. #65

Merged
merged 19 commits into from
Mar 25, 2022

Conversation

rhpvorderman
Copy link
Collaborator

@rhpvorderman rhpvorderman commented Mar 23, 2022

Testing is not as extensive as it could be. Not all possible conversions are checked. Now added, thanks to @fjossandon.

Since I have to create a lot of conversion tables for htspy already I had some code at hand for generating the conversion table. I copied it into this project. The original is here: https://github.com/rhpvorderman/htspy/blob/develop/generate_conversion_tables.py
This fixes #64

@fjossandon
Copy link

Hello @rhpvorderman ,
I'm very interested in this feature, so I was checking out the nucleotide_complements_table, and found that there are 3 ambiguous letters missing:
S (Strong 3H bonds, C & G) -> S
W (Weak 2H bonds, A & T) -> W
N (aNy) -> N
On these 3 cases, the complementary letter is the same than the original. you can also check it out in this other site: https://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html

Best regards,

@rhpvorderman
Copy link
Collaborator Author

rhpvorderman commented Mar 23, 2022

On these 3 cases, the complementary letter is the same than the original.

So there is no need to convert them. For all undefined cases, the table returns the same character. I can add a comment in the conversion table code.

EDIT: To eloborate further: in C a char is a 1-byte signed integer from -128 to 127. A quoted literal character such as 'C' is automatically converted to such an integer in the source code. Since all the other characters do not change they are left at their number value.

@fjossandon
Copy link

Excellent! Thanks for the clarification.

@fjossandon
Copy link

fjossandon commented Mar 24, 2022

Hello again @rhpvorderman ,
I wanted to ask if you can include these tests that uses all the nucleotide valid letters for completeness sake =)

    def test_reverse_complement(self):
        assert SequenceRecord("name1",
                              "ACGTUMRWSYKVHDBNacgtumrwsykvhdbn",
                              "/AAAA/6E/EEEEEEEEEEEE/EEEEA///E/"
                              ).reverse_complement() == \
               SequenceRecord("name1",
                              "nvhdbmrswykaacgtNVHDBMRSWYKAACGT",
                              "/E///AEEEE/EEEEEEEEEEEE/E6/AAAA/")
        
        
    def test_reverse_complement(self):
        assert BytesSequenceRecord(b"name1",
                                   b"ACGTUMRWSYKVHDBNacgtumrwsykvhdbn",
                                   b"/AAAA/6E/EEEEEEEEEEEE/EEEEA///E/"
                                   ).reverse_complement() == \
               BytesSequenceRecord(b"name1",
                                   b"nvhdbmrswykaacgtNVHDBMRSWYKAACGT",
                                   b"/E///AEEEE/EEEEEEEEEEEE/E6/AAAA/")

@rhpvorderman
Copy link
Collaborator Author

Thanks! That saved me quite some work!

Copy link
Owner

@marcelm marcelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I thought you had limited time, though ... :-P

Feel free to ignore my comments on generate_conversion_tables.py.

src/dnaio/_core.pyx Outdated Show resolved Hide resolved
src/dnaio/_core.pyx Outdated Show resolved Hide resolved
src/dnaio/_core.pyx Outdated Show resolved Hide resolved
src/dnaio/_core.pyx Outdated Show resolved Hide resolved
src/dnaio/_conversions.h Show resolved Hide resolved
generate_conversion_tables.py Outdated Show resolved Hide resolved
generate_conversion_tables.py Outdated Show resolved Hide resolved
generate_conversion_tables.py Outdated Show resolved Hide resolved
generate_conversion_tables.py Outdated Show resolved Hide resolved
generate_conversion_tables.py Outdated Show resolved Hide resolved
@rhpvorderman
Copy link
Collaborator Author

@marcelm , all done! Thanks for the review comments on the conversion script. It looks quite a it cleaner now.

Copy link
Owner

@marcelm marcelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! I found two more small things. Can you fix that as well? Feel free to merge yourself when you have done so.

Shall I make a new release after this? I think version 0.8.0 would be appropriate.

src/dnaio/_core.pyx Show resolved Hide resolved
SequenceRecord("name1",
"nvhdbmrswykaacgtNVHDBMRSWYKAACGT",
"/E///AEEEE/EEEEEEEEEEEE/E6/AAAA/")

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized you need to test whether this works:

SequenceRecord("name", "AACT", None).reverse_complement() == SequenceRecord("name", "AGTT", None)

Copy link
Collaborator Author

@rhpvorderman rhpvorderman Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify? Isn't that already tested? Edit. I see now. None qualities. Didn't think of that, will do.

@rhpvorderman
Copy link
Collaborator Author

Shall I make a new release after this? I think version 0.8.0 would be appropriate.

Please let me remove BytesSequenceRecord first (#66). I won't delay. I promise :).

rhpvorderman and others added 2 commits March 25, 2022 19:52
Co-authored-by: Marcel Martin <mail@marcelm.net>
@rhpvorderman rhpvorderman merged commit de00f57 into marcelm:main Mar 25, 2022
@rhpvorderman rhpvorderman deleted the reverse_complement branch March 25, 2022 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reverse complement
3 participants