When I added BytesSequenceRecord, there were two major reasons:
PyBytes_FromStringAndSize was much faster than PyUnicode_DecodeASCII.
Retrieving pointers from strings was deemed difficult, as strings do not support the buffer protocol.
I think both issues are now gone.
ASCII-checking the entire buffer and then using PyUnicode_New(..., 127) is only slightly slower than PyBytes_FromStringAndSize (10%).
Retrieving pointers from strings can be done very fast with PyUnicode_DATA. Since SequenceRecord ensures that its strings can never be anything other than ASCII, this is as fast as PyBytes_AS_STRING.
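As a rough Python-level analogue of the "check ASCII first, then build the string" approach, here is a minimal sketch. It is not the C implementation: the helper name `ascii_bytes_to_str` is hypothetical, and it uses `bytes.isascii()` and `bytes.decode()` instead of the C API calls named above.

```python
def ascii_bytes_to_str(buffer: bytes) -> str:
    """Validate that the buffer is pure ASCII, then turn it into a str.

    Hypothetical helper for illustration only; the real code would do
    the equivalent check in C before calling PyUnicode_New(..., 127).
    """
    # Reject anything outside the 0-127 range up front, so the resulting
    # string is guaranteed to be ASCII-only.
    if not buffer.isascii():
        raise ValueError("record contains non-ASCII characters")
    # The decode cannot fail here because the buffer was validated above.
    return buffer.decode("ascii")


name = ascii_bytes_to_str(b"read1/1")
sequence = ascii_bytes_to_str(b"ACGT")
```

Because validation happens once at construction time, later consumers can rely on the one-byte-per-character layout without checking again.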
On top of that, strings are more useful than bytes. Names should be strings. Nucleotide sequences work more intuitively as strings. And qualities are Phred scores: they are an ASCII representation of the actual score and thus work best as strings.
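To illustrate why qualities read naturally as strings, here is a small sketch that converts a quality string into numeric Phred scores. It assumes the common Phred+33 offset used in FASTQ files; the function names are hypothetical and not part of this project's API.

```python
from typing import List

# Assumption: Phred+33 encoding, where '!' (ASCII 33) means score 0.
PHRED_OFFSET = 33


def phred_scores(qualities: str) -> List[int]:
    """Convert an ASCII quality string into a list of Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qualities]


def error_probability(score: int) -> float:
    """Error probability for a Phred score Q, defined as 10^(-Q/10)."""
    return 10 ** (-score / 10)


scores = phred_scores("I5!")  # 'I' -> 40, '5' -> 20, '!' -> 0
```

With a string, each character maps directly to one score via `ord`, with no decoding step in between.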
I was working on #65 when I realised that BytesSequenceRecord is now just a maintenance burden.