Remove BytesSequenceRecord #66

rhpvorderman · 2022-03-25T08:36:24Z

When I put BytesSequenceRecord in, there were two major reasons:

PyBytes_FromStringAndSize was much faster than PyUnicode_DecodeASCII.
Retrieving pointers from strings was deemed difficult, as str objects do not support the buffer protocol.

I think both issues are now gone.

ASCII-checking the entire buffer and then using PyUnicode_New(..., 127) is only slightly slower than PyBytes_FromStringAndSize (10%).
Retrieving pointers from strings can be done very fast with PyUnicode_DATA. Since SequenceRecord ensures that strings can never be anything other than ASCII, this is as fast as PyBytes_AS_STRING.
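
For illustration, the ASCII check mentioned above could look roughly like the sketch below. This is not the project's actual code: `buffer_is_ascii` is a hypothetical name, and the eight-bytes-at-a-time strategy is just one common way to do such a check. Once a buffer passes, the approach described above would allocate the string with PyUnicode_New(len, 127) and copy the bytes to the pointer returned by PyUnicode_DATA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the project: returns 1 if every byte
 * in buf[0..len) is below 128 (plain ASCII), 0 otherwise. */
static int buffer_is_ascii(const char *buf, size_t len)
{
    /* A byte is non-ASCII exactly when its highest bit is set. */
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t i = 0;

    /* Check eight bytes per iteration; memcpy avoids unaligned reads. */
    for (; i + 8 <= len; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, buf + i, sizeof chunk);
        if (chunk & high_bits) {
            return 0;
        }
    }
    /* Check the remaining tail bytes one at a time. */
    for (; i < len; i++) {
        if ((unsigned char)buf[i] & 0x80) {
            return 0;
        }
    }
    return 1;
}
```

Because the check only tests the high bit of each byte, it makes a single pass over the buffer and needs no decoding step, which fits the observation that the overall cost stays close to that of creating a bytes object.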

On top of that, strings are more useful than bytes. Names should be strings. Sequences of nucleotides work more intuitively as strings. And qualities are Phred scores. These are an ASCII representation of the actual score and thus work best as strings.

I was working on #65 when I realised that BytesSequenceRecord is now just a maintenance burden.

marcelm · 2022-03-25T12:17:54Z

I’m glad this can be dropped. It’s good you noticed this before 1.0! This will also simplify the documentation a bit.

rhpvorderman · 2022-03-25T12:55:14Z

Glad you agree. In retrospect, I should never have put it there, but there we go. "Voortschrijdend inzicht" (roughly, "progressive insight"), as it is called in Dutch.

This was referenced Mar 25, 2022

Implement reverse complement. #65

Merged

Remove BytesSequenceRecord #67

Merged

marcelm closed this as completed in #67 Mar 26, 2022
