Error indexing FASTQ; are there header length or character limitations? #13

Closed
nextgenusfs opened this issue Apr 13, 2020 · 4 comments

@nextgenusfs

Hello @lmdu, thanks for your tool. I'm looking for fast random access to both FASTA and FASTQ files, so I'm giving pyfastx a try. I'm running into the following issue with an indexed FASTQ file: fetching a read by position works, but fetching the same read by name raises a KeyError. I'm guessing it's related either to the header length or perhaps to the brackets in the header?

>>> import pyfastx
>>> fa = pyfastx.Fasta('test.fasta')
>>> fa[1]
<Sequence> MK674167.1 with length of 1328
>>> fa['MK674167.1']
<Sequence> MK674167.1 with length of 1328
>>> fa['MK674167.1'].seq
'TAATAAGTGTTTTATGGCACTTTTTTAAATCCATATCCACCTTGTGTGCAATGTCAGGGTTGGTTTCTCTCTTTTGAGAGATCAACCCCAAACATCAACTCTATCTTAACTCTTTGTCTGAAAAATATTATGAATAAAACAATTCAAAATACAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATACGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCATATTGCGCTCTTTGGTATTCCGAAGAGCATGCTTGTTTGAGTATCAGTAAACACCTCAAAGCCTTTTTTCTTTTTTGAAGAAAGGCTTTGGACTTGAGCAATCCCAACACCAATCTTTTAAAGAGAGGGGGCGGGTTGCTTGAAATGCAGGTGCAGCTGGACATTCTCCTGAGCTAAAAGCATATTCATTTAGTCCCGTCAAACGGATTATTACTTTTGCTGCAGCTAACATAAAGGGAGTTTGACCATTTTGGCTGACTGATGCAGGATTTTCACAAGAGTCTTCAAAACCTCTTGTTAAACTCGATCTCAAATCAAGTAAGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAAGGATTCCCCTAGTAACGGCGAGTGAAGCGGGATGAGCTCAAATTTAAAATCTGGCTGGTTTACTAGTCCGAATTGTAGTTTATAGAAGCGTTTTCGGTGGCAGCCTGGGTATAAATCCTTTGGAATAGGGTATCATAGAGGGTGAGAATCCCGTCTTTGACTCAGTGCATGCTACTATGTGATACGTTTTCAAAGAGTCGAGTTGTTTGGGAATGCAGCTCAAAATGGGTGGTAAATTTCATCTAAAGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAGATGAAAAGAACTTTGAAAAGAGAGTTAAACAGTGCGTGAAATTGTTGAAAGGGAAACGCTTGACACCAGTCATGCGAGTGGAAAATCAGTCTTTTGCAATGGGGAGTTGTGGGCGTTCAGACCGCAAGGCTGGCGTTTGCTTCATCTTTGTTGTAAGTGATGCACTTTTTCATTTGCAGGTCAACATCAGTTTGCTTTGCTGGACAAAACCCCAAGGGAAGGTGGCAGCTTAGGCTGTGTTATAGCCCTGGGGCGATACAGTGGAGTGGACTGAGGTTTTCGCAGTGTGTGCTCTCTGGGCAAGGCTGACTGGGTGCTATGGGATCGTTCGGCGTACAATGCATGCATTTTGCGTCGTGTCTTTTTCATACTCGCTCAACTCGGCTCTTCCACACTT'
>>> fq = pyfastx.Fastq('test.fastq')
>>> fq[1]
<Read> 2;barcodelabel=SRR10121280;flow_cell_id=unknown;len=1429;avgqual=7.49;primers=[71,1319];orient=minus with length of 1429
>>> fq['2;barcodelabel=SRR10121280;flow_cell_id=unknown;len=1429;avgqual=7.49;primers=[71,1319];orient=minus']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: '2;barcodelabel=SRR10121280;flow_cell_id=unknown;len=1429;avgqual=7.49;primers=[71,1319];orient=minus does not exist in fastq file'
>>> 
@lmdu
Owner

lmdu commented Apr 13, 2020

Thank you for reporting. Could you send me your test file and operating system information? I replaced the read header in my FASTQ test file with your header name, and it worked well.

@nextgenusfs
Author

Thanks @lmdu for the speedy reply. I'm on macOS using Python 3.6.

$ python
Python 3.6.10 | packaged by conda-forge | (default, Mar  5 2020, 09:56:10) 
[GCC Clang 9.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyfastx
>>> fq = pyfastx.Fastq('test.fastq.gz')
>>> len(fq)
10
>>> fq[4]
<Read> 5;barcodelabel=SRR10121280;flow_cell_id=unknown;len=1497;avgqual=7.02;primers=[107,1406];orient=plus with length of 1497
>>> fq['5;barcodelabel=SRR10121280;flow_cell_id=unknown;len=1497;avgqual=7.02;primers=[107,1406];orient=plus']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: '5;barcodelabel=SRR10121280;flow_cell_id=unknown;len=1497;avgqual=7.02;primers=[107,1406];orient=plus does not exist in fastq file'
>>> 

test.fastq.gz
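
For anyone reproducing this, here is a minimal pure-Python cross-check that does not depend on pyfastx at all. It is only a sketch under a few assumptions: the file uses plain four-line FASTQ records, and the read name is the header text after `@` up to the first whitespace. The helper names `open_fastq` and `index_fastq_names` are hypothetical, and the sequence and quality strings in the demo are placeholders, not data from the attached file. It simply confirms that the full header string, brackets and semicolons included, is an ordinary dictionary key.

```python
import gzip
import io


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def index_fastq_names(handle):
    """Map read name -> (sequence, quality), assuming four-line FASTQ records."""
    index = {}
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        header = header.strip()
        if not header:
            continue  # skip stray blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Assumption: the name is everything after '@' up to the first whitespace.
        fields = header[1:].split(None, 1)
        name = fields[0] if fields else ""
        index[name] = (seq, qual)
    return index


# Demo on an in-memory record that uses the header from this issue.
# The sequence and quality strings are made-up placeholders.
key = ("2;barcodelabel=SRR10121280;flow_cell_id=unknown;len=1429;"
       "avgqual=7.49;primers=[71,1319];orient=minus")
demo = io.StringIO("@" + key + "\nACGT\n+\nIIII\n")
names = index_fastq_names(demo)
print(key in names)
```

To run the same check against a real file, pass `open_fastq('test.fastq.gz')` to `index_fastq_names` instead of the in-memory handle. If the name is present here but the lookup by name still fails, the problem is in the index rather than in the file.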

@lmdu
Owner

lmdu commented Apr 22, 2020

A new version, 0.6.10, has been released to fix this bug. I hope it helps!

@nextgenusfs
Author

Fantastic, thank you!

@lmdu lmdu closed this as completed Apr 26, 2020