MicrobeCensus crashes, "ValueError: Sequence and quality captions differ." #4
Comments
Could you send me a test input file that generates this error? Thanks! On Sun, Apr 26, 2015 at 8:24 PM, Tomer Altman notifications@github.com
|
Here is the offending sequence in your dataset: @SRR172902.422002 The quality and sequence headers must be the same; otherwise BioPython throws an error. |
Well, the same, modulo the first char, right? :-) Thanks for catching this. I will pass along the error to the SPAdes team, as I used their read corrector for trimming. Though, based on the looks of that read, I'm having my doubts... |
The odd thing is that I ran this file through DIAMOND as well, without any complaints. I guess the BioPython parser is strict. |
I've modified the code so that this should no longer be an issue. Could you try pulling the latest code? |
Not that Wikipedia is authoritative, but: https://en.wikipedia.org/wiki/FASTQ_format "Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again." It indeed looks like the BioPython parser is needlessly strict. |
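To make the lenient reading of the format concrete, here is a minimal sketch of a four-line FASTQ reader that accepts a '+' line whether or not it repeats the identifier. It is neither the BioPython parser nor the MicrobeCensus one: the name `read_fastq_lenient` is hypothetical, the example record is made up, and the sketch assumes unwrapped records of exactly four lines.

```python
import io


def read_fastq_lenient(handle):
    """Yield (identifier, sequence, quality) from four-line FASTQ records.

    Hypothetical helper: assumes each record is exactly four lines (no line
    wrapping). Whatever follows '+' on line 3 is ignored rather than compared
    with the '@' line.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("Expected '@' at start of record: %r" % header)
        if not plus.startswith("+"):
            raise ValueError("Expected '+' on line 3 of record: %r" % header)
        if len(seq) != len(qual):
            raise ValueError("Sequence/quality length mismatch: %r" % header)
        yield header[1:], seq, qual


# Made-up record whose '+' line carries a different caption than the '@' line.
example = io.StringIO("@read1 first\nACGT\n+read1 other\nIIII\n")
records = list(read_fastq_lenient(example))
print(records)  # [('read1 first', 'ACGT', 'IIII')]
```

The only checks kept are the ones the four-line layout itself requires: the '@' and '+' markers, and equal sequence and quality lengths.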
I installed from the tarball rather than cloning. I'll try cloning now. |
Now I get this error, with exit status 1, and no output:
|
What command did you use to run it? Did you use defaults? On Sun, Apr 26, 2015 at 10:03 PM, Tomer Altman notifications@github.com
|
Exact same as in original post. No changes. |
Does it run to completion for you? |
You've specified 500 bp reads (-l 500), but the input file contains only short reads. If you remove -l 500, and let MicrobeCensus pick the read length to use, it should work. Also, you specified 40,711 reads, but in general you will need more reads than this to get an accurate estimate of AGS. I'd suggest at least 500,000. But I can understand using fewer reads just for testing. If you try running the program again using default parameters (at least removing -l 500) it should run to completion. |
I specified -l 500, because my reads have already been trimmed, and I'd rather not have MicrobeCensus re-trim my trimmed reads. I read the option documentation as meaning: any reads longer than 500 will be trimmed to 500. Is there a different way to achieve this? As for the low # of reads, that was a mistake. Sorry to bother you. |
I can confirm that the program now works. Excellent! I did get this line in the terminal, though: |
MicrobeCensus trims reads to a uniform length, because it uses read-length specific parameters when estimating AGS. The documentation should read: all reads are trimmed to this length, and reads shorter than this length are discarded. You can use the verbose flag (-v) to get a better sense of what the software is actually doing at each step. It might help things make more sense. |
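The trimming rule described above (all reads are cut to one length, shorter reads are discarded) can be sketched as follows. This is an illustration only, not the MicrobeCensus code: `trim_reads` is a hypothetical name, and keeping the first `read_length` bases is an assumption about which end is trimmed.

```python
def trim_reads(reads, read_length):
    """Yield (name, sequence) with every sequence cut to exactly read_length.

    Hypothetical helper. Reads shorter than read_length are dropped; longer
    reads keep their first read_length bases (an assumption).
    """
    for name, seq in reads:
        if len(seq) < read_length:
            continue  # too short: discarded, not padded
        yield name, seq[:read_length]


# Made-up reads of lengths 8, 3 and 5.
reads = [("a", "ACGTACGT"), ("b", "ACG"), ("c", "ACGTA")]
print(list(trim_reads(reads, 5)))    # [('a', 'ACGTA'), ('c', 'ACGTA')]
print(list(trim_reads(reads, 500)))  # []
```

The second call shows why a 500 bp setting leaves nothing to analyze when the file contains only short reads: every read falls below the threshold and is dropped.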
Thanks for the advice! I'll add that. |
Incremented version to 1.1.0
New sequence parser
- avoids using BioPython, which was slow and threw errors
- solves Issue #4
- improved detection of quality score encoding
New command line options
- added option '-r' to specify external RAPsearch v2.15 binary
- removed options '-f' and '-c'; file formats and quality encodings are now always auto-detected
- fixed option '-e' for just estimating AGS
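The changelog does not say how quality encodings are auto-detected. As a rough illustration of the general idea only, one common heuristic looks at the range of ASCII codes in a sample of quality strings. The function name `guess_phred_offset` and both thresholds below are assumptions, not taken from MicrobeCensus.

```python
def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from a sample of quality strings.

    Hypothetical heuristic, not the MicrobeCensus implementation:
    - any character below ';' (ASCII 59) cannot occur in +64 encodings,
      so the data is taken to be Phred+33;
    - any character above 'J' (ASCII 74) is assumed to indicate +64, which
      holds for typical Illumina Phred+33 data but not for every platform;
    - otherwise the sample is ambiguous and None is returned.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None


print(guess_phred_offset(["IIII#5"]))  # 33
print(guess_phred_offset(["hhhhgf"]))  # 64
print(guess_phred_offset(["@@AB"]))    # None
```

A small sample can be ambiguous, as the last call shows, so a detector of this kind generally needs to read enough records to see low-quality or high-quality characters.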
I've finally fixed this issue in MicrobeCensus (v1.1.0). The program should no longer crash when sequence and quality captions differ. |
I let MC decide the file type and the FASTQ quality score encoding, so not sure how this happened.
Any help in figuring this out would be great. Thanks!
taltman1@corn02:/dev/shm/taltman1_tmp/MicrobeCensus$ time run_microbe_census.py -n 40711 -l 500 -t 16 my.fastq test.out
Traceback (most recent call last):
  File "/afs/ir/users/t/a/taltman1/farmshare/third-party/bin/MicrobeCensus/MicrobeCensus-1.0.3/scripts/run_microbe_census.py", line 48, in <module>
    est_ags, args = microbe_census.run_pipeline(args)
  File "/afs/ir/users/t/a/taltman1/farmshare/third-party/bin/MicrobeCensus/MicrobeCensus-1.0.3/microbe_census/microbe_census.py", line 480, in run_pipeline
    process_seqfile(args, paths)
  File "/afs/ir/users/t/a/taltman1/farmshare/third-party/bin/MicrobeCensus/MicrobeCensus-1.0.3/microbe_census/microbe_census.py", line 273, in process_seqfile
    for rec in parse(open_file(args['seqfile']), args['fastq_format'] if args['file_type'] == 'fastq' else 'fasta'):
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 582, in parse
    for r in i:
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/QualityIO.py", line 1033, in FastqPhredIterator
    for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/QualityIO.py", line 922, in FastqGeneralIterator
    raise ValueError("Sequence and quality captions differ.")
ValueError: Sequence and quality captions differ.
real 0m9.063s
user 0m8.930s
sys 0m0.169s