MicrobeCensus crashes, "ValueError: Sequence and quality captions differ." #4

taltman · 2015-04-27T03:24:06Z

I let MC decide the file type and the FASTQ quality score encoding, so not sure how this happened.

Any help in figuring this out would be great. Thanks!

taltman1@corn02:/dev/shm/taltman1_tmp/MicrobeCensus$ time run_microbe_census.py -n 40711 -l 500 -t 16 my.fastq test.out
Traceback (most recent call last):
  File "/afs/ir/users/t/a/taltman1/farmshare/third-party/bin/MicrobeCensus/MicrobeCensus-1.0.3/scripts/run_microbe_census.py", line 48, in <module>
    est_ags, args = microbe_census.run_pipeline(args)
  File "/afs/ir/users/t/a/taltman1/farmshare/third-party/bin/MicrobeCensus/MicrobeCensus-1.0.3/microbe_census/microbe_census.py", line 480, in run_pipeline
    process_seqfile(args, paths)
  File "/afs/ir/users/t/a/taltman1/farmshare/third-party/bin/MicrobeCensus/MicrobeCensus-1.0.3/microbe_census/microbe_census.py", line 273, in process_seqfile
    for rec in parse(open_file(args['seqfile']), args['fastq_format'] if args['file_type'] == 'fastq' else 'fasta'):
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 582, in parse
    for r in i:
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/QualityIO.py", line 1033, in FastqPhredIterator
    for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/QualityIO.py", line 922, in FastqGeneralIterator
    raise ValueError("Sequence and quality captions differ.")
ValueError: Sequence and quality captions differ.

real 0m9.063s
user 0m8.930s
sys 0m0.169s

snayfach · 2015-04-27T03:36:00Z

Could you send me a test input file that generates this error?

Thanks!

snayfach · 2015-04-27T04:40:25Z

Here is the offending sequence in your dataset:

@SRR172902.422002
NCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+SRR172902.422002 ltrim=1
#.4455543555555555555654554455545554445555334346344554445555555556555555555

If the quality header (the '+' line) repeats the sequence header, the two must match exactly; otherwise BioPython throws an error.
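To locate records like this one in a larger file, a scan along the following lines may help. This is a minimal sketch and not part of MicrobeCensus: the function name `find_caption_mismatches` is made up for illustration, it assumes every record occupies exactly four lines (no wrapped sequences), and the example read is shortened from the one above.

```python
import io


def find_caption_mismatches(handle):
    """Yield (line_number, seq_caption, qual_caption) for each record whose
    non-empty '+' caption differs from its '@' caption.

    Assumes strict 4-line FASTQ records (hypothetical helper, illustration only).
    """
    line_number = 0
    while True:
        header = handle.readline()
        if not header:
            break
        handle.readline()            # sequence line, not needed here
        plus = handle.readline()
        handle.readline()            # quality line, not needed here
        seq_caption = header.rstrip("\r\n")[1:]
        qual_caption = plus.rstrip("\r\n")[1:]
        # An empty '+' caption is fine; only a differing one is reported.
        if qual_caption and qual_caption != seq_caption:
            yield line_number + 1, seq_caption, qual_caption
        line_number += 4


# Shortened version of the offending read from this thread.
example = (
    "@SRR172902.422002\n"
    "NCCCC\n"
    "+SRR172902.422002 ltrim=1\n"
    "#.445\n"
)
mismatches = list(find_caption_mismatches(io.StringIO(example)))
print(mismatches)
```

The reported line number is that of the '@' line that starts the record.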

taltman · 2015-04-27T04:48:12Z

Well, the same, modulo the first char, right? :-)

Thanks for catching this. I will pass along the error to the SPAdes team, as I used their read corrector for trimming. Though, based on the looks of that read, I'm having my doubts...

taltman · 2015-04-27T04:48:51Z

The odd thing is that I ran this file through DIAMOND as well, without any complaints. I guess the BioPython parser is strict.

snayfach · 2015-04-27T04:55:17Z

I've modified the code so that this should no longer be an issue. Could you try pulling the latest code?

taltman · 2015-04-27T04:55:31Z

Not that Wikipedia is authoritative, but:

https://en.wikipedia.org/wiki/FASTQ_format

"Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again."

It indeed looks like the BioPython parser is needlessly strict.
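Since the repeated identifier on line 3 is optional, one possible workaround for strict parsers is to blank it out before parsing. The sketch below is only an illustration under the assumption of strict 4-line records; `strip_plus_captions` is a hypothetical name, not something MicrobeCensus or BioPython provides. It goes by line position rather than by a leading '+', because a quality line may itself start with '+'.

```python
def strip_plus_captions(lines):
    """Replace the third line of every 4-line FASTQ record with a bare '+'.

    Hypothetical helper; assumes no wrapped sequence or quality lines.
    """
    for index, line in enumerate(lines):
        if index % 4 == 2:
            if not line.startswith("+"):
                raise ValueError("line %d does not start with '+'" % (index + 1))
            yield "+\n"
        else:
            yield line


# Shortened version of the offending read from this thread.
record = [
    "@SRR172902.422002\n",
    "NCCCC\n",
    "+SRR172902.422002 ltrim=1\n",
    "#.445\n",
]
cleaned = list(strip_plus_captions(record))
print("".join(cleaned), end="")
```

Passing an open file object as `lines` works as well, since iterating over a file yields its lines one at a time.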

taltman · 2015-04-27T04:56:41Z

I installed from the tarball rather than cloning. I'll try cloning now.

taltman · 2015-04-27T05:03:14Z

Now I get this error, with exit status 1, and no output:

Warning: sequence record could not be parsed from input file. Skipping...
Error! No reads remaining after filtering!

snayfach · 2015-04-27T05:05:06Z

What command did you use to run it? Did you use defaults?

taltman · 2015-04-27T05:07:44Z

Exact same as in original post. No changes.

taltman · 2015-04-27T05:09:11Z

Does it run to completion for you?

snayfach · 2015-04-27T05:13:19Z

You've specified 500 bp reads (-l 500), but the input file contains only short reads. If you remove -l 500, and let MicrobeCensus pick the read length to use, it should work.

Also, you specified 40,711 reads, but in general you will need more reads than this to get an accurate estimate of AGS. I'd suggest at least 500,000. But I can understand using fewer reads just for testing.

If you try running the program again using default parameters (at least removing -l 500) it should run to completion.

taltman · 2015-04-27T05:19:38Z

I specified -l 500, because my reads have already been trimmed, and I'd rather not have MicrobeCensus re-trim my trimmed reads. I read the option documentation as meaning: any reads longer than 500 will be trimmed to 500. Is there a different way to achieve this?

As for the low # of reads, that was a mistake. Sorry to bother you.

taltman · 2015-04-27T05:26:02Z

I can confirm that the program now works. Excellent!

I did get this line in the terminal, though:
Warning: sequence record could not be parsed from input file. Skipping...
Not exactly sure what that means. Might be helpful to specify the line number for the offending input, along with the FAST{A|Q} identifier.
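Something like the following is what I have in mind. It is a rough sketch, not MicrobeCensus's actual parser: the function name, the `warn` callback, and the message wording are all invented for illustration, and it assumes 4-line FASTQ records.

```python
import io
import sys


def iter_fastq_records(handle, warn=sys.stderr.write):
    """Yield (identifier, sequence, quality); warn with a line number on bad records.

    Hypothetical sketch assuming strict 4-line FASTQ records.
    """
    line_number = 0
    while True:
        raw = [handle.readline() for _ in range(4)]
        if not raw[0]:
            return  # end of file
        first_line = line_number + 1
        line_number += 4
        header, seq, plus, qual = (line.rstrip("\r\n") for line in raw)
        ok = header.startswith("@") and plus.startswith("+") and len(seq) == len(qual)
        if not ok:
            warn(
                "Warning: could not parse record starting at line %d "
                "(header %r). Skipping...\n" % (first_line, header)
            )
            continue
        yield header[1:], seq, qual


# The second record has a quality string shorter than its sequence.
text = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII\n"
messages = []
records = list(iter_fastq_records(io.StringIO(text), warn=messages.append))
print(records)
print(messages)
```

With the line number and header in the message, the offending record can be pulled out of the input directly.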

snayfach · 2015-04-27T05:26:17Z

MicrobeCensus trims reads to a uniform length, because it uses read-length specific parameters when estimating AGS. The documentation should read: all reads are trimmed to this length, and reads shorter than this length are discarded.
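As a rough illustration of that rule (not the actual MicrobeCensus code; the function name and the tuple layout are assumptions made for this example):

```python
def trim_to_uniform_length(reads, read_length):
    """Trim every read to read_length and drop reads that are shorter.

    `reads` is an iterable of (identifier, sequence) pairs (hypothetical layout).
    """
    for identifier, sequence in reads:
        if len(sequence) >= read_length:
            yield identifier, sequence[:read_length]


reads = [("r1", "ACGTACGT"), ("r2", "ACG"), ("r3", "ACGTA")]

kept = list(trim_to_uniform_length(reads, 5))
print(kept)        # r2 is shorter than 5 bp and is dropped

none_left = list(trim_to_uniform_length(reads, 500))
print(none_left)   # every read is shorter than 500 bp
```

The second call shows why -l 500 on a file of short reads leaves nothing after filtering.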

You can use the verbose flag (-v) to get a better sense of what the software is actually doing at each step. It might help things make more sense.

snayfach · 2015-04-27T05:28:05Z

Thanks for the advice! I'll add that.

Incremented version to 1.1.0

New sequence parser
- avoids using BioPython, which was slow and threw errors
- solves Issue #4
- improved detection of quality score encoding

New command line options
- added option '-r' to specify external RAPsearch v2.15 binary
- removed options '-f' and '-c'; file formats and quality encodings are now always auto-detected
- fixed option '-e' for just estimating AGS

snayfach · 2016-12-20T20:25:02Z

I've finally fixed this issue in MicrobeCensus (v1.1.0). The program should no longer crash when sequence and quality captions differ.

taltman closed this as completed Apr 27, 2015

taltman reopened this Apr 27, 2015

snayfach closed this as completed Oct 1, 2015

finchnSNPs mentioned this issue Oct 10, 2018

ValueError: Sequence and quality captions differ mossmatters/HybPiper#40

Closed

Magdoll mentioned this issue Apr 8, 2019

ValueError: Sequence and quality captions differ. Magdoll/cDNA_Cupcake#62

Closed
