
FASTQ file ended prematurely #291

Closed
dk267 opened this issue Feb 19, 2018 · 7 comments


@dk267
commented Feb 19, 2018

I am using the cutadapt plugin in QIIME2 to trim primers from paired-end reads. It works fine for most samples, but one sample returns an error and I can't track down the source of the problem. Any help would be appreciated.

Below is the QIIME2 command along with the resulting output:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ginseng/untrimmed/all-bacteria.qza \
  --p-front-f GGACTACHVGGGTWTCTAAT \
  --p-front-r GTGCCAGCMGCCGCGGTAA \
  --p-cores 2 \
  --output-dir ginseng/trimmed/all-bacteria

ERROR: Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/cutadapt/pipeline.py", line 454, in run
    (n, bp1, bp2) = self._pipeline.process_reads()
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/cutadapt/pipeline.py", line 282, in process_reads
    for read1, read2 in self._reader:
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/cutadapt/seqio.py", line 414, in __iter__
    r2 = next(it2)
  File "src/cutadapt/_seqio.pyx", line 234, in __iter__ (cd ~_seqio.c:5816)
cutadapt.seqio.FormatError: FASTQ file ended prematurely

cutadapt: error: FASTQ file ended prematurely
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2cli/commands.py", line 224, in __call__
    results = action(**arguments)
  File "", line 2, in trim_paired
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_cutadapt/_trim.py", line 172, in trim_paired
    run_commands(cmds)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_cutadapt/_trim.py", line 28, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['cutadapt', '--cores', '2', '--error-rate', '0.1', '--times', '1', '--overlap', '3', '-o', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-q57zg2fe/P91_Y16_B_284_L001_R1_001.fastq.gz', '-p', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-q57zg2fe/P91_Y16_B_285_L001_R2_001.fastq.gz', '--front', 'GGACTACHVGGGTWTCTAAT', '-G', 'GTGCCAGCMGCCGCGGTAA', '/tmp/qiime2-archive-kptfm8np/464bb15a-a936-48d4-8420-6f1249e567f9/data/P91_Y16_B_284_L001_R1_001.fastq.gz', '/tmp/qiime2-archive-kptfm8np/464bb15a-a936-48d4-8420-6f1249e567f9/data/P91_Y16_B_285_L001_R2_001.fastq.gz']' returned non-zero exit status 1

Plugin error from cutadapt:

Command '['cutadapt', '--cores', '2', '--error-rate', '0.1', '--times', '1', '--overlap', '3', '-o', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-q57zg2fe/P91_Y16_B_284_L001_R1_001.fastq.gz', '-p', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-q57zg2fe/P91_Y16_B_285_L001_R2_001.fastq.gz', '--front', 'GGACTACHVGGGTWTCTAAT', '-G', 'GTGCCAGCMGCCGCGGTAA', '/tmp/qiime2-archive-kptfm8np/464bb15a-a936-48d4-8420-6f1249e567f9/data/P91_Y16_B_284_L001_R1_001.fastq.gz', '/tmp/qiime2-archive-kptfm8np/464bb15a-a936-48d4-8420-6f1249e567f9/data/P91_Y16_B_285_L001_R2_001.fastq.gz']' returned non-zero exit status 1

See above for debug info.

@marcelm

Owner

commented Feb 19, 2018

I will need to improve the error message, but from what I can tell, the problem is that the second FASTQ file, P91_Y16_B_285_L001_R2_001.fastq.gz, contains fewer reads than the first file, or possibly ends with an incomplete record.

Are you sure that P91_Y16_B_284_L001_R1_001.fastq.gz and P91_Y16_B_285_L001_R2_001.fastq.gz belong to the same dataset? Note the 284 in the first file name vs. 285 in the second.
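One way to test the "fewer reads" hypothesis is to count the records in both files and compare the counts. The sketch below assumes plain, unwrapped four-line FASTQ records and plain or gzip-compressed input. `count_fastq_records` and `check_pair` are hypothetical helpers, not part of cutadapt or QIIME2.

```python
import gzip


def count_fastq_records(path):
    """Count FASTQ records in a plain or gzip-compressed file.

    Assumes unwrapped four-line records (header, sequence, '+', qualities).
    Raises ValueError if the number of lines is not a multiple of four,
    which usually means the last record is incomplete.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("{}: {} lines is not a multiple of 4".format(path, n_lines))
    return n_lines // 4


def check_pair(r1_path, r2_path):
    """Count records in both files, print a warning on mismatch, return both counts."""
    n1 = count_fastq_records(r1_path)
    n2 = count_fastq_records(r2_path)
    if n1 != n2:
        print("Mismatch: R1 has {} reads, R2 has {} reads".format(n1, n2))
    return n1, n2
```

For example, `check_pair("P91_Y16_B_284_L001_R1_001.fastq.gz", "P91_Y16_B_285_L001_R2_001.fastq.gz")` returns the two read counts.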

@dk267

Author

commented Feb 19, 2018

Both files are from the same dataset, each containing 42275 reads.

The last four lines of P91_Y16_B_284_L001_R1_001.fastq.gz are:
@M04026:231:000000000-BKHCL:1:2119:12135:25102 1:N:0:GCATACAG+TGTTCCGT
GGACTACCGGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGCACCTCAGCGTCAGTACCGGGCCAGTGAGCCGCCTTCGCCACTGGTGTTCTTGCGAATATCTACGAATTTCACCTCTACACTCGCAGTTCCACTCACCTCTCCCGGACTCGAGCTTTCCAGTATCGAAGGCAGTTCTGGAGTTGAGCTCCAGGATTTCACCCCCGACTTAGTTGGCCACCTACGAGCCCTTTACGCCCAGAAATTCCGAACAACGCTTTCCCCCCCCGTATTCCCGCGGCTGCGGGCACG
+
FGCDFFGGGGGEGGGAFC6EFCGCFGGCGGGF@@CFGGGGGGGGGGCFFGGGG7EEGGGG9EGGGGEFGFGFEEGC>FFFGC?FGFE8A?FGGFGGC@++@<FFFEGGGGGGGFGGGGGGGFCGGGFCF@FGGFGGGGGGEGGFG8FE@8C@@FCG7FF9AF=CGGGG*:@CCCC76DFGGGDFGGGGGGFFFGCF*?CGGFFGF8::8EGGGGF948B@FFFFGG3CDC7;F?CC@EE>=C***/2:6**1)4;35)<+9:071()1.7?)14-39(-,-2(((,-((

And the last four lines of P91_Y16_B_285_L001_R2_001.fastq.gz are:
@M04026:231:000000000-BKHCL:1:2119:12135:25102 2:N:0:GCATACAG+TGTTCCGT
GTGCCAGCAGCCGCGGTAATACGGGGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCTCGTAGGTGGCCAACTAAGTCGGGGGTGAAATCCTGGAGCTCAACTCCAGAACTGCCTTCGATACTGGAAAGCTCGAGTCCGGGAGAGGGGATTGGAACTGCGAGGGTAGAGGTGAAATTCGTAGTTATTCGCAAGAAAACCAGTGGCGAAGGGGGCTCACTGGCCGGGAACGGACGTCAGNGNGNNGNNNGNNGGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNT
+
GFFFGFGDGGGGGGGGGDCAFGGGGGGGGGGGGCEEEEEGFFGGG5EGGFF9FFGDGED>CFGGG;:>?FE9E@EFFF>EGG6E7?8EGC2AE?<FFCG888CEFFGCFGCFG,@CCFFGGF>8,/8+<C?C:8,3<8**<CFEC?/;85:C;CG?+0A?52+1<C31:C:CGF=87:+<9CCFEC:C**;AECGGF75288C87DD357>CC65>)))),(()(--0()#(#0#####(##/2;################################.
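The two final headers above share the same read ID up to the first space (`M04026:231:000000000-BKHCL:1:2119:12135:25102`). Only the `1:N:0:` vs. `2:N:0:` comment differs, which is what mates look like in Casava 1.8-style headers. A hypothetical helper that checks this pairing for every record, not just the last one, could look like the sketch below. It assumes unwrapped four-line records and also strips an older-style `/1` or `/2` suffix. The names `read_ids` and `first_unpaired` are illustrative only.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID (header text up to the first whitespace) of each record.

    Assumes unwrapped four-line FASTQ records whose headers start with '@'.
    A trailing /1 or /2 is removed so older-style headers also compare equal.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                fields = line[1:].split()
                rid = fields[0] if fields else ""
                if rid.endswith(("/1", "/2")):
                    rid = rid[:-2]
                yield rid


def first_unpaired(r1_path, r2_path):
    """Return (index, id1, id2) for the first record pair whose IDs differ.

    A missing record in the shorter file shows up as None. Returns None
    if every record in R1 has a mate with the same ID at the same position in R2.
    """
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return index, id1, id2
    return None
```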

@marcelm

Owner

commented Feb 19, 2018

Yes, I should have remembered that the error message is different when the files come from different datasets.

Can you make the entire dataset available to me so I can try to reproduce the problem? You can send it privately via e-mail to marcel.martin@scilifelab.se if you prefer. Cutadapt 1.15 splits the input FASTQ files into chunks so that it can process them in parallel. I wonder whether something goes wrong while creating those chunks.

If I cannot reproduce it, I’ll ask you to report this to the developers of the QIIME plugin.
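For paired-end input, chunking has to cut both files at the same record boundary. Chunk k of R1 and chunk k of R2 must then hold exactly the same complete records. The sketch below illustrates that invariant on line iterators. `paired_chunks` is a hypothetical, simplified helper, and it does not reflect how cutadapt 1.15 actually splits its input.

```python
from itertools import islice


def paired_chunks(lines1, lines2, records_per_chunk):
    """Split two FASTQ line iterables into aligned chunks of whole records.

    Each yielded (chunk1, chunk2) pair holds the same number of complete
    four-line records, so a worker can process it independently.
    Raises ValueError if an input ends in the middle of a record or if
    the two inputs contain different numbers of records.
    """
    n_lines = 4 * records_per_chunk
    it1, it2 = iter(lines1), iter(lines2)
    while True:
        chunk1 = list(islice(it1, n_lines))
        chunk2 = list(islice(it2, n_lines))
        if not chunk1 and not chunk2:
            return
        if len(chunk1) % 4 or len(chunk2) % 4:
            raise ValueError("FASTQ input ended in the middle of a record")
        if len(chunk1) != len(chunk2):
            raise ValueError("paired inputs contain different numbers of records")
        yield chunk1, chunk2
```

Because both files are always cut after the same number of records, a truncated or unequal chunk is reported at chunking time instead of reaching the parser.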

@marcelm

Owner

commented Feb 20, 2018

Thanks for sending the data. I can reproduce the problem locally. I’ll try to find out what is going wrong.

marcelm closed this in a3c5c52 on Feb 20, 2018

@marcelm

Owner

commented Feb 21, 2018

Thanks a lot for reporting this; there was indeed a bug in the way paired-end FASTQ files are split into chunks. Under rare circumstances, the last chunk of the R2 reads could be incomplete. Fortunately, the code that parses the FASTQ file then catches the problem, which is why you got the "FASTQ file ended prematurely" message even though the file on disk is complete. (At least this is better than silently getting incorrect results.)

I’ll make a bugfix release as soon as possible.
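A strict four-line parser shows why the error appears even though the file is intact. If the chunk it receives stops partway through a record, the parser reaches end-of-input before it has read all four lines, and it raises. The following is a minimal sketch that assumes unwrapped records. `parse_fastq` is hypothetical, and `FormatError` is defined locally only to mirror the name in the traceback. Cutadapt's own parser is written in Cython (`_seqio.pyx`).

```python
from itertools import islice


class FormatError(Exception):
    """Local stand-in for a FASTQ format error (not cutadapt's class)."""


def parse_fastq(lines):
    """Yield (name, sequence, qualities) from an iterable of FASTQ lines.

    Expects unwrapped four-line records. A final record with fewer than
    four lines raises FormatError instead of being silently dropped.
    """
    it = iter(lines)
    for header in it:
        rest = list(islice(it, 3))
        if len(rest) < 3:
            # Input stopped mid-record, e.g. a chunk cut in the wrong place.
            raise FormatError("FASTQ file ended prematurely")
        header = header.rstrip("\n")
        sequence, plus, qualities = (line.rstrip("\n") for line in rest)
        if not header.startswith("@"):
            raise FormatError("expected '@' at start of record, got {!r}".format(header))
        if not plus.startswith("+"):
            raise FormatError("expected '+' line, got {!r}".format(plus))
        if len(sequence) != len(qualities):
            raise FormatError("sequence and quality lengths differ")
        yield header[1:], sequence, qualities
```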

@marcelm

Owner

commented Feb 21, 2018

I’ve released cutadapt 1.16 with the fix.

@dk267

Author

commented Feb 22, 2018

Wonderful. Thanks much for your quick work diagnosing the issue.
