OSError: CRC check failed 0x4088b1f != 0xaaf28074 #520

Closed
afaranda opened this issue Mar 23, 2021 · 9 comments

@afaranda

Cutadapt Version: 3.3
Python Version: 3.7.7

I just installed the latest version of cutadapt, and I'm getting a weird error message. I've previously had no issues processing this file with cutadapt.

Cutadapt is installed in my home directory on a Slurm-based HPC cluster. I've tested this on both the login node and an interactive Slurm node, and I get the same error on each. I was hoping someone might be able to suggest where I can start troubleshooting.

Call to cutadapt:
(base) [abf@biomix test]$ cutadapt ../fastq/WT_0_hr_1_S1_L002_R1_001.fastq.gz > test.txt

Last Passing Read (Read # 19500):

@A00547:18:HGCLFDMXX:2:1101:24795:11397 1:N:0:ATTACTCG+TATAGCCT
CGCACGCGTTAGACTCCTTGGTCCGTGTTACAAGACGGGTCGGGTGGGAAGCCGACATCGCCGCCGACCCCGTGCGCTCGGCTTCGTCGGAGACGCGTGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

First Failed Read:

@A00547:18:HGCLFDMXX:2:1101:25807:11397 1:N:0:ATTACTCG+TATAGCCT
CGCCCTAGGACACCTGCGTTACCGTTTGACAGGTGTACCGCCCCAGTCAAACTCCCCACCTGGCACTGTCCCCGGAGCGGGTCGCGCCCGCCCGCACGCGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Error Message:
This is cutadapt 3.3 with Python 3.7.7
Command line parameters: ../fastq/WT_0_hr_1_S1_L002_R1_001.fastq.gz
Processing reads on 1 core in single-end mode ...
Traceback (most recent call last):
  File "/home/abf/.local/bin/cutadapt", line 8, in <module>
    sys.exit(main_cli())
  File "/home/abf/.local/lib/python3.7/site-packages/cutadapt/__main__.py", line 848, in main_cli
    main(sys.argv[1:])
  File "/home/abf/.local/lib/python3.7/site-packages/cutadapt/__main__.py", line 913, in main
    stats = r.run()
  File "/home/abf/.local/lib/python3.7/site-packages/cutadapt/pipeline.py", line 866, in run
    (n, total1_bp, total2_bp) = self._pipeline.process_reads(progress=self._progress)
  File "/home/abf/.local/lib/python3.7/site-packages/cutadapt/pipeline.py", line 326, in process_reads
    for read in self._reader:
  File "src/dnaio/_core.pyx", line 173, in fastq_iter
  File "/home/abf/anaconda3/lib/python3.7/gzip.py", line 287, in read
    return self._buffer.read(size)
  File "/home/abf/anaconda3/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/abf/anaconda3/lib/python3.7/gzip.py", line 465, in read
    self._read_eof()
  File "/home/abf/anaconda3/lib/python3.7/gzip.py", line 512, in _read_eof
    hex(self._crc)))
OSError: CRC check failed 0x4088b1f != 0xaaf28074
@afaranda
Author

I tested the "gzip" module in an interactive Python session: I can read the fastq.gz file in and iterate over its lines without an issue.
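For reference, a check along these lines could be sketched as follows. This is only an illustration using Python's standard gzip module, not the exact session that was run; the helper name `count_lines_gzip` and the temporary demo file are made up for the example. The file has to be read all the way to the end, because the CRC of a gzip member is only compared once its trailer is reached.

```python
import gzip
import os
import tempfile


def count_lines_gzip(path, chunk_size=1 << 20):
    """Read a gzip file to EOF and return the number of newline bytes.

    Reading to the end forces the gzip module to verify the CRC in the
    trailer of every member; a mismatch raises OSError (gzip.BadGzipFile
    on Python 3.8+).
    """
    newlines = 0
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            newlines += chunk.count(b"\n")
    return newlines


# Demo on a small throwaway file (hypothetical data, not the file from this issue).
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo.fastq.gz")
    with gzip.open(demo_path, "wb") as out:
        out.write(b"@read1\nACGT\n+\nFFFF\n" * 3)
    print(count_lines_gzip(demo_path), "lines read without a CRC error")
```

Pointing `count_lines_gzip` at a real fastq.gz file exercises only the standard library decompressor, so a clean result here says nothing about other gzip implementations that a tool may use internally.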

@Deeeeen

Deeeeen commented Mar 23, 2021

Hi, I ran into a similar issue when trying to use cutadapt to trim Illumina TruSeq adapters from paired-end reads.

Cutadapt 3.3 with Python 3.8.8

Command used:
cutadapt -j 12 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o After_R1.fastq.gz -p After_R2.fastq.gz Before_R1.fastq.gz Before_R2.fastq.gz

Error msg:

This is cutadapt 3.3 with Python 3.8.8
Command line parameters: -j 12 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o After_R1.fastq.gz -p After_R2.fastq.gz Before_R1.fastq.gz Before_R2.fastq.gz
Processing reads on 12 cores in paired-end mode ...
ERROR: Traceback (most recent call last):
  File "/home/dec037/miniconda3/envs/cutadaptenv/lib/python3.8/site-packages/cutadapt/pipeline.py", line 555, in run
    for chunk_index, (chunk1, chunk2) in enumerate(
  File "/home/dec037/miniconda3/envs/cutadaptenv/lib/python3.8/site-packages/dnaio/chunks.py", line 119, in read_paired_chunks
    bufend2 = f2.readinto(memoryview(buf2)[start2:]) + start2  # type: ignore
  File "/home/dec037/miniconda3/envs/cutadaptenv/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/home/dec037/miniconda3/envs/cutadaptenv/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/dec037/miniconda3/envs/cutadaptenv/lib/python3.8/gzip.py", line 470, in read
    self._read_eof()
  File "/home/dec037/miniconda3/envs/cutadaptenv/lib/python3.8/gzip.py", line 516, in _read_eof
    raise BadGzipFile("CRC check failed %s != %s" % (hex(crc32),
gzip.BadGzipFile: CRC check failed 0x4088b1f != 0x2dc8e0a1

I saw the other closed issue saying this may be because my fastq.gz files are corrupted, but when I ran zcat on my Before_R1.fastq.gz and Before_R2.fastq.gz, it worked fine.

@afaranda
Author

afaranda commented Mar 23, 2021

Yeah, I just checked the md5sum on my file . . . it's good, and I've processed it with cutadapt before.

If I "zcat" the file into cutadapt like so:
cutadapt <(zcat ../fastq/WT_0_hr_1_S1_L002_R1_001.fastq.gz)

Everything functions properly.

@mbyott

mbyott commented Mar 24, 2021

I have the same issue as well; my FASTQ files are fine.

@fjossandon

fjossandon commented Mar 24, 2021

It happened to me today, and after a lengthy search and many tests, I found the reason.
Cutadapt uses "dnaio" to handle files, dnaio uses "xopen" to open files, and xopen uses "isal" under the hood to open gzip files (https://github.com/pycompression/python-isal). I found that the files worked fine with isal version 0.6.1, but the CRC error appears when using versions 0.7.0 and 0.8.0. Look:

ERROR: Traceback (most recent call last):
  File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 626, in run
    raise e
OSError: CRC check failed 0x88b1f != 0x6fe5d9e4

Traceback (most recent call last):
  File "/home/fossandon/.local/bin/cutadapt", line 8, in <module>
    sys.exit(main_cli())
  File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/__main__.py", line 848, in main_cli
    main(sys.argv[1:])
  File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/__main__.py", line 913, in main
    stats = r.run()
  File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 825, in run
    raise e
OSError: CRC check failed 0x88b1f != 0x6fe5d9e4
fossandon@ubuntu:~/Documents/temp$ pip3 list | egrep "cutadapt|dnaio|isal|xopen"
cutadapt              3.3
dnaio                 0.5.0               /home/fossandon/.local/lib/python3.6/site-packages
isal                  0.8.0
xopen                 1.1.0

But after reverting to 0.6.1 (pip3 install isal==0.6.1, last one before the bug) it works again:

299	3047	0.0	8	2963 57 11 6 5 2 3
300	8	0.0	8	0 0 0 0 0 3 4 0 1
301	15028	0.0	8	0 14646 270 64 24 15 8 0 1


WARNING:
    One or more of your adapter sequences may be incomplete.
    Please see the detailed output above.
fossandon@ubuntu:~/Documents/temp$ pip3 list | egrep "cutadapt|dnaio|isal|xopen"
cutadapt              3.3
dnaio                 0.5.0               /home/fossandon/.local/lib/python3.6/site-packages
isal                  0.6.1
xopen                 1.1.0
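The same version check can be done from Python rather than through pip. The following is a minimal sketch, assuming Python 3.8 or later (where `importlib.metadata` is in the standard library, so it would not run on the Python 3.6 setup shown above). The helper names `installed_version` and `is_affected` are made up for this example, and the set of affected versions is taken only from what is reported in this thread.

```python
from importlib import metadata  # standard library from Python 3.8 onwards

# isal versions reported in this thread as triggering the CRC error.
AFFECTED_ISAL_VERSIONS = {"0.7.0", "0.8.0"}


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_affected(isal_version):
    """True if the given isal version is one of those reported as broken."""
    return isal_version in AFFECTED_ISAL_VERSIONS


for name in ("cutadapt", "dnaio", "xopen", "isal"):
    version = installed_version(name)
    note = ""
    if name == "isal" and is_affected(version):
        note = "  <- reported in this thread as causing the CRC error"
    print(f"{name:10} {version or 'not installed'}{note}")
```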

@afaranda
Author

Great detective work @fjossandon! Downgrading isal from 0.8.0 to 0.6.1 appears to have resolved this for me as well.

@rhpvorderman
Collaborator

rhpvorderman commented Mar 29, 2021

Hi everyone. Thanks to @fjossandon's excellent bug reporting I was able to solve this bug fairly quickly this morning.

I released a new version of python-isal just now (0.8.1). Could you install it and check if the bug still occurs?

The cause of the error was concatenated gzip files. These are fairly common, but were not included in the python-isal test suite. This is now corrected. Python-isal inherited its tests from CPython's gzip module and added many more, but unfortunately concatenated gzips of reasonable size were not among them. Concatenated gzips are also tested in xopen, but only very small ones.
A regression test has now been added to python-isal to ensure this does not happen again.

The error occurs when reasonably sized concatenated gzip files are used. With version 0.7.0 I made some changes to the python-isal codebase that allowed much more reuse of the code from gzip.py through inheritance. Since this code is much more battle-tested than python-isal's own code, I assumed this would make the project more stable. Unfortunately, there is a small offset issue in isa-l when raw deflate streams are decompressed. This causes unused_data to be incorrectly reported, as there is some data left in a bit buffer. Since 0.7.0 this data is correctly reported, which allows reuse of CPython's code. Unfortunately, this causes unused_data to be larger than gzip.py's implementation expects, which leads to an offset error in _PaddedFile, which assumes that prepended data can never be bigger than its internal _buffer. This is now patched in python-isal.
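To illustrate what a concatenated gzip file is and where unused_data comes in, here is a small sketch using only the standard library's gzip and zlib modules. It does not exercise python-isal or reproduce the bug; it only shows the mechanism, and it is not the regression test that was added. The helper name `split_gzip_members` and the FASTQ-like sample data are made up for the example.

```python
import gzip
import zlib


def split_gzip_members(data):
    """Decompress each member of a (possibly concatenated) gzip byte string.

    After one member ends, the decompressor exposes the bytes that follow
    it as unused_data; those bytes are the start of the next member.
    This sketch does not handle zero padding after the last member.
    """
    members = []
    while data:
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
        members.append(decompressor.decompress(data))
        if not decompressor.eof:
            raise ValueError("truncated gzip member")
        data = decompressor.unused_data
    return members


# Two independently compressed chunks, joined byte for byte. The result is
# itself a valid gzip file whose content is part1 followed by part2.
part1 = b"@read1\nACGT\n+\nFFFF\n" * 20000
part2 = b"@read2\nTTGA\n+\nFFFF\n" * 20000
concatenated = gzip.compress(part1) + gzip.compress(part2)

members = split_gzip_members(concatenated)
print(len(members), "members found")

# A reader that handles unused_data correctly returns the joined content.
assert gzip.decompress(concatenated) == part1 + part2
```

A reader that mishandles the bytes left over after the first member starts the second member at the wrong offset, which is the kind of situation in which a CRC mismatch like the one in this issue can surface.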

@marcelm I am sorry for breaking cutadapt this week :(.

@rhpvorderman
Collaborator

rhpvorderman commented Mar 29, 2021

The offending versions 0.7.0 and 0.8.0 have been yanked from PyPI. These cannot be installed by accident anymore.

@marcelm
Owner

marcelm commented Mar 30, 2021

Thanks @rhpvorderman for fixing this so quickly! I was on vacation last week, and it is nice to see this resolved on coming back.

@marcelm marcelm closed this as completed Mar 30, 2021