OSError: CRC check failed 0x4088b1f != 0xaaf28074 #520
Comments
I tested the "gzip" module in an interactive Python session: I can read the fastq.gz file and iterate over its lines without an issue.
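For anyone who wants to repeat that check, here is a minimal sketch using only the standard library. The function name `count_lines` is made up for illustration. The CRC of a gzip member is only verified when its trailer is reached, so the file has to be read all the way to the end before a CRC error can show up.

```python
import gzip


def count_lines(path):
    """Read a gzip file to the end and return the number of lines.

    gzip verifies the CRC of each member only when that member's trailer
    is reached, so a complete read is required to trigger the check.
    A failed check surfaces as an OSError (or a subclass of it).
    """
    n = 0
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            n += 1
    return n
```

Calling `count_lines("reads.fastq.gz")` (a placeholder file name) either returns the line count or raises an `OSError` if the standard-library reader considers the file corrupt.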
Hi, I ran into a similar issue when trying to use cutadapt to trim Illumina TruSeq adapters on paired-end reads. Cutadapt 3.3 with Python 3.8.8. Command used: Error msg:
I saw the other closed issue saying it may be because my fastq.gz files are corrupted, but when I ran
Yeah, I just checked the md5sum on my file . . . it's good, and I've processed it with cutadapt before. If I "zcat" the file into cutadapt like so: Everything functions properly.
I have the same issue as well, and my FASTQ files are fine.
It happened to me today and after a lengthy search and many tests, I found the reason.
But after reverting to 0.6.1 (
Great detective work @fjossandon! Downgrading isal from 0.8.0 to 0.6.1 appears to have resolved this for me as well.
Hi everyone. Thanks to @fjossandon's excellent bug reporting I was able to solve this bug fairly quickly this morning. I released a new version of isa-l just now (0.8.1). Could you install it and check if the bug still occurs?
The cause of the error was concatenated gzip files. These are fairly common, but were not included in the python-isal test suite. This is now corrected. Python-isal has inherited the same tests from CPython's gzip module, plus many more, but unfortunately concatenated gzips of reasonable size were not part of these tests. Concatenated gzips are also tested in xopen, but only very small ones. The error occurs only when reasonably sized concatenated gzip files are used.
With version 0.7.0 I made some changes to the codebase of python-isal which allowed much more reuse of the code from gzip.py by inheritance. Since this code is much more battle-tested than python-isal's code, I assumed this would make the project more stable. Unfortunately there is a small offset issue in isa-l when raw deflate streams are compressed. This causes unused_data to be incorrectly reported, as there is some data left in a bitbuffer. Since 0.7.0 this data is correctly reported, which allows reuse of CPython's code. Unfortunately this causes unused_data to be larger than gzip.py's implementation expects, which leads to an offset error in _PaddedFile which assumes that
@marcelm I am sorry for breaking cutadapt this week :(.
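To illustrate what "concatenated gzip files" and `unused_data` mean here, the following is a small sketch using only the standard library `gzip` and `zlib` modules. It does not use python-isal and does not reproduce the bug; the helper name `split_first_member` and the sample reads are invented for the example.

```python
import gzip
import zlib

# Two independently compressed gzip members, joined byte for byte.
first = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
second = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")
concatenated = first + second

# The standard-library reader decompresses every member in turn.
all_reads = gzip.decompress(concatenated)


def split_first_member(data):
    """Decompress only the first gzip member of `data`.

    Returns the decompressed payload and the bytes that follow the
    first member, which zlib reports as `unused_data`.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    payload = decompressor.decompress(data)
    return payload, decompressor.unused_data


payload, rest = split_first_member(concatenated)
# `rest` now holds exactly the bytes of the second member.
```

A reader that handles multi-member files relies on `unused_data` starting precisely at the next member's header, which is why a miscount of those leftover bytes can end up as a failed CRC check.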
The offending versions 0.7.0 and 0.8.0 have been yanked from PyPI. These cannot be installed by accident anymore.
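If you are unsure whether an existing environment still has one of the yanked releases, something like the sketch below may help. It assumes Python 3.8 or later (for `importlib.metadata`) and assumes the distribution is registered under the name `isal`, as the downgrade comments in this thread suggest. The function names are made up for illustration.

```python
from importlib import metadata

# Versions named in this thread as affected and yanked from PyPI.
YANKED_VERSIONS = {"0.7.0", "0.8.0"}


def is_affected(version):
    """Return True if `version` is one of the yanked releases."""
    return version.strip() in YANKED_VERSIONS


def installed_isal_version():
    """Return the installed version string, or None if not installed.

    Assumption: the distribution name is "isal".
    """
    try:
        return metadata.version("isal")
    except metadata.PackageNotFoundError:
        return None
```

For example, `is_affected(installed_isal_version() or "")` is true only when one of the two yanked versions is installed.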
Thanks @rhpvorderman for fixing this so quickly! I was on vacation last week, and it is nice to see this resolved on coming back.
Cutadapt Version: 3.3
Python Version: 3.7.7
I just installed the latest version of cutadapt, and I'm getting a weird error message. I've previously had no issues processing this file with cutadapt.
Cutadapt is installed in my home directory on a Slurm-based HPC cluster. I've tested this on both the login node and on an interactive Slurm node, and I get the same error. I was hoping someone might be able to suggest where I can start troubleshooting.
Call to cutadapt:
(base) [abf@biomix test]$ cutadapt ../fastq/WT_0_hr_1_S1_L002_R1_001.fastq.gz > test.txt
Last Passing Read (Read # 19500):
First Failed Read: