Possible regression: reading gzip files generates a CRC check failed error in version 0.11.0 #87

@fjossandon

Hello @rhpvorderman,
Some months ago I reported a bug in the decompression of gzip files (#60), and today, while using cutadapt on a different computer, it happened again. Remembering the previous case, I checked the compressed files and found that "zcat" and "gzip -t" reported no errors, so I suspected isal.

On my personal computer I have version 0.8.1 installed, which works fine with these files. Without changing anything else, I installed the subsequent isal versions one by one and found that the files decompress correctly with versions 0.9.0 and 0.10.0, but fail with the latest version, 0.11.0:

fossandon@ubuntu:~/Documents/download$ pip3 install isal==0.11.0
Defaulting to user installation because normal site-packages is not writeable
Collecting isal==0.11.0
  Using cached isal-0.11.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Installing collected packages: isal
  Attempting uninstall: isal
    Found existing installation: isal 0.10.0
    Uninstalling isal-0.10.0:
      Successfully uninstalled isal-0.10.0
Successfully installed isal-0.11.0

fossandon@ubuntu:~/Documents/download$ cutadapt -a "AACTTTYARCAAYGGATCTC;max_error_rate=0.1;min_overlap=20" -A "TGATCCYTCCGCAGGT;max_error_rate=0.5;min_overlap=16" --pair-adapters --pair-filter any --cores 2 --output 136727_R1.fastq --paired-output 136727_R2.fastq 136727_S159_L001_R1_001.fastq.gz 136727_S159_L001_R2_001.fastq.gz
This is cutadapt 3.4 with Python 3.6.9
Command line parameters: -a AACTTTYARCAAYGGATCTC;max_error_rate=0.1;min_overlap=20 -A TGATCCYTCCGCAGGT;max_error_rate=0.5;min_overlap=16 --pair-adapters --pair-filter any --cores 2 --output 136727_R1.fastq --paired-output 136727_R2.fastq 136727_S159_L001_R1_001.fastq.gz 136727_S159_L001_R2_001.fastq.gz
Processing reads on 2 cores in paired-end mode ...
ERROR: Traceback (most recent call last):
  File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 556, in run
    dnaio.read_paired_chunks(f, f2, self.buffer_size)):
  File "/home/fossandon/Documents/Github_repos/dnaio/src/dnaio/chunks.py", line 118, in read_paired_chunks
    bufend1 = f.readinto(memoryview(buf1)[start1:]) + start1  # type: ignore
  File "/usr/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/fossandon/.local/lib/python3.6/site-packages/isal/igzip.py", line 265, in read
    self._read_eof()
  File "/usr/lib/python3.6/gzip.py", line 501, in _read_eof
    hex(self._crc)))
OSError: CRC check failed 0x8b1f001a != 0xd2f5dc20

ERROR: Traceback (most recent call last):
  File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 626, in run
    raise e
OSError: CRC check failed 0x8b1f001a != 0xd2f5dc20

Traceback (most recent call last):
  File "/home/fossandon/.local/bin/cutadapt", line 8, in <module>
    sys.exit(main_cli())
  File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/__main__.py", line 848, in main_cli
    main(sys.argv[1:])
  File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/__main__.py", line 913, in main
    stats = r.run()
  File "/home/fossandon/.local/lib/python3.6/site-packages/cutadapt/pipeline.py", line 825, in run
    raise e
OSError: CRC check failed 0x8b1f001a != 0xd2f5dc20
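
A hedged observation about the reported value: assuming the first number in the message is the CRC field read from the stream trailer (CPython's `gzip._read_eof` unpacks the trailer as little-endian integers), its raw bytes can be recovered as sketched below. The last two bytes equal the gzip magic number `1f 8b`, which might mean the reader picked up the start of another gzip member instead of the real trailer. This is only a guess based on the error message, not a confirmed diagnosis.

```python
import struct

# First value in "CRC check failed 0x8b1f001a != 0xd2f5dc20".
# Assumption: this is the CRC field as read from the file.
reported = 0x8B1F001A

# gzip trailer fields are stored little-endian, so pack the value the same way
# to recover the bytes as they would have appeared in the stream.
raw = struct.pack("<I", reported)
print(raw.hex())

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member
contains_magic = GZIP_MAGIC in raw
print(contains_magic)
```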

Inspecting the changes in the latest release, I found that a couple of lines added in the 0.8.1 fix were modified:

Could it be that this modification caused a regression?

I shared the pair of files that caused the error in this folder, so you can reproduce it on your end:
https://drive.google.com/drive/folders/1iOqvXbDQQd8NDtnZhzutmOxx4wUONO-k?usp=sharing
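
To narrow the problem down without cutadapt, something like the following sketch could be used. It reads a file completely with Python's standard `gzip` module and, if python-isal is importable, with `isal.igzip.open`, and reports the byte count and CRC32 of the decompressed data from each reader. The helper names `stream_crc` and `compare_readers` and the chunk size are illustrative assumptions, not part of either library.

```python
import gzip
import zlib

try:
    from isal import igzip  # python-isal; optional for this sketch
except ImportError:
    igzip = None


def stream_crc(opener, path, chunk_size=128 * 1024):
    """Read `path` fully via `opener`; return (byte count, CRC32) of the data."""
    crc = 0
    total = 0
    with opener(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
            total += len(chunk)
    return total, crc


def compare_readers(path):
    """Decompress `path` with each available reader and collect the results."""
    results = {"gzip": stream_crc(gzip.open, path)}
    if igzip is not None:
        try:
            results["isal.igzip"] = stream_crc(igzip.open, path)
        except OSError as error:
            # Keep the error (e.g. a CRC check failure) instead of aborting.
            results["isal.igzip"] = error
    return results


# Example call (file name taken from the command above):
# print(compare_readers("136727_S159_L001_R1_001.fastq.gz"))
```

If both readers return the same tuple, the file decompresses identically under both; an `OSError` stored under `isal.igzip` would reproduce the failure in isolation.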

Best regards,
