FastxFile silently prematurely truncates gzipped FASTQ #919

Closed
jbloom opened this issue May 11, 2020 · 3 comments

Comments

@jbloom
Contributor

jbloom commented May 11, 2020

test.fastq.gz
The attached *.fastq.gz file has 24520 entries.

But if I read them using FastxFile, it reports that there are only 114 entries:

>>> len([_ for _ in pysam.FastxFile('test.fastq.gz')])
114

The problem appears to have something to do with the fact that the file is gzipped. If I gunzip it and then do the same, it correctly reports all the entries:

>>> len([_ for _ in pysam.FastxFile('test.fastq')])
24520

I'm not sure of the source of the bug, but I think FastxFile should either read all entries from the gzipped FASTQ or raise an error if there is some problem with the gzipped file.

This is with pysam version 0.15.4.
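
One way to cross-check the expected record count without going through pysam is to count lines with Python's standard `gzip` module, which reads every member of a multi-member gzip file. This is only a sketch: `count_fastq_records` is a hypothetical helper, and it assumes plain four-line FASTQ records with no wrapped sequence or quality lines.

```python
import gzip


def count_fastq_records(path):
    """Count FASTQ records, assuming exactly four lines per record."""
    # gzip.open reads all members of a concatenated .gz file; plain files use open().
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

If this count and the `FastxFile` count disagree for the same `.gz` file, the reader is dropping data rather than the file being short.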

@jmarshall
Member

Your gzipped file contains ~200 gzip members — was it perhaps created by concatenating lots of individual .gz files?
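
To check how many gzip members a file contains, something along these lines should work using only the standard library. It is a rough sketch: `count_gzip_members` is a hypothetical helper, it reads the whole file into memory, and it does not tolerate trailing padding after the last member.

```python
import zlib


def count_gzip_members(path):
    """Count the gzip members stored back to back in a file."""
    with open(path, "rb") as handle:
        data = handle.read()
    members = 0
    while data:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        decoder.decompress(data)
        if not decoder.eof:
            raise ValueError(f"{path}: truncated gzip member")
        members += 1
        # Bytes left over after one member's trailer belong to the next member.
        data = decoder.unused_data
    return members
```

A file written by a single `gzip` invocation should give 1; a file built by concatenating several `.gz` files gives one member per input file.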

You have encountered samtools/htslib#742: htslib 1.9 did not handle such multi-member plain gzipped files. This will be fixed in pysam when it moves from htslib 1.9 to 1.10 — or in the meantime you could apply PR #905 to your pysam yourself.
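
The symptom can be illustrated with a small self-contained example. This is not htslib's code; it only shows that a decoder which stops after the first gzip member sees fewer records than one which reads them all. The helper names and the toy record are made up for illustration.

```python
import gzip
import zlib


def make_multimember_gzip(chunks):
    """Compress each chunk separately and concatenate the resulting members."""
    return b"".join(gzip.compress(chunk) for chunk in chunks)


def first_member_only(blob):
    """Decode just the first gzip member and ignore whatever follows it."""
    decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    return decoder.decompress(blob)


record = b"@read\nACGT\n+\nIIII\n"
blob = make_multimember_gzip([record * 3, record * 2])

full = gzip.decompress(blob)        # decodes every member
partial = first_member_only(blob)   # stops at the end of the first member

# Four lines per record: prints "5 3"
print(len(full.splitlines()) // 4, len(partial.splitlines()) // 4)
```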

@jmarshall
Member

Duplicate of #738

@jmarshall jmarshall marked this as a duplicate of #738 May 12, 2020
@jmarshall
Member

This has been fixed since pysam 0.16.0 (when used with HTSlib 1.10 or later).
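
A script that has to cope with older installations could guard on those two version numbers. The sketch below compares versions numerically, since a plain string comparison would rank "1.9" above "1.10". `has_multimember_fix` is a hypothetical helper that takes the version strings as arguments; where to obtain them is left to the caller (`pysam.__version__` for pysam, and, as an assumption to verify against your install, `pysam.version.__htslib_version__` for the bundled HTSlib).

```python
import re


def parse_version(text):
    """Turn a dotted version such as '1.10.2' into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_multimember_fix(pysam_version, htslib_version):
    """True for pysam 0.16.0 or later combined with HTSlib 1.10 or later."""
    return (parse_version(pysam_version) >= (0, 16)
            and parse_version(htslib_version) >= (1, 10))
```

For example, `has_multimember_fix("0.15.4", "1.9")` is `False`, matching the setup in the original report.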
