Bug while reading 1000 genomes data #49

Closed
ilyaminkin opened this Issue Jun 14, 2012 · 11 comments

Comments

Projects
None yet
2 participants

It seems that this library fails to parse the following file:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz

Code:

import vcf
import sys

vcfReader = vcf.Reader(open(sys.argv[1]))  # path to the VCF file is the first command-line argument
for record in vcfReader:
    print(record)

Fails with:

Traceback (most recent call last):
  File "C:\Users\HP\My Documents\Aptana Studio 3 Workspace\ProcSNP\CalcProb.py", line 5, in <module>
    for record in vcfReader:
  File "C:\Program Files\Python27\lib\site-packages\vcf\parser.py", line 872, in next
    info = self._parse_info(row[7])
  File "C:\Program Files\Python27\lib\site-packages\vcf\parser.py", line 717, in _parse_info
    val = entry[1]
IndexError: list index out of range

Python 2.7, Win 7 x64.
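The failing line `val = entry[1]` suggests that `_parse_info` splits each `;`-separated INFO entry on `=` and expects a value after it. The sketch below is a hypothetical, simplified parser (`naive_parse_info` is an illustrative name, not PyVCF's actual code, written for Python 3) showing how a value-less, Flag-style entry such as `DB` produces the same `IndexError`:

```python
def naive_parse_info(info_field):
    """Hypothetical, simplified INFO parser that assumes every entry is KEY=VALUE."""
    result = {}
    for entry in info_field.split(';'):
        entry = entry.split('=')
        key = entry[0]
        val = entry[1]  # fails for Flag-style entries like "DB", which split to ['DB']
        result[key] = val
    return result


# Works when every entry carries a value
print(naive_parse_info('DP=14;AF=0.5'))  # {'DP': '14', 'AF': '0.5'}

# A value-less entry triggers the error seen in the traceback
try:
    naive_parse_info('DP=14;DB')
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: list index out of range
```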

Oops, it seems that this file is in VCF 4.1 format. Sorry.

ilyaminkin closed this Jun 15, 2012

Owner

jamescasbon commented Jun 15, 2012

Can you try with HEAD? (i.e. the latest code, not a release).

That file is VCF 4.1, which is not supported by the release version. We're still evaluating the API for structural variants (SV).

Also, can you send me a smaller example file? I can't download that one from here; it's too big.
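The VCF version is declared on the first header line (for this file, `##fileformat=VCFv4.1`). A minimal stdlib-only sketch for checking it before handing the file to a parser follows; `read_vcf_version` is a hypothetical helper, and it assumes the file is either plain text or gzip-compressed with a `.gz` suffix, as the 1000 Genomes download is.

```python
import gzip


def read_vcf_version(path):
    """Return the value of the ##fileformat header line, e.g. 'VCFv4.1'.

    Hypothetical helper: opens '.gz' files through gzip and anything else as
    plain text. Returns None if the first line is not a ##fileformat line.
    """
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt') as handle:
        first_line = handle.readline().rstrip('\r\n')
    prefix = '##fileformat='
    if first_line.startswith(prefix):
        return first_line[len(prefix):]
    return None
```

A script could, for example, print a warning when the result is anything other than the version its parser supports.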

A smaller file is here: http://nopaste.info/0d9f39ee4a.html
The latest version also crashes with the same message.

jamescasbon pushed a commit that referenced this issue Jun 15, 2012

Owner

jamescasbon commented Jun 15, 2012

Strictly speaking, it's a malformed VCF file: HOMSEQ is declared in the header as a 'String' but is used as a 'Flag' (with no value) in the file.

I have pushed a working change to HEAD.
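The mismatch described above can be detected by comparing each `##INFO` header's declared `Type` with how the key is actually used in the data lines. The following is a hypothetical checker (`find_flag_type_mismatches` is an illustrative name, not part of PyVCF); it assumes `ID` comes before `Type` in each `##INFO=<...>` line and only extracts those two fields with a regular expression, so it is a sketch rather than a full header parser. The example lines are made up to mirror the HOMSEQ situation.

```python
import re

# Captures the ID and the declared Type from lines like ##INFO=<ID=X,Number=.,Type=String,...>
INFO_HEADER = re.compile(r'^##INFO=<ID=([^,>]+),.*?Type=([^,>]+)')


def find_flag_type_mismatches(lines):
    """Return (key, declared_type) pairs for INFO keys used without a value
    even though the header does not declare them as Type=Flag."""
    declared = {}
    mismatches = set()
    for line in lines:
        line = line.rstrip('\r\n')
        if line.startswith('##'):
            match = INFO_HEADER.match(line)
            if match:
                declared[match.group(1)] = match.group(2)
            continue
        if not line or line.startswith('#'):
            continue  # blank line or the #CHROM column header
        fields = line.split('\t')
        if len(fields) < 8 or fields[7] == '.':
            continue  # no INFO column, or INFO explicitly missing
        for entry in fields[7].split(';'):
            # Undeclared keys default to 'Flag' here and are not reported
            if '=' not in entry and declared.get(entry, 'Flag') != 'Flag':
                mismatches.add((entry, declared[entry]))
    return mismatches


example = [
    '##fileformat=VCFv4.1',
    '##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Illustrative">',
    '##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Illustrative">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
    '1\t100\t.\tA\t<DEL>\t.\tPASS\tIMPRECISE;HOMSEQ',
]
print(find_flag_type_mismatches(example))  # {('HOMSEQ', 'String')}
```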

I downloaded the master branch from git, but it still crashes.

P.S. Thank you for the very fast response.

Owner

jamescasbon commented Jun 15, 2012

Same traceback?

Yes.

Traceback (most recent call last):
  File "C:\Users\HP\My Documents\Aptana Studio 3 Workspace\ProcSNP\CalcProb.py", line 6, in <module>
    for record in vcfReader:
  File "C:\Program Files\Python27\lib\site-packages\vcf\parser.py", line 756, in next
    info = self._parse_info(row[7])
  File "C:\Program Files\Python27\lib\site-packages\vcf\parser.py", line 637, in _parse_info
    val = entry[1]

Almost the same traceback.

Oh, I'm sorry, that's my fault: I forgot to replace the old files. Everything is OK now. Thank you very much.
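Stale installed files like this can be spotted by asking Python which file it actually imports and where a given function starts in it. A minimal sketch, assuming Python 3 (`importlib.util.find_spec` and `inspect.getsourcelines` are standard library); `module_origin` and `source_start_line` are hypothetical helper names. In the Python 2.7 setup from this thread, printing `vcf.parser.__file__` gives the same location information.

```python
import importlib.util
import inspect


def module_origin(name):
    """Return the path of the file Python would load for module `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def source_start_line(obj):
    """Return the line number at which `obj` is defined in the file it was loaded from."""
    _lines, start = inspect.getsourcelines(obj)
    return start
```

With PyVCF installed, `module_origin('vcf.parser')` would show which `parser.py` is in use, and `source_start_line(vcf.Reader._parse_info)` could be compared with the same function's position in the source on GitHub; a mismatch points to an old copy on `sys.path`.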

Owner

jamescasbon commented Jun 15, 2012

Are you absolutely sure? Your traceback shows it crashing on line 637, but in the source that code is on line 718 and wrapped in a try/except.
https://github.com/jamescasbon/PyVCF/blob/master/vcf/parser.py#L718
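The idea of wrapping the value lookup in a try/except can be sketched as follows. This is a hypothetical `parse_info`, not PyVCF's actual code: it keeps values as strings and records any entry without `=` as `True`, the way a Flag is usually represented, regardless of the Type declared in the header.

```python
def parse_info(info_field):
    """Parse a VCF INFO column into a dict, tolerating value-less (Flag-style) entries."""
    if info_field == '.':
        return {}  # '.' means the INFO column is missing
    info = {}
    for entry in info_field.split(';'):
        parts = entry.split('=', 1)  # split only on the first '='
        try:
            info[parts[0]] = parts[1]
        except IndexError:
            # No '=' in the entry: treat it as a Flag instead of crashing
            info[parts[0]] = True
    return info


print(parse_info('SVTYPE=DEL;HOMSEQ;END=1200'))
# {'SVTYPE': 'DEL', 'HOMSEQ': True, 'END': '1200'}
```

Treating the entry as a Flag keeps parsing going on malformed files like this one; a stricter design could instead raise an error that names the offending key.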

Owner

jamescasbon commented Jun 15, 2012

Ah, great.

gotgenes pushed a commit to gotgenes/PyVCF that referenced this issue May 13, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment