Changed the rule to split records into columns #77

Closed

Marco Falcioni

According to the specification, the columns must be tab-separated. I encountered a VCF file from NCBI that has spaces in the INFO column, which caused PyVCF to fail.
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

James Casbon
Owner

Thanks for picking this up!

That should be '\t' and not '\t+', right? It has to be only one tab.

I'm tempted to have a permissive mode, as I'm sure I've seen space-separated files somewhere. Did you try running the test suite? There might be some in there.
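
To make the difference between the three candidate rules concrete, here is a minimal sketch. The sample lines are invented for illustration and are not taken from the NCBI file or the test suite.

```python
import re

# Hypothetical record: the INFO column (last field) contains a space.
line = "1\t100\trs1\tA\tG\t50\tPASS\tDESC=two words"

old_rule = re.split('\t| +', line)   # rule before this change: tab or run of spaces
new_rule = re.split('\t+', line)     # rule proposed in this pull request
single_tab = re.split('\t', line)    # exactly one tab per separator

print(len(old_rule))    # 9: the space splits INFO into two fields
print(len(new_rule))    # 8
print(len(single_tab))  # 8

# The two tab-only rules differ once a column is empty.
gappy = "1\t100\t\tA"
collapsed = re.split('\t+', gappy)   # ['1', '100', 'A']: the empty column vanishes
preserved = re.split('\t', gappy)    # ['1', '100', '', 'A']
print(collapsed)
print(preserved)
```

With '\t+' an empty column silently shifts every later field one position to the left, which is why a single '\t' looks like the safer reading of the specification.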

Martijn Vermaat
Collaborator

At this point you might just as well make it

row = line.split('\t')

which is probably faster.

I would be interested in seeing an example of a space separated file. I'd say @marcofalcioni's use case is a valid one (especially if he can point us to such a file), although spaces in the INFO column are not allowed by the specification.
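
A rough way to check the "probably faster" guess is sketched below. The helper names and the sample line are made up for this example, and timings will vary by machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

# Hypothetical tab-separated record, used only for this comparison.
line = "1\t100\trs1\tA\tG\t50\tPASS\tDP=10"

def split_with_re(text):
    # Regular-expression split on a single tab.
    return re.split('\t', text)

def split_with_str(text):
    # Plain string split on a single tab.
    return text.split('\t')

# For a single-character separator both approaches return the same list.
same = split_with_re(line) == split_with_str(line)
print(same)

# Time each approach over the same number of calls.
re_time = timeit.timeit(lambda: split_with_re(line), number=10000)
str_time = timeit.timeit(lambda: split_with_str(line), number=10000)
print(re_time, str_time)
```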

Marco Falcioni

Indeed - issue 16 and issue 49 both fail, as they have spaces instead of tabs. I feel unclean.

James Casbon
Owner

Added a 'strict_whitespace' option in the 0.6.1 release.
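
For readers following along, a switch like this could look roughly as follows. Only the name strict_whitespace comes from the comment above. The helper split_record, its default value and the sample lines are assumptions made for illustration, and this is not necessarily how the 0.6.1 release implements it.

```python
import re

def split_record(line, strict_whitespace=False):
    """Split one VCF data line into columns (hypothetical helper)."""
    # Strict mode: tabs only, as the specification requires.
    # Permissive mode: also accept runs of spaces as separators.
    separator = '\t' if strict_whitespace else '\t| +'
    return re.split(separator, line.rstrip('\r\n'))

spaced = "1 100 rs1 A G"                         # space-separated record
tabbed = "1\t100\trs1\tA\tG\tDESC=two words"     # space inside the last column

print(split_record(spaced))
# ['1', '100', 'rs1', 'A', 'G']
print(split_record(tabbed, strict_whitespace=True))
# ['1', '100', 'rs1', 'A', 'G', 'DESC=two words']
```

Permissive mode keeps space-separated files readable, while strict mode keeps a space inside INFO from being treated as a column boundary.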

James Casbon jamescasbon closed this November 27, 2012

Showing 1 unique commit by 1 author.

Nov 14, 2012
Marco Falcioni Changed the rule to split records into columns
According to the specification, the columns must be tab-separated. I encountered a VCF file from NCBI that has spaces in the INFO column, which caused PyVCF to fail.
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
df5c1e8

Showing 1 changed file with 1 addition and 1 deletion.

vcf/parser.py
@@ -437,7 +437,7 @@ def _parse_alt(self, str):
     def next(self):
         '''Return the next record in the file.'''
         line = self.reader.next()
-        row = re.split('\t| +', line)
+        row = re.split('\t+', line)
         chrom = row[0]
         if self._prepend_chr:
             chrom = 'chr' + chrom