Fix bug that removes sample data when GT field is not present
Some programs (e.g. bcftools) may output VCF files whose samples lack a GT
field value. When dealing with such files, PyVCF's writer previously
removed all non-GT data and replaced it with './.', because the
`_format_sample` function returned immediately upon failing to find GT data.

This fix addresses the issue, so that the Writer keeps any non-GT data intact.
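
To illustrate the problem, here is a minimal standalone sketch of the pre-fix behavior. It does not use PyVCF itself: the `CallData` and `Sample` namedtuples and the `stringify` and `old_format_sample` helpers are hypothetical stand-ins, written under the assumption that sample data is stored as a namedtuple keyed by the FORMAT fields.

```python
from collections import namedtuple

# Hypothetical stand-in for a sample whose FORMAT is "GQ:DP" (no GT field).
CallData = namedtuple('CallData', ['GQ', 'DP'])
Sample = namedtuple('Sample', ['data'])


def stringify(x, none='.'):
    # Simplified stand-in for the writer's _stringify: missing values become '.'
    return str(x) if x is not None else none


def old_format_sample(sample):
    # Mirrors the pre-fix logic: return './.' as soon as GT cannot be found.
    if getattr(sample.data, 'GT', None) is None:
        return './.'
    return ':'.join(stringify(x) for x in sample.data)


sample = Sample(data=CallData(GQ=48, DP=12))
print(old_format_sample(sample))  # prints ./. so the GQ and DP values are lost
```

With a FORMAT of `GQ:DP`, the expected sample column would be `48:12`, but the early return discards both values.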
bow committed Jan 30, 2013
1 parent 3540bb7 commit e3e5484
Showing 1 changed file with 24 additions and 3 deletions.
27 changes: 24 additions & 3 deletions vcf/parser.py
@@ -620,9 +620,30 @@ def _format_info(self, info):
         return ';'.join([self._stringify_pair(x,y) for x, y in info.iteritems()])
 
     def _format_sample(self, fmt, sample):
-        if getattr(sample.data, 'GT', None) is None:
-            return "./."
-        return ':'.join([self._stringify(x) for x in sample.data])
+        try:
+            # Try to get the GT value first.
+            gt = getattr(sample.data, 'GT')
+            # PyVCF stores './.' GT values as None, so we need to revert it back
+            # to './.' when writing.
+            if gt is None:
+                gt = './.'
+        except AttributeError:
+            # Failing that, try to check whether 'GT' is specified in the FORMAT
+            # field. If yes, use the recommended empty value ('./.')
+            if 'GT' in fmt:
+                gt = './.'
+            # Otherwise use an empty string as the value
+            else:
+                gt = ''
+        # If gt is an empty string (i.e. not stored), write all other data
+        if not gt:
+            return ':'.join([self._stringify(x) for x in sample.data])
+        # Otherwise use the GT values from above and combine it with the rest of
+        # the data.
+        # Note that this follows the VCF spec, where GT is always the first
+        # item whenever it is present.
+        else:
+            return ':'.join([gt] + [self._stringify(x) for x in sample.data[1:]])
 
     def _stringify(self, x, none='.', delim=','):
         if type(x) == type([]):
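
The three cases handled by the new `_format_sample` logic can be exercised outside PyVCF with a sketch like the one below. The `Sample`, `NoGT`, and `WithGT` namedtuples and the `stringify` and `new_format_sample` functions are hypothetical stand-ins rather than PyVCF's own classes, and `fmt` is assumed to be the raw colon-delimited FORMAT string.

```python
from collections import namedtuple

# Hypothetical stand-ins for PyVCF's sample and call-data objects.
Sample = namedtuple('Sample', ['data'])
NoGT = namedtuple('NoGT', ['GQ', 'DP'])
WithGT = namedtuple('WithGT', ['GT', 'GQ'])


def stringify(x, none='.'):
    # Simplified stand-in for the writer's _stringify.
    return str(x) if x is not None else none


def new_format_sample(fmt, sample):
    # Mirrors the post-fix logic from the hunk as a free function.
    try:
        gt = getattr(sample.data, 'GT')
        if gt is None:
            # A stored None is written back as './.'
            gt = './.'
    except AttributeError:
        # No GT attribute: fall back on whether FORMAT declares GT.
        gt = './.' if 'GT' in fmt else ''
    if not gt:
        # GT is not stored at all, so write every field as-is.
        return ':'.join(stringify(x) for x in sample.data)
    # GT is the first item, so the remaining fields start at index 1.
    return ':'.join([gt] + [stringify(x) for x in sample.data[1:]])


no_gt = new_format_sample('GQ:DP', Sample(NoGT(GQ=48, DP=12)))
called = new_format_sample('GT:GQ', Sample(WithGT(GT='0/1', GQ=48)))
uncalled = new_format_sample('GT:GQ', Sample(WithGT(GT=None, GQ=48)))
print(no_gt)     # 48:12
print(called)    # 0/1:48
print(uncalled)  # ./.:48
```

The first case is the one the old code broke: with no GT field, every other value is now written out unchanged.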