Fix bug that removes sample data when GT field is not present

Some programs (e.g. bcftools) may output VCF files whose samples have no GT
field value. Previously, when dealing with files like these, PyVCF's writer
discarded all non-GT data and wrote './.' in its place, because the
`_format_sample` function returned immediately upon failing to find GT data.

This fix addresses the issue, so that the Writer keeps any non-GT data intact.
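
To illustrate the data loss described above, here is a minimal standalone sketch of the pre-fix check. The `CallData` namedtuple and the `old_format_sample` helper are hypothetical stand-ins for PyVCF's per-sample data and the old `_format_sample` method, and the PL and DP values are made up for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for per-sample call data with no GT field.
CallData = namedtuple('CallData', ['PL', 'DP'])


def old_format_sample(data):
    """Mirror of the pre-fix logic: no GT value means everything is dropped."""
    if getattr(data, 'GT', None) is None:
        return './.'
    return ':'.join(str(x) for x in data)


sample = CallData(PL='0,3,30', DP=12)
old_output = old_format_sample(sample)
print(old_output)  # the PL and DP values are lost
```

Because the sample has no GT attribute at all, the early return fires and the PL and DP values never reach the output.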
1 parent 3540bb7, commit e3e54843fb6b0b88b97a4d5ea936d9b64e69a2d6, bow committed Jan 29, 2013
Showing with 24 additions and 3 deletions.
  1. +24 −3 vcf/parser.py
@@ -620,9 +620,30 @@ def _format_info(self, info):
         return ';'.join([self._stringify_pair(x,y) for x, y in info.iteritems()])
 
     def _format_sample(self, fmt, sample):
-        if getattr(sample.data, 'GT', None) is None:
-            return "./."
-        return ':'.join([self._stringify(x) for x in sample.data])
+        try:
+            # Try to get the GT value first.
+            gt = getattr(sample.data, 'GT')
+            # PyVCF stores './.' GT values as None, so we need to revert it back
+            # to './.' when writing.
+            if gt is None:
+                gt = './.'
+        except AttributeError:
+            # Failing that, try to check whether 'GT' is specified in the FORMAT
+            # field. If yes, use the recommended empty value ('./.')
+            if 'GT' in fmt:
+                gt = './.'
+            # Otherwise use an empty string as the value
+            else:
+                gt = ''
+        # If gt is an empty string (i.e. not stored), write all other data
+        if not gt:
+            return ':'.join([self._stringify(x) for x in sample.data])
+        # Otherwise use the GT values from above and combine it with the rest of
+        # the data.
+        # Note that this follows the VCF spec, where GT is always the first
+        # item whenever it is present.
+        else:
+            return ':'.join([gt] + [self._stringify(x) for x in sample.data[1:]])
 
     def _stringify(self, x, none='.', delim=','):
         if type(x) == type([]):
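
The patched logic can be tried outside PyVCF with a rough standalone sketch. It makes several assumptions: `format_sample` and `stringify` are simplified, hypothetical stand-ins for the `Writer` methods; the FORMAT value is assumed to be a colon-separated string and is split before the membership test (the patch tests `'GT' in fmt` directly); and the namedtuples imitate the per-sample call data. It is written for Python 3, whereas the original targets Python 2.

```python
from collections import namedtuple


def stringify(x, none='.', delim=','):
    """Simplified stand-in for Writer._stringify (an assumption)."""
    if isinstance(x, list):
        return delim.join(none if v is None else str(v) for v in x)
    return none if x is None else str(x)


def format_sample(fmt, data):
    """Standalone sketch of the patched _format_sample logic."""
    try:
        gt = data.GT
        # A missing genotype is stored as None; write it back as './.'.
        if gt is None:
            gt = './.'
    except AttributeError:
        # No GT attribute: emit './.' only if FORMAT declares GT.
        gt = './.' if 'GT' in fmt.split(':') else ''
    if not gt:
        # GT is not part of this record, so write every field as-is.
        return ':'.join(stringify(x) for x in data)
    # GT comes first when present, so the remaining fields are data[1:].
    return ':'.join([gt] + [stringify(x) for x in data[1:]])


WithGT = namedtuple('CallData', ['GT', 'DP'])
NoGT = namedtuple('CallData', ['PL', 'DP'])

called = format_sample('GT:DP', WithGT(GT='0/1', DP=14))
uncalled = format_sample('GT:DP', WithGT(GT=None, DP=None))
no_gt = format_sample('PL:DP', NoGT(PL=[0, 3, 30], DP=12))
print(called, uncalled, no_gt)
```

The third call is the case this commit fixes: with no GT in the FORMAT, the PL and DP values are written out instead of being replaced by './.'.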
