
Order of INFO fields random #46

Closed
freeseek opened this Issue · 3 comments

2 participants

@freeseek

I am reporting something that is not strictly a bug, but I noticed that the order of the fields in the INFO column of the VCF file is essentially random. I am using PyVCF to add some new INFO fields to an existing VCF file (PyVCF works great for this purpose), and these new INFO fields are written out in random order. GATK's UnifiedGenotyper, by contrast, seems to output them neatly in alphabetical order. I suggest two possible options:
1) The order of the fields in the INFO column follows the order of the INFO fields in the header.
2) The fields are sorted alphabetically, both in the header and in the INFO column.
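The two proposed orderings can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not PyVCF code: `order_by_header` and `order_alphabetically` are hypothetical helpers, `info` is assumed to be a mapping of INFO ID to value for one record, and `header_ids` is assumed to be the list of INFO IDs in the order they appear in the header.

```python
from collections import OrderedDict


def order_by_header(info, header_ids):
    """Option 1 (hypothetical helper): emit INFO fields in header order.

    Fields that are not declared in the header are appended at the end,
    in the order they appear in `info`.
    """
    ordered = OrderedDict((k, info[k]) for k in header_ids if k in info)
    for k, v in info.items():
        if k not in ordered:
            ordered[k] = v
    return ordered


def order_alphabetically(info):
    """Option 2 (hypothetical helper): emit INFO fields sorted by ID."""
    return OrderedDict(sorted(info.items()))


header_ids = ["DP", "AF", "MQ"]
info = {"MQ": 60, "DP": 14, "AF": 0.5}
print(list(order_by_header(info, header_ids)))  # ['DP', 'AF', 'MQ']
print(list(order_alphabetically(info)))         # ['AF', 'DP', 'MQ']
```

Option 1 keeps the record consistent with whatever order the header already uses; option 2 is independent of the header but makes the output predictable across tools.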

@jamescasbon
Owner

If you update to 0.4.5, the fields will be in the order in which they are added (i.e. first those from the source file, then any you add manually).

The infos are stored in an OrderedDict, which you can reorder (see http://pyvcf.readthedocs.org/en/latest/API.html#vcf.Reader.infos) with something like:

from collections import OrderedDict
reader.infos = OrderedDict(sorted(reader.infos.items()))
@freeseek

The INFO fields in the header are listed in the order they are added, but when I call the write_record function, the fields in the INFO column still come out in random order. I have looked through the code, but I can't track down the problem. The INFO column is generated by this code:
def _format_info(self, info):
    return ';'.join(["%s=%s" % (x, self._stringify(y)) for x, y in info.iteritems()])
So the info.iteritems() iterator must not be yielding the fields in any fixed order (I am using 0.4.5).
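To see why the fix has to happen where `info` is built rather than in the formatter, here is a minimal Python 3 sketch of the same join. `format_info` is a hypothetical standalone stand-in for `_format_info`, with `items()` replacing the Python 2 `iteritems()` and plain `str` assumed in place of PyVCF's `_stringify`. Its output order is exactly the iteration order of the mapping passed in.

```python
from collections import OrderedDict


def format_info(info, stringify=str):
    # Hypothetical stand-in for _format_info: joins "KEY=value" pairs with ';'.
    # No sorting happens here, so the result follows the iteration order of `info`.
    return ';'.join('%s=%s' % (k, stringify(v)) for k, v in info.items())


info = OrderedDict()
info["DP"] = 14
info["AF"] = 0.5
info["MQ"] = 60
print(format_info(info))  # DP=14;AF=0.5;MQ=60
```

Under Python 2 a plain `dict` iterates in arbitrary order, which explains the apparently random INFO column; an `OrderedDict` (or, on Python 3.7+, a built-in `dict`) iterates in insertion order.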

@freeseek

Changing "retdict = {}" to "retdict = OrderedDict()" in parser.py does the job. Still, it would be nice if the order in the INFO column followed the order in the header, instead of being determined along a separate code path.
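The effect of switching `retdict` to an `OrderedDict` can be illustrated with a minimal INFO-string parser. This is a hypothetical sketch, not PyVCF's actual parser: `parse_info` keeps every value as a string (the real parser converts values using the header's Type and Number) and assumes flags are entries without an `=`. Because the dictionary remembers insertion order, writing the fields back out reproduces the source order.

```python
from collections import OrderedDict


def parse_info(info_str):
    """Hypothetical minimal parser: 'K=V;FLAG;...' -> OrderedDict in source order.

    Values stay as strings; flag entries (no '=') map to True.
    """
    retdict = OrderedDict()
    if info_str == '.':  # '.' means the INFO column is empty
        return retdict
    for entry in info_str.split(';'):
        key, sep, value = entry.partition('=')
        retdict[key] = value if sep else True
    return retdict


fields = parse_info("MQ=60;DB;DP=14;AF=0.5")
print(list(fields))  # ['MQ', 'DB', 'DP', 'AF']
# Writing the fields back out preserves the original order.
print(';'.join(k if v is True else '%s=%s' % (k, v) for k, v in fields.items()))
# MQ=60;DB;DP=14;AF=0.5
```

Note that this preserves the order of each record's source line, which is not necessarily the header order raised above.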

@jamescasbon jamescasbon closed this issue from a commit
James Casbon store order of INFO column, closes #46 ccecafd
@gotgenes gotgenes referenced this issue from a commit in gotgenes/PyVCF
James Casbon store order of INFO column, closes #46 e1a8e6b