Order of INFO fields random #46

Closed
freeseek opened this Issue Jun 11, 2012 · 3 comments


I am reporting something that is not really a bug, but I noticed that the order of the fields in the INFO column of the VCF file is essentially random. I ran into this while using PyVCF to populate some new INFO fields in an existing VCF file (PyVCF works great for this purpose): the new INFO fields are inserted in random order. GATK's UnifiedGenotyper, by contrast, seems to output them neatly in alphabetical order. I suggest two possible options:

  1. The order of the fields in the INFO column follows the order of the INFO definitions in the header.
  2. The fields are sorted alphabetically, both in the header and in the INFO column.
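
Both options can be sketched as one formatting helper. This is a minimal illustration, not PyVCF code: `format_info_ordered` and `stringify` are hypothetical names, and `header_ids` is assumed to be the list of INFO IDs in header order (for instance, the keys of PyVCF's `reader.infos`). Flags (value `True`) are written as a bare key and an empty column as `.`, following the VCF format.

```python
def stringify(value):
    """Hypothetical helper: lists become comma-separated, other values use str()."""
    if isinstance(value, (list, tuple)):
        return ",".join(str(v) for v in value)
    return str(value)


def format_info_ordered(info, header_ids=None):
    """Build an INFO column string with a deterministic field order.

    Option 1: header_ids given -> header order first, then any fields
              missing from the header, alphabetically.
    Option 2: header_ids is None -> all fields alphabetically.
    """
    if header_ids is None:
        keys = sorted(info)
    else:
        in_header = [k for k in header_ids if k in info]
        not_in_header = sorted(k for k in info if k not in header_ids)
        keys = in_header + not_in_header
    parts = []
    for key in keys:
        value = info[key]
        # Flag fields carry no value, so only the key is written
        parts.append(key if value is True else "%s=%s" % (key, stringify(value)))
    return ";".join(parts) if parts else "."


info = {"DP": 14, "AF": [0.5], "NS": 3}
print(format_info_ordered(info, ["NS", "DP", "AF"]))  # NS=3;DP=14;AF=0.5
print(format_info_ordered(info))                      # AF=0.5;DP=14;NS=3
```

Appending undeclared fields at the end, sorted, is just one possible choice; it keeps the output deterministic even when a record carries a key the header does not define.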

jamescasbon Jun 11, 2012

Owner

If you update to 0.4.5, the fields will be ordered as they are added (i.e. first those from the source file, then any you add manually).

The infos are stored in an OrderedDict, which you can reorder (see http://pyvcf.readthedocs.org/en/latest/API.html#vcf.Reader.infos) with something like:

```python
from collections import OrderedDict
reader.infos = OrderedDict(sorted(reader.infos.items()))
```
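
To see what that one-liner does without opening a VCF file, here is a minimal sketch in which a plain `OrderedDict` stands in for `reader.infos`. This is an assumption for illustration: PyVCF stores `Info` records as the values, replaced here by description strings. Sorting `items()` compares `(ID, definition)` tuples, and since IDs are unique the result is ordered by ID alone.

```python
from collections import OrderedDict

# Stand-in for reader.infos: INFO ID -> header definition.
# In PyVCF the values are Info records; plain strings are used here.
infos = OrderedDict()
infos["NS"] = "Number of Samples With Data"
infos["DP"] = "Total Depth"
infos["AF"] = "Allele Frequency"

# sorted() on items() compares tuples, so entries are ordered by ID
infos = OrderedDict(sorted(infos.items()))
print(list(infos))  # ['AF', 'DP', 'NS']
```

This reorders the header definitions only; as the follow-up in this thread shows, the INFO column of each written record follows the order of that record's own INFO dict.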

freeseek Jun 11, 2012

The INFO fields in the header are listed in the order they were added, but in the INFO column they still come out in random order when I call the write_record function. I have tried to look at the code, but I can't really track down the problem. The INFO column is generated by this code:
```python
def _format_info(self, info):
    return ';'.join(["%s=%s" % (x, self._stringify(y)) for x, y in info.iteritems()])
```
This suggests that the `info` dict iterated by `info.iteritems()` is not ordered (I am using 0.4.5).
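
The method simply walks whatever mapping it is handed, so the column order is the dict's iteration order. Under Python 2, which this code targets (`iteritems`), a plain `dict` iterates in an arbitrary, hash-dependent order; in Python 3.7 and later, plain dicts keep insertion order. The sketch below is a Python 3 rendering of the quoted method as a standalone function, with `str()` standing in for `_stringify` (a simplification), fed an `OrderedDict`.

```python
from collections import OrderedDict


def format_info(info):
    """Standalone version of the quoted _format_info (str() replaces _stringify).
    Fields are emitted in exactly the order that iterating `info` yields."""
    return ";".join("%s=%s" % (key, value) for key, value in info.items())


record_info = OrderedDict([("NS", 3), ("DP", 14)])
record_info["MYNEW"] = 1  # hypothetical field added after parsing
print(format_info(record_info))  # NS=3;DP=14;MYNEW=1
```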

freeseek Jun 11, 2012

Changing `retdict = {}` to `retdict = OrderedDict()` in parser.py does the job. Still, it would be nice if the order in the INFO column reflected the order in the header, instead of being determined through a separate code path.
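
A minimal sketch of that change, assuming a simplified parser: `parse_info` is a hypothetical stand-in, not PyVCF's actual parsing method, and it keeps values as strings instead of converting them according to the header. Only the `retdict = OrderedDict()` line mirrors the fix described in the comment.

```python
from collections import OrderedDict


def parse_info(info_field):
    """Hypothetical simplified INFO parser that keeps the file's field order.

    Values stay strings (comma-separated values become lists) and no
    header-driven type conversion is done, unlike PyVCF's real parser.
    """
    retdict = OrderedDict()  # was `retdict = {}`: arbitrary order in Python 2
    if info_field == ".":
        return retdict  # "." means the INFO column is empty
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        if not sep:
            retdict[key] = True  # Flag field: key with no value
        elif "," in value:
            retdict[key] = value.split(",")
        else:
            retdict[key] = value
    return retdict


info = parse_info("NS=3;DP=14;AF=0.5,0.25;DB")
info["MYNEW"] = "1"  # fields added later land after the parsed ones
print(list(info))  # ['NS', 'DP', 'AF', 'DB', 'MYNEW']
```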
