reading and then writing doesn't produce byte identical output? #251

Open
eroller opened this issue Aug 24, 2016 · 2 comments

Comments

@eroller

eroller commented Aug 24, 2016

I think this should be a requirement. I noticed the following differences from just reading in and writing out a vcf file:

  1. header lines are in different order
  2. INFO fields are in different order
  3. a trailing ".0" is added to numeric fields (e.g. 2 becomes 2.0)
  4. truncated sample genotype columns are expanded to a missing "." for each field (e.g. "." becomes ".:.:.:.:.:." depending on the number of fields)

None of these breaks the VCF specification, but I hope this library can be updated to meet this requirement. When I modify a VCF using this library, I want to see only the changes that I made, which means all these other trivial modifications are just noise.
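
To check whether a round trip introduced only this kind of noise, something like the following could help. It is a rough sketch using only the standard library, not part of this library's API: `normalize_vcf_line` and `roundtrip_noise` are hypothetical helper names, and the normalization only covers the four differences listed above (numbers are only normalized inside INFO, not in QUAL or sample fields).

```python
import difflib


def _normalize_number(text):
    """Render "2" and "2.0" identically; leave non-numeric text untouched."""
    try:
        return repr(float(text))
    except ValueError:
        return text


def normalize_vcf_line(line):
    """Reduce a VCF line to a form that ignores the cosmetic differences above."""
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line
    cols = line.split("\t")
    # INFO is the 8th column: sort its entries and normalize numeric values.
    if len(cols) > 7:
        entries = []
        for entry in cols[7].split(";"):
            key, sep, value = entry.partition("=")
            if sep:
                value = ",".join(_normalize_number(v) for v in value.split(","))
            entries.append(key + sep + value)
        cols[7] = ";".join(sorted(entries))
    # Sample columns start at the 10th column: drop trailing missing fields,
    # so that ".:.:." and "." compare equal.
    for i in range(9, len(cols)):
        fields = cols[i].split(":")
        while len(fields) > 1 and fields[-1] == ".":
            fields.pop()
        cols[i] = ":".join(fields)
    return "\t".join(cols)


def roundtrip_noise(original_text, rewritten_text):
    """Return (cosmetic_only, diff_lines) for two VCF texts."""
    orig = original_text.splitlines()
    new = rewritten_text.splitlines()
    diff = list(difflib.unified_diff(orig, new, lineterm=""))

    def canonical(lines):
        # "##" meta lines are compared regardless of order.
        header = sorted(l for l in lines if l.startswith("##"))
        body = [normalize_vcf_line(l) for l in lines if not l.startswith("##")]
        return header, body

    return canonical(orig) == canonical(new), diff
```

Passing the text of the input file and the text of the written file to `roundtrip_noise` gives a boolean saying whether every difference falls into the categories above, plus the raw unified diff for inspection.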

@martijnvermaat
Collaborator

Hi @eroller, thanks for the report!

While I'm not particularly concerned about this (especially the first two seem purely cosmetic), I can see that this would be nice to have. I'd review and merge PRs for this if they are not too invasive otherwise.

@jamescasbon
Owner

This is, I believe, a very hard requirement. I think it needs a source map plus change tracking.

VCF writing was never really done properly, so this would possibly need a lot of changes to the parser as well.
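
As a rough illustration of what "source map plus change tracking" could mean here, the sketch below keeps the original text of every column and only substitutes new text for columns that were explicitly changed. It is a standalone toy under stated assumptions, not how the parser currently works: `TrackedRecord` and its methods are hypothetical names, and it handles only tab-separated data lines with the eight fixed columns addressable by name.

```python
# Names of the eight fixed VCF columns, in file order.
COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


class TrackedRecord:
    """A VCF data line that remembers its original text (hypothetical helper)."""

    def __init__(self, raw_line):
        # Source map: the original text of each column, exactly as read.
        self._raw_cols = raw_line.rstrip("\n").split("\t")
        # Change tracking: column index -> replacement text.
        self._changes = {}

    def get(self, name):
        index = COLUMNS.index(name)
        return self._changes.get(index, self._raw_cols[index])

    def set(self, name, value):
        index = COLUMNS.index(name)
        text = str(value)
        if text == self._raw_cols[index]:
            # Setting a column back to its original text is not a change.
            self._changes.pop(index, None)
        else:
            self._changes[index] = text

    def changed_columns(self):
        return [COLUMNS[i] for i in sorted(self._changes)]

    def to_line(self):
        # Untouched columns are emitted byte for byte as they were read.
        return "\t".join(
            self._changes.get(i, raw) for i, raw in enumerate(self._raw_cols)
        )
```

With this shape, an unmodified record is written back identically, and a record with one edited column differs from its input in that column only. A real implementation would also have to do the same for header lines and for parsed (typed) values, which is where most of the parser changes would come from.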
