Support VCF 4.1 #17

Closed
jamescasbon opened this issue Feb 9, 2012 · 10 comments

@jamescasbon
Owner

Some new metadata in VCF 4.1 spec, notably contigs.

Added test data and tests, need to write code for this.
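For reference, a minimal sketch of what parsing a VCF 4.1 `##contig` header line could look like. The helper name `parse_contig_line` and the sample line are illustrative assumptions, not PyVCF's actual code, and the naive comma split would not cope with quoted values that themselves contain commas.

```python
import re

# Hypothetical helper for illustration; not PyVCF's actual parser.
CONTIG_PATTERN = re.compile(r"##contig=<(?P<body>.*)>")


def parse_contig_line(line):
    """Turn a ##contig=<key=value,...> header line into a dict."""
    match = CONTIG_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("not a contig header line: %r" % line)
    fields = {}
    # Naive split: assumes no commas inside quoted values.
    for item in match.group("body").split(","):
        key, _, value = item.partition("=")
        fields[key] = value
    # The length, when present, is an integer number of bases.
    if "length" in fields:
        fields["length"] = int(fields["length"])
    return fields


contig = parse_contig_line("##contig=<ID=20,length=62435964,assembly=B36>")
print(contig["ID"], contig["length"])
```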

@dzerbino

dzerbino commented Jun 7, 2012

Hello James,

I just started working with your PyVCF library (I had a small quick and dirty one, not quite so complete and clean as yours) and am adding code for VCF 4.1.

I have already added the new INFO and FORMAT keywords, parsing of the ALT lines in the header, and support for rearrangement breakends in the ALT column. I still have to add unit tests, though...
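As a rough illustration of the breakend case, here is a sketch that recognises the four bracketed ALT forms from the VCF 4.1 spec (`t[p[`, `t]p]`, `]p]t`, `[p[t`). The function name `parse_breakend` and the returned keys are assumptions made for this example; they are not taken from PyVCF.

```python
import re

# Hypothetical sketch; names and return shape are not PyVCF's API.
# A breakend ALT has local bases on exactly one side of a pair of
# identical brackets that enclose the mate position "chrom:pos".
BREAKEND_PATTERN = re.compile(
    r"^(?P<lead>[^\[\]]*)"         # local bases before the brackets, if any
    r"(?P<bracket>[\[\]])"         # opening bracket, either [ or ]
    r"(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P=bracket)"                # the same bracket character again
    r"(?P<trail>[^\[\]]*)$"        # local bases after the brackets, if any
)


def parse_breakend(alt):
    """Return a dict describing a breakend ALT, or None if it is not one."""
    match = BREAKEND_PATTERN.match(alt)
    if match is None:
        return None
    lead, trail = match.group("lead"), match.group("trail")
    # Exactly one side must carry the local sequence.
    if bool(lead) == bool(trail):
        return None
    return {
        "mate_chrom": match.group("chrom"),
        "mate_pos": int(match.group("pos")),
        "bracket": match.group("bracket"),
        "sequence": lead or trail,
        # True when the local bases come before the bracketed mate.
        "sequence_first": bool(lead),
    }


print(parse_breakend("G]17:198982]"))
print(parse_breakend("A"))
```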

Anyways, just letting you know what I am up to.

Cheers,

Daniel

PS Your code is real clean and structured, a pleasure to work with.

@jamescasbon
Owner Author

Thanks so much @dzerbino #42 and @martijnvermaat #28 for the initial work on this.

HEAD is now set up as 0.5.0-pre. I want to get full 4.1 support into the 0.5.0 release.

Currently the only ticket is #20, but I'll leave off cutting the release until people have taken the SV code for a spin.

@jamescasbon
Owner Author

Please see the proposal on #48 for multiple classes for ALTs

@jamescasbon
Owner Author

Anyone think I should release this?

The Python 3 tests are currently broken (http://travis-ci.org/#!/jamescasbon/PyVCF/jobs/1702018); I'm not sure why.

@dzerbino

Hello James,

I tried to address these issues, but at the end of the day it's a backward compatibility issue between Python 3.x and 2.x. This is a matter of policy, but maintaining compatibility would probably involve two separate distributions. It's probably best to stick to Python 2.7, just like the BioPython project did.

Regards,

Daniel

@jamescasbon
Owner Author

Thanks @dzerbino, but more importantly: do you think the SV work is ready for a release?

@dzerbino

Oh, sorry: it definitely covers the SV spec. There are a few usage details which people may or may not find convenient:

  • Substitutions are automatically classified as SNV or MNV. This is consistent with common practice, but some people might find this pedantic?
  • Internally, when storing INFO and FORMAT lines, the reader stores the expected number of values as None, -2 (for the number of genotypes in a call), -1 (for the number of alleles in the record) or a strictly positive integer. It is doubtful that anyone (other than someone coding a format validator, that is) will ever refer to these parameters, but this hard-coded pairing could catch someone off guard. We could either allow this value to be a string and store 'A' and 'G' directly, or store a C-like enumeration...
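To make the first point concrete, a minimal sketch of classifying a substitution by allele length follows. The function `classify_substitution` and its rules (equal-length REF and ALT made only of A, C, G, T or N) are assumptions for illustration; the thread does not show how PyVCF actually decides.

```python
# Hypothetical sketch; not the classification code used in PyVCF.
BASES = set("ACGTN")


def classify_substitution(ref, alt):
    """Return 'SNV', 'MNV', or None when ref/alt is not a plain substitution."""
    # A substitution replaces bases one-for-one, so lengths must match.
    if not ref or len(ref) != len(alt):
        return None
    # Symbolic alleles such as <DEL> or breakends are not substitutions.
    if not (set(ref.upper()) <= BASES and set(alt.upper()) <= BASES):
        return None
    return "SNV" if len(ref) == 1 else "MNV"


print(classify_substitution("A", "G"))
print(classify_substitution("AT", "GC"))
print(classify_substitution("A", "AT"))
```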

@lennax

lennax commented Jun 26, 2012

@dzerbino Re: the A/G integer-casting, see my recent change to store those associations in a dictionary for more foolproof conversion. But I see no obvious problem with storing them as strings, consistent with duck typing. Like you point out, it's unlikely that a casual user will refer directly to those values.
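A dictionary-based conversion of the kind discussed here might look like the sketch below. The sentinel values -1 and -2 come from the thread; the names `NUMBER_CODES`, `encode_number` and `decode_number`, the mapping of `.` to None, and the acceptance of 0 are assumptions for this example rather than PyVCF's actual code.

```python
# Hypothetical sketch of the A/G <-> integer pairing described above.
# -1 and -2 are the sentinels mentioned in the thread; mapping "." to
# None is an assumption about what the None case stands for.
NUMBER_CODES = {".": None, "A": -1, "G": -2}

# Reverse lookup so stored codes can be written back out as header text.
NUMBER_LABELS = {code: label for label, code in NUMBER_CODES.items()}


def encode_number(raw):
    """Convert the Number field of an INFO/FORMAT header line to its stored code."""
    if raw in NUMBER_CODES:
        return NUMBER_CODES[raw]
    value = int(raw)  # raises ValueError for anything else that is not an integer
    if value < 0:
        raise ValueError("Number must not be negative: %r" % raw)
    return value


def decode_number(code):
    """Convert a stored code back to the string used in the header."""
    return NUMBER_LABELS.get(code, str(code))


print(encode_number("A"), encode_number("G"), encode_number("2"))
print(decode_number(-1), decode_number(None), decode_number(2))
```

Keeping the pairing in one dictionary means the sentinels are defined in a single place; storing the strings 'A' and 'G' directly, as suggested above, would remove the need for the lookup altogether.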

@dzerbino

@lennax Thanks, I had missed that change!

@jamescasbon
Owner Author

Thanks for the input, I'm closing.
