Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Alt classes #48

Merged
merged 6 commits into from Jun 15, 2012

Conversation

Projects
None yet
3 participants

Hello James,

It occurred to me that the changes I made to your code work well if you are simply parsing/editing/re-printing VCF records, but not very handy for actually generating files.

I have therefore re-cast the _AltRecord as a standalone object (no more subclassing str) and defined a proper tree of subclasses (Substitution, SV, SingleBreakend, PairedBreakend). This is far cleaner code wise, but means that you can no longer compare allele call to a string as in:

alt == "A"

but you would need to use its members, as in:

alt.type == "SNV" and alt.sequence == "A"

which is a bit heavier.

Anyways, this is simply a proposition, let me know what you think.

Regards,

Daniel

Owner

jamescasbon commented Jun 14, 2012

I think this is going in the right direction, I'm hoping anyone else might give their opinion on an API breaking change.

@jamescasbon jamescasbon referenced this pull request Jun 14, 2012

Closed

Support VCF 4.1 #17

You could provide a custom comparison on the _Substitution class to allow backwards-compatible string comparisons:

class _Substitution(_AltRecord):
    def __cmp__(self, other):
         if hasattr(other, "sequence"):
             return cmp((self.type, self.sequence), (other.type, other.sequence))
         else if isinstance(other, basestring):
             return cmp(str(self), other)
         else:
              return cmp(self, other)

How are small indels handled with the proposed scheme?

That's a good idea, Brad, thanks a lot.

Regarding small indels, I'll just copy the example from the spec:
20 2 . TCG T . PASS DP=100

It is essentially a substitution, but with a play on the ref and alt sequences. In this scheme, I think that the API should mimic the logic of the spec and also use _Substitution to represent small indels. There is a function is_indel to tell you whether a substitution record is an indel (essentially comparing the lengths of the ref and alt strings).

OK, it works perfectly now. You can do alt == "A" or even alt in ['A', 'T', 'G', 'C']

However, you still need to check that you are working with a _Substitution object.

Daniel,
Thanks for the clarification on the indels. Apologies, I misread MNV as MNP and thought you might need an extra object for these. It makes good sense now.

I like the updated approach. It's good to have support for structural variants but not at the price of making the more common variant case harder. Good stuff.

Owner

jamescasbon commented Jun 15, 2012

A little thought needs to be given to the interaction between AltRecord and all these methods:
https://github.com/jamescasbon/PyVCF/blob/master/vcf/parser.py#L426

i.e. at the moment, we do:

if record.is_snp()

whereas it should be:

if len(record.ALT) == 1 and record.ALT[0].is_snp()

Although, I will add a convenience to record to check if it is has a single ALT.

jamescasbon pushed a commit that referenced this pull request Jun 15, 2012

@jamescasbon jamescasbon merged commit a99a4c4 into jamescasbon:master Jun 15, 2012

I agree that some double examination will be necessary, however I would like to point out a subtlety in the API: if a _Substitution has length 1, it's type in "SNV", else it is "MNV".

Thanks for looking into my changes!

gotgenes pushed a commit to gotgenes/PyVCF that referenced this pull request May 13, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment