Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BaseBreakendVariant #21

Closed
julesjacobsen opened this issue Dec 17, 2020 · 10 comments
Closed

Add BaseBreakendVariant #21

julesjacobsen opened this issue Dec 17, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@julesjacobsen
Copy link
Contributor

@ielis didn't you need to implement your own version of a BreakendVariant? This would have been useful right?

@ielis
Copy link
Collaborator

ielis commented Dec 17, 2020

Yes, I resorted to implementing BreakendedSvannaVariant from scratch. The class implements SvannaVariant, Breakended, where SvannaVariant interface extends Variant and some other interfaces that add bits specific to SvAnna.

I think having BaseBreakendedVariant would be useful.

@julesjacobsen
Copy link
Contributor Author

OK will add one. Will need one for Exomiser too...

@julesjacobsen julesjacobsen added the enhancement New feature or request label Jan 6, 2021
@julesjacobsen
Copy link
Contributor Author

In progress on this one. So far I've made a few changes such as:

  • Rename BreakendVariant to DefaultBreakendVariant
  • Rename Breakended to BreakendVariant
  • Change signature of BreakendVariant to BreakendVariant extends Variant
  • Delete BreakendVariant.mateId() as this is only relevant to the Breakend

@ielis I have a question which I want to double-check something. Can you confirm that the trailingRef variable is never used? From what I can see it's only ever set as an empty string in the constructor, so its a bit of legacy code which has been replaced with the existence of the VcfBreakendFormatter and soon the VcfBreakendVariantParser (issue #36)

/**
 * When variant is converted to the opposite strand, then the reverse complemented ref allele is stored here.
 * We need to store the allele in order to reconstruct the original ref allele if variant is converted again to the
 * original strand.
 */
private final String trailingRef;

@Override
public DefaultBreakendVariant toOppositeStrand() {
    return new DefaultBreakendVariant(eventId, right.toOppositeStrand(), left.toOppositeStrand(),
            Seq.reverseComplement(trailingRef), Seq.reverseComplement(ref), Seq.reverseComplement(alt));
}

works like this:

org.monarchinitiative.variant.api.impl.DefaultBreakendVariantTest,toOppositeStrand

// 13  123457  bnd_X   A   [17:198983[A    6   PASS    SVTYPE=BND;MATEID=bnd_Z;EVENT=tra3

Original: BreakendVariant{eventId='tra3', left=Breakend{contig=13, id='bnd_X', strand=-, coordinateSystem=FULLY_CLOSED, startPosition=115046422, endPosition=115046422}, right=Breakend{contig=17, id='bnd_Z', strand=+, coordinateSystem=FULLY_CLOSED, startPosition=198983, endPosition=198983}, ref=A, alt=''}
Opposite: BreakendVariant{eventId='tra3', left=Breakend{contig=17, id='bnd_Z', strand=-, coordinateSystem=FULLY_CLOSED, startPosition=83058459, endPosition=83058459}, right=Breakend{contig=13, id='bnd_X', strand=+, coordinateSystem=FULLY_CLOSED, startPosition=123457, endPosition=123457}, ref=, alt=''}

Is it still needed, or could be replaced with a constant empty string? I suspect this probably isn't always the case, for example if a toOppositeStranded BreakendVariant is copied e.g.

BreakendVariant transformed = BreakendVariant.builder().with(original.toOppositeStrand()).build().toOppositeStrand()

transformed should equal original

@julesjacobsen
Copy link
Contributor Author

julesjacobsen commented Jan 16, 2021

Just a note- might be easier to add a sequence to the Breakends, so it is always attached to the correct end and store the inserted sequence, if any, on the BreakendVariant although this is what the alt field currently does.

@julesjacobsen
Copy link
Contributor Author

See commit: 967ff30

@ielis
Copy link
Collaborator

ielis commented Jan 19, 2021

Hi @julesjacobsen ,
the trailingRef field is used in toOppositeStrand() method, where it is passed instead of ref. It's not a legacy code.

When creating a breakend variant from a VCF record, we have base sequence for ref allele for left breakend only. In case we convert the breakend variant to the opposite strand, we must set ref to an empty string, since we do not have the base sequence for the opposite strand. If we want to convert the variant back to the original strand, we will miss the ref allele if we do not store it. That's why we store the ref allele into trailingRef field and use the field during the strand conversion.

@julesjacobsen
Copy link
Contributor Author

julesjacobsen commented Jan 19, 2021

@ielis, it's really just a bit of a mess which is why I dislike it. This is related to the question in issue #39. The problem arises due to the inclusion of the 'left' ref base, but not that on the 'right', so when flipping the strands you end up losing information.

The solutions are:

  1. Keep a reference to the original and only show it when in the original configuration (current, moderate wtf score)
  2. Trim away the ref allele and only include any inserted sequence if present leading to either two empty ends or an empty end and an inserted end. Downside here is that it depends on people doing this and is inconsistent with the presentation of all other Variant types)
  3. Just return this when trying to do a BreakendVariant.toOppositeStrand(). This way nothing changes, and nothing is screwy. The reason being if you're to assess whether a breakend variant overlaps a region (i.e. disrupts the region), you'll need to check each Breakend independently by calling breakend.toStrand(targetRegion.strand()). This is my preferred solution and I think has the lowest wtf score.

@ielis
Copy link
Collaborator

ielis commented Jan 19, 2021

I agree that the solution 2. is not a good one as we should not expect the users to do much in order to make svart's classes work best.

Let's go for the solution 3 then, we can check the breakends independently.

@julesjacobsen
Copy link
Contributor Author

Great - will do 3 is also easiest time implement ;)

I think adding a Variant.isBreakend() method will help with this. Will need to check that isSymbolic() != isBreakend()

@julesjacobsen
Copy link
Contributor Author

Unfortunately BreakendVariant.isSymbolic() == true so will need to have a think about what defines 'symbolic' as opposed to breakend/non-symbolic. In VCF parlance I think a breakend is symbolic, so we'll need to think about wether to change the definition here.

It's possible to do a simple instanceof test, but these always feel a bit out of place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants