Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove overlaps not working as expected #2082

Open
mbhall88 opened this issue Jan 24, 2024 · 6 comments
Open

Remove overlaps not working as expected #2082

mbhall88 opened this issue Jan 24, 2024 · 6 comments

Comments

@mbhall88
Copy link
Contributor

I'm not sure if I am misunderstanding how bcftools +remove-overlaps works, but I still have some overlapping variants in the output afterwards

e.g.,

chromosome      28728   62e866df        T       A       .       PASS    .      GT      1/1
chromosome      28728   e4388ae3        T       TA      .       PASS    .    GT      1/1

and

chromosome      30586   60f6a01c        T       TAACAACACTTGG   .       PASS    .      GT      1/1
chromosome      30588   a1f621d4        G       A       .       PASS    .    GT      1/1
@pd3
Copy link
Member

pd3 commented Jan 24, 2024

These variants are not overlapping. For example, the insertion in the first example inserts A between the positions 28728 and 28729.

@mbhall88
Copy link
Contributor Author

Ahhh okay I see what you mean with the first example. But am struggling to understand how the second example isn't overlapping?

@pd3
Copy link
Member

pd3 commented Jan 24, 2024

It's exactly the same case. The first happens between 30586 and 30587, 30588 is not affected by the insertion.

@mbhall88
Copy link
Contributor Author

mbhall88 commented Jan 24, 2024

Alright, so I did some playing and testing and it seems the order of the variants impacts what bcftools consensus does and how bcftools views them in terms of overlapping variants.

As a way of illustrating, let's reframe the variants for clarity and provide an example fasta sequence

First example:

>chromosome
GTC
chromosome      2   62e866df        T       A       .       PASS    .      GT      1/1
chromosome      2   e4388ae3        T       TA      .       PASS    .    GT      1/1

bcftools consensus would turn this into

>chromosome
GAAC

BUT, if I switch the order of the variants I get

>chromosome
The site chromosome:2 overlaps with another variant, skipping...
GTAC

And indeed, if I run bcftools +remove-overlaps on the first example, with the order swapped, both variants are removed from the VCF.

Second example:

>chromosome
CGTAGT
chromosome      3   60f6a01c        T       TAACAACACTTGG   .       PASS    .      GT      1/1
chromosome      5   a1f621d4        G       A       .       PASS    .    GT      1/1

bcftools consensus would turn this into

>chromosome
CGTAACAACACTTGGAAT

The ordering situation from the first example obviously doesn't apply here given the positions are different.

tl;dr make sure you run bcftools +remove-overlaps before running bcftools consensus

@mbhall88
Copy link
Contributor Author

Me again.

I've just run into an example where bcftools +remove-overlaps doesn't deem the variants overlapping, but bcftools consensus does

chromosome_2    1561582 0cb9eac6        A       ATTTCTTTTGATAAGAAAGTATTAAGTG    .       PASS    .       GT      1/1
chromosome_2    1561582 4bc9851a        A       AT      .       PASS    .       GT      1/1

@mbhall88
Copy link
Contributor Author

Sorry to be a pest @pd3, not sure if this dropped off your radar (I imagine you get a lot of notifications)?

Is my last example a bug?

@pd3 pd3 reopened this Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants