Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge not behaving as expected for DUP #23

Closed
kmurphy902 opened this issue Aug 30, 2021 · 3 comments
Closed

merge not behaving as expected for DUP #23

kmurphy902 opened this issue Aug 30, 2021 · 3 comments

Comments

@kmurphy902
Copy link

Hi Melanie,

I have the following sample input from the GIAB trio processed using DRAGEN:

chr1    144905843       DRAGEN:GAIN:chr1:144905844-144907253    N       <DUP>   .       PASS    SVTYPE=CNV;END=144907253;REFLEN=1410    GT:SM:CN:BC:PE:QS:FT:DN ./1:1.33898:3:1:47,51:8:cnvQual ./1:1.29469:3:1:101,103:6:cnvQual       ./1:1.48387:3:1:50,58:17:PASS:Inherited
chr1    144907253       DRAGEN:GAIN:chr1:144907254-144987033    N       <DUP>   .       PASS    SVTYPE=CNV;END=144987033;REFLEN=79780   GT:SM:CN:BC:PE:QS:FT:DN ./1:2.06779:4:38:51,19:95:PASS  ./1:2.06284:4:38:103,43:105:PASS        ./1:2.06813:4:38:58,25:96:PASS:Inherited
chr1    144987033       DRAGEN:GAIN:chr1:144987034-144998774    N       <DUP>   .       PASS    SVTYPE=CNV;END=144998774;REFLEN=11741   GT:SM:CN:BC:PE:QS:FT:DN ./1:1.41905:3:11:19,7:24:PASS   ./1:1.27235:3:11:43,14:8:cnvQual        ./1:1.52716:3:11:25,11:35:PASS:Inherited

I installed jasmine today using conda, and I am using the following command:

jasmine file_list=GIAB.cnv_filtered_noLowDQ.vcf --normalize_type --allow_intrasample --output_genotypes --comma_filelist --nonlinear_dist --max_dist=5000 --ignore_strand out_file=jasmine_GIAB.cnv_filtered_noLowDQ.vcf

Merging is occurring (I am able to see events that were successfully merged in the output) but the only events that are merged are DEL. For example:

chr1    207541223       0_DRAGEN:LOSS:chr1:207541224-207542261  N       <DEL>   .       PASS    SVTYPE=;END=207542261;REFLEN=1038;SVLEN=-1038;STARTVARIANCE=2146376.000000;ENDVARIANCE=2301368.000000;AVG_LEN=-1560.333333;AVG_START
=207542733.666667;AVG_END=207544294.000000;VARCALLS=3;ALLVARS_EXT=(DRAGEN:LOSS:chr1:207541224-207542261,DRAGEN:LOSS:chr1:207542262-207544717,DRAGEN:LOSS:chr1:207544718-207545904);SUPP_VEC_EXT=1;IDLIST_EXT=DRAGEN:LOSS:chr1:207541
224-207542261;SUPP_EXT=1;STRANDS=??;SUPP_VEC=1;SUPP=1;SVMETHOD=JASMINE;IDLIST=DRAGEN:LOSS:chr1:207541224-207542261;INTRASAMPLE_IDLIST=DRAGEN:LOSS:chr1:207541224-207542261,DRAGEN:LOSS:chr1:207542262-207544717,DRAGEN:LOSS:chr1:207
544718-207545904    GT:IS:OT:DV:DR  0/1:.:CNV:0:.   ./.:.:CNV:0:.   0/1:.:CNV:0:.

All of my DUPs that should be merged are not being merged in the output, for example:

chr1    144905843       0_DRAGEN:GAIN:chr1:144905844-144907253  N       <DUP>   .       PASS    SVTYPE=;END=144907253;REFLEN=1410;SVLEN=1410;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=1410.000000;AVG_START=144905843.000
000;AVG_END=144907253.000000;VARCALLS=1;ALLVARS_EXT=(DRAGEN:GAIN:chr1:144905844-144907253);SUPP_VEC_EXT=1;IDLIST_EXT=DRAGEN:GAIN:chr1:144905844-144907253;SUPP_EXT=1;STRANDS=??;SUPP_VEC=1;SUPP=1;SVMETHOD=JASMINE;IDLIST=DRAGEN:GAI
N:chr1:144905844-144907253;INTRASAMPLE_IDLIST=DRAGEN:GAIN:chr1:144905844-144907253      GT:IS:OT:DV:DR  ./1:.:CNV:0:.   ./1:.:CNV:0:.   ./1:.:CNV:0:.
chr1    144907253       0_DRAGEN:GAIN:chr1:144907254-144987033  N       <DUP>   .       PASS    SVTYPE=;END=144987033;REFLEN=79780;SVLEN=79780;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=79780.000000;AVG_START=144907253.
000000;AVG_END=144987033.000000;VARCALLS=1;ALLVARS_EXT=(DRAGEN:GAIN:chr1:144907254-144987033);SUPP_VEC_EXT=1;IDLIST_EXT=DRAGEN:GAIN:chr1:144907254-144987033;SUPP_EXT=1;STRANDS=??;SUPP_VEC=1;SUPP=1;SVMETHOD=JASMINE;IDLIST=DRAGEN:
GAIN:chr1:144907254-144987033;INTRASAMPLE_IDLIST=DRAGEN:GAIN:chr1:144907254-144987033   GT:IS:OT:DV:DR  ./1:.:CNV:0:.   ./1:.:CNV:0:.   ./1:.:CNV:0:.
chr1    144987033       0_DRAGEN:GAIN:chr1:144987034-144998774  N       <DUP>   .       PASS    SVTYPE=;END=144998774;REFLEN=11741;SVLEN=11741;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=11741.000000;AVG_START=144987033.
000000;AVG_END=144998774.000000;VARCALLS=1;ALLVARS_EXT=(DRAGEN:GAIN:chr1:144987034-144998774);SUPP_VEC_EXT=1;IDLIST_EXT=DRAGEN:GAIN:chr1:144987034-144998774;SUPP_EXT=1;STRANDS=??;SUPP_VEC=1;SUPP=1;SVMETHOD=JASMINE;IDLIST=DRAGEN:
GAIN:chr1:144987034-144998774;INTRASAMPLE_IDLIST=DRAGEN:GAIN:chr1:144987034-144998774   GT:IS:OT:DV:DR  ./1:.:CNV:0:.   ./1:.:CNV:0:.   ./1:.:CNV:0:.

Any ideas? Thank you in advance!

@mkirsche
Copy link
Owner

Hi and thanks for your interest in Jasmine!

Unfortunately it seems like the kind of merging you are interested in is slightly different than what Jasmine does. If I understand your issue correctly, you have three nearby duplicated regions on chr1: 144905844-144907253,144907254-144987033, and 144987034-144998774. And you want to merge those into a single duplicate call spanning 144905844-144998774, meaning that the three original calls represent three parts of the same true SV.

On the other hand, Jasmine is more specialized for merging across samples and identifying cases where the exact same SV is represented slightly differently in different samples. When the --allow_intrasample flag is used, the same logic is applied to calls in the same sample, allowing Jasmine to capture cases where the exact same SV is output multiple times; the most common case of this is when two distinct sets of reads support the same SV call but with slightly different breakpoints. In doing this, Jasmine assumes that each SV call in the input VCFs represents a complete structural variant and "removes" duplicates in the same sample by merging them all into a single call. And because of this assumption about the input it is not able to combine multiple "parts" of the same variant into a single call.

More specifically, the reason the duplication calls are not merging is that Jasmine's distance threshold is based on both breakpoints. So when considering the first two variants, Jasmine represents them as (144905844, 1409 [the length of the SV call, or end - start]) and (144907254, 79779). Since the Euclidean distance between these points exceeds the distance threshold 5000, they are not merged. While increasing the threshold significantly would allow them to be merged, I would not recommend doing so since even if they are merged, the coordinates will not be correctly combined to represent the bigger range, and such a lenient threshold could also lead to false merges elsewhere.

I hope that helps clear things up a bit - please let me know if you have any further questions!

Melanie

@mschatz
Copy link
Collaborator

mschatz commented Aug 30, 2021 via email

@kmurphy902
Copy link
Author

That makes perfect sense. I was hoping to be able to use jasmine on DRAGEN output--their joint genotyper splits events up like this which is less than ideal for my purposes! Thanks again for getting back to me so quickly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants