merge not behaving as expected for DUP #23
Hi and thanks for your interest in Jasmine!

Unfortunately, the kind of merging you are interested in is slightly different from what Jasmine does. If I understand your issue correctly, you have three nearby duplicated regions on chr1: 144905844-144907253, 144907254-144987033, and 144987034-144998774. You want to merge those into a single duplication call spanning 144905844-144998774, meaning that the three original calls represent three parts of the same true SV.

Jasmine, on the other hand, is specialized for merging across samples and identifying cases where the exact same SV is represented slightly differently in different samples. When the --allow_intrasample flag is used, the same logic is applied to calls in the same sample, allowing Jasmine to capture cases where the exact same SV is output multiple times; the most common case of this is when two distinct sets of reads support the same SV call but with slightly different breakpoints. In doing this, Jasmine assumes that each SV call in the input VCFs represents a complete structural variant and "removes" duplicates in the same sample by merging them all into a single call. Because of this assumption about the input, it is not able to combine multiple "parts" of the same variant into a single call.

More specifically, the reason the duplication calls are not merging is that Jasmine's distance threshold is based on both breakpoints. When considering the first two variants, Jasmine represents them as (144905844, 1409 [the length of the SV call, or end - start]) and (144907254, 79779). Since the Euclidean distance between these points exceeds the distance threshold of 5000, they are not merged. While increasing the threshold significantly would allow them to be merged, I would not recommend doing so: even if they are merged, the coordinates will not be correctly combined to represent the bigger range, and such a lenient threshold could also lead to false merges elsewhere.
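To make the arithmetic concrete, here is a minimal sketch of the comparison described above, using the coordinates from this issue. The helper names (`as_point`, `euclidean`) and the `MAX_DIST` constant are hypothetical and chosen only for illustration; this is not Jasmine's code, and Jasmine's actual distance computation may differ in detail (for example, when options such as --nonlinear_dist are set).

```python
import math

MAX_DIST = 5000  # assumed to mirror the max_dist value used in this issue


def as_point(start, end):
    """Represent an SV call as (start, length), with length = end - start."""
    return (start, end - start)


def euclidean(p, q):
    """Plain Euclidean distance between two (start, length) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


dup1 = as_point(144905844, 144907253)  # (144905844, 1409)
dup2 = as_point(144907254, 144987033)  # (144907254, 79779)

dist = euclidean(dup1, dup2)
would_merge = dist <= MAX_DIST

print(f"distance = {dist:.1f}, would merge: {would_merge}")
```

The start positions differ by only 1410 bp, but the lengths differ by 78370 bp, so the distance comes out at roughly 78,383, far above 5000.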
I hope that helps clear things up a bit - please let me know if you have any further questions!
Melanie
Thanks Melanie! Karyn, let us know if you have any questions. We would also
be open to discussion if there are other features that you need.
Cheers
Mike
(Sent in reply to Melanie Kirsche's comment of Mon, Aug 30, 2021, 5:45 PM.)
That makes perfect sense. I was hoping to be able to use Jasmine on DRAGEN output; their joint genotyper splits events up like this, which is less than ideal for my purposes! Thanks again for getting back to me so quickly.
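For the split-event case described here, one possible approach outside Jasmine is to stitch abutting segments together before merging. The following is only a rough sketch under stated assumptions: the function name `stitch_adjacent` and the `max_gap` parameter are hypothetical, all segments are assumed to be same-type calls (here DUP) from the same sample, and copy number, genotype, and quality fields are ignored entirely. It is not a feature of Jasmine or DRAGEN.

```python
def stitch_adjacent(segments, max_gap=1):
    """Join same-chromosome segments whose gap is at most max_gap bases.

    segments: iterable of (chrom, start, end) tuples, assumed to share
    one SV type and one sample. Returns a sorted list of stitched tuples.
    """
    merged = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Extend the previous segment to cover this one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# The three chr1 duplication segments from this issue.
segments = [
    ("chr1", 144905844, 144907253),
    ("chr1", 144907254, 144987033),
    ("chr1", 144987034, 144998774),
]

stitched = stitch_adjacent(segments)
print(stitched)
```

Because each segment starts one base after the previous one ends, the three calls collapse into a single span, 144905844-144998774. Whether stitching is biologically appropriate depends on the caller's output, so the result should be checked against the per-segment copy-number calls.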
Hi Melanie,
I have the following sample input from the GIAB trio processed using DRAGEN:
I installed jasmine today using conda, and I am using the following command:
jasmine file_list=GIAB.cnv_filtered_noLowDQ.vcf --normalize_type --allow_intrasample --output_genotypes --comma_filelist --nonlinear_dist --max_dist=5000 --ignore_strand out_file=jasmine_GIAB.cnv_filtered_noLowDQ.vcf
Merging is occurring (I am able to see events that were successfully merged in the output) but the only events that are merged are DEL. For example:
All of my DUPs that should be merged are not being merged in the output, for example:
Any ideas? Thank you in advance!