[FEATURE] Cluster junctions by candidate selection based on voting #56

Open
Irallia opened this issue Feb 10, 2021 · 0 comments

Comments

@Irallia
Collaborator

Irallia commented Feb 10, 2021

This clustering method is used in Vaquita:

"2.2 Candidate merging: SE + PE
Two breakpoints with the same orientation can be merged if both the left and right intervals are adjacent or overlapping. A distance of 50 bases is set by default in assessing adjacency. When two breakpoints are merged, the minimum and maximum positions of each left and right intervals are selected to define the merged breakpoint. The original positions are kept in a list, and the median positions are reported as final positions in the last step. We merge all the breakpoints identified by SE [split-read evidence] or PE [read-pair evidence] according to this principle. For efficiency, the reference genome is divided into equally sized regions that are 1000 bp by default. The left and right intervals of SVs belong to one or more regions according to their size and genomic coordinates. The entire merging process can be efficiently done by identifying breakpoints in the same region.
[...]
2.5.2 Voting based metric for candidate selection
[...] Instead of using a simple sum of signals from different types of evidence, Vaquita provides an additional metric for candidate selection based on voting. In this scheme, each type of evidence for a breakpoint is checked by a relatively lenient cutoff, and then we calculate the number of evidence types that pass the criteria that we denote as VT. For example, a structural variation with VT = 3 is supported by three evidence types."

Source: Kim, Jongkyu and Reinert, Knut (2017) Vaquita: Fast and Accurate Identification of Structural Variation Using Combined Evidence. In: 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). LIPICS (88). Dagstuhl LIPIcs, Saarbrücken/Wadern, 185(13:1)-198(13:14). ISBN 978-3-95977-050-7
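
To make the merging rule from section 2.2 concrete, here is a minimal Python sketch of how it could look. It is not Vaquita's code and not a proposal for our final data structures: the `Breakpoint` class, its field names, the inclusive treatment of the 50-base threshold, and the helper names are all assumptions made for illustration. Only the defaults (50 bases, 1000 bp regions) and the min/max and median rules come from the quoted text.

```python
from dataclasses import dataclass, field
from statistics import median

ADJACENCY = 50      # default adjacency distance in bases (from the quoted text)
REGION_SIZE = 1000  # default region size in bp (from the quoted text)


@dataclass
class Breakpoint:
    """Hypothetical breakpoint record; the field names are assumptions."""
    orientation: str
    left: tuple   # (start, end) of the left interval
    right: tuple  # (start, end) of the right interval
    left_positions: list = field(default_factory=list)   # original left positions
    right_positions: list = field(default_factory=list)  # original right positions


def near(a, b, dist=ADJACENCY):
    """True if intervals a and b overlap or lie within `dist` bases (assumed inclusive)."""
    return a[0] <= b[1] + dist and b[0] <= a[1] + dist


def can_merge(x, y, dist=ADJACENCY):
    """Same orientation, and both the left and the right intervals are adjacent or overlapping."""
    return (x.orientation == y.orientation
            and near(x.left, y.left, dist)
            and near(x.right, y.right, dist))


def merge(x, y):
    """Merged intervals span the min and max positions; original positions are kept."""
    return Breakpoint(
        x.orientation,
        (min(x.left[0], y.left[0]), max(x.left[1], y.left[1])),
        (min(x.right[0], y.right[0]), max(x.right[1], y.right[1])),
        x.left_positions + y.left_positions,
        x.right_positions + y.right_positions,
    )


def final_positions(bp):
    """Median of the kept original positions, reported in the last step."""
    return median(bp.left_positions), median(bp.right_positions)


def region_ids(interval, size=REGION_SIZE):
    """Indices of the fixed-size regions that an interval touches."""
    return range(interval[0] // size, interval[1] // size + 1)
```

Candidates for `can_merge` would then only need to be looked up among breakpoints that share a region index from `region_ids`, instead of comparing every pair genome-wide.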

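The voting metric VT from section 2.5.2 could be sketched roughly as follows. The quoted text gives neither the cutoff values nor the full list of evidence types, so the cutoffs below and the `"OTHER"` evidence type are placeholders, and the `vote` function name is hypothetical.

```python
# Placeholder lenient cutoffs per evidence type. The values and the "OTHER"
# type are assumptions; only SE and PE are named in the quoted text.
LENIENT_CUTOFFS = {"SE": 2, "PE": 2, "OTHER": 1}


def vote(evidence, cutoffs=LENIENT_CUTOFFS):
    """Return VT: the number of evidence types whose signal passes its cutoff.

    `evidence` maps an evidence type to its observed signal (e.g. a read count);
    types missing from `evidence` count as zero support.
    """
    return sum(1 for kind, cutoff in cutoffs.items()
               if evidence.get(kind, 0) >= cutoff)
```

With these placeholder cutoffs, `vote({"SE": 3, "PE": 2, "OTHER": 1})` counts three passing evidence types, which corresponds to the VT = 3 case in the quote.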