Elimination of false positive seed mappings and detection of indels #12

Closed
timmocking opened this issue Aug 31, 2020 · 2 comments

@timmocking

Hi! I found the following in your preprint:

"Due to the noisy nature of nanopore sequencing, UNCALLED must use very loose thresholds for event/k-mer matches, which produce many false positive seed mappings. We eliminate these false positives under the observation that they will usually map to random locations, while true positives will map to locations consistent with their position on the read."

How will this affect the use of UNCALLED for the detection of indels and other structural variants, considering that these mappings are inconsistent with their position on the read?

@skovaka
Owner

skovaka commented Aug 31, 2020

We haven't tested extensively how UNCALLED performs in the presence of variants, but I do not think small indels would reduce the mapping rate very much. The algorithm is fairly permissive of slight shifts between read and reference positions because the signal is so noisy. Even if one seed (~10-12 bp) fails to map because of an indel, seeds after the indel would map correctly and would probably be clustered with the true-positive (TP) seeds before the indel.
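
To illustrate the clustering argument above, here is a minimal sketch. It is not UNCALLED's actual implementation: the function names, the `(read_pos, ref_pos)` seed representation, and the `tolerance` parameter are all assumptions made for this example. Seeds are grouped by their diagonal (`ref_pos - read_pos`), so random false positives end up alone while true positives on either side of a small indel land on nearly the same diagonal.

```python
def cluster_seeds(seeds, tolerance=5):
    """Group (read_pos, ref_pos) seeds by diagonal (ref_pos - read_pos).

    Hypothetical helper: a seed joins the current cluster when its diagonal
    is within `tolerance` of the previous seed's diagonal in sorted order.
    """
    ordered = sorted(seeds, key=lambda s: s[1] - s[0])
    clusters = []
    last_diag = None
    for read_pos, ref_pos in ordered:
        diag = ref_pos - read_pos
        if last_diag is not None and diag - last_diag <= tolerance:
            clusters[-1].append((read_pos, ref_pos))
        else:
            clusters.append([(read_pos, ref_pos)])
        last_diag = diag
    return clusters


def best_cluster(seeds, tolerance=5):
    """Return the largest cluster, taken here as the most likely true mapping."""
    return max(cluster_seeds(seeds, tolerance), key=len)


# Toy data (made up): three seeds before a 3 bp deletion in the read,
# two seeds after it, and three false positives at unrelated locations.
true_seeds = [(0, 1000), (10, 1010), (20, 1020), (40, 1043), (50, 1053)]
false_seeds = [(5, 50000), (15, 7), (30, 123456)]

if __name__ == "__main__":
    print(len(best_cluster(true_seeds + false_seeds)))               # 5
    print(len(best_cluster(true_seeds + false_seeds, tolerance=0)))  # 3
```

With a tolerance of a few bases, the seeds after the deletion join the cluster formed before it. With `tolerance=0` the same seeds split into two clusters, which is the failure mode the question was asking about.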

Larger structural variants may be more of an issue, and we are looking into this. Some SVs, such as large insertions or translocations, might be impossible to fully handle with a standard linear reference. In these cases you can rely on the flanking sequence to map correctly, which would hopefully provide enough coverage over the SV, depending on the SV size.

On a side note, one thing you should definitely NOT do currently is include multiple copies of the same locus with different variants. This may introduce many exact repeats to the reference, which would disrupt the indexing process. Again, we're currently working on improving this and hope to have an update focusing on variants soon.
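
As a rough way to see how many exact repeats alternate copies of a locus introduce, the sketch below counts k-mers that occur more than once across a set of sequences. It says nothing about how UNCALLED builds its index; the function name, the toy sequences, and the choice of `k` are assumptions for illustration only (real seeds are longer than the `k=4` used here).

```python
from collections import Counter


def repeated_kmer_fraction(sequences, k):
    """Fraction of k-mer occurrences whose k-mer appears more than once.

    Hypothetical helper for inspecting a reference before indexing.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total


# Toy locus and a copy that differs only by a SNP at the last base.
locus = "ACGTTGCAAGCT"
variant_copy = "ACGTTGCAAGCA"

if __name__ == "__main__":
    print(repeated_kmer_fraction([locus], k=4))                # 0.0
    print(repeated_kmer_fraction([locus, variant_copy], k=4))  # 16/18, about 0.89
```

Adding the variant copy turns almost every k-mer in the locus into an exact repeat, since only the windows overlapping the variant differ.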

@timmocking
Author

Thanks! I will keep that in mind.
