How should overlapper handle N's? #86
I'll chime in on what it currently does (though I'm not sure I agree with it). To start the alignment, the k-mer profiling treats N as its own base. So k-mers containing N's are considered; however, there must be an exact string match between the two reads (N's included) to initiate the overlap. N's are currently treated on par with the other bases: if an N mismatches an A, the algorithm still takes the base with the highest Q score (hopefully the A), and it subtracts the Q scores just as if N were a normal base. @msettles What are your thoughts?
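A minimal Python sketch of the seeding behavior described above, under stated assumptions: `kmer_index` and `find_seed` are hypothetical names (this is not HTStream's code), R2 is assumed to be already reverse-complemented into R1's orientation, and N is just another character, so a seed requires an exact k-mer match with N's included.

```python
from __future__ import annotations


def kmer_index(seq: str, k: int) -> dict[str, int]:
    """Map each k-mer of seq to its first start position.

    'N' is treated as an ordinary character, so k-mers containing N are indexed too.
    """
    index: dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], i)
    return index


def find_seed(r1: str, r2: str, k: int) -> tuple[int, int] | None:
    """Return (start in r1, start in r2) of the first exact k-mer match, or None.

    Assumes r2 has already been reverse-complemented (hypothetical setup).
    """
    index = kmer_index(r1, k)
    for j in range(len(r2) - k + 1):
        kmer = r2[j:j + k]
        if kmer in index:  # exact string match, N's included
            return index[kmer], j
    return None
```

For example, `find_seed("ACGTNACGT", "TNACG", 4)` returns `(3, 0)`, seeding on the N-containing k-mer `TNAC`. An N never pairs with an A at this stage, so `find_seed("ACGTA", "NCGTA", 4)` only finds `CGTA` at `(1, 1)`.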
So I think that is likely a good strategy. I've never been a fan of 'N' representing one character (like an 'A', as it does in some applications) or any character, as it does in others. Rather, it should really be represented as unknown, or none.
In the k-mer matching, I think it highly unlikely that an N will end up matching another N, so the 99.9999% likely scenario is that it will never match a k-mer from the other read, as I believe it should. So, as an extension of the current scheme, it might be best not to consider any k-mers with N's in them during alignment?
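The proposed extension amounts to filtering k-mers before comparing the two reads. This is a hypothetical Python illustration (`seedable_kmers`, `shared_seeds`, and the `skip_n` flag are invented names), assuming uppercase sequences:

```python
from __future__ import annotations


def seedable_kmers(seq: str, k: int, skip_n: bool = True) -> set[str]:
    """Return the k-mers of seq that may be used to seed an overlap.

    skip_n=True: discard any k-mer containing 'N' (the proposal).
    skip_n=False: treat N as its own character (the current behavior).
    """
    kmers: set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if skip_n and "N" in kmer:
            continue
        kmers.add(kmer)
    return kmers


def shared_seeds(r1: str, r2: str, k: int, skip_n: bool = True) -> set[str]:
    """K-mers present in both reads that could initiate an overlap."""
    return seedable_kmers(r1, k, skip_n) & seedable_kmers(r2, k, skip_n)
```

In this toy case the reads share only N-containing k-mers: `shared_seeds("ACGTNACGT", "TNACG", 4, skip_n=False)` gives `{"TNAC", "NACG"}`, while the default `skip_n=True` gives an empty set.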
On the score: the algorithm was that when the two bases mismatch, the base with the larger Q score 'wins' and the quality becomes bestQ - worstQ. By definition, N characters have a Q score of 0, so the other base (as long as it's not N) will always win and keep its own Q. If it happens to be N,N, then N with Q0 will be the result.
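A hypothetical Python sketch of the mismatch rule as described (`resolve_mismatch` is an invented name). It assumes integer Phred scores, N carrying Q0 as stated above, and ties going to read 1, which the thread does not specify:

```python
from __future__ import annotations


def resolve_mismatch(b1: str, q1: int, b2: str, q2: int) -> tuple[str, int]:
    """Resolve one overlapping position where the two reads disagree.

    The base with the larger Q wins and the new quality is bestQ - worstQ.
    Breaking ties in favor of read 1 is an assumption of this sketch.
    """
    if q1 >= q2:
        return b1, q1 - q2
    return b2, q2 - q1
```

With N at Q0 the other base always keeps its full quality: `resolve_mismatch("A", 30, "N", 0)` gives `("A", 30)`, whereas two real bases disagreeing, `resolve_mismatch("A", 30, "G", 20)`, gives `("A", 10)`.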
This does raise one issue, I think: recall that when the bases match, the algorithm uses the highest Q, yes?
So A30, N0 -> A30 and A30, A29 -> A30. To me it seems like two A's are better than one A and an N?? Both give the 'best case scenario' for a score. So maybe a change is in order for Qs on matching bases, maybe +1 (max 40), so that A30, A29 -> A31?
Matt
(In reply to David Streett, Friday, July 28, 2017 at 7:15 PM.)
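The two match policies being compared fit in a few lines. A hypothetical Python sketch: the function names are invented, and `Q_MAX = 40` reflects the "max 40" cap suggested above.

```python
Q_MAX = 40  # cap suggested in the thread


def match_q_highest(q1: int, q2: int) -> int:
    """Current rule for matching bases: keep the higher quality."""
    return max(q1, q2)


def match_q_plus_one(q1: int, q2: int, cap: int = Q_MAX) -> int:
    """Proposed rule: reward agreement with +1 on the higher quality, capped."""
    return min(max(q1, q2) + 1, cap)
```

Under `match_q_highest`, A30/A29 yields 30, the same as A30/N0 under the mismatch rule (30 - 0), which is the concern raised. `match_q_plus_one(30, 29)` yields 31, and `match_q_plus_one(40, 38)` stays at 40.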
Any movement on this? Can it be closed?
I'm not sure; I'll have to let David comment. I think we did end up with one difference from your version of the approach above: for matching bases, I think we are adding Q scores, with a ceiling of 40. This improves qualities for low-quality ends that overlap and match.
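Putting the pieces together, a per-position consensus over an already-aligned overlap might look like the following. This is a hypothetical Python sketch, not HTStream's code: `merge_overlap` is an invented name, qualities are integer Phred scores, matching bases add their Q scores with a ceiling of 40 as described above, and mismatches use the bestQ - worstQ rule from earlier in the thread (ties favor read 1, an assumption).

```python
from __future__ import annotations

Q_CEILING = 40  # ceiling on summed qualities for matching bases


def merge_overlap(seq1: str, qual1: list[int],
                  seq2: str, qual2: list[int]) -> tuple[str, list[int]]:
    """Build a consensus for two aligned overlap segments of equal length."""
    if not (len(seq1) == len(qual1) == len(seq2) == len(qual2)):
        raise ValueError("segments and qualities must have equal length")
    bases: list[str] = []
    quals: list[int] = []
    for b1, q1, b2, q2 in zip(seq1, qual1, seq2, qual2):
        if b1 == b2:
            # Agreement: sum the qualities, capped at the ceiling.
            bases.append(b1)
            quals.append(min(q1 + q2, Q_CEILING))
        elif q1 >= q2:
            # Disagreement: higher Q wins, quality becomes bestQ - worstQ.
            bases.append(b1)
            quals.append(q1 - q2)
        else:
            bases.append(b2)
            quals.append(q2 - q1)
    return "".join(bases), quals
```

For example, `merge_overlap("ACNT", [12, 30, 0, 35], "ACGA", [10, 29, 25, 20])` returns `("ACGT", [22, 40, 25, 15])`: the low-quality matching A rises from 12 to 22 and the N is replaced by G at Q25. In this sketch an N/N pair counts as a match and stays N at Q0.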
That part is OK, though I'm not sure I still agree. Is there anything more on the Q of N's in the overlap? I think there seemed to be two options: treat N as its own character, or ignore any k-mer with an N. I would suspect both would produce very similar results.
M
(In reply to Sam Hunter, Wed, Oct 18, 2017 at 8:00 PM.)
2+ years later, going to go ahead and close.
Should N's match other N's?
How does overlapper deal with N's?
Should overlapper subtract the quality value associated with an N, or just use the overlapping base and its quality score?