Added a "RANDOM" duplicate scoring strategy. #688
@yfarjoun This is a trivial little addition. There don't appear to be standalone tests for `DuplicateScoringStrategy`, and I'm not really up for writing comprehensive tests just to cover my ~3 line addition. Can you take a look and let me know what you think?
coveralls
commented
Aug 19, 2016
yfarjoun
and 1 other
commented on an outdated diff
Aug 19, 2016
@@ -80,6 +81,8 @@ public static short computeDuplicateScore(final SAMRecord record, final ScoringS
                     score += SAMUtils.getMateCigar(record).getReferenceLength();
                 }
                 break;
+            case RANDOM:
+                score += (short) Math.sqrt(record.getReadName().hashCode());
yfarjoun
Contributor
Requested change made, @yfarjoun. Also, since Murmur3 output is more uniformly distributed than `hashCode()`, I switched to simply right-shifting by 16 bits to get a short out, instead of taking the square root.
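To make the difference between the two casts concrete, here is a minimal sketch. It assumes only that the hash is a 32-bit `int`; the class and method names are hypothetical and the input values are stand-ins, not output from the `Murmur3` hasher used in this PR.

```java
/** Hypothetical sketch: two ways to squeeze a 32-bit hash into a short. */
final class HashToShortSketch {

    /** Earlier approach from the outdated diff: cast the square root of the hash. */
    static short viaSqrt(final int hash) {
        // Math.sqrt of a negative number is NaN, and (short) NaN is 0,
        // so every negative hash collapses to the same score. Square roots
        // above Short.MAX_VALUE (possible, since sqrt(2^31 - 1) is about 46340)
        // also wrap around when narrowed to short.
        return (short) Math.sqrt(hash);
    }

    /** Revised approach: keep the upper 16 bits of the hash. */
    static short viaShift(final int hash) {
        // An arithmetic right shift by 16 leaves a value in [-32768, 32767],
        // so the narrowing cast to short loses nothing further.
        return (short) (hash >> 16);
    }

    public static void main(final String[] args) {
        final int[] hashes = {0x12345678, 16, -1, Integer.MIN_VALUE};
        for (final int h : hashes) {
            System.out.printf("hash=%d sqrt=%d shift=%d%n", h, viaSqrt(h), viaShift(h));
        }
    }
}
```

The shift only gives well-spread scores if the upper bits of the hash are themselves well mixed, which is the reason for pairing it with a stronger hash.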
coveralls
commented
Aug 19, 2016
yfarjoun
commented on the diff
Aug 19, 2016
@@ -80,6 +86,8 @@ public static short computeDuplicateScore(final SAMRecord record, final ScoringS
                     score += SAMUtils.getMateCigar(record).getReferenceLength();
                 }
                 break;
+            case RANDOM:
+                score += (short) (hasher.hashUnencodedChars(record.getReadName()) >> 16);
yfarjoun
Contributor
yfarjoun
and 1 other
commented on an outdated diff
Aug 19, 2016
     }
+    /** Hash used for the RANDOM scoring strategy. */
+    private static Murmur3 hasher = new Murmur3(1);
tfenne
Owner
Looks good. Merge when tests pass.
tfenne commented Aug 19, 2016
Description
When evaluating error rates in data with UMIs and contrasting different approaches to duplicate marking, it is useful to have an unbiased read picked as the representative for a duplicate set instead of some heuristic for "best".
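As a rough illustration of the idea, the sketch below picks a representative from a duplicate set using a score derived only from the read name, so the choice does not depend on base qualities or aligned length. Everything here is an assumption made for illustration: the class, the method names, and the `mix` function (the Murmur3 32-bit finalizer applied to `String.hashCode()`) are stand-ins, not the `Murmur3` class or the duplicate-marking code touched by this PR.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: pick a duplicate-set representative by hashed read name. */
final class RandomRepresentativeSketch {

    /** Stand-in bit mixer (the Murmur3 32-bit finalizer), used only for this sketch. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Score that depends only on the read name: the upper 16 bits of the mixed hash. */
    static short randomScore(final String readName) {
        return (short) (mix(readName.hashCode()) >> 16);
    }

    /** Returns the read name with the highest score; ties are broken by name. */
    static String pickRepresentative(final List<String> readNames) {
        return readNames.stream()
                .max(Comparator.<String>comparingInt(RandomRepresentativeSketch::randomScore)
                        .thenComparing(Comparator.<String>naturalOrder()))
                .orElseThrow(() -> new IllegalArgumentException("empty duplicate set"));
    }

    public static void main(final String[] args) {
        final List<String> duplicateSet = Arrays.asList("read/001", "read/002", "read/003");
        System.out.println("representative: " + pickRepresentative(duplicateSet));
    }
}
```

Because the score is a deterministic function of the read name, the same read is chosen on every run and regardless of input order, which keeps results reproducible while avoiding a quality-based preference.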
Checklist