Added a "RANDOM" duplicate scoring strategy. #688

Merged
merged 1 commit into from Aug 19, 2016
+9 −1
@@ -24,6 +24,8 @@
package htsjdk.samtools;
+import htsjdk.samtools.util.Murmur3;
+
/**
* This class helps us compute and compare duplicate scores, which are used for selecting the non-duplicate
* during duplicate marking (see MarkDuplicates).
@@ -33,9 +35,13 @@
public enum ScoringStrategy {
SUM_OF_BASE_QUALITIES,
- TOTAL_MAPPED_REFERENCE_LENGTH
+ TOTAL_MAPPED_REFERENCE_LENGTH,
+ RANDOM,
}
+ /** Hash used for the RANDOM scoring strategy. */
+ private static final Murmur3 hasher = new Murmur3(1);
+
/** An enum to use for storing temporary attributes on SAMRecords. */
private static enum Attr { DuplicateScore }
@@ -80,6 +86,8 @@ public static short computeDuplicateScore(final SAMRecord record, final ScoringS
score += SAMUtils.getMateCigar(record).getReferenceLength();
}
break;
+ case RANDOM:
+ score += (short) (hasher.hashUnencodedChars(record.getReadName()) >> 16);
yfarjoun Aug 19, 2016

Contributor

I'm curious whether there's a reason you didn't opt to take the low-order bits using & 0xFFFF? Speed is not an issue here, I presume, but what about clarity?

tfenne Aug 19, 2016

Owner

No particular reason other than that >> 16 came to mind for me first.

yfarjoun Aug 19, 2016

Contributor

I think that there should be no difference in the randomness, due to how
murmur works (lots of >>>'s, which mix up the low- and high-order bits), so
I'm happy.
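
For readers comparing the two options discussed above, here is a minimal sketch of how each narrows a 32-bit hash to a `short`. It does not use htsjdk: the class `HashToShort`, its methods, and the read name `"read/1"` are hypothetical names introduced for illustration, and `mix` is the well-known Murmur3 32-bit finalizer applied to `String.hashCode()` as a stand-in for `Murmur3.hashUnencodedChars`, not the PR's actual hashing path.

```java
final class HashToShort {
    /**
     * Murmur3's 32-bit finalizer ("fmix32"). Assumption: used here only as a
     * stand-in mixer to show how >>> and multiplication spread bits around.
     */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** The form used in the PR: keep the high-order 16 bits of the hash. */
    static short highBits(int hash) {
        return (short) (hash >> 16);
    }

    /** The form suggested in review: keep the low-order 16 bits of the hash. */
    static short lowBits(int hash) {
        return (short) (hash & 0xFFFF);
    }

    public static void main(String[] args) {
        // Hypothetical read name; any string works.
        int hash = mix("read/1".hashCode());
        System.out.println("high 16 bits: " + highBits(hash));
        System.out.println("low 16 bits:  " + lowBits(hash));
    }
}
```

Both forms return a signed value in the full `short` range, so either can be negative. For a well-mixed hash there is no obvious reason to prefer one half of the bits over the other, which matches the conclusion reached in the thread.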


}
storedScore = score;