Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Adds option to deep copy attributes in ReverseComplement #717
Conversation
meganshand
added some commits
Sep 29, 2016
coveralls
commented
Sep 30, 2016
nh13
requested changes
Sep 30, 2016
I think this makes sense, though I think safe vs. unsafe is a bit confusing, so I recommend in-place versus copy.
| @@ -2112,7 +2112,8 @@ private String buildMessage(final String baseMessage, final boolean isMate) { | ||
| /** | ||
| * Note that this does a shallow copy of everything, except for the attribute list, for which a copy of the list | ||
| * is made, but the attributes themselves are copied by reference. This should be safe because callers should | ||
| - * never modify a mutable value returned by any of the get() methods anyway. | ||
| + * never modify a mutable value returned by any of the get() methods anyway. If one of the cloned records SEQ or |
| @@ -2112,7 +2112,8 @@ private String buildMessage(final String baseMessage, final boolean isMate) { | ||
| /** | ||
| * Note that this does a shallow copy of everything, except for the attribute list, for which a copy of the list | ||
| * is made, but the attributes themselves are copied by reference. This should be safe because callers should | ||
| - * never modify a mutable value returned by any of the get() methods anyway. | ||
| + * never modify a mutable value returned by any of the get() methods anyway. If one of the cloned records SEQ or | ||
| + * QUAL needs to be modified, a deeper copy should be made, or other precautions taken (Eg. Reverse Complement). |
| @@ -39,23 +39,38 @@ | ||
| /** | ||
| * Reverse-complement bases and reverse quality scores along with known optional attributes that | ||
| - * need the same treatment. See {@link #TAGS_TO_REVERSE_COMPLEMENT} {@link #TAGS_TO_REVERSE} | ||
| + * need the same treatment. This does not make a deep copy of the bases, qualities or attributes: |
nh13
Sep 30, 2016
Contributor
Rather than talking about "deep copy", perhaps talk about "in-place" versus "makes a copy of".
| @@ -39,23 +39,38 @@ | ||
| /** | ||
| * Reverse-complement bases and reverse quality scores along with known optional attributes that | ||
| - * need the same treatment. See {@link #TAGS_TO_REVERSE_COMPLEMENT} {@link #TAGS_TO_REVERSE} | ||
| + * need the same treatment. This does not make a deep copy of the bases, qualities or attributes: | ||
| + * for safe reverseComplement see {@link #reverseComplement(SAMRecord, boolean)}. |
| + * Reverse-complement bases and reverse quality scores along with known optional attributes that | ||
| + * need the same treatment. Optionally makes a deep copy of the bases, qualities or attributes. | ||
| + * See {@link #TAGS_TO_REVERSE_COMPLEMENT} {@link #TAGS_TO_REVERSE} | ||
| + * for the default set of tags that are handled. |
| + * @param rec Record to reverse complement. | ||
| + * @param unsafe Setting this to false will clone all attributes, bases and qualities before changing the values. | ||
| + */ | ||
| + public static void reverseComplement(final SAMRecord rec, boolean unsafe) { |
| } | ||
| /** | ||
| * Reverse complement bases and reverse quality scores. In addition reverse complement any | ||
| * non-null attributes specified by tagsToRevcomp and reverse and non-null attributes | ||
| * specified by tagsToReverse. | ||
| */ | ||
| - public static void reverseComplement(final SAMRecord rec, final Collection<String> tagsToRevcomp, final Collection<String> tagsToReverse) { | ||
| - final byte[] readBases = rec.getReadBases(); | ||
| + public static void reverseComplement(final SAMRecord rec, final Collection<String> tagsToRevcomp, final Collection<String> tagsToReverse, boolean unsafe) { |
| + SequenceUtil.reverseComplement((byte[]) value); | ||
| + } | ||
| + else if (value instanceof String) { | ||
| + value = SequenceUtil.reverseComplement((String) value); |
nh13
Sep 30, 2016
Contributor
perhaps a comment here for the reader that SequenceUtil.reverseComplement is in-place with bytes and copies with String.
| + @Test public void testSafeReverseComplement() throws CloneNotSupportedException { | ||
| + final SAMRecord original = createTestSamRec(); | ||
| + final SAMRecord cloneOfOriginal = (SAMRecord) original.clone(); | ||
| + //Runs a safe reverseComplement |
| + } | ||
| + | ||
| + | ||
| + public SAMRecord createTestSamRec() { |
| } | ||
| + |
|
My 2c: I think we should err on the side of making apis like this be safe-by-default. Since reverseComplement is a SAMRecord mutator anyway, it could be moved to SAMRecord as a method with reversed polarity so it defaults to safe, with an overload with an opt-in arg for "in place". The existing method could then delegate to SAMRecord (and preferably be marked as deprecated - after which the entire class could be removed.) |
|
@cmnbroad it's unsafe for performance reasons, and I would be concerned about such a change impacting downstream tools which require performance. It's a good philosophy in general, but may be beyond the scope of this PR. |
|
@nh13 Agreed - I was suggesting that we keep the existing behavior as is, but deprecate the method and change the implementation to call into the new (SAMRecord method) that implements in-place as opt-in (and have the existing method opt-in to in-place to maintain the current behavior). Its a pretty small incremental changed from the current PR (just moving the code). Anyway, just a suggestion that I think cleans up the api nicely. |
meganshand
added some commits
Oct 4, 2016
coveralls
commented
Oct 4, 2016
| + /** | ||
| + * Reverse-complement bases and reverse quality scores along with known optional attributes that | ||
| + * need the same treatment. Changes made after making a copy of the bases, qualities, | ||
| + * and attributes. If in-place update is needed use {@link #reverseComplement(SAMRecord, boolean)}. |
| + if (value instanceof byte[]) { | ||
| + value = inplace ? value : ((byte[]) value).clone(); | ||
| + SequenceUtil.reverseComplement((byte[]) value); | ||
| + } |
| + //SequenceUtil.reverseComplement is in-place for bytes but copies Strings since they are immutable. | ||
| + value = SequenceUtil.reverseComplement((String) value); | ||
| + } | ||
| + else throw new UnsupportedOperationException("Don't know how to reverse complement: " + value); |
coveralls
commented
Oct 11, 2016
| @@ -24,7 +24,7 @@ | ||
| @Test public void testSafeReverseComplement() throws CloneNotSupportedException { | ||
| final SAMRecord original = createTestSamRec(); | ||
| final SAMRecord cloneOfOriginal = (SAMRecord) original.clone(); | ||
| - //Runs a safe reverseComplement | ||
| + //Runs a copy (rather than in-place) reverseComplement | ||
| SAMRecordUtil.reverseComplement(cloneOfOriginal, Arrays.asList("Y1"), Arrays.asList("X1", "X2", "X3", "X4", "X5"), false); | ||
| Assert.assertEquals(original.getReadString(), "ACACACACAC"); |
yfarjoun
Oct 20, 2016
•
Contributor
could you expand this test using a data-provider to show that if false->true then the assertion fails?
| + * See {@link #TAGS_TO_REVERSE_COMPLEMENT} {@link #TAGS_TO_REVERSE} | ||
| + * for the default set of tags that are handled. | ||
| + */ | ||
| + public static void reverseComplement(final SAMRecord rec) { |
yfarjoun
Oct 20, 2016
Contributor
given that this is already in SAMRecord, shuoldn't it be non-static and take no arguments?
| + * @param rec Record to reverse complement. | ||
| + * @param inplace Setting this to false will clone all attributes, bases and qualities before changing the values. | ||
| + */ | ||
| + public static void reverseComplement(final SAMRecord rec, boolean inplace) { |
yfarjoun
Oct 20, 2016
Contributor
similarly here, shouldn't it be non-static and only take inplace as an argument?
| + | ||
| + /** | ||
| + * Reverse-complement bases and reverse quality scores along with known optional attributes that | ||
| + * need the same treatment. Optionally makes a copy of the bases, qualities or attributes instead |
| @@ -33,97 +33,41 @@ | ||
| /** | ||
| * @author alecw@broadinstitute.org | ||
| */ | ||
| +@Deprecated |
| + | ||
| + return(rec); | ||
| + } | ||
| + |
|
back to you @meganshand. this looks nice, just a few comments and then I think we are good to go. Given that @nh13 looked at it first, it would be good if he could sign-off on it as well... |
coveralls
commented
Oct 21, 2016
|
I think this is fine. |
|
Let's move forward with this PR @meganshand as you have resolved the issues that @nh13 raised. and he has been silent for a while. |
yfarjoun
merged commit 4031fac
into
samtools:master
Oct 27, 2016
|
done. |
meganshand commentedSep 30, 2016
Description
Adds a "safe" option to reverseComplement that will clone all tags as well as bases and qualities instead of reverse complementing (or reversing) them in place. The default remains "unsafe" so clones will not be made unless they are requested.
The motivation is from broadinstitute/picard#663 in order to avoid having to deep copy every secondary and supplementary SAMRecord in MergeBamAlignment. We need a deep copy of the attributes any time there is a single read with two alignments on different strands.
Checklist