Adds option to deep copy attributes in ReverseComplement #717

Merged
merged 6 commits into from Oct 27, 2016

Conversation

Projects
None yet
5 participants
Contributor

meganshand commented Sep 30, 2016

Description

Adds a "safe" option to reverseComplement that will clone all tags as well as bases and qualities instead of reverse complementing (or reversing) them in place. The default remains "unsafe" so clones will not be made unless they are requested.

The motivation is from broadinstitute/picard#663 in order to avoid having to deep copy every secondary and supplementary SAMRecord in MergeBamAlignment. We need a deep copy of the attributes any time there is a single read with two alignments on different strands.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

meganshand added some commits Sep 29, 2016

@meganshand meganshand Includes broken test for reverseComplementing without a deep copy aa04ef7
@meganshand meganshand Fixes test
88573c7

Coverage Status

Coverage increased (+0.01%) to 69.053% when pulling 88573c7 on meganshand:ms_cloneSAMRecord into 88b6719 on samtools:master.

@nh13

I think this makes sense, though I think safe vs. unsafe is a bit confusing, so I recommend in-place versus copy.

@@ -2112,7 +2112,8 @@ private String buildMessage(final String baseMessage, final boolean isMate) {
/**
* Note that this does a shallow copy of everything, except for the attribute list, for which a copy of the list
* is made, but the attributes themselves are copied by reference. This should be safe because callers should
- * never modify a mutable value returned by any of the get() methods anyway.
+ * never modify a mutable value returned by any of the get() methods anyway. If one of the cloned records SEQ or
@nh13

nh13 Sep 30, 2016 edited

Contributor

records -> record's

@@ -2112,7 +2112,8 @@ private String buildMessage(final String baseMessage, final boolean isMate) {
/**
* Note that this does a shallow copy of everything, except for the attribute list, for which a copy of the list
* is made, but the attributes themselves are copied by reference. This should be safe because callers should
- * never modify a mutable value returned by any of the get() methods anyway.
+ * never modify a mutable value returned by any of the get() methods anyway. If one of the cloned records SEQ or
+ * QUAL needs to be modified, a deeper copy should be made, or other precautions taken (Eg. Reverse Complement).
@nh13

nh13 Sep 30, 2016

Contributor

Eg -> e.g.

@nh13

nh13 Sep 30, 2016

Contributor

Remove "other precautions should be taken" since it is open ended.

@@ -39,23 +39,38 @@
/**
* Reverse-complement bases and reverse quality scores along with known optional attributes that
- * need the same treatment. See {@link #TAGS_TO_REVERSE_COMPLEMENT} {@link #TAGS_TO_REVERSE}
+ * need the same treatment. This does not make a deep copy of the bases, qualities or attributes:
@nh13

nh13 Sep 30, 2016

Contributor

comma after qualities.

@nh13

nh13 Sep 30, 2016

Contributor

Rather than talking about "deep copy", perhaps talk about "in-place" versus "makes a copy of".

@@ -39,23 +39,38 @@
/**
* Reverse-complement bases and reverse quality scores along with known optional attributes that
- * need the same treatment. See {@link #TAGS_TO_REVERSE_COMPLEMENT} {@link #TAGS_TO_REVERSE}
+ * need the same treatment. This does not make a deep copy of the bases, qualities or attributes:
+ * for safe reverseComplement see {@link #reverseComplement(SAMRecord, boolean)}.
@nh13

nh13 Sep 30, 2016

Contributor

I think in-place versus copy is better understood here than "safe".

+ * Reverse-complement bases and reverse quality scores along with known optional attributes that
+ * need the same treatment. Optionally makes a deep copy of the bases, qualities or attributes.
+ * See {@link #TAGS_TO_REVERSE_COMPLEMENT} {@link #TAGS_TO_REVERSE}
+ * for the default set of tags that are handled.
@nh13

nh13 Sep 30, 2016

Contributor

ditto for my comments above.

+ * @param rec Record to reverse complement.
+ * @param unsafe Setting this to false will clone all attributes, bases and qualities before changing the values.
+ */
+ public static void reverseComplement(final SAMRecord rec, boolean unsafe) {
@nh13

nh13 Sep 30, 2016

Contributor

unsafe -> inplace.

}
/**
* Reverse complement bases and reverse quality scores. In addition reverse complement any
* non-null attributes specified by tagsToRevcomp and reverse and non-null attributes
* specified by tagsToReverse.
*/
- public static void reverseComplement(final SAMRecord rec, final Collection<String> tagsToRevcomp, final Collection<String> tagsToReverse) {
- final byte[] readBases = rec.getReadBases();
+ public static void reverseComplement(final SAMRecord rec, final Collection<String> tagsToRevcomp, final Collection<String> tagsToReverse, boolean unsafe) {
@nh13

nh13 Sep 30, 2016

Contributor

unsafe -> inplace.

+ SequenceUtil.reverseComplement((byte[]) value);
+ }
+ else if (value instanceof String) {
+ value = SequenceUtil.reverseComplement((String) value);
@nh13

nh13 Sep 30, 2016

Contributor

perhaps a comment here for the reader that SequenceUtil.reverseComplement is in-place with bytes and copies with String.

+ @Test public void testSafeReverseComplement() throws CloneNotSupportedException {
+ final SAMRecord original = createTestSamRec();
+ final SAMRecord cloneOfOriginal = (SAMRecord) original.clone();
+ //Runs a safe reverseComplement
@nh13

nh13 Sep 30, 2016

Contributor

copy vs in-place.

+ }
+
+
+ public SAMRecord createTestSamRec() {
@nh13

nh13 Sep 30, 2016

Contributor

extra newline

}
+
@nh13

nh13 Sep 30, 2016

Contributor

extra newline

Contributor

cmnbroad commented Sep 30, 2016

My 2c: I think we should err on the side of making apis like this be safe-by-default. Since reverseComplement is a SAMRecord mutator anyway, it could be moved to SAMRecord as a method with reversed polarity so it defaults to safe, with an overload with an opt-in arg for "in place". The existing method could then delegate to SAMRecord (and preferably be marked as deprecated - after which the entire class could be removed.)

Contributor

nh13 commented Sep 30, 2016

@cmnbroad it's unsafe for performance reasons, and I would be concerned about such a change impacting downstream tools which require performance. It's a good philosophy in general, but may be beyond the scope of this PR.

Contributor

cmnbroad commented Sep 30, 2016

@nh13 Agreed - I was suggesting that we keep the existing behavior as is, but deprecate the method and change the implementation to call into the new (SAMRecord method) that implements in-place as opt-in (and have the existing method opt-in to in-place to maintain the current behavior). Its a pretty small incremental changed from the current PR (just moving the code). Anyway, just a suggestion that I think cleans up the api nicely.

meganshand added some commits Oct 4, 2016

@meganshand meganshand Addressing most comments 5c3b01e
@meganshand meganshand Refactor reverseComplement to SAMRecord and make default copy attributes
192bb5d

Coverage Status

Coverage decreased (-0.005%) to 69.035% when pulling 192bb5d on meganshand:ms_cloneSAMRecord into 88b6719 on samtools:master.

Contributor

meganshand commented Oct 5, 2016

@nh13 I addressed your comments in the first commit and refactored according to @cmnbroad's suggestion in the second commit.

+ /**
+ * Reverse-complement bases and reverse quality scores along with known optional attributes that
+ * need the same treatment. Changes made after making a copy of the bases, qualities,
+ * and attributes. If in-place update is needed use {@link #reverseComplement(SAMRecord, boolean)}.
@yfarjoun

yfarjoun Oct 7, 2016

Contributor

make sure it's clear that only some of the attributes are copied

+ if (value instanceof byte[]) {
+ value = inplace ? value : ((byte[]) value).clone();
+ SequenceUtil.reverseComplement((byte[]) value);
+ }
@yfarjoun

yfarjoun Oct 7, 2016

Contributor

} else if {

here and below

+ //SequenceUtil.reverseComplement is in-place for bytes but copies Strings since they are immutable.
+ value = SequenceUtil.reverseComplement((String) value);
+ }
+ else throw new UnsupportedOperationException("Don't know how to reverse complement: " + value);
@yfarjoun

yfarjoun Oct 7, 2016

Contributor

need { }

@yfarjoun

yfarjoun Oct 20, 2016

Contributor

still needs {} for else.

@meganshand meganshand Addressing new comments
38545d6

Coverage Status

Coverage decreased (-0.005%) to 69.035% when pulling 38545d6 on meganshand:ms_cloneSAMRecord into 88b6719 on samtools:master.

@@ -24,7 +24,7 @@
@Test public void testSafeReverseComplement() throws CloneNotSupportedException {
final SAMRecord original = createTestSamRec();
final SAMRecord cloneOfOriginal = (SAMRecord) original.clone();
- //Runs a safe reverseComplement
+ //Runs a copy (rather than in-place) reverseComplement
SAMRecordUtil.reverseComplement(cloneOfOriginal, Arrays.asList("Y1"), Arrays.asList("X1", "X2", "X3", "X4", "X5"), false);
Assert.assertEquals(original.getReadString(), "ACACACACAC");
@yfarjoun

yfarjoun Oct 20, 2016 edited

Contributor

could you expand this test using a data-provider to show that if false->true then the assertion fails?

+ * See {@link #TAGS_TO_REVERSE_COMPLEMENT} {@link #TAGS_TO_REVERSE}
+ * for the default set of tags that are handled.
+ */
+ public static void reverseComplement(final SAMRecord rec) {
@yfarjoun

yfarjoun Oct 20, 2016

Contributor

given that this is already in SAMRecord, shuoldn't it be non-static and take no arguments?

+ * @param rec Record to reverse complement.
+ * @param inplace Setting this to false will clone all attributes, bases and qualities before changing the values.
+ */
+ public static void reverseComplement(final SAMRecord rec, boolean inplace) {
@yfarjoun

yfarjoun Oct 20, 2016

Contributor

similarly here, shouldn't it be non-static and only take inplace as an argument?

+
+ /**
+ * Reverse-complement bases and reverse quality scores along with known optional attributes that
+ * need the same treatment. Optionally makes a copy of the bases, qualities or attributes instead
@yfarjoun

yfarjoun Oct 20, 2016

Contributor

"Optionally"? I don't see a way to disable it in this invokation.

@yfarjoun

yfarjoun Oct 20, 2016

Contributor

scratch that. I misunderstood that comment.

@@ -33,97 +33,41 @@
/**
* @author alecw@broadinstitute.org
*/
+@Deprecated
@yfarjoun

yfarjoun Oct 20, 2016

Contributor

in a comment, point users to what they should be using.

+
+ return(rec);
+ }
+
@yfarjoun

yfarjoun Oct 20, 2016

Contributor

extra nl

Contributor

yfarjoun commented Oct 20, 2016

back to you @meganshand. this looks nice, just a few comments and then I think we are good to go. Given that @nh13 looked at it first, it would be good if he could sign-off on it as well...

@meganshand meganshand latest comments
75d0c9e
Contributor

meganshand commented Oct 21, 2016

@yfarjoun @nh13 I think I've caught all of your comments.

Coverage Status

Coverage decreased (-0.005%) to 69.035% when pulling 75d0c9e on meganshand:ms_cloneSAMRecord into 88b6719 on samtools:master.

Contributor

yfarjoun commented Oct 21, 2016

I think this is fine. 👍 from me. I'll take @nh13's "this makes sense" to be a tentative thumbs up (conditional on all the work you did afterwards). but to be sure, let's give him till end of next week to respond, and if we don't hear from him till then, merge.

Contributor

yfarjoun commented Oct 27, 2016

Let's move forward with this PR @meganshand as you have resolved the issues that @nh13 raised. and he has been silent for a while.

@yfarjoun yfarjoun merged commit 4031fac into samtools:master Oct 27, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.005%) to 69.035%
Details
Contributor

yfarjoun commented Oct 27, 2016

done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment