Fixed bug disallowing file-based index and seekable path based source #769

Merged
merged 3 commits into from Dec 6, 2016

Conversation

Projects
None yet
3 participants
Contributor

kcibul commented Dec 6, 2016 edited

Description

Added support for use case where index is a file, and the source stream is a seekable path. This is important to allow users to pull down an index locally while querying a BAM in GCS. Currently the performance of leaving the index in GCS in unacceptable.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)
@kcibul kcibul Added support for use case where index is a file, and the source stre…
…am is a seekable path (e.g. a GCS location)
e1bf61e

droazen self-assigned this Dec 6, 2016

@droazen

Review complete, back to @kcibul for (minor) changes, then we can merge.

+ sourceSeekable.seek(0);
+ primitiveSamReader = new BAMFileReader(
+ sourceSeekable, indexSeekable, false, asynchronousIO, validationStringency, this.samRecordFactory);
+ }
} else {
@droazen

droazen Dec 6, 2016

Contributor

I think that this entire nested if statement from line 319 up to line 336 can be simplified from this:

                            if (indexFile!=null || null == sourceSeekable || null == indexSeekable) {
                                if (null == sourceSeekable || null == indexSeekable) {
                                    // not seekable.
                                    // it's OK that we consumed a bit of the stream already, this ctor expects it.
                                    primitiveSamReader = new BAMFileReader(bufferedStream, indexFile, false, asynchronousIO, validationStringency, this.samRecordFactory);
                                } else {
                                    sourceSeekable.seek(0);
                                    primitiveSamReader = new BAMFileReader(
                                            sourceSeekable, indexSeekable, false, asynchronousIO, validationStringency, this.samRecordFactory);
                                }
                            } else {
                                // seekable.
                                // need to return to the beginning because it's the same stream we used earlier
                                // and read a bit from, and that form of the ctor expects the stream to start at 0.
                                sourceSeekable.seek(0);
                                primitiveSamReader = new BAMFileReader(
                                    sourceSeekable, indexSeekable, false, asynchronousIO, validationStringency, this.samRecordFactory);
                            }

to this:

                                if (null == sourceSeekable || null == indexSeekable) {
                                    // not seekable.
                                    // it's OK that we consumed a bit of the stream already, this ctor expects it.
                                    primitiveSamReader = new BAMFileReader(bufferedStream, indexFile, false, asynchronousIO, validationStringency, this.samRecordFactory);
                                } else {
                                    // seekable.
                                    // need to return to the beginning because it's the same stream we used earlier
                                    // and read a bit from, and that form of the ctor expects the stream to start at 0.
                                    sourceSeekable.seek(0);
                                    primitiveSamReader = new BAMFileReader(
                                            sourceSeekable, indexSeekable, false, asynchronousIO, validationStringency, this.samRecordFactory);
                                }

Since the two else clauses in the original code are identical.

In other words, what you seem to have done with this patch is just to say "if you have an index that's a file, you can now use the constructor that takes two seekable streams instead of the (non-seekable-stream, file) constructor."

@kcibul

kcibul Dec 6, 2016

Contributor

👍

@@ -284,7 +284,37 @@ public void queryInputResourcePermutation(final SamInputResource resource) throw
}
reader.close();
}
-
+
+ public class NeverFilePathInpurResource extends PathInputResource {
@droazen

droazen Dec 6, 2016

Contributor

This class should definitely be private static, and come with a comment explaining what it's for.

Also, typo: Inpur -> Input

+ }
+ }
+
+ @Test
@droazen

droazen Dec 6, 2016

Contributor

Can you confirm that these tests both failed before this patch?

@kcibul

kcibul Dec 6, 2016 edited

Contributor

The query test actually doesn't fail if it can't use the index, it just falls back to not using the index and emits a warning. That seemed strange (to log a warning in a test rather than fail) but I didn't want to change it's behavior as it predates me. So I added the second test instead to assert the index is available which combined with the first test ensure it's both available AND does the right thing in query

+ }
+
+ @Test
+ public void streamingPathBamWithFileIndex() throws IOException {
@droazen

droazen Dec 6, 2016

Contributor

Rename test case to something like checkHasIndexForStreamingPathBamWithFileIndex(), to make it clearer how it differs from the test case below it.

+ InputResource index = new FileInputResource(localBamIndex);
+
+ // ensure that the index is being used, not checked in queryInputResourcePermutation
+ final SamReader reader = SamReaderFactory.makeDefault().open(new SamInputResource(bam, index));
@droazen

droazen Dec 6, 2016

Contributor

Close your reader when done using a try-with-resources statement here.

kcibul was assigned by droazen Dec 6, 2016

Contributor

droazen commented Dec 6, 2016

@vdauwera This is the PR that needs to be merged before the next htsjdk release, if you want to monitor its progress.

@kcibul kcibul Addressed PR comments
751a11b
@kcibul

Changes made

+ * A path that pretends it's not based upon a file. This helps in cases where we want to test branches
+ * that apply to non-file based paths without actually having to use non-file based resources (like cloud urls)
+ */
+ public static class NeverFilePathInputResource extends PathInputResource {
@droazen

droazen Dec 6, 2016

Contributor

Still need to make this class private

@kcibul kcibul Addressed missed PR comment
1049ff2
@kcibul

doh -- sorry about that missed 'private'

Contributor

droazen commented Dec 6, 2016

Ok, looks good now -- I'll hit merge once tests pass, then we're going to do an htsjdk release.

kcibul was unassigned by droazen Dec 6, 2016

Coverage Status

Coverage increased (+0.02%) to 70.379% when pulling 751a11b on kcibul:master into a5ee55b on samtools:master.

@droazen

droazen approved these changes Dec 6, 2016

@droazen droazen merged commit 4d0070b into samtools:master Dec 6, 2016

1 of 2 checks passed

coverage/coveralls Coverage pending from Coveralls.io
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

Coverage Status

Coverage increased (+0.02%) to 70.382% when pulling e1bf61e on kcibul:master into a5ee55b on samtools:master.

Coverage Status

Coverage increased (+0.02%) to 70.379% when pulling 1049ff2 on kcibul:master into a5ee55b on samtools:master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment