Fixed bug disallowing file-based index and seekable path based source #769
droazen
self-assigned this
Dec 6, 2016
droazen
requested changes
Dec 6, 2016
Review complete, back to @kcibul for (minor) changes, then we can merge.
+ sourceSeekable.seek(0);
+ primitiveSamReader = new BAMFileReader(
+ sourceSeekable, indexSeekable, false, asynchronousIO, validationStringency, this.samRecordFactory);
+ }
  } else {
droazen
Dec 6, 2016
Contributor
I think that this entire nested if statement from line 319 up to line 336 can be simplified from this:
if (indexFile != null || null == sourceSeekable || null == indexSeekable) {
    if (null == sourceSeekable || null == indexSeekable) {
        // not seekable.
        // it's OK that we consumed a bit of the stream already, this ctor expects it.
        primitiveSamReader = new BAMFileReader(bufferedStream, indexFile, false, asynchronousIO, validationStringency, this.samRecordFactory);
    } else {
        sourceSeekable.seek(0);
        primitiveSamReader = new BAMFileReader(
                sourceSeekable, indexSeekable, false, asynchronousIO, validationStringency, this.samRecordFactory);
    }
} else {
    // seekable.
    // need to return to the beginning because it's the same stream we used earlier
    // and read a bit from, and that form of the ctor expects the stream to start at 0.
    sourceSeekable.seek(0);
    primitiveSamReader = new BAMFileReader(
            sourceSeekable, indexSeekable, false, asynchronousIO, validationStringency, this.samRecordFactory);
}
to this:
if (null == sourceSeekable || null == indexSeekable) {
    // not seekable.
    // it's OK that we consumed a bit of the stream already, this ctor expects it.
    primitiveSamReader = new BAMFileReader(bufferedStream, indexFile, false, asynchronousIO, validationStringency, this.samRecordFactory);
} else {
    // seekable.
    // need to return to the beginning because it's the same stream we used earlier
    // and read a bit from, and that form of the ctor expects the stream to start at 0.
    sourceSeekable.seek(0);
    primitiveSamReader = new BAMFileReader(
            sourceSeekable, indexSeekable, false, asynchronousIO, validationStringency, this.samRecordFactory);
}
Since the two else clauses in the original code are identical.
In other words, what you seem to have done with this patch is just to say "if you have an index that's a file, you can now use the constructor that takes two seekable streams instead of the (non-seekable-stream, file) constructor."
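To make the equivalence concrete, here is a minimal, self-contained sketch that enumerates every combination of the three conditions and checks that both branch structures pick the same constructor. The class name `BranchEquivalence`, the `Ctor` labels, and the boolean parameters are hypothetical stand-ins for the null checks in the patch; nothing here calls htsjdk.

```java
public class BranchEquivalence {
    /** Illustrative labels for the two BAMFileReader constructors discussed above. */
    enum Ctor { STREAM_AND_FILE, TWO_SEEKABLE }

    /** Mirrors the nested if statement quoted from the patch. */
    static Ctor nested(boolean hasIndexFile, boolean hasSourceSeekable, boolean hasIndexSeekable) {
        if (hasIndexFile || !hasSourceSeekable || !hasIndexSeekable) {
            if (!hasSourceSeekable || !hasIndexSeekable) {
                return Ctor.STREAM_AND_FILE; // not seekable
            } else {
                return Ctor.TWO_SEEKABLE;
            }
        } else {
            return Ctor.TWO_SEEKABLE; // seekable
        }
    }

    /** Mirrors the proposed simplification: indexFile no longer affects the choice. */
    static Ctor simplified(boolean hasSourceSeekable, boolean hasIndexSeekable) {
        if (!hasSourceSeekable || !hasIndexSeekable) {
            return Ctor.STREAM_AND_FILE; // not seekable
        }
        return Ctor.TWO_SEEKABLE; // seekable
    }

    /** Returns true if both versions agree for all 8 input combinations. */
    static boolean allCasesAgree() {
        boolean[] values = {false, true};
        for (boolean indexFile : values) {
            for (boolean source : values) {
                for (boolean index : values) {
                    if (nested(indexFile, source, index) != simplified(source, index)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all cases agree: " + allCasesAgree()); // prints "all cases agree: true"
    }
}
```

Because the outer condition only matters when both seekable streams are present, and both of those paths end in the same constructor call, the `indexFile != null` test drops out entirely.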
@@ -284,7 +284,37 @@ public void queryInputResourcePermutation(final SamInputResource resource) throw
  }
  reader.close();
  }
-
+
+ public class NeverFilePathInpurResource extends PathInputResource {
droazen
Dec 6, 2016
Contributor
This class should definitely be private static, and come with a comment explaining what it's for.
Also, typo: Inpur -> Input
+ }
+ }
+
+ @Test
kcibul
Dec 6, 2016
Contributor
The query test doesn't actually fail if it can't use the index; it just falls back to not using the index and emits a warning. That seemed strange (logging a warning in a test rather than failing), but I didn't want to change its behavior since it predates me. So I added the second test to assert that the index is available, which, combined with the first test, ensures that the index is both available AND does the right thing in a query.
+ }
+
+ @Test
+ public void streamingPathBamWithFileIndex() throws IOException {
droazen
Dec 6, 2016
Contributor
Rename test case to something like checkHasIndexForStreamingPathBamWithFileIndex(), to make it clearer how it differs from the test case below it.
+ InputResource index = new FileInputResource(localBamIndex);
+
+ // ensure that the index is being used, not checked in queryInputResourcePermutation
+ final SamReader reader = SamReaderFactory.makeDefault().open(new SamInputResource(bam, index));
droazen
Dec 6, 2016
Contributor
Close your reader when done using a try-with-resources statement here.
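As a rough illustration of why try-with-resources matters in a test, the sketch below uses a hypothetical `TrackedReader` stand-in (not an htsjdk class) to show that the resource is closed even when the check inside the block throws. In the actual test the resource would be the `SamReader`, which is closeable.

```java
public class TryWithResourcesSketch {
    /** Hypothetical stand-in for a reader; it only records whether close() was called. */
    static class TrackedReader implements AutoCloseable {
        boolean closed = false;

        boolean hasIndex() {
            return true;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /**
     * Opens a reader, runs a check inside try-with-resources, and returns the
     * reader so the caller can inspect whether it was closed.
     * If simulateFailure is true, the check throws before the block ends.
     */
    static TrackedReader checkIndex(boolean simulateFailure) {
        TrackedReader opened = new TrackedReader();
        try (TrackedReader reader = opened) {
            if (simulateFailure || !reader.hasIndex()) {
                throw new IllegalStateException("index not available");
            }
        } catch (IllegalStateException e) {
            // Swallowed only so this sketch can report the closed state;
            // a real test would let the failure propagate.
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println("closed after success: " + checkIndex(false).closed);
        System.out.println("closed after failure: " + checkIndex(true).closed);
    }
}
```

Both lines print `true`: close() runs on the normal path and on the exceptional path, which a trailing `reader.close()` call after an assertion would not guarantee.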
kcibul
was assigned
by droazen
Dec 6, 2016
@vdauwera This is the PR that needs to be merged before the next htsjdk release, if you want to monitor its progress.
+ * A path that pretends it's not based upon a file. This helps in cases where we want to test branches
+ * that apply to non-file based paths without actually having to use non-file based resources (like cloud urls)
+ */
+ public static class NeverFilePathInputResource extends PathInputResource {
Ok, looks good now -- I'll hit merge once tests pass, then we're going to do an htsjdk release.
kcibul commented Dec 6, 2016 (edited)
Description
Added support for the use case where the index is a file and the source stream is a seekable path. This is important to allow users to pull down an index locally while querying a BAM in GCS. Currently, the performance of leaving the index in GCS is unacceptable.
Checklist