Path index #762

Merged
merged 4 commits into from Dec 5, 2016

Conversation

Projects
None yet
3 participants
Contributor

jean-philippe-martin commented Nov 29, 2016

Description

Fix a bug that was preventing inputs from being specified as Paths when they are not also Files
(issue #749).

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
    (I wrote one, it passed, but then I had to take it out to avoid adding a dependency to htsjdk)
  • All tests passing
    except for AbstractFeatureReaderTest, testFTPOpenInputStream, sourceLikeBam, and SeekableFTPStreamTest.setUp showing a (presumably normal): Could not connect to ftp.broadinstitute.org, for input source: ftp://ftp.broadinstitute.org/distribution/igv/TEST/cpgIslands with spaces.hg18.bed
  • Extended the README / documentation, if necessary (not necessary for a bugfix)
  • Is backward compatible

jean-philippe-martin added some commits Nov 29, 2016

@jean-philippe-martin jean-philippe-martin Support index as Path
Includes passing test that accesses an indexed BAM from Google Cloud
Storage. Sample output:

[TestNG] Running:
  /usr/local/google/home/jpmartin/.IdeaIC2016.2/system/temp-testng-customsuite.xml
Count is: 916

===============================================
Default Suite
Total tests run: 1, Failures: 0, Skips: 0
===============================================

Process finished with exit code 0
179b5e1
@jean-philippe-martin jean-philippe-martin Remove temp test so I can remove its dependency 99353cc
@jean-philippe-martin jean-philippe-martin Bugfix handing of input type and null combinations
d9d25a9

droazen self-assigned this Nov 29, 2016

Contributor

droazen commented Nov 29, 2016

Tests failed -- re-running to try to determine if the failure is real.

coveralls commented Nov 29, 2016 edited

Coverage Status

Coverage decreased (-0.008%) to 70.02% when pulling d9d25a9 on jean-philippe-martin:path_index into 6e4e875 on samtools:master.

@droazen

First-pass review complete -- a few comments/questions to address.

@@ -311,7 +311,24 @@ public SamReader open(final SamInputResource resource) {
if (SamStreams.isBAMFile(bufferedStream)) {
@droazen

droazen Nov 29, 2016 edited

Contributor

Is it ok to leave the indexMaybe.asFile() call intact here (even if the index might be in a bucket)? If yes, add a comment explaining why.

@jean-philippe-martin

jean-philippe-martin Nov 30, 2016

Contributor

There is no such call on this line so I'm guessing you mean the call just before? Yes, calling toFile on a bucket has no harmful effect, it just returns null. We use other ways to get to the index inside this block, except for the case where there's no index at all (this matches the previous behavior: we pass null and the code deals with the fact there's no index). I've added a comment to this effect.

+ // let's make sure the user can seek in this file.
+ // need to return to the beginning because it's the same stream we used earlier
+ // and read a bit from.
+ sourceSeekable.seek(0);
@droazen

droazen Nov 29, 2016 edited

Contributor

Can you explain why we don't have to seek to the beginning in the other places where bufferedStream is used (assuming bufferedStream and sourceSeekable are the same underlying stream)?

@jean-philippe-martin

jean-philippe-martin Nov 30, 2016 edited

Contributor

I think that's because the other places were written before this patch and so they expect the non-seekable stream to start from wherever isBAMFile leaves it. You can see in the previous version of the code how the stream is passed to this variant of the constructor without rewinding. The seekable code path, instead, expects the stream to start from the beginning.

I've added comments to this effect.

+ primitiveSamReader = new BAMFileReader(
+ sourceSeekable, indexSeekable, false, asynchronousIO, validationStringency, this.samRecordFactory);
+ } else {
+ primitiveSamReader = new BAMFileReader(
@droazen

droazen Nov 29, 2016

Contributor

This else clause is the same as the if (indexFile != null) clause above -- is it possible to simplify this nested if/else to combine them into a single case?

@jean-philippe-martin

jean-philippe-martin Nov 30, 2016

Contributor

Yes it is; I've simplified it for you.

@@ -138,6 +138,31 @@
mStream.setInputFileName(file.getAbsolutePath());
}
+
+
+ private BAMFileReader(final File file,
@droazen

droazen Nov 29, 2016

Contributor

This new private constructor appears to be unused.

+ mStream.setInputFileName(file.getAbsolutePath());
+ }
+
+ static BAMFileReader fromFileAndSeekable(final File file,
@droazen

droazen Nov 29, 2016

Contributor

This method fromFileAndSeekable is unused

@@ -187,7 +212,7 @@ private BAMFileReader(final BlockCompressedInputStream compressedInputStream,
final SAMRecordFactory factory)
throws IOException {
mIndexStream = indexStream;
- mIsSeekable = true;
+ mIsSeekable = null != indexStream;
@droazen

droazen Nov 29, 2016 edited

Contributor

I'm concerned that with this change the treatment of mIsSeekable is now inconsistent across constructors in this class. Eg., the constructor that takes a BlockCompressedInputStream and a File for the index sets mIsSeekable to true unconditionally, regardless of whether the index file is null. We should probably attempt to impose consistency across constructors here rather than leave the class in an inconsistent state.

@droazen

droazen Nov 29, 2016

Contributor

Actually, I'm not even sure that this change is necessary, given that the implementation of BAMFileReader.hasIndex() looks like this:

public boolean hasIndex() {
    return mIsSeekable && ((mIndexFile != null) || (mIndexStream != null));
}
@jean-philippe-martin

jean-philippe-martin Nov 30, 2016

Contributor

I think you're right and I can leave it to "true". I ran SamReaderFactoryTest and all 84 tests pass.

Contributor

droazen commented Nov 29, 2016

Tests passed on the second run -- I assume the initial failure was a temporary ftp connectivity issue.

@jean-philippe-martin jean-philippe-martin Simplify code
Simplified "if" logic, removed unused code,
set mIsSeekable in all constructors consistently.
0708f28
Contributor

jean-philippe-martin commented Nov 30, 2016

@droazen this should address all your comments. I ran the test suite and all 23992 tests report passing.

Contributor

jean-philippe-martin commented Nov 30, 2016

Travis reports 2 failures, both with "Could not connect to ftp.broadinstitute.org". Sounds like the same transient we've had issues with before.

Coverage Status

Coverage increased (+0.006%) to 70.033% when pulling 0708f28 on jean-philippe-martin:path_index into 6e4e875 on samtools:master.

Contributor

jean-philippe-martin commented Dec 5, 2016

@droazen OK for me to merge?

@droazen

droazen approved these changes Dec 5, 2016

@droazen droazen merged commit a5ee55b into samtools:master Dec 5, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.006%) to 70.033%
Details

jean-philippe-martin deleted the jean-philippe-martin:path_index branch Dec 15, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment