updating tribble to support path wrappers #796

Merged
merged 3 commits into from Feb 3, 2017

Conversation

Projects
None yet
3 participants
Contributor

lbergelson commented Feb 2, 2017

updating AbstractFeatureReader.getFeatureReader so that path wrappers can be passed to it.

Contributor

lbergelson commented Feb 2, 2017

@droazen The tests were more time consuming than I anticipated. I think they're pretty solid though now.

I still need to update javadoc in a lot of places but I wanted to get these pushed up now.

@lbergelson lbergelson supporting NIO wrappers in tribble
adding an overload of AbstractFeatureReader.getFeatureReader() that takes wrappers for gcs prefetching
this required adding matching overloads in a number of other places

removed an unused Field from SamReaderFactor

test for both tabix and tribble indexes
d75605e

codecov-io commented Feb 2, 2017 edited

Codecov Report

Merging #796 into master will increase coverage by 0.308%.

@@               Coverage Diff               @@
##              master      #796       +/-   ##
===============================================
+ Coverage     64.569%   64.876%   +0.308%     
- Complexity      7105      7195       +90     
===============================================
  Files            524       525        +1     
  Lines          31667     32212      +545     
  Branches        5415      5543      +128     
===============================================
+ Hits           20447     20898      +451     
- Misses          9076      9134       +58     
- Partials        2144      2180       +36
Impacted Files Coverage Δ Complexity Δ
...rc/main/java/htsjdk/samtools/SamReaderFactory.java 68.471% <ø> (+5.02%) 12 <ø> (+5)
src/main/java/htsjdk/tribble/FeatureCodec.java 0% <ø> (ø) 0 <ø> (ø)
...sjdk/samtools/util/BlockCompressedInputStream.java 78.497% <ø> (+3.595%) 113 <ø> (+40)
src/main/java/htsjdk/variant/bcf2/BCF2Codec.java 74.847% <ø> (+3.082%) 31 <ø> (ø)
...amtools/seekablestream/ISeekableStreamFactory.java 0% <ø> (ø) 0 <ø> (?)
...rc/main/java/htsjdk/tribble/util/ParsingUtils.java 73.545% <100%> (+0.141%) 47 <1> (+1)
...samtools/seekablestream/SeekableStreamFactory.java 83.333% <100%> (+9.42%) 6 <ø> (ø)
...c/main/java/htsjdk/tribble/TabixFeatureReader.java 68.657% <62.5%> (+5.421%) 8 <2> (+1)
...ain/java/htsjdk/tribble/AbstractFeatureReader.java 70.833% <66.667%> (+6.548%) 18 <8> (+5)
...c/main/java/htsjdk/tribble/index/IndexFactory.java 78.667% <66.667%> (+3.499%) 26 <4> (+1)
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 991a8e4...05692a6. Read the comment docs.

droazen self-assigned this Feb 2, 2017

@@ -77,8 +77,6 @@
private static ValidationStringency defaultValidationStringency = ValidationStringency.DEFAULT_STRINGENCY;
- private Function<SeekableByteChannel, SeekableByteChannel> pathWrapper = Function.identity();
@lbergelson

lbergelson Feb 3, 2017

Contributor

this isn't used anywhere so I removed it

@droazen

droazen Feb 3, 2017

Contributor

👍

@droazen

Review complete, back to @lbergelson. Mostly minor comments -- merge after addressing, and open a PR against GATK4 to move us to an htsjdk snapshot with this change.

@@ -77,8 +77,6 @@
private static ValidationStringency defaultValidationStringency = ValidationStringency.DEFAULT_STRINGENCY;
- private Function<SeekableByteChannel, SeekableByteChannel> pathWrapper = Function.identity();
@droazen

droazen Feb 3, 2017

Contributor

👍

@@ -30,4 +32,12 @@
* @return
*/
public SeekableStream getBufferedStream(SeekableStream stream, int bufferSize);
+
+ default SeekableStream getStreamFor(String path, Function<SeekableByteChannel, SeekableByteChannel> wrapper) throws IOException {
@droazen

droazen Feb 3, 2017

Contributor

javadoc for this new interface method

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

@@ -80,16 +90,18 @@ public SeekableStream getStreamFor(final String path) throws IOException {
} else if (path.startsWith("file:")) {
return new SeekableFileStream(new File(new URL(path).getPath()));
} else if (IOUtil.hasScheme(path)) {
@droazen

droazen Feb 3, 2017

Contributor

Document that the wrapper is only applied for URIs that have a scheme other than http/https/ftp/file. It should eventually (in the future!) be applied everywhere, though, so that the client can decide whether the stream should be wrapped or not, regardless of protocol.

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

@@ -42,6 +44,8 @@
// the path to underlying data source
String path;
+ final Function<SeekableByteChannel, SeekableByteChannel> wrapper;
+ final Function<SeekableByteChannel, SeekableByteChannel> indexWrapper;
@droazen

droazen Feb 3, 2017

Contributor

Should these be protected or just package-visible? (answer depends on whether there could be subclasses outside of the package)

Also, add docs for these fields.

@lbergelson

lbergelson Feb 3, 2017

Contributor

I kept them at the same level as path, because any implementations that need to access the wrapper would presumably also need to access the path.

Docs added.

@@ -73,25 +86,26 @@
* @param indexResource the index for the feature file. If null, will auto-generate (if necessary)
* @param codec
* @param requireIndex whether an index is required for this file
+ * @param wrapper
+ * @param indexWrapper
@droazen

droazen Feb 3, 2017

Contributor

Fill in docs

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

+ }
+ return hasBlockCompressedExtension(resourcePath) && ParsingUtils.resourceExists(indexPath);
+ }
+
public static class ComponentMethods{
@droazen

droazen Feb 3, 2017

Contributor

What is the point of this ComponentMethods inner class if you're just going to move the implementation to the enclosing class?

@lbergelson

lbergelson Feb 3, 2017

Contributor

It's probably pointless, but I'm afraid to move it because its an injectable component that someone (IGV) may subclass.

+ * @param featureFile - path to a feature file. Can be a local file, http url, or ftp url
+ * @param indexFile - path to the index file.
+ * @param codec
+ * @param wrapper
@droazen

droazen Feb 3, 2017

Contributor

complete docs

@lbergelson

lbergelson Feb 3, 2017

Contributor

docced

+ }
+
+ /**
+ * @param featureFile - path to the feature file, can be a local file path, http url, or ftp url
@droazen

droazen Feb 3, 2017

Contributor

or other kind of url

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

@@ -139,12 +161,12 @@ public TribbleIndexedFeatureReader(final String featureFile, final FeatureCodec<
private void loadIndex() throws IOException{
String indexFile = Tribble.indexFile(this.path);
if (ParsingUtils.resourceExists(indexFile)) {
@droazen

droazen Feb 3, 2017

Contributor

Will ParsingUtils.resourceExists() return true for an index that lives on GCS?

@lbergelson

lbergelson Feb 3, 2017

Contributor

Yes, there's even a test for it using jimfs.

@@ -288,7 +314,7 @@ private void readHeader() throws IOException {
* @throws IOException
*/
public WFIterator() throws IOException {
- final InputStream inputStream = ParsingUtils.openInputStream(path);
+ final InputStream inputStream = ParsingUtils.openInputStream(path, wrapper);
@droazen

droazen Feb 3, 2017

Contributor

It looks like the wrapping happens before unzipping here -- we should make it clear to the client that they are always wrapping raw unmodified byte streams (and verify that that's the case everywhere).

+ * Load in index from the specified file. The type of index (LinearIndex or IntervalTreeIndex) is determined
+ * at run time by reading the type flag in the file.
+ *
+ * @param indexFile from which to load the index
@droazen

droazen Feb 3, 2017

Contributor

doc for indexWrapper

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

final Class<Index> indexClass = IndexType.getIndexType(bufferedInputStream).getIndexType();
final Constructor<Index> ctor = indexClass.getConstructor(InputStream.class);
return ctor.newInstance(bufferedInputStream);
+ } catch (final TribbleException ex) {
+ throw ex;
@droazen

droazen Feb 3, 2017

Contributor

This is a change in behavior for this method -- previously this got wrapped in a RuntimeException. Reason for the change?

@lbergelson

lbergelson Feb 3, 2017

Contributor

There was previously re-wrapping tribble exceptions as runtime exceptions. This makes it more consistent and specific. Since a TribbleException is a RuntimeException, this is backwards compatible.

- private BlockCompressedInputStream mFp;
+ private final String mFn;
+ private final String mIdxFn;
+ private final Function<SeekableByteChannel, SeekableByteChannel> mIdxWrpr;
@droazen

droazen Feb 3, 2017

Contributor

Erm, no need to obey the questionable preference of this class for abbreviated names -- name it something humans can pronounce like mIndexWrapper

@lbergelson

lbergelson Feb 3, 2017

Contributor

😢

@@ -111,6 +114,17 @@ public TabixReader(final String fn, final String idxFn) throws IOException {
}
/**
+ * @param fn File name of the data file
@droazen

droazen Feb 3, 2017

Contributor

Is it necessarily a file?

@@ -111,6 +114,17 @@ public TabixReader(final String fn, final String idxFn) throws IOException {
}
/**
+ * @param fn File name of the data file
+ * @param idxFn Full path to the index file. Auto-generated if null
@droazen

droazen Feb 3, 2017

Contributor

Fn is too similar to Function (which we also have here, now) -- recommend renaming to indexPath (here and below)

@lbergelson

lbergelson Feb 3, 2017

Contributor

fixed :(

@@ -80,9 +83,7 @@
public static InputStream openInputStream(String path)
@droazen

droazen Feb 3, 2017

Contributor

Can this method just call the other overload with a null wrapper?

-
- InputStream inputStream;
-
+ final InputStream inputStream;
if (path.startsWith("http:") || path.startsWith("https:") || path.startsWith("ftp:")) {
inputStream = getURLHelper(new URL(path)).openInputStream();
} else if (IOUtil.hasScheme(path)) {
@droazen

droazen Feb 3, 2017

Contributor

Should SeekablePathStream be used here as well, or does it have to be Files.newInputStream() for some reason?

@lbergelson

lbergelson Feb 3, 2017

Contributor

changed

@@ -95,6 +96,21 @@ public static InputStream openInputStream(String path)
return inputStream;
}
+ public static InputStream openInputStream(String path, Function<SeekableByteChannel, SeekableByteChannel> wrapper)
@droazen

droazen Feb 3, 2017

Contributor

Docs for this method, including mention of under what circumstances the wrapper is applied

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

+ private static final String VCF_INDEX = TEST_PATH + "baseVariants.vcf.idx";
+ private static final String VCF_TABIX = TEST_PATH + "baseVariants.vcf.gz";
+ private static final String VCF_TABIX_INDEX = TEST_PATH + "baseVariants.vcf.gz.tbi";
+ private static final String MANGLED_VCF_TABIX = TEST_PATH + "baseVariants.mangled.vcf.gz";
@droazen

droazen Feb 3, 2017

Contributor

Can you name block-gzipped test inputs to make it clear in the name that they are block-gzipped rather than regular-gzipped?

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

+ private static final String MANGLED_VCF_TABIX = TEST_PATH + "baseVariants.mangled.vcf.gz";
+ private static final String MANGLED_VCF_TABIX_INDEX = TEST_PATH + "baseVariants.mangled.vcf.gz.tbi";
+
+ private static final Function<SeekableByteChannel, SeekableByteChannel> WRAPPER = SkippingByteChannel::new;
@droazen

droazen Feb 3, 2017

Contributor

Document what this test wrapper does

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

+ {MANGLED_VCF, VCF_INDEX, WRAPPER, null},
+ {MANGLED_VCF_TABIX, MANGLED_VCF_TABIX_INDEX, WRAPPER, WRAPPER},
+ {VCF_TABIX, MANGLED_VCF_TABIX_INDEX, null, WRAPPER},
+ {MANGLED_VCF_TABIX, VCF_TABIX_INDEX, WRAPPER, null},
@droazen

droazen Feb 3, 2017

Contributor

Have you confirmed that all test cases that take a wrapper actually throw if the wrapper is not provided?

@lbergelson

lbergelson Feb 3, 2017

Contributor

updated the failure case tests to include all of the mangled cases

+ Function<SeekableByteChannel, SeekableByteChannel> wrapper,
+ Function<SeekableByteChannel, SeekableByteChannel> indexWrapper) throws IOException, URISyntaxException {
+ try(FileSystem fs = Jimfs.newFileSystem("test", Configuration.unix())) {
+ final AbstractFeatureReader<VariantContext, ?> featureReader = getFeatureReader(file, index, wrapper,
@droazen

droazen Feb 3, 2017

Contributor

Close your featureReader when done using try-with-resources

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

+ private static Object[][] failsWithoutWrappers(){
+ return new Object[][] {
+ {MANGLED_VCF, VCF_INDEX, new VCFCodec()},
+ {VCF, MANGLED_VCF_INDEX, new VCFCodec()},
@droazen

droazen Feb 3, 2017

Contributor

Test with the mangled tabix inputs as well

@lbergelson

lbergelson Feb 3, 2017

Contributor

done

@droazen droazen assigned lbergelson and unassigned droazen Feb 3, 2017

lbergelson added some commits Feb 3, 2017

@lbergelson lbergelson responding to comments
47b79d2
@lbergelson lbergelson updating BCF2Codec.canDecode() to deal with NIO but not wrappers
05692a6

droazen referenced this pull request in broadinstitute/gatk Feb 3, 2017

Merged

Support Path URIs for features. #2344

@lbergelson lbergelson merged commit b05d1f2 into master Feb 3, 2017

4 of 5 checks passed

codecov/changes 5 files have unexpected coverage changes not visible in diff.
Details
codecov/patch 65.278% of diff hit (target 64.569%)
Details
codecov/project 64.876% (+0.308%) compared to 991a8e4
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

lbergelson deleted the lb_update_tribble_for_nio branch Feb 3, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment