updating tribble to support path wrappers #796
@droazen The tests were more time-consuming than I anticipated, but I think they're pretty solid now. I still need to update the javadoc in a lot of places, but I wanted to get these pushed up now.
codecov-io
commented
Feb 2, 2017
Codecov Report
@@ Coverage Diff @@
## master #796 +/- ##
===============================================
+ Coverage 64.569% 64.876% +0.308%
- Complexity 7105 7195 +90
===============================================
Files 524 525 +1
Lines 31667 32212 +545
Branches 5415 5543 +128
===============================================
+ Hits 20447 20898 +451
- Misses 9076 9134 +58
- Partials 2144 2180 +36
droazen
self-assigned this
Feb 2, 2017
droazen
requested changes
Feb 3, 2017
Review complete, back to @lbergelson. Mostly minor comments -- merge after addressing, and open a PR against GATK4 to move us to an htsjdk snapshot with this change.
| @@ -77,8 +77,6 @@ | ||
| private static ValidationStringency defaultValidationStringency = ValidationStringency.DEFAULT_STRINGENCY; | ||
| - private Function<SeekableByteChannel, SeekableByteChannel> pathWrapper = Function.identity(); |
| @@ -30,4 +32,12 @@ | ||
| * @return | ||
| */ | ||
| public SeekableStream getBufferedStream(SeekableStream stream, int bufferSize); | ||
| + | ||
| + default SeekableStream getStreamFor(String path, Function<SeekableByteChannel, SeekableByteChannel> wrapper) throws IOException { |
| @@ -80,16 +90,18 @@ public SeekableStream getStreamFor(final String path) throws IOException { | ||
| } else if (path.startsWith("file:")) { | ||
| return new SeekableFileStream(new File(new URL(path).getPath())); | ||
| } else if (IOUtil.hasScheme(path)) { |
droazen
Feb 3, 2017
Contributor
Document that the wrapper is only applied for URIs that have a scheme other than http/https/ftp/file. It should eventually (in the future!) be applied everywhere, though, so that the client can decide whether the stream should be wrapped or not, regardless of protocol.
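A minimal sketch of the rule being asked for here, so the documented behavior is easy to check. `WrapperDispatchSketch` and `wrapperApplies` are hypothetical names, not htsjdk API, and the scheme check uses `java.net.URI` as a stand-in for whatever `IOUtil.hasScheme` actually does:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this class does not exist in htsjdk.
final class WrapperDispatchSketch {
    // Schemes that, per the review comment, are opened without the wrapper.
    private static final Set<String> UNWRAPPED_SCHEMES =
            new HashSet<>(Arrays.asList("http", "https", "ftp", "file"));

    // Returns true when a client-supplied wrapper would be applied to this path.
    static boolean wrapperApplies(final String path) {
        final String scheme;
        try {
            scheme = URI.create(path).getScheme();
        } catch (final IllegalArgumentException e) {
            // Not parseable as a URI: assumed here to be a plain local path.
            return false;
        }
        // No scheme means a plain local path, which is also not wrapped.
        return scheme != null
                && !UNWRAPPED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }
}
```

Under these assumptions a `gs://` path is wrapped, while http, https, ftp, file, and scheme-less paths are not.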
| @@ -42,6 +44,8 @@ | ||
| // the path to underlying data source | ||
| String path; | ||
| + final Function<SeekableByteChannel, SeekableByteChannel> wrapper; | ||
| + final Function<SeekableByteChannel, SeekableByteChannel> indexWrapper; |
droazen
Feb 3, 2017
Contributor
Should these be protected or just package-visible? (answer depends on whether there could be subclasses outside of the package)
Also, add docs for these fields.
lbergelson
Feb 3, 2017
Contributor
I kept them at the same level as path, because any implementations that need to access the wrapper would presumably also need to access the path.
Docs added.
| @@ -73,25 +86,26 @@ | ||
| * @param indexResource the index for the feature file. If null, will auto-generate (if necessary) | ||
| * @param codec | ||
| * @param requireIndex whether an index is required for this file | ||
| + * @param wrapper | ||
| + * @param indexWrapper |
| + } | ||
| + return hasBlockCompressedExtension(resourcePath) && ParsingUtils.resourceExists(indexPath); | ||
| + } | ||
| + | ||
| public static class ComponentMethods{ |
droazen
Feb 3, 2017
Contributor
What is the point of this ComponentMethods inner class if you're just going to move the implementation to the enclosing class?
lbergelson
Feb 3, 2017
Contributor
It's probably pointless, but I'm afraid to move it because it's an injectable component that someone (IGV) may subclass.
| + * @param featureFile - path to a feature file. Can be a local file, http url, or ftp url | ||
| + * @param indexFile - path to the index file. | ||
| + * @param codec | ||
| + * @param wrapper |
| + } | ||
| + | ||
| + /** | ||
| + * @param featureFile - path to the feature file, can be a local file path, http url, or ftp url |
| @@ -139,12 +161,12 @@ public TribbleIndexedFeatureReader(final String featureFile, final FeatureCodec< | ||
| private void loadIndex() throws IOException{ | ||
| String indexFile = Tribble.indexFile(this.path); | ||
| if (ParsingUtils.resourceExists(indexFile)) { |
droazen
Feb 3, 2017
Contributor
Will ParsingUtils.resourceExists() return true for an index that lives on GCS?
| @@ -288,7 +314,7 @@ private void readHeader() throws IOException { | ||
| * @throws IOException | ||
| */ | ||
| public WFIterator() throws IOException { | ||
| - final InputStream inputStream = ParsingUtils.openInputStream(path); | ||
| + final InputStream inputStream = ParsingUtils.openInputStream(path, wrapper); |
droazen
Feb 3, 2017
Contributor
It looks like the wrapping happens before unzipping here -- we should make it clear to the client that they are always wrapping raw unmodified byte streams (and verify that that's the case everywhere).
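To make the ordering concrete, here is a sketch using only the JDK, not the htsjdk code path. The wrapper is applied to the raw channel first, and decompression is layered on top, so the wrapper only ever sees the unmodified (still compressed) bytes. `WrapBeforeUnzipSketch`, `open`, and `roundTrip` are hypothetical names, and plain gzip stands in for block-compressed input:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; this class does not exist in htsjdk.
final class WrapBeforeUnzipSketch {
    // Wrap the raw channel first, then decompress what the wrapper returns.
    static InputStream open(final Path path,
                            final Function<SeekableByteChannel, SeekableByteChannel> wrapper)
            throws IOException {
        final SeekableByteChannel raw = Files.newByteChannel(path);
        final SeekableByteChannel wrapped = wrapper.apply(raw); // sees compressed bytes
        return new GZIPInputStream(Channels.newInputStream(wrapped));
    }

    // Writes text to a temporary gzip file and reads it back through open().
    static String roundTrip(final String text) throws IOException {
        final Path tmp = Files.createTempFile("wrap-sketch", ".gz");
        try {
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            try (InputStream in = open(tmp, Function.identity())) {
                final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                final byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A wrapper that adds caching or prefetching works unchanged in this arrangement, because it never needs to know whether the underlying bytes are compressed.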
| + * Load in index from the specified file. The type of index (LinearIndex or IntervalTreeIndex) is determined | ||
| + * at run time by reading the type flag in the file. | ||
| + * | ||
| + * @param indexFile from which to load the index |
| final Class<Index> indexClass = IndexType.getIndexType(bufferedInputStream).getIndexType(); | ||
| final Constructor<Index> ctor = indexClass.getConstructor(InputStream.class); | ||
| return ctor.newInstance(bufferedInputStream); | ||
| + } catch (final TribbleException ex) { | ||
| + throw ex; |
droazen
Feb 3, 2017
Contributor
This is a change in behavior for this method -- previously this got wrapped in a RuntimeException. Reason for the change?
lbergelson
Feb 3, 2017
Contributor
Previously this re-wrapped TribbleExceptions as RuntimeExceptions. Rethrowing them unchanged is more consistent and specific. Since a TribbleException is a RuntimeException, this is backwards compatible.
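The compatibility argument can be sketched with stand-in types. `SketchTribbleException` below is a hypothetical placeholder for `TribbleException`, and `load` is not the real index-loading method. A caller that catches `RuntimeException` still catches the rethrown exception, only now without an extra layer of wrapping:

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; these types do not exist in htsjdk.
final class RethrowSketch {
    // Stand-in for TribbleException, which is itself a RuntimeException.
    static class SketchTribbleException extends RuntimeException {
        SketchTribbleException(final String message) {
            super(message);
        }
    }

    // Domain exceptions pass through untouched; anything else is wrapped.
    static Object load(final Callable<Object> loader) {
        try {
            return loader.call();
        } catch (final SketchTribbleException ex) {
            throw ex; // rethrown as-is, no RuntimeException wrapper
        } catch (final Exception ex) {
            throw new RuntimeException(ex);
        }
    }
}
```

Code that inspects `getCause()` to find the original exception would notice the difference, which is the one behavior change worth mentioning in the docs.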
| - private BlockCompressedInputStream mFp; | ||
| + private final String mFn; | ||
| + private final String mIdxFn; | ||
| + private final Function<SeekableByteChannel, SeekableByteChannel> mIdxWrpr; |
droazen
Feb 3, 2017
Contributor
Erm, no need to obey the questionable preference of this class for abbreviated names -- name it something humans can pronounce like mIndexWrapper
| @@ -111,6 +114,17 @@ public TabixReader(final String fn, final String idxFn) throws IOException { | ||
| } | ||
| /** | ||
| + * @param fn File name of the data file | ||
| + * @param idxFn Full path to the index file. Auto-generated if null |
droazen
Feb 3, 2017
Contributor
Fn is too similar to Function (which we also have here, now) -- recommend renaming to indexPath (here and below)
| @@ -80,9 +83,7 @@ | ||
| public static InputStream openInputStream(String path) |
| - | ||
| - InputStream inputStream; | ||
| - | ||
| + final InputStream inputStream; | ||
| if (path.startsWith("http:") || path.startsWith("https:") || path.startsWith("ftp:")) { | ||
| inputStream = getURLHelper(new URL(path)).openInputStream(); | ||
| } else if (IOUtil.hasScheme(path)) { |
droazen
Feb 3, 2017
Contributor
Should SeekablePathStream be used here as well, or does it have to be Files.newInputStream() for some reason?
| @@ -95,6 +96,21 @@ public static InputStream openInputStream(String path) | ||
| return inputStream; | ||
| } | ||
| + public static InputStream openInputStream(String path, Function<SeekableByteChannel, SeekableByteChannel> wrapper) |
droazen
Feb 3, 2017
Contributor
Add docs for this method, including the circumstances under which the wrapper is applied.
| + private static final String VCF_INDEX = TEST_PATH + "baseVariants.vcf.idx"; | ||
| + private static final String VCF_TABIX = TEST_PATH + "baseVariants.vcf.gz"; | ||
| + private static final String VCF_TABIX_INDEX = TEST_PATH + "baseVariants.vcf.gz.tbi"; | ||
| + private static final String MANGLED_VCF_TABIX = TEST_PATH + "baseVariants.mangled.vcf.gz"; |
droazen
Feb 3, 2017
Contributor
Can you name block-gzipped test inputs to make it clear in the name that they are block-gzipped rather than regular-gzipped?
| + private static final String MANGLED_VCF_TABIX = TEST_PATH + "baseVariants.mangled.vcf.gz"; | ||
| + private static final String MANGLED_VCF_TABIX_INDEX = TEST_PATH + "baseVariants.mangled.vcf.gz.tbi"; | ||
| + | ||
| + private static final Function<SeekableByteChannel, SeekableByteChannel> WRAPPER = SkippingByteChannel::new; |
| + {MANGLED_VCF, VCF_INDEX, WRAPPER, null}, | ||
| + {MANGLED_VCF_TABIX, MANGLED_VCF_TABIX_INDEX, WRAPPER, WRAPPER}, | ||
| + {VCF_TABIX, MANGLED_VCF_TABIX_INDEX, null, WRAPPER}, | ||
| + {MANGLED_VCF_TABIX, VCF_TABIX_INDEX, WRAPPER, null}, |
droazen
Feb 3, 2017
Contributor
Have you confirmed that all test cases that take a wrapper actually throw if the wrapper is not provided?
lbergelson
Feb 3, 2017
Contributor
Updated the failure-case tests to include all of the mangled cases.
| + Function<SeekableByteChannel, SeekableByteChannel> wrapper, | ||
| + Function<SeekableByteChannel, SeekableByteChannel> indexWrapper) throws IOException, URISyntaxException { | ||
| + try(FileSystem fs = Jimfs.newFileSystem("test", Configuration.unix())) { | ||
| + final AbstractFeatureReader<VariantContext, ?> featureReader = getFeatureReader(file, index, wrapper, |
| + private static Object[][] failsWithoutWrappers(){ | ||
| + return new Object[][] { | ||
| + {MANGLED_VCF, VCF_INDEX, new VCFCodec()}, | ||
| + {VCF, MANGLED_VCF_INDEX, new VCFCodec()}, |
lbergelson commented Feb 2, 2017
Updating AbstractFeatureReader.getFeatureReader so that path wrappers can be passed to it.