Support use of Path in ParsingUtils, SeekableStreamFactory, AbstractVCFCodec #724

Merged
merged 1 commit into from Jan 20, 2017

Conversation

Projects
None yet
6 participants
Contributor

tomwhite commented Oct 12, 2016

Description

Added support for java.util.nio.Path for some more code pathways.

This is to support loading of Features with Spark in GATK.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

Coverage Status

Coverage decreased (-0.009%) to 69.032% when pulling d3f3a4b on tomwhite:tw_use_path_in_parsing_utils into 88b6719 on samtools:master.

droazen self-assigned this Oct 13, 2016

tomwhite referenced this pull request in broadinstitute/gatk Dec 14, 2016

Merged

Add spark versions of walker classes #2256

@lbergelson

@tomwhite Seems like very reasonable changes. Hard to test that they're actually working though. Especially the case with a weird classpath. I have a few concrete comments and then a vague suggestion that we need tests with non file: paths just to shake out any places where it might accidentally be falling back to files without us knowing it. I think maybe if we include a test dependency on an in memory filesytem we could make much stronger test for the paths without having to bundle anything with htsjdk on release.

+ * @return the resulting {@code Path}
+ * @throws IOException an I/O error occurs creating the file system
+ */
+ public static Path getPath(String uriString) throws IOException {
@lbergelson

lbergelson Dec 16, 2016

Contributor

could you replace the use of Paths.get in SamInputResource.UrlInputResource with a call to this instead? Just in case...

@tomwhite

tomwhite Jan 17, 2017

Contributor

Done

+ public static Path getPath(String uriString) throws IOException {
+ URI uri = URI.create(uriString);
+ try {
+ return uri.getScheme() == null ? Paths.get(uriString) : Paths.get(uri);
@lbergelson

lbergelson Dec 16, 2016

Contributor

Could you add a comment here explaining this line. It's going to be very confusing for anyone who doesn't know that Paths.get(String) and Paths.get(URI) have different mechanics.

@tomwhite

tomwhite Jan 17, 2017

Contributor

Done

+ if (cl == null) {
+ throw e;
+ }
+ return FileSystems.newFileSystem(uri, new HashMap<>(), cl).provider().getPath(uri);
@lbergelson

lbergelson Dec 16, 2016

Contributor

I think there's a bug here. I tested this code explicitly by including the gcs provider in the hellbender path and running the following tests:

    @Test
    public void testCompletePath() throws IOException {
        newFilesystem(URI.create("gs://hellbender/somefile.txt"));
    }

    @Test
    public void testBucketOnly() throws IOException {
        newFilesystem(URI.create("gs://hellbender"));
    }

    private void newFilesystem(URI uri) throws IOException {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        final FileSystem fileSystem = FileSystems.newFileSystem(uri, new HashMap<>(), cl);
    }

the first test fails with

java.lang.IllegalArgumentException: GCS FileSystem URIs mustn't have: port, userinfo, path, query, or fragment: gs://hellbender/somefile.txt

	at shaded.cloud-nio.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.newFileSystem(CloudStorageFileSystemProvider.java:192)
	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.newFileSystem(CloudStorageFileSystemProvider.java:83)
	at java.nio.file.FileSystems.newFileSystem(FileSystems.java:326)
	at htsjdk.tribble.util.ParsingUtilsTest.newFilesystem(ParsingUtilsTest.java:179)
	at htsjdk.tribble.util.ParsingUtilsTest.testCompletePath(ParsingUtilsTest.java:169)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:639)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:816)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1124)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108)
	at org.testng.TestRunner.privateRun(TestRunner.java:774)
	at org.testng.TestRunner.run(TestRunner.java:624)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:359)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:354)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:312)
	at org.testng.SuiteRunner.run(SuiteRunner.java:261)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1215)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
	at org.testng.TestNG.run(TestNG.java:1048)
	at org.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:72)
	at org.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:127)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)

The second passes. So we can't just use the URI that's passed in to open the new file system. I'm not sure if this is a bug in the implementation of the GCS fileSystemProvider or if it's going to be a consistent issue with multiple providers. The other providers that I know of don't offer consistent behavior here that I can tell. UnixFileSystem provider always throws because it doesn't want to be reinstantiated. The HDFS provider from hdfs.jsr203 seems to be more robust and just picks out the part of the URI that it needs.

@jean-philippe-martin Can you comment on this? It looks like a simple patch to the gcs provider would make it so that this approach would work.

@tomwhite Have you tested this running in spark? I don't see how it can work, but maybe I'm missing something important. I do see that our implementation of this method in gatk has a special case for gcs still.

@tomwhite

tomwhite Dec 19, 2016

Contributor

I did try running this on HDFS with Spark a while ago and it worked. There's a fix for the GCS issue from @jean-philippe-martin here: GoogleCloudPlatform/google-cloud-java#1470

@jean-philippe-martin

jean-philippe-martin Jan 18, 2017

Contributor

@lbergerson: yes this was a bug, it's fixed now.

@tomwhite: Does this mean that on Spark after someone calls getPath on a gs:// URI, the fs provider will be loaded and Path.get will work correctly? Because that would be sweet.

@@ -143,6 +164,18 @@ private void tstExists(String path, boolean expectExists) throws IOException{
}
@Test
@lbergelson

lbergelson Dec 16, 2016

Contributor

I think we need a way of testing non-file URI's. I'm not sure the best way to go about it. One possibility would be to include one of the alternative filesystem providers as a test dependency. Either that or maybe we could mock one. There is a [google in memory filesystem] that might make a good test for the non-file based file systems.

@lbergelson

lbergelson Dec 16, 2016

Contributor

@droazen What do you think about including a new test dependency for providing alternative paths. It might help us avoid problems like the index one we saw before.

@tomwhite

tomwhite Jan 17, 2017

Contributor

Good idea. I've added a test dependency on https://github.com/google/jimfs, and a few tests that use non-file URIs that exercise the NIO file path.

magicDGS referenced this pull request Dec 27, 2016

Merged

Methods for default sequence dictionary name #774

2 of 5 tasks complete
@tomwhite tomwhite Support use of Path in ParsingUtils, SeekableStreamFactory, AbstractV…
…CFCodec.
f15fbdd

Current coverage is 63.863% (diff: 50.000%)

Merging #724 into master will decrease coverage by 0.019%

@@             master       #724   diff @@
==========================================
  Files           523        523          
  Lines         31643      31660    +17   
  Methods           0          0          
  Messages          0          0          
  Branches       6766       6772     +6   
==========================================
+ Hits          20214      20219     +5   
- Misses         9293       9301     +8   
- Partials       2136       2140     +4   

Sunburst

Diff Coverage File Path
0% src/main/java/htsjdk/samtools/SamInputResource.java
0% ...k/samtools/seekablestream/SeekableStreamFactory.java
••••• 50% src/main/java/htsjdk/samtools/util/IOUtil.java
••••• 50% ...c/main/java/htsjdk/variant/vcf/AbstractVCFCodec.java
•••••••••• 100% src/main/java/htsjdk/tribble/util/ParsingUtils.java

Powered by Codecov. Last update 5a9d819...f15fbdd

tomwhite referenced this pull request in broadinstitute/gatk Jan 17, 2017

Merged

Support Path URIs for features. #2344

Contributor

tomwhite commented Jan 17, 2017

@lbergelson thanks for the review. I've addressed all your comments, so please take another look. (See broadinstitute/gatk#2344 for the companion GATK issue.)

lbergelson was assigned by droazen Jan 20, 2017

Contributor

lbergelson commented Jan 20, 2017

@tomwhite Changes look good. Tests with jimfs are a good addition. It would be good to have a test that reads a vcf with an index from jimfs so we make sure we avoid the issue we had with bam indexes. I want to get this in though, so I'm going to merge and we can add one after the fact.

@lbergelson lbergelson merged commit 6738286 into samtools:master Jan 20, 2017

1 of 4 checks passed

codecov/changes 3 files have unexpected coverage changes not visible in diff.
Details
codecov/patch 50.000% of diff hit (target 63.881%)
Details
codecov/project 63.863% (-0.019%) compared to 5a9d819
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment