Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Support use of Path in ParsingUtils, SeekableStreamFactory, AbstractVCFCodec #724
Conversation
coveralls
commented
Oct 12, 2016
droazen
self-assigned this
Oct 13, 2016
tomwhite
referenced
this pull request
in broadinstitute/gatk
Dec 14, 2016
Merged
Add spark versions of walker classes #2256
lbergelson
reviewed
Dec 16, 2016
@tomwhite Seems like very reasonable changes. Hard to test that they're actually working though. Especially the case with a weird classpath. I have a few concrete comments and then a vague suggestion that we need tests with non file: paths just to shake out any places where it might accidentally be falling back to files without us knowing it. I think maybe if we include a test dependency on an in memory filesytem we could make much stronger test for the paths without having to bundle anything with htsjdk on release.
| + * @return the resulting {@code Path} | ||
| + * @throws IOException an I/O error occurs creating the file system | ||
| + */ | ||
| + public static Path getPath(String uriString) throws IOException { |
lbergelson
Dec 16, 2016
Contributor
could you replace the use of Paths.get in SamInputResource.UrlInputResource with a call to this instead? Just in case...
| + public static Path getPath(String uriString) throws IOException { | ||
| + URI uri = URI.create(uriString); | ||
| + try { | ||
| + return uri.getScheme() == null ? Paths.get(uriString) : Paths.get(uri); |
lbergelson
Dec 16, 2016
Contributor
Could you add a comment here explaining this line. It's going to be very confusing for anyone who doesn't know that Paths.get(String) and Paths.get(URI) have different mechanics.
| + if (cl == null) { | ||
| + throw e; | ||
| + } | ||
| + return FileSystems.newFileSystem(uri, new HashMap<>(), cl).provider().getPath(uri); |
lbergelson
Dec 16, 2016
Contributor
I think there's a bug here. I tested this code explicitly by including the gcs provider in the hellbender path and running the following tests:
@Test
public void testCompletePath() throws IOException {
newFilesystem(URI.create("gs://hellbender/somefile.txt"));
}
@Test
public void testBucketOnly() throws IOException {
newFilesystem(URI.create("gs://hellbender"));
}
private void newFilesystem(URI uri) throws IOException {
ClassLoader cl = Thread.currentThread().getContextClassLoader();
final FileSystem fileSystem = FileSystems.newFileSystem(uri, new HashMap<>(), cl);
}the first test fails with
java.lang.IllegalArgumentException: GCS FileSystem URIs mustn't have: port, userinfo, path, query, or fragment: gs://hellbender/somefile.txt
at shaded.cloud-nio.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.newFileSystem(CloudStorageFileSystemProvider.java:192)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.newFileSystem(CloudStorageFileSystemProvider.java:83)
at java.nio.file.FileSystems.newFileSystem(FileSystems.java:326)
at htsjdk.tribble.util.ParsingUtilsTest.newFilesystem(ParsingUtilsTest.java:179)
at htsjdk.tribble.util.ParsingUtilsTest.testCompletePath(ParsingUtilsTest.java:169)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:639)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:816)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1124)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108)
at org.testng.TestRunner.privateRun(TestRunner.java:774)
at org.testng.TestRunner.run(TestRunner.java:624)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:359)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:354)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:312)
at org.testng.SuiteRunner.run(SuiteRunner.java:261)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1215)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
at org.testng.TestNG.run(TestNG.java:1048)
at org.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:72)
at org.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
The second passes. So we can't just use the URI that's passed in to open the new file system. I'm not sure if this is a bug in the implementation of the GCS fileSystemProvider or if it's going to be a consistent issue with multiple providers. The other providers that I know of don't offer consistent behavior here that I can tell. UnixFileSystem provider always throws because it doesn't want to be reinstantiated. The HDFS provider from hdfs.jsr203 seems to be more robust and just picks out the part of the URI that it needs.
@jean-philippe-martin Can you comment on this? It looks like a simple patch to the gcs provider would make it so that this approach would work.
@tomwhite Have you tested this running in spark? I don't see how it can work, but maybe I'm missing something important. I do see that our implementation of this method in gatk has a special case for gcs still.
tomwhite
Dec 19, 2016
Contributor
I did try running this on HDFS with Spark a while ago and it worked. There's a fix for the GCS issue from @jean-philippe-martin here: GoogleCloudPlatform/google-cloud-java#1470
jean-philippe-martin
Jan 18, 2017
Contributor
@lbergerson: yes this was a bug, it's fixed now.
@tomwhite: Does this mean that on Spark after someone calls getPath on a gs:// URI, the fs provider will be loaded and Path.get will work correctly? Because that would be sweet.
| @@ -143,6 +164,18 @@ private void tstExists(String path, boolean expectExists) throws IOException{ | ||
| } | ||
| @Test |
lbergelson
Dec 16, 2016
Contributor
I think we need a way of testing non-file URI's. I'm not sure the best way to go about it. One possibility would be to include one of the alternative filesystem providers as a test dependency. Either that or maybe we could mock one. There is a [google in memory filesystem] that might make a good test for the non-file based file systems.
lbergelson
Dec 16, 2016
Contributor
@droazen What do you think about including a new test dependency for providing alternative paths. It might help us avoid problems like the index one we saw before.
tomwhite
Jan 17, 2017
Contributor
Good idea. I've added a test dependency on https://github.com/google/jimfs, and a few tests that use non-file URIs that exercise the NIO file path.
magicDGS
referenced
this pull request
Dec 27, 2016
Merged
Methods for default sequence dictionary name #774
codecov-io
commented
Jan 17, 2017
Current coverage is 63.863% (diff: 50.000%)@@ master #724 diff @@
==========================================
Files 523 523
Lines 31643 31660 +17
Methods 0 0
Messages 0 0
Branches 6766 6772 +6
==========================================
+ Hits 20214 20219 +5
- Misses 9293 9301 +8
- Partials 2136 2140 +4
|
tomwhite
referenced
this pull request
in broadinstitute/gatk
Jan 17, 2017
Merged
Support Path URIs for features. #2344
|
@lbergelson thanks for the review. I've addressed all your comments, so please take another look. (See broadinstitute/gatk#2344 for the companion GATK issue.) |
lbergelson
was assigned
by droazen
Jan 20, 2017
|
@tomwhite Changes look good. Tests with jimfs are a good addition. It would be good to have a test that reads a vcf with an index from jimfs so we make sure we avoid the issue we had with bam indexes. I want to get this in though, so I'm going to merge and we can add one after the fact. |
tomwhite commentedOct 12, 2016
Description
Added support for java.util.nio.Path for some more code pathways.
This is to support loading of Features with Spark in GATK.
Checklist