Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Tribble/Tabix index path support #810
Conversation
|
@magicDGS Something went wrong here, tests are failing all over. |
|
I figured it out already, I'm trying to find the problem. |
|
Very awful error from my side, sorry. The Now I guess that it's solved. Could you have a quick review, @lbergelson? |
codecov-io
commented
Mar 1, 2017
•
Codecov Report
@@ Coverage Diff @@
## master #810 +/- ##
===============================================
- Coverage 64.867% 64.846% -0.021%
- Complexity 7175 7187 +12
===============================================
Files 526 527 +1
Lines 31731 31769 +38
Branches 5424 5424
===============================================
+ Hits 20583 20601 +18
- Misses 8997 9019 +22
+ Partials 2151 2149 -2
Continue to review full report at Codecov.
|
lbergelson
self-assigned this
Mar 1, 2017
magicDGS
changed the title from
Tribble/Tabix index path support to WIP: Tribble/Tabix index path support
Mar 2, 2017
|
This is still WIP, because I found some other paths in the indexing code that should use Path for fully support. So this should wait for review, @lbergelson. |
magicDGS
referenced
this pull request
Mar 2, 2017
Merged
Md5CalculatingOutputStream support for Path #814
magicDGS
changed the title from
WIP: Tribble/Tabix index path support to Tribble/Tabix index path support
Mar 3, 2017
|
This should be ok now, @lbergelson. Previously failing tests are passing in my computer. |
|
Can you have a look to this one @lbergelson? This is going to be important for my upcoming PR with tribble writing support with indexing on the fly. Thanks a lot! |
lbergelson
requested changes
Mar 8, 2017
@magicDGS This is useful work. Thanks for doing this. I have a a few comments. I think it's probably worth breaking implementations of Index to avoid having weird defaults methods there, but lets see if @droazen thinks otherwise. He's usually more conservative about these things than I am.
Would it be possible for you to add a test that creates an index on a non-file path? There's an example of how to do this using an in memory filesystem called jimfs in AbstractFeatureReaderTest.
The basic idea is that you create a new jimfs filesystem, then you can get paths from that filesystem (ex: "jimfs://filesystemname/path/to/my/in/memory/file"). You may have to copy a file from the local system into jimfs if you want to create an index from it, but that's easy to do using the standard Files operations.
Just be sure that the jimfs filesystem gets closed after your test is done, because it will stay in memory until the jvm shuts down otherwise.
| * @return a non-null File representing the index | ||
| */ | ||
| public static File indexFile(final File file) { | ||
| return indexFile(file.getAbsoluteFile(), STANDARD_INDEX_EXTENSION); | ||
| } | ||
| /** | ||
| - * Return the name of the tabix index file for the provided vcf {@code filename} | ||
| + * Return the name of the index file for the provided {@code filename} |
| * Does not actually create an index | ||
| - * @param filename name of the vcf file | ||
| + * @param path name of the path |
lbergelson
Mar 8, 2017
Contributor
"name of the path" sounds like it's something distinct from the path itself, which is strange because "name of the file" is fine. Maybe just "the path"
| + * @param path name of the path | ||
| + * @return non-null String representing the index filename | ||
| + */ | ||
| + public static String indexPath(final Path path) { |
lbergelson
Mar 8, 2017
Contributor
It seems like we would be better off making this actually return the path instead of a String. It's different than the old API, but I'm not sure I see the use of getting a string back.
| @@ -67,7 +70,7 @@ | ||
| private final static long NO_TS = -1L; | ||
| protected int version; // Our version value | ||
| - protected File indexedFile = null; // The file we've created this index for | ||
| + protected Path indexedPath = null; // The file we've created this index for |
lbergelson
Mar 8, 2017
Contributor
This is technically a breaking change. I'm going to say it's fine though. I don't believe there are any subclasses of AbstractIndex in the wild that we're going to be breaking.
magicDGS
Mar 9, 2017
Contributor
Oh, I haven't realize that it is breaking compatibility. If we do not want to break compatibility, I can kept the field with the deprecated annotation and set it when the used constructor is a File. Let me know if I should do that.
lbergelson
Mar 9, 2017
Contributor
Yeah, it's tricky. Anything field that's visible to a subclass of a non-final class is technically a breaking change since someone could be relying on it if they implement a subclass. It's why I'm so strongly in favor of making classes final and of making variable private with accessors if they need to be accessed.
| @@ -201,7 +213,11 @@ public boolean isCurrentVersion() { | ||
| } | ||
| public File getIndexedFile() { | ||
| - return indexedFile; | ||
| + return getIndexedPath().toFile(); |
lbergelson
Mar 8, 2017
Contributor
I think we should deprecate this method because it's no longer safe to call in all instances. It also needs a comment saying that it can fail with an exception if the path can't be represented as a file.
| @@ -251,7 +271,7 @@ private void writeHeader(final LittleEndianOutputStream dos) throws IOException | ||
| dos.writeInt(MAGIC_NUMBER); | ||
| dos.writeInt(getType()); | ||
| dos.writeInt(version); | ||
| - dos.writeString(indexedFile.getAbsolutePath()); | ||
| + dos.writeString(indexedPath.toString()); |
lbergelson
Mar 8, 2017
Contributor
I think it's better to use indexedPath.toURI().toString() since it's more likely to roundtrip correctly for non-file paths.
| @@ -56,13 +57,18 @@ | ||
| MathUtils.RunningStat stats = new MathUtils.RunningStat(); | ||
| long basesSeen = 0; | ||
| Feature lastFeature = null; | ||
| - File inputFile; | ||
| + // TODO: actually this field is not needed | ||
| + Path inputPath; |
lbergelson
Mar 8, 2017
Contributor
Also a breaking change, but I don't see the harm in it since I really doubt anyone is subclassing this thing.
magicDGS
Mar 9, 2017
Contributor
In this case, because it is package protected, I do not think that is going to harm at all (unless someone is subclassing in the same package outside HTSJDK).
If we are breaking compatibiity anyway, as it is unused, why don't we just remove the field (no the constructor arg)? It is only used for get the index creators...
| + * @param indexPath Where to write the index. | ||
| + * @throws IOException if the index is unable to write to the specified path. | ||
| + */ | ||
| + public default void write(final Path indexPath) throws IOException { |
lbergelson
Mar 8, 2017
Contributor
I think it's better idea to have the default throw OperationUnsupportedException than to have it call toFile() which will work most of the time but sometimes fail.
| + * | ||
| + * @param featurePath | ||
| + */ | ||
| + public default void writeBasedOnFeaturePath(Path featurePath) throws IOException { |
lbergelson
Mar 8, 2017
Contributor
I sort of want to break compatibility and force everyone to implement the Path functions. It would be so much cleaner to have the file versions be default implementations that call the path version. @droazen What do you think?
lbergelson
Mar 8, 2017
Contributor
I suspect no on in the wild implements Index. At least it's not done in gatk/picard/igv.
| + * | ||
| + * @param tabixPath Where to write the index. | ||
| + */ | ||
| + @Override |
| + * @param featurePath Path being indexed. | ||
| + */ | ||
| + @Override | ||
| + public void writeBasedOnFeaturePath(final Path featurePath) throws IOException { |
|
I added two commits, @lbergelson:
Regarding the test, I don't know if I will have time today. I will try, but if not I will add them by tomorrow. You can review the current changes anyway, in case that something else should be changed. |
|
I've just realized that something is broken with the changes, @lbergelson. Do you have an idea of what is happening? It looks that is breaking indexing on the fly for gzip VCF result (not the indexing itself). The most likely reson is the change to delegate the file constructor to the path one for files, because I can add a simple constructor to |
|
Ops, I realized that it was an error not related with the BGZ output stream. That's what happens when you program right after waking up... Fixed in the next commit (and removed the commit with the BGZ changes). |
| @@ -219,8 +219,8 @@ public void write(final Path tabixPath) throws IOException { | ||
| */ | ||
| @Override | ||
| public void writeBasedOnFeaturePath(final Path featurePath) throws IOException { | ||
| - if (Files.isRegularFile(featurePath)) return; | ||
| - write(IOUtil.getPath(Tribble.tabixIndexPath(featurePath))); | ||
| + if (!Files.isRegularFile(featurePath)) return; |
magicDGS
Mar 9, 2017
Contributor
This was the refactoring problem, I overseen this. @lbergelson, I think that this behavior should be documented and/or at least log a warning to avoid getting crazy about what happened with the index. I will rather prefer if it fails with and IO/Tribble exception, but that will break compatibility with public methods...
lbergelson
Mar 9, 2017
Contributor
Ack, good catch. That's terrible behavior. I wonder why it was done that way? Definitely needs to at least log a warning. I would also prefer your suggestion to fail with a TribbleException, but I think we'd need to investigate if that would break existing downstream code somehow. For now, could you have it log a warning, and open a new ticket to change the behavior?
| @@ -73,41 +73,42 @@ | ||
| /** | ||
| * Writes the index into a file. | ||
| * | ||
| + * Default implementation delegates in {@link #write(Path)} |
lbergelson
Mar 9, 2017
Contributor
Sorry to nitpick the language, but delegates to instead of delegates in is more idiomatic.
magicDGS
Mar 9, 2017
Contributor
Thanks for this, I really appreciate to have comments to improve my English skills. Done!
| /** | ||
| * Write an appropriately named and located Index file based on the name and location of the featureFile. | ||
| * If featureFile is not a normal file, the index will silently not be written. | ||
| + * | ||
| + * Default implementation delegates in {@link #writeBasedOnFeaturePath(Path)} |
|
@magicDGS Looks good. |
magicDGS
added some commits
Mar 9, 2017
|
Three more commits:
Thanks for reviewing, @lbergelson - Back to you! |
|
|
lbergelson
merged commit 0c282b8
into
samtools:master
Mar 10, 2017
|
Thanks! |
magicDGS
deleted the
magicDGS:dgs_tribble_path_support branch
Mar 10, 2017
magicDGS
added a commit
to magicDGS/htsjdk
that referenced
this pull request
Mar 10, 2017
|
|
magicDGS |
c2721e2
|
magicDGS
added a commit
to magicDGS/htsjdk
that referenced
this pull request
Jun 26, 2017
|
|
magicDGS |
9cd4459
|
magicDGS
added a commit
to magicDGS/htsjdk
that referenced
this pull request
Jun 27, 2017
|
|
magicDGS |
9234f2c
|
magicDGS commentedMar 1, 2017
•
edited by lbergelson
Description
Motivation
While trying to index a
Path(no convertible toFile) using theIndexinterface, the client should:LittleEndianOutputStreamby its own. This may lead to wrong indexes because they aren't block-compressed.In addition, this is part of a bigger change for include support of writing in whatever filesystem (and also reading).
Changes
Tribbleand unused imports.Indexinterface: methods with default values usePath.toFile()AbstractIndexandTabixIndexChecklist