Add optional wrapper for the underlying SeekableByteChannel #775
coveralls
commented
Dec 15, 2016
droazen
self-assigned this
Dec 16, 2016
lbergelson
requested changes
Jan 4, 2017
@jean-philippe-martin This looks good to me. I have a few minor comments.
I would really like to have a test for this. Could you add a test for SamReaderFactory that uses some trivial wrapper? Something dumb, just to catch boneheaded mistakes in the future. For example, a wrapper that reads the whole stream into a buffer and then doubles it, so it's easy to check that you're getting 2 reads instead of 1.
@@ -89,7 +91,9 @@ public String toString() {
 public static SamInputResource of(final File file) { return new SamInputResource(new FileInputResource(file)); }
 /** Creates a {@link SamInputResource} reading from the provided resource, with no index. */
- public static SamInputResource of(final Path path) { return new SamInputResource(new PathInputResource(path)); }
lbergelson
Jan 4, 2017
Contributor
We should probably keep the original overload for backwards compatibility.
lbergelson
Jan 4, 2017
Contributor
Could you add javadoc explaining what the wrapper is for on all these constructors that take one?
jean-philippe-martin
Jan 5, 2017
Contributor
Kept overload, added comment here. Which other ctors would you like to add a comment to? I think this one's the only public one so it would be the right place for the comment.
lbergelson
Jan 5, 2017
Contributor
I thought it might be worth it to add an explanation to the PathInputResource constructor in case someone wanders into the code there and wants to know why path inputs have this special thing that the others don't.
+ * @param wrapper
+ * @return wrapped PathInputResource
+ */
+ PathInputResource wrap(Function<SeekableByteChannel, SeekableByteChannel> wrapper) {
lbergelson
Jan 4, 2017
Contributor
I'm not sure I see the need for this. Why not just initialize it with a wrapper in the first place? Is there a specific reason that I'm not seeing?
jean-philippe-martin
Jan 5, 2017
Contributor
This had something to do with how GATK uses it. I don't remember now, it has been too long and I lost the context. If this function's a problem then I should also push the GATK PR and we can work them in concert.
lbergelson
Jan 5, 2017
Contributor
If we need it for gatk then we can keep it. It's not a public method, so I was assuming no one is using it if they're not in htsjdk. Can you check if your gatk branch needs it for something?
jean-philippe-martin
Jan 6, 2017
Contributor
I looked into it and it looks like I can do without wrap(). So I removed it.
@@ -74,11 +76,13 @@
 public abstract class SamReaderFactory {
 private static ValidationStringency defaultValidationStringency = ValidationStringency.DEFAULT_STRINGENCY;
-
+
+ protected Function<SeekableByteChannel, SeekableByteChannel> pathWrapper;
lbergelson
Jan 4, 2017
Contributor
If we're going to include this in the base class with accessors, let's just make it private.
@@ -34,6 +35,16 @@ public SeekablePathStream(final Path path) throws IOException {
 ALL_INSTANCES.add(this);
lbergelson
Jan 4, 2017
Contributor
Should this constructor just delegate to the new one with Function.identity()?
jean-philippe-martin
Jan 5, 2017
Contributor
Yes, done (though with null since we also want to accept that)
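The delegation discussed here can be sketched roughly as follows. This is a minimal illustration, not the htsjdk code: `WrappedPathChannel` is a hypothetical stand-in class, and treating a `null` wrapper as "no wrapping" is assumed from the comment above.

```java
import java.io.IOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/** Hypothetical stand-in for a path-backed stream; not the actual htsjdk class. */
final class WrappedPathChannel implements AutoCloseable {
    private final SeekableByteChannel channel;

    /** Original constructor: delegates to the new one with no wrapper. */
    WrappedPathChannel(final Path path) throws IOException {
        this(path, null);
    }

    /** New constructor: a null wrapper is accepted and means "use the raw channel". */
    WrappedPathChannel(final Path path,
                       final Function<SeekableByteChannel, SeekableByteChannel> wrapper)
            throws IOException {
        final SeekableByteChannel raw = Files.newByteChannel(path);
        this.channel = (wrapper == null) ? raw : wrapper.apply(raw);
    }

    SeekableByteChannel channel() {
        return channel;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Accepting `null` keeps a single code path for both constructors while sparing callers from passing `Function.identity()` explicitly.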
lbergelson
assigned jean-philippe-martin and unassigned droazen
Jan 4, 2017
 abstract public SamReader open(final File file);
 public SamReader open(final Path path) {
- final SamInputResource r = SamInputResource.of(path);
+ final SamInputResource r = SamInputResource.of(path, getPathWrapper());
 final Path indexMaybe = SamFiles.findIndex(path);
 if (indexMaybe != null) r.index(indexMaybe);
lbergelson
Jan 4, 2017
Contributor
I just noticed, this isn't going to wrap the index. That seems like a problem.
We might actually want the ability to treat the index and the main path separately. Do you think that's necessary? I.e. download the index in its entirety and store it on disk vs. cache chunks of the BAM in memory as we stream over them.
jean-philippe-martin
Jan 5, 2017
Contributor
Not wrapping the index is exactly the right thing to do. The index is small, we already read the whole thing into memory at startup anyways.
We could optionally add a way for the user to specify their own wrapper for the index, if we see a need for it later.
lbergelson
Jan 5, 2017
Contributor
You're certain the index is always read into memory and cached? I could have sworn @kcibul was having issues where there was slowdown due to repeatedly accessing the index over the network. He ended up copying the index files locally. It looks like there's an option CACHE_FILE_BASED_INDEXES, maybe he wasn't using that while you are?
jean-philippe-martin
Jan 5, 2017
Contributor
It's up to user code; reading the whole thing is the right thing to do so I expect performance-oriented code will do that already (by "we" earlier I meant my own code).
+ return this;
+ }
+
+ public Function<SeekableByteChannel, SeekableByteChannel> getPathWrapper() {
Unfortunately, doubling the buffer results in an invalid SAM file (breaks the header rules).
OK, I got a test with a wrapper that adds the necessary headers (reading would fail if the wrapper weren't invoked).
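A wrapper of that kind could look something like the sketch below. It is not the test that was committed: `PrefixingChannel` is a hypothetical name, and the prefix bytes stand in for whatever header the underlying data is missing. The channel serves the prefix first and then the delegate's bytes, so a reader only sees valid input if the wrapper was actually applied.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical read-only channel that serves a fixed prefix before the delegate's bytes. */
final class PrefixingChannel implements SeekableByteChannel {
    private final byte[] prefix;
    private final SeekableByteChannel delegate;
    private long position = 0; // position in the combined (prefix + delegate) stream

    PrefixingChannel(final byte[] prefix, final SeekableByteChannel delegate) {
        this.prefix = prefix.clone();
        this.delegate = delegate;
    }

    @Override
    public int read(final ByteBuffer dst) throws IOException {
        if (!dst.hasRemaining()) return 0;
        if (position < prefix.length) {
            // Still inside the prefix: copy as much of it as fits.
            final int n = (int) Math.min(dst.remaining(), prefix.length - position);
            dst.put(prefix, (int) position, n);
            position += n;
            return n;
        }
        // Past the prefix: read from the delegate at the translated offset.
        delegate.position(position - prefix.length);
        final int n = delegate.read(dst);
        if (n > 0) position += n;
        return n;
    }

    @Override
    public long position() {
        return position;
    }

    @Override
    public SeekableByteChannel position(final long newPosition) {
        if (newPosition < 0) throw new IllegalArgumentException("negative position");
        position = newPosition;
        return this;
    }

    @Override
    public long size() throws IOException {
        return prefix.length + delegate.size();
    }

    @Override
    public int write(final ByteBuffer src) {
        throw new NonWritableChannelException();
    }

    @Override
    public SeekableByteChannel truncate(final long size) {
        throw new NonWritableChannelException();
    }

    @Override
    public boolean isOpen() {
        return delegate.isOpen();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

As a wrapper function this would be passed as `ch -> new PrefixingChannel(headerBytes, ch)`, where `headerBytes` is an illustrative variable holding the missing header.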
codecov-io
commented
Jan 5, 2017
Current coverage is 63.78% (diff: 95.65%)
lbergelson
merged commit 2acb88b
into
samtools:master
Jan 6, 2017
jean-philippe-martin
deleted the
jean-philippe-martin:jp_inputwrapper branch
Jan 6, 2017
Thank you @lbergelson!
jean-philippe-martin
commented
Dec 15, 2016
Description
This allows users to provide their own buffering or prefetching, without them having to change htsjdk.
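To illustrate the shape of such a user-supplied wrapper, here is a minimal sketch using only the JDK. `CountingChannel` is a hypothetical class that merely counts bytes read; a real buffering or prefetching wrapper would put its caching logic in `read` and `position` instead. How the function is registered with htsjdk is not shown here, since only `getPathWrapper()` and `SamInputResource.of(path, wrapper)` appear in the diff.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical pass-through channel that counts the bytes read through it. */
final class CountingChannel implements SeekableByteChannel {
    private final SeekableByteChannel delegate;
    private final AtomicLong bytesRead;

    CountingChannel(final SeekableByteChannel delegate, final AtomicLong bytesRead) {
        this.delegate = delegate;
        this.bytesRead = bytesRead;
    }

    /** Builds the Function form that a wrapper hook of this kind expects. */
    static Function<SeekableByteChannel, SeekableByteChannel> wrapper(final AtomicLong counter) {
        return ch -> new CountingChannel(ch, counter);
    }

    @Override
    public int read(final ByteBuffer dst) throws IOException {
        final int n = delegate.read(dst);
        if (n > 0) bytesRead.addAndGet(n); // a caching wrapper would serve from its buffer here
        return n;
    }

    @Override
    public int write(final ByteBuffer src) throws IOException {
        return delegate.write(src);
    }

    @Override
    public long position() throws IOException {
        return delegate.position();
    }

    @Override
    public SeekableByteChannel position(final long newPosition) throws IOException {
        delegate.position(newPosition);
        return this;
    }

    @Override
    public long size() throws IOException {
        return delegate.size();
    }

    @Override
    public SeekableByteChannel truncate(final long size) throws IOException {
        delegate.truncate(size);
        return this;
    }

    @Override
    public boolean isOpen() {
        return delegate.isOpen();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

Because the hook is a plain `Function<SeekableByteChannel, SeekableByteChannel>`, several wrappers can be chained with `Function.andThen` without any change to the library.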
Checklist
cc: @droazen @lbergelson