Add optional wrapper for the underlying SeekableByteChannel #775

Merged
merged 3 commits into from Jan 6, 2017

Conversation

Projects
None yet
5 participants
Contributor

jean-philippe-martin commented Dec 15, 2016 edited

Description

This allows users to provide their own buffering or prefetching, without them having to change htsjdk.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality (added a wrapper just for tests that adds required headers)
  • All tests passing
  • Extended the README / documentation, if necessary (not necessary aside from comments)
  • Is backward compatible (does not breaks binary or source compatibility)

cc: @droazen @lbergelson

@jean-philippe-martin jean-philippe-martin Add optional wrapper for the underlying SeekableByteChannel
This allows users to provide their own buffering or prefetching, without
them having to change htsjdk.
73845d6

Coverage Status

Coverage decreased (-0.001%) to 70.546% when pulling 73845d6 on jean-philippe-martin:jp_inputwrapper into c795101 on samtools:master.

droazen self-assigned this Dec 16, 2016

@lbergelson

@jean-philippe-martin This looks good to me. I have a few minor comments.

I would really like to have a test for this. Could you add a test for SamReaderFactory that has some trivial wrapper. Something dumb just to catch boneheaded mistakes in the future. Something like a wrapper that just reads the whole stream into a buffer and then doubles it. Easy to check that you're getting 2 reads instead of 1.

@@ -89,7 +91,9 @@ public String toString() {
public static SamInputResource of(final File file) { return new SamInputResource(new FileInputResource(file)); }
/** Creates a {@link SamInputResource} reading from the provided resource, with no index. */
- public static SamInputResource of(final Path path) { return new SamInputResource(new PathInputResource(path)); }
@lbergelson

lbergelson Jan 4, 2017

Contributor

we should probably keep original overload for backwards compatibility reasons.

@lbergelson

lbergelson Jan 4, 2017

Contributor

Could you add javadoc explaining what the wrapper is for on all these constructors that take one?

@jean-philippe-martin

jean-philippe-martin Jan 5, 2017

Contributor

Kept overload, added comment here. Which other ctors would you like to add a comment to? I think this one's the only public one so it would be the right place for the comment.

@lbergelson

lbergelson Jan 5, 2017

Contributor

I thought it might be worth it to add an explanation to the PathInputResource constructor in case someone wanders into the code there and wants to know why path inputs have this special thing that the others don't.

+ * @param wrapper
+ * @return wrapped PathInputResource
+ */
+ PathInputResource wrap(Function<SeekableByteChannel, SeekableByteChannel> wrapper) {
@lbergelson

lbergelson Jan 4, 2017

Contributor

I'm not sure I see the need for this, why not just initialize it with a wrapper in the first place? Is there a specific reason for this that I'm not seeing?

@jean-philippe-martin

jean-philippe-martin Jan 5, 2017

Contributor

This had something to do with how GATK uses it. I don't remember now, it has been too long and I lost the context. If this function's a problem then I should also push the GATK PR and we can work them in concert.

@lbergelson

lbergelson Jan 5, 2017

Contributor

If we need it for gatk then we can keep it. It's not a public method, so I was assuming no one is using it if they're not in htsjdk. Can you check if your gatk branch needs it for something?

@jean-philippe-martin

jean-philippe-martin Jan 5, 2017

Contributor

Let me get back to you on that.

@jean-philippe-martin

jean-philippe-martin Jan 6, 2017

Contributor

I looked into it and it looks like I can do without wrap(). So I removed it.

@@ -74,11 +76,13 @@
public abstract class SamReaderFactory {
private static ValidationStringency defaultValidationStringency = ValidationStringency.DEFAULT_STRINGENCY;
-
+
+ protected Function<SeekableByteChannel, SeekableByteChannel> pathWrapper;
@lbergelson

lbergelson Jan 4, 2017

Contributor

If we're going to include this in the base class with accessors, lets just make it private.

@lbergelson

lbergelson Jan 4, 2017

Contributor

Can you give this a default of Function.identity()?

@lbergelson

lbergelson Jan 5, 2017

Contributor

thanks for bearing with my nitpicks!

@@ -34,6 +35,16 @@ public SeekablePathStream(final Path path) throws IOException {
ALL_INSTANCES.add(this);
@lbergelson

lbergelson Jan 4, 2017

Contributor

Should this constructor one just delegate to the new one with Function.identity()?

@jean-philippe-martin

jean-philippe-martin Jan 5, 2017

Contributor

Yes, done (though with null since we also want to accept that)

abstract public SamReader open(final File file);
public SamReader open(final Path path) {
- final SamInputResource r = SamInputResource.of(path);
+ final SamInputResource r = SamInputResource.of(path, getPathWrapper());
final Path indexMaybe = SamFiles.findIndex(path);
if (indexMaybe != null) r.index(indexMaybe);
@lbergelson

lbergelson Jan 4, 2017

Contributor

I just noticed, this isn't going to wrap the index. That seems like a problem.

We might actually want the ability to treat the index and the main path separately? Do you think that's necessary? I.e. download the index in it's entirety and store it in on disk vs cache chunks bam in memory as we stream over them.

@jean-philippe-martin

jean-philippe-martin Jan 5, 2017

Contributor

Not wrapping the index is exactly the right thing to do. The index is small, we already read the whole thing into memory at startup anyways.
We could optionally add a way for the user to specify their own wrapper for the index, if we see a need for it later.

@lbergelson

lbergelson Jan 5, 2017 edited

Contributor

You're certain the index is always read into memory and cached? I could have sworn @kcibul was having issues where there was slowdown due to repeatedly accessing the index over the network. He ended up copying the index files locally. It looks like there's an option CACHE_FILE_BASED_INDEXES, maybe he wasn't using that while you are?

@jean-philippe-martin

jean-philippe-martin Jan 5, 2017

Contributor

It's up to user code; reading the whole thing is the right thing to do so I expect performance-oriented code will do that already (by "we" earlier I meant my own code).

+ return this;
+ }
+
+ public Function<SeekableByteChannel, SeekableByteChannel> getPathWrapper() {
@lbergelson

lbergelson Jan 5, 2017

Contributor

This should probably have javadoc while we're here.

Contributor

jean-philippe-martin commented Jan 5, 2017

Unfortunately, doubling the buffer results in an invalid SAM file (breaks the header rules).

Contributor

jean-philippe-martin commented Jan 5, 2017

OK, I got a test with a wrapper that adds the necessary headers (reading would fail if the wrapper weren't invoked).

@jean-philippe-martin jean-philippe-martin Add test for input wrapper
Also clean up a bit
0f2a582

codecov-io commented Jan 5, 2017 edited

Current coverage is 63.78% (diff: 95.65%)

No coverage report found for master at c795101.

Powered by Codecov. Last update c795101...087f83d

@jean-philippe-martin jean-philippe-martin remove wrap method
087f83d
Contributor

lbergelson commented Jan 6, 2017

@lbergelson lbergelson merged commit 2acb88b into samtools:master Jan 6, 2017

3 checks passed

codecov/patch 95.65% of diff hit (target 63.78%)
Details
codecov/project 63.78% (+0.01%) compared to ffca259
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

jean-philippe-martin deleted the jean-philippe-martin:jp_inputwrapper branch Jan 6, 2017

Contributor

jean-philippe-martin commented Jan 6, 2017

Thank you @lbergelson!

jean-philippe-martin referenced this pull request in broadinstitute/gatk Jan 6, 2017

Merged

Add cloud prefetching to GATK #2331

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment