Add InflaterFactory #771

Merged
droazen merged 1 commit into samtools:master from gspowley:gp_inflater_factory on Feb 1, 2017

5 participants
Contributor

gspowley commented Dec 9, 2016 edited

Description

Created the InflaterFactory class and added a defaultInflaterFactory to BlockGunzipper. By default, the defaultInflaterFactory creates a Java Inflater, so htsjdk default behavior remains the same.

The InflaterFactory enables the use of accelerated Inflater implementations, like the one in an upcoming release of GKL.

Applications can set the default InflaterFactory using this static method:

BlockGunzipper.setDefaultInflaterFactory(myInflaterFactory);

Applications can override the default InflaterFactory for an individual reader using:

SamReader reader = SamReaderFactory.makeDefault().inflaterFactory(myInflaterFactory).open(input);
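
The relationship between a static default factory and a per-instance override can be sketched in plain Java. This is a minimal illustration only: `SketchInflaterFactory`, `SketchGunzipper`, `TaggedInflater` and `TaggedFactory` are hypothetical stand-ins written for this example and are not the htsjdk classes; the only real API used is `java.util.zip.Inflater`.

```java
import java.util.zip.Inflater;

class InflaterFactorySketch {

    /** Hypothetical stand-in: the default implementation returns the JDK Inflater. */
    static class SketchInflaterFactory {
        Inflater makeInflater(final boolean gzipCompatible) {
            return new Inflater(gzipCompatible);
        }
    }

    /** Hypothetical stand-in for a consumer that keeps a process-wide default factory. */
    static class SketchGunzipper {
        private static SketchInflaterFactory defaultFactory = new SketchInflaterFactory();
        private final Inflater inflater;

        /** Uses whichever default factory is currently installed. */
        SketchGunzipper() {
            this(defaultFactory);
        }

        /** Uses the supplied factory and ignores the static default. */
        SketchGunzipper(final SketchInflaterFactory factory) {
            this.inflater = factory.makeInflater(true);
        }

        static void setDefaultFactory(final SketchInflaterFactory factory) {
            defaultFactory = factory;
        }

        Inflater getInflater() {
            return inflater;
        }
    }

    /** Marker subclass so the example can tell which factory produced an inflater. */
    static class TaggedInflater extends Inflater {
        TaggedInflater(final boolean nowrap) {
            super(nowrap);
        }
    }

    /** Hypothetical custom factory, standing in for an accelerated implementation. */
    static class TaggedFactory extends SketchInflaterFactory {
        @Override
        Inflater makeInflater(final boolean gzipCompatible) {
            return new TaggedInflater(gzipCompatible);
        }
    }

    public static void main(final String[] args) {
        // With no override, the JDK Inflater is used.
        System.out.println(new SketchGunzipper().getInflater().getClass().getSimpleName());
        // A per-instance override affects only the instance it is passed to.
        System.out.println(new SketchGunzipper(new TaggedFactory()).getInflater().getClass().getSimpleName());
    }
}
```

In this sketch the per-instance factory takes precedence over the static default, and the static default only changes what later no-argument constructions receive.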

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing

Coverage Status

Coverage increased (+0.003%) to 70.395% when pulling 299d7b5 on gspowley:gp_inflater_factory into e69aff0 on samtools:master.

droazen self-assigned this Dec 9, 2016

@@ -162,7 +164,7 @@ public void testDevNull() throws Exception {
}
@Test
- public void testCustomDeflater() throws Exception {
+ public void testCustomDeflaterInflater() throws Exception {
@droazen

droazen Dec 13, 2016

Contributor

Can you separate the inflater tests from the deflater tests?

@gspowley

gspowley Dec 20, 2016

Contributor

done

@@ -43,10 +44,32 @@
* @author alecw@broadinstitute.org
*/
public class BlockGunzipper {
- private final Inflater inflater = new Inflater(true); // GZIP mode
+ private static InflaterFactory defaultInflaterFactory = new InflaterFactory();
@droazen

droazen Dec 13, 2016

Contributor

In addition to this default inflater set via a static method, could we also have a non-static method on SamReaderFactory called setInflaterFactory() that can override the statically-set default? This would mirror what was done for DeflaterFactory.

@gspowley

gspowley Dec 20, 2016

Contributor

done

+ * Subclasses may override to provide their own inflater implementation.
+ * @param nowrap if true then use GZIP compatible compression
+ */
+ public Inflater makeInflater(final boolean nowrap) {
@droazen

droazen Dec 13, 2016

Contributor

nowrap is a bit cryptic -- should this be named gzipCompatible instead? (I know the name comes from the JDK, but it's a pretty bad name...)

@gspowley

gspowley Dec 20, 2016

Contributor

done
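
For readers unfamiliar with the JDK's `nowrap` flag: per the `java.util.zip` documentation, passing `true` means the ZLIB header and checksum fields are not used, which is the raw DEFLATE layout that GZIP-style containers embed. The sketch below uses only JDK classes to show that a raw stream inflates with `nowrap=true` and is rejected with `nowrap=false`. The class and method names are invented for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class NowrapSketch {

    /** Compresses the input as a raw DEFLATE stream (no ZLIB header or checksum). */
    static byte[] deflateRaw(final byte[] input) {
        final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(input);
            deflater.finish();
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            final byte[] buffer = new byte[256];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Inflates at most expectedLength bytes using the given nowrap setting. */
    static byte[] inflate(final byte[] compressed, final int expectedLength, final boolean nowrap)
            throws DataFormatException {
        final Inflater inflater = new Inflater(nowrap);
        try {
            inflater.setInput(compressed);
            final byte[] out = new byte[expectedLength];
            int total = 0;
            while (total < out.length) {
                final int n = inflater.inflate(out, total, out.length - total);
                if (n == 0) {
                    break; // no more output can be produced from the available input
                }
                total += n;
            }
            return Arrays.copyOf(out, total);
        } finally {
            inflater.end();
        }
    }

    /** True if a raw stream survives a round trip when inflated with nowrap=true. */
    static boolean roundTripsWithNowrap(final byte[] original) {
        try {
            return Arrays.equals(original, inflate(deflateRaw(original), original.length, true));
        } catch (DataFormatException e) {
            return false;
        }
    }

    /** True if a raw stream is rejected when the inflater expects a ZLIB wrapper. */
    static boolean rejectedWithoutNowrap(final byte[] original) {
        try {
            inflate(deflateRaw(original), original.length, false);
            return false;
        } catch (DataFormatException e) {
            return true;
        }
    }

    public static void main(final String[] args) {
        final byte[] data = "ACGTACGTACGTACGT".getBytes(StandardCharsets.UTF_8);
        System.out.println("nowrap=true round trip: " + roundTripsWithNowrap(data));
        System.out.println("nowrap=false rejects raw stream: " + rejectedWithoutNowrap(data));
    }
}
```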

@droazen droazen assigned gspowley and unassigned droazen Dec 13, 2016

Coverage Status

Coverage decreased (-0.02%) to 70.529% when pulling 2736671 on gspowley:gp_inflater_factory into c795101 on samtools:master.

Coverage Status

Coverage decreased (-0.02%) to 70.529% when pulling 3d747f3 on gspowley:gp_inflater_factory into c795101 on samtools:master.

Coverage Status

Coverage decreased (-0.02%) to 70.528% when pulling 3d672f2 on gspowley:gp_inflater_factory into c795101 on samtools:master.

Coverage Status

Coverage increased (+0.02%) to 70.563% when pulling 9c87375 on gspowley:gp_inflater_factory into c795101 on samtools:master.

Contributor

gspowley commented Dec 20, 2016

@droazen, I've addressed your review comments. Please take a look when you get a chance.

@lbergelson lbergelson assigned droazen and unassigned gspowley Jan 6, 2017

Contributor

droazen commented Jan 9, 2017

@gspowley Can you rebase this onto the latest master? Some changes to BlockCompressedInputStream just got merged.

codecov-io commented Jan 10, 2017 edited

Codecov Report

Merging #771 into master will increase coverage by 0.028%.

@@               Coverage Diff               @@
##              master      #771       +/-   ##
===============================================
+ Coverage     64.531%   64.559%   +0.029%     
- Complexity         0      7104     +7104     
===============================================
  Files            523       524        +1     
  Lines          31616     31664       +48     
  Branches        6769      5415     -1354     
===============================================
+ Hits           20402     20442       +40     
- Misses          9067      9077       +10     
+ Partials        2147      2145        -2
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...java/htsjdk/samtools/util/zip/InflaterFactory.java | 100% <100%> (ø) | 2 <2> (?) |
| ...main/java/htsjdk/samtools/util/BlockGunzipper.java | 55.319% <100%> (+7.819%) | 7 <4> (+7) |
| ...java/htsjdk/samtools/util/zip/DeflaterFactory.java | 100% <100%> (ø) | 2 <1> (+2) |
| ...sjdk/samtools/util/BlockCompressedInputStream.java | 74.902% <100%> (+3.828%) | 73 <6> (+73) |
| ...samtools/util/AsyncBlockCompressedInputStream.java | 72% <25%> (-2.627%) | 12 <ø> (+12) |
| src/main/java/htsjdk/samtools/BAMFileReader.java | 63.584% <36.364%> (-1.686%) | 37 <4> (+37) |
| ...rc/main/java/htsjdk/samtools/SamReaderFactory.java | 63.452% <83.333%> (+0.431%) | 7 <1> (+7) |
| ...dk/samtools/seekablestream/SeekableHTTPStream.java | 56.061% <ø> (+1.515%) | 10% <ø> (+10%) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4231660...260a62e.

Contributor

gspowley commented Jan 11, 2017

@droazen Changes are rebased. A number of changes in BAMFileReader.java related to AsyncBlockCompressedInputStream are on the same lines as changes for InflaterFactory. Someone may want to look into applying InflaterFactory to AsyncBlockCompressedInputStream.

Contributor

gspowley commented Jan 14, 2017

@droazen, I added the InflaterFactory to AsyncBlockCompressedInputStream; it was pretty straightforward.

@droazen

Second-pass review complete, back to @gspowley. We should be able to merge once this round of comments is addressed, and you've rebased and squashed.

* @param validationStringency Controls how to handle invalidate reads or header lines.
+ * @param factory SAM record factory
@droazen

droazen Jan 20, 2017

Contributor

Fill in full docs for all constructor parameters (applies here and below).

@gspowley

gspowley Jan 31, 2017

Contributor

done

* @param validationStringency Controls how to handle invalidate reads or header lines.
+ * @param factory SAM record factory
+ * @throws IOException
*/
BAMFileReader(final InputStream stream,
final File indexFile,
final boolean eagerDecode,
final boolean useAsynchronousIO,
final ValidationStringency validationStringency,
final SAMRecordFactory factory)
@droazen

droazen Jan 20, 2017

Contributor

Rename this parameter to samRecordFactory to distinguish it from the inflater factory (applies to other constructors as well).

@gspowley

gspowley Jan 31, 2017

Contributor

done

@@ -97,6 +98,9 @@ public SamReader open(final Path path) {
/** Set this factory's {@link htsjdk.samtools.SAMRecordFactory} to the provided one, then returns itself. */
abstract public SamReaderFactory samRecordFactory(final SAMRecordFactory samRecordFactory);
+ /** Set this factory's {@link htsjdk.samtools.util.zip.InflaterFactory} to the provided one, then returns itself. */
+ abstract public SamReaderFactory inflaterFactory(final InflaterFactory inflaterFactory);
@droazen

droazen Jan 20, 2017

Contributor

Document the fact that the inflater factory is only used for BAM decompression, and not for CRAM or other formats.

@droazen

droazen Jan 20, 2017

Contributor

Also clarify whether the factory would be used for something like a .sam.gz or not.

@gspowley

gspowley Jan 31, 2017

Contributor

done

+ /**
+ * Use this ctor if you wish to call seek()
+ */
+ public BlockCompressedInputStream(final File file, final InflaterFactory inflaterFactory) throws IOException {
@droazen

droazen Jan 20, 2017

Contributor

Document the inflaterFactory parameter for all relevant constructors in this class

@gspowley

gspowley Jan 31, 2017

Contributor

done

+
+/**
+ * Factory for {@link Inflater} objects used by {@link BlockGunzipper}.
+ * This class may be extended to provide alternative inflaters (e.g., for improved performance).
@droazen

droazen Jan 20, 2017

Contributor

Document the fact that the default implementation creates the JDK inflater.

@gspowley

gspowley Jan 31, 2017

Contributor

done

+ }
+
+ /**
+ * Returns an inflater object that will be used when reading BAM files.
@droazen

droazen Jan 20, 2017

Contributor

Don't mention BAMs explicitly in the docs for InflaterFactory, since in the future it could be used for other purposes as well.

@gspowley

gspowley Jan 31, 2017

Contributor

done

+
+ /**
+ * Returns an inflater object that will be used when reading BAM files.
+ * Subclasses may override to provide their own inflater implementation.
@droazen

droazen Jan 20, 2017

Contributor

Default implementation returns the JDK inflater.

@gspowley

gspowley Jan 31, 2017

Contributor

done

@@ -41,6 +43,35 @@ public void variousFormatReaderTest(final String inputFile) throws IOException {
reader.close();
}
+ @Test(dataProvider = "variousFormatReaderTestCases")
@droazen

droazen Jan 20, 2017

Contributor

This seems like the wrong DataProvider to use with this test case, since the DataProvider has only one bam and a bunch of sams, and the inflater functionality affects only the bam. Perhaps just use the bam from that provider directly in this test case.

@gspowley

gspowley Feb 1, 2017

Contributor

done

@@ -41,6 +43,35 @@ public void variousFormatReaderTest(final String inputFile) throws IOException {
reader.close();
}
+ @Test(dataProvider = "variousFormatReaderTestCases")
+ public void variousFormatReaderInflatorFactoryTest(final String inputFile) throws IOException {
+ final int[] inflateCalls = {0}; //Note: using and array is a HACK to fool the compiler
@droazen

droazen Jan 20, 2017

Contributor

and -> an

@gspowley

gspowley Jan 31, 2017

Contributor

done
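
The "HACK" in the comment under review refers to a Java language rule: a local variable captured by an anonymous class, local class, or lambda must be effectively final, so a plain `int` counter cannot be incremented from inside one. A one-element array works around this because only the array reference is captured. The sketch below shows that workaround next to `AtomicInteger`, a common alternative. The class and method names are invented for this illustration and are not taken from the PR.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CaptureCounterSketch {

    /** Counts callback invocations through a one-element array, as in the test under review. */
    static int countWithArray(final int times) {
        final int[] calls = {0}; // the reference is final; the contents remain mutable
        final Runnable callback = new Runnable() {
            @Override
            public void run() {
                calls[0]++; // legal: mutates the array, not the captured variable
            }
        };
        for (int i = 0; i < times; i++) {
            callback.run();
        }
        return calls[0];
    }

    /** Same idea using AtomicInteger, which states the intent more directly. */
    static int countWithAtomic(final int times) {
        final AtomicInteger calls = new AtomicInteger();
        final Runnable callback = () -> calls.incrementAndGet();
        for (int i = 0; i < times; i++) {
            callback.run();
        }
        return calls.get();
    }

    public static void main(final String[] args) {
        System.out.println(countWithArray(3));
        System.out.println(countWithAtomic(3));
    }
}
```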

+ };
+
+ final File input = new File(TEST_DATA_DIR, inputFile);
+ final SamReader reader = SamReaderFactory.makeDefault().inflaterFactory(myInflaterFactory).open(input);
@droazen

droazen Jan 20, 2017

Contributor

Wrap reader in try-with-resources to ensure it always gets closed.

@gspowley

gspowley Feb 1, 2017

Contributor

done
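
As background for the try-with-resources request: the construct guarantees that `close()` runs when the block exits, whether normally or through an exception, which is why a separate `close()` call becomes unnecessary. The sketch below demonstrates this with a hypothetical `TrackedResource` class invented for the example; it is not a `SamReader`.

```java
class TryWithResourcesSketch {

    /** Hypothetical resource that records whether close() was called. */
    static class TrackedResource implements AutoCloseable {
        boolean closed = false;

        void read(final boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated read failure");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Returns true if the resource was closed once the block exited, even if the body threw. */
    static boolean closedAfterUse(final boolean failDuringRead) {
        final TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            r.read(failDuringRead);
        } catch (IllegalStateException e) {
            // By the time this catch block runs, the resource has already been closed.
        }
        return resource.closed;
    }

    public static void main(final String[] args) {
        System.out.println("closed after normal exit: " + closedAfterUse(false));
        System.out.println("closed after exception:   " + closedAfterUse(true));
    }
}
```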

@@ -86,4 +91,142 @@ public void available_should_return_number_of_bytes_left_in_current_block() thro
}
}
}
+
+ @Test
+ public void testCustomInflaterFileInput() throws Exception {
@droazen

droazen Jan 20, 2017

Contributor

I think that the three test cases you added here could be unified into a single test case via a DataProvider that supplies BlockCompressedInputStreams constructed in each of the three ways, as well as expected decompressed file contents.

@gspowley

gspowley Feb 1, 2017

Contributor

done

+ System.out.println("Creating file " + f);
+
+ final List<String> linesWritten = new ArrayList<>();
+ final BlockCompressedOutputStream bcos = new BlockCompressedOutputStream(f, 5);
@droazen

droazen Jan 20, 2017

Contributor

Use try-with-resources to ensure this is closed when done

@gspowley

gspowley Feb 1, 2017

Contributor

done

+ linesWritten.add(s);
+ }
+ bcos.write(sb.toString().getBytes()); //Call 3
+ bcos.close();
@droazen

droazen Jan 20, 2017

Contributor

Can you extract a private helper method to write the temporary compressed file, and call it from the various test cases instead of repeating this block of code (lines 97-117)?

@gspowley

gspowley Feb 1, 2017

Contributor

Moved to DataProvider named customInflateInput.

+ bcos.close();
+
+ int[] inflateCalls = {0}; //Note: using and array is a HACK to fool the compiler
+ class MyInflater extends Inflater {
@droazen

droazen Jan 20, 2017

Contributor

Move MyInflater out of this method and into class scope, make it private static, refactor to allow the inflateCalls array to be passed in to the constructor and stored as a field, and share it between test cases instead of repeating the class definition in multiple methods.

@gspowley

gspowley Feb 1, 2017

Contributor

done

+ for(int i = 0; (line = reader.readLine()) != null; ++i) {}
+ bcis.close();
+ Assert.assertEquals(inflateCalls[0], 21, "inflate calls");
+ }
}
@droazen

droazen Jan 20, 2017

Contributor

Can you add a test case that covers an AsyncBlockCompressedInputStream with a custom inflater as well?

@gspowley

gspowley Feb 1, 2017

Contributor

done

@gspowley gspowley add inflater factory
260a62e
Contributor

gspowley commented Feb 1, 2017

@droazen, all review comments have been addressed and changes were rebased and squashed.

FYI, I saw the failure below on Travis. I did not see this failure on my system, and the tests passed when rerunning on Travis. Is this a known issue?

Gradle suite > Gradle test > htsjdk.samtools.seekablestream.SeekableBufferedStreamTest.testEOF FAILED
    java.lang.AssertionError: expected [1000] but found [149]
        at org.testng.Assert.fail(Assert.java:94)
        at org.testng.Assert.failNotEquals(Assert.java:496)
        at org.testng.Assert.assertEquals(Assert.java:125)
        at org.testng.Assert.assertEquals(Assert.java:372)
        at org.testng.Assert.assertEquals(Assert.java:382)
        at htsjdk.samtools.seekablestream.SeekableBufferedStreamTest.testEOF(SeekableBufferedStreamTest.java:89)
+ */
+ BlockGunzipper(InflaterFactory inflaterFactory) {
+ inflater = inflaterFactory.makeInflater(true); // GZIP mode
+ }
@droazen

droazen Feb 1, 2017

Contributor

There should be a no-arg constructor in this class as well that calls defaultInflaterFactory.makeInflater(true)

+ final File input = new File(TEST_DATA_DIR, inputFile);
+ try (final SamReader reader = SamReaderFactory.makeDefault().inflaterFactory(myInflaterFactory).open(input)) {
+ for (final SAMRecord ignored : reader) { }
+ reader.close();
@droazen

droazen Feb 1, 2017

Contributor

Explicit close() call not needed within try-with-resources

+ inflateCalls++;
+ return super.inflate(b, off, len);
+ }
+ static int inflateCalls;
@droazen

droazen Feb 1, 2017

Contributor

This should be non-static and initialized at construction time, to avoid the possibility of separate test cases incorrectly sharing the counter.
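
One way to follow this suggestion is to keep the counter as an instance field, so each inflater reports only its own calls. The sketch below is an assumption about what such a helper could look like, not the code that was merged. `CountingInflater` and the surrounding class are hypothetical names, and only `java.util.zip` classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CountingInflaterSketch {

    /** Hypothetical test helper that counts inflate() calls per instance. */
    static class CountingInflater extends Inflater {
        private int inflateCalls = 0; // instance state, so separate test cases cannot share it

        CountingInflater(final boolean nowrap) {
            super(nowrap);
        }

        @Override
        public int inflate(final byte[] b, final int off, final int len) throws DataFormatException {
            inflateCalls++;
            return super.inflate(b, off, len);
        }

        int getInflateCalls() {
            return inflateCalls;
        }
    }

    /** Produces a small raw DEFLATE stream to feed the inflater. */
    static byte[] deflateRaw(final byte[] input) {
        final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(input);
            deflater.finish();
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            final byte[] buffer = new byte[256];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Uses one inflater once and leaves a second untouched; returns both counters. */
    static int[] countsForTwoInflaters() {
        final CountingInflater used = new CountingInflater(true);
        final CountingInflater unused = new CountingInflater(true);
        try {
            used.setInput(deflateRaw("hello".getBytes(StandardCharsets.UTF_8)));
            final byte[] out = new byte[16];
            used.inflate(out, 0, out.length);
            return new int[] {used.getInflateCalls(), unused.getInflateCalls()};
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            used.end();
            unused.end();
        }
    }

    public static void main(final String[] args) {
        final int[] counts = countsForTwoInflaters();
        System.out.println("used=" + counts[0] + ", unused=" + counts[1]);
    }
}
```

With a static counter, the second value would mirror the first, which is the cross-test leakage the comment warns about.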

Contributor

nh13 commented Feb 1, 2017

@droazen can you comment with "review changes" rather than "single comment" so we get one email when you submit your comments rather than an email after each time you comment?

+ public Object[][] customInflateInput() throws IOException {
+ final File tempFile = File.createTempFile("testCustomInflater.", ".bam");
+ tempFile.deleteOnExit();
+ System.out.println("Creating file " + tempFile);
@droazen

droazen Feb 1, 2017

Contributor

Remove println() statement from DataProvider

Contributor

droazen commented Feb 1, 2017

@nh13 I usually do use the review feature, but this PR is very time-sensitive for us, and holding back all comments until the end of the review gives George less time to address them today. Hopefully your email client has the capability to thread or filter emails as necessary.

+ {new BlockCompressedInputStream(new FileInputStream(tempFile), false, myInflaterFactory), linesWritten, 4},
+ {new BlockCompressedInputStream(tempFile, myInflaterFactory), linesWritten, 4},
+ {new AsyncBlockCompressedInputStream(tempFile, myInflaterFactory), linesWritten, 4},
+ {new BlockCompressedInputStream(new URL("http://broadinstitute.github.io/picard/testdata/index_test.bam"), myInflaterFactory), null, 21}
@droazen

droazen Feb 1, 2017

Contributor

@lbergelson suggests using a Supplier here to delay construction of the streams (and, therefore, any resulting exceptions) until the test case starts
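
The Supplier idea can be illustrated without TestNG. In the sketch below, a table of rows either constructs its objects eagerly (so a failing constructor aborts the whole table) or wraps construction in a `java.util.function.Supplier` (so the failure surfaces only when a given row is consumed). `FragileStream` and the method names are hypothetical and stand in for the streams in the DataProvider.

```java
import java.util.function.Supplier;

class DeferredConstructionSketch {

    /** Hypothetical stream-like object whose constructor may fail. */
    static class FragileStream {
        FragileStream(final boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated failure while opening");
            }
        }
    }

    /** Eager rows: every object is built while the table itself is being built. */
    static Object[][] eagerRows() {
        return new Object[][] {
                {new FragileStream(false)},
                {new FragileStream(true)}, // throws here, before any row can be used
        };
    }

    /** Lazy rows: each row holds a Supplier, so construction waits until get() is called. */
    static Object[][] lazyRows() {
        return new Object[][] {
                {(Supplier<FragileStream>) () -> new FragileStream(false)},
                {(Supplier<FragileStream>) () -> new FragileStream(true)},
        };
    }

    /** Invokes each row's supplier and counts how many rows fail individually. */
    static int countFailures(final Object[][] rows) {
        int failures = 0;
        for (final Object[] row : rows) {
            try {
                ((Supplier<?>) row[0]).get();
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(final String[] args) {
        System.out.println("lazy rows built: " + lazyRows().length);
        System.out.println("rows failing on use: " + countFailures(lazyRows()));
    }
}
```

In a test framework, this means a problem opening one input would be attributed to the test case that uses it rather than to the provider as a whole.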

Contributor

droazen commented Feb 1, 2017 edited

@gspowley Final review complete, a few last changes needed. Since we want this in today, and the remaining issues are not correctness issues, I'll merge it as-is, and then submit a patch to address remaining issues.

@droazen droazen merged commit 173bccd into samtools:master Feb 1, 2017

4 checks passed

codecov/changes No unexpected coverage changes found.
codecov/patch 70.27% of diff hit (target 64.531%)
codecov/project 64.559% (+0.028%) compared to 4231660
continuous-integration/travis-ci/pr The Travis CI build passed

@droazen droazen added a commit that referenced this pull request Feb 1, 2017

@droazen droazen Small patches to custom InflaterFactory support left over from code review

A few remaining tweaks to the custom InflaterFactory support from the
final review round of #771
60732f0

@droazen droazen added a commit that referenced this pull request Feb 1, 2017

@droazen droazen Small patches to custom InflaterFactory support left over from code review

A few remaining tweaks to the custom InflaterFactory support from the
final review round of #771
939bf1e

@droazen droazen added a commit that referenced this pull request Feb 1, 2017

@droazen droazen Small patches to custom InflaterFactory support left over from code review (#794)

A few remaining tweaks to the custom InflaterFactory support from the
final review round of #771
34440b7