Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Add InflaterFactory #771
Conversation
coveralls
commented
Dec 9, 2016
droazen
self-assigned this
Dec 9, 2016
| @@ -162,7 +164,7 @@ public void testDevNull() throws Exception { | ||
| } | ||
| @Test | ||
| - public void testCustomDeflater() throws Exception { | ||
| + public void testCustomDeflaterInflater() throws Exception { |
| @@ -43,10 +44,32 @@ | ||
| * @author alecw@broadinstitute.org | ||
| */ | ||
| public class BlockGunzipper { | ||
| - private final Inflater inflater = new Inflater(true); // GZIP mode | ||
| + private static InflaterFactory defaultInflaterFactory = new InflaterFactory(); |
droazen
Dec 13, 2016
Contributor
In addition to this default inflater set via a static method, could we also have a non-static method on SamReaderFactory called setInflaterFactory() that can override the statically-set default? This would mirror what was done for DeflaterFactory.
| + * Subclasses may override to provide their own inflater implementation. | ||
| + * @param nowrap if true then use GZIP compatible compression | ||
| + */ | ||
| + public Inflater makeInflater(final boolean nowrap) { |
droazen
Dec 13, 2016
Contributor
nowrap is a bit cryptic -- should this be named gzipCompatible instead? (I know the name comes from the JDK, but it's a pretty bad name...)
droazen
assigned gspowley and unassigned droazen
Dec 13, 2016
coveralls
commented
Dec 15, 2016
coveralls
commented
Dec 16, 2016
coveralls
commented
Dec 19, 2016
coveralls
commented
Dec 20, 2016
|
@droazen, I've addressed your review comments. Please take a look when you get a chance. |
lbergelson
assigned droazen and unassigned gspowley
Jan 6, 2017
|
@gspowley Can you rebase this onto the latest master? Some changes to |
codecov-io
commented
Jan 10, 2017
•
Codecov Report@@ Coverage Diff @@
## master #771 +/- ##
===============================================
+ Coverage 64.531% 64.559% +0.029%
- Complexity 0 7104 +7104
===============================================
Files 523 524 +1
Lines 31616 31664 +48
Branches 6769 5415 -1354
===============================================
+ Hits 20402 20442 +40
- Misses 9067 9077 +10
+ Partials 2147 2145 -2
Continue to review full report at Codecov.
|
|
@droazen Changes are rebased. A number of changes in |
|
@droazen, added the |
droazen
requested changes
Jan 20, 2017
Second-pass review complete, back to @gspowley. We should be able to merge once this round of comments are addressed, and you've rebased and squashed.
| * @param validationStringency Controls how to handle invalidate reads or header lines. | ||
| + * @param factory SAM record factory |
droazen
Jan 20, 2017
Contributor
Fill in full docs for all constructor parameters (applies here and below).
| * @param validationStringency Controls how to handle invalidate reads or header lines. | ||
| + * @param factory SAM record factory | ||
| + * @throws IOException | ||
| */ | ||
| BAMFileReader(final InputStream stream, | ||
| final File indexFile, | ||
| final boolean eagerDecode, | ||
| final boolean useAsynchronousIO, | ||
| final ValidationStringency validationStringency, | ||
| final SAMRecordFactory factory) |
droazen
Jan 20, 2017
Contributor
Rename this parameter to samRecordFactory to distinguish it from the inflater factory (applies to other constructors as well).
| @@ -97,6 +98,9 @@ public SamReader open(final Path path) { | ||
| /** Set this factory's {@link htsjdk.samtools.SAMRecordFactory} to the provided one, then returns itself. */ | ||
| abstract public SamReaderFactory samRecordFactory(final SAMRecordFactory samRecordFactory); | ||
| + /** Set this factory's {@link htsjdk.samtools.util.zip.InflaterFactory} to the provided one, then returns itself. */ | ||
| + abstract public SamReaderFactory inflaterFactory(final InflaterFactory inflaterFactory); |
droazen
Jan 20, 2017
Contributor
Document the fact that the inflater factory is only used for BAM decompression, and not for CRAM or other formats.
droazen
Jan 20, 2017
Contributor
Also clarify whether the factory would be used for something like a .sam.gz or not.
| + /** | ||
| + * Use this ctor if you wish to call seek() | ||
| + */ | ||
| + public BlockCompressedInputStream(final File file, final InflaterFactory inflaterFactory) throws IOException { |
droazen
Jan 20, 2017
Contributor
Document the inflaterFactory parameter for all relevant constructors in this class
| + | ||
| +/** | ||
| + * Factory for {@link Inflater} objects used by {@link BlockGunzipper}. | ||
| + * This class may be extended to provide alternative inflaters (e.g., for improved performance). |
droazen
Jan 20, 2017
Contributor
Document the fact that the default implementation creates the JDK inflater.
| + } | ||
| + | ||
| + /** | ||
| + * Returns an inflater object that will be used when reading BAM files. |
droazen
Jan 20, 2017
Contributor
Don't mention BAMs explicitly in the docs for InflaterFactory, since in the future it could be used for other purposes as well.
| + | ||
| + /** | ||
| + * Returns an inflater object that will be used when reading BAM files. | ||
| + * Subclasses may override to provide their own inflater implementation. |
| @@ -41,6 +43,35 @@ public void variousFormatReaderTest(final String inputFile) throws IOException { | ||
| reader.close(); | ||
| } | ||
| + @Test(dataProvider = "variousFormatReaderTestCases") |
droazen
Jan 20, 2017
Contributor
This seems like the wrong DataProvider to use with this test case, since the DataProvider has only one bam and a bunch of sams, and the inflater functionality affects only the bam. Perhaps just use the bam from that provider directly in this test case.
| @@ -41,6 +43,35 @@ public void variousFormatReaderTest(final String inputFile) throws IOException { | ||
| reader.close(); | ||
| } | ||
| + @Test(dataProvider = "variousFormatReaderTestCases") | ||
| + public void variousFormatReaderInflatorFactoryTest(final String inputFile) throws IOException { | ||
| + final int[] inflateCalls = {0}; //Note: using and array is a HACK to fool the compiler |
| + }; | ||
| + | ||
| + final File input = new File(TEST_DATA_DIR, inputFile); | ||
| + final SamReader reader = SamReaderFactory.makeDefault().inflaterFactory(myInflaterFactory).open(input); |
| @@ -86,4 +91,142 @@ public void available_should_return_number_of_bytes_left_in_current_block() thro | ||
| } | ||
| } | ||
| } | ||
| + | ||
| + @Test | ||
| + public void testCustomInflaterFileInput() throws Exception { |
droazen
Jan 20, 2017
Contributor
I think that the three test cases you added here could be unified into a single test case via a DataProvider that supplies BlockCompressedInputStreams constructed in each of the three ways, as well as expected decompressed file contents.
| + System.out.println("Creating file " + f); | ||
| + | ||
| + final List<String> linesWritten = new ArrayList<>(); | ||
| + final BlockCompressedOutputStream bcos = new BlockCompressedOutputStream(f, 5); |
| + linesWritten.add(s); | ||
| + } | ||
| + bcos.write(sb.toString().getBytes()); //Call 3 | ||
| + bcos.close(); |
droazen
Jan 20, 2017
Contributor
Can you extract a private helper method to write the temporary compressed file, and call it from the various test cases instead of repeating this block of code (lines 97-117)?
| + bcos.close(); | ||
| + | ||
| + int[] inflateCalls = {0}; //Note: using and array is a HACK to fool the compiler | ||
| + class MyInflater extends Inflater { |
droazen
Jan 20, 2017
Contributor
Move MyInflater out of this method and into class scope, make it private static, refactor to allow the inflateCalls array to be passed in to the constructor and stored as a field, and share it between test cases instead of repeating the class definition in multiple methods.
| + for(int i = 0; (line = reader.readLine()) != null; ++i) {} | ||
| + bcis.close(); | ||
| + Assert.assertEquals(inflateCalls[0], 21, "inflate calls"); | ||
| + } | ||
| } |
droazen
Jan 20, 2017
Contributor
Can you add a test case that covers an AsyncBlockCompressedInputStream with a custom inflater as well?
|
@droazen, all review comments have been addressed and changes were rebased and squashed. FYI, I saw the failure below on Travis. I did not see this failure on my system, and the tests passed when rerunning on Travis. Is this a known issue?
|
| + */ | ||
| + BlockGunzipper(InflaterFactory inflaterFactory) { | ||
| + inflater = inflaterFactory.makeInflater(true); // GZIP mode | ||
| + } |
droazen
Feb 1, 2017
Contributor
There should be a no-arg constructor in this class as well that calls defaultInflaterFactory.makeInflater(true)
| + final File input = new File(TEST_DATA_DIR, inputFile); | ||
| + try (final SamReader reader = SamReaderFactory.makeDefault().inflaterFactory(myInflaterFactory).open(input)) { | ||
| + for (final SAMRecord ignored : reader) { } | ||
| + reader.close(); |
| + inflateCalls++; | ||
| + return super.inflate(b, off, len); | ||
| + } | ||
| + static int inflateCalls; |
droazen
Feb 1, 2017
Contributor
This should be non-static and initialized at construction time, to avoid the possibility of separate test cases incorrectly sharing the counter.
|
@droazen can you comment with "review changes" rather than "single comment" so we get one email when you submit your comments rather than an email after each time you comment? |
| + public Object[][] customInflateInput() throws IOException { | ||
| + final File tempFile = File.createTempFile("testCustomInflater.", ".bam"); | ||
| + tempFile.deleteOnExit(); | ||
| + System.out.println("Creating file " + tempFile); |
|
@nh13 I usually do use the review feature, but this PR is very time-sensitive for us, and holding back all comments until the end of the review gives George less time to address them today. Hopefully your email client has the capability to thread or filter emails as necessary. |
| + {new BlockCompressedInputStream(new FileInputStream(tempFile), false, myInflaterFactory), linesWritten, 4}, | ||
| + {new BlockCompressedInputStream(tempFile, myInflaterFactory), linesWritten, 4}, | ||
| + {new AsyncBlockCompressedInputStream(tempFile, myInflaterFactory), linesWritten, 4}, | ||
| + {new BlockCompressedInputStream(new URL("http://broadinstitute.github.io/picard/testdata/index_test.bam"), myInflaterFactory), null, 21} |
droazen
Feb 1, 2017
Contributor
@lbergelson suggests using a Supplier here to delay construction of the streams (and, therefore, any resulting exceptions) until the test case starts
|
@gspowley Final review complete, a few last changes needed. Since we want this in today, and the remaining issues are not correctness issues, I'll merge it as-is, and then submit a patch to address remaining issues. |
droazen
merged commit 173bccd
into
samtools:master
Feb 1, 2017
droazen
added a commit
that referenced
this pull request
Feb 1, 2017
|
|
droazen |
60732f0
|
droazen
referenced
this pull request
Feb 1, 2017
Merged
Small patches to custom InflaterFactory support left over from code review #794
droazen
added a commit
that referenced
this pull request
Feb 1, 2017
|
|
droazen |
939bf1e
|
droazen
added a commit
that referenced
this pull request
Feb 1, 2017
|
|
droazen |
34440b7
|
gspowley commentedDec 9, 2016
•
edited
Description
Created the
InflaterFactoryclass and added adefaultInflaterFactorytoBlockGunzipper. By default, thedefaultInflaterFactorycreates a JavaInflater, sohtsjdkdefault behavior remains the same.The
InflaterFactoryenables the use of acceleratedInflaterimplementations, like the one in an upcoming release of GKL.Applications can set the default
InflaterFactoryusing this static method:Applications can override the default
InflaterFactoryusing:Checklist