Bzip2 fixes in cram.io.ExternalCompression #533
Conversation
@bpow - thanks for finding (and fixing!) this issue. The current cram implementation doesn't appear to ever use bzip for compression, so this appears to have gone unnoticed. It's part of the spec though, so it should work in the case that we need to decompress a cram file that does use it. The code itself looks good to me (@vadimzalunin can you have a look). I'm not sure about the sbt changes though, or how to verify those. @lbergelson ??
If we take this it will also fix #360.
libraryDependencies += "org.apache.ant" % "ant" % "1.8.2"
libraryDependencies += "org.testng" % "testng" % "6.8.8"
libraryDependencies += "org.testng" % "testng" % "6.8.8" % "test"
@cmnbroad These changes look fine to me from an htsjdk perspective. testng is only used in tests. We could probably just remove it from the sbt completely since we don't actually use the sbt for anything other than publishing to maven.
I can't see any usage of the ant dependency either. It's possible it's needed for the ant build, but that would be unaffected by removing it. Likewise, picard probably requires these be included by htsjdk, but since it doesn't access them through maven it won't matter if we remove them from the sbt.
Rebased to current master/HEAD
@vadimzalunin - Can you take a look at these changes before we merge them, since they mostly affect CRAM?
@bpow Thanks for your patience. Can you squash this down to 2 commits (one with the compression code changes and a separate one with the sbt changes)?
Rebased and squashed. Note that the second commit also removes the binary jar
looks good to me
@bpow I thought I'd be able to test these sbt changes, but I can't, and I don't want to merge this without testing them. So we either need to take those out of the PR (which is fine since I think we're headed towards gradle soon) or we need to hold off until we get the local publish working. Thanks again for your help and patience with this.
Rebased to move deletion of Should I make a separate issue/pull request for the sbt edits, knowing that it would be on the back burner until someone can test it or until unnecessary because of a switch to gradle? Is the sbt build used for maven publishing? If so, then not removing the dependencies I had listed before bloats the maven artifact.
I'm not sure of the best way to proceed. Should I put the sbt changes back in this pull request or open a separate pull request for the sbt changes? I'm not able to fully test the sbt changes either--
@bpow Sorry for my slow response. I think you should add the sbt changes back in. They're only for publishing to maven, and if the library is not needed with this change then it should be removed in all places. If downstream people depend on having it here, then they'll have to fix it themselves.
The removal of unneeded dependencies is back in, and everything is squashed down to one commit.
Use commons compress for bzip2 functionality and have ExternalCompression::bzip2 actually compress instead of decompress
This also obviates the binary blob lib/apache-ant-1.8.2-bzip2.jar and means that ant is no longer a classpath dependency.
I'm trying not to be a pest... Is there any additional information I can supply regarding this PR, or has it just fallen off people's radar? It seems to me like a relatively simple fix for a clear error in the code (complete with a unit test that demonstrates the problem and passes when the problem is fixed). I've been responsive to all suggestions made so far, and have kept the PR up-to-date with other changes in the codebase (including the switch to gradle). People that may have input:
It is rebased again to master.
@bpow We haven't forgotten about this, we've just been conservative regarding the dependency change (the code change itself I think is uncontroversial). I had hoped @lbergelson would review the last post-gradle PR, but he is away on vacation now, and I have successfully built with this. I'll merge this week if no one objects.
@bpow thanks for your help with this.
Post 909381a, src/tests/java/htsjdk/samtools/cram/io/ExternalCompressionTest.java and testdata/htsjdk/samtools/io/bzip2-test.bz2 are all on their lonesome and presumably in the wrong place.
Description
A couple of fixes to BZip2 support in h.s.cram.io.ExternalCompression:
bzip2 uses a BZip2CompressorInputStream. This is the Apache commons compress class for uncompressing from bzip2. This should be BZip2CompressorOutputStream. There wasn't a test for this before, so I added one.
unbzip2 was using CBZip2InputStream (from ant) when there is a perfectly good BZip2CompressorInputStream in commons compress, which is already a dependency. The classpath dependency on ant had been removed in dd90c3e but was put back in d92460d. The logic for its removal is similar to before, but now with an additional reason: the build.sbt that produces the maven artifact uses the whole ant jar file instead of the pared-down apache-ant-bzip2.jar, so that pulls in a few megs of dependency for downstream users when commons already provides bzip2 compress/decompress support.
Checklist
not sure if any additional documentation is necessary...
The pull request is split into three commits to show the logic:
I would be happy to squash, of course.
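For readers skimming the description, here is a minimal sketch of the stream-direction issue behind the bzip2 fix: compression writes through a compressor output stream, decompression reads through a compressor input stream, and a round-trip check catches a mix-up. This is not the htsjdk code. The class name CompressionRoundTripSketch and its methods are hypothetical, and the JDK's GZIP streams stand in for the commons compress BZip2CompressorOutputStream and BZip2CompressorInputStream only so the sketch runs without extra dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of htsjdk.
class CompressionRoundTripSketch {

    // Compressing means WRITING raw bytes into a compressor *output* stream.
    // With commons compress, BZip2CompressorOutputStream would take the
    // place of GZIPOutputStream here (assumption: same wrapping pattern).
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(raw);
        } // closing the stream flushes the compressed trailer into sink
        return sink.toByteArray();
    }

    // Decompressing means READING from a compressor *input* stream.
    // With commons compress, BZip2CompressorInputStream would take the
    // place of GZIPInputStream here.
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "external block payload".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original));
        // A round-trip check like this fails fast if compress() is wired
        // to the wrong stream direction.
        if (!Arrays.equals(original, restored)) {
            throw new AssertionError("round trip did not restore the input");
        }
        System.out.println("round trip ok");
    }
}
```

A round-trip test of this shape is the kind of check that would have flagged the original bug, since a compress method built on an input stream cannot produce output that decompresses back to the input.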