Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bzip2 fixes in cram.io.ExternalCompression #533

Merged
merged 1 commit into from
May 26, 2016

Conversation

bpow
Copy link
Contributor

@bpow bpow commented Mar 22, 2016

Description

A couple of fixes to BZip2 support in h.s.cram.io.ExternalCompression:

  1. The existing version of bzip2 uses a BZip2CompressorInputStream. This is the Apache commons compress class for uncompressing from bzip2. This should be BZip2CompressorOutputStream. There wasn't a test for this before, so I added one
  2. unbzip2 was using CBZip2InputStream (from ant) when there is a perfectly good BZip2CompressorInputStream in commons compress which is already a dependency. The classpath dependency on ant had been removed in dd90c3e but was put back in d92460d. The logic for its removal is similar to before, but now with an additional reason: the build.sbt that produces the maven artifact uses the whole ant jar file instead of the pared-down apache-ant-bzip2.jar, so that pulls in a few megs of dependency for downstream users when commons already provides bzip2 compress/decompress support
  3. Not directly related, but testng is no longer a "compile" configuration dependency (no classes in the main code depend on it, to what I can tell... I think there used to be some), and can be marked as a "test" configuration dependency so downstream code won't automatically pull it in with maven. It would be helpful if someone could run sbt to make sure this works.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality (and to demonstrate old class failing)
  • All tests passing
  • [?] Extended the README / documentation, if necessary

not sure if any additional documentation is necessary...

The pull request is split into three commits to show the logic:

  1. The tests that show existing code failing
  2. The fixes
  3. Removal of dependencies.

I would be happy to squash, of course.

@cmnbroad
Copy link
Collaborator

@bpow - thanks for finding (and fixing!) this issue. The current cram implementation doesn't appear to ever use bzip for compression, so this appears to have gone unnoticed. Its part of the spec though so it should work in the case that we need to decompress a cram file that does use it.

The code itself looks good to me (@vadimzalunin can you have a look).

I'm not sure about the sbt changes though, or how to verify those. @lbergelson ??.

@cmnbroad
Copy link
Collaborator

If we take this it will also fix #360.

libraryDependencies += "org.apache.ant" % "ant" % "1.8.2"

libraryDependencies += "org.testng" % "testng" % "6.8.8"
libraryDependencies += "org.testng" % "testng" % "6.8.8" % "test"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad These changes look fine to me from an htsjdk perspective. testng is only used in tests. We could probably just remove it from the sbt completely since we don't actually use the sbt for anything other than publishing to maven.

I can't see any usage of the ant dependency either. It's possible it's needed for the ant build, but that would be unaffected by removing it. Likewise, picard probably requires these be included by htsjdk, but since it doesn't access them through maven it won't matter if we remove them from the sbt.

@bpow
Copy link
Contributor Author

bpow commented Apr 6, 2016

Rebased to current master/HEAD

@cmnbroad
Copy link
Collaborator

@vadimzalunin - Can you want to take a look at these changes before we merge them since they mostly affect CRAM ?

@cmnbroad
Copy link
Collaborator

@bpow Thanks for your patience. Can you squash this down to 2 commits (one with the compression code changes and a separate one with the sbt changes).

@bpow
Copy link
Contributor Author

bpow commented Apr 13, 2016

Rebased and squashed.

Note that the second commit also removes the binary jar lib/apache-ant-1.8.2-bzip2.jar (it can be easy to overlook file deletions in git logs). It is possible that some downstream users could be relying on this jar being coalesced into the htsjdk jar, but if so they should be encouraged to switch to commons-compress or to depend on ant explicitly.

@vadimzalunin
Copy link
Contributor

looks good to me

@cmnbroad
Copy link
Collaborator

cmnbroad commented Apr 15, 2016

@bpow I though I'd be able to test these sbt changes, but I can't, and I don't want to merge this without testing them. So we either need to take those out of the PR (which is fine since I think we're headed towards gradle soon) or we need to hold off until we get the local publish working. Thanks again for your help and patience with this.

@bpow
Copy link
Contributor Author

bpow commented Apr 20, 2016

Rebased to move deletion of lib/apache-ant-1.8.2-bzip2.jar to the first commit and remove the second commit for now.

Should I make a separate issue/pull request for the sbt edits, knowing that it would be on the back burner until someone can test it or until unnecessary because of a switch to gradle? Is the sbt build used for maven publishing? If so, then not removing the dependencies I had listed before bloats the maven artifact.

@lbergelson
Copy link
Member

@cmnbroad publishLocal should be fixed now. #573 @bpow I'd recommend removing the ant requirement from the sbt if you're deleting it here.

@bpow
Copy link
Contributor Author

bpow commented Apr 22, 2016

I'm not sure of the best way to proceed. Should I put the sbt changes back in this pull request or open a separate pull request for the sbt changes?

I'm not able to fully test the sbt changes either-- sbt test on my machine gives several test failures for java.lang.OutOfMemoryError. I notice that the build.xml increases memory to 2G for tests.

@lbergelson
Copy link
Member

@bpow Sorry for my slow response. I think you should add the sbt changes back in. They're only for publishing to maven, and if the library is not need with this change than it should be removed in all places. If downstream people depend on have it here than they'll have to fix it themselves.

@bpow
Copy link
Contributor Author

bpow commented May 9, 2016

The removal of unneeded dependencies is back in, and everything is squashed down to one commit.

@bpow
Copy link
Contributor Author

bpow commented May 16, 2016

@cmnbroad : I updated this for the gradle build process now that #383 has been committed.

Note that in addition to fixing the bzip2 compression forcram files, removing the compile dependency on ant results in a shadowJar that is 36% smaller.

Use commons compress for bzip2 functionality and have
ExternalCompression::bzip2 actually compress instead of decompress

This also obviates the binary blob lib/apache-ant-1.8.2-bzip2.jar and
means that ant is no longer a classpath dependency.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.004%) to 67.373% when pulling 00344d1 on bpow:bzip2-fixes into 72ae3f7 on samtools:master.

@bpow
Copy link
Contributor Author

bpow commented May 24, 2016

I'm trying not to be a pest... Is there any additional information I can supply regarding this PR, or has it just fallen off people's radar?

It seems to me like a relatively simple fix for a clear error in the code (complete with a unit test that demonstrates the problem and passes when the problem is fixed). I've been responsive to all suggestions made so far, and have kept the PR up-to-date with other changes in the codebase (including the switch to gradle).

People that may have input:

  • @cmnbroad, to whom the PR is assigned
  • @vadimzalunin, since it affects cram
  • @lbergelson, since it affects the build dependencies in the build.gradle file
  • @nh13, since his commit (d92460d) was the one that added the ant dependency that had been removed in dd90c3e

It is rebased again to master.

@cmnbroad
Copy link
Collaborator

@bpow We haven't forgotten about this, we've just been conservative regarding the dependency change (the code change itself I think is uncontroversial). I had hoped @lbergelson would review the last post-gradle PR, but he is away on vacation now, and I have successfully built with this. I'll merge this week if no one objects.

@cmnbroad cmnbroad merged commit 67714e0 into samtools:master May 26, 2016
@cmnbroad
Copy link
Collaborator

@bpow thanks for your help with this.

@jmarshall
Copy link
Member

Post 909381a, src/tests/java/htsjdk/samtools/cram/io/ExternalCompressionTest.java and testdata/htsjdk/samtools/io/bzip2-test.bz2 are all on their lonesome and presumably in the wrong place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants