Fix BEDCodec.canDecode() to handle block-compressed extensions #704

Merged
merged 2 commits into from Sep 13, 2016

Conversation

Projects
None yet
3 participants
Contributor

magicDGS commented Sep 12, 2016 edited

Description

When using canDecode("filename.bed.gz") it will return false, but the AbstractFeatureReader could support block-compressed bed files. This could cause problems when trying to find the codec for a file (for instance, broadinstitute/gatk#2131).

I added here a check for block-compression extensions and tests for them.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)
@magicDGS magicDGS fix BEDCodec.canDecode to handle block-compressed extensions
70b8f35

Coverage Status

Coverage increased (+0.003%) to 68.924% when pulling 70b8f35 on magicDGS:dgs_fix_bed_candecode into c3d5a88 on samtools:master.

lbergelson self-assigned this Sep 12, 2016

@lbergelson lbergelson commented on the diff Sep 12, 2016

src/test/java/htsjdk/tribble/bed/BEDCodecTest.java
@@ -226,4 +226,15 @@ private void createIndex(File testFile, File idxFile) throws IOException {
public void testGetTabixFormat() {
Assert.assertEquals(new BEDCodec().getTabixFormat(), TabixFormat.BED);
}
+
@lbergelson

lbergelson Sep 12, 2016

Contributor

Could you add a small block compressed bedfile and a test that shows that it actually can read from it as well?

@magicDGS

magicDGS Sep 12, 2016

Contributor

I will add it tomorrow, I don't have now a command line available to create the test file.

@magicDGS

magicDGS Sep 13, 2016

Contributor

I think that I don't need to implement a new test for this. A FeatureReader for bgzip compressed files and tabix index queries is implemented in FeatureReaderTest.testBedQuery().

@lbergelson

lbergelson Sep 13, 2016

Contributor

ah, good point, I didn't see

@lbergelson lbergelson and 1 other commented on an outdated diff Sep 12, 2016

src/main/java/htsjdk/tribble/bed/BEDCodec.java
@@ -197,7 +201,13 @@ private void createExons(int start, String[] tokens, FullBEDFeature gene,
@Override
public boolean canDecode(final String path) {
- return path.toLowerCase().endsWith(".bed");
+ final String toDecode;
+ if (AbstractFeatureReader.hasBlockCompressedExtension(path)) {
+ toDecode = path.substring(0, path.lastIndexOf("."));
+ } else {
+ toDecode = path;
+ }
+ return toDecode.toLowerCase().endsWith(".bed");
@lbergelson

lbergelson Sep 12, 2016

Contributor

While you're here, could you expose this as a static constant.

@magicDGS

magicDGS Sep 12, 2016

Contributor

Sure. Do you prefer a simple String constant or a List<String> for the case of new extensions are added later?

@lbergelson

lbergelson Sep 12, 2016

Contributor

I think just a String. I have trouble imagining big changes in the bed file world in the near future. We can always revisit if that happens.

@magicDGS

magicDGS Sep 12, 2016

Contributor

Done.

@lbergelson lbergelson and 1 other commented on an outdated diff Sep 12, 2016

src/main/java/htsjdk/tribble/bed/BEDCodec.java
import htsjdk.tribble.AsciiFeatureCodec;
import htsjdk.tribble.annotation.Strand;
import htsjdk.tribble.index.tabix.TabixFormat;
import htsjdk.tribble.readers.LineIterator;
import htsjdk.tribble.util.ParsingUtils;
+import org.apache.commons.compress.compressors.FileNameUtil;
@lbergelson

lbergelson Sep 12, 2016

Contributor

are these new imports actually used?

@magicDGS

magicDGS Sep 12, 2016

Contributor

Only one, I will remove them.

@magicDGS

magicDGS Sep 12, 2016

Contributor

Done.

Contributor

lbergelson commented Sep 12, 2016

@magicDGS Looks good, two small comments. Thanks for the pull request! (And sorry I haven't gotten back to you yet on some of your older ones. I'll try to get those reviewed soon!)

Coverage Status

Coverage increased (+0.009%) to 68.93% when pulling cb6cd41 on magicDGS:dgs_fix_bed_candecode into c3d5a88 on samtools:master.

Contributor

magicDGS commented Sep 13, 2016

I addressed your comments (except test) -- Back to you @lbergelson
And thanks for the review of my PRs!

@magicDGS @magicDGS magicDGS addressed comments
94940d3

Coverage Status

Coverage increased (+0.009%) to 68.93% when pulling 94940d3 on magicDGS:dgs_fix_bed_candecode into c3d5a88 on samtools:master.

Coverage Status

Coverage increased (+0.009%) to 68.93% when pulling 94940d3 on magicDGS:dgs_fix_bed_candecode into c3d5a88 on samtools:master.

Contributor

lbergelson commented Sep 13, 2016

👍

@lbergelson lbergelson merged commit fb1ba06 into samtools:master Sep 13, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.009%) to 68.93%
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment