Fix for picard issue 574: cram reader should look for the index file automatically #716

Merged
merged 1 commit into from Oct 27, 2016

Conversation

Projects
None yet
4 participants
Contributor

dnikolaeva commented Sep 29, 2016 edited

Fix for issue #574 (picard):

ViewSam works very slowly with CRAM files even if index file .crai exists. It turns out, in the case the index is not set explicitly CRAMFileReader does not try to find CRAM index file. We propose improvements in htsjdk project that make CRAMFileReader works with CRAM index automatically. To achieve this, we have expanded constructors in CRAMFileReader as follows: if indexFile is null try to find it using method findIndex in SamFiles class. All proposed improvements include unit tests to validate the results.

Coverage Status

Coverage increased (+0.04%) to 69.08% when pulling cca696a on EpamLifeSciencesTeam:issue_574_cram_index into 88b6719 on samtools:master.

dnikolaeva changed the title from fixed picard issue 574: cram reader should look for the index file au… to Fix for picard issue 574: cram reader should look for the index file automatically Sep 29, 2016

cmnbroad self-assigned this Sep 29, 2016

@cmnbroad

Thanks for making this PR. I requested a couple of (hopefully not too difficult) changes.

@@ -95,6 +97,7 @@ public CRAMFileReader(final File cramFile, final InputStream inputStream,
this.cramFile = cramFile;
this.inputStream = inputStream;
this.referenceSource = referenceSource;
+ if (cramFile != null) initIndexFile(null);
@cmnbroad

cmnbroad Sep 30, 2016 edited

Contributor

Since initIndexFile is called in the middle of the constructors, I would make it static (you'll have to pass in cramFile as well) and just have it return the updated indexFile, which the constructors can use to initialize the member variable. You might rename it (maybe "findIndexForFile" ?).

@ZLyanov

ZLyanov Oct 6, 2016

Contributor

done

@@ -214,8 +218,25 @@ public CRAMFileReader(final InputStream stream,
public CRAMFileReader(final File cramFile,
final File indexFile, final CRAMReferenceSource referenceSource,
final ValidationStringency validationStringency) throws IOException {
- this(new FileInputStream(cramFile), indexFile, referenceSource, validationStringency);
+ this(new FileInputStream(cramFile),
+ wrapIndexFile(cramFile, indexFile),
@cmnbroad

cmnbroad Sep 30, 2016 edited

Contributor

Rather than having this somewhat redundant single-use static method, I think it would be cleaner to remove it, and instead factor out the common initialization code from the constructor it's delegating to, and call that method from both constructors (this method would no longer delegate to the other one).

@ZLyanov

ZLyanov Oct 6, 2016

Contributor

Could you clarify what gain we would get?

@cmnbroad

cmnbroad Oct 7, 2016

Contributor

It would be an incremental step in minimizing the number of code paths (and exception propagation strategies) for creating SeekableStreams in this code. There are several bugs in the index/iterator state management in this code (some of which can be seen here). Ultimately, this needs a bigger refactoring anyway. At a minimum, if you keep this method, can you rename it to something more specific - maybe getStreamForIndex ? Thx.

@ZLyanov

ZLyanov Oct 14, 2016

Contributor

fixed. see the updated version. We have extracted common initialization code from the two constructors into the 'initWithStreams' method and inlined mentioned above 'wrapIndexFile' method.

@@ -42,13 +43,17 @@
public class CRAMFileReaderTest {
private static final File TEST_DATA_DIR = new File("src/test/resources/htsjdk/samtools");
+ private static final File CRAM_FILE = new File(TEST_DATA_DIR, "cram_with_crai_index.cram");
@cmnbroad

cmnbroad Sep 30, 2016

Contributor

Can you rename this to something like CRAM_WITH_CRAI, to make the tests read a little easier.

@ZLyanov

ZLyanov Oct 6, 2016

Contributor

done

+ public void testCRAMReader7_ShouldUseCRAMIndex() throws IOException {
+ CRAMFileReader reader = new CRAMFileReader(CRAM_FILE, INDEX_FILE, REFERENCE, ValidationStringency.STRICT);
+ Assert.assertTrue(reader.hasIndex(), "Can't find existing CRAM index.");
+ }
@cmnbroad

cmnbroad Sep 30, 2016

Contributor

It would be good to add a negative test for each constructor that got updated just to make sure there are no problems in those code paths. There are plenty of cram files in the test folder with no index.

@ZLyanov

ZLyanov Oct 6, 2016

Contributor

done

Contributor

cmnbroad commented Sep 30, 2016

@dnikolaeva Thanks for the PR - a couple of requests for changes - back to you.

Coverage Status

Coverage increased (+0.03%) to 69.074% when pulling 8baf865 on EpamLifeSciencesTeam:issue_574_cram_index into 88b6719 on samtools:master.

Coverage Status

Coverage increased (+0.04%) to 69.077% when pulling 3fa79b0 on EpamLifeSciencesTeam:issue_574_cram_index into 88b6719 on samtools:master.

@cmnbroad

Looks good - thanks for making those changes (though for some reason I'm unable to pull these changes from your branch like I could the first time - not sure why). One minor formatting request, then squash/rebase and we can merge. Thanks for fixing this.

@@ -95,6 +97,7 @@ public CRAMFileReader(final File cramFile, final InputStream inputStream,
this.cramFile = cramFile;
this.inputStream = inputStream;
this.referenceSource = referenceSource;
+ if (cramFile != null) mIndexFile = findIndexForFile(null, cramFile);
@cmnbroad

cmnbroad Oct 26, 2016

Contributor

Add braces and move the if-stmt body to a separate line.

@ZLyanov ZLyanov Fix for Issue 574: CRAM index
e059a03

Coverage Status

Coverage increased (+0.04%) to 69.777% when pulling e059a03 on EpamLifeSciencesTeam:issue_574_cram_index into 58f4154 on samtools:master.

@cmnbroad cmnbroad merged commit 6f0512b into samtools:master Oct 27, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.04%) to 69.777%
Details
Contributor

ZLyanov commented Oct 27, 2016

@cmnbroad Sorry, forgot to place comments.. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment