Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Fix for picard issue 574: cram reader should look for the index file automatically #716
Conversation
coveralls
commented
Sep 29, 2016
dnikolaeva
changed the title from
fixed picard issue 574: cram reader should look for the index file au… to Fix for picard issue 574: cram reader should look for the index file automatically
Sep 29, 2016
cmnbroad
self-assigned this
Sep 29, 2016
cmnbroad
requested changes
Sep 30, 2016
Thanks for making this PR. I requested a couple of (hopefully not too difficult) changes.
| @@ -95,6 +97,7 @@ public CRAMFileReader(final File cramFile, final InputStream inputStream, | ||
| this.cramFile = cramFile; | ||
| this.inputStream = inputStream; | ||
| this.referenceSource = referenceSource; | ||
| + if (cramFile != null) initIndexFile(null); |
cmnbroad
Sep 30, 2016
•
Contributor
Since initIndexFile is called in the middle of the constructors, I would make it static (you'll have to pass in cramFile as well) and just have it return the updated indexFile, which the constructors can use to initialize the member variable. You might rename it (maybe "findIndexForFile" ?).
| @@ -214,8 +218,25 @@ public CRAMFileReader(final InputStream stream, | ||
| public CRAMFileReader(final File cramFile, | ||
| final File indexFile, final CRAMReferenceSource referenceSource, | ||
| final ValidationStringency validationStringency) throws IOException { | ||
| - this(new FileInputStream(cramFile), indexFile, referenceSource, validationStringency); | ||
| + this(new FileInputStream(cramFile), | ||
| + wrapIndexFile(cramFile, indexFile), |
cmnbroad
Sep 30, 2016
•
Contributor
Rather than having this somewhat redundant single-use static method, I think it would be cleaner to remove it, and instead factor out the common initialization code from the constructor it's delegating to, and call that method from both constructors (this method would no longer delegate to the other one).
cmnbroad
Oct 7, 2016
Contributor
It would be an incremental step in minimizing the number of code paths (and exception propagation strategies) for creating SeekableStreams in this code. There are several bugs in the index/iterator state management in this code (some of which can be seen here). Ultimately, this needs a bigger refactoring anyway. At a minimum, if you keep this method, can you rename it to something more specific - maybe getStreamForIndex ? Thx.
ZLyanov
Oct 14, 2016
Contributor
fixed. see the updated version. We have extracted common initialization code from the two constructors into the 'initWithStreams' method and inlined mentioned above 'wrapIndexFile' method.
| @@ -42,13 +43,17 @@ | ||
| public class CRAMFileReaderTest { | ||
| private static final File TEST_DATA_DIR = new File("src/test/resources/htsjdk/samtools"); | ||
| + private static final File CRAM_FILE = new File(TEST_DATA_DIR, "cram_with_crai_index.cram"); |
cmnbroad
Sep 30, 2016
Contributor
Can you rename this to something like CRAM_WITH_CRAI, to make the tests read a little easier.
| + public void testCRAMReader7_ShouldUseCRAMIndex() throws IOException { | ||
| + CRAMFileReader reader = new CRAMFileReader(CRAM_FILE, INDEX_FILE, REFERENCE, ValidationStringency.STRICT); | ||
| + Assert.assertTrue(reader.hasIndex(), "Can't find existing CRAM index."); | ||
| + } |
cmnbroad
Sep 30, 2016
Contributor
It would be good to add a negative test for each constructor that got updated just to make sure there are no problems in those code paths. There are plenty of cram files in the test folder with no index.
|
@dnikolaeva Thanks for the PR - a couple of requests for changes - back to you. |
coveralls
commented
Oct 6, 2016
coveralls
commented
Oct 14, 2016
cmnbroad
requested changes
Oct 26, 2016
Looks good - thanks for making those changes (though for some reason I'm unable to pull these changes from your branch like I could the first time - not sure why). One minor formatting request, then squash/rebase and we can merge. Thanks for fixing this.
| @@ -95,6 +97,7 @@ public CRAMFileReader(final File cramFile, final InputStream inputStream, | ||
| this.cramFile = cramFile; | ||
| this.inputStream = inputStream; | ||
| this.referenceSource = referenceSource; | ||
| + if (cramFile != null) mIndexFile = findIndexForFile(null, cramFile); |
coveralls
commented
Oct 27, 2016
cmnbroad
merged commit 6f0512b
into
samtools:master
Oct 27, 2016
|
@cmnbroad Sorry, forgot to place comments.. Thanks! |
dnikolaeva commentedSep 29, 2016
•
edited
Fix for issue #574 (picard):
ViewSam works very slowly with CRAM files even if index file .crai exists. It turns out, in the case the index is not set explicitly CRAMFileReader does not try to find CRAM index file. We propose improvements in htsjdk project that make CRAMFileReader works with CRAM index automatically. To achieve this, we have expanded constructors in CRAMFileReader as follows: if indexFile is null try to find it using method findIndex in SamFiles class. All proposed improvements include unit tests to validate the results.