-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes and tests for CRAM index queries. #552
Conversation
@vadimzalunin Can you review? We were somewhat surprised to learn that cram queries are broken in so many cases.. |
74e0dda
to
1a5b896
Compare
if (alignEnd == 0) { | ||
// handle placed but "unmapped" since they can pass the overlap test | ||
return (alignStart >= interval.start) && | ||
(interval.end <= 0 || alignStart <= interval.end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle case where alignStart < interval.start && alignEnd == 0
in a manner consistent with the BAM-querying code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(BAMQueryMultipleIntervalsIteratorFilter.compareIntervalToRecord()
suggests that unmapped reads with a start position are assigned an end position equal to start for BAM queries)
I've added a second commit to this PR with fixes for the comments here, as well as a proposed fix for #551. The second commit reduces the two CRAMFileReader interval iterator implementations (IntervalIterator and MultipleIntervalIterator) into a single implementation that uses (a temporarily cloned copy of the) interval matching code used by the BAM iterators iterators to do interval matching. If we get agreement that this is the right strategy, I'll refactor the cloned code so it can be shared between the implementations. |
@cmnbroad the approach looks good. A shared iterator impl would be ideal. |
dfd2d39
to
4fec505
Compare
@vadimzalunin Please review. thx. |
4fec505
to
01b87ee
Compare
} | ||
} | ||
throw new SAMException("Failed to query alignment start: " + sequence + " at " + start); | ||
return new CRAMIntervalIterator(new QueryInterval[]{new QueryInterval(referenceIndex, start, -1)}, true); | ||
} | ||
|
||
CloseableIterator<SAMRecord> query(final int referenceIndex, | ||
final int start, final int end, final boolean overlap) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: Delete this method. I meant to remove it as its no longer used internally and should just be deleted (its not part of the public query interface, and is confusing since it reverses the polarity of contained/overlap compared to the other methods).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on deletion if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@vadimzalunin Do you have time to do a final review on this? This is blocking us from safely using CRAM as input to the GATK, since our queries on CRAMs are currently not returning the correct results. Thanks! |
/** | ||
* Filters out records that do not match any of the given intervals and query type. | ||
*/ | ||
public class BAMQueryMultipleIntervalsIteratorFilter implements BAMIteratorFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you've made this formerly private class public, can you add direct unit tests for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
new String[]{"a", "b", "c", "d", "e"}}, | ||
{cramQueryWithBAI, cramQueryReference, | ||
new QueryInterval[]{new QueryInterval(0, 100010, 100010), new QueryInterval(0, 100011, 100011)}, | ||
new String[]{"a", "b", "c", "d", "e"}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend having more than two test cases here. Should have tests involving multiple contigs, as well as more than two intervals, and also cover cases like no reads overlap any intervals, first read is before the first interval, first interval is before the first read, etc. If your existing test file doesn't have enough reads/contigs, can use something like src/test/resources/org/broadinstitute/hellbender/engine/reads_data_source_test1.bam
from the GATK, which uses a tiny reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can also try porting the test cases from ReadsDataSourceUnitTest
here, which use the reads_data_source_test1.bam
file mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added quite a few more tests for both overlapped and contained.
First-pass review complete, back to @cmnbroad |
5b14052
to
9db6fdc
Compare
Didn't squash or rebase yet but otherwise I think this covers it. |
} catch (final IOException e) { | ||
throw new RuntimeEOFException(e); | ||
} | ||
next(); // advance to the first record that matches the filter criteria |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you confirmed that this iterator works with an empty cram file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a couple of tests to validate this.
Review complete, back to @cmnbroad. Merge after addressing remaining comments and rebasing. |
9db6fdc
to
7c4f032
Compare
7c4f032
to
12b0345
Compare
Includes fixes for #481, #550, #551, and #562;