Add HtsgetBAMFileReader #1494

Merged: 18 commits, Aug 14, 2020

andersleung (Contributor)

Description

Add a new SamReader type that is able to request reads from HTSget sources, as well as classes defining the content of an HTSget request and response. This change allows consumers of the htsjdk library to use HTSget sources like any other type of reads source.

Things to think about before submitting:

  • Make sure your changes compile and new tests pass locally.
  • Add new tests or update existing ones:
    • A bug fix should include a test that previously would have failed and passes now.
    • New features should come with new tests that exercise and validate the new functionality.
  • Extend the README / documentation, if necessary.
  • Check your code style.
  • Write a clear commit title and message
    • The commit message should describe what changed and is targeted at htsjdk developers
    • Breaking changes should be mentioned in the commit message.

codecov-commenter commented Jul 23, 2020

Codecov Report

Merging #1494 into master will increase coverage by 0.046%.
The diff coverage is 72.156%.

@@               Coverage Diff               @@
##              master     #1494       +/-   ##
===============================================
+ Coverage     69.279%   69.325%   +0.046%     
- Complexity      8764      8888      +124     
===============================================
  Files            590       601       +11     
  Lines          34755     35433      +678     
  Branches        5800      5901      +101     
===============================================
+ Hits           24078     24564      +486     
- Misses          8386      8538      +152     
- Partials        2291      2331       +40     
| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...rc/main/java/htsjdk/samtools/SamReaderFactory.java | 65.487% <18.750%> (-3.381%) | 7.000 <0.000> (ø) | |
| src/main/java/htsjdk/samtools/SamReader.java | 79.348% <25.000%> (-2.470%) | 0.000 <0.000> (ø) | |
| .../util/htsget/HtsgetMalformedResponseException.java | 28.571% <28.571%> (ø) | 1.000 <1.000> (?) | |
| src/main/java/htsjdk/samtools/BAMFileReader.java | 69.501% <50.000%> (+1.036%) | 50.000 <1.000> (-2.000) | ⬆️ |
| ...rc/main/java/htsjdk/samtools/SamInputResource.java | 65.432% <53.846%> (-2.451%) | 20.000 <3.000> (+3.000) | ⬇️ |
| ...ava/htsjdk/samtools/util/htsget/HtsgetRequest.java | 61.905% <61.905%> (ø) | 34.000 <34.000> (?) | |
| ...dk/samtools/util/SAMRecordPrefetchingIterator.java | 74.667% <74.667%> (ø) | 13.000 <13.000> (?) | |
| ...main/java/htsjdk/samtools/HtsgetBAMFileReader.java | 75.799% <75.799%> (ø) | 24.000 <24.000> (?) | |
| ...sjdk/samtools/util/htsget/HtsgetErrorResponse.java | 80.000% <80.000%> (ø) | 4.000 <4.000> (?) | |
| src/main/java/htsjdk/samtools/QueryInterval.java | 71.429% <90.909%> (+4.762%) | 24.000 <5.000> (+5.000) | |

... and 22 more

@lbergelson (Member) left a comment:

@andersleung I promised to get this done today but other stuff came up and then it got late. Here's my first round of comments but I'll probably have more tomorrow. It looks generally good so far but I think we want to change some of the handling of the Input resources.

* @return false, since htsget sources never have indices
*/
@Override
public boolean hasIndex() {

This is going to be problematic, since this method is used in two different ways. The first and most important is to mean "can I query on this reader, or do I have to iterate over it?" Obviously htsget supports queries without the index. The second way it is used, though, is "let me check if I can locate the index for this file so I can perform some operation on it." That's very rarely used, but it is used internally by the sam reader to guard a call to getIndex.

We might want to add a new method "isQueryable" or something like that to the interface so we can split that concept up.

Returning true here will be a lie, but it will work better with almost everything. We'd have to make getIndex return null then, I think.

Either way it's unfortunate.
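
One way to picture the split suggested above is a separate `isQueryable()` method that defaults to `hasIndex()` and is overridden by sources that can answer region queries without a local index. The sketch below is only an illustration: `ReadsSource`, its three implementations, and `accessPlan` are hypothetical names, not the htsjdk `SamReader` interface or anything merged in this PR.

```java
// Sketch only: ReadsSource and its implementations are hypothetical,
// not part of htsjdk.
class QueryableSketch {

    interface ReadsSource {
        /** True only if an index can actually be located for this source. */
        boolean hasIndex();

        /** True if region queries are supported; by default that requires an index. */
        default boolean isQueryable() {
            return hasIndex();
        }
    }

    /** A local file with an index next to it. */
    static final class IndexedFileSource implements ReadsSource {
        @Override
        public boolean hasIndex() { return true; }
    }

    /** A local file without an index: iteration only. */
    static final class UnindexedFileSource implements ReadsSource {
        @Override
        public boolean hasIndex() { return false; }
    }

    /** An htsget-like source: the server resolves regions, so no local index exists. */
    static final class RemoteQuerySource implements ReadsSource {
        @Override
        public boolean hasIndex() { return false; }

        @Override
        public boolean isQueryable() { return true; }
    }

    /** Callers that only care about querying ask isQueryable(), not hasIndex(). */
    static String accessPlan(final ReadsSource source) {
        return source.isQueryable() ? "query" : "iterate";
    }

    public static void main(final String[] args) {
        System.out.println(accessPlan(new IndexedFileSource()));
        System.out.println(accessPlan(new UnindexedFileSource()));
        System.out.println(accessPlan(new RemoteQuerySource()));
    }
}
```

With a split like this, `hasIndex()` can stay truthful for the remote source while code that only needs to know whether querying is possible no longer has to treat "no index" as "iterate only".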

@@ -523,6 +548,11 @@ void applyTo(final CRAMFileReader underlyingReader, final SamReader reader) {
void applyTo(final SRAFileReader underlyingReader, final SamReader reader) {
underlyingReader.enableIndexCaching(true);
}

@Override
void applyTo(final HtsgetBAMFileReader underlyingReader, final SamReader reader) {

This design is so crufty. It seems like someone started refactoring it to use ReaderImplementation so we don't need these custom handlers here as well but then never actually did it. (Just me venting... nothing to do here...)


/**
* Class allowing deserialization from json htsget error response
*/

Note to self: stopping here for now.

@lbergelson (Member) left a comment:

@andersleung More comments. I think I got everything this time.

import java.io.IOException;
import java.net.URI;

public class HtsgetBAMFileReaderTest extends HtsjdkTest {

It would be good to run these tests with a variety of settings on the htsget reader. In particular, I'd like to see them with async on for the htsget reader, since it seems plausible that there could be issues there. You could add a data provider that has multiple readers.

The readers also need to be closed at the end.

Typically I'm in favor of not reusing the same objects for multiple tests and instead generating them fresh each time; it helps avoid test-ordering problems, or problems when one test puts the objects into a broken state. This should be OK though.
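
The pattern described in this comment (one check run against several reader configurations, each reader created fresh and closed afterwards) can be sketched in plain Java without a test framework. Everything here is an assumption for illustration: `FakeReader` stands in for a real reader, and `readerConfigs` plays the role a data provider would play in the actual test suite.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Sketch only: FakeReader and these helpers are hypothetical, not htsjdk classes.
class ReaderConfigSketch {

    /** Stand-in for a reader that can be opened in sync or async mode. */
    static final class FakeReader implements AutoCloseable {
        final boolean async;
        boolean closed = false;

        FakeReader(final boolean async) { this.async = async; }

        /** The data should be the same regardless of configuration. */
        int countRecords() { return 3; }

        @Override
        public void close() { closed = true; }
    }

    /** One factory per configuration, so every run gets a fresh reader. */
    static Map<String, Supplier<FakeReader>> readerConfigs() {
        final Map<String, Supplier<FakeReader>> configs = new LinkedHashMap<>();
        configs.put("sync", () -> new FakeReader(false));
        configs.put("async", () -> new FakeReader(true));
        return configs;
    }

    /** Runs the same check for every configuration; returns the names that failed. */
    static List<String> runAll() {
        final List<String> failures = new ArrayList<>();
        for (final Map.Entry<String, Supplier<FakeReader>> entry : readerConfigs().entrySet()) {
            // try-with-resources closes the reader even if the check throws
            try (FakeReader reader = entry.getValue().get()) {
                if (reader.countRecords() != 3) {
                    failures.add(entry.getKey());
                }
            }
        }
        return failures;
    }

    public static void main(final String[] args) {
        System.out.println("failed configurations: " + runAll());
    }
}
```

Supplying factories rather than shared instances keeps one configuration's run from leaving a reader in a state that affects the next.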

public class HtsgetResponseUnitTest extends HtsjdkTest {
@Test
public void testDeserialization() {
final String respJson = "{\"htsget\":{\"format\":\"BAM\",\"urls\":[{\"url\":\"data:application/vnd.ga4gh.bam;base64,QkFNAQ==\",\"class\":\"header\"},{\"url\":\"https://htsget.blocksrv.example/sample1234/header\",\"class\":\"header\"},{\"url\":\"https://htsget.blocksrv.example/sample1234/run1.bam\",\"headers\":{\"Authorization\":\"Bearer xxxx\",\"Range\":\"bytes=65536-1003750\"},\"class\":\"body\"},{\"url\":\"https://htsget.blocksrv.example/sample1234/run1.bam\",\"headers\":{\"Authorization\":\"Bearer xxxx\",\"Range\":\"bytes=2744831-9375732\"},\"class\":\"body\"}],\"md5\":\"blah\"}}";

I can't wait until Java adds multiline strings...

@lbergelson (Member) left a comment:

@andersleung I think this is looking good overall, but the prefetcher could use some changes. I think there are a number of race conditions, and generally the error handling needs to be improved.

@@ -0,0 +1,5 @@
This folder contains scripts and files necessary for starting a local htsget reference server so that htsget functionality can be tested.

Thank you, this is helpful

@lbergelson (Member) left a comment:

@andersleung A few minor comments. Looks good to me after those are resolved.

* Note that this implementation is not synchronized. If multiple threads
* access an instance concurrently, it must be synchronized externally.
*/
public class SAMRecordPrefetchingIterator implements CloseableIterator<SAMRecord> {

This looks good to me. I have two small comments. I don't see any concurrency issues so hopefully that means that there aren't any :)

This is nicer than the last implementation.

if (this.backgroundThread == null) return;
/*
If prefetch thread is interrupted while awake and before acquiring permits, it will either acquire the permits
and pass through to the next case, or check interruption status before sleeping then exit immediately

Thanks for the detailed explanation.
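
The "check interruption status before sleeping, then exit immediately" case from that code comment can be reproduced in isolation. The sketch below assumes the permits come from a `java.util.concurrent.Semaphore` (the excerpt does not show the field), and `InterruptSketch` is a hypothetical name. It shows that a thread whose interrupt flag is already set does not block in `acquire()` but receives an `InterruptedException` straight away.

```java
import java.util.concurrent.Semaphore;

// Sketch only: assumes Semaphore-based permits; not the iterator's actual code.
class InterruptSketch {

    /** Returns true if acquire() gave up because the thread had been interrupted. */
    static boolean acquireAfterInterrupt() {
        // No permits available, so an uninterrupted acquire() would block.
        final Semaphore permits = new Semaphore(0);

        // Simulate close() interrupting the worker before it tries to acquire.
        Thread.currentThread().interrupt();
        try {
            permits.acquire();
            return false;
        } catch (final InterruptedException e) {
            // acquire() saw the pending interrupt, cleared the flag, and threw
            // instead of putting the thread to sleep.
            return true;
        }
    }

    public static void main(final String[] args) {
        System.out.println("exited on interrupt: " + acquireAfterInterrupt());
    }
}
```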


import java.util.stream.IntStream;

public class SAMRecordPrefetchingIteratorTest extends HtsjdkTest {

Thanks for writing these tests!

@@ -65,6 +67,7 @@ private void prefetch() {
// InterruptedException is expected if the iterator is being closed
return;
} catch (final Throwable t) {
t.printStackTrace();

I think this should only be printed in the case of an Error, not in the normal case of an Exception.
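
A minimal sketch of that suggestion follows, assuming the caught `Throwable` is handed back so the consuming thread can deal with it. `ErrorHandlingSketch`, `runGuarded`, and `shouldPrintStackTrace` are hypothetical names, and this is not the code that was merged.

```java
// Sketch only: illustrates printing stack traces for Errors but not for
// ordinary Exceptions; names are hypothetical.
class ErrorHandlingSketch {

    /** Only Errors (e.g. OutOfMemoryError) warrant an immediate stack trace. */
    static boolean shouldPrintStackTrace(final Throwable t) {
        return t instanceof Error;
    }

    /** Runs the task and returns whatever it threw, or null if it completed normally. */
    static Throwable runGuarded(final Runnable task) {
        try {
            task.run();
            return null;
        } catch (final Throwable t) {
            if (shouldPrintStackTrace(t)) {
                t.printStackTrace();
            }
            // Hand the failure back so the caller can rethrow it on its own thread.
            return t;
        }
    }

    public static void main(final String[] args) {
        final Throwable failure = runGuarded(() -> {
            throw new IllegalStateException("bad record");
        });
        System.out.println("captured: " + failure);
    }
}
```

An ordinary exception is captured silently here and left for the caller to report, which avoids duplicate output when the consumer rethrows it.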

@lbergelson merged commit e803eea into samtools:master on Aug 14, 2020

@lbergelson (Member) commented:

@andersleung 👍

@lbergelson lbergelson added the GA4GH Collaboration with GA4GH label Sep 1, 2020
brainstorm added a commit to umccr/igv that referenced this pull request Sep 20, 2020
co-authored-by: Florian Reisinger <florian.reisinger@unimelb.edu.au>