Htsget unable to fetch header from some endpoints #67

Open
cmdcolin opened this issue Nov 9, 2020 · 7 comments

Comments

@cmdcolin
Collaborator

cmdcolin commented Nov 9, 2020

Ref ga4gh endpoint here

samtools/hts-specs#530

The GA4GH endpoint rejects the bogus refname that we currently provide.

We have to adjust to not specify any refname, but then it returns very large data chunks, and we have to range-request the results of what it gives back.
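To make the "range-request the results" step concrete, here is a minimal sketch of walking an htsget ticket and passing each URL's `headers` (such as `Range`) through to the fetch. It is not this library's code: `read_ticket_blocks`, `decode_data_uri`, and the injected `fetch` callable are hypothetical names, and the ticket layout (`htsget.urls[]` with `url`, optional `headers`, optional `class`) follows the htsget spec as I understand it. The returned bytes would still be BGZF-compressed and need decompressing and parsing.

```python
import base64
from typing import Callable, Dict

# Hypothetical fetcher signature: (url, headers) -> response body bytes.
Fetch = Callable[[str, Dict[str, str]], bytes]


def decode_data_uri(uri: str) -> bytes:
    """Decode a base64 `data:` URI, as used for inline htsget blocks."""
    meta, _, payload = uri.partition(",")
    if not meta.endswith(";base64"):
        raise ValueError("only base64 data URIs are handled in this sketch")
    return base64.b64decode(payload)


def read_ticket_blocks(ticket: dict, fetch: Fetch, only_class: str = "header") -> bytes:
    """Concatenate a ticket's blocks, honoring any per-URL headers (e.g. Range)."""
    chunks = []
    for entry in ticket["htsget"]["urls"]:
        # If "class" is omitted we cannot tell header from body, so the block is kept.
        if entry.get("class", only_class) != only_class:
            continue
        url = entry["url"]
        if url.startswith("data:"):
            chunks.append(decode_data_uri(url))
        else:
            # Forwarding the ticket's headers is what makes this a range request.
            chunks.append(fetch(url, entry.get("headers", {})))
    return b"".join(chunks)
```

Injecting `fetch` keeps the sketch independent of any particular HTTP client.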

@brainstorm

brainstorm commented Nov 9, 2020

I suspect that's partly being tackled by @jb-adams in ga4gh/htsget-refserver#8 ?

@cmdcolin
Collaborator Author

cmdcolin commented Nov 9, 2020

Could be! On some level, I think this code should work out how to behave more like samtools does, but I'll certainly check the PR, especially if it is deployed somewhere.

@cmdcolin
Collaborator Author

cmdcolin commented Jul 7, 2021

For dnanexus's webserver, we request a bogus refname because otherwise it says the "header" involves a download of 10GB of data, and we don't try to "subselect" the range that it gives us

We could consider dropping support for dnanexus's htsget server so that ga4gh's htsget server works, or find a fix that accommodates both, or just leave it as is.
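One possible shape for a fix that accommodates both servers, purely as a sketch: try the bogus-refname request first, and retry without `referenceName` if the server rejects it. Everything named here (`header_url`, `fetch_header_ticket`, `TicketRejected`, the injected `get_ticket` callable) is hypothetical, and it assumes a strict server signals the rejection as an error the client can catch.

```python
from typing import Callable, Optional
from urllib.parse import urlencode


class TicketRejected(Exception):
    """Hypothetical error raised by get_ticket when the server refuses a request."""


# Hypothetical: takes a full URL, returns the parsed ticket, raises TicketRejected.
GetTicket = Callable[[str], dict]

BOGUS_REFNAME = "DOES_NOT_EXIST"  # placeholder value used for the workaround


def header_url(base: str, reference_name: Optional[str] = None) -> str:
    """Build a class=header request URL, optionally with a referenceName."""
    params = {"class": "header"}
    if reference_name is not None:
        params["referenceName"] = reference_name
    return base + "?" + urlencode(params)


def fetch_header_ticket(base: str, get_ticket: GetTicket) -> dict:
    """Try the bogus-refname request first; fall back to a plain header request."""
    try:
        return get_ticket(header_url(base, BOGUS_REFNAME))
    except TicketRejected:
        # Assumed: a server that validates referenceName fails the first request.
        return get_ticket(header_url(base))
```

The fallback ticket may still point at large ranges, so the caller would need to honor the ticket's `Range` headers rather than download whole files.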

@cmdcolin
Collaborator Author

cmdcolin commented Jul 7, 2021

See the behavior of the dnanexus server here

# range is the entire file, e.g. 140 GB, which our code doesn't currently try to subselect from, resulting in bad behavior if used
http://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37/NA12878?class=header

# reasonable size; all of the data is even encoded in a data URI
http://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37/NA12878?class=header&referenceName=DOES_NOT_EXIST
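A client could tell these two cases apart before downloading anything by adding up the byte ranges a ticket asks for. A minimal sketch, assuming the ticket layout from the htsget spec (`htsget.urls[]` with `url` and optional `headers`); `ticket_download_size` is a hypothetical helper, not something this library provides.

```python
import re
from typing import Optional

# Matches a single closed byte range such as "bytes=0-1023".
_RANGE = re.compile(r"^bytes=(\d+)-(\d+)$")


def ticket_download_size(ticket: dict) -> Optional[int]:
    """Sum the bytes a ticket asks for, or None if any remote URL is unbounded."""
    total = 0
    for entry in ticket["htsget"]["urls"]:
        if entry["url"].startswith("data:"):
            continue  # inline payload, nothing further to download
        headers = entry.get("headers", {})
        # HTTP header names are case-insensitive.
        rng = next((v for k, v in headers.items() if k.lower() == "range"), None)
        match = _RANGE.match(rng) if rng else None
        if match is None:
            return None  # no usable Range header: could be the whole file
        start, end = int(match.group(1)), int(match.group(2))
        total += end - start + 1  # byte ranges are inclusive on both ends
    return total
```

A caller could then refuse, or switch strategy, when the result is `None` or above some threshold of its choosing.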

@brainstorm

brainstorm commented Jul 8, 2021

IIRC the htsnexus htsget server might not be as up to date as the GA4GH reference htsget server? Please refer to the official public GA4GH server endpoints mentioned here:

igvteam/igv.js#1187 (comment)

So yes, I'd consider dropping support for previous spec versions, tbh.

/cc @mlin @ohofmann

@cmdcolin
Collaborator Author

cmdcolin commented Jul 8, 2021

Ya, that was the impetus for the comment. However, my workaround for the dnanexus server (adding a random referenceName to the class=header request) does not work with the ga4gh server. I kind of figured the hacky behavior of adding the random refname wouldn't be great, but I've got to figure out what to do next.

@brainstorm

brainstorm commented Jan 19, 2022

That page is now gone along with the deprecated endpoints. I was about to suggest using the official GA4GH htsget endpoint, but it seems to have been having issues for a couple of weeks now:

[Image: Screen Shot 2022-01-19 at 3 19 01 pm]

/cc @jb-adams can you tilt that one back up please?
/cc @victorskl @andrewpatto
