Trouble reading SRA CRAM files #1254
Comments
|
More info: the GCP instance has no trouble accessing the URL directly with curl, so it's not a routing or firewall issue. I then tried uploading ~/.cache/hts-ref from the local cluster to the GCP instance to see if it would work around the issue, but I still get the same error. |
|
My last comment about uploading an hts-ref directory was incorrect. It actually does work around the issue. I had set REF_PATH in a previous attempt to fix the issue and forgot to remove it, so samtools was ignoring my ~/.cache/hts-ref directory. |
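To illustrate why a leftover REF_PATH can hide a populated ~/.cache/hts-ref, here is a minimal Python sketch of the default-selection rule as the samtools manual page describes it: the EBI URL and the per-user cache are only filled in when REF_PATH is unset. The function name `effective_ref_settings` is hypothetical, the default URL is the one documented around samtools 1.10, and the sketch ignores the TMPDIR and /tmp fallbacks, so treat it as an approximation rather than htslib's actual code.

```python
import os

# Default documented for REF_PATH when the variable is unset (assumption:
# taken from the samtools manual page of that era; newer releases may differ).
DEFAULT_REF_PATH = "http://www.ebi.ac.uk/ena/cram/md5/%s"


def effective_ref_settings(env):
    """Return (ref_path, ref_cache) for a given environment mapping.

    Hypothetical helper: the defaults are applied only when REF_PATH is
    unset or empty, mirroring the documented behaviour.
    """
    ref_path = env.get("REF_PATH") or None
    ref_cache = env.get("REF_CACHE") or None
    if ref_path is None:
        ref_path = DEFAULT_REF_PATH
        if ref_cache is None:
            base = env.get("XDG_CACHE_HOME") or os.path.join(
                env.get("HOME", ""), ".cache"
            )
            # %2s/%2s/%s splits the MD5 into two 2-character directories.
            ref_cache = os.path.join(base, "hts-ref", "%2s", "%2s", "%s")
    return ref_path, ref_cache


if __name__ == "__main__":
    # With REF_PATH set and REF_CACHE unset, no cache directory is selected.
    print(effective_ref_settings({"HOME": "/home/u", "REF_PATH": "/refs/%s"}))
    # With neither set, both defaults apply.
    print(effective_ref_settings({"HOME": "/home/u"}))
```

Under this reading, unsetting REF_PATH (or also setting REF_CACHE explicitly) is what lets the uploaded cache directory be consulted.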
|
Now I'm getting the same error on my FreeBSD workstation where it was working flawlessly last week. Manually downloading from the command-line works fine, though. I'm using the script below as a workaround. It takes the URL straight from the error message. |
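The workaround script itself did not survive in this copy of the thread. As a rough stand-in, the following Python sketch shows one way such a workaround could look: it pulls an `.../md5/<checksum>` URL out of an error line, downloads it without going through htslib, and stores it under the `%2s/%2s/%s` cache layout. All function names are hypothetical, the error text in the usage note is invented for illustration, and the commenter's real script may have worked differently.

```python
import os
import re
import shutil
import tempfile
import urllib.request

# Matches a reference URL ending in /md5/<32 hex digits>.
MD5_URL = re.compile(r"https?://\S+?/md5/([0-9a-fA-F]{32})")


def extract_md5_url(message):
    """Return (url, md5) for the first MD5 reference URL in message, or None."""
    m = MD5_URL.search(message)
    if m is None:
        return None
    return m.group(0), m.group(1).lower()


def cache_destination(md5, cache_root):
    """Path for md5 under the %2s/%2s/%s layout rooted at cache_root."""
    return os.path.join(cache_root, md5[:2], md5[2:4], md5[4:])


def fetch_to_cache(url, md5, cache_root):
    """Download url into the cache; write to a temp file, then rename."""
    dest = cache_destination(md5, cache_root)
    dest_dir = os.path.dirname(dest)
    os.makedirs(dest_dir, exist_ok=True)
    with urllib.request.urlopen(url) as resp, tempfile.NamedTemporaryFile(
        dir=dest_dir, delete=False
    ) as tmp:
        shutil.copyfileobj(resp, tmp)
    os.replace(tmp.name, dest)
    return dest
```

A caller would feed each stderr line from samtools to `extract_md5_url` and, on a match, call `fetch_to_cache(url, md5, os.path.expanduser("~/.cache/hts-ref"))` before retrying the command.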
|
Hello Jason, Sorry for not replying sooner. With |
|
Good timing for this new feature. Here's my output from a samtools 1.10 run. Doesn't mean much to me, but maybe suggests a problem communicating with the EBI squid server? Running curl outside samtools still works fine. Thanks, JB |
|
It looks like samtools managed to contact the EBI server and got the HTTP headers back. Then something went wrong while it was trying to read the data. Does the error happen all the time, or is it intermittent? Trying |
|
It happens every time. Here's some verbose curl output. Tell you anything? Thanks, JB |
|
Looks like the same output I get when I run the same command. There is nothing that stands out to me. |
|
Can you reproduce the issue with samtools? Maybe grab a CRAM from SRA to keep the variables to a minimum.
I don't think we have access to any GCP instances. |
|
It's not limited to GCP like I originally thought. As I mentioned earlier, I'm seeing the same error on my FreeBSD workstation and the recent debugging output is all from a pristine local CentOS 7 VM. Running curl directly works fine in all environments. |
|
My mistake, I was fixated on the GCP aspect. Do you have a suggested cram file I can download? |
|
So I tried replicating your error on my home laptop by inserting the M5 tag into a tiny test file. It all seemed to work for me. I'll include the output, maybe you can spot something I've missed.
|
|
I don't think we are going to get anywhere without knowing what error is being thrown. We could change http_status_errno and easy_errno (and possibly multi_errno) in hfile_libcurl.c to print out the error number. |
|
Regarding your earlier question, all of our CRAMs are restricted access from the Women's Health Initiative on SRA. I'm not sure if there are equivalent CRAMs that you would be able to access. Might be worth exploring... As for debugging, I took your suggestion and patched a couple of fprintfs into the latest htslib commit: https://github.com/outpaddling/freebsd-ports-wip/tree/master/htslib Running this with the latest samtools commit produced the following. It shows that easy_errno() is receiving a code of 43. From where I'm not sure yet. There are a lot of calls to this function... But I found this in curl.h:
[E::cram_next_slice] Failure to decode slice |
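For reference when reading the code-43 report above: in libcurl's `CURLcode` enum, 43 is `CURLE_BAD_FUNCTION_ARGUMENT`, which points at how the library is being called rather than at the network. The small Python lookup below is only a convenience sketch covering a handful of well-known codes. The table is deliberately partial and the helper name `describe_curl_code` is hypothetical; `curl.h` or the libcurl-errors manual page remains the authoritative list.

```python
# Partial table of libcurl CURLcode values (not exhaustive).
CURL_CODES = {
    0: "CURLE_OK",
    1: "CURLE_UNSUPPORTED_PROTOCOL",
    6: "CURLE_COULDNT_RESOLVE_HOST",
    7: "CURLE_COULDNT_CONNECT",
    18: "CURLE_PARTIAL_FILE",
    23: "CURLE_WRITE_ERROR",
    28: "CURLE_OPERATION_TIMEDOUT",
    43: "CURLE_BAD_FUNCTION_ARGUMENT",
    56: "CURLE_RECV_ERROR",
}


def describe_curl_code(code):
    """Map a numeric CURLcode to its symbolic name, if it is in the table."""
    return CURL_CODES.get(code, "unknown (see curl.h)")


if __name__ == "__main__":
    print(43, describe_curl_code(43))
```

In C, libcurl's own `curl_easy_strerror()` gives a human-readable message for any code, which may be the simpler thing to add to a debugging fprintf.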
|
I'm wondering if this is a curl problem or if something is failing after the download. The download runs for a long time and I can see activity under iftop or netstat that looks similar to what I get from a manual curl/wget/fetch. |
Taken together, these indicate that this is indeed the same problem as #1284. So when htslib is built against a sufficiently-recent libcurl, it can be reproduced with essentially any files accessed over http(s). |
|
I just installed curl-7.67.0 from the FreeBSD ports history and this eliminated the problem. The same version is installed in an old pkgsrc tree on our CentOS cluster and samtools works there as well. I assume it won't be hard to fix htslib to work with the latest libcurl, but we have a workaround for now in any case. |
|
HTSlib pull request samtools/htslib#1105 should fix this. Would it be possible to give it a try on your systems? |
|
I applied the pull request to commit 9c357445... on a FreeBSD server with curl 7.71.0. So far so good: It successfully cached the first reference for a large CRAM file, which it has been unable to do for some time. |
|
For anyone else that runs into this issue, the following worked for me: |
I'm getting these "Unable to fetch reference" errors when trying to run samtools view on a GCP instance:
Here's the code producing the output above:
I can access the CRAM files with other tools, e.g. cat to /dev/null.
The exact same samtools builds and CRAM files work on a local machine to which I've downloaded a few of the CRAM files.
I've tried CentOS 8 and CentOS 7 on the GCP instances. On CentOS 8 I've tried samtools 1.9 built from scratch and by pkgsrc, as well as samtools 1.10 from pkgsrc.
Locally I can process the same CRAM files under CentOS 7 and FreeBSD without any issues. I've stripped the command down to the most basic possible and still consistently get this error.
All systems are up-to-date with the latest patches.
Any ideas what might cause this?
Thanks much.