some samples fail with --force_sratools_download due to changes in prefetch results #98
Comments
Hi. I can confirm that fetchngs is currently not usable for SRA IDs, with neither version 1.6 nor 1.4.
For me, most of them still work; only some are not available. Did you try using the [...]
Hi. Yes, it doesn't work. I double-checked, and in fact all my IDs are GEO sample IDs (GSM).
Although I think fetchngs is designed to also link these back to SRA accessions, I would suggest using the SRA Run Selector or the Entrez API to link GSM to SRA accessions. Maybe that solves the problem.
Thanks for the suggestion. I managed to convert the IDs using the Entrez API as you suggested and ran the command again, both with and without the --force_sratools_download option, but I get the same error message as before.
Could you post the exact error you get and the process in which it occurs?
Yes, here is the error:
Okay, this is the same error I get in connection with this issue. To get around it, I locally changed the bin/sra_ids_to_runinfo.py file so that the ENA FTP is preferred over SRA, which can be done simply by changing line 229 from [...]
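For readers who want to see the general idea of preferring one download source over another, here is a minimal sketch. It is not the actual content of bin/sra_ids_to_runinfo.py, and it does not reproduce the line 229 change mentioned above. The names `pick_source`, `DEFAULT_PREFERENCE`, `ena_ftp` and `sra`, as well as the example URL, are hypothetical and only serve the illustration.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical preference order: try ENA FTP links first, then SRA.
DEFAULT_PREFERENCE: Tuple[str, ...] = ("ena_ftp", "sra")


def pick_source(
    available: Mapping[str, Optional[str]],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> Optional[Tuple[str, str]]:
    """Return (source_name, location) for the first preferred source
    that has a non-empty location, or None if no source is usable."""
    for name in preference:
        location = available.get(name)
        if location:  # skips missing keys, None and empty strings
            return name, location
    return None


if __name__ == "__main__":
    # Placeholder run record; the accession and URL are made up.
    run = {
        "ena_ftp": "ftp://example.org/SRR0000001_1.fastq.gz",
        "sra": "SRR0000001",
    }
    print(pick_source(run))
```

Keeping the order in a single tuple means that switching the preference back, once the upstream API behaves again, is a one-line change.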
It works now. Thanks a lot!
Hi @dmalzl! Thanks for looking into a fix. Yes, the NCBI changed their APIs yet again with a breaking change. Given that this pipeline and other tools make assumptions about the API calls, unfortunately the only thing we can do is patch on the fly... Have you by any chance found a backwards-compatible fix? If so, we can do a patch release straight away. We have plans to use and contribute to [...]
Hi @drpatelh, unfortunately, the only thing I could come up with is using the ENA FTP by default and falling back on the FTP when prefetch failed. So what I did first was ignore all prefetch failures caused by it only downloading the [...] Such API changes are so annoying, and I know this is out of your hands. I just wanted to point it out so that other users know what's going on. Thanks for looking into it, though.
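The workaround discussed in this thread (check what prefetch actually produced, and fall back to the ENA FTP when the expected file is missing) could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `find_prefetch_output` and `choose_download_route` are hypothetical, and the file layout (an accession subdirectory holding `<accession>.sra`, or a `<accession>.sralite` file directly in the temp directory) is inferred from the bug description rather than from the sra-tools documentation. Whether vdb-validate accepts a `*.sralite` file is left open in the report, so the sketch only locates the file.

```python
from pathlib import Path
from typing import Optional


def find_prefetch_output(temp_dir: Path, accession: str) -> Optional[Path]:
    """Return the first file that looks like prefetch output, or None.

    Assumed layouts (not verified against sra-tools):
      temp_dir/<accession>/<accession>.sra      expected by the pipeline
      temp_dir/<accession>/<accession>.sralite  possible variant
      temp_dir/<accession>.sralite              layout described in the report
      temp_dir/<accession>.sra                  possible variant
    """
    candidates = [
        temp_dir / accession / f"{accession}.sra",
        temp_dir / accession / f"{accession}.sralite",
        temp_dir / f"{accession}.sralite",
        temp_dir / f"{accession}.sra",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def choose_download_route(temp_dir: Path, accession: str) -> str:
    """Return "sratools" if prefetch left a usable file, else "ena_ftp"."""
    if find_prefetch_output(temp_dir, accession) is not None:
        return "sratools"
    return "ena_ftp"
```

A caller would run prefetch first, then call `choose_download_route` on the temp directory and only fetch from the FTP for accessions where it returns "ena_ftp".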
This also worked for me.
I think this issue should be mostly solved now that the API has been fixed. Feel free to re-open if the problem persists.
Description of the bug
For over a month now I have been handling my data with fetchngs, and I am pretty satisfied with the results. However, I recently ran into difficulties when trying to force data download via sratools. Previously (that is, in May) everything worked fine, but I had to reprocess and thus redownload some of the samples, which resulted in pipeline failures due to errors when fetching the data with prefetch. I vaguely remember reading somewhere that the SRA made changes to its data storage policies or similar around the beginning of June, and the error I get, as well as the timing (i.e. rerunning in June the same pipeline command as in May), strongly hints at a connection to this change. Looking at the .command.log file of the respective jobs reveals the core of the issue: prefetch does not download the typical *.sra file but something called *.sralite, which is not recognized by the subsequent vdb-validate command because prefetch just puts it in the temp directory and not in the ./temp_dir/SRAsomething directory expected by vdb-validate. This in turn causes the pipeline to fail. I haven't looked further into whether vdb-validate also accepts the *.sralite file, in which case the problem could be resolved by checking whether prefetch generated the expected folder or the *.sralite file and handling the cases accordingly. However, downloading the failing samples via the ENA FTP is still possible, so a temporary fix is to download everything I can with sratools and fetch the rest from the FTP.
Command used and terminal output
Relevant files
No response
System information
No response