All grabseqs SRA downloads failing #55

Open
cdiener opened this issue Jun 28, 2022 · 10 comments
@cdiener

cdiener commented Jun 28, 2022

Looks like some changes on the NCBI side led to failures in SRA downloads:

grabseqs sra SRR11733975
Traceback (most recent call last):
  File "/users/cdiener/miniconda3/envs/sra/bin/grabseqs", line 11, in <module>
    sys.exit(main())
  File "/users/cdiener/miniconda3/envs/sra/lib/python3.7/site-packages/grabseqslib/__init__.py", line 58, in main
    metadata_agg = process_sra(args, zip_func)
  File "/users/cdiener/miniconda3/envs/sra/lib/python3.7/site-packages/grabseqslib/sra.py", line 31, in process_sra
    metadata_agg)
  File "/users/cdiener/miniconda3/envs/sra/lib/python3.7/site-packages/grabseqslib/sra.py", line 97, in get_sra_acc_metadata
    run_col = lines[0].index("Run")
ValueError: 'Run' is not in list

This seems to be caused by a hardcoded URL for downloading the SRA manifest that is no longer reachable.
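
The traceback fails at `lines[0].index("Run")`, which suggests the response was no longer a metadata table with a `Run` column. Below is a minimal sketch of a more defensive header check, assuming the metadata arrives as comma-separated text with a header row. The function name `find_run_column` is hypothetical and this is not the code in grabseqslib.

```python
import csv
import io


def find_run_column(runinfo_text):
    """Return the index of the "Run" column in a runinfo-style CSV header.

    Raises a descriptive RuntimeError instead of a bare ValueError when the
    response is not a metadata table (for example, an HTML page returned
    after a redirect).
    """
    rows = list(csv.reader(io.StringIO(runinfo_text)))
    if not rows or "Run" not in rows[0]:
        # Show the start of the response so the failure is easier to diagnose.
        preview = runinfo_text[:80].replace("\n", " ")
        raise RuntimeError(
            "Response does not look like SRA run metadata; got: %r" % preview
        )
    return rows[0].index("Run")
```

With a check like this, an unexpected response would be reported together with a preview of what the server returned, rather than as `'Run' is not in list`.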

@AntonioBaeza

Having exactly the same issue (tried a few min ago)

@Zeroo11

Zeroo11 commented Jun 30, 2022

same issue

@louiejtaylor
Owner

louiejtaylor commented Jun 30, 2022

Thanks for reporting the issue! Looks like @cdiener is right: http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term= redirects to https://www.ncbi.nlm.nih.gov/sviewer/?db=sra&1%3Fdb=sra&rettype=runinfo&save=efetch&term= and no longer returns metadata. I'll try to figure out the proper endpoint in their API for the SRA metadata (and see if I can get the tests passing in the meantime).

This is probably due to NCBI retiring Trace.

Looking through the NCBI E-utils API documentation, I should be able to get the same metadata by:

  1. Finding the identifiers for an accession with esearch, e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=PRJNA836386&retmax=999
  2. Passing that id list to efetch, e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=sra&id=22439955&rettype=fasta&retmode=text
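
The two steps above can be sketched as a pair of URL builders. This only reproduces the example URLs from this comment; the helper names `esearch_url` and `efetch_url` are hypothetical, and which `rettype`/`retmode` values return usable run metadata is left open here.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=999):
    """Step 1: build the esearch URL that looks up ids for an accession."""
    params = {"db": "sra", "term": term, "retmax": retmax}
    return EUTILS_BASE + "/esearch.fcgi?" + urlencode(params)


def efetch_url(ids, **extra):
    """Step 2: build the efetch URL for the ids returned by esearch.

    Extra keyword arguments (e.g. rettype, retmode) are appended as-is.
    """
    params = {"db": "sra", "id": ",".join(str(i) for i in ids)}
    params.update(extra)
    return EUTILS_BASE + "/efetch.fcgi?" + urlencode(params)
```

For example, `efetch_url([22439955], rettype="fasta", retmode="text")` yields the efetch URL shown in step 2.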

I'll just have to convert the output from XML to tab-separated, since it looks like the E-utils love XML. This approach also has the advantage of using a defined API, rather than that Trace URL (which worked great, but I think I originally found it on StackOverflow or something).
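
A rough sketch of that XML-to-tab-separated conversion follows, using only the standard library. The `RUN` element and the `accession`, `total_spots` and `total_bases` attribute names are assumptions based on a simplified sample and should be checked against a real efetch response; `runs_to_tsv` is a hypothetical helper name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def runs_to_tsv(xml_text, fields=("accession", "total_spots", "total_bases")):
    """Flatten every <RUN> element in the XML into one tab-separated row.

    Assumption: each run is a <RUN> element carrying the requested fields
    as attributes. Missing attributes become empty cells.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for run in root.iter("RUN"):
        writer.writerow([run.get(name, "") for name in fields])
    return out.getvalue()
```

Using `csv.writer` with a tab delimiter keeps quoting correct if a value ever contains a tab or newline.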

@cdiener
Author

cdiener commented Jul 1, 2022

You can also request JSON from esearch, which should be easier to convert with Python. For instance, for your example: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=PRJNA836386&retmax=999&retmode=json
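
As a small illustration of why JSON is convenient here, the id list can be pulled out with the standard `json` module. This sketch assumes the ids sit under `esearchresult` → `idlist` in the response; `parse_esearch_ids` is a hypothetical helper name.

```python
import json


def parse_esearch_ids(json_text):
    """Return the list of ids from an esearch JSON response body.

    Assumption: ids are stored under payload["esearchresult"]["idlist"].
    Returns an empty list when either key is missing.
    """
    payload = json.loads(json_text)
    result = payload.get("esearchresult", {})
    return result.get("idlist", [])
```

The returned ids are strings and can be joined with commas for the `id` parameter of efetch.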

@GitUser42

Hello :) Is there any workaround until this is fixed?

@zhengjxj

Try replacing line 94 of /usr/local/lib/python3.6/site-packages/grabseqslib/sra.py with:
metadata = requests.get("https://trace.ncbi.nlm.nih.gov/Traces/sra-db-be/sra-db-be.cgi?rettype=runinfo&term="+pacc)
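
The path above is specific to one install (the original traceback shows a miniconda path instead), so the file to edit may live elsewhere on your machine. Here is a small sketch for locating it with the standard library; `module_file` is a hypothetical helper name.

```python
import importlib.util
import os


def module_file(package, filename):
    """Return the path of `filename` inside an installed package.

    Returns None if the package is not installed or is a single-file
    module rather than a package directory.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, filename)
```

Running `print(module_file("grabseqslib", "sra.py"))` in the environment where grabseqs is installed should print the file to edit, or `None` if the package cannot be found.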

@AntonioBaeza

AntonioBaeza commented Jul 17, 2022

Thanks @zhengjxj (https://github.com/zhengjxj). I replaced the line in the file you indicated and it is working again!

@chansigit

Thank you.
It seems that the NCBI API changed.

@xiachenrui

Thanks, @zhengjxj!

@AMMHasan

Hi, is grabseqs sra facing the same problem again? What would be the solution this time?
