Error using python API for batch SRAweb search #46
Comments
Thanks a lot for the bug report @anwarMZ. Would you be able to share an SRP or SRR so that I can reproduce it at my end?
I just pushed b1fa5d6 which might fix it.
Thanks for the prompt reply, I have attached the file of study accessions here -
Thanks for the SRP list. I will update here once I have a proper fix.
The last fix works. Here is an example with your SRP list: https://colab.research.google.com/drive/1pNeuZJjjHliYFk582kGNRpGJ1Fa2h9cn?usp=sharing Let me know if you still face any errors. I prefer giving it a few seconds of sleep time to make sure it doesn't hit NCBI's API limits.
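The sleep-between-queries pattern described here can be sketched as follows. `fetch_all` and the stub fetcher are hypothetical names for illustration only; the stub stands in for a real `SRAweb().sra_metadata` call from pysradb:

```python
import time
import pandas as pd

def fetch_all(accessions, fetch, delay=2):
    """Query each accession in turn, sleeping between calls to
    avoid hitting NCBI's API rate limits."""
    frames = []
    for acc in accessions:
        frames.append(fetch(acc))
        time.sleep(delay)  # a few seconds between requests
    return pd.concat(frames, ignore_index=True)

# Hypothetical stand-in for SRAweb().sra_metadata; swap in the real call.
stub = lambda acc: pd.DataFrame({"study_accession": [acc]})
combined = fetch_all(["SRP000001", "SRP000002"], stub, delay=0)
```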
Hi @saketkc This works well for querying the IDs. However, in this case it creates separate files for each query. In my case, I would like to have one combined file for all SRP queries. But I am not sure if the except can catch the error if the list is passed directly. Any thoughts?
You should be able to concat the dataframes using pandas:
It is possible to query multiple SRPs at once; however, given NCBI's API limits it might time out if there are multiple SRRs (100s of them, as in this case).
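A minimal sketch of the pandas concat suggested above, using illustrative frames in place of real per-SRP query results:

```python
import pandas as pd

# Illustrative per-SRP results; in practice these come from SRAweb queries.
df_a = pd.DataFrame({"run_accession": ["SRR1"], "study_accession": ["SRPA"]})
df_b = pd.DataFrame({"run_accession": ["SRR2"], "study_accession": ["SRPB"]})

# One combined frame for all queries; could then be written to a single file,
# e.g. combined.to_csv("all_metadata.tsv", sep="\t", index=False)
combined = pd.concat([df_a, df_b], ignore_index=True)
```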
Sure, I just wanted to confirm that querying multiple (100s of) IDs at once doesn't work with NCBI's API.
That's correct. Closing this, feel free to reopen if you still encounter issues.
It worked well for me when we last spoke, but now I am gradually increasing my list to fetch metadata and I am facing an issue. The problem is that when a certain study accession for some reason doesn't fetch metadata, it takes a long time to catch the exception and move on to the next one. For example, in the current loop as we discussed here in Colab, it stalls on the following IDs and takes significant time to get past them. In this case I checked that, for example, these two accession IDs have had issues: Also, after looking at #47 I tried to update
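One way to keep such a loop from stalling the whole run on a bad accession is to catch the exception per ID and record the failures for a later retry. `fetch_tolerant` and the failing stub below are hypothetical illustrations, not part of pysradb; the stub stands in for `db.sra_metadata`:

```python
import pandas as pd

def fetch_tolerant(accessions, fetch):
    """Collect metadata per accession; record failures instead of aborting."""
    frames, failed = [], []
    for acc in accessions:
        try:
            frames.append(fetch(acc))
        except Exception:
            failed.append(acc)  # revisit these separately
    merged = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return merged, failed

# Hypothetical fetcher that fails on one ID.
def stub(acc):
    if acc == "SRP_BAD":
        raise ValueError("no metadata")
    return pd.DataFrame({"study_accession": [acc]})

merged, failed = fetch_tolerant(["SRP1", "SRP_BAD", "SRP2"], stub)
```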
Thanks for reporting @anwarMZ, I will take a look at it later tomorrow. Thanks!
Hi @saketkc, did you get a chance to reproduce the error? Cheers,
Sorry about the delay in responding. I am able to obtain results for the first two of these ids:
The problem with the third id is a missing organism tag.
Also, SRP040281 has 120k+ records, so it takes approximately 7 minutes on Colab to fetch it, which I think is reasonable.
Okay, I was trying to get the details about the host species, which only comes with the detailed flag, e.g.
Yes, for a project with a lot of runs, the retrieval time for metadata will increase (though only linearly, as you would see in the last Colab notebook). The detailed mode adds an additional overhead; I haven't done any benchmarking, but it should take at least 2x the time of the non-detailed mode. I have fixed the issue with ERP000171, so I am closing this. Please feel free to reopen this if you face any issues. For projects with a lot of runs, you can expect it to take ~
Hi again @saketkc, Thank you for the insights, I managed to get this done. I am now trying to download the
With this the process was killed; I would like to know if you have any idea about this? I believe it could be because the API timed out and needs a time delay between successive downloads? Also, is there a way to skip files that are already downloaded? Thank you
The download method first downloads to a temporary location, which in this case is In this case, the error you get seems likely to be arising because the parallel module gets confused when this particular file has already been downloaded (it thinks it hasn't been, but its download is probably already complete). You should have Thanks,
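The skip-if-already-downloaded idea asked about above can be sketched as below. `needs_download` is a hypothetical helper, and treating a zero-byte file as an incomplete leftover is an assumption, not pysradb behavior:

```python
import os

def needs_download(path):
    """True if the file is absent or empty (a zero-byte file may be a
    leftover from an interrupted download)."""
    return not os.path.exists(path) or os.path.getsize(path) == 0

# Example: filter a hypothetical list of target paths before downloading.
targets = ["srr1.sra", "srr2.sra"]
todo = [p for p in targets if needs_download(p)]
```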
Thanks, I will open a new issue to discuss downloading
pysradb version: 0.10.4
Python version: 3.8.3
OS version: 10.15.5, but using an anaconda environment and a pip installation of pysradb
Description
I came across pysradb to extract the metadata for a batch of SRA runs (~9K). I tried two different approaches; however, both gave different errors, likely because of a missing value in SRAweb, but I am not sure how such an error can be ignored so the run moves forward.
1st Method
I tried to convert the 9K SRA run accessions to SRA study IDs using srr_to_srp and then searched the approx. 500 accession IDs against SRAweb
Error
2nd Method
In this case I tried to run all 9K SRA run accessions directly against SRAweb
Error
Thanks in advance, looking forward to hearing from you.
Zohaib