Api key #61

Merged
14 commits merged into saketkc:master on Sep 11, 2020
Conversation

Maarten-vd-Sande
Contributor

I decided to just implement the skeleton instead of leaving it up to you, see #60. This still needs the sleep durations changed, but I am not sure which sleep is for what, since some are 1/10th, 1/3rd, 1/2 of a second, etc.

import pysradb
print(pysradb.__version__)
print(pysradb.SRAweb(api_key="__secret__").sra_metadata("GSM1020644", detailed=True).experiment_alias.values)

works and does not crash (and takes the same amount of time as without a key).

@codecov

codecov bot commented Sep 8, 2020

Codecov Report

Merging #61 into master will decrease coverage by 0.01%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master      #61      +/-   ##
==========================================
- Coverage   42.23%   42.22%   -0.02%     
==========================================
  Files           5        5              
  Lines        1030     1035       +5     
==========================================
+ Hits          435      437       +2     
- Misses        595      598       +3     
Impacted Files Coverage Δ
pysradb/sraweb.py 84.12% <66.66%> (-0.48%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e54c4e...aeaa494.

@saketkc
Owner

saketkc commented Sep 8, 2020

Thanks @Maarten-vd-Sande, this looks great and is very helpful! I have not revisited the sleep time issues for a while, given that things looked mostly stable, but this PR should be a good opportunity to do so.

Might be helpful for @bscrow's PR #57 too. I would need some time to review this.

@Maarten-vd-Sande
Contributor Author

Glad that it is considered useful 😄 . Let me know if I can somehow help to move this forward.

@saketkc
Owner

saketkc commented Sep 8, 2020

Do you think we should propagate self.sleep_time into the retry functions? It is not being used anywhere at the moment. One idea would be to drop it and leave the retry block as is (it sleeps 0.5 seconds by default).
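For illustration, here is a minimal sketch of what propagating a configurable sleep into a retry helper could look like; the fetch_with_retries name and its signature are hypothetical, not pysradb's actual retry code:

import time
import requests

def fetch_with_retries(url, params, sleep_time=0.5, max_retries=3):
    # Hypothetical helper: retry a GET request, waiting sleep_time seconds
    # between attempts instead of a hard-coded 0.5 s.
    for attempt in range(max_retries):
        try:
            response = requests.get(url, params=params, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(sleep_time)  # e.g. pass in self.sleep_time here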

@Maarten-vd-Sande
Contributor Author

I think the API key is not really necessary after all. I only used SRAweb with detailed=True, and I thought it took so "long" because pysradb looked up each sample separately (which it doesn't). I just discovered that the real reason it takes so long is checking for the ENA download link.

I changed the time.sleep calls to use self.sleep_time, so at least that is cleaned up. This works, and the small tests I did do not run into API limits. However, the API key is not really necessary, so we can also close this PR and ignore it :)

@bscrow
Collaborator

bscrow commented Sep 9, 2020

Yep @saketkc, including an API key for the search feature should speed up large queries for SRA metadata, since the current implementation retrieves metadata in batches of 300 entries. I'll test it out for the search module.
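As a rough illustration only (the batch helper and fetch_metadata call are hypothetical, not code from this PR or from #57), splitting record IDs into groups of 300 could look like:

def batch(record_ids, size=300):
    # Yield successive chunks of `size` IDs from the full list.
    for start in range(0, len(record_ids), size):
        yield record_ids[start:start + size]

# for chunk in batch(all_record_ids):
#     fetch_metadata(chunk)  # hypothetical per-batch metadata request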

@Maarten-vd-Sande
Contributor Author

Maarten-vd-Sande commented Sep 9, 2020

Seems like I keep on missing stuff in the code 👀. Where is this split (batches of 300) made, @bscrow?

@Maarten-vd-Sande Maarten-vd-Sande changed the title Skeleton for api key Api key Sep 9, 2020
@bscrow
Collaborator

bscrow commented Sep 9, 2020

@Maarten-vd-Sande this split is only implemented in the upcoming pysradb search feature (#57), which differs from the pysradb metadata feature in pysradb/sraweb.py in that it fetches metadata based on text queries instead of accession numbers.

@saketkc
Owner

saketkc commented Sep 10, 2020

I guess this is still useful if someone is interested in trying out the API key, and the current behavior remains unaffected.
I might expose this on the command line in the future.

What are your thoughts @Maarten-vd-Sande and @bscrow?

@Maarten-vd-Sande
Contributor Author

It doesn't hurt to have it, although it might mislead users into thinking it speeds up pysradb under "normal" usage. If it helps with the search function, then that's nice!

I have to say it requires more thorough testing to see if it hits the API limit, since I haven't really stress-tested it.

@saketkc saketkc merged commit 1eaa0a8 into saketkc:master Sep 11, 2020
@saketkc
Owner

saketkc commented Sep 11, 2020

I decided to merge it. In my experience, stress testing NCBI is hard (very unreliable behavior overall). Since it doesn't change the current behavior, and might potentially be useful (hopefully), we can keep it.

@saketkc
Owner

saketkc commented Sep 11, 2020

Many thanks for your contribution @Maarten-vd-Sande!

@saketkc saketkc mentioned this pull request Sep 11, 2020
@Maarten-vd-Sande
Contributor Author

Just an FYI, every once in a while I get:

Unable to parse xml: {"error":"API rate limit exceeded","api-key":"131.174.27.98","count":"4","limit":"3"}

Maybe the sleep time should be slightly increased to avoid this.
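One way to work around this, sketched under the assumption of a simple exponential backoff (the esummary_with_backoff helper below is illustrative, not pysradb's implementation):

import time
import requests

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def esummary_with_backoff(params, retries=5, sleep_time=0.5):
    # Illustrative only: retry an esummary request when NCBI reports
    # "API rate limit exceeded", doubling the wait before each retry.
    for attempt in range(retries):
        response = requests.get(ESUMMARY_URL, params=params, timeout=30)
        if "API rate limit exceeded" not in response.text:
            return response
        time.sleep(sleep_time * (2 ** attempt))  # waits 0.5 s, 1 s, 2 s, ...
    return response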

@saketkc
Owner

saketkc commented Nov 4, 2020

Thanks for catching this @Maarten-vd-Sande. Do you have an example I can test this against?

@Maarten-vd-Sande
Contributor Author

Let me see if I can cook something up :)
