Work out how to handle changes to rate limiting #117

Closed
dwinter opened this issue Nov 12, 2017 · 7 comments

@dwinter
Member

dwinter commented Nov 12, 2017

As per #114, users will be allowed to make more than 3 requests per second if they are using an API key. The limit for users with a key will be 10 requests per second, but it will be possible for some users to get even faster connections.

We should check with NCBI about how to handle client-side limiting, but here is a proposal.

  • Move the calls to Sys.sleep into make_entrez_query
  • In most cases, set the sleep time to 1/3 or 1/10th of a second based on the presence or absence of an api_key argument
  • Add an optional argument or environment variable (only a few power users will take advantage of this) to override the sleep time
  • Document the fact that most users should not change the sleep time, and that they may get in trouble if they do so without permission.
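
A minimal sketch of how the proposal above might look in R. The function names and the `ENTREZ_SLEEP` variable are hypothetical and only illustrate the idea; they are not existing rentrez settings, and `do_request` stands in for whatever actually sends the query.

```r
# Hypothetical helper: choose the pause (in seconds) between requests.
# `override` stands in for the optional argument / environment variable in the
# proposal; the name ENTREZ_SLEEP is an assumption, not a rentrez setting.
entrez_sleep_time <- function(api_key = NULL,
                              override = Sys.getenv("ENTREZ_SLEEP", unset = NA)) {
  # An explicit override wins, for power users with a custom rate from NCBI.
  if (!is.na(override) && nzchar(override)) {
    return(as.numeric(override))
  }
  # Otherwise: 3 requests per second without a key, 10 with one.
  if (is.null(api_key)) 1 / 3 else 1 / 10
}

# Hypothetical wrapper showing where the sleep would sit inside a query function.
throttled_query <- function(do_request, api_key = NULL) {
  Sys.sleep(entrez_sleep_time(api_key))
  do_request()
}
```

Keeping the sleep in one place means every request path gets the same limit, and the override stays a single, easily documented knob.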
@dwinter dwinter self-assigned this Nov 12, 2017
@dwinter dwinter added this to the match NCBI rate limiting milestone Nov 13, 2017
@dwinter
Member Author

dwinter commented Nov 13, 2017

NCBI are OK with this plan ✅

@sckott
Contributor

sckott commented Nov 30, 2017

@dwinter are you aware of any rate-limiting information returned in the headers or body of Entrez API requests? I haven't seen any. If there isn't any, that really sucks.

@dwinter
Member Author

dwinter commented Dec 1, 2017

Hi @sckott, unfortunately, there is no info in the headers, and I don't think there is any plan to include it. I gather the requests will just return an error if the user is sending too many too quickly.

At present rentrez just calls Sys.sleep for 1/3 of a second on every request. In the feature branch for this, that changes to 1/10th of a second if ENTREZ_KEY is set as an environment variable.

Maybe not a great solution (and probably slower than it could be if rate-limiting could be taken from the headers), but this seems like the simplest way to handle it?

@sckott
Contributor

sckott commented Dec 1, 2017

Bummer. I've already emailed them; hopefully that will lead to something eventually. Right, I think sleeping is what we do when using entrez stuff in other pkgs.

@dwinter dwinter closed this as completed Mar 20, 2018
@boopsboops

Apologies if this isn't the best place to discuss this, but I'm having some problems dealing with these recent changes to NCBI's API. Running entrez_search or entrez_fetch as a single process was glacially slow, so I used mcmapply to distribute searches over cores. This worked very well until now; all requests are now rejected due to the API rate limit. Even with an API key, I am rejected (even when the number of cores is reduced).

I appreciate that this isn't an issue with rentrez, but I'm wondering if there are any tricks you are aware of to access NCBI data in a more controlled manner?

Cheers

@dwinter
Member Author

dwinter commented Dec 4, 2018

Hi @boopsboops,

I think the only option is to email the NCBI support desk and explain your use case and how the current rules prevent you from achieving reasonable research goals. I understand they are able to specify custom rates for specific API keys.
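
For the parallel case, a rough sketch of the arithmetic, assuming the limit applies to all workers combined (they share one API key or IP address). The helper names are hypothetical, not part of rentrez. If each of n workers pauses n / limit seconds before every request, the average combined rate should stay at or below the limit, although short bursts when workers fire together are still possible.

```r
# Assumption: `n_workers` processes send requests independently, so each one
# waits n_workers / requests_per_second seconds to keep the combined average
# rate at or below `requests_per_second`.
per_worker_delay <- function(n_workers, requests_per_second = 10) {
  n_workers / requests_per_second
}

# Hypothetical wrapper: returns a version of `f` that pauses before each call.
with_delay <- function(f, delay) {
  function(...) {
    Sys.sleep(delay)
    f(...)
  }
}
```

With four cores and a key, something like `with_delay(rentrez::entrez_search, per_worker_delay(4))` could be passed to mcmapply in place of entrez_search; any sleep rentrez already does, plus network latency, comes on top of that delay.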

@boopsboops

Thanks @dwinter I'll try that.
