Add cache status to objects #125
Hi @ale-de-vries and thanks so much for this issue. You raise many connected issues, all of which are worth thinking about! I respond in reverse order:
With fde4a8c, any pybliometrics class can show how old the cached file is. That's certainly a good step in the right direction.
Closed
Throttling implemented in e32c349
For any of the data entities (i.e. AuthorRetrieval, ContentAffiliationRetrieval, AbstractRetrieval, and conceivably also the search types) it would be helpful to include a property/method that indicates whether a local data cache already exists for that entity, and if so, how old it is. This allows a script to inspect if the data needs to be fetched/refreshed from the REST endpoint, which in turn can be used to apply throttling when needed.
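To make the request concrete, here is a minimal sketch of what such a cache check could look like. It works on a plain file path and does not use any pybliometrics API; the helper names `cache_age_seconds` and `needs_refresh`, and the idea that each entity maps to one cache file, are assumptions for illustration only.

```python
import os
import time
from typing import Optional


def cache_age_seconds(path: str) -> Optional[float]:
    """Return the age of a cache file in seconds, or None if it does not exist.

    Hypothetical helper: `path` stands for wherever the cached response
    for one entity would be stored.
    """
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # No file (or not readable): treat the entity as not cached.
        return None
    return max(0.0, time.time() - mtime)


def needs_refresh(path: str, max_age_days: float = 7.0) -> bool:
    """True if there is no cache file or if it is older than `max_age_days`."""
    age = cache_age_seconds(path)
    return age is None or age > max_age_days * 86400
```

A script could call `needs_refresh(...)` before instantiating an object and decide from the result whether a request to the REST endpoint is to be expected.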
Background:
Note that the Scopus API endpoints enforce throttling: any requests that exceed the default requests-per-second limit will fail. Moreover, any client that continuously exceeds the throttling limits risks having its API key suspended. This means that the client needs to monitor and control the rate at which it calls the API to avoid such failed requests, e.g. by including a timeout (`sleep`) when looping over API calls.
The challenge is that this timeout is not necessary when instantiating a retrieval/search object for which a cache already exists, because for such cached objects no API call is made. In fact, the timeout is counterproductive there: looping with a timeout over a series of cached objects makes instantiating them take longer than needed, unnecessarily increasing program run time.
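The loop I have in mind would look roughly like the sketch below: sleep only after items that actually triggered a request. The function `fetch_all` and its parameters `is_cached`, `fetch`, and `delay` are hypothetical placeholders, not part of pybliometrics; `is_cached` stands for the cache-status check requested above.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_all(
    ids: Iterable[str],
    is_cached: Callable[[str], bool],
    fetch: Callable[[str], T],
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Fetch every id, pausing only after ids that were not cached.

    `is_cached` and `fetch` are caller-supplied (hypothetical) callables;
    `delay` is the pause in seconds after each uncached fetch.
    """
    results = []
    for ident in ids:
        # Check the cache *before* fetching, since fetching fills the cache.
        hit_api = not is_cached(ident)
        results.append(fetch(ident))
        if hit_api:
            sleep(delay)
    return results
```

With this shape, a run over fully cached ids finishes without any pause, while uncached ids are still spaced out by `delay`.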
(A more elegant approach would be for pybliometrics to enforce throttling itself, e.g. by building a timeout into the get_content.py module. That, however, requires the module to persist the timestamp of the last request made to api.elsevier.com one way or another, which isn't trivial: it either needs to be persisted on disk or maintained in memory, as the elsapy library does.)
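As a rough illustration of the in-memory variant, the sketch below keeps the timestamp of the last request on an object and waits only for whatever is left of the minimum interval. The `Throttle` class is hypothetical; it is not how pybliometrics or elsapy actually implement this, and the timestamp is lost when the process exits.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between calls, tracked in memory only."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until a request is allowed; return the seconds waited."""
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record the moment the request is released.
        self._last = self._clock()
        return waited
```

Calling `throttle.wait()` immediately before each real request (and never for cache hits) would space out only the calls that reach api.elsevier.com.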