Find the right frequency for refreshing each dataset #288

Open
rjsparks opened this issue Sep 1, 2022 · 2 comments

Comments

@rjsparks
Member

rjsparks commented Sep 1, 2022

At #277, @ajeanmahoney requests that the RFC dataset be updated at least once every 12 hours. I'm wondering if the community will want the updates to appear even more rapidly, and suggest we look at the cost of an update once per hour.

We should tune this for all the datasets.

Is there a quick link to point to what the current configuration is?

@strogonoff
Collaborator

strogonoff commented Sep 2, 2022

A couple of notes:

  • Reindex delay is the same for all data sources (https://bib.ietf.org/static/docs/howto/auto-reindex-sources.html)
    • The delay, if any, is configured in environment variables (Kesara may know the value, I believe he mentioned it’s 1 day)
  • Having a source reindexed more often than it is rebuilt (specified in GHA) will not provide any improvement.
    • A data source should be rebuilt more frequently first, and then reindexing can be made more frequent.
  • Bulk-indexable sources are generally not designed to provide near-real-time availability. For realtime availability it’s more appropriate to configure an external source for RFCs, the way individual I-Ds are retrieved from Datatracker or DOIs from Crossref. (See docs for the distinction between external and indexable bibliographic data sources.)
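
To make the rebuild/reindex relationship above concrete, here is a rough back-of-the-envelope sketch. It is not taken from the service's code: the function names are hypothetical, the 12-hour, 1-hour and 1-day figures are the ones mentioned in this thread, and the model ignores how long a rebuild or reindex run itself takes.

```python
from datetime import timedelta


def worst_case_staleness(rebuild_every: timedelta, reindex_every: timedelta) -> timedelta:
    """Rough upper bound on how long an upstream change can take to show up.

    Assumption: a change can just miss a rebuild, and the rebuilt data can
    then just miss a reindex, so the two intervals add up. Run durations
    are ignored.
    """
    return rebuild_every + reindex_every


def redundant_reindex_fraction(rebuild_every: timedelta, reindex_every: timedelta) -> float:
    """Share of reindex runs that find no new build to pick up.

    If reindexing is not more frequent than rebuilding, no run is redundant.
    """
    if reindex_every >= rebuild_every:
        return 0.0
    return 1 - reindex_every / rebuild_every


if __name__ == "__main__":
    # Rebuild every 12 hours, reindex once a day.
    print(worst_case_staleness(timedelta(hours=12), timedelta(days=1)))
    # Rebuild every 12 hours, reindex hourly: most reindex runs see nothing new.
    print(redundant_reindex_fraction(timedelta(hours=12), timedelta(hours=1)))
```

Under these assumptions, shortening only the reindex delay reduces the second term of the staleness bound but leaves the rebuild interval as the floor, which is why the rebuild schedule needs to be tightened first.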

@ronaldtse
Collaborator

@strogonoff is exactly right. Beyond those considerations, we also want to make sure that the fetching and indexing processes for a source never overlap.
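
One generic way to keep two runs from overlapping is a per-source lock that a run must hold before it starts. The sketch below is only an illustration of that idea, not how the service actually schedules its tasks: `exclusive_run`, `AlreadyRunning` and the lock file path are hypothetical names, and a lock file left behind by a crashed run would need separate cleanup.

```python
import os
from contextlib import contextmanager


class AlreadyRunning(RuntimeError):
    """Raised when another fetch or index run holds the lock."""


@contextmanager
def exclusive_run(lock_path: str):
    """Hold an exclusive lock file for the duration of a fetch or index run.

    O_CREAT | O_EXCL makes creation fail if the file already exists, so
    only one process can acquire the lock at a time.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(lock_path) from None
    try:
        # Record the owner's PID to help with debugging stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A scheduled job would wrap its work in `with exclusive_run("/tmp/rfcs.lock"):` (path hypothetical) and skip the run if `AlreadyRunning` is raised, so a frequent schedule cannot start a reindex while the previous fetch or index is still in progress.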
