Implement various harvesting strategies properly. #5

Closed
miku opened this issue Aug 24, 2017 · 1 comment

miku commented Aug 24, 2017

metha should implement various harvesting strategies:

  • normal/default (for standards-conformant endpoints), with harvest windows: daily, monthly, yearly, or all
  • single records, so that individual records may fail without aborting the whole harvest and servers are not overloaded
  • other modes: all at once
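
A minimal sketch of how the harvest windows above could be computed, assuming a window is an inclusive (from, until) pair of dates, as used by the OAI-PMH `from` and `until` arguments. The function name `harvest_windows` and the granularity strings are made up for illustration and are not part of metha:

```python
from datetime import date, timedelta


def harvest_windows(start, end, granularity="monthly"):
    """Yield inclusive (from, until) date pairs covering start..end.

    Hypothetical helper, not metha's API. Windows are aligned to
    calendar boundaries, so the first and last window may be shorter.
    """
    if granularity == "all":
        # A single window spanning the whole interval.
        yield start, end
        return
    current = start
    while current <= end:
        if granularity == "daily":
            nxt = current + timedelta(days=1)
        elif granularity == "monthly":
            # First day of the following month.
            if current.month == 12:
                nxt = date(current.year + 1, 1, 1)
            else:
                nxt = date(current.year, current.month + 1, 1)
        elif granularity == "yearly":
            nxt = date(current.year + 1, 1, 1)
        else:
            raise ValueError("unknown granularity: %s" % granularity)
        # The window ends one day before the next one starts, capped at end.
        yield current, min(nxt - timedelta(days=1), end)
        current = nxt
```

Each pair could then be sent as one request, so a failing window can be retried on its own instead of restarting the whole harvest.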

Implementation ideas:

Instead of relying only on files, introduce a small manifest.json describing the harvested content (ids, dates, harvesting dates, files).
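
A rough sketch of what such a manifest might contain, covering the four items named above (ids, dates, harvesting dates, files). All field names, the endpoint URL, and the helper functions are assumptions for illustration; nothing here reflects an existing metha format:

```python
import json


def new_manifest(endpoint):
    """Return an empty manifest for one endpoint (hypothetical layout)."""
    return {"endpoint": endpoint, "records": {}, "files": []}


def add_record(manifest, identifier, datestamp, harvested, filename):
    """Register one harvested record and the file it was written to."""
    manifest["records"][identifier] = {
        "datestamp": datestamp,  # record date as reported by the server
        "harvested": harvested,  # date the record was fetched
        "file": filename,        # file on disk that holds the record
    }
    if filename not in manifest["files"]:
        manifest["files"].append(filename)


def dump_manifest(manifest):
    """Serialize the manifest as the content of manifest.json."""
    return json.dumps(manifest, indent=2, sort_keys=True)


# Example with made-up values.
manifest = new_manifest("http://example.org/oai")
add_record(manifest, "oai:example.org:1", "2017-08-01", "2017-08-24", "2017-08.xml")
add_record(manifest, "oai:example.org:2", "2017-08-02", "2017-08-24", "2017-08.xml")
manifest_json = dump_manifest(manifest)
```

Keying records by identifier would let a later run check which records are already present without parsing the harvested files.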

miku commented Oct 5, 2017

metha was meant to be a very simple program (no database, only files, and not even metadata about the files). To keep it simple, a more resilient harvesting approach has been implemented in a separate program: oaicrawl.

miku closed this as completed Oct 5, 2017