Implement various harvesting strategies properly. #5

miku · 2017-08-24T08:26:44Z

metha should implement various harvesting strategies:

normal/default (for standards-conforming endpoints), with harvest windows: daily, monthly, yearly, all
single records, so that individual records may fail without aborting the harvest and servers are not overloaded
other modes: all at once
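
To make the harvest-window idea concrete, here is a minimal sketch of how a date range could be split into daily, monthly, or yearly windows. It is not taken from metha; the function name `harvest_windows` and the granularity strings are assumptions for illustration. Each pair could be passed as the inclusive `from` and `until` arguments of an OAI-PMH `ListRecords` request.

```python
from datetime import date, timedelta


def harvest_windows(start, end, granularity="monthly"):
    """Yield inclusive (from, until) date pairs covering start..end.

    Hypothetical helper, not part of metha. Windows are aligned to
    calendar boundaries, so the first and last window may be shorter.
    """
    cur = start
    while cur <= end:
        if granularity == "daily":
            nxt = cur + timedelta(days=1)
        elif granularity == "monthly":
            # First day of the following month.
            if cur.month == 12:
                nxt = date(cur.year + 1, 1, 1)
            else:
                nxt = date(cur.year, cur.month + 1, 1)
        elif granularity == "yearly":
            nxt = date(cur.year + 1, 1, 1)
        elif granularity == "all":
            # A single window spanning the whole range.
            nxt = end + timedelta(days=1)
        else:
            raise ValueError("unknown granularity: %s" % granularity)
        # The window ends the day before the next one starts, capped at end.
        yield cur, min(nxt - timedelta(days=1), end)
        cur = nxt
```

Smaller windows mean more requests but smaller responses, and a failed window can be retried without repeating the whole harvest.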

Implementation ideas:

Instead of relying only on files, introduce a small manifest.json describing the harvested content (ids, dates, harvesting dates, files).
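
As a rough sketch of what such a manifest might contain, the snippet below builds a dictionary and serializes it to JSON. The layout and all field names (`endpoint`, `harvested_at`, `files`, `records`, `datestamp`) are assumptions for illustration, as is the function `build_manifest`; the issue only names the kinds of information to record (ids, dates, harvesting dates, files).

```python
import json


def build_manifest(endpoint, harvested_at, files, records):
    """Return a manifest dict for one harvest run.

    Hypothetical structure, not a format defined by metha.
    `records` is an iterable of dicts with "id" and "datestamp" keys.
    """
    return {
        "endpoint": endpoint,
        "harvested_at": harvested_at,  # when this harvest ran
        "files": sorted(files),        # files written by this run
        "records": [
            {"id": r["id"], "datestamp": r["datestamp"]} for r in records
        ],
    }


def dump_manifest(manifest):
    """Serialize the manifest as indented JSON with stable key order."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

With a manifest like this, a later run could check which identifiers were already fetched instead of inferring state from file names alone.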

miku · 2017-10-05T10:05:18Z

metha was meant to be a very simple program (no database, only files, and not even metadata about files). To keep it simple, a more resilient harvesting approach has been implemented in a separate program: oaicrawl.

miku closed this as completed Oct 5, 2017
