DCAT harvester refactoring #2096
This PR refactors the DCAT harvester to store a single graph for all datasets in a DCAT catalog.
These changes make it possible to process many more datasets in a single job. Because each harvesting job stores all of its items in a single MongoDB document, each job can store at most
This also prevents some data loss, because correctly slicing an RDF graph is difficult when you don't know all of a node's properties in advance. It also makes it possible to parse more triples for each dataset and resource.
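To illustrate why slicing is error-prone, here is a minimal sketch, not the harvester's actual code. It uses tiny stand-in term classes (`URIRef`, `BNode`, `Literal`) instead of rdflib so it runs on its own. A naive slice that keeps only triples whose subject is the dataset drops everything attached to a blank-node distribution. The hypothetical `describe` helper instead follows blank-node objects recursively, a simplified Concise Bounded Description that ignores reification. Even this misses properties of URI resources described elsewhere in the catalog, which is one argument for keeping the whole graph.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple, Union


# Minimal stand-ins for RDF terms (assumption: not rdflib's classes).
@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class BNode:
    id: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[URIRef, BNode, Literal]
Triple = Tuple[Term, URIRef, Term]


def describe(graph: Iterable[Triple], node: Term) -> Set[Triple]:
    """Collect triples about `node`, recursing into blank-node objects only."""
    by_subject: Dict[Term, List[Triple]] = {}
    for triple in graph:
        by_subject.setdefault(triple[0], []).append(triple)

    result: Set[Triple] = set()
    seen: Set[Term] = set()
    stack: List[Term] = [node]
    while stack:
        subject = stack.pop()
        if subject in seen:  # guards against blank-node cycles
            continue
        seen.add(subject)
        for triple in by_subject.get(subject, []):
            result.add(triple)
            if isinstance(triple[2], BNode):
                stack.append(triple[2])
    return result


DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

dataset = URIRef("http://example.org/dataset/1")
dist = BNode("d1")  # distribution described only through a blank node
graph = [
    (dataset, URIRef(DCT + "title"), Literal("Air quality")),
    (dataset, URIRef(DCAT + "distribution"), dist),
    (dist, URIRef(DCAT + "downloadURL"), URIRef("http://example.org/aq.csv")),
    (dist, URIRef(DCT + "format"), Literal("CSV")),
]

naive = {t for t in graph if t[0] == dataset}
full = describe(graph, dataset)
print(len(naive), len(full))  # 2 4
```

The naive slice keeps 2 of the 4 triples and silently loses the download URL and format.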
These changes also allow processing DCAT datasets identified by blank nodes instead of URIRefs.
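A minimal sketch of what handling blank-node datasets involves, under stated assumptions. Datasets are found by their `rdf:type dcat:Dataset` triple, whatever kind of node the subject is. A blank node's label is not stable from one parse to the next, so some other key is needed. Falling back on `dct:identifier` is an assumption made for illustration, not necessarily what this PR does. The term classes and the `find_datasets` and `remote_id` helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


# Minimal stand-ins for RDF terms (assumption: not rdflib's classes).
@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class BNode:
    id: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[URIRef, BNode, Literal]
Triple = Tuple[Term, URIRef, Term]

RDF_TYPE = URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
DCAT_DATASET = URIRef("http://www.w3.org/ns/dcat#Dataset")
DCT_IDENTIFIER = URIRef("http://purl.org/dc/terms/identifier")


def find_datasets(graph: List[Triple]) -> List[Term]:
    """Return each dcat:Dataset subject once, in graph order, URIRef or BNode."""
    subjects = [s for s, p, o in graph if p == RDF_TYPE and o == DCAT_DATASET]
    return list(dict.fromkeys(subjects))


def remote_id(graph: List[Triple], node: Term) -> Optional[str]:
    """Hypothetical helper: choose a stable key for a dataset node.

    A URIRef is its own key. A blank node label is not stable across parses,
    so fall back on the node's dct:identifier literal (assumption).
    """
    if isinstance(node, URIRef):
        return node.value
    for s, p, o in graph:
        if s == node and p == DCT_IDENTIFIER and isinstance(o, Literal):
            return o.value
    return None


graph = [
    (URIRef("http://example.org/ds/1"), RDF_TYPE, DCAT_DATASET),
    (BNode("b0"), RDF_TYPE, DCAT_DATASET),
    (BNode("b0"), DCT_IDENTIFIER, Literal("ds-2")),
]
print([remote_id(graph, d) for d in find_datasets(graph)])
# ['http://example.org/ds/1', 'ds-2']
```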