RDF/DCAT support #966

noirbizarre · 2017-06-12T09:44:42Z

This PR adds DCAT support:

These points can/should be done in other PRs:

- spatial coverage handling (parsing and exposition), ideally reusing geozones
- harvest content type negotiation improvement (use headers...)
- allow search results to be available as RDF in addition to CSV (very easy)

davidbgk · 2017-06-12T09:46:40Z

Fixtures are huge 😱

davidbgk · 2017-06-12T10:05:44Z

udata/rdf.py

+}
+
+# Map formats to default used extensions
+RDF_EXTENSIONS = {


Do we really need all these extensions? I would only activate jsonld for now. It impacts (at least) the number of links rendered in the template.

These are the main RDF serialization formats. Supporting them comes at no extra programming cost, and the choice has been discussed and validated with @ColinMaudry.
Different formats serve different usages and tools.
We can't predict which usages or tools will be used to consume udata RDF (not many tools support JSON-LD). In fact, I've used different tools during this PR and not all of them support JSON-LD.
On the harvesting side, we need to be able to harvest all these formats: not all portals expose JSON-LD, which is the youngest format in the list (this is why it is the only one requiring an extra dependency for rdflib).
The DCAT spec is an RDF specification, not a JSON-LD-specific one (the DCAT vocabulary itself is only published as XML and Turtle, not as JSON-LD).
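A minimal, hypothetical sketch of why extra formats cost little: each serialization is just a format name, a default file extension and a MIME type. The dictionaries, the chosen extensions and the `guess_rdf_format` helper below are assumptions for illustration, not the actual content of `udata/rdf.py`; the format names mirror rdflib's serializer names (`json-ld` was provided by the rdflib-jsonld plugin at the time).

```python
# Hypothetical format registry: names and values are illustrative assumptions.

# Serialization format (rdflib serializer name) -> default file extension
RDF_EXTENSIONS = {
    'xml': 'rdf',
    'n3': 'n3',
    'turtle': 'ttl',
    'nt': 'nt',
    'trig': 'trig',
    'json-ld': 'jsonld',
}

# MIME type -> serialization format
RDF_MIME_TYPES = {
    'application/rdf+xml': 'xml',
    'text/n3': 'n3',
    'text/turtle': 'turtle',
    'application/n-triples': 'nt',
    'application/trig': 'trig',
    'application/ld+json': 'json-ld',
}

# Reverse lookup: file extension -> serialization format
FORMAT_BY_EXTENSION = {ext: fmt for fmt, ext in RDF_EXTENSIONS.items()}


def guess_rdf_format(value):
    '''Return the format for a format name, extension or MIME type, or None if unknown.'''
    value = value.strip().lower()
    if value in RDF_EXTENSIONS:
        return value
    if value in FORMAT_BY_EXTENSION:
        return FORMAT_BY_EXTENSION[value]
    return RDF_MIME_TYPES.get(value)
```

In this sketch, adding or dropping a format only touches the mappings, not the code that iterates over them.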

noirbizarre · 2017-06-12T10:54:02Z

I removed the fixtures that shouldn't be there.

ColinMaudry · 2017-06-12T11:58:54Z

Regarding content negotiation, is it possible to negotiate an RDF format from /datasets/{id}?

noirbizarre · 2017-06-12T12:15:53Z

Yes, it's still to be done (it's in the checklist).
The helpers are done; only the view integration is missing.
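For the view integration, one possible approach (sketched here, not udata's actual helper) is to pick the best RDF MIME type from the `Accept` header and redirect to the matching RDF endpoint, falling back to the HTML page otherwise. `parse_accept`, `best_match`, `negotiate_rdf_format` and the MIME table are hypothetical; wildcards such as `*/*` are deliberately not treated as a request for RDF, so browsers keep getting HTML.

```python
# Hypothetical sketch of Accept-header negotiation (not udata's implementation).
RDF_MIME_TYPES = {
    'application/rdf+xml': 'xml',
    'text/turtle': 'turtle',
    'text/n3': 'n3',
    'application/n-triples': 'nt',
    'application/ld+json': 'json-ld',
}
# HTML stays a candidate so that clients preferring it keep getting the page.
SUPPORTED = ['text/html'] + list(RDF_MIME_TYPES)


def parse_accept(header):
    '''Yield (mime_type, quality) pairs from an Accept header value.'''
    for part in header.split(','):
        pieces = [piece.strip() for piece in part.split(';')]
        mime = pieces[0].lower()
        if not mime:
            continue
        quality = 1.0
        for param in pieces[1:]:
            key, _, value = param.partition('=')
            if key.strip().lower() == 'q':
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        yield mime, quality


def best_match(header, supported):
    '''Return the supported MIME type with the highest quality (first wins on ties), or None.'''
    best, best_quality = None, 0.0
    for mime, quality in parse_accept(header or ''):
        if mime in supported and quality > best_quality:
            best, best_quality = mime, quality
    return best


def negotiate_rdf_format(header):
    '''Return the RDF format to serve, or None to render the HTML page.'''
    return RDF_MIME_TYPES.get(best_match(header, SUPPORTED))
```

In a Flask view, a non-None result could drive a redirect to `/dataset/{id}/rdf.{format}`; mapping the format name back to an extension is left out of this sketch.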

ColinMaudry · 2017-06-12T12:20:20Z

Fixture = endpoint? I'm not sure what you call a fixture.

noirbizarre · 2017-06-12T12:22:33Z

A fixture is a set of data required by the test(s).
In our case, fixtures are RDF files simulating server responses.

noirbizarre · 2017-06-12T14:44:42Z

Requires #967

noirbizarre · 2017-06-12T14:53:33Z

Requires #968

noirbizarre · 2017-06-12T16:11:12Z

Requires #969

vinyll

Suggestions for English sentences

vinyll · 2017-06-12T13:09:58Z

docs/harvesting.md

+
+## Vocabulary
+
+- **Backend**: designate a protocol implementation to harvest a remote endpoint.


implementation protocol?

No, it doesn't mean the same thing ;)

vinyll · 2017-06-12T13:10:17Z

docs/harvesting.md

+## Vocabulary
+
+- **Backend**: designate a protocol implementation to harvest a remote endpoint.
+- **Source**: it's remote end point to harvest. Each harvest source is caracterized by


vinyll · 2017-06-12T13:11:45Z

docs/harvesting.md

+
+- **Backend**: designate a protocol implementation to harvest a remote endpoint.
+- **Source**: it's remote end point to harvest. Each harvest source is caracterized by
+  a single endpoint URL and a backend implementation. An harvester is configured for each source.


A harvester (cf: https://en.oxforddictionaries.com/definition/harvest)

vinyll · 2017-06-12T13:12:17Z

docs/harvesting.md

+
+## Behavior
+
+After an harvester for a given source has been created and validated, it will run either on demand or periodically.


a harvester

vinyll · 2017-06-12T13:12:38Z

docs/harvesting.md

+
+After an harvester for a given source has been created and validated, it will run either on demand or periodically.
+
+An harvesting job is done in three separate phases:


A harvesting

vinyll · 2017-06-12T13:17:57Z

docs/harvesting.md

+    unschedule          Run an harvester synchronously
+    run                 Run an harvester synchronously
+    validate            Validate a source given its identifier
+    attach              Attach existing dataset to their harvest remote id.


No final period.

vinyll · 2017-06-12T13:20:40Z

docs/harvesting.md

+    run                 Run an harvester synchronously
+    validate            Validate a source given its identifier
+    attach              Attach existing dataset to their harvest remote id.
+    delete              Delete an harvest source


a harvest

vinyll · 2017-06-12T13:21:35Z

docs/harvesting.md

+### DCAT (prefered)
+
+This backend harvest any [DCAT][] endpoint.
+This is now the recommanded way to harvest remote portals and repositories


recommended

vinyll · 2017-06-12T13:22:14Z

docs/harvesting.md

+This is now the recommanded way to harvest remote portals and repositories
+(and so to expose opendata metadata for any portal and repository).
+
+As pagination is not described into the DCAT specifcation, we try to detect some supported


specification
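Since DCAT does not define pagination, the harvester has to recognise paging hints itself. Whatever vocabulary is detected, the traversal can be sketched as a loop over pages. `iter_pages`, `fetch_page` and `find_next_url` are hypothetical names standing in for the HTTP fetch and the next-page detection; the visited-URL set and the page cap guard against endpoints whose pages link back to each other.

```python
# Hypothetical pagination loop: the fetch and next-page detection are injected.
def iter_pages(first_url, fetch_page, find_next_url, max_pages=1000):
    '''Yield each fetched page, following next-page links until none is found.'''
    url, seen = first_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch_page(url)
        yield page
        # None (or an already visited URL) ends the traversal
        url = find_next_url(page)
```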

vinyll · 2017-06-12T16:33:20Z

docs/harvesting.md

+
+```
+
+You take a look at the [existing backends][backends-repository] to see exiting implementations.


You may take a look

davidbgk · 2017-06-19T15:43:50Z

docs/harvesting.md

+# Harvesting
+
+Harvesting is the process of fetching of automatically remote metadata (ie. from other data portals or not)
+and store them into udata for being able to search them.


Harvesting is the process of automatically syncing remote metadata (i.e. from other data portals or not) and storing them into udata to be able to find remote data.

Wow, so many typos in my sentence 😱
I'll fix that ASAP. But I think we need to keep `fetching`, as `syncing` suggests it's bidirectional while harvesting is not.

davidbgk · 2017-06-19T15:46:35Z

docs/harvesting.md

+
+1. `initialize`: the harvester fetches remote identifiers to harvest and create a single task for each.
+2. `process`: each task created in the `initialize` is executed. Each item is processed independently.
+3. `finalize`: when all tasks are done, the `finilize` is a closure for the job and mark it as done.


finalize (second one)
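The three phases can be illustrated with a minimal, hypothetical sketch. `HarvestJob`, `list_remote_ids` and `process_item` are illustrative names, not udata's API, and items are processed sequentially in-process, whereas the documentation describes one task per item.

```python
# Hypothetical sketch of the initialize / process / finalize phases.
class HarvestJob:
    def __init__(self, list_remote_ids, process_item):
        self.list_remote_ids = list_remote_ids  # callable returning remote identifiers
        self.process_item = process_item        # callable harvesting one identifier
        self.items = []
        self.status = 'pending'

    def initialize(self):
        '''Phase 1: fetch remote identifiers and create one item per identifier.'''
        self.items = [{'remote_id': rid, 'status': 'pending', 'error': None}
                      for rid in self.list_remote_ids()]
        self.status = 'initialized'

    def process(self):
        '''Phase 2: process each item independently; one failure does not stop the others.'''
        for item in self.items:
            try:
                self.process_item(item['remote_id'])
                item['status'] = 'done'
            except Exception as exc:
                item['status'] = 'failed'
                item['error'] = str(exc)

    def finalize(self):
        '''Phase 3: once every item has run, close the job and mark it as done.'''
        failed = any(item['status'] == 'failed' for item in self.items)
        self.status = 'done-errors' if failed else 'done'

    def run(self):
        self.initialize()
        self.process()
        self.finalize()
        return self.status
```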

davidbgk · 2017-06-19T17:05:01Z

docs/rdf.md

+    /dataset/{id}/rdf.{format}
+
+The dataset pages serve as identifier and perform content negociation too,
+so the following URL will all redirect to the same RDF endpoint:


davidbgk · 2017-06-19T17:07:15Z

udata/core/dataset/rdf.py

+        elif isinstance(period_of_time, RdfResource):
+            return temporal_from_resource(period_of_time)
+    except:
+        # There is a lot of case where parsing could/should fail


are a lot of cases

davidbgk · 2017-06-19T17:12:19Z

udata/core/dataset/views.py

+        elif dataset.deleted:
+            abort(410)
+
+    format = guess_format(format)


These few lines are duplicated in `rdf_catalog_format`; should they be factored out?

davidbgk · 2017-06-22T11:31:46Z

udata/core/dataset/rdf.py

+        elif isinstance(period_of_time, RdfResource):
+            return temporal_from_resource(period_of_time)
+    except:
+        # There are a lot of cases where parsing could/should fail


Is it worth logging this for future improvements?
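One way to answer this, sketched under assumptions: keep the failsafe behaviour but narrow the bare `except:` to `except Exception` (so `KeyboardInterrupt` and `SystemExit` still propagate) and log the raw value with its traceback. `parse_temporal` is a hypothetical stand-in that only handles ISO 8601 `start/end` date strings, unlike the real code, which also handles RDF resources.

```python
import logging
from datetime import date

log = logging.getLogger(__name__)


def parse_temporal(value):
    '''Parse an ISO 8601 'start/end' date interval; return (start, end) or None.'''
    try:
        start, end = value.split('/')
        return date.fromisoformat(start.strip()), date.fromisoformat(end.strip())
    except Exception:  # broad on purpose: a bad value must not break harvesting
        log.warning('Unable to parse temporal coverage: %r', value, exc_info=True)
        return None
```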

davidbgk · 2017-06-22T11:34:03Z

udata/rdf.py

+    resulting JSON-LD.
+
+    See: https://github.com/RDFLib/rdflib-jsonld/blob/master/rdflib_jsonld/serializer.py#L101-L103
+    '''  # noqa


If you put only a URL on one line, the linter shouldn't complain even without a # noqa comment. At least mine 😉

Mine is crying without 😢

noirbizarre added the enhancement label Jun 12, 2017

noirbizarre added this to the 1.1 milestone Jun 12, 2017

noirbizarre requested review from davidbgk, abulte, vinyll, jdesboeufs, thimy and teleboas June 12, 2017 09:44

noirbizarre added the in progress label Jun 12, 2017

davidbgk reviewed Jun 12, 2017

vinyll reviewed Jun 12, 2017

davidbgk reviewed Jun 19, 2017

davidbgk approved these changes Jun 22, 2017

noirbizarre added 7 commits June 22, 2017 14:06

Added a DCAT harvester

49b3b09

Initial basic harvesting documentation

29805f8

WIP udata.core.dataset.rdf module

edb08c7

Expose RDF for catalog and datasets, factorize RDF handling

02a66fd

Expose alternate rdf links in header

4e96021

Remove unused fixtures and test methods

5037e4d

Add missing DCT.format

716ff3c

noirbizarre added 16 commits June 22, 2017 14:06

Temporal coverage RDF support

d051674

Failsafe temporal coverage parsing

3705b51

Perform content negociation on dataset view for RDF mime types

fb343f3

Fix typos in documentation

0c04fd2

Lots of fixes

f87a00d

Pagination handling in DCAT harvester

dbe8526

Paginate catalog

49669c7

Improve context for smaller and easier to read payload

2db307b

Prevent redirects on pagination

d7871f8

Handle frequency

b978981

Small fixes

da0e332

RDF documentation

5d5bc34

Improve changelog

33c9996

Mutualize graph serialize to flask response

7226a2a

Fix som typos

b46e04a

Last RDF fixes

fb7fb5f

noirbizarre merged commit a66184e into opendatateam:dev Jun 22, 2017

noirbizarre deleted the dcat branch June 22, 2017 12:32
