RDF/DCAT support #966
Fixtures are huge 😱
    }

    # Map formats to default used extensions
    RDF_EXTENSIONS = {
Do we really need all these extensions? I would only activate jsonld for now. It impacts (at least) the number of links rendered in the template.
These are the main RDF extensions supported; supporting them comes without extra programming cost, and the choice has been discussed and validated with @ColinMaudry.
Different formats for different usages/tools.
We can't predict what usages or tools will be used to consume udata RDF (not so many tools support `JSON-LD`). In fact, I've been using different tools during that PR and not all of them support `json-ld`.
On the harvest part, we need to be able to harvest these formats: not all portals expose json-ld, which is the youngest format in the list (this is why it is the only one which requires an extra dependency to `rdflib`).
The DCAT spec is an RDF specification, not JSON-LD specific (the `DCAT` vocabulary itself is not exposed as `JSON-LD` but only as `xml` and `turtle`).
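To make the trade-off discussed above concrete, here is a minimal sketch of how a format-to-extension mapping drives the number of links rendered for a dataset. The mapping content and the `dataset_rdf_links` helper are assumptions for illustration (the actual `RDF_EXTENSIONS` value is not shown in this thread); only the `/datasets/{id}/rdf.{ext}` URL shape comes from the PR description.

```python
# Hypothetical mapping: the real RDF_EXTENSIONS content is not visible here.
EXAMPLE_EXTENSIONS = {
    'xml': 'xml',
    'turtle': 'ttl',
    'json-ld': 'jsonld',
    'n3': 'n3',
    'nt': 'nt',
}


def dataset_rdf_links(dataset_id, extensions=EXAMPLE_EXTENSIONS):
    '''Build one RDF link per enabled format for a dataset page.'''
    return {
        fmt: '/datasets/{0}/rdf.{1}'.format(dataset_id, ext)
        for fmt, ext in extensions.items()
    }
```

With this shape, activating only JSON-LD amounts to passing a one-entry mapping, e.g. `dataset_rdf_links('abc', {'json-ld': 'jsonld'})`, so the template renders a single link.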
I removed the fixtures that shouldn't be there.
Regarding content negotiation, is it possible to negotiate an RDF format from
Yes, still to be done (it's in the checklist).
Fixture = endpoint? I'm not sure of what you call a fixture.
A fixture is a set of data required by the test(s).
Requires #967
Requires #968
Requires #969
Suggestions for English sentences
docs/harvesting.md
Outdated
    ## Vocabulary

    - **Backend**: designate a protocol implementation to harvest a remote endpoint.
implementation protocol?
No, it doesn't mean the same thing ;)
docs/harvesting.md
Outdated
    ## Vocabulary

    - **Backend**: designate a protocol implementation to harvest a remote endpoint.
    - **Source**: it's remote end point to harvest. Each harvest source is caracterized by
its
docs/harvesting.md
Outdated
    - **Backend**: designate a protocol implementation to harvest a remote endpoint.
    - **Source**: it's remote end point to harvest. Each harvest source is caracterized by
      a single endpoint URL and a backend implementation. An harvester is configured for each source.
A harvester (cf: https://en.oxforddictionaries.com/definition/harvest)
docs/harvesting.md
Outdated
    ## Behavior

    After an harvester for a given source has been created and validated, it will run either on demand or periodically.
a harvester
docs/harvesting.md
Outdated
    After an harvester for a given source has been created and validated, it will run either on demand or periodically.

    An harvesting job is done in three separate phases:
A harvesting
docs/harvesting.md
Outdated
    unschedule  Run an harvester synchronously
    run         Run an harvester synchronously
    validate    Validate a source given its identifier
    attach      Attach existing dataset to their harvest remote id.
no final point
docs/harvesting.md
Outdated
    run         Run an harvester synchronously
    validate    Validate a source given its identifier
    attach      Attach existing dataset to their harvest remote id.
    delete      Delete an harvest source
a harvest
docs/harvesting.md
Outdated
    ### DCAT (prefered)

    This backend harvest any [DCAT][] endpoint.
    This is now the recommanded way to harvest remote portals and repositories
recommended
docs/harvesting.md
Outdated
    This is now the recommanded way to harvest remote portals and repositories
    (and so to expose opendata metadata for any portal and repository).

    As pagination is not described into the DCAT specifcation, we try to detect some supported
specification
docs/harvesting.md
Outdated
    ```

    You take a look at the [existing backends][backends-repository] to see exiting implementations.
You may take a look
docs/harvesting.md
Outdated
    # Harvesting

    Harvesting is the process of fetching of automatically remote metadata (ie. from other data portals or not)
    and store them into udata for being able to search them.
Harvesting is the process of automatically syncing remote metadata (i.e. from other data portals or not) and storing them into udata to be able to find remote data.
Wow, so many typos in my sentence 😱
I'll fix that ASAP. But I think we need to keep `fetching`, as `syncing` suggests it's bidirectional while harvesting is not.
docs/harvesting.md
Outdated
    1. `initialize`: the harvester fetches remote identifiers to harvest and create a single task for each.
    2. `process`: each task created in the `initialize` is executed. Each item is processed independently.
    3. `finalize`: when all tasks are done, the `finilize` is a closure for the job and mark it as done.
`finalize` (second one)
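For readers of this thread, the three phases quoted from `docs/harvesting.md` can be sketched roughly as below. This is an illustrative sketch only: `ExampleHarvester`, its attributes, and the `fetch` placeholder are hypothetical names and do not reflect the actual udata backend API.

```python
class ExampleHarvester:
    '''Hypothetical harvester illustrating the three phases (not the udata API).'''

    def __init__(self, remote_ids):
        self.remote_ids = remote_ids
        self.tasks = []
        self.results = {}
        self.done = False

    def initialize(self):
        # Fetch the remote identifiers and create a single task for each
        self.tasks = list(self.remote_ids)

    def process(self, remote_id):
        # Each item is processed independently: a failure only affects this item
        try:
            self.results[remote_id] = self.fetch(remote_id)
        except Exception as error:
            self.results[remote_id] = error

    def finalize(self):
        # Close the job once every task has produced a result
        self.done = len(self.results) == len(self.tasks)

    def fetch(self, remote_id):
        # Placeholder standing in for the remote call
        if remote_id is None:
            raise ValueError('missing identifier')
        return {'remote_id': remote_id}

    def run(self):
        # Run the three phases synchronously, in order
        self.initialize()
        for remote_id in self.tasks:
            self.process(remote_id)
        self.finalize()
        return self.done
```

In this sketch a failing item is recorded as its exception rather than aborting the job, which is one way to honour "each item is processed independently".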
docs/rdf.md
Outdated
    /dataset/{id}/rdf.{format}

    The dataset pages serve as identifier and perform content negociation too,
    so the following URL will all redirect to the same RDF endpoint:
URLs
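As a rough illustration of the content negotiation mentioned in the quoted documentation, the sketch below picks an RDF format from an `Accept` header. It is an assumption-laden example: `EXAMPLE_MIME_TYPES` and `negotiate_format` are hypothetical names, the mapping is not taken from the PR, and wildcards such as `*/*` are deliberately ignored.

```python
# Assumed MIME type -> format mapping, for illustration only.
EXAMPLE_MIME_TYPES = {
    'application/ld+json': 'json-ld',
    'application/rdf+xml': 'xml',
    'text/turtle': 'turtle',
    'text/n3': 'n3',
    'application/n-triples': 'nt',
}


def negotiate_format(accept_header, default=None):
    '''Return the RDF format of the best-ranked supported MIME type.'''
    candidates = []
    for position, item in enumerate(accept_header.split(',')):
        parts = [part.strip() for part in item.split(';')]
        mime, quality = parts[0].lower(), 1.0
        for param in parts[1:]:
            name, _, value = param.partition('=')
            if name.strip() == 'q':
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if mime in EXAMPLE_MIME_TYPES and quality > 0:
            # Rank by descending quality, then by order of appearance
            candidates.append((-quality, position, EXAMPLE_MIME_TYPES[mime]))
    return min(candidates)[2] if candidates else default
```

A view could then redirect to the matching `rdf.{format}` URL when a format is returned, and fall back to the HTML page otherwise.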
udata/core/dataset/rdf.py
Outdated
            elif isinstance(period_of_time, RdfResource):
                return temporal_from_resource(period_of_time)
        except:
            # There is a lot of case where parsing could/should fail
are a lot of cases
udata/core/dataset/views.py
Outdated
    elif dataset.deleted:
        abort(410)

    format = guess_format(format)
These few lines are duplicated in `rdf_catalog_format`; factor them out into a shared helper?
            elif isinstance(period_of_time, RdfResource):
                return temporal_from_resource(period_of_time)
        except:
            # There are a lot of cases where parsing could/should fail
Is that worth logging it for future improvements?
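One possible shape for that suggestion, as a minimal sketch: catch only the expected exceptions and log the failure at debug level instead of silently passing. The `parse_period` function and its `YYYY-MM-DD/YYYY-MM-DD` input format are hypothetical and only stand in for the real temporal coverage parsing.

```python
import logging
from datetime import datetime

log = logging.getLogger(__name__)


def parse_period(value):
    '''Hypothetical parser expecting 'YYYY-MM-DD/YYYY-MM-DD'.'''
    try:
        start, end = value.split('/')
        return (datetime.strptime(start, '%Y-%m-%d').date(),
                datetime.strptime(end, '%Y-%m-%d').date())
    except (AttributeError, ValueError) as error:
        # Parsing is expected to fail often: keep a trace for later improvements
        log.debug('Unable to parse temporal coverage %r: %s', value, error)
        return None
```

Logging at debug level keeps production logs quiet while still letting maintainers collect the unparsed values when they want to improve the parser.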
udata/rdf.py
Outdated
        resulting JSON-LD.

        See: https://github.com/RDFLib/rdflib-jsonld/blob/master/rdflib_jsonld/serializer.py#L101-L103
        '''  # noqa
If you put only a URL on one line, the linter shouldn't complain even without a # noqa comment. At least mine 😉
Mine is crying without 😢
This PR adds DCAT support:
- `/context.jsonld`
- `/catalog`
- `/catalog.xml`
- `/catalog.ttl`
- `/catalog.json(ld)`
- `/catalog.n3`
- `/catalog.nt`
- `/datasets/{id}/rdf`
- `{lang}/datasets/{id}/`
- `/datasets/{id}/rdf.xml`
- `/datasets/{id}/rdf.ttl`
- `/datasets/{id}/rdf.json(ld)`
- `/datasets/{id}/rdf.n3`
- `/datasets/{id}/rdf.nt`
- `/data.{json,xml,ttl}`

These points can/should be done in other PRs: