Add harvested dcat properties as extras #2968

Merged
merged 4 commits into from Feb 22, 2024

3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,9 @@
 - Fix: do not send mail about discussions when there is no owner / no organisation members [#2962](https://github.com/opendatateam/udata/pull/2962)
 - Fix: 'backend' is now required in `HarvestSource` [#2962](https://github.com/opendatateam/udata/pull/2962)
 - Fix: URL to organizations in mails are now independent from `udata-front` (show the URL of the API if no `udata-front`) [#2962](https://github.com/opendatateam/udata/pull/2962)
+- Add harvested dcat properties as extras [#2968](https://github.com/opendatateam/udata/pull/2968):
+  - DCT.provenance [0..n]
+  - DCT.accessRights [0..1]

 ## 7.0.3 (2024-02-15)

16 changes: 15 additions & 1 deletion udata/core/dataset/rdf.py
@@ -17,7 +17,7 @@
 from udata.core.dataset.models import HarvestDatasetMetadata, HarvestResourceMetadata
 from udata.models import db, ContactPoint
 from udata.rdf import (
-    DCAT, DCT, FREQ, SCV, SKOS, SPDX, SCHEMA, EUFREQ, EUFORMAT, IANAFORMAT, VCARD,
+    DCAT, DCT, FREQ, SCV, SKOS, SPDX, SCHEMA, EUFREQ, EUFORMAT, IANAFORMAT, VCARD, RDFS,
     namespace_manager, schema_from_rdf, url_from_rdf
 )
 from udata.utils import get_by, safe_unicode
@@ -521,6 +521,20 @@ def dataset_from_rdf(graph, dataset=None, node=None):
     if temporal_coverage:
         dataset.temporal_coverage = temporal_from_rdf(d.value(DCT.temporal))

+    # Adding some metadata to extras - may be moved to property if relevant
+    access_rights = rdf_value(d, DCT.accessRights)
+    if access_rights:
+        dataset.extras["harvest"] = {
+            "dct:accessRights": access_rights,
+            **dataset.extras.get("harvest", {})
+        }
+    provenance = [p.value(RDFS.label) for p in d.objects(DCT.provenance)]
+    if provenance:
+        dataset.extras["harvest"] = {
+            "dct:provenance": provenance,
+            **dataset.extras.get("harvest", {})
+        }
+
     licenses = set()
     for distrib in d.objects(DCAT.distribution | DCAT.distributions):
         resource_from_rdf(distrib, dataset)
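
Reviewer note: the two blocks added to `dataset_from_rdf` build `extras["harvest"]` with the new key first and the existing entries unpacked last. The following is a minimal standalone sketch of that pattern, not udata code: `merge_harvest_extra` is a hypothetical helper, `extras` is a plain dict standing in for `dataset.extras`, and the `example.org` URLs are placeholders.

```python
def merge_harvest_extra(extras, key, value):
    """Store `value` under extras["harvest"][key], using the same dict
    literal shape as the diff: new key first, existing entries unpacked last."""
    if value:  # None, empty strings and empty lists are skipped, as in the diff
        extras["harvest"] = {key: value, **extras.get("harvest", {})}
    return extras


# First harvest: both properties end up side by side under "harvest".
extras = {}
merge_harvest_extra(extras, "dct:accessRights", "http://example.org/rights/public")
merge_harvest_extra(extras, "dct:provenance", ["Provenance statement label"])
first_pass = dict(extras["harvest"])

# Simulated re-harvest with a changed value: the stored entries are unpacked
# after the new key, so the previously stored value is the one that survives.
merge_harvest_extra(extras, "dct:accessRights", "http://example.org/rights/restricted")
second_pass = dict(extras["harvest"])
```

In this sketch `second_pass["dct:accessRights"]` still holds the first value. Whether keeping the stored value on re-harvest is intended is not stated in this PR; placing the `**` unpacking before the new key would let the freshly harvested value win instead.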
7 changes: 7 additions & 0 deletions udata/harvest/tests/dcat/catalog.xml
@@ -3,6 +3,7 @@
   xmlns:foaf="http://xmlns.com/foaf/0.1/"
   xmlns:owl="http://www.w3.org/2002/07/owl#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:dcat="http://www.w3.org/ns/dcat#"
   xmlns:dct="http://purl.org/dc/terms/"
   xmlns:dcterms="http://purl.org/dc/terms/"
@@ -24,6 +25,7 @@
         <dcat:keyword>Tag 1</dcat:keyword>
         <dcat:distribution rdf:resource="datasets/3/resources/1"/>
         <dct:license>Licence Ouverte Version 2.0</dct:license>
+        <dct:accessRights rdf:resource="http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/INSPIRE_Directive_Article13_1e"/>
         <dcat:landingPage>http://data.test.org/datasets/3</dcat:landingPage>
         <dct:accrualPeriodicity xmlns:dct="http://purl.org/dc/terms/">daily</dct:accrualPeriodicity>
         <dct:temporal>
@@ -32,6 +34,11 @@
             <schema:endDate rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2016-12-05T00:00:00</schema:endDate>
           </dcterms:PeriodOfTime>
         </dct:temporal>
+        <dct:provenance>
+          <dct:ProvenanceStatement>
+            <rdfs:label xml:lang="fr">Description de la provenance des données</rdfs:label>
+          </dct:ProvenanceStatement>
+        </dct:provenance>
       </dcat:Dataset>
     </dcat:dataset>
     <dcterms:title>Sample DCAT Catalog</dcterms:title>
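
Reviewer note: to make explicit what the two new fixture elements encode, here is a rough standard-library sketch that pulls the same values out of a trimmed copy of the fixture. It is an illustration only: udata reads the catalog through `rdflib` as shown in `rdf.py`, and the `SAMPLE` document, the `NS` mapping and the `harvest_extras` function are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used by the fixture (same URIs as in catalog.xml).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Trimmed, hypothetical stand-in for the fixture: one dataset, two properties.
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset>
    <dct:accessRights rdf:resource="http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/INSPIRE_Directive_Article13_1e"/>
    <dct:provenance>
      <dct:ProvenanceStatement>
        <rdfs:label xml:lang="fr">Description de la provenance des données</rdfs:label>
      </dct:ProvenanceStatement>
    </dct:provenance>
  </dcat:Dataset>
</rdf:RDF>"""


def harvest_extras(xml_text):
    """Return the accessRights URI (0..1) and provenance labels (0..n)."""
    dataset = ET.fromstring(xml_text).find("dcat:Dataset", NS)
    extras = {}
    rights = dataset.find("dct:accessRights", NS)
    if rights is not None:
        # The value is carried by the rdf:resource attribute, not by text.
        extras["dct:accessRights"] = rights.get("{%s}resource" % NS["rdf"])
    labels = [
        label.text
        for label in dataset.findall(
            "dct:provenance/dct:ProvenanceStatement/rdfs:label", NS
        )
    ]
    if labels:
        extras["dct:provenance"] = labels
    return extras


result = harvest_extras(SAMPLE)
```

The shapes match the cardinalities listed in the changelog entry: a single string for `dct:accessRights` and a list of labels for `dct:provenance`.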
3 changes: 3 additions & 0 deletions udata/harvest/tests/test_dcat_backend.py
@@ -335,6 +335,9 @@ def test_xml_catalog(self, rmock):
         assert dataset.temporal_coverage.start == date(2016, 1, 1)
         assert dataset.temporal_coverage.end == date(2016, 12, 5)

+        assert dataset.extras["harvest"]["dct:accessRights"] == "http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/INSPIRE_Directive_Article13_1e"
+        assert dataset.extras["harvest"]["dct:provenance"] == ["Description de la provenance des données"]
+
         dataset = Dataset.objects.get(harvest__dct_identifier='1')
         # test html abstract description support
         assert dataset.description == '# h1 title\n\n## h2 title\n\n **and bold text**'