This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Add migration from ods to dcat #247

Merged
merged 16 commits into from Jan 9, 2024


@maudetes maudetes commented May 26, 2023

Fix datagouv/data.gouv.fr#1086
Update HarvestSource as well as harvested Datasets to migrate to the DCAT catalog endpoint.

@maudetes maudetes marked this pull request as draft May 26, 2023 13:00
@maudetes maudetes marked this pull request as ready for review January 5, 2024 14:15
Add lang=fr as a workaround for now, due to a bug in the ODS DCAT export when filtering on (localized?) keywords.
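The workaround amounts to making sure the DCAT export URL carries a lang=fr query parameter. A minimal sketch of that step, assuming a hypothetical helper named with_lang and a placeholder URL (the actual export path used by the harvester is not shown in this conversation):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_lang(url, lang="fr"):
    """Return `url` with its `lang` query parameter set (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any previous `lang` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "lang"
    ]
    query.append(("lang", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL: the host and path are illustrative only.
print(with_lang("https://example.org/catalog/dcat?q=test"))
```

Rebuilding the query string with urllib.parse, rather than concatenating "&lang=fr", avoids duplicating the parameter if it is already present.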
@maudetes maudetes merged commit cfec4f7 into master Jan 9, 2024
1 check passed
@maudetes maudetes deleted the chore/migrate-ods-to-dcat branch January 9, 2024 09:51

maudetes commented Jan 23, 2024

A correction was needed because some domains registered in harvest sources redirect to another domain, and it is that other domain that appears in the DCAT results.
Here is the script written for this correction:

from datetime import datetime
from udata.app import create_app
from udata.models import Dataset
from udata.harvest.models import HarvestSource


# Mapping of old domains -> new domains
source_mapping = {
    'https://sdem.opendatasoft.com': 'https://www.opendata56.fr',
    'https://paysdelaloire.opendatasoft.com': 'https://data.paysdelaloire.fr',
    'https://enedis.opendatasoft.com': 'https://data.enedis.fr',
    'http://data.haute-garonne.fr': 'https://data.haute-garonne.fr',
    'http://breizh.opendatasoft.com': 'https://data.bretagne.bzh',
    'http://data.ratp.fr': 'https://data.ratp.fr',
    'http://data.laregion.fr': 'https://data.laregion.fr',
    'https://datainfogreffe.fr': 'https://opendata.datainfogreffe.fr',
    'http://opendata.stif.info': 'https://data.iledefrance-mobilites.fr',
}


def rename(url, domain):
    return url.replace(domain, source_mapping[domain])

app = create_app()

with app.app_context():
    for domain in source_mapping:
        for source in HarvestSource.objects(url__contains=domain, validation__state="accepted"):
            print(source.id)
            # Point the source at its new domain
            source.url = rename(source.url, domain)
            source.save()

            # Mark as deleted the duplicate datasets created under the new domain
            for dat in Dataset.objects(
                harvest__source_id=str(source.id),
                harvest__remote_id__contains=source_mapping[domain],
                created_at_internal__gte="2024-01-08",
            ):
                # print(f"Deleting: {dat.harvest.remote_id}")
                dat.deleted = datetime.now()
                dat.save()

            # Rename the old datasets and their resources to the new domain
            for dat in Dataset.objects(
                harvest__source_id=str(source.id),
                harvest__remote_id__contains=domain,
                created_at_internal__lte="2024-01-08",
            ):
                # print(f"Renaming: {dat.harvest.remote_id} -> {rename(dat.harvest.remote_id, domain)}")
                dat.harvest.remote_id = rename(dat.harvest.remote_id, domain)
                for res in dat.resources:
                    res.url = rename(res.url, domain)
                dat.save()
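
The script relies on plain substring replacement through str.replace. As a possible variation, not what was run here, the renaming could be limited to the start of the URL and previewed before anything is saved. A minimal sketch, where rename_prefix and preview are hypothetical names, the mapping is a two-entry excerpt, and the sample URLs are illustrative:

```python
# Two-entry excerpt of the old domain -> new domain mapping.
SOURCE_MAPPING = {
    "https://enedis.opendatasoft.com": "https://data.enedis.fr",
    "http://data.ratp.fr": "https://data.ratp.fr",
}


def rename_prefix(url, mapping=SOURCE_MAPPING):
    """Swap the old domain for the new one only when `url` starts with it."""
    for old, new in mapping.items():
        if url.startswith(old):
            return new + url[len(old):]
    return url  # No known old domain: leave the URL untouched.


def preview(urls, mapping=SOURCE_MAPPING):
    """Return (before, after) pairs for the URLs that would change (dry run)."""
    pairs = [(url, rename_prefix(url, mapping)) for url in urls]
    return [(before, after) for before, after in pairs if before != after]


# Illustrative URLs, not taken from the production data.
sample = [
    "https://enedis.opendatasoft.com/explore/dataset/example",
    "https://data.ratp.fr/explore/dataset/example",
]
for before, after in preview(sample):
    print(f"{before} -> {after}")
```

Only the first sample URL is reported, since the second one already uses the new domain.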

Successfully merging this pull request may close these issues.

Testing and validation of the ODS -> DCAT migration