Mapping data packages to DCAT2 #551
@augusto-herrmann sure, do you want to prep one? We would generally like a "json-ld" context for data packages (frictionlessdata/datapackage#218).
@rufuspollock I did search the repository for JSON-LD, but somehow I missed those previous threads. Sorry. 😳

Anyway, for this issue, what I had in mind is smaller in scope: just mapping the data package metadata to DCAT2 classes and properties. DCAT2, as a W3C recommendation, is very recent (less than a month old). JSON-LD would be just a tool for achieving that goal.

Why is that important? Because it would make it easier for data catalogs (such as CKAN, with the DCAT plugin) to automate importing data packages as datasets.

I am also well aware of the previous failed attempts to reconcile data packages with the W3C's CSVW, and I'm not trying to delve into that at the moment. That, and fully converting the data to linked data, is a much harder problem that should be considered out of scope for what I'm proposing here.

I can try to sketch something, but I wanted to first make sure that it made sense, so as not to waste effort.
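As a rough illustration of what mapping the data package metadata to DCAT2 classes and properties could mean in practice, here is a minimal Python sketch. The property pairings in `CROSSWALK` and the function name `to_dcat_properties` are illustrative assumptions, not the mapping agreed in this issue. Values are returned as plain strings, leaving open which ones (e.g. `homepage`) should become IRIs rather than literals.

```python
"""Sketch: a crosswalk from top-level Data Package properties to DCAT 2 terms.

The pairings below are illustrative assumptions, not an agreed mapping.
"""

DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Data Package property -> DCAT 2 / Dublin Core property IRI (assumed pairings)
CROSSWALK = {
    "id": DCT + "identifier",
    "title": DCT + "title",
    "description": DCT + "description",
    "keywords": DCAT + "keyword",
    "homepage": DCAT + "landingPage",
}


def to_dcat_properties(package: dict) -> list[tuple[str, str]]:
    """Return (predicate IRI, value) pairs for every mapped property.

    List-valued properties such as `keywords` become one pair per item;
    properties without a mapping (e.g. `name`) are skipped.
    """
    pairs = []
    for key, predicate in CROSSWALK.items():
        value = package.get(key)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        pairs.extend((predicate, str(v)) for v in values)
    return pairs


if __name__ == "__main__":
    package = {
        "name": "example-package",
        "title": "Example package",
        "keywords": ["climate", "energy"],
    }
    for predicate, value in to_dcat_properties(package):
        print(predicate, value)
```

The pairs follow the order of `CROSSWALK`, so the example prints the `dct:title` pair first, then one `dcat:keyword` pair per keyword.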
@augusto-herrmann 👍 on doing the mapping - go for it.
@augusto-herrmann any update here 😄 ?
I don't have a lot of time to devote to this right now, but here are some steps I thought of for undertaking this:
@augusto-herrmann we now have a spreadsheet via frictionlessdata/forum#11 and I've added a sheet for DCAT 2: https://docs.google.com/spreadsheets/d/1XdqGTFni5Jfs8AMbcbfsP7m11h9mOHS0eDtUZtqGVSg/edit#gid=729988073. Would you be up for adding the list of DCAT 2 attributes there?
Sure. I'm working on it.
@augusto-herrmann how is this going?
I remember doing some of that work last year, but I need to check where I left off (I no longer remember) and resume it when I have some time.
Here is how much of it is done. Classes:
I expect, though, that most of the latter either don't map to Frictionless or are less relevant.
Hi, thanks for all of the existing work on mappings. I'd like to help progress this and map out what's left. @augusto-herrmann, are you continuing to work on this?

Initially I plan to focus solely on mapping the Table Schema, as that format fits the majority of the datasets we [in the project described below] will be working with. I appreciate that some of what's covered in this post might be slightly out of scope for this issue; let me know if it would be more useful to create a new one.

Context

I'm working on a project where the metadata of the datasets will be expressed in RDF. We're looking at using the Frictionless Data specifications and wider tools as a way to reduce the friction for users [of the climate data hub we're working on] to generate the RDF metadata. Specifically, one user story would be using Data Package Creator to generate a datapackage.json; we'll create a Python library that can map from the datapackage.json to an RDF representation, though the use of a JSON-LD context could enable the datapackage.json to be understood as RDF directly (removing the need for a secondary mapping step).

Reasons we ideally want to have an RDF metadata representation:
Key components needed to enable this:
Existing FD/RDF Discussions

I thought it might be handy to summarise some of the existing discussions relating to using RDF for Frictionless Data packages:
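The comment above notes that a JSON-LD context could let a datapackage.json be read as RDF directly, without a secondary mapping step. A minimal sketch of that idea follows, assuming a hypothetical context (`DCAT2_CONTEXT`, with term pairings that are not an agreed Frictionless mapping) and a hypothetical helper `as_jsonld`. It relies on the Data Package spec allowing additional properties, so the `@context` and `@type` keys should not disturb existing tools, though that is worth verifying per tool.

```python
"""Sketch: letting an existing datapackage.json be read as JSON-LD by adding
an @context that maps its keys onto DCAT 2 and Dublin Core terms.

DCAT2_CONTEXT is a hypothetical example, not a published Frictionless context.
"""
import copy
import json

DCAT2_CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "title": "dct:title",
    "description": "dct:description",
    "keywords": "dcat:keyword",
    # "@type": "@id" tells JSON-LD processors to treat these values as IRIs.
    "homepage": {"@id": "dcat:landingPage", "@type": "@id"},
    # Resource objects become untyped nodes; a context alone cannot type them.
    "resources": "dcat:distribution",
    "path": {"@id": "dcat:downloadURL", "@type": "@id"},
}


def as_jsonld(package: dict) -> dict:
    """Return a copy of `package` with an @context and a dcat:Dataset type added.

    Keys without a context entry (e.g. "name") are dropped when a JSON-LD
    processor expands the document; the original keys are left unchanged.
    """
    doc = copy.deepcopy(package)
    doc["@context"] = copy.deepcopy(DCAT2_CONTEXT)
    doc["@type"] = "dcat:Dataset"
    return doc


if __name__ == "__main__":
    package = {
        "name": "rainfall",
        "title": "Rainfall",
        "resources": [{"name": "daily", "path": "daily.csv"}],
    }
    print(json.dumps(as_jsonld(package), indent=2))
```

The trade-off of this approach is that only keys listed in the context carry RDF meaning; anything else stays plain JSON.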
Wow, @AyrtonB, this seems like an amazing project! 😀 Also, you've made an awesome summary there of the discussions surrounding Frictionless and RDF. Good job!

A month ago I progressed a little bit more on the mapping, but I was taking this slowly, because I'm always busy with lots of other things. If you've got a dedicated project to take this forward, it makes total sense for you to take it over from here. The checklist above pretty much marks the point where I left off, so you could continue the mapping by reviewing the parts I've already done and then doing what is still left.
Hi @AyrtonB! This looks awesome! I think you've done a great job summarizing the work that needs to be done and the current situation. I'm on the Frictionless team (with @roll) and would be happy to support you if you have questions or need help. Communicating on GitHub works very well for us, but if you want to have a call to chat, let me know :-)
Thanks, that's handy to know @augusto-herrmann; sounds good, I'll work from that. @lwinfree that would be really helpful, thank you. I think for this specific issue the next steps are pretty clear, and I'll try to keep the conversation in this thread for now.

For the project I'm working on, we're also looking at creating a custom schema for 'data dictionaries' that can act as a central link between different datasets. For example, each original dataset would only have to specify one foreignKey that maps to the primaryKey in a data dictionary, removing duplication and ensuring that only one central dataset needs to be updated when a new dataset is mapped in. It would be really useful to have a call about implementing this use case if possible :)
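The data dictionary idea above can be sketched with plain Table Schema descriptors: each dataset declares one `foreignKeys` entry whose `reference` points at the dictionary resource's `primaryKey`. In the sketch below, the resource name `data-dictionary`, the field names, the rows and the helper `missing_references` are all hypothetical examples, and only single-field keys are handled.

```python
"""Sketch of the 'data dictionary' idea: a dataset declares one foreignKey that
points at the primaryKey of a shared dictionary resource.

Resource names, field names and rows are hypothetical examples.
"""

# Table Schema of the shared data dictionary: its primaryKey is the single source of codes.
dictionary_schema = {
    "fields": [{"name": "id", "type": "string"}, {"name": "label", "type": "string"}],
    "primaryKey": "id",
}

# Table Schema of one dataset: a single foreignKey into the dictionary resource.
dataset_schema = {
    "fields": [{"name": "site_id", "type": "string"}, {"name": "value", "type": "number"}],
    "foreignKeys": [
        {"fields": "site_id", "reference": {"resource": "data-dictionary", "fields": "id"}}
    ],
}

dictionary_rows = [{"id": "S1", "label": "Site one"}, {"id": "S2", "label": "Site two"}]
measurement_rows = [{"site_id": "S1", "value": 1.5}, {"site_id": "S3", "value": 2.0}]


def missing_references(rows, foreign_key, dictionary):
    """Return the foreign-key values in `rows` that are absent from `dictionary`.

    Only single-field keys written as strings are handled; Table Schema also
    allows lists of field names for composite keys.
    """
    local_field = foreign_key["fields"]
    target_field = foreign_key["reference"]["fields"]
    known = {row[target_field] for row in dictionary}
    return [row[local_field] for row in rows if row[local_field] not in known]


if __name__ == "__main__":
    fk = dataset_schema["foreignKeys"][0]
    # The reference targets the dictionary's primary key column.
    assert fk["reference"]["fields"] == dictionary_schema["primaryKey"]
    print(missing_references(measurement_rows, fk, dictionary_rows))  # ['S3']
```

Because every dataset points at the same key column, adding a new site only means adding a row to the dictionary.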
Sounds great @AyrtonB!
Update Overview

I've made some progress towards creating a parser for generating an RDF representation of the Table Schema datapackage.json. Currently I have a Python script that includes a

This is very much a pilot, but it has been useful for working out which terms in different ontologies could be useful, as well as for identifying how best to approach the parsing with Python. Next, I plan to create an ontology for the Frictionless Data spec that includes a tableSchema object (building on existing specs); I'll then use the concepts described in it to refactor the Python code.

Progression
Open Questions
Next Steps
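For a rough idea of the parsing step described in the update above, this sketch turns the `fields` of a Table Schema into N-Triples using only the standard library. The `FD` namespace (`https://example.org/frictionless#`), its terms `field`, `Field`, `fieldName` and `fieldType`, and the `/field/<index>` IRI pattern are hypothetical placeholders for the planned ontology; only `rdf:type` is a real vocabulary term. In practice a library such as rdflib would be the more robust choice.

```python
"""Sketch: emitting N-Triples for the fields of a Table Schema.

FD and its terms (field, Field, fieldName, fieldType) are hypothetical
placeholders for the planned Frictionless ontology.
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FD = "https://example.org/frictionless#"  # hypothetical namespace


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def schema_triples(schema_iri: str, schema: dict) -> list[str]:
    """Return N-Triples lines linking the schema to one node per field."""
    lines = []
    for index, field in enumerate(schema.get("fields", [])):
        field_iri = f"<{schema_iri}/field/{index}>"
        lines.append(f"<{schema_iri}> <{FD}field> {field_iri} .")
        lines.append(f"{field_iri} <{RDF_TYPE}> <{FD}Field> .")
        lines.append(f"{field_iri} <{FD}fieldName> {literal(field['name'])} .")
        # Table Schema treats a field without "type" as a string field.
        lines.append(f"{field_iri} <{FD}fieldType> {literal(field.get('type', 'string'))} .")
    return lines


if __name__ == "__main__":
    schema = {"fields": [{"name": "site_id"}, {"name": "value", "type": "number"}]}
    print("\n".join(schema_triples("https://example.org/schemas/measurements", schema)))
```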
That is a cool experiment! Have you already finished the mapping I started on the Google Spreadsheet linked above? I think we should only start developing practical implementations once the mapping of concepts between Tabular Data Packages and DCAT2 is complete; otherwise we may end up with unnecessary repetition.
Why would creating a JSON-LD
Perhaps data packages and resources should not be blank nodes, but identified by URIs instead. There could be a well-defined way to create those URIs, appending hashes at the end, something like
DCAT2 distinguishes between a dataset and a distribution: the former is more abstract, and the latter is a concrete representation or serialization of the same data. While this makes philosophical sense, from a pragmatic point of view there is no point in making that separation. In Frictionless, the different representations of the same data are just different resources in the same data package. It is not an exact match, but maybe the
I do not remember exactly, but I think the main problem is that CSVW makes using RDF mandatory and many in the community would rather make it optional.
I think the initial scope of this issue was just mapping data packages in a general sense, which may not even be tabular. Table Schema could be a possible next step after this is done.
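Following the two suggestions above (IRIs instead of blank nodes, and Frictionless resources standing in for DCAT distributions), and keeping to general, not necessarily tabular, packages, a minimal sketch might look like this. The `#<resource name>` IRI scheme, using the datapackage.json URL as the base IRI, and the `path` to `dcat:downloadURL` and `mediatype` to `dcat:mediaType` pairings are assumptions, not an agreed mapping; `dataset` and `distribution` are hypothetical helper names.

```python
"""Sketch: a package as an IRI-identified dcat:Dataset whose resources are
IRI-identified dcat:Distributions, emitted as a JSON-LD node.

Assumptions (not an agreed mapping): the base IRI is where the
datapackage.json is published; each resource IRI is base#<resource name>;
path -> dcat:downloadURL, mediatype -> dcat:mediaType (IANA IRI).
"""
import json
from urllib.parse import quote, urldefrag

CONTEXT = {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"}
IANA = "https://www.iana.org/assignments/media-types/"


def distribution(package_iri: str, resource: dict) -> dict:
    """One dcat:Distribution node per Frictionless resource."""
    node = {
        "@id": f"{package_iri}#{quote(resource['name'], safe='')}",
        "@type": "dcat:Distribution",
    }
    if "title" in resource:
        node["dct:title"] = resource["title"]
    if "path" in resource:
        # Data Package allows one path or a list of paths (multipart data).
        # Relative paths resolve against the document base, like Data Package paths.
        paths = resource["path"] if isinstance(resource["path"], list) else [resource["path"]]
        node["dcat:downloadURL"] = [{"@id": p} for p in paths]
    if "mediatype" in resource:
        node["dcat:mediaType"] = {"@id": IANA + resource["mediatype"]}
    return node


def dataset(base_iri: str, package: dict) -> dict:
    """The package as a dcat:Dataset identified by base_iri (fragment stripped)."""
    package_iri, _ = urldefrag(base_iri)
    node = {"@context": CONTEXT, "@id": package_iri, "@type": "dcat:Dataset"}
    if "title" in package:
        node["dct:title"] = package["title"]
    node["dcat:distribution"] = [
        distribution(package_iri, r) for r in package.get("resources", [])
    ]
    return node


if __name__ == "__main__":
    package = {
        "name": "rainfall",
        "title": "Rainfall",
        "resources": [{"name": "daily", "path": "daily.csv", "mediatype": "text/csv"}],
    }
    print(json.dumps(dataset("https://example.org/rainfall/datapackage.json", package), indent=2))
```

Deriving resource IRIs from the package IRI keeps them stable as long as the datapackage.json URL and resource names do not change.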
I think this mapping should enable conversion of metadata from basic Data Package -> DCAT2 and DCAT2 -> Data Package with as little information loss as possible, except, for the moment, for profiles such as Table Schema, of course.
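One way to make the "as little information loss as possible" goal measurable is to run each package through both directions of a crosswalk and report what did not survive. This is a minimal sketch, assuming a hypothetical four-property crosswalk (`PAIRS`) with DCAT terms written as compact IRIs; in it, `name` and `resources` are reported as lost simply because they have no pairing here.

```python
"""Sketch: a two-way crosswalk and a check of what survives
Data Package -> DCAT 2 -> Data Package.

The property pairs are assumptions; DCAT terms are written as compact IRIs.
"""

PAIRS = {
    "title": "dct:title",
    "description": "dct:description",
    "keywords": "dcat:keyword",
    "homepage": "dcat:landingPage",
}
INVERSE = {dcat: dp for dp, dcat in PAIRS.items()}


def to_dcat(package: dict) -> dict:
    """Keep only mapped Data Package properties, renamed to DCAT terms."""
    return {PAIRS[k]: v for k, v in package.items() if k in PAIRS}


def from_dcat(record: dict) -> dict:
    """Map DCAT terms back to Data Package property names."""
    return {INVERSE[k]: v for k, v in record.items() if k in INVERSE}


def lost_properties(package: dict) -> set[str]:
    """Keys that do not come back unchanged after a round trip."""
    back = from_dcat(to_dcat(package))
    return {k for k in package if back.get(k) != package[k]}


if __name__ == "__main__":
    package = {"name": "rainfall", "title": "Rainfall", "keywords": ["climate"], "resources": []}
    print(sorted(lost_properties(package)))  # ['name', 'resources']
```

Growing `PAIRS` (and adding a resource-to-distribution step) should shrink the reported set; whatever remains is the information loss to document.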
The differences between
Hi @AyrtonB and @augusto-herrmann! I'm working on cleaning up issues in this repo and wanted to touch base with you two about this. Thanks for all the work you've done so far! Are you still interested in working on this? There is no pressure from me, and no time crunch, I'm just inquiring about the status :-) Hope you are both doing OK!
Hi @lwinfree, thanks for bringing this up. Yes, I am still interested in this issue, but I don't have much time to dedicate to solving it. Last I checked, some effort was still required to finish the mapping between the f11d specs and the DCAT2 classes and properties in RDF. I'm curious to find out how far @AyrtonB has been able to progress with it or, in more detail, to hear answers to the questions I asked above, if possible.
Hi people, I have implemented an initial version of a DCAT mapper (heavily based on PS.
The W3C has just published version 2 of the Data Catalog Vocabulary (DCAT) as a Recommendation. Should there be a mapping from the Data Package specification to DCAT2?
If so, I think it would be possible to implement with a JSON-LD context.
WDYT, @lwinfree, @rufuspollock, @roll ?