Add possibility to describe attributes of a dataset/distribution #183

Open
metaodi opened this issue Oct 27, 2021 · 11 comments

@metaodi
Member

metaodi commented Oct 27, 2021

To my knowledge there is currently no way to describe attributes of a dataset (e.g. columns of a CSV). This would include the following information (minimal):

  • name of the attribute
  • description of the attribute
  • datatype/allowed values of the attribute
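
As a rough illustration of these three fields, here is a minimal Python sketch of one attribute description, serialized in the style of a Frictionless Table Schema field descriptor (`name`, `type`, `description`, `constraints.enum`). The class name `Attribute`, the method `to_field`, and the example column are hypothetical and not taken from any existing catalogue.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Attribute:
    """One column of a tabular distribution (hypothetical structure)."""

    name: str
    description: str
    datatype: str
    allowed_values: List[str] = field(default_factory=list)

    def to_field(self) -> Dict[str, Any]:
        """Return a dict shaped like a Table Schema field descriptor."""
        descriptor: Dict[str, Any] = {
            "name": self.name,
            "type": self.datatype,
            "description": self.description,
        }
        # Only emit constraints when allowed values were actually given.
        if self.allowed_values:
            descriptor["constraints"] = {"enum": list(self.allowed_values)}
        return descriptor


# Made-up example column, for illustration only.
direction = Attribute(
    name="direction",
    description="Direction of travel at the counting site",
    datatype="string",
    allowed_values=["inbound", "outbound"],
)
field_descriptor = direction.to_field()
```

Keeping allowed values as an optional constraint means the same structure covers both free-form columns and code lists.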

On http://data.stadt-zuerich.ch we provide this information on a dataset level (i.e. it does not differ between distributions).

Example: Daten der Verkehrszählung zum motorisierten Individualverkehr (Stundenwerte), seit 2012

Attributes of a dataset

@AFoletti

+1 ! 👍
This is to be honest an ongoing discussion of mine with the opendata.swiss team. It is possible (that's the "Data Dictionary" part of the Datastore plugin, which is present on the opendata.swiss CKAN installation) and available in the CKAN edit interface, but not implemented in the frontend GUI.
I did not really understand the reason why it's a... aehm... half-assed implementation at the moment. But it's nice to see someone other than me finds it useful 😉

@Juan-Juan-1

@metaodi very interesting. This seems to be what in statistics is called https://ec.europa.eu/eurostat/web/sdmx-web-services/data-struct-def. It's definitely worth discussing. A couple of questions:

  • To my knowledge this information is usually not provided through the DCAT-Layer. Do you know of any example?
  • Wouldn't it be more efficient, especially from a user perspective, to provide this information as a separate resource, i.e. a downloadable resource?

@AFoletti

The information on the page and as a downloadable resource (frictionless datapackage or similar) are in my opinion complementary. It is nice for a power user to have the datapackage, but you also have to account for the more casual audience unable to work with such a file. For those, a table with the attribute descriptions and types could do wonders to correctly understand the data.
Of course, just my two cents
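
To make the "both are complementary" point concrete: the same machine-readable field list could feed the download and the on-page table. Below is a minimal sketch, assuming the field list is a plain list of dicts with `name`, `type` and `description` keys. The function `render_attribute_table` and the sample fields are made up for illustration and are not part of CKAN or opendata.swiss.

```python
from typing import Dict, List


def render_attribute_table(fields: List[Dict[str, str]]) -> str:
    """Render attribute descriptions as a plain-text table."""
    headers = ["name", "type", "description"]
    rows = [[f.get(h, "") for h in headers] for f in fields]
    # Column width = longest cell in that column, header included.
    widths = [
        max([len(h)] + [len(row[i]) for row in rows])
        for i, h in enumerate(headers)
    ]

    def fmt(cells: List[str]) -> str:
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(headers), "-+-".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


# Hypothetical fields, e.g. loaded from a datapackage descriptor.
schema_fields = [
    {"name": "timestamp", "type": "datetime", "description": "Start of the counting hour"},
    {"name": "count", "type": "integer", "description": "Number of vehicles counted"},
]
table = render_attribute_table(schema_fields)
print(table)
```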

@metaodi
Member Author

metaodi commented Oct 27, 2021

I think it's an important part of the metadata to be able to find/search for attributes.

On data.stadt-zuerich.ch all attributes and their descriptions are part of the search index, so you can find a dataset by the description of its data.
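
To illustrate the idea only (this is not how the portal's search index works, which I have not reproduced here), a naive sketch of finding datasets through their attribute descriptions could look like this. The function `search_datasets` and the sample catalogue are hypothetical.

```python
from typing import Dict, List

Catalogue = Dict[str, List[Dict[str, str]]]


def search_datasets(catalogue: Catalogue, query: str) -> List[str]:
    """Return ids of datasets with an attribute name or description matching the query."""
    needle = query.casefold()
    hits: List[str] = []
    for dataset_id, attributes in catalogue.items():
        for attribute in attributes:
            text = f"{attribute.get('name', '')} {attribute.get('description', '')}"
            # Simple case-insensitive substring match; a real index would tokenize.
            if needle in text.casefold():
                hits.append(dataset_id)
                break
    return hits


# Made-up catalogue for illustration.
catalogue = {
    "traffic-counts": [{"name": "count", "description": "Number of vehicles per hour"}],
    "air-quality": [{"name": "no2", "description": "Nitrogen dioxide concentration"}],
}
matches = search_datasets(catalogue, "vehicles")
```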

I honestly don't know why this is not part of DCAT so far. But I'm sure this is the reason for its current implementation on opendata.swiss 😉

@Juan-Juan-1

Just fyi: Some data publishers still found a way to somehow bring this information to the users:
https://opendata.swiss/de/dataset/covid-19-schweiz
But yeah, I agree it should be easier to do and maybe in a more visible fashion.

@sabinem
Contributor

sabinem commented Oct 30, 2021

@metaodi Your issue really resonates with me, since this was also a question that was sort of always on my mind. I come from the data science side myself, and without a proper description of the fields, tabular data such as CSV files can't really be used for data analysis.

But this is not an issue of DCAT-AP CH: it is already built into DCAT, which does not offer any vocabulary in that regard.

Therefore, inspired by your cause, I raised an issue with DCAT to better understand DCAT's reasoning on this. The discussion there might interest you and maybe you also want to join in: w3c/dxwg#1418

@Juan-Juan-1

I feel that DCAT doesn't and shouldn't have too much

I agree... DCAT is the upper, "generic" information layer on data (data catalogue vocabulary) with interoperability as a primary goal - it shouldn't go too deep and mix with domain standards like SDMX, FHIR,... it should just reference the necessary information to understand and use data (see for instance https://www.w3.org/TR/vocab-dcat-2/#Property:distribution_conforms_to). Yet having a standardized form to describe and present variables could be really valuable...!

@metaodi
Member Author

metaodi commented Nov 2, 2021

Comment by @makxdekkers in w3c/dxwg#1418:

I agree with @rob-metalinkage that adding specificity to the 'general' property conformsTo is the role of a profile. For example, the European DCAT-AP adds details: for Dataset, it refers to "an implementing rule or other specification" while for Distribution, it specifies "an established schema". Both fit in the general semantics of conformsTo. But if for some reason, an application would find this still too vague -- maybe because of a stronger need for validation -- the profile could create subproperties of conformsTo, e.g. conformsToSpec and conformsToSchema.

Maybe then this group could investigate whether there is a set of 'common' subproperties of conformsTo for the description of datasets that could be added to DCAT?

So this could very well be something DCAT-AP Switzerland could define without violating the DCAT Standard.
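
A rough sketch of what such a profile-level definition might look like, written as a JSON-LD structure built in Python. The property names `conformsToSpec` and `conformsToSchema` come from the quoted comment; the `ex:` namespace `https://example.org/profile#` is a placeholder and not a real DCAT-AP CH namespace.

```python
import json

# The "ex" prefix is a hypothetical namespace, used for illustration only.
profile_vocabulary = {
    "@context": {
        "dct": "http://purl.org/dc/terms/",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "ex": "https://example.org/profile#",
    },
    "@graph": [
        {
            "@id": f"ex:{name}",
            "@type": "rdf:Property",
            # Each specialised property stays within the semantics of dct:conformsTo.
            "rdfs:subPropertyOf": {"@id": "dct:conformsTo"},
        }
        for name in ("conformsToSpec", "conformsToSchema")
    ],
}
serialized = json.dumps(profile_vocabulary, indent=2)
print(serialized)
```

Because both are declared as subproperties, a consumer that only knows dct:conformsTo can still treat their values as conformsTo statements when it applies RDFS inference.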

@sabinem
Contributor

sabinem commented Nov 14, 2021

What about adding dct:conformsTo as an optional or even recommended property on dcat:Distribution and dcat:Dataset? For users it would be very helpful to have that link to the dataset structure, especially on Distributions. On the Dataset level, this could also help to better distinguish geodata by giving them a
conformsTo:<https://www.geocat.admin.ch/en/dokumentation/gm03.html> whereas dcat:Datasets get a
conformsTo:<https://dcat-ap.ch/>
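
For illustration, a minimal sketch of a Distribution carrying such a link, again as JSON-LD built in Python. The `dcat:` and `dct:` namespaces are the standard ones; the helper `describe_dataset` and all `example.org` URIs are placeholders I made up, not real catalogue entries.

```python
import json
from typing import Any, Dict

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def describe_dataset(dataset_uri: str, download_url: str, schema_uri: str) -> Dict[str, Any]:
    """Build a JSON-LD dataset whose distribution links to its schema."""
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            # Points to the document describing the columns of this file.
            "dct:conformsTo": {"@id": schema_uri},
        },
    }


# All URIs below are hypothetical.
dataset = describe_dataset(
    "https://example.org/dataset/traffic-counts",
    "https://example.org/files/traffic-counts.csv",
    "https://example.org/schemas/traffic-counts.json",
)
print(json.dumps(dataset, indent=2))
```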

@tlorusso

In the current version of the draft I see the 'conforms-to' property only at the dataset level (https://www.dcat-ap.ch/releases/2.0/dcat-ap-ch.html#dataset-conforms-to). Is it planned to add it at the distribution level too, or will it be limited to dcat:Dataset?

@sabinem
Contributor

sabinem commented Sep 9, 2022

@tlorusso The property conformsTo has been added on both Dataset and Distribution. For the property on the class Distribution, see https://www.dcat-ap.ch/releases/2.0/dcat-ap-ch.html#distribution-linked-schemas. Hope that answers your question.
