Improving Dataset descriptions #1083
Talking with Natasha Noy about possible improvements around dataset description. Some things to look into:
This all starts to get into the business of looking inside the dataset, which was discussed at schema.org previously - e.g. see Looking inside tables thread from Omar. Subsequently in W3C CSVW some of these ideas went standards track, in particular a templating mechanism to map tabular data into RDF.
This is also related to #975 for versioning dependencies, particularly on datasets (discussed in more detail in https://research.science.ai/article/web-first-data-citations).
A couple of follow-up comments:
Most of these suggestions are now implemented/committed and published on our draft webschemas.org site for review: http://webschemas.org/docs/releases.html#g1083
The corresponding pull request was #1247
I copy here some supporting notes. Of all these points, only the overlap with releasedEvent remains unexplored.
1.) for temporal and spatial coverage.
As of v3.0 we have:
Relating to Dataset specifically,
http://schema.org/spatial (Dataset -> Place),
http://schema.org/temporal (Dataset -> Datetime),
The temporal property supersedes the awkwardly named http://schema.org/datasetTimeInterval -
http://schema.org/datasetTimeInterval (Dataset -> Datetime),
Relating to CreativeWork,
http://schema.org/contentLocation (CreativeWork -> Place),
http://schema.org/locationCreated (CreativeWork -> Place),
Note also http://schema.org/releasedEvent which structures things a little differently, grouping place/time within an Event.
1a. a minor detail re releasedEvent, but documenting here:
spatialCoverage: "The spatialCoverage of a CreativeWork indicates the place(s) which are the focus of some work. It is a subproperty of
temporalCoverage: "The temporalCoverage of a CreativeWork indicates the period that the content applies to, i.e. that it describes. In
spatialCoverage subPropertyOf contentLocation.
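To make the two proposed properties concrete, here is a minimal sketch of a Dataset description that uses spatialCoverage and temporalCoverage, built as a Python dict and serialized as JSON-LD. The dataset name, the place, and the dates are hypothetical, and writing the time period as an ISO 8601 interval string is an assumption about typical usage rather than something these notes spell out.

```python
import json

# Hypothetical dataset; the name, place, and dates are illustrative only.
dataset = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "name": "Example rainfall observations",
    # spatialCoverage: the place(s) which are the focus of the work.
    "spatialCoverage": {"@type": "Place", "name": "Example Valley"},
    # temporalCoverage: the period the content applies to,
    # written here as an ISO 8601 interval (an assumption).
    "temporalCoverage": "2011-01-01/2011-12-31",
}

# Serialize to JSON-LD text, e.g. for embedding in a web page.
jsonld = json.dumps(dataset, indent=2)
print(jsonld)
```

Because spatialCoverage is a subproperty of contentLocation, a consumer that only understands contentLocation could still treat the Place above as a content location.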
Thanks @danbri - I can see that it now complies with the definition as its range is Text or PropertyValue.
Maybe what remains to be fixed is the documentation at
@danbri: is the description actually correct about PropertyValue as range?
On Tue, Nov 1, 2016 at 7:29 AM Alejandra Gonzalez-Beltran <
Checking this again, both properties singular and plural are live in the pending version:
The documentation (https://developers.google.com/search/docs/data-types/datasets) refers to the singular variableMeasured, which is the one we had discussed as the better option. Right?
What is the conclusion about the range?
@agbeltran I'm sorry the site doesn't make this clear enough, but roughly: schema.org is the official site, updated in named releases several times a year; webschemas.org is the editor's working draft of the proposed next release, typically edited several times a week. In the webschemas version, if you look up the obsolete plural variablesMeasured, you will find yourself directed to http://pending.webschemas.org/variablesMeasured -> http://attic.webschemas.org/variablesMeasured which is an area we have made for things that are "as good as removed", for complete transparency.
For range, yes PropertyValue should be in the range - looks like it needs adding on the Google side.
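As a rough illustration of what a Text-or-PropertyValue range means for publishers, the sketch below emits variableMeasured values in both forms. The helper `variable_measured`, the dataset name, and the variable names are hypothetical and only for illustration; `name` and `unitText` are existing properties of PropertyValue.

```python
import json


def variable_measured(name, unit_text=None):
    """Return plain Text when only a name is known, else a PropertyValue.

    Hypothetical helper, not part of schema.org or any library.
    """
    if unit_text is None:
        return name
    return {"@type": "PropertyValue", "name": name, "unitText": unit_text}


# Hypothetical dataset mixing both forms allowed by the range.
dataset = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "name": "Example rainfall observations",
    "variableMeasured": [
        variable_measured("rainfall", unit_text="mm"),  # PropertyValue form
        variable_measured("cloud cover category"),      # plain Text form
    ],
}

print(json.dumps(dataset, indent=2))
```

The plain Text form keeps simple descriptions cheap to publish, while the PropertyValue form leaves room for units and other detail when the publisher has them.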
BTW - the use of the word 'Measured' also has this problem - 'Measure' usually applies to data collection activities with quantitative, but not categorical, results. So variableMeasured risks implicitly excluding datasets where the 'values' are categories rather than numbers.
There are precedents from several scientific domains to use the more general term 'Observed' and 'Observation' (rather than Measured and Measurement) to allow for both categories and quantities. SSN & O&M use 'observedProperty' and OBOE has 'ofCharacteristic'.
@dr-shorthair But Simon, I would prefer we still give publishers the ability to collect both quantitative and categorical results. Doing that makes data flow tooling easier and gives systems a bit more information for proper analysis by machine learning and humans. I think your stance comes primarily from a collection perspective. However, my stance is that we should consider the data after the collection efforts, which is where value in the data is finally extracted for publishers and mankind.
@dr-shorthair Could this be anything, like say "loss of life" as a Result? http://w3c.github.io/sdw/ssn/#SOSAResult didn't really specify a "kind" of result, and I found the description too thin to determine whether there are any limits on its usage.