Improving Dataset descriptions #1083
Comments
This is also related to #975 for versioning dependencies, particularly on datasets (discussed in more detail in https://research.science.ai/article/web-first-data-citations). |
See also #1066 for a quick bugfix (spotted by Natasha too) |
A couple of follow up comments:
|
Notes from a F2F meeting on lifescience datasets
|
See also http://scholarly.vernacular.io/ w.r.t. data citation /cc @darobin |
Hi! At http://meta.schema.org I get "The requested URL / was not found on this server". Best, |
Filed #1189 re datasetTimeInterval |
Most of these suggestions are now implemented/committed and published on our draft webschemas.org site for review: http://webschemas.org/docs/releases.html#g1083
The corresponding pull request was #1247. I copy here some supporting notes. Of all these points, only the overlap with releasedEvent remains unexplored.

CHANGES

1.) for temporal and spatial coverage.

As of v3.0 we have:

Relating to Dataset specifically,
http://schema.org/spatial (Dataset -> Place),
http://schema.org/temporal (Dataset -> Datetime),
The temporal property superseded by the awkwardly named http://schema.org/datasetTimeInterval -
http://schema.org/datasetTimeInterval (Dataset -> Datetime),

Relating to CreativeWork,
http://schema.org/contentLocation (CreativeWork -> Place),
http://schema.org/locationCreated (CreativeWork -> Place),

Note also http://schema.org/releasedEvent which structures things a little differently, grouping place/time within an Event.

PROPOSAL:
1a. a minor detail re releasedEvent, but documenting here:
1b.
1c.
1d. spatialCoverage: "The spatialCoverage of a CreativeWork indicates the place(s) which are the focus of some work. It is a subproperty of
temporalCoverage: "The temporalCoverage of a CreativeWork indicates the period that the content applies to, i.e. that it describes. In
1e. spatialCoverage subPropertyOf contentLocation. temporalCoverage equivalentProperty http://purl.org/dc/terms/temporal |
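For readers following along, here is a minimal sketch of how a Dataset description using the proposed spatialCoverage and temporalCoverage properties might look as JSON-LD, built in Python. The helper name `dataset_jsonld`, the dataset name, the place, and the dates are all hypothetical, and writing the period as an ISO 8601 `start/end` interval string is an assumption about how a publisher would choose to express it.

```python
import json


def dataset_jsonld(name, place_name, start, end):
    """Build a schema.org Dataset description as a plain dict (hypothetical helper)."""
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        # spatialCoverage: the place(s) the dataset is about
        "spatialCoverage": {"@type": "Place", "name": place_name},
        # temporalCoverage: the period the content applies to,
        # written here as an ISO 8601 interval (assumption)
        "temporalCoverage": f"{start}/{end}",
    }


# Illustrative values only
example = dataset_jsonld(
    "Hypothetical rainfall records", "Lisbon", "2010-01-01", "2015-12-31"
)
print(json.dumps(example, indent=2))
```

The printed JSON could be embedded in a page inside a `script` element of type `application/ld+json`.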
So we have arrived at the names spatialCoverage and temporalCoverage, after all. Agreed that they are appropriate for other CreativeWorks, and it's nice to have the explicit mapping into DCMI Terms. |
Yes, I think this terminology bridges well with usage elsewhere, as well as better connecting schema.org dataset description with the approach for other kinds of CreativeWork. Does this work ok for others following along here? |
On reflection, and after further feedback, I believe variableMeasured would be a more appropriate name for this property. I'll work on migrating unless anyone objects. |
In addition to the change to singular, it seems that the variableMeasured property is missing PropertyValuePair in the range to comply with the definition. |
In addition to the comment from @agbeltran, note that Google's use of variableMeasured extends the expected type from text to include URL. |
@danbri should we open a new issue about the two problems with variablesMeasured reported above? |
@agbeltran I believe they're fixed ok in our next release, previewable at http://webschemas.org/variableMeasured - can you confirm? |
Thanks @danbri - I can see that it now complies with the definition as its range is Text or PropertyValue. Maybe what remains to be fixed is the documentation at |
@danbri: is the description actually correct about PropertyValue as range? On Tue, Nov 1, 2016 at 7:29 AM Alejandra Gonzalez-Beltran <
|
Checking this again, both the singular and plural properties are live in the pending version: http://pending.schema.org/variablesMeasured The documentation (https://developers.google.com/search/docs/data-types/datasets) refers to the singular variableMeasured, which is the one we had agreed was the better option. Right? What is the conclusion about the range? |
@agbeltran I'm sorry the site doesn't make this clear enough, but roughly: schema.org is the official site, updated in named releases several times a year; webschemas.org is the editor's working draft of the proposed next release, typically edited several times a week. In the webschemas version if you look up the obsolete plural variablesMeasured, you will find yourself directed to http://pending.webschemas.org/variablesMeasured -> http://attic.webschemas.org/variablesMeasured which is an area we have made for things that are "as good as removed", for complete transparency. For range, yes PropertyValue should be in the range - looks like it needs adding on the Google side. |
Thanks! (I was aware about the releases/working draft but had missed the attic redirection.) |
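To illustrate the range discussed above (Text or PropertyValue), here is a small sketch that builds a variableMeasured list mixing both forms. The function name `variable_measured` and the variable names are hypothetical; the only assumption about schema.org is that a PropertyValue can carry `name` and `description`.

```python
import json


def variable_measured(variables):
    """Map each entry to plain Text or a PropertyValue dict (hypothetical helper).

    A bare string is kept as Text; a (name, description) tuple
    becomes a PropertyValue.
    """
    result = []
    for item in variables:
        if isinstance(item, str):
            result.append(item)
        else:
            name, description = item
            result.append(
                {"@type": "PropertyValue", "name": name, "description": description}
            )
    return result


# Illustrative values only
dataset = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Hypothetical soil survey",
    "variableMeasured": variable_measured(
        ["pH", ("soilType", "Categorical soil classification")]
    ),
}
print(json.dumps(dataset, indent=2))
```

The second variable here is categorical rather than numeric, which shows that the structure itself does not force quantitative values.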
BTW - the use of the word 'Measured' also has this problem - 'Measure' usually applies to data collection activities with quantitative, but not categorical results. So variableMeasured has the risk that it implicitly excludes datasets where the 'values' are categories rather than numbers. There are precedents from several scientific domains to use the more general term 'Observed' and 'Observation' (rather than Measured and Measurement) to allow for both categories and quantities. SSN [1] & O&M [2] use 'observedProperty' and OBOE [3] has 'ofCharacteristic'. [1] http://w3c.github.io/sdw/ssn/ |
I realize I didn't reply explicitly here @dr-shorthair. I'd like to bring most of SOSA into schema.org (as discussed with SpatialWeb WG) and hope it will address the topic more thoroughly. @agbeltran any thoughts from a bioschemas/lifesci perspective? |
@dr-shorthair But Simon, I would prefer we still give publishers the ability to collect both quantitative and categorical results. Doing that makes data flow tooling easier and systems have a bit more information provided to them for proper analysis by machine learning and humans. I think your stance is from a collection effort primarily. However, my stance is we should consider the data after the collection efforts, which is where value in the data is finally extracted for publishers and mankind. @dr-shorthair Could this be anything, like say "loss of life" as a Result ? http://w3c.github.io/sdw/ssn/#SOSAResult that didn't really specify a "kind" of result and I found the description a bit lacking to determine if there were any limits of its usage. |
Is this prop likely to make it out of pending any time soon? |
I see this was closed a long time ago, but for completeness: However (RDF noise ahead) - |
A bit more explanation here (if you can get to it - I'm not responsible for the permissions here) - https://bitbucket.org/terndatateam/ternplotdata-ontology/src/ef9d9f05b7a3eba915bf8c47708e40fb55d7e1f6/schema/result-types.md |
So if I understand correctly, I can use Also comment here says it is still a proposal, but this is not true anymore? Now it is accepted, no? |
Talking with Natasha Noy about possible improvements around dataset description. Some things to look into:
timestep (dct:accrualPeriodicity)
Related work
This all starts to get into the business of looking inside the dataset, which was discussed at schema.org previously - e.g. see the "Looking inside tables" thread from Omar. Subsequently in W3C CSVW some of these ideas went standards track, in particular a templating mechanism to map tabular data into RDF.