Bug in Dataset / DataCatalog relationship #1066

Closed
danbri opened this Issue Mar 30, 2016 · 9 comments


4 participants

@danbri
Contributor
danbri commented Mar 30, 2016 edited

http://schema.org/includedDataCatalog seems to point the wrong way :(

“N.B. – Do not on any account attempt to write on both sides of the paper at once.”
― W.C. Sellar, 1066 and All That: A Memorable History of England

@danbri danbri self-assigned this Apr 6, 2016
@danbri
Contributor
danbri commented Apr 6, 2016

/cc @vholland

According to http://schema.org/docs/releases.html this was renamed from http://schema.org/catalog in the v2.0 terminology cleanup.

Currently we have:

These both point from Dataset to DataCatalog. The original wording was poor: we had "A data catalog which contains a dataset."; I think it should have been "A data catalog which contains this dataset.".

The new wording is wrong, in that catalogs are the containers for datasets rather than the other way around. It is always easier to change textual definitions than term IDs, but in this case I suggest we do both.

Also note that http://schema.org/DataCatalog has the http://schema.org/dataset property, defined as "A dataset contained in a catalog."; this is the same relationship named in the opposite direction (and also awkwardly worded).

# Proposals

  • Rename includedDataCatalog to includedInDataCatalog, using supersededBy.
  • Update the definition of includedInDataCatalog (and of includedDataCatalog and catalog) to "A data catalog which contains this dataset.".
  • Mark includedInDataCatalog and dataset as inverseOf each other.
  • Update the dataset definition to "A dataset contained in this catalog.".
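To illustrate what the supersededBy chain in these proposals means for data that already uses the old names, here is a minimal sketch, not schema.org's own tooling. It assumes triples are plain (subject, property, object) tuples using short schema.org property names; the `SUPERSEDED_BY` map, the `current_name` and `upgrade_terms` helpers, and the `ex:` identifiers are all hypothetical. Because catalog was already renamed to includedDataCatalog in v2.0, an old term may need to follow more than one supersededBy link.

```python
# Hypothetical supersededBy map: old property name -> its replacement.
# 'catalog' became 'includedDataCatalog' in v2.0; the proposal above
# renames 'includedDataCatalog' to 'includedInDataCatalog'.
SUPERSEDED_BY = {
    "catalog": "includedDataCatalog",
    "includedDataCatalog": "includedInDataCatalog",
}


def current_name(prop: str) -> str:
    """Follow supersededBy links until reaching a term that is not superseded."""
    seen = set()
    while prop in SUPERSEDED_BY:
        if prop in seen:  # guard against an accidental cycle in the map
            raise ValueError(f"supersededBy cycle at {prop!r}")
        seen.add(prop)
        prop = SUPERSEDED_BY[prop]
    return prop


def upgrade_terms(triples):
    """Rewrite each (subject, property, object) triple to use current property names."""
    return [(s, current_name(p), o) for s, p, o in triples]


if __name__ == "__main__":
    old = [
        ("ex:census2010", "catalog", "ex:govCatalog"),
        ("ex:rainfall", "includedDataCatalog", "ex:govCatalog"),
    ]
    # Both triples come out using 'includedInDataCatalog'.
    print(upgrade_terms(old))
```

Terms that were never superseded, such as dataset, pass through unchanged.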
@danbri danbri added this to the sdo-deimos release milestone Apr 6, 2016
@danbri danbri pushed a commit that referenced this issue Apr 6, 2016
Dan Brickley Fixes around Dataset / DataCatalog for #1066 baa6d24
@danbri
Contributor
danbri commented Apr 6, 2016

Ok, sanity checks welcomed. Queued for next release:

@vholland
Contributor
vholland commented Apr 6, 2016

+1

@joshsh
Contributor
joshsh commented Apr 7, 2016

In this case, why not includedInCatalog and includesDataset, for symmetry? Alternatively, containedInCatalog and containsDataset. Note that dataset agrees with dcat:dataset, but that there is no dcat:catalog.

@danbri
Contributor
danbri commented Apr 7, 2016

This is mostly because we have a huge vocabulary, so we imposed the belated discipline of avoiding terms that could have multiple independent meanings. When v2.0 shipped, 'catalog' was deemed too general for a property name (it might mean very different things in a digital library versus ecommerce versus datasets setting), whereas a 'dataset' property pretty much does what it says on the tin.

Unfortunately, 'includedDataCatalog' was misnamed, based on the mistaken impression that the catalog was within the dataset rather than vice versa. Hence this tweak. Is it bearable, @joshsh?

@chaals
Contributor
chaals commented Apr 7, 2016

Works for me

@joshsh
Contributor
joshsh commented Apr 7, 2016

Hi @danbri. That makes sense. I was suggesting that dataset be renamed not because it is ambiguous, but so that it is more obviously the inverse of the new includedInDataCatalog. dataset/catalog --> includesDataset/includedInDataCatalog.

@danbri
Contributor
danbri commented Apr 7, 2016

Thanks. Yes, there's a tradeoff between verbosity and consistency. Given that it is already called 'dataset', and has had that name all along, I don't see a very strong case for changing it to a longer name. So we pay the price of the two properties being named in different styles. On the positive side, I have now marked them in the schemas as inverseOf each other, so they are properly cross-linked.
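As a rough illustration of what the inverseOf cross-link lets a consumer do, the sketch below fills in the opposite-direction statement for any property with a declared inverse, so a catalog can be queried via dataset even when publishers only wrote includedInDataCatalog on their Dataset. It assumes triples are (subject, property, object) tuples; the `INVERSE_OF` map, the `add_inverses` helper and the `ex:` identifiers are hypothetical and not part of schema.org.

```python
# Hypothetical inverse pairs, mirroring the inverseOf link between the two properties.
INVERSE_OF = {
    "includedInDataCatalog": "dataset",
    "dataset": "includedInDataCatalog",
}


def add_inverses(triples):
    """Return the triples plus, for each property with a declared inverse,
    the same statement written in the opposite direction."""
    result = set(triples)
    for s, p, o in triples:
        inverse = INVERSE_OF.get(p)
        if inverse is not None:
            result.add((o, inverse, s))
    return result


if __name__ == "__main__":
    facts = {
        ("ex:rainfall", "includedInDataCatalog", "ex:govCatalog"),
        ("ex:govCatalog", "dataset", "ex:census2010"),
    }
    # Adds ("ex:govCatalog", "dataset", "ex:rainfall") and
    # ("ex:census2010", "includedInDataCatalog", "ex:govCatalog").
    for triple in sorted(add_inverses(facts)):
        print(triple)
```

Running the helper a second time adds nothing new, since every inverse statement is already present.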

Another thing we could do on the usability front is to collect a few more inspirational examples and add them to the site. Any suggestions?

@danbri danbri closed this May 20, 2016