Bug in Dataset / DataCatalog relationship #1066

Closed
danbri opened this Issue Mar 30, 2016 · 9 comments

danbri commented Mar 30, 2016

http://schema.org/includedDataCatalog seems to point the wrong way :(

“N.B. – Do not on any account attempt to write on both sides of the paper at once.”
― W.C. Sellar, 1066 and All That: A Memorable History of England

danbri Apr 6, 2016

Contributor

/cc @vholland

According to http://schema.org/docs/releases.html this was renamed from http://schema.org/catalog in the v2.0 terminology cleanup.

Currently we have:

These both point from Dataset to DataCatalog. The original wording was poor: we had "A data catalog which contains a dataset."; I think it should have been "A data catalog which contains this dataset.".

The new wording is wrong, in that catalogs are the containers for datasets rather than the other way around. It is always easier to change textual definitions than term IDs but in this case I suggest we do both.

Also note that http://schema.org/DataCatalog has a http://schema.org/dataset property, defined as "A dataset contained in a catalog." - this is the same relationship named in the opposite direction (and also with awkward wording).

# Proposals

  • Rename includedDataCatalog to includedInDataCatalog, using supersededBy.
  • Update the includedInDataCatalog (and includedDataCatalog and catalog) definition to "A data catalog which contains this dataset.".
  • Mark includedInDataCatalog and dataset as inverseOf each other.
  • Update the dataset definition to "A dataset contained in this catalog."

@danbri danbri added this to the sdo-deimos release milestone Apr 6, 2016

danbri Apr 6, 2016

Contributor

Ok, sanity checks welcomed. Queued for next release:

vholland Apr 6, 2016

Contributor

+1

joshsh Apr 7, 2016

Contributor

In this case, why not includedInCatalog and includesDataset, for symmetry? Alternatively, containedInCatalog and containsDataset. Note that dataset agrees with dcat:dataset, but that there is no dcat:catalog.

danbri Apr 7, 2016

Contributor

This is mostly because we have a huge vocabulary, so we imposed the belated discipline of avoiding terms that could have multiple independent meanings. When v2.0 shipped, 'catalog' was deemed too general for a property name (it might mean very different things in a digital library versus ecommerce versus datasets setting), whereas a 'dataset' property pretty much does what it says on the tin.

Unfortunately, 'includedDataCatalog' was misnamed, based on the mistaken impression that the catalog was within the dataset rather than vice versa. Hence this tweak. Is it bearable, @joshsh?

chaals Apr 7, 2016

Contributor

Works for me

joshsh Apr 7, 2016

Contributor

Hi @danbri. That makes sense. I was suggesting that dataset be renamed not because it is ambiguous, but so that it is more obviously the inverse of the new includedInDataCatalog. dataset/catalog --> includesDataset/includedInDataCatalog.

danbri Apr 7, 2016

Contributor

Thanks. Yes, there's a tradeoff between verbosity and consistency. Given that it is already called 'dataset' and has had that name all along, I'm not feeling a very strong case for changing it to a longer name. So we pay the price of the two properties being named in different styles. On the positive side, I have actually marked them in the schemas as inverseOf each other now, so they are properly cross-linked.
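
As a rough illustration of what the inverseOf cross-link allows, a consumer could derive the catalog-to-dataset statement from the dataset-to-catalog one (or the reverse). This is a sketch under stated assumptions, not how schema.org or any particular consumer processes the data: the `INVERSE_OF` table, the `with_inverses` helper and the `ex:` identifiers are all hypothetical.

```python
# Inverse pairs as described in this thread; the dict itself is illustrative.
INVERSE_OF = {
    "includedInDataCatalog": "dataset",
    "dataset": "includedInDataCatalog",
}


def with_inverses(triples):
    """Return the given (subject, property, object) triples plus inverse statements."""
    expanded = set(triples)
    for subject, prop, obj in triples:
        inverse = INVERSE_OF.get(prop)
        if inverse is not None:
            # Swap subject and object and use the inverse property name.
            expanded.add((obj, inverse, subject))
    return expanded


# Made-up identifiers: one statement pointing from a Dataset to its DataCatalog.
stated = {("ex:rainfall-2015", "includedInDataCatalog", "ex:weather-catalog")}
expanded = with_inverses(stated)
```

With the pair declared, publishers only need to state the relationship in one direction, whichever is more convenient for their markup.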

Another thing we could do on the usability front is to collect a few more inspirational examples and add them to the site. Any suggestions?

danbri commented May 20, 2016

@danbri danbri closed this May 20, 2016
