CategoryCode Proposal (formerly EnumerationValue Proposal) #894

Closed
RichardWallis opened this issue Nov 13, 2015 · 60 comments
Comments

@RichardWallis
Contributor

Background

When marking up with Schema.org there is often a need to associate the Thing being described with a pre-defined value - a type, category, subject, topic, definition, etc.

In certain specific cases the vocabulary handles this using Enumerations and Enumeration subtypes to provide a specific type value. For example BookFormatType, which has subtypes of EBook, HardCover, PaperBack, and in the bib.schema.org extension, http://bib.schema.org/GraphicNovel. This mechanism works well with enumerations containing a small number of enumeration types and of fairly static content.

Where it is not practical, or desired, for Schema.org to become the authority for many, varied, and/or large sets of values, external enumerations are recommended. A blog post referenced from the Schema.org documentation introduces the mechanism for external enumerations, which reference lists of values external to the vocabulary.

What could be viewed as a compromise between these approaches is demonstrated in the DayOfWeek Type. It could be argued that the actual days of the week should have been defined in Schema.org, Monday, Tuesday, etc., as subtypes of DayOfWeek. Instead values in the GoodRelations vocabulary, for days of the week, are documented as commonly used. This both encourages the use of external values and expresses an implied preference for a particular external set of values.

Markup for External Enumerations
What Schema.org does not yet address however, is the markup of external enumeration values in the context of them being shared on the web. The use-cases for this include potential addition of Schema.org markup for existing sets of values and for the creation of new sets. Examples include adding Schema markup to values, often referred to as authorities in the library domain, for subjects and persons at national level libraries such as The Library of Congress; the markup of a new authoritative list of sports types, or bank account types, or medical treatment types.

Previous discussions [1] [2] referencing an earlier MiniSKOS proposal provide background and some use cases to view this simple proposal against.

Proposal

This proposal consists of:

  • A new Type - EnumerationValue - a subtype of Enumeration. "An enumeration value."
  • A new Type - EnumerationValueSet - a subtype of CreativeWork. "A set of enumeration values."
  • Three new properties:
    • valueCode (named enumerationValueCode in the RDFa definition below) - Domain: EnumerationValue Range: Text. "Provides the ability to share item codes or similar which are often a key value in existing code sets."
    • partOfValueSet (named partOfEnumerationValueSet in the RDFa definition below) - sub-property of isPartOf. Domain: EnumerationValue Range: EnumerationValueSet. "The value set of which this value is part."
    • hasEnumerationValue - sub-property of hasPart. Domain: EnumerationValueSet Range: EnumerationValue.

Definition RDFa

<div typeof="rdfs:Class" resource="http://schema.org/EnumerationValue">
  <span class="h" property="rdfs:label">EnumerationValue</span>
  <span property="rdfs:comment">An enumeration value.</span>
  <span>Subclass of: <a property="rdfs:subClassOf" href="http://schema.org/Enumeration">Enumeration</a></span>
</div>

<div typeof="rdfs:Class" resource="http://schema.org/EnumerationValueSet">
  <span class="h" property="rdfs:label">EnumerationValueSet</span>
  <span property="rdfs:comment">A set of enumerated values.</span>
  <span>Subclass of: <a property="rdfs:subClassOf" href="http://schema.org/CreativeWork">CreativeWork</a></span>
</div>

<div typeof="rdf:Property" resource="http://schema.org/enumerationValueCode">
    <span class="h" property="rdfs:label">enumerationValueCode</span>
    <span property="rdfs:comment">A short textual code that uniquely identifies the value. The code is typically used in structured URLs.</span>
    <span>Domain: <a property="http://schema.org/domainIncludes" href="http://schema.org/EnumerationValue">EnumerationValue</a></span>
    <span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/Text">Text</a></span>
</div>

<div typeof="rdf:Property" resource="http://schema.org/partOfEnumerationValueSet">
    <span class="h" property="rdfs:label">partOfEnumerationValueSet</span>
    <span property="rdfs:comment">The set (enumeration) of values which contains this value.</span>
    <link property="rdfs:subPropertyOf" href="http://schema.org/isPartOf" />
    <span>Domain: <a property="http://schema.org/domainIncludes" href="http://schema.org/EnumerationValue">EnumerationValue</a></span>
    <span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/EnumerationValueSet">EnumerationValueSet</a></span>
</div>

<div typeof="rdf:Property" resource="http://schema.org/hasEnumerationValue">
    <span class="h" property="rdfs:label">hasEnumerationValue</span>
    <span property="rdfs:comment">Value contained in value set.</span>
    <link property="rdfs:subPropertyOf" href="http://schema.org/hasPart" />
    <span>Domain: <a property="http://schema.org/domainIncludes" href="http://schema.org/EnumerationValueSet">EnumerationValueSet</a></span>
    <span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/EnumerationValue">EnumerationValue</a></span>
</div>

Examples (Turtle)

1- A Library of Congress resource type

    <http://id.loc.gov/vocabulary/resourceTypes/Man>
       a schema:EnumerationValue;
       schema:name "Manuscript";
       schema:enumerationValueCode "Man";
       schema:partOfEnumerationValueSet <http://id.loc.gov/vocabulary/resourceTypes>.

2- An animal classification term and term set

    <http://mammals.example.com/Carnivore>
       a schema:EnumerationValue;
       schema:name "Carnivore";
       schema:description "A mammal that feeds on other animals";
       schema:partOfEnumerationValueSet <http://mammals.example.com>;
       schema:sameAs <https://www.wikidata.org/wiki/Q81875>.

    <http://mammals.example.com>
      a schema:EnumerationValueSet;
      schema:name "The Mammal Classification List".

3- Terms in a dictionary of legal terms

    <http://openjurist.org/dictionary/Ballentine>
      a schema:CreativeWork, schema:EnumerationValueSet;
      schema:name "Ballentine’s Law Dictionary".

    <http://openjurist.org/dictionary/Ballentine/term/schema> 
      a schema:EnumerationValue;
      schema:name "schema";
      schema:description "A representation of a plan or theory in the form of an outline or model.";
      schema:partOfEnumerationValueSet <http://openjurist.org/dictionary/Ballentine>.

    <http://openjurist.org/dictionary/Ballentine/term/calendar-year> 
      a schema:EnumerationValue;
      schema:name "calendar year";
      schema:description "The period from January 1st to December 31st, inclusive, of any year.";
      schema:sameAs <https://www.wikidata.org/wiki/Q3186692>;
      schema:partOfEnumerationValueSet <http://openjurist.org/dictionary/Ballentine>.

4- An occupation term defined by O*Net Online

    <http://onetonline.org/link/details/51-6042.00> 
      a schema:EnumerationValue;
      schema:enumerationValueCode "51-6042.00";
      schema:name "Shoe Machine Operators and Tenders";
      schema:description "Operate or tend a variety of machines to join, decorate, reinforce, or finish shoes and shoe parts.";
      schema:partOfEnumerationValueSet <http://onetonline.org>.

5- An ISO639-2 Language Code

    <http://id.loc.gov/vocabulary/iso639-2>
      a schema:EnumerationValueSet;
      schema:name "ISO 639-2: Codes for the Representation of Names of Languages";
      schema:hasEnumerationValue <http://id.loc.gov/vocabulary/iso639-2/cze>.

    <http://id.loc.gov/vocabulary/iso639-2/cze> 
      a schema:EnumerationValue;
      schema:enumerationValueCode "cze";
      schema:name "Czech"@en;
      schema:name "tchèque"@fr;
      schema:name "Tschechisch"@de;
      schema:partOfEnumerationValueSet <http://id.loc.gov/vocabulary/iso639-2>.
@stuartasutton

This is a very useful proposal for all of us developing markup of external enumerations of considerable size and working with organizations for the deployment of such enumerations in the context of sharing them on the web. I'm pleased to see a pull request in place.

@westurner
Contributor

In OWL, are these "Object Property Restrictions" [edit slash "Data Property Restrictions"] like owl:allValuesFrom and owl:someValuesFrom?

https://www.w3.org/TR/owl2-quick-reference/#Class_Expressions

... The class / instance distinction here is less than clear.

@Dataliberate
Contributor

Firstly, as with the rest of Schema.org, there are no constraints, rules, or inference implied in this proposal. That is not to say that a set of values applicable to a specific situation could not be described (and published using Schema) using the types and properties proposed here.

It would be up to an individual application to apply its own internal [OWL] rules to indicate that all the EnumerationValues that are 'part of' a particular EnumerationValueSet are valid in a specific circumstance. However, in Schema markup no such inference could be made.
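
As a rough illustration only (not part of the proposal), an application-side rule of this kind might look like the following Python sketch. The in-memory node list and the helper names `allowed_codes` and `is_valid_language` are hypothetical; the property names follow the RDFa definitions above.

```python
# Hypothetical in-memory copy of published markup; illustrative data only.
VALUE_SET = "http://id.loc.gov/vocabulary/iso639-2"

values = [
    {
        "@id": "http://id.loc.gov/vocabulary/iso639-2/cze",
        "@type": "EnumerationValue",
        "enumerationValueCode": "cze",
        "partOfEnumerationValueSet": VALUE_SET,
    },
    {
        "@id": "http://mammals.example.com/Carnivore",
        "@type": "EnumerationValue",
        "partOfEnumerationValueSet": "http://mammals.example.com",
    },
]


def allowed_codes(nodes, value_set):
    """Collect the codes of all values declared as part of value_set."""
    return {
        node["enumerationValueCode"]
        for node in nodes
        if node.get("partOfEnumerationValueSet") == value_set
        and "enumerationValueCode" in node
    }


def is_valid_language(code):
    """The restriction is enforced by the application, not by the markup."""
    return code in allowed_codes(values, VALUE_SET)
```

The markup only states which set a value belongs to; deciding that membership makes a value acceptable in some field is the application's own choice.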

@westurner
Contributor

However, in Schema markup no such inference could be made.

@Dataliberate
Contributor

On 23 January 2016 at 07:40, Wes Turner wrote:

However, in Schema markup no such inference could be made.

That is correct. The Schema.org use case is to enable mark up of structured
data within html. Not to provide inference over, or ontological control of,
a data set.

A recommended read that lays out the underlying principles and history of
Schema.org: http://queue.acm.org/detail.cfm?id=2857276

@philbarker
Contributor

@Dataliberate Q: what is the difference between (or relation between) the proposed EnumerationValueSet and http://schema.org/Enumeration ?

@Dataliberate
Contributor

@philbarker good question!

The main relation is between EnumerationValue and Enumeration.

The relation being very similar to that of the core Schema.org vocabulary, hosted extensions such as auto.schema.org, and external extensions.

In theory any value, or identifier for that value, could be defined in the Schema.org vocabulary as an Enumeration-subtype type. As per current examples - OrderInTransit a subtype of OrderStatus, Paperback a subtype of BookFormatType, etc. all themselves subtypes of Enumeration.

However, other than for commonly known/used values, it is not practical to burden the vocabulary, and the agreement process for managing it, with the maintenance of all the potential lists of values for such things. Wanting to address the need to be able to mark up, using Schema.org, these terms and values that probably will never get assigned in the vocabulary, is what is behind the proposal for EnumerationValue.

EnumerationValue could be considered an 'external Enumeration value'. So in answer to your question, they are closely related at least in how they are/would be used. So much so, that I am considering updating the proposal to make EnumerationValue a subtype of Enumeration.

There are already in existence many candidates for terms that could be marked up in Schema using this approach. These examples often have properties in addition to their URI value (name, description, code, etc.) and are often grouped together in sets/dictionaries/terms such as the Library of Congress Subject Headings. That style of need being catered for with the EnumerationValueSet and valueCode property in the proposal.

Hope that helps.
~Richard.

@stuartasutton

@Dataliberate Q: Did you consider making EnumerationValue a subtype of Enumeration? I think it would be useful.

@Dataliberate
Contributor

Yes I did, and having been asked a couple of times about it, I have concluded that it would be the right thing to do.

So my @RichardWallis persona has just done it.

@nichtich

nichtich commented Apr 4, 2016

I'm trying to translate this proposal for those familiar with SKOS. Please correct me if I'm wrong:

  • schema:EnumerationValueSet ≈ skos:ConceptScheme
  • schema:EnumerationValue ≈ skos:Concept
  • schema:enumerationValueCode ≈ skos:notation
  • schema:partOfEnumerationValueSet ≈ skos:inScheme
  • schema:hasEnumerationValue ≈ inverse of skos:inScheme (in some cases skos:hasTopConcept)

I miss counterparts of skos:related (see #582) and skos:broader/skos:narrower (see #251).
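
The approximate mapping listed above could be applied mechanically along these lines. This is a minimal Python sketch under stated assumptions: triples are plain `(subject, predicate, object)` tuples of prefixed names, the function name `to_skos` is hypothetical, and the correspondences are approximate rather than equivalences.

```python
# Approximate correspondences taken from the list above (not equivalences).
SCHEMA_TO_SKOS = {
    "schema:EnumerationValueSet": "skos:ConceptScheme",
    "schema:EnumerationValue": "skos:Concept",
    "schema:enumerationValueCode": "skos:notation",
    "schema:partOfEnumerationValueSet": "skos:inScheme",
}

# hasEnumerationValue runs from set to value, i.e. the inverse of skos:inScheme.
INVERSE_OF_IN_SCHEME = "schema:hasEnumerationValue"


def to_skos(triples):
    """Rewrite triples using the approximate mapping; unmapped terms pass through."""
    result = []
    for subject, predicate, obj in triples:
        if predicate == INVERSE_OF_IN_SCHEME:
            # Swap subject and object to express the inverse direction.
            result.append((obj, "skos:inScheme", subject))
        elif predicate == "rdf:type":
            result.append((subject, predicate, SCHEMA_TO_SKOS.get(obj, obj)))
        else:
            result.append((subject, SCHEMA_TO_SKOS.get(predicate, predicate), obj))
    return result


example = [
    ("ex:cze", "rdf:type", "schema:EnumerationValue"),
    ("ex:cze", "schema:enumerationValueCode", "cze"),
    ("ex:langs", "schema:hasEnumerationValue", "ex:cze"),
]
converted = to_skos(example)
```

The sketch ignores the skos:hasTopConcept case mentioned in the list, since deciding which values are top concepts needs hierarchy information that the proposal does not carry.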

@Dataliberate
Contributor

@nichtich your list of approximate relations to SKOS terms is about right for the proposal as it stands. Glad you used '≈'.

As described above this is a simple proposal mainly targeted at simple use cases. For example an already existent list of values for some types of things (eg. The list of ISO639-2 Language Codes). Many of these do not have any of the hierarchy or relationship concepts that would require the extra modelling power of SKOS (related,broader,narrower,exactMatch,etc.).

Yes it could be applied to sets of terms already defined in SKOS, but for an initial simple proposal adding much more would a) Introduce complexity; b) Consequentially reduce the potential for broad adoption across the [mostly non-SKOS] web.

The several issues/threads dedicated to the re-creation of SKOS in Schema, which have yet to come to a satisfactory conclusion, are I believe symptomatic of the lack of a view of where it would be implemented widely.

My approach in making this proposal was to take something simple, with obvious simple use cases, that could possibly be used to partially address more complex issues. If we implement it and it gets used, we would have a real foundation with real usage to build on for future extension/enhancement.

Meanwhile the much wider discussions around similarity, relatedness, matching, and sameAs can come to a natural conclusion in their own time and in future proposals.

So I think we should continue with this in its current state.

@ldodds
Contributor

ldodds commented Jan 19, 2017

@Dataliberate I like this approach, fits well with what I need for my current project. However I also need to relate an EnumerationValue to its parent.

This would be equivalent to skos:broader. Supporting skos:narrower might also be useful.

@ldodds
Contributor

ldodds commented Jan 19, 2017

Further question: is it still recommended to add types for these category values, as defined in the original guidance? Or is this route only intended for cases where that is not useful/advised?

I'm having a hard time deciding which option might be best, so was hoping I could use this approach and later add some types & extra semantics if necessary.

@RichardWallis
Contributor Author

RichardWallis commented Jan 19, 2017

Update:
Proposals to change the naming of the terms in this proposal (eg. from EnumerationValue to CategoryCode) have now been published complete with examples on the webschemas.org preview site:

@ldodds Part of the motivation behind this proposal was the previous discussions about whether Schema.org should include/support/reference SKOS and, if so, by how much. It was designed as a very lightweight approach that could be built upon based on usage experience. I would suggest that, at least initially, broader/narrower hierarchical relationships between values would be best handled in localised data structures that the [proposed] Schema types would be added to for wider sharing.

As per the examples, it depends on what is already in place as to the final modelling of a set of CategoryCodes. If the terms are already defined (in SKOS or something else), it would be a matter of adding further Schema Types. For example a term could be defined as being both a skos:Concept and a schema:CategoryCode.
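
To make the dual-typing idea concrete, here is a small Python sketch that builds such a term as a JSON-LD node. The term URI, its labels, and the use of `skos:broader` are hypothetical illustrations; the hierarchy stays in SKOS, while the Schema type is simply added alongside it.

```python
import json

# A term typed as both skos:Concept and schema:CategoryCode.
# The example.org URIs and the labels are made up for illustration.
term = {
    "@context": {
        "@vocab": "http://schema.org/",
        "skos": "http://www.w3.org/2004/02/skos/core#",
    },
    "@id": "http://example.org/activities/running",
    "@type": ["skos:Concept", "CategoryCode"],
    "name": "Running",
    "skos:prefLabel": "Running",
    # The broader/narrower relationship is carried by the existing SKOS data.
    "skos:broader": {"@id": "http://example.org/activities/athletics"},
}

serialized = json.dumps(term, indent=2)
print(serialized)
```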

RichardWallis changed the title from "EnumerationValue Proposal" to "CategoryCode Proposal (formerly EnumerationValue Proposal)" on Jan 19, 2017
@ldodds
Contributor

ldodds commented Jan 19, 2017

@RichardWallis I'm not sure what you mean by "best handled in localised data structures", that we'd need to define a custom set of properties?

For the openactive project none of these category code sets exists as SKOS, or even as publicly available data for the most part. So part of my interest here is in helping that data be made open. The broader/narrower relationships are quite important for tying together physical activities, and also for many, many different controlled vocabularies.

Their addition here seems like a relatively small change to me?

@RichardWallis
Contributor Author

@ldodds By localised structures I meant where they were already defined (in SKOS for instance). In such cases what you describe would already be in place if needed.

As to starting from scratch I can understand your desire, in this use case, to introduce broader/narrower into Schema.

In isolation it does seem like a small addition. However, it would also potentially introduce some assumptions about things marked up as CategoryCode types and the relationships between them that do not exist. SKOS, and hence its terms, assumes an organised structure of terms, such as in some controlled vocabularies. Whereas CategoryCodes could be applied to disconnected things with no such relationships or hierarchy.

In previous discussions around possibly including SKOS in Schema, potential issues about introducing too constraining a structure were raised. The question of where to draw the line as to which terms would or would not be a small change to include was also the subject of some debate.

Although such relationships are important within controlled vocabularies, I wonder if or how they would be used by data consumers.

At this stage I am still inclined to keep this proposal as simple as possible, which in itself will be a major step forward in this area, looking to future proposals to possibly extend it based on experience of implementation.

@thadguidry
Contributor

thadguidry commented Nov 22, 2017

@danbri I personally prefer to just leave CategoryCode as a subtype. That aligns with the DCMI Abstract Model and in particular their nearly equivalent use case with VocabularyEncodingScheme as a Class http://dublincore.org/documents/dcmi-terms/#section-8

Anyway, can we just get DefinedTerm already? :) It's simple English and will foster more growth compared to "EnumerationBlah's" and "VocabularyBlah's" (and it comes from @philbarker, who makes good judgement calls and has never failed us yet :)

@jvandriel

jvandriel commented Nov 22, 2017

I like the idea of having CategoryCode as a subType as well, especially since it represents more closely what marketers are looking for during day to day work. I fear using just TermDefinition will cause many marketers (and the developers that implement markup for them) to overlook it.

@rvguha
Contributor

rvguha commented Nov 22, 2017 via email

@RichardWallis
Contributor Author

Liking @danbri's suggestion of keeping CategoryCode as a subtype of DefinedTerm.

I will follow that logic through and map out how that would look with DefinedTermSet and relevant properties.

@philbarker
Contributor

CategoryCode as a subtype of DefinedTerm sounds good.

I agree with @thadguidry's sentiment: I think these being in pending also inhibits uptake. Any prospect of moving this into the main vocabulary?

@RichardWallis
Contributor Author

I have now updated the PR (#1776) to reflect the proposal of making CategoryCode a subtype of DefinedTerm - looks good to me.

This is still all in pending - as expressed by others, it would be good to get this in the core.

@thadguidry
Contributor

thadguidry commented Jun 23, 2018

@RichardWallis GraphicNovel link is broken at the bottom of https://schema.org/BookFormatType

@hekl

hekl commented Jun 25, 2018

I am rather late to this discussion. First, thinking about this from a practical viewpoint: in my institute we have a lot of vocabularies, classifications, and term lists. My goal these days is to take them out of Excel files or ordinary web pages and manage them in a specific vocabulary tool. Tools like that use SKOS. I might consider adding schema.org equivalents as proposed here. But there it stops for me. I want to use them as separate applications. I do agree that copying SKOS into schema.org is not a good thing. Still, going on about the case of Datasets and DCAT that is also mentioned, I would think that the case for SKOS support in schema.org is strong. Could definedTermSet in the future be one of the type SKOS? I am also triggered by the fact that Google in the schema.org/Dataset now gives support to DCAT as such. Anyway, these are interesting developments.

@dgrahn

dgrahn commented Jul 3, 2018

I'm going to jump in here. Does this extension support a range of categories? I.e. My Thing is in categories C-E.

Common in patents.

@thadguidry
Contributor

@dgrahn yes, that would be the valueCode now in https://pending.schema.org/CategoryCode as "codeValue" that @RichardWallis proposed. It can be used to hold any "key" or "code" and where there is an associated meaningful value when that "key" or "code" is looked up. Your "C-E" key/code has some associated meaningful value to the publishers or consumers of patents.

{
    "@type": "CategoryCode",
    "@id": "http://example.org/patentCodes/C/12",
    "identifier": "http://example.org/patentCodes/C/C-E",
    "codeValue": "C-E",
    "name": {
        "en": "The name of the codeValue if it has one"
    },
    "description": "A fuller description or meaning of the codeValue or its usage.",
    "inCodeSet": "http://example.org/patentCodes"
}

By the way, what does "C-E" mean ?

@dgrahn

dgrahn commented Jul 3, 2018

@thadguidry It's an example. Could have done foo-bar.

What if you don't want to have users parse the value?

@thadguidry
Contributor

thadguidry commented Jul 3, 2018

@dgrahn sorry, I don't understand. Can you explain further what you mean ? Give us the scenario or problem you have that you are trying to solve. That will help.

@dgrahn

dgrahn commented Jul 3, 2018

So categories can sometimes be given as a range. i.e. from C to E. Those category names can actually have "special" characters in them like, - and /. It would be nice to be able to say something like this.

{
  "startCode": "C",
  "endCode": "E"
}

In fact, that's what I'm using right now as an extension of CategoryCode. I was just wondering if there was a pending canonical way to do it.

@thadguidry
Contributor

thadguidry commented Jul 3, 2018

@dgrahn what would the parent types be for that scenario ? Can you give an example of the Thing that has a startCode of C and endCode of E ? I'm trying to understand that parent Thing and what is it called in Patent terminology ? If we can understand that better, then maybe we can find an easier path or different way to help.

@dgrahn

dgrahn commented Jul 3, 2018

I've been thinking about proposing Patent as a new CreativeWork. But in any case, patents have different classifications depending on where they are granted. One system is the USPC. These classifications can be presented as a range.

Is that making sense?

@thadguidry
Contributor

@dgrahn Yeap, makes much more sense now. OK, someone else has a common need and opened an issue for Patents: https://github.com/schemaorg/schemaorg/issues/1863

I would suggest to begin working with the community to create and maintain an extension for Patents (this might involve working with the loose Law proposals also in our issues, just search them).

To begin - See "Extensibility Mechanisms" section and other sections in How We Work

And use our mailing list to begin, or if you want to get formal, you can request a W3C Community, https://www.w3.org/community/schemaorg/

@stuartasutton

If I am getting your intention @dgrahn, I don't think CategoryCode fits the bill since it provides the means for expressing a single defined category. So, if you have a range of categories 'C' through 'E', CategoryCode would be useful for expressing each category separately unless the range of 'C' through 'E' can itself be expressed as a category including a definition describing the range. I doubt that such defined expressions actually exist.
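
Expressing each category in a range separately could be sketched as follows in Python. This assumes the publisher has its own ordered list of codes (schema.org defines no ordering), and the function name `expand_range` and the sample code list are hypothetical; `codeValue` and `inCodeSet` are the property names used in the JSON example earlier in this thread.

```python
def expand_range(ordered_codes, start_code, end_code, code_set):
    """Return one CategoryCode node per code from start_code to end_code inclusive.

    ordered_codes is the publisher's own ordering of the code list.
    """
    start = ordered_codes.index(start_code)
    end = ordered_codes.index(end_code)
    if start > end:
        raise ValueError("start_code comes after end_code")
    return [
        {"@type": "CategoryCode", "codeValue": code, "inCodeSet": code_set}
        for code in ordered_codes[start : end + 1]
    ]


codes = ["A", "B", "C", "D", "E", "F"]  # hypothetical ordered code list
nodes = expand_range(codes, "C", "E", "http://example.org/patentCodes")
```

Because the codes are looked up by position in a list rather than parsed, values containing characters such as - or / need no special handling.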

@MichaelAndrews-RM

MichaelAndrews-RM commented Dec 19, 2018

I wanted to follow up on some discussion on Twitter that seems related to this. First, I think the general proposal is very useful. My chief concern is that some users may find the approach difficult to implement and may need a lightweight alternative.

Two hurdles I see to adoption are:

  1. knowing what external enumerations are available that would be helpful
  2. knowing the URIs of these enumerations

I like how the MedicalCode provides an option to use codingSystem and codeValue in addition to being able to specify a CategoryCode with URLs. Users can stick with short text values and not need to enter long URLs several times.

I wonder if we could extend this option to cover any CategoryCode scenario. Instead of having the codingSystem be a text value, schema.org could offer the most commonly used external coding systems as enumerations, so that a new Type would exist: Enumerations > CodingSystemType that could include:

  • GPC (GS1)
  • MeSH (NLM)
  • IPTC codes
  • Getty Thesaurus of Geographic Names (TGN)
  • ISICv4
  • NAICS
  • Standard Occupational Classification codes

Note: only the name of the classification systems would be enumerated, not all the individual codes belonging to that system, which would stay outside of schema.org.

The benefits of having these external codes as enumerated options is that it will point users to potentially helpful classifications, and will save them the effort of finding and entering the URL. The list of enumerated options does not need to be fixed. If they need to indicate a less commonly used classification, users still have the ability to indicate using the CategoryCode type.

I know some of these are already available as dedicated properties, a practice I would guess schema.org does not want to proliferate. Currently there are dedicated properties for:

  • naics
  • isicV4
  • occupationalCategory
  • iataCode
  • icaoCode

In pending, there is also a category property that seems to overlap.

One question raised is if users want a quick way to indicate a category value, why can't they use wikidata? While wikidata is useful to resolve entities, it is not the most friendly resource for classifying the kind of thing being described. It has a flat structure and can sometimes be prone to duplication. I think it will be easiest for users to start with a list of values that they can choose from.

@RichardWallis
Contributor Author

@MichaelAndrews-RM Creating enumeration types for all the potential classification systems across all useful domains I believe would be beyond the remit of Schema.org and not least a significant task to keep up to date.

With the CategoryCode & CategoryCodeSet and their super-types DefinedTerm & DefinedTermSet there is the flexibility to describe any DefinedTerm/CategoryCode in any DefinedTermSet/CategoryCodeSet as the several examples on those pages show.

Where those may be equivalent to individual terms/categories in external authoritative classification systems, or to the classification systems themselves, the sameAs property can be used in the definitions to assert that equivalence.

@MichaelAndrews-RM

@RichardWallis I am happy that the proposal can accommodate so many different kinds of classifications. While it is good at addressing the "long tail" of the distribution of many classifications available, it isn't so good at helping people who need to use one of the 5-10 most frequently cited classification systems, because it is so complex, and I doubt webmasters will find it easy to understand and use. Having these classifications is a big benefit to support the findability and aggregation of data, but for adoption at scale to occur, search engines will need to be able to promote an easy-to-use way for webmasters to add these codes.

I know that classification code schemes aren't fixed enumerations like days-of-the-week, but I expect the most popular ones are reasonably stable, so that updating them (e.g. from ISICv4 to ISICv5 in the future) would not be too hard to do.

Having said that, I recognize the first priority is getting the CategoryCode available, after which we can assess its mainstream adoption.

@nichtich

So the updated proposal would have a mapping to SKOS as follows:

  • schema:DefinedTerm ≈ skos:Concept
  • schema:termCode ≈ skos:notation (if unique within a skos:ConceptScheme)
  • schema:inDefinedTermSet ≈ skos:inScheme
  • schema:DefinedTermSet ≈ skos:ConceptScheme
  • schema:hasDefinedTerm ≈ inverse of skos:inScheme (in some cases skos:hasTopConcept)

By the way, I don't fully get why we need both DefinedTerm and CategoryCode, as their difference seems too subtle to understand, but never mind.

Anyway, I think the proposal does not catch the use case of referencing a concept/term/category/... by its code/notation/... For instance, how would one express that the DDC notation of a book is 940.27?

<div itemscope itemtype="http://schema.org/Book">
   <span ???>DDC</span>
   <span ???>940.27</span>
</div>

@RichardWallis
Contributor Author

@nichtich Regarding the need for both CategoryCode and DefinedTerm. In principle there is some duplication. However, in practice the broad spectrum of potential applications, from coding schemes such as DDC, as you reference, to glossary & dictionary entries and name authorities, gives rise to concerns about the appropriateness of naming. The pragmatic solution is to provide suitable types for those concerned with describing terms (words, names, acronyms, phrases, etc.) and for those wanting to capture codes.

As to your question about usage... There are a few ways that this could be described using CategoryCode / CategoryCodeSet, for example (in JSON-LD):

{
    "@context": "http://schema.org",
    "@type": "Book",
    "name": "The French Revolution and Napoleon",
    "author": "Leo Gershoy",
    "about": [
        {
            "@type": "CategoryCode",
            "name": "",
            "codeValue": "944.04",
            "inCodeSet": {
                "@type": "CategoryCodeSet",
                "name": "Dewey Decimal Classification",
                "alternateName": "DDC",
                "sameAs": "http://www.wikidata.org/entity/Q48460"
            }
        },
        {
            "@type": "CategoryCode",
            "name": "France--History--Revolution, 1789-1799",
            "codeValue": "85051319",
            "inCodeSet": {
                "@type": "CategoryCodeSet",
                "name": "Library of Congress Subject Headings",
                "alternateName": "LCSH",
                "sameAs": ["http://id.loc.gov/authorities/subjects","http://www.wikidata.org/entity/Q1823134"]
            },
            "sameAs": "http://id.loc.gov/authorities/subjects/sh85051319"
        }
    ]
}

Note: due to licensing restrictions, the sharing of DDC classification names is difficult. Other schemes are not so restrictive - I have included a Library of Congress Subject heading to demonstrate.

As the LCSH data is public and online, a potentially more lightweight version could look like this:

        {
            "@type": "CategoryCode",
            "name": "France--History--Revolution, 1789-1799",
            "codeValue": "85051319",
            "inCodeSet": {
                "@type": "CategoryCodeSet",
                "@id": "http://id.loc.gov/authorities/subjects"
            },
            "sameAs": "http://id.loc.gov/authorities/subjects/sh85051319"
        }

or possibly:

        {
            "@type": "CategoryCode",
            "name": "France--History--Revolution, 1789-1799",
            "@id": "http://id.loc.gov/authorities/subjects/sh85051319",
            "inCodeSet": {
                "@type": "CategoryCodeSet",
                "@id": "http://id.loc.gov/authorities/subjects"
            }
        }

How these would be represented in Microdata would depend on individual implementations, but here is one possible way:

<div>
  <div itemtype="http://schema.org/Book" itemscope>
    Title: <span itemprop="name">The French Revolution and Napoleon</span>
    <span itemprop="author">Leo Gershoy</span>
    <div itemprop="about" itemtype="http://schema.org/CategoryCode" itemscope>
      <div itemprop="inCodeSet" itemtype="http://schema.org/CategoryCodeSet" itemscope>
        <span itemprop="alternateName">DDC</span>: 
        <link itemprop="sameAs" href="http://www.wikidata.org/entity/Q48460" />
        <meta itemprop="name" content="Dewey Decimal Classification" />
      </div>
      <span itemprop="codeValue">944.04</span>
    </div>
    <div itemprop="about" itemtype="http://schema.org/CategoryCode" itemscope>
      <div itemprop="inCodeSet" itemtype="http://schema.org/CategoryCodeSet" itemscope>
        <meta itemprop="name" content="Library of Congress Subject Headings" />
        <link itemprop="sameAs" href="http://id.loc.gov/authorities/subjects" />
        <link itemprop="sameAs" href="http://www.wikidata.org/entity/Q1823134" />
        <span itemprop="alternateName">LCSH</span>:
      </div>
      (<span itemprop="codeValue">85051319</span>)
      <span itemprop="name">France--History--Revolution, 1789-1799</span>
      <link itemprop="sameAs" href="http://id.loc.gov/authorities/subjects/sh85051319" />
    </div>
  </div>
</div>

@dr-shorthair

Is this necessary? Why not just use SKOS?

@RichardWallis
Contributor Author

@dr-shorthair
From the Background of the initial proposal:

Previous discussions [1] [2] referencing an earlier MiniSKOS proposal provide background and some use cases to view this simple proposal against.

Also in response to a similar question/comment:

Part of the motivation behind this proposal was previous discussions about whether Schema.org should include/support/reference SKOS and, if so, by how much. It was designed as a very lightweight approach that could be built upon based on usage experience. I would suggest that, at least initially, broader/narrower hierarchical relationships between values would be best handled in localised data structures, to which the [proposed] Schema types would be added for wider sharing.

The previous discussions, over several years, concluded that adopting SKOS [in Schema.org], in whole or part, would not fit well with the [current] approach and use cases for the vocabulary.

Currently the terms created are located in the pending section of the vocabulary. In the future, as they become widely adopted, I expect a proposal to move them into the core of the vocabulary. At that time it may be appropriate to also suggest that the terms should be mapped to their SKOS equivalents.

@RichardWallis
Contributor Author

Implemented in PR #1255

@rosepac

rosepac commented Oct 20, 2020

Is there still no suitable markup for a glossary page that contains many terms? Thanks for everything. I ask because I have several pages with complete dictionaries, and the truth is that Google largely ignores them when indexing.

Such markup might help make them worthwhile. Creating glossaries is so useful, yet without it they are a real waste of working time: it is not worth spending time creating a glossary or dictionary if a book sales page that carries the term in its title is then going to get total preference.

Is there no markup that gives more value to this type of content (such as glossaries)? If you know of one, could you suggest an idea?
Otherwise, I could generate a FAQ and turn each term into a question, because I cannot find any other use.

For example, https://ciberninjas.com/glosario/completo-tecnologias-python/ is a dictionary of terms related to Python.
I would add "What is x?" to each of the terms and generate a FAQ of 48 questions of the style "What is asyncio?", "What is the bee?", etc.

The result would be 48 "what is" questions on a single page. Isn't Google going to penalize me if I do that, given that there is no specific markup for this type of publication?
Is anyone here who works on this or knows the topic? Do you know someone I could ask about it?

Sorry for my bad English. A thousand pardons. I hope you have been able to understand me, more or less.

@philbarker
Contributor

@rosepac I think you are looking for DefinedTermSet and DefinedTerm. The example for DefinedTermSet (at the bottom of the page) shows how to use it for a dictionary; a glossary would be the same.
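
As a rough sketch only (not taken from the schema.org examples), a glossary page such as the one linked above might be described along these lines. The set's "name" and the term's "description" are illustrative placeholders, and only one of the 48 terms is shown; the remaining terms would be further entries in "hasDefinedTerm".

```json
{
    "@context": "https://schema.org",
    "@type": "DefinedTermSet",
    "@id": "https://ciberninjas.com/glosario/completo-tecnologias-python/",
    "name": "Glossary of Python technologies",
    "hasDefinedTerm": [
        {
            "@type": "DefinedTerm",
            "name": "asyncio",
            "description": "Python standard-library module for writing concurrent code with async/await syntax.",
            "inDefinedTermSet": "https://ciberninjas.com/glosario/completo-tecnologias-python/"
        }
    ]
}
```

Each term points back to its set through inDefinedTermSet, mirroring the way CategoryCode uses inCodeSet in the examples earlier in this thread.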

Whether Google likes the markup is beyond the scope of schema.org.

@ptsefton

I am looking at using DefinedTerm for classifying data files on a language project, so there would be sets of terms for classifying CreativeWorks in various dimensions. Is there a way to leverage the sets so you can say that a particular property, say linguisticDataType, can have a value that comes from one of the sets?
