schema.org's JSON-LD context file should enumerate all terms #990

Closed
danbri opened this Issue Feb 17, 2016 · 20 comments

5 participants

@danbri
Contributor
danbri commented Feb 17, 2016
  • Currently we rely on defaulting so that our JSON-LD file can be small.
  • The context doc is available by content negotiation from http://schema.org/ or directly via http://schema.org/docs/jsonldcontext.json
  • Doing this means that there can be interactions with other JSON-LD vocabularies e.g. extensions, when multiple extensions are in play.
    • Example 1: "@context": [ "http://schema.org/", "http://foo.example.org/" ]
    • Example 2: "@context": [ "http://foo.example.org/", "http://schema.org/" ]

In the current situation, where we list only @id-typed and datatyped (e.g. DateTime) properties, other terms such as literal-valued properties, types and enumerated values are not explicitly "claimed" by schema.org. In the case of example 2 above, that is OK. In the case of example 1 above, a declaration of @vocab within foo.example.org's JSON-LD context file will "claim" all the non-explicit schema.org terms, so that Person will expand to http://foo.example.org/Person instead of http://schema.org/Person, etc.

By making schema.org's context list everything, we allow instance data and external context authors to choose how to mix and superimpose it with other vocabulary.

This is my understanding of the situation at least, but I am open to corrections. --@danbri
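To make the ordering argument concrete, here is a deliberately simplified Python model of how a processor builds its active context from an "@context" array: contexts are applied left to right, a later term definition replaces an earlier one, and only one @vocab is in effect at a time. It is not the JSON-LD expansion algorithm. The helper names and the contexts SCHEMA_PARTIAL, SCHEMA_FULL, FOO and FOO_NO_VOCAB are hypothetical, and term definitions are reduced to bare IRI strings.

```python
# Simplified model of how an ordered "@context" array is applied.
# Assumptions: term definitions are plain IRI strings; compact IRIs, @base,
# scoped contexts and datatype coercion are ignored.

def merge_contexts(contexts):
    """Fold an ordered list of context dicts into one active context."""
    active = {"@vocab": None, "terms": {}}
    for ctx in contexts:
        for key, value in ctx.items():
            if key == "@vocab":
                active["@vocab"] = value  # only one @vocab is in effect
            else:
                active["terms"][key] = value  # a later definition wins
    return active


def expand_term(active, term):
    """Return the IRI a term expands to, or None if it would be dropped."""
    if term in active["terms"]:
        return active["terms"][term]  # explicit definitions beat @vocab
    if active["@vocab"] is not None:
        return active["@vocab"] + term
    return None


# Hypothetical contexts, loosely following Example 1 and Example 2.
SCHEMA_PARTIAL = {"@vocab": "http://schema.org/",
                  "startDate": "http://schema.org/startDate"}
SCHEMA_FULL = dict(SCHEMA_PARTIAL, Person="http://schema.org/Person")
FOO = {"@vocab": "http://foo.example.org/"}
FOO_NO_VOCAB = {"fooTerm": "http://foo.example.org/fooTerm"}

for label, ctxs in [
    ("Example 1, partial schema.org context", [SCHEMA_PARTIAL, FOO]),
    ("Example 2, partial schema.org context", [FOO, SCHEMA_PARTIAL]),
    ("Example 1, exhaustive schema.org context", [SCHEMA_FULL, FOO]),
    ("Example 1, foo without @vocab", [SCHEMA_PARTIAL, FOO_NO_VOCAB]),
]:
    print(label, "->", expand_term(merge_contexts(ctxs), "Person"))
```

Only the first case sends Person to http://foo.example.org/Person: listing Person explicitly protects it, and the question of who claims undeclared terms only arises when the later context also declares @vocab.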

danbri self-assigned this Feb 17, 2016
@lanthaler
Collaborator

Correct. The contexts are evaluated in order. foo's @vocab declaration would override schema.org's in example 1 above which would lead to surprising results.

@danbri
Contributor
danbri commented Feb 17, 2016

So if we go ahead and implement #990, then we can tell publishers: "use the pattern from example 1 if you want to overlay/override foo's schemas on top of schema.org", and to use the pattern from example 2 if they want to do the opposite and overlay schema.org on top of foo's schemas. The last-named context in the array "gets the last word", which potentially affects two things: which vocabulary gets assigned any undeclared terms, and what to do if both contexts have declarations for the same term.

From a schema.org perspective we might prefer "schema.org" to be last in the list, but other options could still be made to work if the situation called for it. In either case we can protect publishers from having to memorize which terms are at which URL and have some potential for transparently evolving things e.g. if terms migrate into the core or into an extension over time.

/cc @vrandezo since Wikidata is one motivating usecase (#280), as is GS1's Web vocabulary (#258) at http://gs1.org/voc/

@lanthaler
Collaborator

> So if we go ahead and implement #990, then we can tell publishers: "use the pattern from example 1 if you want to overlay/override foo's schemas on top of schema.org", and to use the pattern from example 2 if they want to do the opposite and overlay schema.org on top of foo's schemas. The last-named context in the array "gets the last word", which potentially affects two things: which vocabulary gets assigned any undeclared terms, and what to do if both contexts have declarations for the same term.

Exactly. The "which vocabulary gets assigned any undeclared terms" part only applies if both contexts use @vocab.

I agree that having schema.org last is probably better in most cases.

@mgh128
mgh128 commented Feb 17, 2016

I've just been looking into this, using the JSON-LD playground tool to check the resulting triples. The JSON-LD playground reports an error when attempting to use an array of values as you suggest, e.g. "@vocab": ["http://gs1.org/voc/","http://schema.org/"]; it says that the value of @vocab in @context must be a string or null.

Looking at section 8.7 of the JSON-LD spec it says:
"If the context definition has an @vocab key, its value must be a absolute IRI, a compact IRI, a blank node identifier, a term, or null."
The JSON-LD spec does not say that arrays of IRIs etc. are permitted but does not explicitly prohibit them - yet the JSON-LD playground tool appears to prohibit arrays of IRIs as a value for @vocab.

Since Markus @lanthaler is one of the co-authors of the JSON-LD spec, I'm sure he can provide clarification on this point and advise whether the JSON-LD playground and other validation tools should be updated to accept arrays of IRIs for the value of @vocab.

@danbri
Contributor
danbri commented Feb 17, 2016

I believe (@lanthaler can correct me!) that while @context can take an array, there is only one @vocab in effect at any point in the JSON tree.

@mgh128
mgh128 commented Feb 17, 2016

Hi @danbri,

Looking at the actual examples of @context in https://www.w3.org/TR/json-ld/#context-definitions I don't see any examples where @context is taking an array in the way you suggested above. @context expects an object (in curly brackets, not square brackets) and within that, it's the @vocab term that indicates what the default vocabulary is - the vocabularies cannot just be stated without an @vocab key.

Even if we were using multiple @context blocks within an array, as in https://www.w3.org/TR/json-ld/#advanced-context-usage , the spec says that the most recently defined term wins.

So, if we can only have one @vocab per @context and it can only take a single IRI value, then even if we later specify an additional @context with one @vocab with a different IRI value, its value will win and the previous value of @vocab will be ignored.

Or am I missing something? If Markus @lanthaler or anyone else can provide a correct JSON-LD snippet that validates correctly in the JSON-LD playground tool and also allows us to indicate a prioritised list of default vocabularies as @danbri intended, that would be really helpful.

@lanthaler
Collaborator

> "@vocab": ["http://gs1.org/voc/","http://schema.org/"]

It is not @vocab but @context. So a complete example would look somewhat like this:

{
  "@context: [
    "http://gs1.org/voc/",
    "http://schema.org/"
  ],
  "productName": ...
  ...
}

Assuming that GS1's context is being served from http://gs1.org/voc/.

> So, if we can only have one @vocab per @context and it can only take a single IRI value, then even if we later specify an additional @context with one @vocab with a different IRI value, its value will win and the previous value of @vocab will be ignored.

Right. That's why it is important to explicitly define each term instead of relying on @vocab.

@mgh128
mgh128 commented Feb 17, 2016

Hi Markus @lanthaler

Thanks for joining the discussion. However, after fixing the typo (missing double-quote after "@context ), I can get this to validate in the JSON-LD playground:

{
"@context" : [ "http://schema.org" ],
"productName" : "Nexus 7"
}

but not this:

{
"@context" : [ "http://gs1.org/voc/", "http://schema.org" ],
"productName" : "Nexus 7"
}

This does not validate. If I've misunderstood what you meant, please could you correct / expand the example and also check that it validates at http://json-ld.org/playground/ ?

In the JSON-LD markup examples for the GS1 vocabulary we are explicitly defining each term, especially every owl:ObjectProperty, anything expecting an rdf:langString, and anything expecting a URI or an xsd datatype. The downside is that we have a fairly large @context block for the whole vocabulary.

@danbri
Contributor
danbri commented Feb 18, 2016

{
"@context" : [ "http://bib.schema.org/", "http://schema.org" ],
"productName" : "Nexus 7"
}

... this works (even if it does not make much sense). I think your "does not validate" must refer to an error in fetching a real context file from the URL. Are you seeing "jsonld.InvalidUrl - Dereferencing a URL did not result in a valid JSON-LD object..." or some different error?

@mgh128
mgh128 commented Feb 18, 2016

Thanks - it looks like I previously misunderstood: the notation above actually retrieves remote contexts, and we don't currently have a remote context file on the gs1.org site.

I have now got the following snippets validating correctly:

{
"@context" : [ "http://schema.org","http://milecastle.media/pureGS1context.jsonld" ],
"productName" : "Nexus 7"
}

or

{
"@context" : [ "http://milecastle.media/pureGS1context.jsonld", "http://schema.org" ],
"productName" : "Nexus 7"
}

In the first snippet, the draft GS1 @context is called last, so it overrides the schema.org definition of 'productName', resulting in the triple:

_:b0 <http://gs1.org/voc/productName> "Nexus 7" .

In the second snippet, the schema.org context is called last, so it overrides the GS1 definition of 'productName', resulting in

_:b0 <http://schema.org/productName> "Nexus 7" .

Obviously there is still a problem with http://schema.org/productName - that property is not defined.

We have tried to align with many schema.org terms where they are an exact semantic match (e.g. unitCode, Product, Offer, etc.) - but we avoided overloading terms such as 'name' and 'description', preferring instead to define distinct terms for productDescription, offerDescription etc. So it looks as though we'll still need to define two context files for the GS1 web vocabulary - one purely for GS1 terms and a mixed context file that maps 'productName' explicitly to 'schema:name' etc. for those terms where there is a mapping.
In the JSON-LD markup tool we're developing, we'll probably default to referring to the remote mixed schema.org/GS1 context file but will also give the options of referencing or embedding the mixed schema.org + GS1 or pure-GS1 context blocks.
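The "pure GS1" and "mixed schema.org/GS1" context files described above differ only in where a handful of terms point. A minimal Python sketch of generating both from one term list follows. The term lists, the IRI_VALUED set and the build_context helper are hypothetical, and the real GS1 context files may be laid out differently.

```python
import json

GS1 = "http://gs1.org/voc/"
SCHEMA = "http://schema.org/"

# Hypothetical inputs for illustration only: which GS1 terms take IRI values,
# and which have a schema.org equivalent to use in the "mixed" context.
IRI_VALUED = {"referencedFileURL"}
SCHEMA_EQUIVALENTS = {"productName": "name", "productDescription": "description"}


def build_context(terms, mixed=False):
    """Build a context that defines every listed term explicitly.

    mixed=False: every term maps into the GS1 namespace ("pure" context).
    mixed=True:  terms with a schema.org equivalent map to schema.org instead.
    """
    ctx = {"@vocab": GS1}
    for term in sorted(terms):
        if mixed and term in SCHEMA_EQUIVALENTS:
            definition = {"@id": SCHEMA + SCHEMA_EQUIVALENTS[term]}
        else:
            definition = {"@id": GS1 + term}
        if term in IRI_VALUED:
            definition["@type"] = "@id"  # treat string values as IRIs
        ctx[term] = definition
    return {"@context": ctx}


terms = ["productName", "productDescription", "referencedFileURL"]
print(json.dumps(build_context(terms, mixed=True), indent=2))
```

Keeping @vocab alongside the explicit definitions means any term left out of the list still falls back to the GS1 namespace, but that fallback is exactly what a later context's @vocab can override.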

@danbri
Contributor
danbri commented Feb 18, 2016

For fooName, fooDescription, barName, barDescription etc, would it make sense to declare them (in RDFS rather than JSON-LD context mechanism) as sub-properties?

@mgh128
mgh128 commented Feb 18, 2016

In the GS1 vocabulary itself, we define the following RDFS mappings to schema.org:

gs1:afterHoursContact rdfs:subPropertyOf schema:contactPoint .
gs1:availableLanguage rdfs:subPropertyOf schema:availableLanguage .
gs1:brand rdfs:subPropertyOf schema:brand .
gs1:brandName rdfs:subPropertyOf schema:name .
gs1:brandOwner rdfs:subPropertyOf schema:brand .
gs1:contactPoint rdfs:subPropertyOf schema:contactPoint .
gs1:drainedWeight rdfs:subPropertyOf schema:weight .
gs1:eligibleQuantity rdfs:subPropertyOf schema:eligibleQuantity .
gs1:eligibleQuantityMaximum rdfs:subPropertyOf schema:eligibleQuantity .
gs1:eligibleQuantityMinimum rdfs:subPropertyOf schema:eligibleQuantity .
gs1:equivalentProduct rdfs:subPropertyOf schema:isSimilarTo .
gs1:fileLanguageCode rdfs:subPropertyOf schema:inLanguage .
gs1:filePixelHeight rdfs:subPropertyOf schema:height .
gs1:filePixelWidth rdfs:subPropertyOf schema:width .
gs1:grossWeight rdfs:subPropertyOf schema:weight .
gs1:netWeight rdfs:subPropertyOf schema:weight .
gs1:offerDescription rdfs:subPropertyOf schema:description .
gs1:organizationName rdfs:subPropertyOf schema:name .
gs1:primaryAlternateProduct rdfs:subPropertyOf schema:isSimilarTo .
gs1:productDescription rdfs:subPropertyOf schema:description .
gs1:productName rdfs:subPropertyOf schema:name .
gs1:referencedFileEffectiveEndDateTime rdfs:subPropertyOf schema:expires .
gs1:referencedFileSize rdfs:subPropertyOf schema:contentSize .
gs1:referencedFileURL rdfs:subPropertyOf schema:url .
gs1:replacedByProduct rdfs:subPropertyOf schema:isSimilarTo .
gs1:replacedProduct rdfs:subPropertyOf schema:isSimilarTo .
gs1:subBrandName rdfs:subPropertyOf schema:name .
gs1:warranty rdfs:subPropertyOf schema:warranty .
gs1:warrantyScopeDescription rdfs:subPropertyOf schema:description .
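Sub-property mappings like these only help a consumer that actually applies RDFS entailment. As a rough illustration of that inference (the RDFS rule "p rdfs:subPropertyOf q and s p o imply s q o"), here is a Python sketch over triples written as plain strings. The triple encoding and function name are hypothetical, and whether any particular schema.org consumer performs this inference is not established in this thread.

```python
# A few of the mappings listed above, as (subproperty, superproperty) pairs.
SUBPROPERTY_OF = {
    ("gs1:productName", "schema:name"),
    ("gs1:productDescription", "schema:description"),
    ("gs1:netWeight", "schema:weight"),
}


def entail_superproperties(triples, subproperty_of):
    """Apply 'p subPropertyOf q, s p o => s q o' until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for sub, sup in subproperty_of:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


data = {("_:b0", "gs1:productName", "Nexus 7")}
for triple in sorted(entail_superproperties(data, SUBPROPERTY_OF)):
    print(triple)
```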

We also define the following subclass relationships relevant to schema.org:

gs1:Beverage rdfs:subClassOf gs1:FoodBeverageTobaccoProduct .
gs1:Clothing rdfs:subClassOf gs1:WearableProduct .
gs1:FoodBeverageTobaccoProduct rdfs:subClassOf gs1:Product .
gs1:Footwear rdfs:subClassOf gs1:WearableProduct .
gs1:FruitsVegetables rdfs:subClassOf gs1:FoodBeverageTobaccoProduct .
gs1:MeatPoultry rdfs:subClassOf gs1:FoodBeverageTobaccoProduct .
gs1:MilkButterCreamYogurtCheeseEggsSubstitutes rdfs:subClassOf gs1:FoodBeverageTobaccoProduct .
gs1:NutritionMeasurementType rdfs:subClassOf gs1:QuantitativeValue .
gs1:ReferencedFileDetails rdfs:subClassOf schema:MediaObject .
gs1:Seafood rdfs:subClassOf gs1:FoodBeverageTobaccoProduct .
gs1:WearableProduct rdfs:subClassOf gs1:Product .

However, our RDFS statements are currently missing:

gs1:Offer rdfs:subClassOf schema:Offer .
gs1:Organization rdfs:subClassOf schema:Organization .
gs1:Product rdfs:subClassOf schema:Product .

because so far we've only stated a skos:exactMatch relationship for these.
This means that the rdfs:subClassOf mappings we have defined so far don't yet help to map to related classes in schema.org. We could easily add the three statements above if that helps, provided it is reasonable to assume that validation tools such as the Google Structured Data Testing Tool are fully aware of RDFS (including predicate paths such as rdfs:subClassOf+) and of vocabularies outside schema.org. If that were the case and we added these three missing relationships, we could write

http://example.org/abc rdf:type gs1:FruitsVegetables .

and any schema.org validation tool could follow the chain

gs1:FruitsVegetables rdfs:subClassOf gs1:FoodBeverageTobaccoProduct .
gs1:FoodBeverageTobaccoProduct rdfs:subClassOf gs1:Product .
gs1:Product rdfs:subClassOf schema:Product .

to conclude that http://example.org/abc rdf:type schema:Product .
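The chain above is the transitive closure of rdfs:subClassOf. A small Python sketch of computing it follows, treating "gs1:Product rdfs:subClassOf schema:Product" as already added (it is one of the three statements described as missing). The dictionary encoding and helper names are hypothetical.

```python
SUBCLASS_OF = {
    "gs1:FruitsVegetables": ["gs1:FoodBeverageTobaccoProduct"],
    "gs1:FoodBeverageTobaccoProduct": ["gs1:Product"],
    "gs1:Product": ["schema:Product"],  # one of the three proposed statements
}


def superclasses(cls, subclass_of):
    """Every class reachable from cls by one or more rdfs:subClassOf links."""
    seen, stack = set(), list(subclass_of.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # the seen-set also guards against cycles
            seen.add(current)
            stack.extend(subclass_of.get(current, []))
    return seen


def inferred_types(asserted, subclass_of):
    """Asserted rdf:type values plus all of their superclasses."""
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(cls, subclass_of)
    return types


print(sorted(inferred_types({"gs1:FruitsVegetables"}, SUBCLASS_OF)))
```

Dropping the gs1:Product entry from SUBCLASS_OF removes schema:Product from the result, which is the gap the three missing statements would close.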

For other properties such as gs1:width, gs1:depth and gs1:height, the mapping to schema.org terms is more complicated because their rdfs:domain (or sometimes their rdfs:range) is different:
e.g. a gs1:width has an rdfs:domain of gs1:Dimension instead of a gs1:Product or schema:Product, so that we can use gs1:inPackageDimensions and gs1:outOfPackageDimensions to separately distinguish between packaged dimensions and out-of-package dimensions.
There are probably also a few other GS1 properties where the domain or range are different from similarly named schema.org properties for good reasons.

@lanthaler
Collaborator

> {
> "@context" : [ "http://milecastle.media/pureGS1context.jsonld", "http://schema.org" ],
> "productName" : "Nexus 7"
> }
>
> In the second snippet [above], the schema.org context is called last, so it overrides the GS1 definition of 'productName', resulting in
>
> _:b0 <http://schema.org/productName> "Nexus 7" .
>
> Obviously there is still a problem with http://schema.org/productName - that property is not defined.

@mgh128 that only happens because http://milecastle.media/pureGS1context.jsonld doesn't define productName. It only defines productName_en.

@ekgs1
ekgs1 commented Feb 18, 2016

Perhaps it is best in the GS1 Vocabulary that we remove the Dimensions class and create gs1:inPackageWidth, gs1:outOfPackageWidth, etc. It is best not to have a gs1:width and a schema:width with different meanings.

danbri assigned RichardWallis and unassigned danbri Apr 11, 2016
danbri added this to the sdo-deimos release milestone Apr 11, 2016
@danbri
Contributor
danbri commented Apr 11, 2016

Thinking out loud:

I was assuming that we need only define properties exhaustively, because that's the only piece of JSON-LD syntax that can be ambiguous. But now I'm thinking it must be all terms, because otherwise Person etc. might end up in the wrong namespace too.

@mgh128
mgh128 commented Apr 11, 2016

We're in the process of overhauling our JSON-LD markup tool for GS1 SmartSearch so that all datatype coercion (for anything other than xsd:string) and all @language tagging is done within the main body of the JSON-LD block, next to the value. This keeps the @context file fairly minimal and makes it more scalable: many of our properties have rdf:langString as their range, and there are countries (e.g. Canada, Belgium, Switzerland) with multiple official languages and therefore multilingual values for various product details such as lists of ingredients.

In the GS1 SmartSearch vocabulary, we have mostly tried to align with schema.org nomenclature, except that we have not overloaded properties such as 'name' and 'description' but instead define separate, specific properties such as gs1:productDescription, gs1:offerDescription etc. The mappings to the equivalent schema.org properties are indicated in the GS1 SmartSearch vocabulary at http://gs1.org/voc, where we define mostly rdfs:subPropertyOf and rdfs:subClassOf relationships to related terms in schema.org.

We have been discussing with you the capability to use schema.org terms in combination with terms from the GS1 SmartSearch vocabulary for more specific details not currently covered in schema.org.
We could handle this situation with an extended @context block that explicitly maps the related terms to their schema.org equivalents. For example, the extended @context block would explicitly map productDescription to http://schema.org/description, overriding the @vocab "http://gs1.org/voc/" that would otherwise expand productDescription to http://gs1.org/voc/productDescription.

I think you're noting that we have some subclasses of schema:Product such as gs1:WearableProduct, gs1:Footwear, gs1:Clothing, gs1:FoodBeverageTobaccoProduct, gs1:Seafood etc.
In those situations, a particular product might be declared to be of rdf:type gs1:Seafood
In the mixed-mode markup ('schema.org + GS1 extras' as I think you (@danbri) called it), we'd probably just explicitly state that the product is simultaneously of rdf:type ["gs1:Seafood","schema:Product"]

As far as I recall, apart from the subclasses of schema:Product, the only other subclass we define is gs1:ReferencedFileDetails, as a subclass of schema:MediaObject.

If it helps, I can provide a markup example of mixed vocabulary use (schema.org + GS1 SmartSearch) and the extended @context block and show how this compares against a markup example for only GS1 SmartSearch vocabulary use.
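A rough sketch of the mixed-mode markup described above: the node carries both rdf:type values, and language tags sit next to each value rather than in the @context. It is written as Python that emits JSON-LD; the lang_values helper, the choice of properties and the placeholder strings are hypothetical, not taken from GS1's markup tool.

```python
import json


def lang_values(texts):
    """Turn {"en": "...", "fr": "..."} into JSON-LD language-tagged values."""
    return [{"@value": text, "@language": lang}
            for lang, text in sorted(texts.items())]


# Hypothetical product node in "schema.org + GS1 extras" style; all literal
# strings are placeholders.
product = {
    "@context": {"gs1": "http://gs1.org/voc/", "schema": "http://schema.org/"},
    "@type": ["gs1:Seafood", "schema:Product"],  # typed in both vocabularies
    "gs1:productDescription": lang_values({
        "en": "Example description",
        "fr": "Exemple de description",
    }),
}

print(json.dumps(product, indent=2, ensure_ascii=False))
```

Because each value object carries its own @language, adding a third language for one product only touches that product's data, not the shared context file.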

@lanthaler
Collaborator

> I was assuming that we need only define properties exhaustively, because that's the only piece of JSON-LD syntax that can be ambiguous. But now I'm thinking it must be all terms, because otherwise Person etc. might end up in the wrong namespace too.

That's right. You would also need to explicitly define classes etc. so that they aren't moved to a different namespace by another context's @vocab definition.
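One way to check whether a context is exhaustive in this sense is to compare the vocabulary's full term list against the keys the context defines. The sketch below assumes both are already available as Python objects (for example, the inner object of a context file loaded with json.load); the function name and sample data are hypothetical.

```python
def undeclared_terms(vocabulary_terms, context):
    """Vocabulary terms that the context leaves to @vocab (keywords ignored)."""
    defined = {key for key in context if not key.startswith("@")}
    return sorted(set(vocabulary_terms) - defined)


# Hypothetical inputs: a few schema.org terms and a context that, like the
# pre-fix file described in this issue, lists only some terms explicitly.
terms = ["Person", "name", "startDate", "EventScheduled"]
context = {
    "@vocab": "http://schema.org/",
    "startDate": {"@id": "http://schema.org/startDate"},
}

print(undeclared_terms(terms, context))  # ['EventScheduled', 'Person', 'name']
```

Every term it reports is one that a later context's @vocab could claim.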

RichardWallis added a commit that referenced this issue Apr 12, 2016
Context file now includes all Types, Properties, Enumerations, and DataTypes, from core and all extensions.

Addresses issue (#990)
720c323
@danbri
Contributor
danbri commented Apr 12, 2016

I've merged @RichardWallis's work; here's a staged version to review:

http://webschemas.org/docs/jsonldcontext.json

@lanthaler
Collaborator

The context looks good to me.

@RichardWallis
Contributor

Fixed in V3.0

stuartasutton referenced this issue in CredentialEngine/vocabularies Nov 23, 2016: Referencing the JSON context file (and other schema files) #234 (Closed)
