Add '@language' container type #133

Closed
lanthaler opened this Issue June 07, 2012 · 70 comments

6 participants

Markus Lanthaler Manu Sporny Gregg Kellogg Dave Longley Niklas Lindström Lin Clark
Markus Lanthaler
Collaborator

The idea of this issue is to set @container of a property to @language to allow L10n of JSON property values as shown in the following example:

{
  "@context": {
    "label": {
      "@id": "http://example.com/label",
      "@container": "@language"
    }
  },
  "@id": "http://buckingham.uk/queenie",
  "label": {
    "en": "The Queen",
    "de": "Die Koenigin"
  }
}

When expanded, this should result in:

[
  {
    "@id": "http://buckingham.uk/queenie",
    "http://example.com/label": [
      { "@value": "The Queen", "@language": "en" },
      { "@value": "Die Königin", "@language": "de" }
    ]
  }
]
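
As an illustration of the expansion shown above, here is a minimal Python sketch. The function name `expand_language_map` is hypothetical and not part of any JSON-LD API; the sketch assumes that each language key holds either one string or a list of strings, and it does no IRI expansion.

```python
def expand_language_map(language_map):
    """Turn {"en": "The Queen", ...} into a list of language-tagged value objects."""
    expanded = []
    for language, values in language_map.items():
        # Assumption: a language key holds one string or a list of strings.
        if not isinstance(values, list):
            values = [values]
        for value in values:
            expanded.append({"@value": value, "@language": language})
    return expanded


label = {"en": "The Queen", "de": "Die Königin"}
expanded_label = expand_language_map(label)
```

The resulting list would sit under the expanded property IRI (`http://example.com/label` in the example).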

Compaction might be a bit trickier if there are other properties that are not language tagged for the same property. They either have to stay under the full IRI in that case or contain at least one keyword to be distinguishable from language maps, something like:

{
  "@context": {
    "label": {
      "@id": "http://example.com/label",
      "@container": "@language"
    }
  },
  "@id": "http://buckingham.uk/queenie",
  "label": [
    {
      "en": "The Queen",
      "de": "Die Königin"
    },
    "No language",
    5,
    true,
    {
      "@id": "_:b1",   <-- a keyword MUST be present to distinguish an object from a language map
      "prop": "value"
    }
  ]
}

Something similar was discussed before under the term "language map" (#29) and came up again in a discussion Gregg had with @vrandezo. There has also been some discussion on the mailing list:


Gregg originally proposed to use something he called "folding" for this and #134:

{
  "@context": {
    "en": {"@id": null, "@language": "en", "@fold": true},
    "de": {"@id": null, "@language": "de", "@fold": true},
    "queenie": {"@id": null, "@fold": true}
  },
  "queenie": {
    "@id": "http://buckingham.uk/queenie",
    "label": {
      "en": { "@value": "The Queen" },
      "de": { "@value": "Die Königin"}
    }
  }
}
Gregg Kellogg
Owner

RESOLVED: Attempt to add other @container options, such as "@container": "@language" to support Wikidata's language-map use case.

Dave Longley
Owner

I think the general idea can work. I'm against the compaction example that Markus provided, however, because I think it's confusing. It's particularly confusing because I see @container as currently defining how to interpret the JSON value associated with the term.

For example, {"foo": {"@container": "@set"}} currently means to me: the JSON value for the key "foo" is an unordered array. It follows then that {"foo": {"@container": "@language"}} means: the JSON value for the key "foo" is a language map, that is, a JSON object where the keys are language identifiers and the values are language strings. I'm fine with that. However, in the compaction example, the JSON value for "foo" is instead an array, which is actually a @set (presumably), that presumably somewhere contains a language map but also other values that aren't maps at all.

I'd much prefer us to take a different route for handling the compaction case. I might be ok with the JSON value of "foo" being an array, but only so long as each JSON value within that array is a language map, nothing else. This would permit multiple language maps (is there actually a use case for this?). However, IMO, there should definitely be nothing other than language maps for the term "foo". If there exists a value for "foo" that cannot be unambiguously converted to a language map, then it should not be used with that term -- which means selecting the absolute IRI if there are no other applicable terms available.

I think this same compaction behavior should apply to @id and @type maps, etc. (See: #134).
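
One possible reading of this compaction rule, as a rough Python sketch: values that convert unambiguously to language-map entries go under the term, and everything else stays under the absolute IRI. The function name `compact_property` is hypothetical, and the sketch assumes that only value objects with exactly `@value` (a string) and `@language` qualify.

```python
def compact_property(term, iri, expanded_values):
    """Split expanded values between a language-map term and the absolute IRI."""
    collected = {}
    leftovers = []
    for item in expanded_values:
        fits = (
            isinstance(item, dict)
            and set(item) == {"@value", "@language"}
            and isinstance(item["@value"], str)
        )
        if not fits:
            leftovers.append(item)  # cannot become a language-map entry
            continue
        collected.setdefault(item["@language"], []).append(item["@value"])

    # Unwrap single strings; keep a list when a language has several strings.
    language_map = {
        language: values[0] if len(values) == 1 else values
        for language, values in collected.items()
    }

    result = {}
    if language_map:
        result[term] = language_map
    if leftovers:
        result[iri] = leftovers  # these values stay under the absolute IRI
    return result


values = [
    {"@value": "The Queen", "@language": "en"},
    {"@value": "Die Königin", "@language": "de"},
    {"@value": "No language"},
    {"@value": 5},
]
compacted = compact_property("label", "http://example.com/label", values)
```

With this split, the value of `label` is always a plain language map and never a mixed array.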

Gregg Kellogg
Owner

So, this is not going in a direction that I was originally proposing, based on the WikiData use-cases. Basically, there are two ways in which you might consider using language maps:

  1. As a value-map, where a property contains an object whose keys are language elements and whose values in turn are language-tagged strings.
  2. As a property-map, where some or all of the keys of a subject definition are language elements whose values are objects containing properties of the original subject definition.

Consider a typical internationalization use case, where you have a resource with values expressed in multiple languages; for example, an abbreviated example from DBPedia:

@prefix dbpedia:    <http://dbpedia.org/resource/> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix yago:   <http://dbpedia.org/class/yago/> .

dbpedia:Linked_Data rdf:type    yago:Buzzwords ;
owl:sameAs  <http://rdf.freebase.com/ns/m/02r2kb1>, dbpedia:Linked_Data .
dbpedia:Linked_Data rdfs:comment
  "La propuesta de datos vinculados (linked data) surge dentro de marco general de la Web sem\u00E1ntica. El t\u00E9rmino \"datos vinculados\" hace referencia al m\u00E9todo con el que se pueden mostrar, intercambiar y conectar datos a trav\u00E9s de URIs desreferenciables en la Web."@es ,
    "Linked Open Data (LOD) bezeichnet im World Wide Web frei verf\u00FCgbare Daten, die per Uniform Resource Identifier (URI) identifiziert sind und dar\u00FCber direkt per HTTP abgerufen werden k\u00F6nnen und ebenfalls per URI auf andere Daten verweisen. Idealerweise werden zur Kodierung und Verlinkung der Daten das Resource Description Framework (RDF) und darauf aufbauende Standards wie SPARQL und die Web Ontology Language (OWL) verwendet, so dass Linked Open Data gleichzeitig Teil des Semantic Web ist."@de ,
    "I dati collegati (linked data in inglese) sono un aspetto del web semantico. Il termine dati collegati \u00E8 usato per descrivere un metodo di esporre, condividere e connettere dati attraverso URI deferenziabili."@it ,
    "In computing, linked Data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried."@en ,
    "Le Web des donn\u00E9es (Linked Data, en Anglais) est une initiative du W3C(Consortium World Wide Web) visant \u00E0 favoriser la publication de donn\u00E9es structur\u00E9es sur le Web, non pas sous la forme de silos de donn\u00E9es isol\u00E9s les uns des autres, mais en les reliant entre elles pour constituer un r\u00E9seau global d'informations."@fr ,
    "\u9375\u9023\u8CC7\u6599\u662F\u6B63\u5728\u5FEB\u901F\u767C\u5C55\u7684\u8A9E\u7FA9\u7DB2\u7684\u4E00\u7CFB\u5217\u7684\u6D3B\u52D5\uFF0C\u5B83\u63CF\u8FF0\u4E86\u4E00\u5957\u5728\u5168\u7403\u8CC7\u8A0A\u7DB2\u4E0A\u767C\u4F48\u3001\u5206\u4EAB\u3001\u53CA\u9023\u7D50\u8CC7\u6599\u7684\u65B9\u6CD5\u3002\u4E3B\u8981\u4EE5\u53EF\u53C3\u7167\u7684URI\u4F5C\u70BA\u6700\u57FA\u672C\u7684\u8981\u7D20\u3001\u4EE5RDF\u4F5C\u70BA\u63CF\u8FF0\u9023\u7D50\u7684\u8A9E\u8A00\u3002"@zh ;
rdfs:label
  "Linked Open Data"@de ,
    "Datos vinculados"@es ,
    "\u9375\u9023\u8CC7\u6599"@zh ,
    "Linked Data"@en ,
    "Dati collegati"@it ,
    "Web des donn\u00E9es"@fr .

Using the value-map syntax, this could be represented as follows:

{
  "@context": {
    ...
    "rdfs:comment": {"@container": "@language"},
    "rdfs:label": {"@container": "@language"}
  },
  "@id": "dbpedia:Linked_Data",
  "owl:sameAs": ["http://rdf.freebase.com/ns/m/02r2kb1", "dbpedia:Linked_Data"],
  "rdfs:comment": {
    "es": "La propuesta de datos vinculados (linked data) surge dentro de marco general de la Web sem\u00E1ntica. El t\u00E9rmino \"datos vinculados\" hace referencia al m\u00E9todo con el que se pueden mostrar, intercambiar y conectar datos a trav\u00E9s de URIs desreferenciables en la Web.",
    "de": "Linked Open Data (LOD) bezeichnet im World Wide Web frei verf\u00FCgbare Daten, die per Uniform Resource Identifier (URI) identifiziert sind und dar\u00FCber direkt per HTTP abgerufen werden k\u00F6nnen und ebenfalls per URI auf andere Daten verweisen. Idealerweise werden zur Kodierung und Verlinkung der Daten das Resource Description Framework (RDF) und darauf aufbauende Standards wie SPARQL und die Web Ontology Language (OWL) verwendet, so dass Linked Open Data gleichzeitig Teil des Semantic Web ist.",
    "it": "I dati collegati (linked data in inglese) sono un aspetto del web semantico. Il termine dati collegati \u00E8 usato per descrivere un metodo di esporre, condividere e connettere dati attraverso URI deferenziabili.",
    "en": "In computing, linked Data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.",
    "fr": "Le Web des donn\u00E9es (Linked Data, en Anglais) est une initiative du W3C(Consortium World Wide Web) visant \u00E0 favoriser la publication de donn\u00E9es structur\u00E9es sur le Web, non pas sous la forme de silos de donn\u00E9es isol\u00E9s les uns des autres, mais en les reliant entre elles pour constituer un r\u00E9seau global d'informations.",
    "zh": "\u9375\u9023\u8CC7\u6599\u662F\u6B63\u5728\u5FEB\u901F\u767C\u5C55\u7684\u8A9E\u7FA9\u7DB2\u7684\u4E00\u7CFB\u5217\u7684\u6D3B\u52D5\uFF0C\u5B83\u63CF\u8FF0\u4E86\u4E00\u5957\u5728\u5168\u7403\u8CC7\u8A0A\u7DB2\u4E0A\u767C\u4F48\u3001\u5206\u4EAB\u3001\u53CA\u9023\u7D50\u8CC7\u6599\u7684\u65B9\u6CD5\u3002\u4E3B\u8981\u4EE5\u53EF\u53C3\u7167\u7684URI\u4F5C\u70BA\u6700\u57FA\u672C\u7684\u8981\u7D20\u3001\u4EE5RDF\u4F5C\u70BA\u63CF\u8FF0\u9023\u7D50\u7684\u8A9E\u8A00\u3002";
  },
  "rdfs:label": {
    "es": "Datos vinculados",
    "de": "Linked Open Data",
    "it": "Dati collegati",
    "en": "Linked Data",
    "fr": "Web des donn\u00E9es",
    "zh": "\u9375\u9023\u8CC7\u6599";
  }
}

The problem here is that indexing of values always requires dereferencing through the property before looking for a language variant. If a developer is looking for all resources that have descriptions in some language, this requires deeper navigation.

If, however, the property-map version is used, all values sharing a common language are nicely contained together. This might be represented as follows:

{
  "@context": {
    ...
    "es": {"@container": "@language"},
    "de": {"@container": "@language"},
    ...
  },
  "@id": "dbpedia:Linked_Data",
  "owl:sameAs": ["http://rdf.freebase.com/ns/m/02r2kb1", "dbpedia:Linked_Data"],
  "es": {
    "rdfs:comment": "La propuesta de datos vinculados (linked data) surge dentro de marco general de la Web sem\u00E1ntica. El t\u00E9rmino \"datos vinculados\" hace referencia al m\u00E9todo con el que se pueden mostrar, intercambiar y conectar datos a trav\u00E9s de URIs desreferenciables en la Web.",
    "rdfs:label": "Datos vinculados"
  },
  "de": {
    "rdfs:comment": "Linked Open Data (LOD) bezeichnet im World Wide Web frei verf\u00FCgbare Daten, die per Uniform Resource Identifier (URI) identifiziert sind und dar\u00FCber direkt per HTTP abgerufen werden k\u00F6nnen und ebenfalls per URI auf andere Daten verweisen. Idealerweise werden zur Kodierung und Verlinkung der Daten das Resource Description Framework (RDF) und darauf aufbauende Standards wie SPARQL und die Web Ontology Language (OWL) verwendet, so dass Linked Open Data gleichzeitig Teil des Semantic Web ist.",
    "rdfs:label": "Linked Open Data"
  },
  "it": {
    "rdfs:comment": "La propuesta de datos vinculados (linked data) surge dentro de marco general de la Web sem\u00E1ntica. El t\u00E9rmino \"datos vinculados\" hace referencia al m\u00E9todo con el que se pueden mostrar, intercambiar y conectar datos a trav\u00E9s de URIs desreferenciables en la Web.",
    "rdfs:comment": "I dati collegati (linked data in inglese) sono un aspetto del web semantico. Il termine dati collegati \u00E8 usato per descrivere un metodo di esporre, condividere e connettere dati attraverso URI deferenziabili.",
    "rdfs:label": "Dati collegati"
  },
  "en": {
    "rdfs:comment": "In computing, linked Data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.",
    "rdfs:label": "Linked Data"
  },
  "fr": {
    "rdfs:comment": "Le Web des donn\u00E9es (Linked Data, en Anglais) est une initiative du W3C(Consortium World Wide Web) visant \u00E0 favoriser la publication de donn\u00E9es structur\u00E9es sur le Web, non pas sous la forme de silos de donn\u00E9es isol\u00E9s les uns des autres, mais en les reliant entre elles pour constituer un r\u00E9seau global d'informations.",
    "rdfs:label": "Web des donn\u00E9es"
  },
  "zh": {
    "rdfs:comment": "\u9375\u9023\u8CC7\u6599\u662F\u6B63\u5728\u5FEB\u901F\u767C\u5C55\u7684\u8A9E\u7FA9\u7DB2\u7684\u4E00\u7CFB\u5217\u7684\u6D3B\u52D5\uFF0C\u5B83\u63CF\u8FF0\u4E86\u4E00\u5957\u5728\u5168\u7403\u8CC7\u8A0A\u7DB2\u4E0A\u767C\u4F48\u3001\u5206\u4EAB\u3001\u53CA\u9023\u7D50\u8CC7\u6599\u7684\u65B9\u6CD5\u3002\u4E3B\u8981\u4EE5\u53EF\u53C3\u7167\u7684URI\u4F5C\u70BA\u6700\u57FA\u672C\u7684\u8981\u7D20\u3001\u4EE5RDF\u4F5C\u70BA\u63CF\u8FF0\u9023\u7D50\u7684\u8A9E\u8A00\u3002";
    "rdfs:label": "\u9375\u9023\u8CC7\u6599"
  }
}

This mapping groups together the properties that share a common language, which makes it easier to see all relevant information in one place and to run a single lookup for a language (object.en) to find all keys available in that language.
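
To make the difference in access concrete, here is a small Python sketch comparing the two shapes. The helper `properties_in_language` is hypothetical, and the comment text is shortened for brevity.

```python
def properties_in_language(node, language):
    """Value-map shape: look inside every property for the language key."""
    found = {}
    for prop, value in node.items():
        if isinstance(value, dict) and language in value:
            found[prop] = value[language]
    return found


# Same data in both shapes (comment text shortened for brevity).
value_map_node = {
    "@id": "dbpedia:Linked_Data",
    "rdfs:comment": {"en": "In computing, linked Data describes a method",
                     "it": "I dati collegati sono un aspetto del web semantico."},
    "rdfs:label": {"en": "Linked Data", "it": "Dati collegati"},
}
property_map_node = {
    "@id": "dbpedia:Linked_Data",
    "en": {"rdfs:comment": "In computing, linked Data describes a method",
           "rdfs:label": "Linked Data"},
    "it": {"rdfs:comment": "I dati collegati sono un aspetto del web semantico.",
           "rdfs:label": "Dati collegati"},
}

# The value-map needs a scan over all properties ...
english_from_value_map = properties_in_language(value_map_node, "en")
# ... while the property-map is a single key lookup.
english_from_property_map = property_map_node["en"]
```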

It's possible that we could include both representations. Consider a possible change to Expansion and Value Expansion Algorithms:

For value-map; in Expansion:

Before 2.2:

  • If active property has @container: @language, and every key in element is of the form language (from BCP47) and does not map to an absolute IRI, the return value is an array constructed from the result of performing Value Expansion on each value using a copy of context with @language set to each key from element in turn.

For property-map; in Expansion:

Before 2.2.2:

  • If property does not expand to a keyword or absolute IRI and property has @container: @language, value MUST be a JSON object.
    • Process the object using a copy of context with @language set to property using the existing active subject and active property.
    • For each key in the resulting expanded object, either merge value into an existing property property of element, or create a new property property with value as value.

We may want to use different @container values to distinguish the use-cases, but the algorithm can handle each case as is without anything further.
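
The property-map steps above could be sketched roughly as follows in Python. This is only an illustration under simplifying assumptions: `expand_property_map` and `language_terms` (the set of terms declared with `@container: @language`) are hypothetical names, nested values are plain strings, and no IRI expansion is performed.

```python
def expand_property_map(node, language_terms):
    """Fold {"en": {"prop": "text"}} into language-tagged values on the node."""
    # Keys that are not language terms are copied unchanged.
    expanded = {key: value for key, value in node.items() if key not in language_terms}

    for key, value in node.items():
        if key not in language_terms:
            continue
        if not isinstance(value, dict):
            raise ValueError(f"value of language term {key!r} must be a JSON object")
        for prop, text in value.items():
            tagged = {"@value": text, "@language": key}
            # Merge into an existing property, or create a new one.
            existing = expanded.get(prop, [])
            if not isinstance(existing, list):
                existing = [existing]
            expanded[prop] = existing + [tagged]
    return expanded


node = {
    "@id": "dbpedia:Linked_Data",
    "es": {"rdfs:label": "Datos vinculados"},
    "en": {"rdfs:label": "Linked Data"},
}
expanded_node = expand_property_map(node, {"es", "en"})
```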

Markus Lanthaler
Collaborator

Now I see where you are coming from Gregg but I'm still not sure I agree with the proposal. If you model your data in that way isn't that basically the same as having multiple documents - one per language? And if so, wouldn't that be simply solved by using an array at the document's top-level (or @graph) and setting a new default language in each "sub-document". Something like

{
  "@context": { .. shared terms .. },
  "@graph": [
    {
      "@context": { "@language": "en" },
      ... English "sub-document" ...
    },
    {
      "@context": { "@language": "de" },
      ... German "sub-document" ...
    }
  ]
}

The access is not as convenient as in your proposed solution but I think it would address the same use cases -- even though I doubt you agree :-) -- and wouldn't require any further changes to JSON-LD.

I agree with what Dave said regarding compaction of language maps. Values that can't be brought to language map form should stay under their absolute IRI (or go under another term).

Markus Lanthaler
Collaborator

RESOLVED: Support language-maps via the "@container": "@language" pattern in @context.

Gregg Kellogg
Owner

Based on today's call, the value-map proposal, where the value can be a node as well as a string, makes sense to me. In this case, the node would be anonymous and use the skos-xl pattern to designate the primary value, but this would also allow other properties to be asserted on the value (such as pronunciation). Extending the example from above, this might be represented as follows:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ipa": "http://dbpedia.org/resource/International_Phonetic_Alphabet",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "label": {"@id": "rdfs:label", "@container": "@language"}
  },
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "label": {
    "en": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Queen Elizabeth",
      "ipa": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"
    },
    "de": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Königin Elisabeth",
      "ipa": "ˈkøːnɪɡɪn ʔeˈliːzabɛt"
    }
  }
}

If the @container: @language means to apply the property as a language within a context, this could expand to the following:

[{
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "http://www.w3.org/2000/01/rdf-schema#label": [
    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": [{
        "@value": "Queen Elizabeth",
        "@language": "en"
      }],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": [{
        "@value": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ",
        "@language": "en"
      }]
    },
    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": [{
        "@value": "Königin Elisabeth",
        "@language": "de"
      }],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": [{
        "@value": "ˈkøːnɪɡɪn ʔeˈliːzabɛt",
        "@language": "de"
      }]
    }
  ]
}]

In Turtle, this might look like the following:

<http://dbpedia.org/resource/Queen_Elizabeth> rdfs:label
  [ a skosxl:Label;
    skosxl:literalForm "Queen Elizabeth"@en;
    ipa: "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"@en ],
  [ a skosxl:Label;
    skosxl:literalForm "Königin Elisabeth"@de;
    ipa: "ˈkøːnɪɡɪn ʔeˈliːzabɛt"@de ] .
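
The idea of applying each language key as the default language of the node beneath it might be sketched as follows in Python. The names `tag_strings` and `expand_language_map_of_nodes` are hypothetical; the sketch leaves out IRI expansion and the wrapping of property values in arrays, so its output is simpler than the fully expanded form above.

```python
def tag_strings(value, language):
    """Recursively tag plain strings with the given language."""
    if isinstance(value, str):
        return {"@value": value, "@language": language}
    if isinstance(value, list):
        return [tag_strings(item, language) for item in value]
    if isinstance(value, dict) and "@value" not in value:
        # Keyword entries such as @id and @type are left untouched.
        return {
            key: item if key.startswith("@") else tag_strings(item, language)
            for key, item in value.items()
        }
    return value  # numbers, booleans and existing value objects stay as they are


def expand_language_map_of_nodes(language_map):
    """Apply each language key as the default language of its node."""
    return [tag_strings(node, language) for language, node in language_map.items()]


label = {
    "en": {"@type": "skosxl:Label", "skosxl:literalForm": "Queen Elizabeth"},
    "de": {"@type": "skosxl:Label", "skosxl:literalForm": "Königin Elisabeth"},
}
expanded_nodes = expand_language_map_of_nodes(label)
```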
Markus Lanthaler
Collaborator

Should this

[{
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "http://www.w3.org/2000/01/rdf-schema#label": [
    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": [{
        "@value": "Queen Elizabeth",
        "@language": "en"
      }],
      "http://www.example.com": [ { "@value": "not language tagged" } ],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": [{
        "@value": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ",
        "@language": "en"
      }]
    },
    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": [{
        "@value": "Königin Elisabeth",
        "@language": "de"
      }],
      "http://www.example.com": [ { "@value": "not language tagged" } ],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": [{
        "@value": "ˈkøːnɪɡɪn ʔeˈliːzabɛt",
        "@language": "de"
      }]
    }
  ]
}]

compact to:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ipa": "http://dbpedia.org/resource/International_Phonetic_Alphabet",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "ex": "http://www.example.com",
    "label": {"@id": "rdfs:label", "@container": "@language"}
  },
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "label": {
    "en": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Queen Elizabeth",
      "ex": [ { "@value": "not language tagged" } ],
      "ipa": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"
    },
    "de": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Königin Elisabeth",
      "ex": [ { "@value": "not language tagged" } ],
      "ipa": "ˈkøːnɪɡɪn ʔeˈliːzabɛt"
    }
  }
}

or to:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ipa": "http://dbpedia.org/resource/International_Phonetic_Alphabet",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "ex": "http://www.example.com",
    "label": {"@id": "rdfs:label", "@container": "@language"}
  },
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "rdfs:label": {
      "@type": "skosxl:Label",
      "ex": "not language tagged"
  },
  "label": {
    "en": {
      "skosxl:literalForm": "Queen Elizabeth",
      "ipa": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"
    },
    "de": {
      "skosxl:literalForm": "Königin Elisabeth",
      "ipa": "ˈkøːnɪɡɪn ʔeˈliːzabɛt"
    }
  }
}
Markus Lanthaler
Collaborator

July 10th, 2012 telecon:

[16:36] markus: What should we do about @value's that are not language-tagged?
[16:36] gkellogg: I think it should remain in expanded form.
[16:37] gkellogg: The way that I was proposing it was that the result is to set the language specified in the key as the default language in the context.
[16:37] gkellogg: The other way to do it would be to override the language definition of 'ex' to say that the language is null...
[16:38] niklasl: It's hard to know what X means here.
[16:38] gkellogg: We need to be careful here about how to set xsd:string - it's an RDF 1.1 model issue, so a back-end should implement it this way, though. A plain literal gets the datatype of xsd:string.
[16:39] gkellogg: From RDF Concepts: "A language-tagged string is any literal whose datatype IRI is equal to http://www.w3.org/1999/02/22-rdf-syntax-ns#langString."

Markus Lanthaler
Collaborator

... and what about when a property with a language container has a value similar to this one:

    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.example.com/deutsch": [ { "@value": "deutsch", "@language": "de" } ],
      "http://www.example.com/english": [ { "@value": "english", "@language": "en" } ],
      "http://www.example.com/italiano": [ { "@value": "italiano", "@language": "it" } ]
   }

What language should be used for the language map? The first one? That might be non-deterministic (properties are not guaranteed to be processed in order) if we don't sort them ourselves first.

Niklas Lindström
Collaborator

For the simple case of creating a language map of various literals it is quite usable. But deep application of @language might make compaction very (too?) complex.

Still, for the given example, we also discussed another variant of expression. It could be more appropriate in this case to use dc:language to describe the bnode itself as being in/about a specific language. That could warrant a generic extension of @context to take any property references as values (as well as the special @set, @list or @language (and possibly @id or @type)). A property using such a construct would then provide mappings for specific property values within the referenced objects.

The corresponding example would be:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "ipa": "http://dbpedia.org/resource/International_Phonetic_Alphabet",
    "labelByLang": {"@id": "skosxl:prefLabel", "@container": "dc:language"}
  },
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "labelByLang": {
    "en": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Queen Elizabeth",
      "ipa": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"
    },
    "de": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Königin Elisabeth",
      "ipa": "ˈkøːnɪɡɪn ʔeˈliːzabɛt"
    }
  }
}

Yielding:

[{
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "http://www.w3.org/2008/05/skos-xl#prefLabel": [
    {
      "http://purl.org/dc/terms/language": ["en"],
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": ["Queen Elizabeth"],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": ["kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"]
    },
    {
      "http://purl.org/dc/terms/language": ["de"],
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": ["Königin Elisabeth"],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": ["ˈkøːnɪɡɪn ʔeˈliːzabɛt"]
    }
  ]
}]

That is, labelByLang in this context would match anything related via skosxl:prefLabel which itself has a property dc:language. The @container mechanism would create an object map with keys for each value of that property per related resource. It would not match a resource lacking that property or having more than one value for it.

(.. Nor a non-string value? Or we could specify that at least for this generic variant the actual mapped property and value must also be explicitly repeated in the object? It's certainly something to iron out.)
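
A rough Python sketch of this generic, property-keyed container follows. The name `index_by_property` is hypothetical, and the third, untagged node is made up to show the non-matching case. What happens when two nodes share the same key is left open here, so the later node simply replaces the earlier one.

```python
DC_LANGUAGE = "http://purl.org/dc/terms/language"
LITERAL_FORM = "http://www.w3.org/2008/05/skos-xl#literalForm"


def index_by_property(nodes, index_property):
    """Group expanded nodes by the single value of index_property."""
    indexed = {}
    unmatched = []
    for node in nodes:
        keys = node.get(index_property, [])
        if len(keys) != 1:
            # Lacking the property, or more than one value: no match.
            unmatched.append(node)
            continue
        rest = {k: v for k, v in node.items() if k != index_property}
        indexed[keys[0]] = rest
    return indexed, unmatched


nodes = [
    {DC_LANGUAGE: ["en"], LITERAL_FORM: ["Queen Elizabeth"]},
    {DC_LANGUAGE: ["de"], LITERAL_FORM: ["Königin Elisabeth"]},
    {LITERAL_FORM: ["untagged"]},  # hypothetical node without dc:language
]
indexed, unmatched = index_by_property(nodes, DC_LANGUAGE)
```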

Lin Clark

I'm pretty sure that niklasl's proposal above would work for Drupal's multilingual field use case. This would be helpful, since it would mean we wouldn't have to deal with the complexity of named graphs.

Markus Lanthaler
Collaborator

RESOLVED: Add support for language maps via the "@container": "@language" annotation in @context. For example: "tags": { "@id": "http://example.com/vocab#tags", "@container": "@language"}. The child property of the term MUST be an associative array. All associative array keys MUST be BCP47 language strings.

Manu Sporny msporny closed this August 18, 2012
Gregg Kellogg
Owner

Manu, the text you added only allows strings or arrays of strings as values of language tags. As I recall, we were considering more things, such as skos-xl representations for those values. This was certainly part of Denny's use case: the ability to have other values hanging off of the language tag.

Was this an oversight, or am I mis-remembering?

Manu Sporny msporny reopened this August 19, 2012
Manu Sporny
Owner

It was an oversight, you're remembering correctly. I forgot about that part of it. The language-map approach was intended to be a short-hand for setting the @language in the @context. So, I think we can allow any value that is allowed in a regular value position, IIRC. I'll update the spec and try to think through all of the potential values.

Manu Sporny msporny closed this August 19, 2012
Markus Lanthaler lanthaler reopened this August 20, 2012
Markus Lanthaler
Collaborator

Sorry, I also have to reopen this issue until the API algorithms have been updated and we decided how this works when compacting. We also need to check whether this really fixes @linclark's problem - which I still doubt.

Markus Lanthaler
Collaborator

RESOLVED: The group is committed to support language maps and property generators in JSON-LD 1.0.

Markus Lanthaler
Collaborator

Issue #159 deals with how round-tripping of language maps (required by the Drupal community) could be supported.

Markus Lanthaler lanthaler referenced this issue from a commit September 19, 2012
Markus Lanthaler Add language-map expansion test
This test assumes that the language is injected as the default language into the active context. Depending on the outcome of issue #159 this test might need to be updated.

This addresses #133.
4fc0e14
Gregg Kellogg
Owner

Members of the RDF working group have expressed some concern about JSON-LD diverging from the RDF data model, and our proposed solution in Issue #159 specifically adds syntactic information that is not based on the RDF data model. Other than that, I think most everything else actually is. The issue relates to round-tripping properties with @container: @language from compact form to expanded form and back. Consider the following node definition:

{
 "@context": {
   "label": {"@id": "http://example.com/label", "@container": "@language"}
 },
 "@id": "http://buckingham.uk/queenie",
 "label": {
   "en": ["The Queen", {"@id": "http://example.com/the_queen"}],
   "de": ["Die Königin", {"@id": "http://example.de/die_königin"}]
 }
}

With the current proposal, this would expand as follows:

[
 {
   "@id": "http://buckingham.uk/queenie",
   "http://example.com/label": [
     { "@value": "The Queen", "@language": "en" },
     { "@value": "Die Königin", "@language": "de" },
     {"@id": "http://example.com/the_queen", "@language": "en"},
     {"@id": "http://example.de/die_königin", "@language": "de"}
   ]
 }
]

The problem is that the @language added to the node definitions does not relate to the RDF model; in fact, if it is translated to RDF, you get something like the following:

<http://buckingham.uk/queenie> <http://example.com/label>
 "The Queen"@en,
 "Die Königin"@de,
 <http://example.com/the_queen>,
 <http://example.de/die_königin>.

Any language association is lost.

As an alternative, we could consider using a specific blank-node pattern which does generate a reasonable RDF translation. The data could instead expand as follows:

 [{
   "@id": "http://buckingham.uk/queenie",
   "http://example.com/label": [
     { "@value": "The Queen", "@language": "en" },
     { "@value": "Die Königin", "@language": "de" },
     {
       "http://purl.org/dc/terms/language": "en",
       "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": {"@id": "http://example.com/the_queen"}
     },
     {
       "http://purl.org/dc/terms/language": "de",
       "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": {"@id": "http://example.de/die_königin"}
     }
   ]
 }]

Note that we're now using node definitions with a dc:language property, and an rdf:value that references the other value. Of course, a major downside of this is placing built-in dependencies on external vocabularies. We could consider creating equivalents in a json-ld namespace (jsonld:language, jsonld:value), but I don't know if that really helps too much.

The resulting Turtle representation would look like the following:

<http://buckingham.uk/queenie> <http://example.com/label>
  "The Queen"@en,
  "Die Königin"@de,
  [dc:language "en"; rdf:value <http://example.com/the_queen>], 
  [dc:language "de"; rdf:value <http://example.de/die_königin>].

The compaction rules would need to consider such node definitions when selecting values for each language tag. Another advantage is that it allows for all value types to be round-tripped, including typed values and those represented as value objects.

Markus Lanthaler
Collaborator
Gregg Kellogg
Owner

They need to be generated when expanding, or it won't come out in the RDF.

I believe it's round-trip-able in JSON-LD, so that the blank nodes would be consumed in compaction, but of course not if compaction didn't include a language container term. Still, it's an odd corner case anyway.

Markus Lanthaler
Collaborator
Gregg Kellogg
Owner

How would you know to go back to @language when coming from RDF? Given that the result is flattened, you'd have no contextual knowledge to work with.

Markus Lanthaler
Collaborator
Gregg Kellogg
Owner

My point was about converting from RDF, where it would be expressed using a BNode. How would you know whether to keep the blank node definition you'd currently get, vs. using the @language bit? It might not be critical to round-trip to RDF, but relying on some data modeling outside of the RDF data model is seemingly more problematic, particularly for those who are worried that JSON-LD strays too far from RDF Concepts to be considered RDF.

My mechanism would work entirely within the RDF data model, but we'd need to come up with something better than dc:language, if not rdf:value for representation. We could consider creating a JSON-LD namespace, and bringing back the notion of a default evaluation context to define the namespace, and say jsonld:language and jsonld:value properties, which could be sub-properties of dc:language and rdf:value respectively. There's precedent for doing this: schema:additionalType takes on a subPropertyOf relationship with rdf:type for a similar purpose.

Anyway, I haven't made up my own mind as to which way is better myself; just want to see us explore the alternatives, and make sure we don't annoy important constituents.

Markus Lanthaler
Collaborator
Lin Clark

Gregg and I discussed this today. I agree with Markus, having blank nodes pop up when moving from compact to expanded is something we don't want to see. I'm fine with Markus's proposal that it be generated when moving to RDF.

Gregg Kellogg
Owner

I am somewhat worried that this further separates the JSON-LD data model from the RDF data model, but I believe we can continue to use the @language syntactic convention and factor it into the RDF transformation algorithms. As it happens, Wikia has a similar need to represent non-language-tagged-literals as values of language maps, but the model there is RDF, rather than straight JSON-LD objects, so it is important that these structures round-trip through RDF. The way this could work would be the following:

When transforming JSON-LD to RDF (already in expanded form), in step 1.5:

  • If the element has an @language property, generate a BNode containing a dc:language property set to the value of @language and an rdf:value property and call the algorithm recursively using a copy of the active object without @language.

The RDF to JSON-LD transformation would similarly collapse objects with this form, creating a node definition with a @language property.

Note that this doesn't work for value objects having @type, unless we add support for value objects with both @type and @language, in which case it doesn't work for value objects having only @value.

As an alternative to using dc:language and/or rdf:value, we could consider JSON-LD specific terms jsonld:language and jsonld:value, but I don't think this is particularly useful, particularly in the case of rdf:value.

I still think it would be better to introduce the bnodes in the expanded JSON-LD form, but if that's not acceptable, then this mechanism will cleanly map the language map pattern into an RDF construct which is round-trippable.
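The transformation step described in this comment could be sketched roughly as follows. This is only an illustration in Python, not the spec algorithm: the function name `object_to_triples`, the `Literal` tuple, and the `bnode_labels` iterator are all made up for the example. It also assumes that a plain string with @language stays an ordinary language-tagged literal, and that only other elements carrying @language get wrapped in a BNode. Properties of a node definition are not emitted here; only the object position is handled.

```python
import itertools
from collections import namedtuple

DC_LANGUAGE = "http://purl.org/dc/terms/language"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"

# Hypothetical literal representation: lexical value, language tag, datatype.
Literal = namedtuple("Literal", ["value", "language", "datatype"])


def object_to_triples(subject, predicate, element, bnode_labels):
    """Return the triples for one expanded element in object position."""
    lang = element.get("@language")
    if lang is not None:
        value = element.get("@value")
        # Assumption: a language-tagged string is a regular RDF literal.
        if isinstance(value, str) and "@type" not in element:
            return [(subject, predicate, Literal(value, lang, None))]
        # Otherwise wrap the value in a BNode carrying dc:language and
        # recurse on a copy of the element without @language.
        bnode = next(bnode_labels)
        stripped = {k: v for k, v in element.items() if k != "@language"}
        return [
            (subject, predicate, bnode),
            (bnode, DC_LANGUAGE, Literal(lang, None, None)),
        ] + object_to_triples(bnode, RDF_VALUE, stripped, bnode_labels)
    if "@value" in element:
        literal = Literal(element["@value"], None, element.get("@type"))
        return [(subject, predicate, literal)]
    return [(subject, predicate, element["@id"])]


labels = ("_:b%d" % i for i in itertools.count())
triples = object_to_triples(
    "http://buckingham.uk/queenie",
    "http://example.com/label",
    {"@id": "http://example.com/the_queen", "@language": "en"},
    labels,
)
```

The reverse direction (RDF to JSON-LD) would have to recognize a BNode with exactly these two properties and collapse it again, which is where the flag mentioned later in the thread would come in.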

Niklas Lindström
Collaborator

1) If we need to go down this path, why not skip the extra blank node and fold dc:language into the resource mapped to by language? Like:

[{
  "@id": "http://buckingham.uk/queenie",
  "http://example.com/label": [
    {"@value": "The Queen", "@language": "en"},
    {
      "@id": "http://example.com/the_queen",
      "http://purl.org/dc/terms/language": "en"
    }
  ]
}]

Of course, that would mean that @language means a different thing (dc:language, perhaps configurable..) when used on non-literals. The justification could be that the same goes for @type, meaning datatype for literals and rdf:type for resources.

2) IMHO this use case appears a bit muddled though, and I still wonder (see my previous comment above) if more interoperable information would be achieved by separating language-tagged literals and resources whose main content is expressed in a certain language. That is, I'd rather see distinct properties used for literals and resources (as in the case of skos vs. skos-xl properties).

If so, it would be useful to extend the @container mechanism to create mappings to resources based on a specific property value. In this case mapping skos-xl labels by values of dc:language, like:

"@context": {
  "extendedLabel": {"@id": "skos-xl:prefLabel", "@container": "dc:language"}
}

Similarly, this would work for translations (having distinct dc:language values), roles or other related properties. As I also said above though, it wouldn't really work for other values than string literals (since the keys are strings). So @container mappings for properties in general could be restricted to string values. Or they could (re-)use @type to mean "datatype of key".

I readily admit that this would be quite a complex feature though, for the sake of the zero-edit goal (since it does match a fairly common JSON usage pattern). From where I stand though, it seems to follow from the case being debated here. Not following through may be wise anyway of course. But if so, I'm not sure if we're on stable ground with the current proposals.
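To make the generalized @container idea in point 2 a bit more concrete, here is a rough Python sketch of what mapping resources by the value of an arbitrary property might look like. The function names `compact_by_property` and `expand_by_property` are hypothetical, and the sketch assumes the restriction suggested above: only string values are usable as keys, and nodes without such a value are left out of the map.

```python
def compact_by_property(nodes, key_property):
    """Group node objects by the string value of key_property.

    Returns (container, leftover): nodes whose key_property is missing or
    not a string cannot become map keys and are returned separately.
    """
    container = {}
    leftover = []
    for node in nodes:
        key = node.get(key_property)
        if not isinstance(key, str):
            leftover.append(node)
            continue
        # The key carries the property value, so drop it from the node.
        rest = {k: v for k, v in node.items() if k != key_property}
        container.setdefault(key, []).append(rest)
    return container, leftover


def expand_by_property(container, key_property):
    """Inverse of compact_by_property: restore key_property on each node."""
    nodes = []
    for key in sorted(container):
        for rest in container[key]:
            node = dict(rest)
            node[key_property] = key
            nodes.append(node)
    return nodes


labels = [
    {"dc:language": "en", "skosxl:literalForm": "Queen Elizabeth"},
    {"dc:language": "de", "skosxl:literalForm": "Königin Elisabeth"},
]
by_language, unkeyed = compact_by_property(labels, "dc:language")
```

With a term declared as `"@container": "dc:language"`, `by_language` would be the shape seen in the compacted document, and `expand_by_property` shows that the property value can be restored without any new keyword.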

Markus Lanthaler
Collaborator

ACTION: @gkellogg to write up proposal on language maps.

Use @language to "language-tag" nodes, transform to dcterms:language when converting to RDF. Typed values and plain literals do not roundtrip, i.e., they will fall out of the language map when expanding -> compacting

Lin Clark

If typed values fall out when expanding -> compacting, then this probably doesn't work for us, as I expressed on the telecon a few weeks ago.

Niklas Lindström
Collaborator

What is the nature of those typed values to make them applicable for mapping by language? Can they be considered as properties of language mapped objects (one or more item per object, depending on role/usage)?

Manu Sporny
Owner

Wait a sec, we're not doing something that doesn't work for Drupal - we're at the point where that is a show stopper for me, personally. The solution that we create MUST work for Drupal and in a way that works for their developers. I'm opposed to typed values and plain literals not round-tripping as well - it seems like an unnecessary restriction to make. In the very worst case, we can introduce a new '@langmap' key that is added to expanded form, that then maps to dc:language when converted to RDF and back. That seems like a better solution to me than having ambiguity in JSON-LD expanded form about where 'dc:language' came from.

Markus Lanthaler
Collaborator
Gregg Kellogg
Owner

So, here are the possibilities for how things expand. Consider a language map term such as the following

{ "@context": {"term": {"@id": "http://example/term", "@container": "@language"}},
  "term": {"en": value}
}

Value may be any of the following:

string, value object without @type or @language, value object with either @type or @language, native value, node reference or node definition.

"string": In this case, if value is a string, it is a language string by virtue of being a string value of a language map term. It expands to {"@value": "string", "@language": "en"}, no problem.

{"@value": "string"}: In this case, the value is specifically a plain literal, so adding @language to it when expanding would change its meaning. If this is a useful case, it would need to expand using a different keyword; e.g., {"@value": "string", "@langmap": "en"}. This would allow it to round-trip, as "@langmap" does not imply "@language". Not sure if this is an important Drupal case. This can be avoided by expanding such values using @type: xsd:string from an RDF perspective, anyway. It could also be avoided if the @value/@type were used in the value specification, where @type is xsd:string.

{"@value": "string", "@language": "en"}: This expands to itself, however, it will compact down to just "string"; @langmap could change this, but this is probably not a useful case.

{"@value": "string", "@language": "de"}: This is seemingly a contradiction. It could be resolved using @langmap, but it is also not a useful case.

{"@value": "string", "@type": "type"}: This could expand to {"@value": "string", "@language": "en", "@type": "type"} and round-trip. There may be some semantic issues if type is xsd:string, or rdf:langString, but probably not an issue.

1, 1.5, true, false: These could all expand to {"@value": 1, "@language": "en"} (for example) and not lose any semantic interpretation.

{"@id": "id"}: This is a node reference, currently identified by being an object with only a @id key. It could expand to {"@id": "id", "@language": "en"} and we could say that this is also just a node reference. It would probably survive compacting, and could be folded into the node definition when framing, but it could be that there are then multiple @language values for a node definition. I don't see any real problem with this, though.

{"@id": "id", "term2": "value"}: this is a node definition, adding @language to this is equivalent to the previous case. In both cases, we would transform this to dc:language when generating RDF. When consuming RDF, dc:language would turn back into @language, although there is probably a flag to control this, much as there is in the use of native values.

As we see, there are some cases where an @langmap would be useful for round-tripping certain values, mostly for a plain literal. I would propose that we not do this, and stick with using the existing @language. Furthermore, the presence of @language in a node definition should be equivalent to having a local context with @language.

PROPOSAL: expanding language map terms adds an @language key with the appropriate language to the expanded values except for the case that the value contains only the key @value.

PROPOSAL: The use of an @language key within a node definition may take one or more values, all of which MUST be conformant to BCP47. When transforming to RDF, this is translated to dcterms:language. When compacting, a node (or node reference) containing an @language key MUST match a term with @container: @language. (Note that it probably also has to have @type: @id given the current term matching semantics).
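The first proposal above can be sketched in a few lines of Python. This is only a reading of the proposal, not an agreed algorithm: `expand_language_map` is a made-up name, languages are sorted purely to get deterministic output, and an existing @language on a value object is kept as-is (the "contradiction" case, which the proposal does not settle).

```python
def expand_language_map(language_map):
    """Expand the value of a language-map term into a list of values."""
    expanded = []
    for lang in sorted(language_map):
        values = language_map[lang]
        if not isinstance(values, list):
            values = [values]
        for value in values:
            if not isinstance(value, dict):
                # Strings and native values (numbers, booleans).
                item = {"@value": value, "@language": lang}
            elif set(value) == {"@value"}:
                # Explicit plain literal: the one exception, left untouched.
                item = dict(value)
            else:
                # Typed values, node references, and node definitions.
                item = dict(value)
                # Assumption: an explicit @language wins over the map key.
                item.setdefault("@language", lang)
            expanded.append(item)
    return expanded


result = expand_language_map({
    "en": [
        "The Queen",
        {"@value": "plain"},
        {"@id": "http://example.com/the_queen"},
        5,
    ]
})
```

Note that `{"@value": "plain"}` comes out without any language information, so under this proposal it cannot find its way back into the language map on compaction. That is the round-tripping gap @langmap was meant to close.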

Niklas Lindström
Collaborator

From what I've gathered, the problem seems to be that Drupal does not use the notion of a document written in a specific language (with its own title, language, timestamps, author and subjects), but splices a language key between the document (or "node") and each of these properties. While we have seen the utility of language maps for literals (I've encountered a bunch in my work, Wikidata had needs for it), the notion of "author by language" or "updated by language" with nothing having that language appears very strange. Is there any comprehensible information model to justify it?

Maps in general are commonly used in JSON, e.g. in the sense of "an object whose keys represent the value of a property (here, language) on the value(s) for that key". But what does the key mean in the following example?

"author": {
    "en": {"@id": "/lin"},
    "de": {"@id": "/stephane"}
}

Is it "an object whose keys represent the language used for any English statements about this resource, and whose values are the authors who made any such statements"?

I think the notion of an English and a German version would be more beneficial, which could be written as:

"version": {
    "en": {
        "author": {"@id": "/lin"}
    },
    "de": {
        "author": {"@id": "/stephane"}
    }
}

With the "version" term here being declared with @container: @language. Combined with the proposal above, since the object is not a literal, that would mean that the first resource has dc:language of "en", the second of "de".

Alternatives if the resource is essentially multilingual, are to either not map properties with no inherent language (like author or timestamps), or to use the notion of an editor role for a specific language. Like:

"editor": {
    "en": {"occupant": {"@id": "/lin"}},
    "de": {"occupant": {"@id": "/stephane"}}
}

Where the objects are editor roles with the properties 'language' (dc:language) and 'occupant'. Perhaps not the most comprehensible notion, but at least there's something there carrying the language property. (Though I'd really prefer to extend maps to any property and use something more appropriate than dc:language, if this pattern was a necessity.)

It's not obvious to me what the equivalent would be for:

"updated": {
    "en": {"@type": "xsd:dateTime", "@value": "2012-11-06T19:34:20+0100"},
    "de": {"@type": "xsd:dateTime", "@value": "2012-11-06T19:34:21+0100"}
}

in all these cases (nor if such a representation is even sought after). It seems to pertain to "the latest updating of any value in the specific language". Which might imply the notion of language versions of various parts, affecting the modeling further.

An example of that would be to represent e.g. the title as a resource on its own:

"title": {
    "en": {
        "value": "English title",
        "author": {"@id": "/lin"},
        "updated": {"@type": "xsd:dateTime", "@value": "2012-11-06T19:34:20+0100"}
    }
}

Here with "title" being defined as a language map, thus implying a "dc:language" with a value of "en" for the object. (I'd map "value" to "rdf:value" here to get the RDF, since it's a textbook structured value.)

Manu Sporny
Owner

PROPOSAL: expanding language map terms adds an @language key with the appropriate language to the expanded values except for the case that the value contains only the key @value.

-1

I'd prefer we use @langmap, instead of @language, as that doesn't produce any nasty corner cases, AFAIK. By overloading @language, we are creating a set of corner cases that we don't have to deal with if we were to just mint a new keyword. If we use @langmap, I'd prefer that we specify that a container is a language map by doing the following:

"@container": "@langmap"

PROPOSAL: The use of an @language key within a node definition may take one or more values, all of which MUST be conformant to BCP47. When transforming to RDF, this is translated to dcterms:language. When compacting, a node (or node reference) containing an @language key MUST match a term with @container: @language. (Note that it probably also has to have @type: @id given the current term matching semantics).

If we use @langmap instead, I'm +1 to the above.

Markus Lanthaler
Collaborator

I think before discussing the details we should ask @linclark to provide us some examples or a list of requirements Drupal has. The only thing I've found in that regard is http://groups.drupal.org/node/249688.

Lin Clark

Current development has moved from the groups discussion area to the drupal.org issue queue. The meta issue can be found at http://drupal.org/node/1784216. From there, you can find what a site generated vocabulary might look like and an example of the object structure in one of our two profiles (I show the other below).

I have worked up an example. This has two literal values with datatypes, rdf:HTML and xsd:integer.

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "site": "http://ex.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "body": {
      "@id": "site:body",
      "@type": "rdf:HTML"
    },
    "num_comments": {
      "@id": "site:num_comments",
      "@type": "xsd:integer"
    },
    "author": {
      "@id": "site:author",
      "@type": "site:User"
    }
  },
  "@id": "site:node/1",
  "body": {
    "en": [
      "Here is some body text for the article."
    ],
    "de": [
      "Hier sind einige Textkörper für den Artikel."
    ]
  },
  "num_comments": {
    "en": [
      5
    ],
    "de": [
      3
    ]
  },
  "author": {
    "en": [
      {
        "@id": "site:user/1"
      }
    ],
    "de": [
      {
        "@id": "site:user/2"
      }
    ]
  }
}

Markus Lanthaler
Collaborator

Thanks a lot Lin! Would it be an option to change that serialization to something like the following.

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "site": "http://ex.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "language": {
      "@id": "site:language",
      "@container": "@language"
    },
    "body": {
      "@id": "site:body",
      "@type": "rdf:HTML"
    },
    "num_comments": {
      "@id": "site:num_comments",
      "@type": "xsd:integer"
    },
    "author": {
      "@id": "site:author",
      "@type": "site:User"
    }
  },
  "@id": "site:node/1",
  "language": {
    "en": {
      "body": [ "Here is some body text for the article." ],
      "num_comments": [ 5 ],
      "author": [ { "@id": "site:user/1" } ]
    },
    "de": {
      "body": [ "Hier sind einige Textkörper für den Artikel." ],
      "num_comments": [ 3 ],
      "author": [ { "@id": "site:user/2" } ]
    }
  }
}

I had a look at the patch you attached to the issue on drupal.org (not sure if that's the current code). If I understand the code correctly the change would be quite trivial:

class DrupalJsonldEntityWrapper extends JsonldEntityWrapper {
  /**
   * Get properties, excluding JSON-LD specific properties.
   *
   * Format Entity properties for consumption by other Drupal sites. In
   * Drupal's vendor specific JSON-LD, fields which correspond to primitives
   * have an intermediary data structure between the entity and the value.
   */
  public function getProperties() {
    // Properties to skip.
    $skip = array('id');

    // Start with an empty array so that entities without translations
    // still return a defined value.
    $properties = array();

    // Create language map property structure.
    foreach ($this->entity->getTranslationLanguages() as $langcode => $language) {
      foreach ($this->entity->getTranslation($langcode) as $name => $field) {
        $definition = $this->entity->getPropertyDefinition($name);
        $langKey = empty($definition['translatable']) ? 'und' : $langcode;
        if (!$field->isEmpty() && !in_array($name, $skip)) {    // #### changed
          $properties[$langKey][$name] = $field->getValue();     // #### changed
        }
      }
    }

    return $properties;      // #### changed
  }
}

Or do you need to have the data in the other shape for a specific reason!? Your shape is more efficient if you want to loop over all languages of a specific property, mine is more efficient to loop over all properties of a specific language.

This would allow us to avoid the problems we otherwise face with typed values.
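For consumers that need the other shape, switching between the two is a simple pivot. The following Python sketch is only an illustration (the function name `pivot` is made up, and the data is abbreviated from the example above); it swaps the two outer levels of nesting, so applying it twice gives back the original structure.

```python
def pivot(outer_first):
    """Swap the two outer nesting levels of a dict of dicts.

    Turns property -> language -> values into
    language -> property -> values, and vice versa.
    """
    swapped = {}
    for outer_key, inner in outer_first.items():
        for inner_key, values in inner.items():
            swapped.setdefault(inner_key, {})[outer_key] = values
    return swapped


property_first = {
    "body": {
        "en": ["Here is some body text for the article."],
        "de": ["Hier sind einige Textkörper für den Artikel."],
    },
    "num_comments": {"en": [5], "de": [3]},
}
language_first = pivot(property_first)
```

Here `language_first["en"]` holds all English properties in one object, which is the shape proposed in this comment, while `property_first["body"]` holds all languages of one property.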

Lin Clark

Just to be clear, this isn't about difficulty of coding in PHP as much as difficulty in communicating to different consumers.

You are using a "language" term which has a URI, from which I would expect an intermediary blank node would be created. So in your example, the triple model would be:

node/1 language blanknodeEn
blanknodeEn body "Here is some body text for the article."^^rdf:HTML
blanknodeEn num_comments "5"^^xsd:integer
blanknodeEn author site:user/1

Correct me if I'm wrong about the way that translates to triples.

If that is the model intended, then that was already discussed in July or August as an option, and it didn't seem to work for our use case. What was also discussed was the use of named graphs, which was discouraged by the JSON-LD WG.

If the WG no longer believes it can make language maps work for our use case, I can raise the issue with the community again and see whether this way of modeling is an option... though it would be disappointing since we are getting close to feature freeze on D8 and it's unclear what impact such uncertainty would have.

Markus Lanthaler
Collaborator

I assumed some of the problems stem from the way Drupal is structured internally; that's why I had a look at the code and saw that it would indeed be trivial to change the data model.

You are using a "language" term which has a URI, from which I would expect an intermediary blank node would be created.

Yes, that's true. The alternative is to introduce blank nodes for every typed value. There's simply no way around it if you need to round-trip to RDF. The data expressed in your example above would generate something like this:

site:node/1 body blanknodeEn
blanknodeEn rdf:value "Here is some body text for the article."^^rdf:HTML
blanknodeEn dc:language "en"

So there will be blank nodes in RDF regardless of the option you choose. The only difference is that those blank nodes would be implicit, i.e., hidden by a syntactic construct, in JSON-LD. What does the following statement actually mean?

{
  "@value": 5,
  "@type": "xsd:integer",
  "@language": "en"
}

That 5 is an English integer? I would expect quite some pushback if we are going to allow this. Using @langmap for that matter wouldn't change much.

If that is the model intended, then that was already discussed in July or August as an option, and it didn't seem to work for our use case. What was also discussed was the use of named graphs, which was discouraged by the JSON-LD WG.

Really? As I understood it, it would only be a problem if additional blank nodes would pop up in JSON-LD. Back in July you said

I'm pretty sure that niklasl's proposal above would work for Drupal's multilingual field use case. This would be helpful, since it would mean we wouldn't have to deal with the complexity of named graphs. 2012-07-22

Niklas' proposal contained exactly the same blank nodes:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "ipa": "http://dbpedia.org/resource/International_Phonetic_Alphabet",
    "labelByLang": {"@id": "skosxl:prefLabel", "@container": "dc:language"}
  },
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "labelByLang": {
    "en": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Queen Elizabeth",
      "ipa": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"
    },
    "de": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Königin Elisabeth",
      "ipa": "ˈkøːnɪɡɪn ʔeˈliːzabɛt"
    }
  }
}

I'm confident we can find a solution for this but we need to respect a number of things we will not be able to change. Associating a language to a typed value is one of them. We could work around this by introducing metadata but that is IMHO the ugliest solution of all.

Lin Clark

Yes, that's true. The alternative is to introduce blank nodes for every typed value. There's simply no way around it if you need to round-trip to RDF.

I'm not sure where the requirement of round-tripping this language handling to RDF came from. I've said that we don't care about round-tripping this to RDF in most of the telecons I've been on, and explained it in multiple posts. We do not need resources or typed literals to be language tagged in RDF, we only need to be able to round trip the language map within the JSON-LD representation.

If this requirement to round trip this language information to RDF comes from somewhere else and the JSON-LD WG would prefer to support that use case with language maps instead of ours, that's fine. We just need to know.

That 5 is an English integer? I would expect quite some pushback if we are going to allow this... We could work around this by introducing metadata but that is IMHO the ugliest solution of all.

I explained this when you brought it up in the telecon on 2012-10-09, starting around 1:41:00 mark.

You started by asserting that JSON-LD didn't need to support round tripping typed literals.

Gregg: "We could just say that having typed values in a language map is illegal"

Me: "That won't work for us"

Gregg: "You have a case where you're going to have a language map where the values are typed literals?"

Me: "We could have that, basically, for the same reason we have language for nodes. We basically want to create a fake named graph... The value of this field is only 4 in the German version, and in the English version it's 25."

Gregg: "Oh. That's odd".

Me: "Right, that's the inconsistency between the way that we handle language and the way that RDF handles language that has kind of motivated the Drupal community's push on this."

You then asked whether we needed it to be datatyped, to which I responded that some people might need it. (Since then, CreateJS has shown a use case for knowing the datatype).

I then went on to give the example of "number of likes", which is similar to the number of comments in the example above.

Again, if the JSON-LD WG doesn't want to support this with language maps, it is something we need to know.

Gregg Kellogg
Owner

I'm not sure where the requirement of round-tripping this language handling to RDF came from. I've said that we don't care about round-tripping this to RDF in most of the telecons I've been on, and explained it in multiple posts. We do not need resources or typed literals to be language tagged in RDF, we only need to be able to round trip the language map within the JSON-LD representation.

Well, JSON-LD is an RDF serialization, and a product of the RDF Working Group. There's also a strong desire to align the JSON-LD data model with RDF Concepts. As we discussed in Berkeley, my preference would be to have the expanded (or flattened) JSON-LD representation pretty closely match that of RDF, and using BNodes for this makes a lot of sense to me.

Right now, I'd be a +1 for using a BNode representation, but a +0.5 for using syntactic representations (e.g., @langmap) in case it is a show-stopper for Drupal.

Markus Lanthaler
Collaborator

Lin, we definitely want to support Drupal's use case but we can't move too far away from the RDF data model.

Just to be clear, this isn't about difficulty of coding in PHP as much as difficulty in communicating to different consumers.

Could you please elaborate a bit on the second part of the sentence above. I still can't see why the blank nodes in the example I outlined above matter.

As I see it, for a client working with it just as JSON (not JSON-LD) it doesn't matter at all. The only difference is in the efficiency of retrieving a property in all languages vs. all properties of a language.

For a client working with JSON-LD it doesn't really matter either because it round-trips cleanly when expanding-compacting.

For a client working with the data in RDF it does matter as the blank nodes move from the language-level (my proposal) to the value level (the other proposal supporting round-tripping) or the associated language disappears completely.

Dave Longley
Owner

I'd support adding @langmap to deal with the plain literal combined w/a language key roundtripping issue. In other words, when expanding we'd add a @langmap property to everything that was present in a language map, setting its value to the related language key. When compacting we'd use that value as the key again. If no @langmap property is present during compaction, then we'd use the @language value, and if there was no @language value we would not match the @langmap @container term (and use another term or the full URI).

So that means only @language would affect the meaning of the data and @langmap would affect the positioning/roundtripping.
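A rough Python sketch of how this could behave, to check that it round-trips. None of this is agreed syntax: the function names are made up, and the sketch assumes that plain strings in a language map still get @language (so they remain language-tagged strings) while @langmap is added to every value purely as a positioning hint.

```python
def expand_with_langmap(language_map):
    """Expand a language map, tagging every value with @langmap."""
    expanded = []
    for lang in sorted(language_map):
        values = language_map[lang]
        if not isinstance(values, list):
            values = [values]
        for value in values:
            if isinstance(value, str):
                # Assumption: strings remain language-tagged strings.
                item = {"@value": value, "@language": lang}
            elif isinstance(value, dict):
                item = dict(value)
            else:
                item = {"@value": value}  # native value
            item["@langmap"] = lang  # position only, no change in meaning
            expanded.append(item)
    return expanded


def compact_to_language_map(expanded):
    """Rebuild the language map; values with no usable key are unmatched."""
    language_map, unmatched = {}, []
    for item in expanded:
        # Prefer @langmap, fall back to @language, else no match.
        key = item.get("@langmap", item.get("@language"))
        if key is None:
            unmatched.append(item)
            continue
        rest = {k: v for k, v in item.items() if k != "@langmap"}
        value = rest
        is_string = isinstance(rest.get("@value"), str)
        if set(rest) == {"@value", "@language"} and rest["@language"] == key and is_string:
            value = rest["@value"]  # language string -> bare string
        elif set(rest) == {"@value"} and not is_string:
            value = rest["@value"]  # native value -> bare value
        language_map.setdefault(key, []).append(value)
    return language_map, unmatched


original = {
    "en": ["The Queen", {"@value": "plain"}, 5],
    "de": [{"@id": "http://example.com/user/2"}],
}
round_tripped, unmatched = compact_to_language_map(expand_with_langmap(original))
```

The plain literal `{"@value": "plain"}` stays an object on the way back so it is not confused with a language-tagged string. Anything that ends up in `unmatched` would have to use another term or the full URI.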

Lin Clark

Could you please elaborate a bit on the second part of the sentence above. I still can't see why the blank nodes in the example I outlined above matter.

So if we want to allow users to map their fields to Schema.org terms in order to reach other consumers, how would the suggested structure (blanknode per translation) map? Again, that structure is:

node/1 language blanknodeEn
blanknodeEn body "Here is some body text for the article."^^rdf:HTML
blanknodeEn num_comments "5"^^xsd:integer
blanknodeEn author site:user/1

Schema.org expects the item to have properties structured in the following way:

node/1 articleBody "Here is some body text for the article."
node/1 interactionCount 5
node/1 author site:user/1

Niklas Lindström
Collaborator

Like this:

@prefix : <http://schema.org/> .
@prefix dc: <http://purl.org/dc/terms/> .

<node/1> dc:hasVersion <node/1/en> .

<node/1/en> a :Article .
<node/1/en> :inLanguage "en" .
<node/1/en> :articleBody "Here is some body text for the article."@en .
<node/1/en> :interactionCount 5 .
<node/1/en> :author <user/1> .

Of course, if <node/1/en> isn't published separately, but only accessible from <node/1> (is that so; e.g. using only conneg?), a blank node is sufficient. Note that the example in http://schema.org/Article is a blank node, captured within an undescribed resource (the web page containing that markup).

Schema.org currently lacks the notion of versions, translations etc. The bibliographical world is of course rife with such relations, which is why for the larger interoperability concerns I suggest using (at least) Dublin Core to provide that additional detail.

As said above, I think the notion of an English and a German version is the beneficial one. It makes the entire use case much clearer. And for it to work with @container, we either need to make @language behave like dc:language for resources, or to support basic plain literal properties as values of @container. For the above, we could then define e.g.:

"@context": {
  "version": {"@id": "dc:hasVersion", "@container": "http://schema.org/inLanguage"}
}

Lin Clark

@niklasl That means that in the JSON-LD the URI is different between language versions, which is something that the Drupal community has said they don't want. It also means the URI would be different depending on whether you use language maps or not. We may want to respond without language maps when a specific language is requested (e.g. via the Accept-Language header).

I don't believe that language maps can both fit our use case and also the requirement that everything round trip to RDF.

Niklas Lindström
Collaborator

@linclark Not necessarily. I also said: "if <node/1/en> isn't published separately, but only accessible from <node/1>, a blank node is sufficient". That should cover what the Drupal community wants: don't mint a URI for the version – just use a blank node, described in the data accessed by <node/1>. When doing conneg with Accept-Language, only include the node describing the version in the matched language.

I am pretty convinced that your use case is logically sound and can be described just fine in RDF. :) We just have to figure out what the precise meaning is, and then refine the @container mechanism to support a convenient JSON-shape for that.

I recommend the above suggestion, or a variant thereof. For instance, the node may be a more abstract entity than a message possible to express in different articles. If so, the articles in English and German, respectively, are not versions of that message per se, but describe it, like:

<node/1> wdrs:describedby
    [ a :Article; :inLanguage "en"; :author <...>; :interactionCount 5 ],
    [ a :Article; :inLanguage "de"; :author <...>; :interactionCount 3 ] .

But if you prefer named graphs, your new proposal (#195) is certainly interesting as well. Just keep in mind that modeling explicitly using named graphs is usually an order of magnitude more complex than just linking to articles in different languages. What you suggest there, to splice the graph id as a key into each property, looks a lot like a "diff" view between different revisions of the same information about a resource. It is quite intriguing, and may, as you say, also be usable for revision auditing and similar.

However, it does seem more complex than necessary, unless you're absolutely adamant about the different descriptions (separated by language) being, in essence, the same resource. It's just that my "conflation alarm" goes off by that ;) – and I fear it may cause interoperability problems down the road.

Lin Clark

@niklasl It may be that it's more complex than necessary, but I have yet to see what problems the added complexity introduces, and it seems to solve our use case without any contortions in our own data handling.

If you have a chance to outline the practical implications that you predict from the added complexity of named graphs in the other issue, it would really help in fully evaluating whether it's a viable approach. Thanks!

Lin Clark

We are no longer pursuing language maps for our use case, but one proposal has come up offline a couple of times now. In case the CG plans to continue development of language maps, I want to make sure that the following flaw with the proposal is recorded in this thread.

The proposal is to wrap the object in a blank node. However, this would limit the vocabularies that you can use with language maps.

For example:

<node/1> schema:articleBody
    [ :en "This is the body text"^^rdf:HTML; ] .

This would violate the range constraint of schema:articleBody.

Gregg Kellogg
Owner

Another thing we discussed, rather than using BNodes, is to use property extensions. For example, this could result in the following:

<node/1> schema:articleBody/en "This is the body of text"^^rdf:HTML .
schema:articleBody/en rdfs:subPropertyOf schema:articleBody .

Schema.org doesn't actually need the subPropertyOf, but it allows other reasoners to know that the properties are related.

Niklas Lindström
Collaborator

Yes, it's good to lay that proposal to rest. It should also be noted that (AFAIK) the only reason it came up was to attempt to preserve data which the JSON desired by Drupal expresses in an unusual shape (where our interpretation violated most known vocabularies, as was noted when proposed).

The original language map proposal on the other hand is (was originally, and can now continue to be) only about expressing as keys the languages of language-tagged literals. That does not violate these constraints. It just provides a syntactical language map in JSON for what otherwise has to be iterated over. (The other, extended, more complex proposal for @container about mapping on regular properties (see e.g. point 2 in my comment above), is also free of any odd data patterns, RDF-wise.)

The current problem (separate from this issue) is that Drupal doesn't want this information (the language-like keys) preserved at all, only to syntactically preserve the shape. The reason is (IIUC) to hide the treatment of language-based versions of the descriptions from being exposed in RDF. That seems to require either:

  • splicing faux language keys, actually representing these versions as named graphs, between term and value (issue #195), or
  • some kind of probing (akin to #84, but quite different in detail: instead ignoring the object with the special key but needing linkage preserved, plus extended for expansion), or
  • a way to concatenate a term with its object's keys, as Gregg suggests above (to be used in combination with schema.org's particular property extension mechanism).

Again, I believe that the simplest solution would be to just acknowledge and express these language versions in public data, regardless of how they are diffused over node properties internally in Drupal. I must also stress that these things are much easier to reason about if the data is first expressed as RDF, which has grounded semantics, and only once the meaning is established seek any possible compact syntactical forms of that, for the purpose of matching desirable usage patterns (in programming or templating).

Markus Lanthaler
Collaborator

@linclark, just out of curiosity, how are you going to address your use case?

Does that also mean that you no longer need "@container": "@graph" (#195)?

Markus Lanthaler
Collaborator

We already resolved this a while ago:

RESOLVED: Add support for language maps via the "@container": "@language" annotation in @context. For example: "tags": { "@id": "http://example.com/vocab#tags", "@container": "@language"}. The child property of the term MUST be an associative array. All associative array keys MUST be BCP47 language strings.

PROPOSAL 3: The values of the key-value pairs of a language map MUST be strings or arrays of strings. When expanded, the strings are tagged with the language specified by the key. When compacting, only language-tagged strings will match a term that has a "@container": "@language" mapping. Terms that have a "@container": "@language" mapping MUST NOT be type-coerced.

We could also allow other values such as plain literals or nodes but, as the language information would be lost during expansion, I don't think we should do that. If we disallow this now we drastically simplify the introduction of more sophisticated mechanisms at a later point in time since it won't change existing data. Therefore the MUST in the proposal above.

Lin Clark

@niklasl As I've pointed out, named graphs actually do handle our use case while also expressing the information in RDF. I believe you and I disagree about how odd this is. For example, JeniT has written about named graphs used for versioning UK government data. But that is beside the point and I don't want to hijack the thread with this debate, or with more attempts to convince us to handle our language-based entity variants as separate resources.

I believe the current problem is how the CG deals with other (non-Drupal) use cases. For example, when I met with Gregg in Berkeley, it seemed that he had his own use case for language maps that could contain node references.

@lanthaler I would prefer to use named graphs, and thus would still like to see #195 developed. However, Manu discussed another way which would not preserve the information in RDF, but would at least be good enough for us.

Gregg Kellogg
Owner

I believe the current problem is how the CG deals with other (non-Drupal) use cases. For example, when I met with Gregg in Berkeley, it seemed that he had his own use case for language maps that could contain node references.

To be clear, I have a use case where I have RDF data including information for separate languages, that I need to serialize in RDF. It was not necessarily the case that it needed to be done with language maps. In fact, named graphs may very well be the best way to do it. The Wikia case is different, though, as there are different resources for each language version (like Wikipedia), so named graphs might make sense, if you use named graphs to describe the resources of each page.

Niklas Lindström
Collaborator

@linclark I've replied to this in issue #195, since you're right that this issue should focus on the effect of @container: @language only.

Also, I hope you have time to consider issue #196, which is an attempt to handle a bunch of related topics regarding syntactic extensibility with no defined semantics. I'm not sure if it'll have traction, but I believe it is close to the variant that has been discussed offline that you thought may be useful.

Gregg Kellogg
Owner

+1 to PROPOSAL 3.

For expansion, I would say that non-string (or array of string) values of a property with language maps are expanded to use the property, but lose the language association. That is, they don't round-trip.

As a general principle, I'm fine with syntactic constructs that allow for zero-edits when expanding, but opposed to them for round-tripping through expansion, unless they also have a representation that can be round-tripped through RDF.

Niklas Lindström
Collaborator

+1 to PROPOSAL 3 (for the behavior of @container: @language).

I also agree with the general principle. However, as noted in the last part of the #196 description, I may be willing to compromise that principle in that case if it is proven essential to usage and doesn't wreak havoc upon the expansion algorithm. Not for this issue though (since preserving @language would add ambiguous or nonsensical information).

Markus Lanthaler
Collaborator

RESOLVED: The values of the key-value pairs of a language map MUST be strings or arrays of strings. When expanded, the strings are tagged with the language specified by the key. When compacting, only language-tagged strings will match a term that has a "@container": "@language" mapping. Terms that have a "@container": "@language" mapping MUST NOT be type-coerced.
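
For illustration, the expansion half of this resolution can be sketched in Python. This is a minimal sketch, not the spec algorithm: the function name `expand_language_map` is hypothetical, raising `ValueError` for non-string values is an assumption derived from the MUST above, and keys are visited in sorted order only to make the output deterministic.

```python
def expand_language_map(language_map):
    """Expand a language map into a list of language-tagged value objects.

    Hypothetical helper: the name and the error handling are assumptions,
    not taken from the JSON-LD spec text.
    """
    if not isinstance(language_map, dict):
        raise ValueError("a language map must be a JSON object")

    expanded = []
    # Visit the language keys in sorted order so the result is deterministic.
    for language in sorted(language_map):
        values = language_map[language]
        # A single string is treated like a one-element array of strings.
        if isinstance(values, str):
            values = [values]
        if not isinstance(values, list):
            raise ValueError("language map values must be strings or arrays of strings")
        for item in values:
            if not isinstance(item, str):
                raise ValueError("language map values must be strings or arrays of strings")
            # Tag each string with the language given by its key.
            expanded.append({"@value": item, "@language": language})
    return expanded


if __name__ == "__main__":
    print(expand_language_map({"en": "The Queen", "de": "Die Königin"}))
```

Rejecting anything other than strings keeps the door open for richer language map values later, since no existing data would change meaning.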

Manu Sporny
Owner

To be clear: JSON-LD 1.0 will support simple language maps. When using a language map and expanding, if the term's language key's value is not a simple string, the rule for using the language map does not apply (all language-map values get dropped). When compacting, if all statements in the list are not simple @value/@language objects, then the term that defines the language map does not match (the statements are kept in expanded form).

Markus Lanthaler
Collaborator

@msporny, could you please explain what you mean with the last sentence:

When compacting, if all statements in the list are not simple @value/@language objects, then the term that defines the language map does not match (the statements are kept in expanded form).

Every value will be evaluated separately and there might be values without an @language that weren't part of a language map... but maybe I don't understand what you are saying.

Gregg Kellogg
Owner

I just want to clarify that arrays of strings, or strings within an @list are also considered as being appropriate for use with language maps.

To clarify @msporny's description of compaction, a language map term is only appropriate for values which have @language. Otherwise, other terms (at a lower term rank) can also be considered, defaulting to an expanded IRI if none is found.
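
As a rough illustration of that per-value matching, here is a minimal Python sketch. It is an assumption-laden simplification: the helper names are hypothetical, term ranking is not modelled at all, and any value that does not fit the language map falls straight back to the full IRI and stays in expanded form.

```python
def is_language_tagged_string(value):
    """Return True for a plain {"@value": <str>, "@language": <str>} object."""
    return (
        isinstance(value, dict)
        and set(value) == {"@value", "@language"}
        and isinstance(value["@value"], str)
        and isinstance(value["@language"], str)
    )


def compact_property_values(values, term, iri):
    """Split expanded values between a language map term and the full IRI.

    Hypothetical helper: each value is evaluated separately; only
    language-tagged strings go into the language map under `term`.
    """
    compacted = {}
    for value in values:
        if is_language_tagged_string(value):
            language_map = compacted.setdefault(term, {})
            language = value["@language"]
            if language not in language_map:
                language_map[language] = value["@value"]
            elif isinstance(language_map[language], list):
                language_map[language].append(value["@value"])
            else:
                # Second string for the same language: switch to an array.
                language_map[language] = [language_map[language], value["@value"]]
        else:
            # No @language (or not a simple value): keep it in expanded form.
            compacted.setdefault(iri, []).append(value)
    return compacted


if __name__ == "__main__":
    values = [
        {"@value": "The Queen", "@language": "en"},
        {"@value": "Die Königin", "@language": "de"},
        {"@value": "No language"},
    ]
    print(compact_property_values(values, "label", "http://example.com/label"))
```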

Markus Lanthaler
Collaborator

I just want to clarify that arrays of strings, or strings within an @list are also considered as being appropriate for use with language maps.

@list as well? Really? That would make everything a lot more complex.

Gregg Kellogg
Owner

@list could work for multiple lists just like the example in #172 distinguishes based on language. However, I don't have a strong opinion on this.

Markus Lanthaler
Collaborator

I would prefer to not do that.

Manu Sporny msporny closed this in 51de407 December 02, 2012
Manu Sporny
Owner

@gkellogg I added the "@container": "@language" algorithms to the spec, but without the support for "@list" that you mentioned above, as it would complicate the algorithm and it seems like we could always add that feature later, if necessary. If the folks that have been active in this thread could look at the commit diff and make sure I implemented this correctly, I'd appreciate it. It took a very long time to figure out where to hook into the various algorithms for this feature, and even once I did the work, it was difficult to figure out whether there were going to be any side effects from the modification to the algorithms.

Gregg Kellogg
Owner

I'm fine with not having @list support.

Markus Lanthaler
Collaborator
Markus Lanthaler lanthaler referenced this issue from a commit December 08, 2012
Markus Lanthaler Test that compaction falls back to term with @set containers if no language maps are available

Also removed unnecessary data from compact-0026-context.jsonld.

This addresses #133.
a0a67a9
Markus Lanthaler lanthaler referenced this issue from a commit December 11, 2012
Markus Lanthaler Only invoke language and annotation map expansion if the value is a JSON object

See Gregg's changes in 8c546b9.

This addresses #133 and #196.
5cb6ba2
Markus Lanthaler lanthaler referenced this issue from a commit December 11, 2012
Markus Lanthaler Sort keys of language maps case-sensitively before expanding
This leads to much better performance as the keys don't have to be lower-cased multiple times.

This addresses #133.
e0fb8a4