Language type lacks guidance on properties #1079

Closed
danbri opened this Issue Apr 6, 2016 · 19 comments

@danbri
Contributor
danbri commented Apr 6, 2016

Language - which property encodes the language and when to use formal vs informal names for languages. "Spanish" vs "ES", EN-uk etc.

@mfhepp
Contributor
mfhepp commented Apr 6, 2016

To be frank, I think the handling of language information was much better in the strict RDF world, where any plain literal could have a language tag, so it was straightforward to represent alternative texts for the same property depending on the language.

Of all relevant syntaxes for schema.org, only Microdata lacks this feature AFAIK. Why don't we simply recommend RDFa or JSON-LD for use-cases that require language meta-data? That seems much better to me than introducing a mechanism at the level of the vocabulary.
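
For illustration, here is a minimal Python sketch of what per-string language tagging looks like when a description is serialized as JSON-LD. The item type, the property chosen, the strings, and the helper `value_for_language` are all hypothetical; only the `@value` / `@language` keywords come from JSON-LD itself.

```python
import json

# Hypothetical item: the type, property, and strings are illustrative only.
doc = {
    "@context": "http://schema.org",
    "@type": "CreativeWork",
    "name": [
        {"@value": "Example title", "@language": "en"},
        {"@value": "Título de ejemplo", "@language": "es"},
    ],
}


def value_for_language(values, lang):
    """Return the first string tagged with `lang`, or None if there is none."""
    for value in values:
        # Language tags are compared case-insensitively.
        if value.get("@language", "").lower() == lang.lower():
            return value["@value"]
    return None


print(json.dumps(doc, indent=2))
```

The same property carries one string per language, which is the capability Microdata lacks.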

@danbri
Contributor
danbri commented Apr 6, 2016

I agree there are issues with Microdata (or Microdata-as-RDF at least).

However you're missing the leading use case for the Language type: the many other contexts in which it is useful to mention and name languages, beyond annotation of sections of textual markup.

For examples see the incoming properties section at bottom of http://schema.org/Language

In all these cases we might have an informal name for a language, or (for human languages) a code from https://tools.ietf.org/html/bcp47 or perhaps a Wikipedia link.

@Aaranged
Aaranged commented Apr 6, 2016

My preliminary thought on this is that a new data type for Language should be constructed, along the lines of the data type Date, with reference to a standard.

Which standard?

  • For the one class where Google specifies a standard it is IETF BCP 47
  • In Yandex documentation the language code specified is ISO-639 (while it's not stated explicitly in the documentation, the code type employed in their examples is ISO 639-1)
  • Bing still recommends use of <meta http-equiv="content-language"> for language declarations, where the value for content is "comprised of a 2-letter ISO 639 language code, followed by a dash and the appropriate ISO 3166 geography code"
  • JSON-LD specifies BCP 47 for @language values
  • In their specifications for implementing the hreflang tag both Google and Yandex stipulate ISO 639-1 for the language, and ISO 3166-1 alpha-2 for the optional region declaration (appended to the language code with a dash as separator)
  • W3C's HTML5 recommendation on language calls for a valid BCP 47 language tag
  • W3C's internationalization documentation on "Choosing a Language Tag" and "Language tags in HTML and XML" both cite BCP 47
  • schema.org/inLanguage, of course, expects the value to be Language or Text, with the directive "Please use one of the language codes from the IETF BCP 47 standard"

All things being equal it seems to me that BCP 47 should be the standard for schema.org language declarations.

All of this glosses over the data model issue that Language is already constituted as a class in schema.org, rather than a data type. Perhaps there is a reason why the data model would want to support Text as an expected value rather than only pointing to BCP 47 (although this isn't done for Date, even though dates, like languages, can also be represented in content as a string).

Regarding the comment by @mfhepp, aside from the response from @danbri I'd point to the potential limitations of language declarations in RDFa and JSON-LD in the context of possibly extending the classes on which inLanguage (or a similar property) can be used, as per #1065 started by @betehess. I.e. JSON-LD permits @language only to be declared for a string, not a URI.
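
The language-plus-optional-region shape that several of the sources listed above describe can be sketched roughly as follows. This is a deliberately simplified check, not a BCP 47 validator: the pattern and the function name `normalize_simple_tag` are assumptions for illustration, and nothing here consults the subtag registry, so a well-formed but unregistered code would still pass.

```python
import re

# Simplified shape only: a 2- or 3-letter language subtag, optionally followed
# by a 2-letter region subtag. Full BCP 47 also allows scripts, variants,
# extensions, and private-use subtags, none of which are handled here.
_SIMPLE_TAG = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?$")


def normalize_simple_tag(tag):
    """Return the tag in conventional casing, or None if the shape does not match."""
    match = _SIMPLE_TAG.match(tag.strip())
    if match is None:
        return None
    language, region = match.groups()
    if region is None:
        return language.lower()
    # Conventional casing: lowercase language, uppercase region.
    return f"{language.lower()}-{region.upper()}"
```

An informal name such as "Spanish" fails the shape check, which is one way a consumer could tell a code from a free-text name.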

@danbri
Contributor
danbri commented Apr 6, 2016

/cc @chaals

@danbri
Contributor
danbri commented Apr 6, 2016

Another use case: #1084 - proposal to add 'preferred language' for describing persons.

@danbri danbri self-assigned this Apr 6, 2016
@danbri danbri added this to the sdo-deimos release milestone Apr 6, 2016
@Aaranged
Aaranged commented Apr 6, 2016

Regarding programmingLanguage, which has as its expected type Language, I believe this is an awkward conflation of "language" in the conventional sense of "conventional human languages" with computer programming "language", "a system of signs for encoding and decoding information" (all this from the Wikipedia article on language).

They're clearly quite different things, which is why one won't find C++ in BCP 47. :) IMO the expected type for programmingLanguage should not be Language, but Text. (Absent, say, an enumerated value or code for programmingLanguage - but the former is unwieldy and has extensibility issues, and for the latter no standard, AFAIK, exists. This is not the case for human languages, which are unlikely to be extended and are supported by standards like BCP 47).

@betehess
Contributor
betehess commented Apr 6, 2016

Why don't we simply recommend RDFa or JSON-LD for use-cases that require language meta-data?

👍 We can do such a thing? 👍

All things being equal it seems to me that BCP 47 should be the standard for schema.org language declarations.

5135700

@mfhepp
Contributor
mfhepp commented Apr 7, 2016

As far as I understand, BCP 47 includes ISO 639 language tags, which are the basis for tagging the language of RDF plain literals. So maybe we have to update the RDF spec to broaden the set of language tags to the full BCP 47 set.

Except for that, there is no conflict.



@mfhepp
Contributor
mfhepp commented Apr 7, 2016

RDF (at least 1.1) already allows all BCP 47 tags:
https://www.w3.org/TR/rdf-plain-literal/



@iBootstrap

I came across this post while googling to find how to use 'programmingLanguage' ... let's say I have a div containing C++ code ... I'm still unsure of the best way ...

<div itemprop="programmingLanguage" content="c++">
   // some c++ code here
</div>
@danbri
Contributor
danbri commented Apr 16, 2016

Looking at this again, I suggest:

  • Create a parallel ComputerLanguage type and migrate programmingLanguage to point to it
  • Clarify in definition of Language type that it is now (primarily) for representing human language, but that it can be used for other language-like systems. Partially this is to ease the migration of computer languages over to ComputerLanguage, but also to avoid nitpicking about any borderline cases.
  • Endorse BCP 47 explicitly in the definition
  • Decide and document which property links from an instance of the Language type to a BCP 47 code; alternateName? generalize 'code'?
@Dataliberate
Contributor

+1 on all of above, including generalizing 'code'

@danbri
Contributor
danbri commented Apr 18, 2016

I looked into generalizing 'code' but it is a big job and tied up with our desire to engage with SKOS and external enumerations productively. I have done something simpler for now.

  • Created a new ComputerLanguage type (rather than ProgrammingLanguage, to avoid nitpicking on whether SQL, datalog, RIF etc are programming or not; also we don't want types and properties to have the same name when case is omitted, and 'programmingLanguage' is already taken).
  • Updated ComputerLanguage and Language with cross-references. Language in particular notes its former use for computer languages.
  • Updated programmingLanguage to have both Text (as @Aaranged suggests) and ComputerLanguage as values.
  • Added mention of BCP 47 to Language, as suggested by @betehess and others. Endorsed the alternateName property as a place to put these values (this seems better than saying nothing at all).
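
As a rough sketch of the two value shapes now accepted for programmingLanguage, the following Python builds one description with a Text value and one with a ComputerLanguage item. Using SoftwareSourceCode as the containing type, the "C++" values, and the helper `programming_language_name` are assumptions for illustration, not part of the change itself.

```python
import json

# Hypothetical descriptions; SoftwareSourceCode as the containing type and the
# name values are assumptions made for illustration.
as_text = {
    "@context": "http://schema.org",
    "@type": "SoftwareSourceCode",
    "programmingLanguage": "C++",
}

as_item = {
    "@context": "http://schema.org",
    "@type": "SoftwareSourceCode",
    "programmingLanguage": {"@type": "ComputerLanguage", "name": "C++"},
}


def programming_language_name(description):
    """Return the language name whether the value is Text or a ComputerLanguage item."""
    value = description.get("programmingLanguage")
    if isinstance(value, dict):
        return value.get("name")
    return value


print(json.dumps(as_item, indent=2))
```

A consumer that handles both shapes, as the helper does, does not need to care which one a publisher chose.
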
@danbri danbri pushed a commit that referenced this issue Apr 18, 2016
Dan Brickley Language + ComputerLanguage are now independent, cross-referenced.
BCP 47 endorsed, via alternateName property.
Fix for #1079
51ec72a
@chaals
Contributor
chaals commented Apr 19, 2016

Include e.g. HTML (or some not-really-a-programming-language form of computer representation of stuff) in the example?

@danbri
Contributor
danbri commented Apr 19, 2016

I'd avoid HTML at this stage as it could easily confuse people via too many meta layers. Plus the only property we have right now designed for ComputerLanguage is programmingLanguage. I'll work up something...

@danbri
Contributor
danbri commented Apr 19, 2016

Investigated... It seems we have no example containing programmingLanguage currently. We should fix that.

For this release I'd like to focus on reflecting the BCP 47 consensus back into the human-language oriented definitions, unless someone here has time to put together a quick programmingLanguage / ComputerLanguage example. Looking more carefully at our definitions I have made some modest changes to clarify that both http://schema.org/inLanguage and http://schema.org/availableLanguage can directly take BCP 47 codes as Text values, in addition to referencing an item of type Language. That makes sense because in the vast majority of uses there is nothing more to say about the language except that it is the value of some property. I have also updated one example to add alternateName="es" alongside the existing name="Spanish" property of a Language. This update also cross-references inLanguage and availableLanguage. We would do well to clarify their relationship further in future - there are subtle differences (content/performance vs ability to use) but this could be better explained.
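
A small Python sketch of the two forms described above: a BCP 47 code given directly as Text, and a Language item carrying name="Spanish" with alternateName="es". The containing types (CreativeWork, ContactPoint) and the helper `language_code` are assumptions for illustration only.

```python
# Hypothetical descriptions; the containing types are assumptions for illustration.
work = {
    "@context": "http://schema.org",
    "@type": "CreativeWork",
    "inLanguage": "es",  # BCP 47 code given directly as Text
}

contact = {
    "@context": "http://schema.org",
    "@type": "ContactPoint",
    "availableLanguage": {
        "@type": "Language",
        "name": "Spanish",
        "alternateName": "es",  # BCP 47 code carried on the Language item
    },
}


def language_code(value):
    """Return the BCP 47 code from either a Text value or a Language item."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        return value.get("alternateName")
    return None
```

Both `language_code(work["inLanguage"])` and `language_code(contact["availableLanguage"])` yield the same code, so the Text form loses nothing when there is no more to say about the language.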

@danbri danbri pushed a commit that referenced this issue Apr 19, 2016
Dan Brickley Small changes for consistency and cross-referencing between inLanguage and availableLanguage.

See #1079.
3295651
@danbri danbri pushed a commit that referenced this issue Apr 19, 2016
Dan Brickley Updated release notes around language properties to reflect inLanguage / availableLanguage improvements.

See #1079
76ad6f8
@danbri
Contributor
danbri commented Apr 28, 2016

Closing per http://webschemas.org/docs/releases.html#g1079

There are other language-related conversations rolling along nearby, but I think we've addressed #1079 as originally raised.

@danbri danbri closed this Apr 28, 2016