Language type lacks guidance on properties #1079

Closed
danbri opened this Issue Apr 6, 2016 · 20 comments

danbri commented Apr 6, 2016

Language - which property encodes the language and when to use formal vs informal names for languages. "Spanish" vs "ES", EN-uk etc.

mfhepp commented Apr 6, 2016

To be frank, I think the handling of language information was much better in the strict RDF world, where any plain literal could have a language tag, so it was straightforward to represent alternative texts for the same property depending on the language.

Of all relevant syntaxes for schema.org, only Microdata lacks this feature AFAIK. Why don't we simply recommend RDFa or JSON-LD for use-cases that require language meta-data? That seems much better to me than introducing a mechanism at the level of the vocabulary.
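
To make the language-tagged literal idea concrete, here is a minimal sketch in Python that builds a JSON-LD document in which one property carries two texts, each tagged with a language. `@value` and `@language` are JSON-LD keywords; the `Product` item, its names, and the helper `names_by_language` are hypothetical and only illustrate the shape.

```python
import json

# Hypothetical item: the same "name" property given in two languages.
# "@value" holds the text, "@language" holds its BCP 47 language tag.
doc = {
    "@context": "http://schema.org",
    "@type": "Product",
    "name": [
        {"@value": "Red bicycle", "@language": "en"},
        {"@value": "Bicicleta roja", "@language": "es"},
    ],
}


def names_by_language(node):
    """Map each language tag to the text of the matching "name" value."""
    return {value["@language"]: value["@value"] for value in node["name"]}


# Serialize the document as it would be embedded in a page.
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

A consumer can then pick the text for the language it needs, for example `names_by_language(doc)["es"]`.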

danbri commented Apr 6, 2016

I agree there are issues with Microdata (or Microdata-as-RDF at least).

However you're missing the leading use case for the Language type: the many other contexts in which it is useful to mention and name languages, beyond annotation of sections of textual markup.

For examples see the incoming properties section at bottom of http://schema.org/Language

In all these cases we might have an informal name for a language, or (for human languages) a code from https://tools.ietf.org/html/bcp47 or perhaps a Wikipedia link.

Aaranged commented Apr 6, 2016

My preliminary thought on this is that a new data type for Language should be constructed, along the lines of the data type Date, with reference to a standard.

Which standard?

  • For the one class where Google specifies a standard it is IETF BCP 47
  • In Yandex documentation the language code specified is ISO-639 (while it's not stated explicitly in the documentation, the code type employed in their examples is ISO 639-1)
  • Bing still recommends use of <meta http-equiv="content-language"> for language declarations, where the value for content is "comprised of a 2-letter ISO 639 language code, followed by a dash and the appropriate ISO 3166 geography code"
  • JSON-LD specifies BCP 47 for @language values
  • In their specifications for implementing the hreflang tag both Google and Yandex stipulate ISO 639-1 for the language, and ISO 3166-1 alpha-2 for the optional region declaration (appended to the language code with a dash as separator)
  • W3C's HTML5 recommendation on language calls for a valid BCP 47 language tag
  • W3C's internationalization documentation on "Choosing a Language Tag" and "Language tags in HTML and XML" both cite BCP 47
  • schema.org/inLanguage, of course, expects the value to be Language or Text, with the directive "Please use one of the language codes from the IETF BCP 47 standard"

All things being equal it seems to me that BCP 47 should be the standard for schema.org language declarations.

All of this glosses over the data model issue that Language is already constituted as a class in schema.org, rather than a data type. Perhaps there is a reason why the data model would want to support Text as an expected value rather than only pointing to BCP 47 (though this isn't done for Date, even though dates, like languages, can also be represented in content as a string).

Regarding the comment by @mfhepp, aside from the response from @danbri I'd point to the potential limitations of language declarations in RDFa and JSON-LD in the context of possibly extending the classes on which inLanguage (or a similar property) can be used, as per #1065 started by @betehess. I.e. JSON-LD permits @language only to be declared for a string, not a URI.
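
As a rough illustration of the "language code plus optional region" shape that several of the sources above describe, the following Python sketch normalizes such tags to their conventional casing. It is an assumption-laden simplification: it handles only a two-letter language code with an optional two-letter region, does not check the codes against the ISO registries, and is not a BCP 47 validator (full BCP 47 also allows scripts, variants, extensions and more). The names `SIMPLE_TAG` and `normalize_simple_tag` are hypothetical.

```python
import re

# Assumption: only "ll" or "ll-RR" is accepted (two-letter language,
# optional two-letter region). Real BCP 47 tags can be much richer.
SIMPLE_TAG = re.compile(r"([A-Za-z]{2})(?:-([A-Za-z]{2}))?")


def normalize_simple_tag(tag):
    """Return the tag in conventional casing, or None for any other shape."""
    match = SIMPLE_TAG.fullmatch(tag.strip())
    if match is None:
        return None
    language, region = match.groups()
    if region is None:
        return language.lower()
    # Convention: language subtag in lowercase, region subtag in uppercase.
    return f"{language.lower()}-{region.upper()}"
```

Informal names such as "Spanish" fall outside this shape and return `None`, which is one way to separate a code from a human-readable name.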

danbri commented Apr 6, 2016

/cc @chaals

danbri commented Apr 6, 2016

Another use case: #1084 - proposal to add 'preferred language' for describing persons.

@danbri danbri self-assigned this Apr 6, 2016

@danbri danbri added this to the sdo-deimos release milestone Apr 6, 2016

Aaranged commented Apr 6, 2016

Regarding programmingLanguage, which has as its expected type Language, I believe this is an awkward conflation of "language" in the conventional sense of "conventional human languages" with computer programming "language", "a system of signs for encoding and decoding information" (all this from the Wikipedia article on language).

They're clearly quite different things, which is why one won't find C++ in BCP 47. :) IMO the expected type for programmingLanguage should not be Language, but Text. (Absent, say, an enumerated value or code for programmingLanguage - but the former is unwieldy and has extensibility issues, and for the latter no standard, AFAIK, exists. This is not the case for human languages, which are unlikely to be extended and are supported by standards like BCP 47).

betehess commented Apr 6, 2016

> Why don't we simply recommend RDFa or JSON-LD for use-cases that require language meta-data?

👍 We can do such a thing? 👍

> All things being equal it seems to me that BCP 47 should be the standard for schema.org language declarations.

5135700

mfhepp commented Apr 7, 2016

As far as I understand, BCP 47 includes ISO 639 language tags, which are the basis for tagging the language of RDF plain literals. So maybe we have to update the RDF spec to broaden the set of language tags to the full BCP 47 set.

Except for that, there is no conflict.


mfhepp commented Apr 7, 2016

RDF (at least 1.1) already allows all BCP 47 tags:
https://www.w3.org/TR/rdf-plain-literal/


iBootstrap commented Apr 12, 2016

I came across this post while googling to find how to use 'programmingLanguage' ... let's say I have a div containing C++ code ... I'm still unsure of the best way ...

<div itemprop="programmingLanguage" content="c++">
   // some c++ code here
</div>

danbri commented Apr 16, 2016

Looking at this again, I suggest:

  • Create a parallel ComputerLanguage type and migrate programmingLanguage to point to it
  • Clarify in definition of Language type that it is now (primarily) for representing human language, but that it can be used for other language-like systems. Partially this is to ease the migration of computer languages over to ComputerLanguage, but also to avoid nitpicking about any borderline cases.
  • Endorse BCP 47 explicitly in the definition
  • Decide and document which property links from an instance of the Language type to a BCP 47 code; alternateName? generalize 'code'?

Dataliberate commented Apr 16, 2016

+1 on all of above, including generalizing 'code'

danbri commented Apr 18, 2016

I looked into generalizing 'code' but it is a big job and tied up with our desire to engage with SKOS and external enumerations productively. I have done something simpler for now.

  • Created a new ComputerLanguage type (rather than ProgrammingLanguage, to avoid nitpicking on whether SQL, datalog, RIF etc are programming or not; also we don't want types and properties to have the same name when case is omitted, and 'programmingLanguage' is already taken).
  • Updated ComputerLanguage and Language with cross-references. Language in particular notes its former use for computer languages.
  • Updated programmingLanguage to have both Text (as @Aaranged suggests) and ComputerLanguage as values.
  • Added mention of BCP 47 to Language, as suggested by @betehess and others. Endorsed the alternateName property as a place to put these values (this seems better than saying nothing at all).
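
The two value forms for programmingLanguage described above can be sketched as follows. This is a hedged Python illustration that emits JSON-LD, not an official schema.org example; the helper `source_code`, its `as_item` flag, and the sample name are hypothetical. The type names `SoftwareSourceCode` and `ComputerLanguage` and the property `programmingLanguage` come from schema.org.

```python
import json


def source_code(name, language, as_item=True):
    """Describe a code sample; `language` is a plain name such as "C++"."""
    if as_item:
        # Item form: a ComputerLanguage node that can carry more detail.
        programming_language = {"@type": "ComputerLanguage", "name": language}
    else:
        # Text form: the language name is the property value itself.
        programming_language = language
    return {
        "@context": "http://schema.org",
        "@type": "SoftwareSourceCode",
        "name": name,
        "programmingLanguage": programming_language,
    }


print(json.dumps(source_code("Hello world sample", "C++"), indent=2))
```

The Text form is enough when nothing else needs to be said about the language; the item form leaves room for further properties later.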

danbri added a commit that referenced this issue Apr 18, 2016

Language + ComputerLanguage are now independent, cross-referenced.
BCP 47 endorsed, via alternateName property.
Fix for #1079

chaals commented Apr 19, 2016

Include e.g. HTML (or some not-really-a-programming-language form of computer representation of stuff) in the example?

danbri commented Apr 19, 2016

I'd avoid HTML at this stage as it could easily confuse people via too many meta layers. Plus the only property we have right now designed for ComputerLanguage is programmingLanguage. I'll work up something...

danbri commented Apr 19, 2016

Investigated... It seems we have no example containing programmingLanguage currently. We should fix that.

For this release I'd like to focus on reflecting the BCP 47 consensus back into the human-language oriented definitions, unless someone here has time to put together a quick programmingLanguage / ComputerLanguage example. Looking more carefully at our definitions I have made some modest changes to clarify that both http://schema.org/inLanguage and http://schema.org/availableLanguage can directly take BCP 47 codes as Text values, in addition to referencing an item of type Language. That makes sense because in the vast majority of uses there is nothing more to say about the language except that it is the value of some property. I have also updated one example to add alternateName="es" alongside the existing name="Spanish" property of a Language. This update also cross-references inLanguage and availableLanguage. We would do well to clarify their relationship further in future - there are subtle differences (content/performance vs ability to use) but this could be better explained.
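
A small Python sketch of the two forms mentioned above: a BCP 47 code given directly as Text, and a Language item with name="Spanish" and alternateName="es". The surrounding `Book` and `ContactPoint` items and the helper `language_code` are hypothetical and chosen only for illustration.

```python
import json

# Text form: the BCP 47 code is the property value itself.
book = {
    "@context": "http://schema.org",
    "@type": "Book",
    "name": "Example title",  # hypothetical
    "inLanguage": "es",
}

# Item form: a Language node with a readable name, and the BCP 47 code
# carried in alternateName as endorsed in this thread.
contact = {
    "@context": "http://schema.org",
    "@type": "ContactPoint",
    "contactType": "customer service",
    "availableLanguage": {
        "@type": "Language",
        "name": "Spanish",
        "alternateName": "es",
    },
}


def language_code(value):
    """Return the BCP 47 code from either form, or None if it is absent."""
    if isinstance(value, str):
        return value
    return value.get("alternateName")


print(json.dumps([book, contact], indent=2))
```

A consumer that wants only the code can treat both forms uniformly through a helper like `language_code`.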

danbri added a commit that referenced this issue Apr 19, 2016

danbri added a commit that referenced this issue Apr 19, 2016

danbri commented Apr 28, 2016

Closing per http://webschemas.org/docs/releases.html#g1079

There are other language-related conversations rolling along nearby, but I think we've addressed #1079 as originally raised.

88kbbq commented May 18, 2018

Every item should support the language property. Period.
