JSON-LD API terms don't match RDF Concepts terminology #131

Closed
cygri opened this Issue May 24, 2012 · 5 comments

cygri commented May 24, 2012

The following terms in the WebIDL definitions in the JSON-LD API don't match the normative terminology in RDF Concepts:

  • Quad — there's no such term in RDF Concepts
  • Quad.property — should be Quad.predicate
  • Quad.name — should be Quad.graphName (terminology might still change), also for clarity (it's not a name of the quad)
  • Node.interfaceName — this is a weird term as it talks about the IDL definition itself rather than about the defined domain. termType would be slightly better aligned with RDF Concepts, or perhaps nodeType in analogy to the DOM.
  • Node.nominalValue — there's no “nominal value” in RDF Concepts. Nothing in RDF Concepts quite matches. Why “nominal” and not just “value”?
  • Literal.datatype — in RDF 1.1, all literals have a datatype. Also (not an RDF Concepts problem but usability): is it really wise to make this of type IRI? Isn't this better just a string?

I'm also wondering whether the subtypes of Node should have an additional property that reflects the value of nominalValue but using the proper RDF Concepts term: BlankNode.identifier and Literal.lexicalForm. (An IRI is defined as just a string that has a particular syntax, so IRI.value seems already sufficiently appropriate.)
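
To make the proposed naming concrete, here is a minimal TypeScript sketch of how the interfaces might look if every suggestion above were adopted. It is an illustration only, not the WebIDL from the spec: `RdfNode` (renamed to avoid clashing with the DOM's `Node`), the factory helpers, and the use of `null` for the default graph are assumptions made for this example.

```typescript
// Hypothetical sketch of the naming proposed above; NOT the spec's actual IDL.
type TermType = "IRI" | "BlankNode" | "Literal";

// Called RdfNode here (assumption) to avoid clashing with the DOM's Node type.
interface RdfNode {
  termType: TermType; // suggested replacement for interfaceName
  value: string;      // suggested replacement for nominalValue
}

interface IRI extends RdfNode {
  termType: "IRI";
}

interface BlankNode extends RdfNode {
  termType: "BlankNode";
  identifier: string; // RDF Concepts term; mirrors value
}

interface Literal extends RdfNode {
  termType: "Literal";
  lexicalForm: string; // RDF Concepts term; mirrors value
  datatype: string;    // datatype IRI as a plain string; always present in RDF 1.1
  language?: string;   // language tag, if any
}

interface Quad {
  subject: IRI | BlankNode;
  predicate: IRI;                     // instead of Quad.property
  object: IRI | BlankNode | Literal;
  graphName: IRI | null;              // instead of Quad.name; null = default graph (assumption)
}

const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

// Illustrative factory helpers (not part of any proposal).
function iri(value: string): IRI {
  return { termType: "IRI", value };
}

function blankNode(identifier: string): BlankNode {
  return { termType: "BlankNode", value: identifier, identifier };
}

// In RDF 1.1 a literal without an explicit datatype is an xsd:string.
function literal(lexicalForm: string, datatype: string = XSD_STRING): Literal {
  return { termType: "Literal", value: lexicalForm, lexicalForm, datatype };
}

const exampleQuad: Quad = {
  subject: blankNode("b0"),
  predicate: iri("http://xmlns.com/foaf/0.1/name"),
  object: literal("Alice"),
  graphName: null,
};
```

Keeping a generic `value` on every node while adding `identifier` and `lexicalForm` on the subtypes lets generic code treat nodes uniformly, and RDF-aware code use the RDF Concepts names.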

I note that Graph and Dataset interfaces are absent.

The IRI interface has some poor phrasing: “A node identified by an IRI”; “The IRI identifier of the node”. IRIs don't identify nodes; they identify resources. These should perhaps be “A node representing an IRI” and “The represented IRI as a string”.

The text about blank nodes has some issues but I'm not sure what it should say before studying the rest of the spec a bit closer.

Literals don't have “lexical representations” but “lexical forms”.

The language thingy in a literal is called a “language tag”, not a “language string”. The API also refers to it as a “text string token”; this sounds weird, why not just “string token”?

Owner
msporny commented May 25, 2012

From Pat Hayes:

Sorry if this is rather late, but I have only now read through these documents (not being a JSON maven, I had thought to leave this task to others). Unfortunately there are some serious issues with the use of standard terminology, which I would urge you to consider carefully before publication, as they could have a seriously deleterious effect on the understanding of some readers and will likely generate a great deal of needless confusion and misunderstanding.

The most serious is a repeated confusion of the syntactic distinctions such as subject, object and property with semantic notions such as resource. For example, in [API] 3.2, we read

"blank node
a blank node is a resource which is neither an IRI nor a literal. Blank nodes may be named or unnamed and often take on the role of a variable that may represent either an IRI or a literal."

This is all quite wrong. A blank node is not itself a resource. It is a NODE which is neither IRI nor literal, and it definitely is NOT itself named. It refers to a resource which may be named or unnamed, but that is quite different from saying it IS one. Again, it may refer to the same thing as an IRI or literal refers to, but it itself does not represent an IRI or a literal.

Again, in 3.13.1:
" ... a key concept is that of a resource. Resources may be of three basic types: IRI, representing IRIs for describing externally named entities, BlankNode, resources for which an external name does not exist, or is not known, and Literal, which describe terminal entities such as strings, dates and other representations having a lexical representation possibly including an explicit language or datatype.

Data described with JSON-LD may be considered to be the representation of a graph made up of subject and object resources related via a property resource."

This is all muddled. Graphs are not made up from resources: they are made up from nodes and arcs. Resources do not come in three types: you are here talking about the nodes and arcs of the RDF graph, not the resources that these various graph components refer to. Indeed, it is quite possible for an IRI, a blank node and a literal to all three refer to the same resource.
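
The node/resource distinction can be sketched in code. Below is a toy TypeScript model using hypothetical names (`GraphNode`, `Resource`, `denotation`, `denotesSame`) and a made-up example IRI: nodes are syntactic items in the graph, resources are what they refer to, and three different kinds of node can refer to one and the same resource. It simplifies RDF Semantics considerably (blank nodes are really treated as existentially quantified rather than given a fixed referent), so read it as an illustration of the point, not as a formal model.

```typescript
// Toy model: graph nodes are syntax; resources are what the nodes refer to.
type GraphNode =
  | { kind: "iri"; iri: string }
  | { kind: "blank"; label: string }
  | { kind: "literal"; lexicalForm: string; datatype: string };

// A resource is anything in the domain of discourse; here just a described object.
interface Resource {
  description: string;
}

const theNumberFive: Resource = { description: "the integer 5" };

// Three syntactically different nodes (the example IRI is made up).
const iriNode: GraphNode = { kind: "iri", iri: "http://example.org/five" };
const blank: GraphNode = { kind: "blank", label: "b0" };
const lit: GraphNode = {
  kind: "literal",
  lexicalForm: "5",
  datatype: "http://www.w3.org/2001/XMLSchema#integer",
};

// One possible assignment of referents: all three nodes refer to the same resource.
const denotation: Array<[GraphNode, Resource]> = [
  [iriNode, theNumberFive],
  [blank, theNumberFive],
  [lit, theNumberFive],
];

// Look up what a node refers to (by node identity), or undefined if unassigned.
function referentOf(node: GraphNode): Resource | undefined {
  for (const [candidate, resource] of denotation) {
    if (candidate === node) return resource;
  }
  return undefined;
}

// True when both nodes have a referent and it is the very same resource.
function denotesSame(a: GraphNode, b: GraphNode): boolean {
  const referent = referentOf(a);
  return referent !== undefined && referent === referentOf(b);
}
```

Here `denotesSame(iriNode, lit)` holds even though the two nodes are of different kinds, which is why "resources come in three types" confuses the graph's syntax with what it talks about.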


[LD] does not have such egregious misuses of terminology, but it does have some confusing remarks. The Introduction says:

"A thing in this data network is typically identified using an IRI (Internationalized Resource Identifier), which is typically dereference-able, and thus may be used to find more information about the thing. The IRI allows a software program to start at one thing and follow links to other things in order to learn more about all of the things described on the Web."

Does "thing" here refer to the items which the data is about, or to the data items themselves? Neither interpretation makes sense. Consider for example http://dbpedia.org/resource/Honda_Civic, which refers to ("identifies") a car model, but dereferences to a Web page about that car model. If "thing" refers to the Web page, then it's not what the page is about (at least, that is what the page itself says), but if it refers to the car model, then you can't get at that thing by dereferencing a web page.

Section 3.1, line 11:
"A value is an object with a label that is not an IRI"

This is not technically wrong, but I would urge you in the strongest possible terms to reconsider using the term "value" for a graph node of any kind. This is horribly misleading, and will likely breed massive confusion. I think (your editors seem to be going to extraordinary lengths to pretend that you are not using RDF) that you mean here to refer to literal nodes, perhaps also allowing other possibilities. If so, then the use of "literal" (or a similar word) rather than "value" would be far clearer. Or at least call it something like "value node". (The problem here is that any node may be said to have a "value", which is what the node is being interpreted to mean or refer to. So calling one type of node a "value" makes this close-to-universal language usage completely meaningless or extremely confusing.)

Finally, a pet peeve of mine regarding blank nodes.

[LD] 3.1 says

"Unlabeled nodes are not considered Linked Data."

Says who? I didn't know there was a Ministry (Church?) of Linked Data.

But 3.1.2 says:
"The example above does not use the @id keyword to set the subject of the node being described above. This type of node is called an unlabeled node and is considered to be a weaker form of Linked Data."

So, which is right? Is LD with blank nodes just weakly linked, or is it totally persona non grata, and excluded from consideration by definitional fiat?

As you can probably tell, I find this aversion to the (useful and harmless) notion of blank nodes rather silly, as well as simply false: a great deal of actual linked data does have blank nodes in it, and this is likely to continue and even increase as time goes on. It is telling that even you, in this short document, weren't able to conveniently avoid them.

But whatever your beliefs on this topic, my point is that the document ought to be consistent about it.

gkellogg added a commit that referenced this issue May 31, 2012
Term updates to correspond with RDF Concepts, and re-wording of the RDF conversion algorithms to not use IDL definitions. This relates to issue #131.
a28f820
Owner
msporny commented Jun 11, 2012
The following terms in the WebIDL definitions in the JSON-LD API don't match the normative terminology in RDF Concepts: Quad — there's no such term in RDF Concepts

Yes, but there is such a concept in the JSON-LD API. There was discussion on calling this a "triple" and the extra piece of information being "provenance" information. However, calling it a triple is a bit misleading since there aren't just three major pieces of information there, but four. Please suggest an alternate name; until then, we don't have anything better to use and 'Quad' will remain.

Quad.property — should be Quad.predicate

While 'predicate' is more accurate, it draws confused looks from Web developers. "property" is easier for them to grasp, and since they will have to type it out in their code, it will also be easier for non-RDF-ers to understand. We should keep property until there is a more compelling reason to change it... one could argue that RDF Concepts is the document that should change if it wants to speak more to non-experts.

Quad.name — should be Quad.graphName (terminology might still change), also for clarity (it's not a name of the quad)

Agreed, changed in a28f820.

Node.interfaceName — this is a weird term as it talks about the IDL definition itself rather than about the defined domain. termType would be slightly better aligned with RDF Concepts, or perhaps nodeType in analogy to the DOM.

This has been removed temporarily in commit a28f820 and may be re-introduced in the future if developers complain that it is too difficult to determine the type of a particular Node.

Node.nominalValue — there's no “nominal value” in RDF Concepts. Nothing in RDF Concepts quite matches. Why “nominal” and not just “value”?

Agreed, changed in commit a28f820.

Literal.datatype — in RDF 1.1, all literals have a datatype. Also (not an RDF Concepts problem but usability): is it really wise to make this of type IRI? Isn't this better just a string?

The change has been made in commit a28f820, but it remains an IRI as that is more correct. We may change this in the future if implementers complain, but for now there isn't consensus on whether it should be an IRI or just a plain string that is interpreted as an IRI. I tend to lean toward it being a plain string that is interpreted as an IRI. @gkellogg, is there a strong argument for keeping it an IRI (other than design purity)?

I'm also wondering whether the subtypes of Node should have an additional property that reflects the value of nominalValue but using the proper RDF Concepts term: BlankNode.identifier and Literal.lexicalForm. (An IRI is defined as just a string that has a particular syntax, so IRI.value seems already sufficiently appropriate.)

BlankNode.identifier has been added. Literal.lexicalForm has been kept as Literal.value. IRI.value remains as well. These changes were made in commit a28f820.
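
As a rough picture of where these decisions leave the API, the following TypeScript sketch collects the names as resolved in this comment: `property` is kept, `name` becomes `graphName`, `nominalValue` is replaced (presumably by `value`), `BlankNode.identifier` is added, and `Literal.datatype` stays an IRI. The exact shape is an assumption pieced together from this thread, not copied from the spec; `RdfNode`, `kindOf`, and the `null` default graph are hypothetical. With `interfaceName` removed, one way a developer could still tell node types apart is by checking which properties are present.

```typescript
// Hypothetical reconstruction from this thread; NOT the spec's actual IDL.
// Called RdfNode (assumption) to avoid clashing with the DOM's Node type.
interface RdfNode {
  value: string; // assumed replacement for nominalValue
}

interface IRI extends RdfNode {}

interface BlankNode extends RdfNode {
  identifier: string; // added per this comment
}

interface Literal extends RdfNode {
  datatype: IRI;     // kept as an IRI rather than a plain string
  language?: string; // language tag, if any
}

interface Quad {
  subject: IRI | BlankNode;
  property: IRI;          // kept instead of "predicate"
  object: IRI | BlankNode | Literal;
  graphName: IRI | null;  // renamed from "name"; null = default graph (assumption)
}

type NodeKind = "IRI" | "BlankNode" | "Literal";

// Without interfaceName, infer the kind of node from the properties it carries.
function kindOf(node: IRI | BlankNode | Literal): NodeKind {
  if ("datatype" in node) return "Literal";
  if ("identifier" in node) return "BlankNode";
  return "IRI";
}

const xsdInteger: IRI = { value: "http://www.w3.org/2001/XMLSchema#integer" };
const five: Literal = { value: "5", datatype: xsdInteger };
const anonymous: BlankNode = { value: "b0", identifier: "b0" };
```

This kind of structural check only works if every literal carries a datatype, which is the RDF 1.1 position taken above; an explicit type discriminator would be more robust if the interfaces change.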

I note that Graph and Dataset interfaces are absent.

Yes, they are not absolutely required to implement the interface, so they were left out. We tried to pull in as little as possible from the RDF API in order to make the JSON-LD API work as pulling in more would make the spec (needlessly) more complex.

The IRI interface has some poor phrasing: “A node identified by an IRI”; “The IRI identifier of the node”. IRIs don't identify nodes; they identify resources. These should perhaps be “A node representing an IRI” and “The represented IRI as a string”.

Fixed by using your wording in commit a28f820.

The text about blank nodes has some issues but I'm not sure what it should say before studying the rest of the spec a bit closer.

An attempt was made to make this description a little better in commit 470a0ef.

Literals don't have “lexical representations” but “lexical forms”.

Fixed in commit af2d447.

The language thingy in a literal is called a “language tag”, not a “language string”. The API also refers to it as a “text string token”; this sounds weird, why not just “string token”?

Fixed in commit a28f820.

Owner
msporny commented Jun 11, 2012
Sorry if this is rather late, but I have only now read through these documents (not being a JSON maven, I had thought to leave this task to others). Unfortunately there are some serious issues with the use of standard terminology, which I would urge you to consider carefully before publication, as they could have a seriously deleterious effect on the understanding of some readers and will likely generate a great deal of needless confusion and misunderstanding.

We have tried our best to align the document with the current RDF Concepts document, where it made sense to do so. You can see a number of these changes in the previous comment addressing Richard's concerns.

The most serious is a repeated confusion of the syntactic distinctions such as subject, object and property with semantic notions such as resource. For example, in [API] 3.2, we read

"blank node
a blank node is a resource which is neither an IRI nor a literal. Blank nodes may be named or unnamed and often take on the role of a variable that may represent either an IRI or a literal."

This is all quite wrong. A blank node is not itself a resource. It is a NODE which is neither IRI nor literal, and it definitely is NOT itself named. It refers to a resource which may be named or unnamed, but that is quite different from saying it IS one. Again, it may refer to the same thing as an IRI or literal refers to, but it itself does not represent an IRI or a literal.

I tried to fix this oversight in commit 05dce46.

Again, in 3.13.1: " ... a key concept is that of a resource. Resources may be of three basic types: IRI, representing IRIs for describing externally named entities, BlankNode, resources for which an external name does not exist, or is not known, and Literal, which describe terminal entities such as strings, dates and other representations having a lexical representation possibly including an explicit language or datatype.

Data described with JSON-LD may be considered to be the representation of a graph made up of subject and object resources related via a property resource."

This is all muddled. Graphs are not made up from resources: they are made up from nodes and arcs. Resources do not come in three types: you are here talking about the nodes and arcs of the RDF graph, not the resources that these various graph components refer to. Indeed, it is quite possible for an IRI, a blank node and a literal to all three refer to the same resource.

Agreed. I have tried to make the explanation a bit more clear in commit c0ca12c.

[LD] does not have such egregious misuses of terminology, but it does have some confusing remarks. The Introduction says:

"A thing in this data network is typically identified using an IRI (Internationalized Resource Identifier), which is typically dereference-able, and thus may be used to find more information about the thing. The IRI allows a software program to start at one thing and follow links to other things in order to learn more about all of the things described on the Web."

Does "thing" here refer to the items which the data is about, or to the data items themselves? Neither interpretation makes sense. Consider for example http://dbpedia.org/resource/Honda_Civic, which refers to ("identifies") a car model, but dereferences to a Web page about that car model. If "thing" refers to the Web page, then it's not what the page is about (at least, that is what the page itself says), but if it refers to the car model, then you can't get at that thing by dereferencing a web page.

I have attempted to fix this in commit 2438a05.

Section 3.1, line 11: "A value is an object with a label that is not an IRI"

This is not technically wrong, but I would urge you in the strongest possible terms to reconsider using the term "value" for a graph node of any kind. This is horribly misleading, and will likely breed massive confusion. I think (your editors seem to be going to extraordinary lengths to pretend that you are not using RDF) that you mean here to refer to literal nodes, perhaps also allowing other possibilities. If so, then the use of "literal" (or a similar word) rather than "value" would be far clearer. Or at least call it something like "value node". (The problem here is that any node may be said to have a "value", which is what the node is being interpreted to mean or refer to. So calling one type of node a "value" makes this close-to-universal language usage completely meaningless or extremely confusing.)

I am sympathetic to your concerns - we argued for months over that specific wording and found consensus with that sentence. I have tried to change the sentence in a way that is aligned with consensus, but also in a way that doesn't trigger any of the concerns that you raise above. Take a look at commit e8cf04c and let me know if you agree with the new text.

Finally, a pet peeve of mine regarding blank nodes.

[LD] 3.1 says

"Unlabeled nodes are not considered Linked Data."

Says who? I didn't know there was a Ministry (Church?) of Linked Data.

Why yes - Our Glorious Holy Ministry of Linked Data is quite clear on this... would you like a pamphlet? Would you like to be saved by the Truth that is Linked Data? We draw clear lines between the "saved" and "eternal damnation" - none of this fuzzy logic atrocity that is RDF. :P

On a slightly more serious note (but not entirely serious) - it's a big discussion that Kingsley instigated - it's his fault. :P

We shouldn't be making value judgements like that in the spec... detracts from the message. The offending text was removed in commit 9518748.

But 3.1.2 says: "The example above does not use the @id keyword to set the subject of the node being described above. This type of node is called an unlabeled node and is considered to be a weaker form of Linked Data."

So, which is right? Is LD with blank nodes just weakly linked, or is it totally persona non grata, and excluded from consideration by definitional fiat?

As you can probably tell, I find this aversion to the (useful and harmless) notion of blank nodes rather silly, as well as simply false: a great deal of actual linked data does have blank nodes in it, and this is likely to continue and even increase as time goes on. It is telling that even you, in this short document, weren't able to conveniently avoid them.

But whatever your beliefs on this topic, my point is that the document ought to be consistent about it.

This statement has been removed from the specification in commit 9518748.

For the record, all of the editors and authors of the specification fought to make sure that blank nodes would be supported by JSON-LD and find blank nodes a necessity when working with various forms of Linked Data. The strongest opposition to blank nodes came from the community, from people who, I believe, have not really participated since their original pushes to not acknowledge blank nodes as "blessed Linked Data".

That said, blank nodes tend to confuse Web Developers, so we tried to push them to the background as much as possible. Most Web developers use blank nodes in JSON without ever knowing that they're using them and seem to have no problem with it. We hope to ensure that ignorance continues to be bliss for most of these developers.

Owner
msporny commented Jun 11, 2012

@cygri I believe that we have addressed all of your issues, please confirm that we have so that we may close this bug.

Member

I am closing this issue now. There has been no response from the original commenters for two months, and all concerns have been discussed and addressed. Feel free to reopen.

lanthaler closed this Aug 20, 2012