JSON-LD data model clarifications #174

Closed
cygri opened this Issue · 15 comments

4 participants

@cygri
Collaborator

Here is a list of questions that need to be answered in order to write an accurate mapping to RDF graphs/datasets. I would have expected to see all of them answered in Section 3.1, or in the definitions of terms linked from 3.1, because otherwise one can build quite broken JSON-LD implementations that conform to the spec:

  • What exactly is a value?
  • Are values datatyped? What exactly is a datatype? What datatypes are supported/available?
  • Are language tags supported? If so, are they case-normalized?
  • Can there be multiple edges with the same property between two nodes? If not, then when exactly are two values the same? Are "1.5" and 1.5 the same? Are "1"^^xsd:integer and "+1"^^xsd:integer the same? Are "1"^^xsd:integer and "1"^^xsd:decimal the same?
  • What kinds of things are allowed as labels? IRIs are mentioned, but what else? Arbitrary strings? Numbers? Other values? Blank nodes? Can edge labels have language tags? Can an edge be unlabelled?
  • Are named graphs not part of the JSON-LD data model? If not, then what does the @graph keyword do? Are named graphs supposed to map to the “named graphs” we have in RDF 1.1 Concepts and SPARQL? Can nodes be shared between multiple graphs in a document, or does that make them different nodes?
  • Are the edges of a node ordered?
  • What exactly does the term “dereferenceable” mean? There's no link or reference. Is it the same definition as in AWWW?
  • What exactly do the terms “resource” and “denoted” mean? Same as in RDF 1.1 Concepts? Note, neither is normatively defined anywhere in RDF, and basing a normative data model definition on it seems questionable
  • Can there be free-floating nodes (IRIs, blank nodes, values) that have no edges?
  • Must IRIs in the data model be absolute or may they be relative?
  • Can there be multiple blank nodes with the same blank node identifier in a graph? What if we have multiple graphs in a document?

Not strictly relevant for the mapping to RDF:

  • Am I violating a SHOULD-level conformance statement if I use the URI of an HTML page?
  • Shouldn't there be a statement that every JSON-LD document serializes a JSON-LD data model instance? (It wouldn't hurt to normatively define the term “JSON-LD document”.)
  • I find the use of “property” weird. I'd expect to be able to say, “the value of the foo property of node bar is X”. According to the definition, the accurate way of saying that is: “the value of the edge with property foo of node bar is X”. According to the definition, nodes don't have properties, and edges may or may not have one property.
@gkellogg
Owner

My 2 cents:

What exactly is a value?

Well, from a JSON perspective, a value is the right side of a name/value pair in a node definition. With regards to JSON-LD, I believe it's essentially the same as RDF. A value is either a Literal, or an Object. Syntactically, a Value may be represented using any legitimate JSON construct:

A value which is an array specifies multiple un-ordered values, which are associated with the node identifier and property (basically, the object in a subject-predicate-object triple). The exception for this is when the property has a container model of @list, in which case the value is an ordered list of values. JSON-LD does not allow multiple lists per property, or lists containing lists.

For more primitive values, a value may be a node definition or node reference, which represents a reference to another node within the graph (IRI or Blank Node from RDF Concepts). Strings, native values, and value objects (JSON objects having a @value property) represent literal values, either typed or with language (untyped strings are effectively xsd:string, as in RDF Concepts).
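The syntactic forms described above can be sketched roughly as follows. This is only an illustration, not spec text: the function name `classify_value` and its labels are made up, and it ignores context-based coercion (where a plain string may actually stand for an IRI).

```python
def classify_value(value):
    """Label the syntactic form of a JSON-LD value (hypothetical helper)."""
    if isinstance(value, list):
        return "array"              # multiple values for one property
    if isinstance(value, dict):
        if "@value" in value:
            return "value object"   # literal, optionally typed or language-tagged
        if set(value) == {"@id"}:
            return "node reference"
        return "node definition"
    if isinstance(value, str):
        return "string"             # effectively an xsd:string literal
    if isinstance(value, (bool, int, float)):
        return "native value"
    return "null"


print(classify_value({"@id": "http://example.org/a"}))
print(classify_value({"@value": "1", "@type": "xsd:integer"}))
```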

Are values datatyped? What exactly is a datatype? What datatypes are supported/available?

With JSON-LD, it's useful to discuss literal values after performing expansion. In this case, values may have either an @type or @language and MUST have @value. Values without either are strings, which are effectively the same as the xsd:string type (just like Turtle's representation of strings relates to typed literals).

JSON-LD places no restrictions on datatypes used. When expanded, @type MUST expand to an absolute IRI. Anything else is discarded, and treated as if there were no type specified (just like properties in node definitions). I think a recent resolution will resolve @type against @vocab, if available, and against the document base otherwise; just like Turtle. JSON-LD places no restrictions on what IRI may be used, and attempts to avoid tightly binding to any semantics for treatment within JSON-LD. There may be some exceptions for xsd:boolean, xsd:integer and xsd:double; I'm not certain right now.

Are language tags supported? If so, are they case-normalized?

Language tags MUST correspond to BCP47. No normalization is performed. If we were to normalize, this could either be done in expansion, or the toRDF algorithm.

Can there be multiple edges with the same property between two nodes?

Two node definitions may reference the same node using a node reference. I would say that two non-object values (literals) are equivalent if they have the same lexical expression. I think the language in Value Compaction is vague. Step 3) says if the coercion target of the key matches the expression of the value. If the coercion is for "xsd:integer", does {"@value": 1} match this?

If not, then when exactly are two values the same?

I believe that expanded values are the same when they have exactly the same lexical representation of @value, @type and @language values (modulo the issue above).

Are "1.5" and 1.5 the same?

No. Are 1.5 and {"@value": "1.5", "@type": "xsd:double"} the same? I'm not exactly sure, but they result in the same RDF being generated, so in that sense, they are.

Are "1"^^xsd:integer and "+1"^^xsd:integer the same?

Not within JSON-LD. Isn't this D-Entailment?

Are "1"^^xsd:integer and "1"^^xsd:decimal the same?

No.
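As a rough sketch of the comparison described in these answers (my reading, not an algorithm from the spec), two expanded values could be compared on the JSON text of their @value, @type and @language entries. The name `same_value` is hypothetical, and using `json.dumps` as the "lexical representation" is an assumption.

```python
import json

KEYS = ("@value", "@type", "@language")


def same_value(a, b):
    """Compare two expanded value objects by the JSON text of each key.

    json.dumps keeps "1.5" (string) and 1.5 (number) distinct, which
    plain == on Python objects would also do, but it additionally keeps
    1 and 1.0 (and 1 and true) apart.
    """
    return all(json.dumps(a.get(k)) == json.dumps(b.get(k)) for k in KEYS)


print(same_value({"@value": "1.5"}, {"@value": 1.5}))
print(same_value({"@value": "1", "@type": "xsd:integer"},
                 {"@value": "+1", "@type": "xsd:integer"}))
```

Under this sketch, all three pairs asked about above compare as different. It does not settle whether 1.5 and {"@value": "1.5", "@type": "xsd:double"} should count as the same.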

What kinds of things are allowed as labels? IRIs are mentioned, but what else? Arbitrary strings? Numbers? Other values? Blank nodes? Can edge labels have language tags? Can an edge be unlabelled?

JSON allows any string to be used as a name (label/property). Upon expansion, it MUST result in an absolute IRI (currently also BNode, but that should go away). Numbers are prohibited by JSON, as are boolean and null. Blank nodes are legitimate lexically; IMO, they should not survive expansion.

Lexically, JSON may use any string. JSON-LD does not use an "@" form for languages, so no, labels may not have language tags.

An edge cannot be unlabelled.

Are named graphs not part of the JSON-LD data model? If not, then what does the @graph keyword do? Are named graphs supposed to map to the “named graphs” we have in RDF 1.1 Concepts and SPARQL? Can nodes be shared between multiple graphs in a document, or does that make them different nodes?

They are part of the JSON-LD data model. The @graph keyword effectively allows node definitions to either be defined to be in the default graph, or a named graph, depending on if the node definition containing the @graph keyword has an @id key. They are intended to be equivalent.
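A minimal sketch of that distinction, assuming the only thing that matters is whether the object carrying @graph also has an @id (the helper name `graph_kind` and the example IRIs are made up):

```python
def graph_kind(obj):
    """Say which graph the contents of an object's @graph end up in."""
    if "@graph" not in obj:
        return "no graph"
    # With @id the object names a graph; without it the contents
    # belong to the default graph.
    return "named graph" if "@id" in obj else "default graph"


default_doc = {"@graph": [{"@id": "http://example.org/a"}]}
named_doc = {"@id": "http://example.org/g",
             "@graph": [{"@id": "http://example.org/a"}]}

print(graph_kind(default_doc))
print(graph_kind(named_doc))
```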

The Flattening API method most accurately describes how these are treated. Definitions can be merged, but otherwise are restricted to the graph in which they are defined.

From a different perspective, a node definition describes a node in a graph. Nodes being shared between graphs in JSON-LD is equivalent to asking if triples can be shared between named graphs according to RDF Concepts.

Are the edges of a node ordered?

No.

What exactly does the term “dereferenceable” mean? There's no link or reference. Is it the same definition as in AWWW?

I believe so.

What exactly do the terms “resource” and “denoted” mean? Same as in RDF 1.1 Concepts? Note, neither is normatively defined anywhere in RDF, and basing a normative data model definition on it seems questionable

Please suggest better wording. In my usage, resource is intended to represent an abstract concept, which I believe comes from RDF Concepts. "Denote" is a word that scares me :P, as it seems to be used in heated debates within the RDF WG. However, I understand it to mean that IRIs denote resources. The construct {"@value": 1} denotes the number one.

Can there be free-floating nodes (IRIs, blank nodes, values) that have no edges?

Well, it's lexically possible to have a document consisting of just a node reference. This does not result in any triple when RDF is generated, but I believe it survives expansion and compaction.

Must IRIs in the data model be absolute or may they be relative?

They may be relative, just as they may be relative in Turtle or RDFa. On expansion, relative IRIs are resolved to the document location.
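To illustrate the resolution step, here is a small sketch using Python's standard `urllib.parse.urljoin`. The document location and the relative references are invented for the example, and a real processor may resolve against a different base in some cases.

```python
from urllib.parse import urljoin

# Hypothetical location the JSON-LD document was retrieved from.
DOCUMENT_LOCATION = "http://example.org/docs/data.jsonld"


def resolve(iri_reference, base=DOCUMENT_LOCATION):
    """Resolve a possibly relative IRI reference against the base."""
    return urljoin(base, iri_reference)


print(resolve("people/alice"))  # http://example.org/docs/people/alice
print(resolve("#me"))           # http://example.org/docs/data.jsonld#me
print(resolve("/top"))          # http://example.org/top
```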

Can there be multiple blank nodes with the same blank node identifier in a graph?

Yes.

What if we have multiple graphs in a document?

Node identifiers are equivalent within a document. For example, when flattening, blank nodes are renamed consistently, so if two graphs use "_:a", they may both be renamed to "_:t0". When merging graphs, they would be values in the same node definition.
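A simplified sketch of document-scoped renaming, assuming blank node identifiers use the usual `_:` prefix. It is not the flattening algorithm from the API document: it only rewrites the top-level @id of each node definition, and the function name and the `_:t` numbering scheme are illustrative.

```python
def relabel_blank_nodes(graphs):
    """Rename blank node identifiers with one mapping for the whole document.

    graphs maps a graph name to a list of node definitions (dicts).
    """
    mapping = {}

    def rename(identifier):
        # The same old label always gets the same new label,
        # no matter which graph it appears in.
        if identifier not in mapping:
            mapping[identifier] = "_:t%d" % len(mapping)
        return mapping[identifier]

    result = {}
    for name, nodes in graphs.items():
        result[name] = [
            {**node, "@id": rename(node["@id"])}
            if node.get("@id", "").startswith("_:") else node
            for node in nodes
        ]
    return result


doc = {
    "@default": [{"@id": "_:a", "name": "x"}],
    "http://example.org/g": [{"@id": "_:a"}, {"@id": "_:b"}],
}
print(relabel_blank_nodes(doc))
```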

If necessary, I would support updating this algorithm to use different node identifiers in different graphs. But, IMO, a node identifier has the lexical scope of the document, not the graph. In any case, we should match TriG's semantics.

Am I violating a SHOULD-level conformance statement if I use the URI of an HTML page?

Certainly not. To say that an IRI should resolve to a description of its referent (I think this is Kingsley's definition) does not require that description to be in JSON-LD; it could be in an alternative RDF format.

Shouldn't there be a statement that every JSON-LD document serializes a JSON-LD data model instance? (It wouldn't hurt to normatively define the term “JSON-LD document”.)

+1

I find the use of “property” weird. I'd expect to be able to say, “the value of the foo property of node bar is X”. According to the definition, the accurate way of saying that is: “the value of the edge with property foo of node bar is X”. According to the definition, nodes don't have properties, and edges may or may not have one property.

Well, this would be confusing. I would say that a "property" is how we refer to "names" in JSON, in the sense of a name/value pair. The property of a node describes a predicate relationship between the node's subject and related values. In RDF Concepts you say the following:

The predicate itself is an IRI and denotes a binary relation, also known as a property.

I think these are consistent statements.

@msporny
Owner

Relieved that @gkellogg's views on all of these questions align with mine without us having to coordinate on the responses. I think that's a good sign. There are bits of the above that I could nitpick, but I'm happy to say that everything @gkellogg said above wouldn't receive a -1 from me. Some may be a +0 or +0.5, but I think most everything above is consistent with what the JSON-LD CG has been discussing over the past two years.

Additionally, we should clarify all of these points in the spec. We should not do all of it in the introductory data model section. We should leave some of the more esoteric stuff for an appendix.

@lanthaler
Owner

Same here. I agree with almost everything that Gregg said. Just a few clarifications:

What exactly is a value?

JSON-LD does not allow multiple lists per property, or lists containing lists

It does allow multiple lists per property, but not lists of lists. So a property which has a "@container": "@list" cannot contain multiple lists but just one, as otherwise it would be a list of lists.

What exactly does the term “dereferenceable” mean? There's no link or reference. Is it the same definition as in AWWW?

Yes.

Agents may use a URI to access the referenced resource; this is called dereferencing the URI
http://www.w3.org/TR/webarch/#dereference-uri

I (we?) assumed that's such a fundamental concept on the Web that it doesn't need a reference. We should probably change that.

What exactly do the terms “resource” and “denoted” mean? Same as in RDF 1.1 Concepts? Note, neither is normatively defined anywhere in RDF, and basing a normative data model definition on it seems questionable

Same for resource. It's a fundamental concept of the Web's architecture:

By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI
http://www.w3.org/TR/webarch/#id-resources

As far as I understand it, the terms "resource" and "denote" are also normatively defined in RDF-Concepts: http://www.w3.org/TR/rdf11-concepts/#resources-and-statements

Can there be free-floating nodes (IRIs, blank nodes, values) that have no edges?

Yes, they survive expansion but not conversion to RDF.
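A toy illustration of why such a node disappears on conversion, assuming a very reduced notion of triple generation: one triple per property value, keywords skipped. This is not the toRDF algorithm (a real conversion would, for instance, turn @type into rdf:type triples), and `node_to_triples` is a made-up name.

```python
def node_to_triples(node):
    """Emit (subject, property, value) tuples for one node definition."""
    subject = node["@id"]
    triples = []
    for prop, values in node.items():
        if prop.startswith("@"):
            continue  # keywords such as @id are not edges in this sketch
        for value in values if isinstance(values, list) else [values]:
            triples.append((subject, prop, value))
    return triples


# A free-floating node: it has an identifier but no edges, so no triples.
print(node_to_triples({"@id": "http://example.org/a"}))
```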

Must IRIs in the data model be absolute or may they be relative?

In the data model they must be absolute, in the data format they can be relative. If I'm not wrong, valid IRIs are always absolute, otherwise they are IRI references/relative IRIs.

Am I violating a SHOULD-level conformance statement if I use the URI of an HTML page?

That boils down to whether HTML is a Linked Data document or not. There have been enough discussions on the RDF WG mailing list in the last few days that I think we should change that statement from a SHOULD to a RECOMMEND.

@cygri
Collaborator

@gkellogg, @lanthaler, thanks, this gives me something to work with.

@gkellogg
Owner

One more thing: the JSON-LD data model described in 3.1 is graph-centric, but JSON-LD includes named graphs. We should probably have a bit that talks about the model for named graphs too; otherwise, you need to interpret the flattening algorithm in the API document to figure it out.

@cygri
Collaborator

@gkellogg, what is the model for named graphs then?

Is it a default JSON-LD graph plus zero or more named JSON-LD graphs, where the names can be IRIs or Blank Nodes?

What if multiple named graphs share the same name? Does everything end up in a single named graph, or does this yield multiple named graphs with the same name, or is it forbidden?

@lanthaler
Owner
@gkellogg
Owner

Right. IRIs or BNodes as labels of the graphs. Node definitions in graphs with the same label are merged. There should be no surprises WRT TriG.

We need a short description to add to 3.1 that states this, so you have a handle to describe the relationship with RDF datasets. As I mentioned elsewhere, the notion of BNode labels within graphs is a serialization issue. I think we do this the same as TriG, but I can't find a specific resolution. In N-Quads, it would be challenging to have the scope be anything other than the document.

As with TriG, and the resolution to RDF ISSUE-28, there is no nesting of graphs semantically. Syntactically, they can be nested, but this is a coding question only, and when flattened, they all appear at the same level.
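The merging of node definitions in graphs with the same label might look roughly like this. It is a sketch under simplifying assumptions: the input is already a flat list of (label, node definitions) pairs, every node definition has an @id, and duplicate values are detected with plain Python equality. The name `merge_graphs` is hypothetical.

```python
def merge_graphs(named_graphs):
    """Merge node definitions from graphs that share the same label.

    named_graphs is a list of (graph label, list of node definitions).
    Returns {label: {node id: merged node definition}}.
    """
    merged = {}
    for label, nodes in named_graphs:
        graph = merged.setdefault(label, {})
        for node in nodes:
            target = graph.setdefault(node["@id"], {"@id": node["@id"]})
            for prop, values in node.items():
                if prop == "@id":
                    continue
                existing = target.setdefault(prop, [])
                for value in values if isinstance(values, list) else [values]:
                    if value not in existing:  # keep each value once
                        existing.append(value)
    return merged


parts = [
    ("http://example.org/g", [{"@id": "_:a", "name": "Alice"}]),
    ("http://example.org/g", [{"@id": "_:a", "name": ["Alice", "Al"], "age": 30}]),
]
print(merge_graphs(parts))
```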

@cygri
Collaborator

Another question: Do null values play any role in the JSON-LD data model (that is, can they end up in values, data types, language tags, property names, graph names, blank node labels, IRIs, etc.)? Or is the data model guaranteed to be null-free?

@lanthaler
Owner
@cygri
Collaborator

Thanks Markus! A good decision IMO.

@msporny
Owner

Differences between the JSON-LD data model and the RDF data model:

  • Graph names can be bnodes
  • Property names can be bnodes
  • Language containers
@cygri
Collaborator

Work in progress on clarifying the data model and working out the differences: http://www.w3.org/2011/rdf-wg/wiki/JSON-LD_Data_Model

@lanthaler
Owner

@cygri, does this answer your questions? Can we close this issue?

@cygri
Collaborator

Yes, I'm closing it. Thanks again all.

@cygri cygri closed this