JSON-LD data model clarifications #174

Closed
cygri opened this Issue Oct 23, 2012 · 15 comments

4 participants

cygri commented Oct 23, 2012

Here is a list of questions that need to be answered in order to write an accurate mapping to RDF graphs/datasets. I would have expected to see all of them answered in Section 3.1, or in the definitions of terms linked from 3.1, because otherwise one can build quite broken JSON-LD implementations that conform to the spec:

  • What exactly is a value?
  • Are values datatyped? What exactly is a datatype? What datatypes are supported/available?
  • Are language tags supported? If so, are they case-normalized?
  • Can there be multiple edges with the same property between two nodes? If not, then when exactly are two values the same? Are "1.5" and 1.5 the same? Are "1"^^xsd:integer and "+1"^^xsd:integer the same? Are "1"^^xsd:integer and "1"^^xsd:decimal the same?
  • What kinds of things are allowed as labels? IRIs are mentioned, but what else? Arbitrary strings? Numbers? Other values? Blank nodes? Can edge labels have language tags? Can an edge be unlabelled?
  • Are named graphs not part of the JSON-LD data model? If not, then what does the @graph keyword do? Are named graphs supposed to map to the “named graphs” we have in RDF 1.1 Concepts and SPARQL? Can nodes be shared between multiple graphs in a document, or does that make them different nodes?
  • Are the edges of a node ordered?
  • What exactly does the term “dereferenceable” mean? There's no link or reference. Is it the same definition as in AWWW?
  • What exactly do the terms “resource” and “denoted” mean? Same as in RDF 1.1 Concepts? Note, neither is normatively defined anywhere in RDF, and basing a normative data model definition on them seems questionable.
  • Can there be free-floating nodes (IRIs, blank nodes, values) that have no edges?
  • Must IRIs in the data model be absolute or may they be relative?
  • Can there be multiple blank nodes with the same blank node identifier in a graph? What if we have multiple graphs in a document?

Not strictly relevant for the mapping to RDF:

  • Am I violating a SHOULD-level conformance statement if I use the URI of an HTML page?
  • Shouldn't there be a statement that every JSON-LD document serializes a JSON-LD data model instance? (It wouldn't hurt to normatively define the term “JSON-LD document”.)
  • I find the use of “property” weird. I'd expect to be able to say, “the value of the foo property of node bar is X”. According to the definition, the accurate way of saying that is: “the value of the edge with property foo of node bar is X”. According to the definition, nodes don't have properties, and edges may or may not have one property.
Owner

gkellogg commented Oct 23, 2012

My 2 cents:

What exactly is a value?

Well, from a JSON perspective, a value is the right side of a name/value pair in a node definition. With regard to JSON-LD, I believe it's essentially the same as RDF. A value is either a Literal or an Object. Syntactically, a Value may be represented using any legitimate JSON construct:

A value which is an array specifies multiple un-ordered values, which are associated with the node identifier and property (basically, the object in a subject-predicate-object triple). The exception for this is when the property has a container model of @list, in which case the value is an ordered list of values. JSON-LD does not allow multiple lists per property, or lists containing lists.

For more primitive values, a value may be a node definition or node reference, which represents a reference to another node within the graph (IRI or Blank Node from RDF Concepts). Strings, native values, and value objects (JSON objects having a @value property) represent literal values, either typed or with language (untyped strings are effectively xsd:string, as in RDF Concepts).
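As an illustration of the syntactic forms described above, here is a minimal Python sketch that labels a value by its JSON shape. The function name `classify_value` and the label strings are made up for this example, and the sketch deliberately ignores type coercion through the context (for instance, a string coerced to `@id` is really a node reference).

```python
def classify_value(value, in_list_container=False):
    """Return a rough label for the syntactic form of a JSON-LD value.

    Hypothetical helper for illustration only; it does not consult a context.
    """
    if value is None:
        # null is not a value in the data model.
        return "ignored"
    if isinstance(value, list):
        # Arrays are unordered unless the property has an @list container.
        return "ordered list" if in_list_container else "unordered set"
    if isinstance(value, dict):
        if "@value" in value:
            return "literal (value object)"
        if "@list" in value:
            return "ordered list"
        # Any other object is a node definition or a node reference.
        return "node"
    # Strings, numbers, and booleans are literals in native JSON form.
    return "literal (native)"
```

For example, `classify_value({"@id": "http://example.org/bob"})` yields `"node"`, while the same array is labelled differently depending on whether the property has an `@list` container.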

Are values datatyped? What exactly is a datatype? What datatypes are supported/available?

With JSON-LD, it's useful to discuss literal values after performing expansion. In this case, values may have either an @type or @language and MUST have @value. Values without either are strings, which are effectively the same as the xsd:string type (just as Turtle's representation of strings relates to typed literals).

JSON-LD places no restrictions on datatypes used. When expanded, @type MUST expand to an absolute IRI. Anything else is discarded, and treated as if there were no type specified (just like properties in node definitions). I think a recent resolution will resolve @type against @vocab, if available, and against the document base otherwise; just like Turtle. JSON-LD places no restrictions on what IRI may be used, and attempts to avoid tightly binding to any semantics for treatment within JSON-LD. There may be some exceptions for xsd:boolean, xsd:integer and xsd:double; I'm not certain right now.
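The shape of an expanded literal described above can be sketched as a small check. This is an illustration under assumptions, not spec text: it reads "either an @type or @language" as mutually exclusive, allows only the three keys mentioned in this comment, and the function name is hypothetical.

```python
ALLOWED_KEYS = {"@value", "@type", "@language"}


def is_well_formed_value_object(obj):
    """Check the expanded-literal shape: @value required, @type XOR @language optional."""
    if not isinstance(obj, dict) or "@value" not in obj:
        return False
    # Assumption: a literal is either typed or language-tagged, never both.
    if "@type" in obj and "@language" in obj:
        return False
    # Assumption: no keys other than the three named above.
    return set(obj) <= ALLOWED_KEYS
```

A plain `{"@value": "hello"}` passes and is treated as an untyped string; no restriction is placed on which IRI appears in `@type`.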

Are language tags supported? If so, are they case-normalized?

Language tags MUST correspond to BCP47. No normalization is performed. If we were to normalize, this could either be done in expansion, or the toRDF algorithm.
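To make the consequence of "no normalization" concrete, here is a small Python sketch. The `normalize` flag is hypothetical and shows what a lowercasing step in expansion or toRDF might look like; lowercasing is only one possible normalization.

```python
def same_language_tag(a, b, normalize=False):
    """Compare two language tags, optionally lowercasing both first."""
    if normalize:
        # Hypothetical normalization step; not performed by JSON-LD as described above.
        a, b = a.lower(), b.lower()
    return a == b
```

Without normalization, `"en-US"` and `"en-us"` are different tags and would produce different literals.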

Can there be multiple edges with the same property between two nodes?

Two node definitions may reference the same node using a node reference. I would say that two non-object values (literals) are equivalent if they have the same lexical expression. I think the language in Value Compaction is vague. Step 3) says if the coercion target of the key matches the expression of the value. If the coercion is for "xsd:integer", does {"@value": 1} match this?

If not, then when exactly are two values the same?

I believe that expanded values are the same when they have exactly the same lexical representation of @value, @type and @language values (modulo the issue above).

Are "1.5" and 1.5 the same?

No. Are 1.5 and {"@value": "1.5", "@type": "xsd:double"} the same? I'm not exactly sure, but they result in the same RDF being generated, so in that sense, they are.

Are "1"^^xsd:integer and "+1"^^xsd:integer the same?

Not within JSON-LD. Isn't this D-Entailment?

Are "1"^^xsd:integer and "1"^^xsd:decimal the same?

No.
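The equality rule stated in these answers (same lexical representation of @value, @type and @language, with no datatype-aware comparison) can be sketched as follows. This is an illustrative reading of the comment, not an algorithm from the spec; `same_value` is a hypothetical name, and the sketch compares the JSON type as well so that the string "1.5" and the number 1.5 stay distinct.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def _lexical_key(value_object):
    """Build a comparison key that keeps the JSON type alongside each component."""
    return tuple(
        (type(value_object.get(k)).__name__, value_object.get(k))
        for k in ("@value", "@type", "@language")
    )


def same_value(a, b):
    """Two expanded values are the same only if all three components match exactly."""
    return _lexical_key(a) == _lexical_key(b)
```

Under this sketch, `{"@value": "1", "@type": XSD + "integer"}` and `{"@value": "+1", "@type": XSD + "integer"}` are different values, matching the "Not within JSON-LD" answer above.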

What kinds of things are allowed as labels? IRIs are mentioned, but what else? Arbitrary strings? Numbers? Other values? Blank nodes? Can edge labels have language tags? Can an edge be unlabelled?

JSON allows any string to be used as a name (label/property). Upon expansion, it MUST result in an absolute IRI (currently also BNode, but that should go away). Numbers are prohibited by JSON, as are boolean and null. Blank nodes are legitimate lexically; IMO, they should not survive expansion.

Lexically, JSON may use any string. JSON-LD does not use an "@" form for languages, so no, labels may not have language tags.

An edge cannot be unlabelled.

Are named graphs not part of the JSON-LD data model? If not, then what does the @graph keyword do? Are named graphs supposed to map to the “named graphs” we have in RDF 1.1 Concepts and SPARQL? Can nodes be shared between multiple graphs in a document, or does that make them different nodes?

They are part of the JSON-LD data model. The @graph keyword effectively allows node definitions to either be defined to be in the default graph, or a named graph, depending on if the node definition containing the @graph keyword has an @id key. They are intended to be equivalent.

The Flattening API method most accurately describes how these are treated. Definitions can be merged, but otherwise are restricted to the graph in which they are defined.

From a different perspective, a node definition describes a node in a graph. Nodes being shared between graphs in JSON-LD is equivalent to asking if triples can be shared between named graphs according to RDF Concepts.
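A minimal Python sketch of the rule above: whether the contents of `@graph` land in the default graph or a named graph depends on whether the containing object has an `@id`. The function name and the example IRIs are made up for illustration.

```python
def graph_name(obj):
    """Return None for the default graph, or the @id that names the graph."""
    if "@graph" not in obj:
        raise ValueError("object has no @graph key")
    return obj.get("@id")


# No @id next to @graph: node definitions go to the default graph.
default_doc = {"@graph": [{"@id": "http://example.org/alice"}]}

# @id next to @graph: node definitions go to the graph named by that @id.
named_doc = {
    "@id": "http://example.org/graphs/1",
    "@graph": [{"@id": "http://example.org/alice"}],
}
```

Here `graph_name(default_doc)` is `None`, and `graph_name(named_doc)` is `"http://example.org/graphs/1"`.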

Are the edges of a node ordered?

No.

What exactly does the term “dereferenceable” mean? There's no link or reference. Is it the same definition as in AWWW?

I believe so.

What exactly do the terms “resource” and “denoted” mean? Same as in RDF 1.1 Concepts? Note, neither is normatively defined anywhere in RDF, and basing a normative data model definition on it seems questionable

Please suggest better wording. In my usage, resource is intended to represent an abstract concept, which I believe comes from RDF Concepts. "Denote" is a word that scares me :P, as it seems to be used in heated debates within the RDF WG. However, I understand it to mean that IRIs denote resources. The construct {"@value": 1} denotes the number one.

Can there be free-floating nodes (IRIs, blank nodes, values) that have no edges?

Well, it's lexically possible to have a document consisting of just a node reference. This does not result in any triple when RDF is generated, but I believe it survives expansion and compaction.

Must IRIs in the data model be absolute or may they be relative?

They may be relative, just as they may be relative in Turtle or RDFa. On expansion, relative IRIs are resolved to the document location.
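Resolution against the document location can be illustrated with Python's standard `urllib.parse.urljoin`. The wrapper name `resolve_iri` and the example URLs are assumptions made for this sketch; a real processor would follow the IRI resolution rules it references rather than this helper.

```python
from urllib.parse import urljoin


def resolve_iri(reference, document_location):
    """Resolve a possibly relative IRI reference against the document location."""
    return urljoin(document_location, reference)


base = "http://example.org/docs/data.jsonld"
print(resolve_iri("people/alice", base))  # relative path reference
print(resolve_iri("#me", base))           # fragment-only reference
print(resolve_iri("http://other.example/x", base))  # already absolute, unchanged
```

After this step, every identifier is absolute, which is the form the data model works with.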

Can there be multiple blank nodes with the same blank node identifier in a graph?

Yes.

What if we have multiple graphs in a document?

Node identifiers are equivalent within a document. For example, when flattening, blank nodes are renamed consistently, so if two graphs use "_:a", they may both be renamed to "_:t0". When merging graphs, they would be values in the same node definition.

If necessary, I would support updating this algorithm to use different node identifiers in different graphs. But, IMO, a node identifier has the lexical scope of the document, not the graph. In any case, we should match TriG's semantics.
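Document-scoped renaming can be sketched with a single mapping shared across all graphs in a document. The class name, the `_:t` prefix default, and the counter scheme are illustrative assumptions; this is not the flattening algorithm from the API document.

```python
import itertools


class BlankNodeRelabeler:
    """Assign fresh labels consistently; one instance covers one whole document."""

    def __init__(self, prefix="_:t"):
        self._prefix = prefix
        self._counter = itertools.count()
        self._mapping = {}

    def relabel(self, identifier):
        # The same input label always maps to the same output label,
        # regardless of which graph it was seen in.
        if identifier not in self._mapping:
            self._mapping[identifier] = f"{self._prefix}{next(self._counter)}"
        return self._mapping[identifier]


relabeler = BlankNodeRelabeler()
in_graph_1 = relabeler.relabel("_:a")
in_graph_2 = relabeler.relabel("_:a")
other = relabeler.relabel("_:b")
```

Giving each graph its own `BlankNodeRelabeler` instance instead would model the alternative mentioned above, where identifiers are scoped to the graph rather than the document.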

Am I violating a SHOULD-level conformance statement if I use the URI of an HTML page?

Certainly not. To say that an IRI should resolve to a description of its referent (I think this is Kingsley's definition) does not require that the description be in JSON-LD; it could be in an alternative RDF format.

Shouldn't there be a statement that every JSON-LD document serializes a JSON-LD data model instance? (It wouldn't hurt to normatively define the term “JSON-LD document”.)

+1

I find the use of “property” weird. I'd expect to be able to say, “the value of the foo property of node bar is X”. According to the definition, the accurate way of saying that is: “the value of the edge with property foo of node bar is X”. According to the definition, nodes don't have properties, and edges may or may not have one property.

Well, this would be confusing. I would say that a "property" is how we refer to "names" in JSON, in the sense of a name/value pair. The property of a node describes a predicate relationship between the node's subject and related values. In RDF Concepts you say the following:

The predicate itself is an IRI and denotes a binary relation, also known as a property.

I think these are consistent statements.

Owner

msporny commented Oct 24, 2012

Relieved that @gkellogg's views on all of these questions align with mine without us having to coordinate on the responses. I think that's a good sign. There are bits of the above that I could nitpick, but I'm happy to say that nothing @gkellogg said above would receive a -1 from me. Some may be a +0 or +0.5, but I think most everything above is consistent with what the JSON-LD CG has been discussing over the past two years.

Additionally, we should clarify all of these points in the spec. We should not do all of it in the introductory data model section. We should leave some of the more esoteric stuff for an appendix.

Member

lanthaler commented Oct 24, 2012

Same here. I agree with almost everything that Gregg said. Just a few clarifications:

What exactly is a value?

JSON-LD does not allow multiple lists per property, or lists containing lists

It does allow multiple lists per property, but not lists of lists. So a property which has a "@container": "@list" cannot contain multiple lists but just one, as otherwise it would be a list of lists.
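The distinction drawn here can be sketched as a check over the expanded values of one property: several sibling `@list` objects are fine, a `@list` directly inside another `@list` is not. The function name is hypothetical and the check only looks one level down, which is enough to detect direct nesting.

```python
def has_nested_list(values):
    """Return True if any @list in the property's values directly contains another @list."""
    for value in values:
        if isinstance(value, dict) and "@list" in value:
            for item in value["@list"]:
                if isinstance(item, dict) and "@list" in item:
                    return True
    return False


# Two separate lists on the same property: allowed.
two_lists = [
    {"@list": [{"@value": 1}, {"@value": 2}]},
    {"@list": [{"@value": 3}]},
]

# A list whose item is itself a list: not allowed.
list_of_lists = [{"@list": [{"@list": [{"@value": 1}]}]}]
```

`has_nested_list(two_lists)` is `False` and `has_nested_list(list_of_lists)` is `True`.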

What exactly does the term “dereferenceable” mean? There's no link or reference. Is it the same definition as in AWWW?

Yes.

Agents may use a URI to access the referenced resource; this is called dereferencing the URI
http://www.w3.org/TR/webarch/#dereference-uri

I (we?) assumed that's such a fundamental concept on the Web that it doesn't need a reference. We should probably change that.

What exactly do the terms “resource” and “denoted” mean? Same as in RDF 1.1 Concepts? Note, neither is normatively defined anywhere in RDF, and basing a normative data model definition on it seems questionable

Same for resource. It's a fundamental concept of the Web's architecture:

By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI
http://www.w3.org/TR/webarch/#id-resources

As far as I understand it, the terms "resource" and "denote" are also normatively defined in RDF-Concepts: http://www.w3.org/TR/rdf11-concepts/#resources-and-statements

Can there be free-floating nodes (IRIs, blank nodes, values) that have no edges?

Yes, they survive expansion but not conversion to RDF.
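Why a free-floating node disappears on conversion to RDF can be seen in a simplified triple generator: triples come only from properties and types, so a node with nothing but an `@id` contributes none. This is a sketch over an assumed expanded form (property values are arrays of objects), not the toRDF algorithm, and `node_to_triples` is a hypothetical name.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def node_to_triples(node):
    """Emit (subject, predicate, object) tuples for one expanded node definition."""
    subject = node["@id"]
    triples = [(subject, RDF_TYPE, t) for t in node.get("@type", [])]
    for prop, values in node.items():
        if prop.startswith("@"):
            continue  # keywords are not properties
        for value in values:
            # Simplification: the object is the referenced @id or the raw @value.
            triples.append((subject, prop, value.get("@id", value.get("@value"))))
    return triples


floating = {"@id": "http://example.org/alice"}
described = {
    "@id": "http://example.org/alice",
    "http://xmlns.com/foaf/0.1/knows": [{"@id": "http://example.org/bob"}],
}
```

`node_to_triples(floating)` is an empty list, while `described` yields one triple.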

Must IRIs in the data model be absolute or may they be relative?

In the data model they must be absolute, in the data format they can be relative. If I'm not wrong, valid IRIs are always absolute, otherwise they are IRI references/relative IRIs.

Am I violating a SHOULD-level conformance statement if I use the URI of an HTML page?

That boils down to whether HTML is a Linked Data document or not. There has been enough discussion on the RDF WG mailing list in the last few days that I think we should change that statement from a SHOULD to a RECOMMEND.

cygri commented Oct 24, 2012

@gkellogg, @lanthaler, thanks, this gives me something to work with.

Owner

gkellogg commented Oct 24, 2012

One more thing: the JSON-LD data model described in 3.1 is graph-centric, but JSON-LD includes named graphs. We should probably have a bit that talks about the model for named graphs too; otherwise, you need to interpret the flattening algorithm in the API document to figure it out.

cygri commented Oct 24, 2012

@gkellogg, what is the model for named graphs then?

It is a default JSON-LD graph plus zero or more named JSON-LD graphs, where the names can be IRIs or Blank Nodes?

What if multiple named graphs share the same name? Does everything end up in a single named graph, or does this yield multiple named graphs with the same name, or is it forbidden?

Member

lanthaler commented Oct 24, 2012

Exactly, multiple named graphs with the same name end up in a single named graph.

Owner

gkellogg commented Oct 24, 2012

Right. IRIs or BNodes serve as labels of the graphs. Node definitions in graphs with the same label are merged. There should be no surprises WRT TriG.

We need a short description to add to 3.1 that states this, so you have a handle to describe the relationship with RDF datasets. As I mentioned elsewhere, the notion of BNode labels within graphs is a serialization issue. I think we do this the same as TriG, but I can't find a specific resolution. In N-Quads, it would be challenging to have the scope be anything other than the document.

As with TriG, and the resolution to RDF ISSUE-28, there is no nesting of graphs semantically. Syntactically, they can be nested, but this is a coding question only; when flattened, they all appear at the same level.
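The merging behaviour described in these replies (graphs with the same label collapse into one, and node definitions with the same identifier are merged) can be sketched as follows. The input shape, a sequence of `(name, nodes)` pairs with `None` standing for the default graph, and the function name are assumptions made for this illustration.

```python
def merge_graphs(graphs):
    """Merge (name, node_definitions) pairs into {name: {node_id: node_definition}}."""
    merged = {}
    for name, nodes in graphs:
        graph = merged.setdefault(name, {})
        for node in nodes:
            target = graph.setdefault(node["@id"], {"@id": node["@id"]})
            for key, values in node.items():
                if key == "@id":
                    continue
                existing = target.setdefault(key, [])
                for value in values:
                    if value not in existing:  # avoid duplicate edges
                        existing.append(value)
    return merged


FOAF = "http://xmlns.com/foaf/0.1/"
graphs = [
    ("http://example.org/g", [{"@id": "http://example.org/alice",
                               FOAF + "name": [{"@value": "Alice"}]}]),
    ("http://example.org/g", [{"@id": "http://example.org/alice",
                               FOAF + "knows": [{"@id": "http://example.org/bob"}]}]),
]
merged = merge_graphs(graphs)
```

The result has a single graph named `http://example.org/g` with one node definition for Alice carrying both properties, with all graphs sitting at the same level.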

cygri commented Oct 25, 2012

Another question: Do null values play any role in the JSON-LD data model (that is, can they end up in values, data types, language tags, property names, graph names, blank node labels, IRIs, etc.)? Or is the data model guaranteed to be null-free?

Member

lanthaler commented Oct 25, 2012

We had long discussions about null values and decided to not support null. That means, whenever a property with a value of null is encountered, it will be ignored. In the context it’s a bit different, there it is used to reset definitions, e.g., to remove a term-IRI mapping or reset the default language. So the data model is guaranteed to be null-free.
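A minimal sketch of the "ignore null" behaviour, assuming plain Python dicts and lists for the parsed JSON. The `@context` is left untouched because null carries meaning there, as noted above. Dropping null array items is an extra assumption of this sketch, and `drop_nulls` is a hypothetical name.

```python
def drop_nulls(data):
    """Remove null-valued properties outside of @context, recursively."""
    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if key == "@context":
                result[key] = value  # null resets definitions here; keep as is
            elif value is not None:
                result[key] = drop_nulls(value)
        return result
    if isinstance(data, list):
        # Assumption: null items inside arrays are dropped as well.
        return [drop_nulls(item) for item in data if item is not None]
    return data


doc = {
    "@context": {"@language": None},
    "name": "Alice",
    "nick": None,
    "knows": [{"@id": "http://example.org/bob", "name": None}],
}
cleaned = drop_nulls(doc)
```

After this step, `cleaned` keeps the context as written but no longer contains `nick` or Bob's null `name`.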

cygri commented Oct 25, 2012

Thanks Markus! A good decision IMO.

Owner

msporny commented Oct 29, 2012

Differences between the JSON-LD data model and the RDF data model:

  • Graph names can be bnodes
  • Property names can be bnodes
  • Language containers

cygri commented Oct 30, 2012

Work in progress on clarifying the data model and working out the differences: http://www.w3.org/2011/rdf-wg/wiki/JSON-LD_Data_Model

Member

lanthaler commented Nov 7, 2012

@cygri, does this answer your questions? Can we close this issue?

cygri commented Nov 7, 2012

Yes, I'm closing it. Thanks again all.

@cygri cygri closed this Nov 7, 2012
