Add '@graph' container type #195

Open
lanthaler opened this Issue Nov 9, 2012 · 6 comments

3 participants

@lanthaler
JSON-LD Public Repositories member

Sent to the mailing list by @linclark:

Rather than continuing to reiterate the use case we have for language maps (which has been called a bogus use case and an anti-pattern by members of the WG), I thought it could be worth looking at another option.

What Drupal needs isn't really language management. Drupal needs version management, where the versions just happen to be based on language. That's why I originally considered named graphs. The idea of using named graphs for our use case made members of the WG balk and we were encouraged to look to language maps.

However, it seems now that language maps need to round trip to RDF. This means that language maps will force a change in the data model... for example inserting blank nodes in between a subject and its properties. I'm unclear on why it is preferable to create blank nodes in a data model than it is to use a named graph. The named graph at least lets you keep the same base triple structure, and consumers can choose whether or not to pay attention to the 4th element of the quad. As I recall, on a telecon where I brought it up, Gregg said that named graphs shouldn't be used unless you needed to make statements about the graph itself. However, others such as Leigh Dodds have discussed using named graphs for versioning or providing context, and I'm not sure that it's such an unconventional idea.

I would be interested to hear what concerns the WG have with this sort of use of named graphs.

Besides being discouraged from using them by the WG, the other reason I decided against named graphs was because there was no good way to access properties in named graphs. Since JSON-LD's query API is still unspecified, direct access to properties using the tree structure needs to be easy for the end user developer.

Instead of continuing to try to shoehorn language maps into our use case (or vice versa), I'm wondering whether making named graphs easier to traverse would be a better option.

For example, I imagine something like:

{
    "@context": {
        "site": "http://ex.org/",
        "body": {
            "@id": "site:body",
            "@container": "@graph"
        },
        "en": "site:node/1/en",
        "de": "site:node/1/de"
    },
    "@id": "site:node/1",
    "body": {
        "en": [
            "Here is some body text for the article."
        ],
        "de": [
            "Hier sind einige Textkörper für den Artikel."
        ]
    }
}

It would normalize to:

<site:node/1> <site:body> "Here is some body text for the article." <site:node/1/en>
<site:node/1> <site:body> "Hier sind einige Textkörper für den Artikel." <site:node/1/de>

And values could be accessed the same way as was intended with language maps:
obj.body.en[0]
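
Editorial sketch of the proposal above, in Python. It assumes the semantics described in this comment: a term declared with "@container": "@graph" holds a map from graph terms to value lists, and each value becomes a quad in that named graph. The helpers expand_iri and to_quads are hypothetical names, not part of any JSON-LD library, and the sketch handles only flat contexts and plain string values.

```python
# Hypothetical sketch of the proposed "@container": "@graph" behaviour.
# Nothing here is an existing JSON-LD API.

def expand_iri(value, context):
    """Expand a term or 'prefix:suffix' compact IRI using a flat context."""
    mapped = context.get(value)
    if isinstance(mapped, dict):
        mapped = mapped["@id"]
    if mapped is not None:
        return expand_iri(mapped, context)
    prefix, sep, suffix = value.partition(":")
    if sep and isinstance(context.get(prefix), str) and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value  # already absolute, or unknown


def to_quads(doc):
    """Return (subject, predicate, literal, graph) tuples for graph-container terms."""
    context = doc["@context"]
    subject = expand_iri(doc["@id"], context)
    quads = []
    for term, value in doc.items():
        if term.startswith("@"):
            continue
        definition = context.get(term)
        if not (isinstance(definition, dict)
                and definition.get("@container") == "@graph"):
            continue  # the sketch ignores every other kind of term
        predicate = expand_iri(term, context)
        for graph_term, literals in value.items():
            graph = expand_iri(graph_term, context)
            for literal in literals:
                quads.append((subject, predicate, literal, graph))
    return quads


doc = {
    "@context": {
        "site": "http://ex.org/",
        "body": {"@id": "site:body", "@container": "@graph"},
        "en": "site:node/1/en",
        "de": "site:node/1/de",
    },
    "@id": "site:node/1",
    "body": {
        "en": ["Here is some body text for the article."],
        "de": ["Hier sind einige Textkörper für den Artikel."],
    },
}

quads = to_quads(doc)
for s, p, o, g in quads:
    print(f'<{s}> <{p}> "{o}" <{g}>')

# Tree-style access still works on the compact form.
print(doc["body"]["en"][0])
```

The compact document stays directly traversable, while a consumer that cares about graphs can read the fourth element of each tuple.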

I imagine this could be useful for expressing version information beyond language (for example, revisioning), which I could see being a large use case for many other CMSs besides Drupal.

-Lin

@lanthaler
JSON-LD Public Repositories member

@gkellogg's response:

On Nov 8, 2012, at 6:17 PM, Lin Clark lin.w.clark@gmail.com wrote:

Rather than continuing to reiterate the use case we have for language maps (which has been called a bogus use case and an anti-pattern by members of the WG), I thought it could be worth looking at another option.

What Drupal needs isn't really language management. Drupal needs version management, where the versions just happen to be based on language. That's why I originally considered named graphs. The idea of using named graphs for our use case made members of the WG balk and we were encouraged to look to language maps.

However, it seems now that language maps need to round trip to RDF. This means that language maps will force a change in the data model... for example inserting blank nodes in between a subject and its properties. I'm unclear on why it is preferable to create blank nodes in a data model than it is to use a named graph. The named graph at least lets you keep the same base triple structure, and consumers can choose whether or not to pay attention to the 4th element of the quad. As I recall, on a telecon where I brought it up, Gregg said that named graphs shouldn't be used unless you needed to make statements about the graph itself. However, others such as Leigh Dodds have discussed using named graphs for versioning or providing context, and I'm not sure that it's such an unconventional idea.

Hmm, I don't remember suggesting that named graphs should only be used when making assertions about a graph itself; do you have a reference? I could have done so, as the reason named graphs were brought in was particularly for the provenance use case, where you want to make assertions about other information.

In any case, we have come to the realization that JSON-LD is really a dataset model (like TriG) and not really a pure graph model (like Turtle). The only thing the RDF WG could agree upon is that datasets have no semantics, so we can infer that they don't in JSON-LD either. As you note many people use named graphs for all kinds of reasons, and I think (now anyway) that this might be a good solution for you.

I did see that back in July, we discussed @container: @graph as a potential solution for WikiData's use case, and if there's something we can do that addresses Drupal's use case, particularly if it does it better than language maps, then that seems like an interesting area to pursue.

I would be interested to hear what concerns the WG have with this sort of use of named graphs.

Besides being discouraged from using them by the WG, the other reason I decided against named graphs was because there was no good way to access properties in named graphs. Since JSON-LD's query API is still unspecified, direct access to properties using the tree structure needs to be easy for the end user developer.

Instead of continuing to try to shoehorn language maps into our use case (or vice versa), I'm wondering whether making named graphs easier to traverse would be a better option.

For example, I imagine something like:

{
    "@context": {
        "site": "http://ex.org/",
        "body": {
            "@id": "site:body",
            "@container": "@graph"
        },
        "en": "site:node/1/en",
        "de": "site:node/1/de"
    },
    "@id": "site:node/1",
    "body": {
        "en": [
            "Here is some body text for the article."
        ],
        "de": [
            "Hier sind einige Textkörper für den Artikel."
        ]
    }
}

It would normalize to:

<site:node/1> <site:body> "Here is some body text for the article." <site:node/1/en>
<site:node/1> <site:body> "Hier sind einige Textkörper für den Artikel." <site:node/1/de>

Yes, this looks right. It's certainly unusual, as the subject and property appear in the default graph, with the value(s) in the named graph, but it seems quite consistent. So, the semantics would be that the relevant subject and property are "pulled into" the named graph associated with their values, and any node definitions within that context would remain within the named graph.

Expanding such a structure (flattening, anyway) would likely look like the following:

[
  {
    "@id": "http://ex.org/node/1/en",
    "@graph": [{
      "@id": "http://ex.org/node/1",
      "http://ex.org/body": [
        {"@value": "Here is some body text for the article."}
      ]
    }]
  },
  {
    "@id": "http://ex.org/node/1/de",
    "@graph": [{
      "@id": "http://ex.org/node/1",
      "http://ex.org/body": [
        {"@value": "Hier sind einige Textkörper für den Artikel."}
      ]
    }]
  }
]

Figuring out how to reverse this when compacting might be challenging, but we haven't lost any information, so we should be able to do it.
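
Editorial sketch of that round trip, in Python. It assumes the expanded shape shown above (one named-graph object per graph IRI, each holding the subject with its property values) and works on already-expanded IRIs. The function names to_graph_objects and from_graph_objects are hypothetical, and mapping graph IRIs back to terms such as "en" and "de" through the context is left out.

```python
# Hypothetical round-trip sketch; not an existing JSON-LD algorithm.

def to_graph_objects(subject, predicate, values_by_graph):
    """Build one named-graph object per graph IRI, as in the expanded example."""
    return [
        {
            "@id": graph,
            "@graph": [{
                "@id": subject,
                predicate: [{"@value": v} for v in values],
            }],
        }
        for graph, values in values_by_graph.items()
    ]


def from_graph_objects(expanded):
    """Regroup values as {subject: {predicate: {graph: [values]}}}."""
    result = {}
    for graph_object in expanded:
        graph = graph_object["@id"]
        for node in graph_object["@graph"]:
            properties = result.setdefault(node["@id"], {})
            for predicate, values in node.items():
                if predicate.startswith("@"):
                    continue  # skip keywords such as "@id"
                by_graph = properties.setdefault(predicate, {})
                by_graph.setdefault(graph, []).extend(v["@value"] for v in values)
    return result


values_by_graph = {
    "http://ex.org/node/1/en": ["Here is some body text for the article."],
    "http://ex.org/node/1/de": ["Hier sind einige Textkörper für den Artikel."],
}

expanded = to_graph_objects(
    "http://ex.org/node/1", "http://ex.org/body", values_by_graph
)
regrouped = from_graph_objects(expanded)

# Nothing is lost: regrouping recovers the original per-graph value map.
print(regrouped["http://ex.org/node/1"]["http://ex.org/body"] == values_by_graph)
```

The harder part of real compaction would be deciding when a property qualifies for this regrouping at all, which the sketch does not attempt.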

Gregg

@lanthaler
JSON-LD Public Repositories member

This is related to #133.

@niklasl
JSON-LD Public Repositories member
niklasl commented Nov 9, 2012

So, to elaborate on my recent comment in issue 133:

Named graphs are for describing descriptions (they are "the sheet of paper the article is printed on"). It's a much more complex case for consumption than just describing each language version as a distinct resource, in the data given by the canonical IRI for the resource (the one described by articles in different languages). That is just basic Dublin Core usage. Using named graphs is primarily for doing data quotation (used for e.g. digital signatures), handling provenance of entire datasets (i.e. datadumps of several quoted records) and managing quad stores (handling revisions etc.). And handling datasets isn't something I'd expect CreateJS, for instance, to do casually.

They are powerful and useful of course, but you may end up with disambiguation problems. If the same resource is described in two named graphs, it is still logically the same resource. For example, if a functional property (in OWL lingo) describing that resource points to different IRIs in the two graphs, those two IRIs would be taken to identify the same thing. Conflation may abound if this is not thoroughly understood by authors of such data.
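
Editorial sketch of the conflation risk described above, in Python. It assumes a consumer that merges all named graphs and treats one property as functional. The property http://ex.org/mainImage and the image IRIs are made up for illustration, and inferred_same is a hypothetical helper, not an OWL reasoner.

```python
from collections import defaultdict

# Hypothetical illustration: find functional properties that end up with
# more than one value once the graph component of each quad is ignored.

def inferred_same(quads, functional_properties):
    """Map (subject, property) to the values a reasoner would equate."""
    values = defaultdict(set)
    for subject, predicate, obj, _graph in quads:
        if predicate in functional_properties:
            values[(subject, predicate)].add(obj)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


MAIN_IMAGE = "http://ex.org/mainImage"  # assumed to be declared functional

quads = [
    ("http://ex.org/node/1", MAIN_IMAGE, "http://ex.org/img/en.png",
     "http://ex.org/node/1/en"),
    ("http://ex.org/node/1", MAIN_IMAGE, "http://ex.org/img/de.png",
     "http://ex.org/node/1/de"),
]

conflicts = inferred_same(quads, {MAIN_IMAGE})
for (subject, predicate), objs in conflicts.items():
    print(subject, predicate, "equates", objs)
```

Each graph is consistent on its own; the two image IRIs only collide because both graphs describe the same subject.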

Is that fully OK by Drupal? And are the different versions really not viable to expose more concretely than as two sets of statements? You should compare this to recommended data handling in e.g. bibliographical systems (see e.g. FRBR). This is the pivotal point, especially for interoperability.

Do you also accept Gregg's example of the expanded data above? If so, the question is if this is a reasonable addition to the compaction algorithm. I can imagine how it would be done, but I'm not sure at what cost. It seems very advanced to support partitioning of each property value of a resource by named graph in a syntax like this. Let's hear what others have to say.

(Note that I'd still prefer to add a @container mechanism for mapping based on the property value of a member over this, as it is a more common case to describe different articles as distinct resources in the same graph.)
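
Editorial sketch of the alternative mentioned in the note above, in Python. It assumes each language variant is a distinct resource in the same graph and that a container could index members by one of their property values. The function index_by_property and the "language" key are hypothetical, used only to show the resulting access pattern.

```python
# Hypothetical illustration of indexing member resources by a property value.

def index_by_property(members, key_property):
    """Group member resources under the value of key_property."""
    index = {}
    for member in members:
        index.setdefault(member[key_property], []).append(member)
    return index


variants = [
    {
        "@id": "http://ex.org/node/1/en",
        "language": "en",
        "body": "Here is some body text for the article.",
    },
    {
        "@id": "http://ex.org/node/1/de",
        "language": "de",
        "body": "Hier sind einige Textkörper für den Artikel.",
    },
]

by_language = index_by_property(variants, "language")
print(by_language["en"][0]["body"])
```

Access by language key stays as direct as in the graph-container proposal, but each variant keeps its own identifier and can carry its own properties.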

@linclark
linclark commented Nov 9, 2012
If the same resource is described in two named graphs, it is still logically the same resource... Is that fully OK by Drupal?

Right, we actually want it to be logically the same resource. They all have the same UUID in Drupal, we conceive of it as a single resource. We intentionally moved away from having "translation ids" in Drupal 7 to having a single entity with a single ID in Drupal 8.

Do you also accept Gregg's example of the expanded data above?

Yes. I can imagine how it will round trip, which is important and is something I could never quite be certain of in the language maps proposal.

@niklasl
JSON-LD Public Repositories member
niklasl commented Nov 12, 2012

(This is a reply to a comment in 133, put here since it's mainly about the use of graphs to differentiate between language versions.)

@linclark My concern is that it seems like an odd way of partitioning information based on language. I use named graphs a lot for managing changes in descriptions from various sources. That article by Jeni is very good, and outlines a usable way of handling versions, specifically revisions, of data over time. Note though that for information resources, her recommendation is to use distinct representations of the resources (note especially the use of dct:hasVersion to link from a canonical "hub" resource to the different (here time-based) variants). Also for the use of named graphs (mainly pertaining to "real world entities", quite hard to talk about as snapshots over time), as the article describes, the default graph is intended to reflect the current state of affairs. How would you recommend that RDF applications use the language-based entity variants for a node when Drupal exposes them as named graphs?

You do say "language-based entity variants". More than one language variant means that, conceptually, there are separate resources. (The representation of a resource is also a resource, with its own MIME type etc.) You cannot content-negotiate on language and get different resources back if they are intrinsically the same resource (what you get is a distinct representation in a specific language, its own comment count, author, etc.). Nor can you query a graph to, e.g., count the variants in English unless these are distinct.

Note that I'm thinking about this from the outside in (the surface data), not from the inside perspective (implementations often look rather different internally from the resources they expose, for many reasons). Also note that there is no hard requirement to publish the variants on different IRIs (certainly not when exposed as raw data). They can be subsumed as different entities without IRIs (i.e. blank nodes), described by the data published for the node (similar to the document "hub" in Jeni's example).

I just want to point out that this conflation may become problematic down the road, when information published by Drupal sites is syndicated and integrated by various applications. I'm no stranger to "practical conflation" (things can get absurd either way), but in this case it is evident that the difference in language is key (no pun intended). So when publishing data containing this difference in syntax, it would be a waste to see it get lost in interpretation.

This is why I keep coming back to this; I'm sorry if I'm not conveying that clearly. I'm not after restructuring Drupal's internals, I'm just trying to focus on what I've seen regarding the usefulness of published information.

@lanthaler
JSON-LD Public Repositories member

RESOLVED: Push the addition of '@container': '@graph' to the JSON-LD Syntax specification off to a later version of JSON-LD.
