Do not promote undescribed nodes nor datatypes to subjects during flattening #279

niklasl · 2013-07-16T10:34:53Z

The current flattening algorithm creates nodes for undescribed objects in the graph, as well as for datatypes.

This seems like an implementation artifact, and does not seem very useful. It was argued a while ago that users might want this to count the total number of nodes in the graph. I can't see that as a valid argument, for several reasons. For one, users might more commonly want to count the subjects in the graph. That is, the resources that are actually described. Also, crucially, datatypes of literals are not in the graph. They are an intrinsic part of the literal value (that is, in RDF).

The purpose of Flattening is stated as "By flattening a document, all properties of a node are collected in a single JSON object and all blank nodes are labeled with a blank node identifier. This may drastically simplify the code required to process JSON-LD data in certain applications."

Along with removing any nesting of nodes in other nodes, that is all it should do. Anyone wanting to index and analyze the node usage in its entirety needs to process the data further anyway. A flattened source is useful as input to such an algorithm (e.g. the suggested Connect algorithm). Right now, flattening seems to go halfway, and in the process it may create lots of unnecessary nodes.

For instance, consider this data:

{
  "@context": {
    "@vocab": "http://example.org/bib/",
    "@base": "http://library.example.org/",
    "created":{"@type":"http://www.w3.org/2001/XMLSchema#date"}
  },
  "@graph": [
    {
      "@id": "/work/three",
      "@type": "Comic",
      "creator": {"@id": "/person/three"},
      "created": "2001-01-01",
      "basedOn": {
        "@id": "/work/two",
        "@type": "Movie",
        "creator": {"@id": "/person/two"},
        "created": "1991-01-01",
        "basedOn": {
          "@id": "/work/one",
          "@type": "Book",
          "creator": {"@id": "/person/one"},
          "created": "1901-01-01"
        }
      }
    }
  ]
}

I would prefer the flattened result to be just:

{
  "@context": {
    "@vocab": "http://example.org/bib/",
    "@base": "http://library.example.org/",
    "created":{"@type":"http://www.w3.org/2001/XMLSchema#date"}
  },
  "@graph": [
    {
      "@id": "/work/one",
      "@type": "Book",
      "creator": {"@id": "/person/one"},
      "created": "1901-01-01"
    },
    {
      "@id": "/work/two",
      "@type": "Movie",
      "basedOn": {"@id": "/work/one"},
      "creator": {"@id": "/person/two"},
      "created": "1991-01-01"
    },
    {
      "@id": "/work/three",
      "@type": "Comic",
      "basedOn": {"@id": "/work/two"},
      "creator": {"@id": "/person/three"},
      "created": "2001-01-01"
    }
  ]
}

Instead, it becomes:

{
  "@context": {
    "@vocab": "http://example.org/bib/",
    "@base": "http://library.example.org/",
    "created":{"@type":"http://www.w3.org/2001/XMLSchema#date"}
  },
  "@graph": [
    {"@id": "http://example.org/bib/Book"},
    {"@id": "http://example.org/bib/Movie"},
    {"@id": "http://example.org/bib/Comic"},
    {"@id": "person/one"},
    {"@id": "person/two"},
    {"@id": "person/three"},
    {"@id": "http://www.w3.org/2001/XMLSchema#date"},
    {
      "@id": "work/one",
      "@type": "Book",
      "created": "1901-01-01",
      "creator": {"@id": "person/one"}
    }, {
      "@id": "work/two",
      "@type": "Movie",
      "basedOn": {"@id": "work/one"},
      "created": "1991-01-01",
      "creator": {"@id": "person/two"}
    }, {
      "@id": "work/three",
      "@type": "Comic",
      "basedOn": {"@id": "work/two"},
      "created": "2001-01-01",
      "creator": {"@id": "person/three"}
    }
  ]
}

As you can see, the undescribed creators and the types – including a literal datatype – are all added as well.

Perhaps there may be some limited value in adding node objects for the persons here as well, but adding the types and datatypes seems speculative, and rather wasteful in general (even outright wrong in the case of datatypes).
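To make the difference concrete, here is a minimal Python sketch of a post-processing step that drops node objects consisting of nothing but "@id" from a flattened "@graph". The function name `prune_undescribed` and the shortened sample data are my own assumptions for illustration; this is not part of any JSON-LD API.

```python
def prune_undescribed(graph):
    """Keep only node objects that carry at least one key besides "@id"."""
    return [node for node in graph if any(key != "@id" for key in node)]


# A shortened version of the current flattening output shown above.
flattened = [
    {"@id": "http://example.org/bib/Book"},
    {"@id": "person/one"},
    {"@id": "http://www.w3.org/2001/XMLSchema#date"},
    {
        "@id": "work/one",
        "@type": "Book",
        "created": "1901-01-01",
        "creator": {"@id": "person/one"},
    },
]

described = prune_undescribed(flattened)
print([node["@id"] for node in described])
```

With this sample, only the described work remains; the type, the person reference and the datatype entries are gone.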

Also, RDF processors may wish to produce flattened JSON-LD directly from graphs. The current result form requires them to add all nodes (not only described subjects, which is the norm when serializing RDF), and also, quite annoyingly, all datatypes used by literals. (Note also that although rdf:nil is logically used at the end of lists in RDF and thus part of the graph, it is not to be added as a node here.)
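As a rough sketch of what "only described subjects" means for an RDF processor, the Python below groups triples by subject and emits one node object per subject. The triple representation (tuples tagged "iri" or "literal") and the function name `triples_to_flattened` are hypothetical simplifications chosen for this example; it ignores blank nodes, lists, language tags and named graphs, and assumes rdf:type objects are IRIs.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def triples_to_flattened(triples):
    """Group triples by subject; only subjects become node objects.

    Each triple is (subject, predicate, obj), where obj is either
    ("iri", value) or ("literal", value, datatype).
    """
    nodes = defaultdict(dict)
    for subject, predicate, obj in triples:
        node = nodes[subject]
        node["@id"] = subject
        if predicate == RDF_TYPE:
            node.setdefault("@type", []).append(obj[1])
        elif obj[0] == "iri":
            # Objects are referenced, never promoted to node objects.
            node.setdefault(predicate, []).append({"@id": obj[1]})
        else:
            # The datatype stays inside the literal's value object.
            node.setdefault(predicate, []).append(
                {"@value": obj[1], "@type": obj[2]}
            )
    return [nodes[subject] for subject in sorted(nodes)]


work = "http://library.example.org/work/one"
triples = [
    (work, RDF_TYPE, ("iri", "http://example.org/bib/Book")),
    (work, "http://example.org/bib/creator",
     ("iri", "http://library.example.org/person/one")),
    (work, "http://example.org/bib/created",
     ("literal", "1901-01-01", XSD_DATE)),
]

result = triples_to_flattened(triples)
print([node["@id"] for node in result])
```

Because node objects are created from subjects only, the person, the class and the datatype never need to be tracked separately by the serializer.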

If strong arguments in favor of this "all nodes" behavior are made, perhaps a flag for controlling this, being off by default, could be an option (though I'd argue that it should still not add datatypes as nodes). Right now, I consider this to be a bug.
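For discussion, such a flag could look roughly like the Python sketch below. It is a heavily simplified stand-in for flattening, not the specified algorithm: there is no context processing and no blank node labeling, every node object is assumed to carry an "@id", and all property values end up in lists. The names `flatten` and `keep_undescribed` are hypothetical.

```python
def flatten(nodes, keep_undescribed=False):
    """Collect every node's properties into one object per "@id".

    `keep_undescribed` is a hypothetical flag, off by default. Types and
    datatypes are plain strings here, so they never become nodes.
    """
    node_map = {}

    def is_node(value):
        return isinstance(value, dict) and "@id" in value

    def visit(node):
        described = any(key != "@id" for key in node)
        if not described and not keep_undescribed:
            return
        entry = node_map.setdefault(node["@id"], {"@id": node["@id"]})
        for key, value in node.items():
            if key == "@id":
                continue
            values = value if isinstance(value, list) else [value]
            for item in values:
                if is_node(item):
                    visit(item)
                    item = {"@id": item["@id"]}  # leave only a reference behind
                if item not in entry.setdefault(key, []):
                    entry[key].append(item)

    for node in nodes:
        visit(node)
    return sorted(node_map.values(), key=lambda n: n["@id"])


doc = [{
    "@id": "/work/two",
    "@type": "Movie",
    "creator": {"@id": "/person/two"},
    "basedOn": {
        "@id": "/work/one",
        "@type": "Book",
        "creator": {"@id": "/person/one"},
    },
}]

print([n["@id"] for n in flatten(doc)])
print([n["@id"] for n in flatten(doc, keep_undescribed=True)])
```

With the flag off, only the two works are emitted; with it on, the two persons are added as bare "@id" nodes, while "Movie" and "Book" stay values of "@type" in both modes.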

lanthaler · 2013-07-16T14:39:53Z

RESOLVED: Fix a bug in the flattening and fromRDF algorithm by not promoting undescribed nodes or datatypes to subjects during the flattening/fromRDF algorithms.

RESOLVED: Graphs are not free-floating nodes and should not be removed during the flattening or fromRDF algorithm.

lanthaler · 2013-07-23T20:43:51Z

The algorithms have been fixed to not output any "undescribed nodes", i.e., nodes without any properties, according to our resolution. The relevant test cases have been updated as well.

Unless I hear objections, I will therefore close this issue in 24 hours.

niklasl added a commit that referenced this issue Jul 18, 2013

Adapt flatten and fromRdf tests to the resolution of issue #279

5f8d825

gkellogg added a commit that referenced this issue Jul 20, 2013

Merge pull request #284 from json-ld/test-suite-fix-for-issue-279

da4f40e

Adapt flatten and fromRdf tests to the resolution of issue #279

lanthaler added a commit that referenced this issue Jul 21, 2013

Remove undescribed node from flatten-0021

f183264

This addresses #279.

lanthaler added a commit that referenced this issue Jul 22, 2013

Update framing tests to not include nodes which consists of only @id

f0d29bd

I'm not sure this is what we want but it was a consequence of @gkellogg's change in cd397a5. /cc @niklasl, @dlongley, @msporny This addresses #279.

lanthaler closed this as completed Jul 25, 2013
