Do not promote undescribed nodes nor datatypes to subjects during flattening #279

Closed
niklasl opened this issue Jul 16, 2013 · 2 comments

@niklasl
Member

niklasl commented Jul 16, 2013

The current flattening algorithm creates nodes for undescribed objects in the graph, as well as for datatypes.

This looks like an implementation artifact and does not seem very useful. It was argued a while ago that users might want this in order to count the total number of nodes in the graph. I don't find that argument convincing, for several reasons. For one, users more commonly want to count the subjects in the graph, that is, the resources that are actually described. Also, crucially, the datatypes of literals are not nodes in the graph: in RDF, a datatype is an intrinsic part of the literal value.

The purpose of Flattening is stated as "By flattening a document, all properties of a node are collected in a single JSON object and all blank nodes are labeled with a blank node identifier. This may drastically simplify the code required to process JSON-LD data in certain applications."

Along with removing any nesting of nodes in other nodes, that is all it should do. Anyone wanting to index and analyze the node usage in its entirety needs to process the data further anyway. A flattened source is useful as input to such an algorithm (e.g. the suggested Connect algorithm). Right now, flattening seems to go halfway, and in the process it may create lots of unnecessary nodes.

For instance, consider this data:

{
  "@context": {
    "@vocab": "http://example.org/bib/",
    "@base": "http://library.example.org/",
    "created":{"@type":"http://www.w3.org/2001/XMLSchema#date"}
  },
  "@graph": [
    {
      "@id": "/work/three",
      "@type": "Comic",
      "creator": {"@id": "/person/three"},
      "created": "2001-01-01",
      "basedOn": {
        "@id": "/work/two",
        "@type": "Movie",
        "creator": {"@id": "/person/two"},
        "created": "1991-01-01",
        "basedOn": {
          "@id": "/work/one",
          "@type": "Book",
          "creator": {"@id": "/person/one"},
          "created": "1901-01-01"
        }
      }
    }
  ]
}

I would prefer the flattened result to be just:

{
  "@context": {
    "@vocab": "http://example.org/bib/",
    "@base": "http://library.example.org/",
    "created":{"@type":"http://www.w3.org/2001/XMLSchema#date"}
  },
  "@graph": [
    {
      "@id": "/work/one",
      "@type": "Book",
      "creator": {"@id": "/person/one"},
      "created": "1901-01-01"
    },
    {
      "@id": "/work/two",
      "@type": "Movie",
      "basedOn": {"@id": "/work/one"},
      "creator": {"@id": "/person/two"},
      "created": "1991-01-01"
    },
    {
      "@id": "/work/three",
      "@type": "Comic",
      "basedOn": {"@id": "/work/two"},
      "creator": {"@id": "/person/three"},
      "created": "2001-01-01"
    }
  ]
}
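
A minimal sketch of a flattening pass that would yield this shape, for illustration only. It is not the algorithm from the spec: it works directly on the compacted `@graph` array, skips all context processing, and assumes that any object without `@value` is a node object (so `@list`, `@set`, `@reverse` and named graphs are not handled). The function name `flatten_described` and the `_:bN` labeling scheme are made up for this example.

```python
import itertools


def flatten_described(nodes):
    """Un-nest node objects; emit only nodes that carry at least one property."""
    node_map = {}                # @id -> merged node object, in first-seen order
    counter = itertools.count()  # source of blank node labels (hypothetical scheme)

    def is_node(value):
        # Assumption: any dict that is not a value object is a node object.
        return isinstance(value, dict) and "@value" not in value

    def visit(node):
        # Label blank nodes so other nodes can reference them.
        node_id = node.get("@id") or f"_:b{next(counter)}"
        props = {k: v for k, v in node.items() if k != "@id"}
        if props:
            # Only described nodes are promoted to top-level subjects.
            target = node_map.setdefault(node_id, {"@id": node_id})
            for key, value in props.items():
                if key == "@type":
                    target[key] = value  # types stay plain values, never nodes
                    continue
                values = value if isinstance(value, list) else [value]
                out = [visit(v) if is_node(v) else v for v in values]
                # Simplification: a later occurrence of a property overwrites
                # an earlier one instead of merging the values.
                target[key] = out if isinstance(value, list) else out[0]
        # The caller always gets a bare reference, described or not.
        return {"@id": node_id}

    for node in nodes:
        visit(node)
    return list(node_map.values())
```

Applied to the `@graph` array of the input above, this produces three node objects (in first-seen order rather than the order shown here). The creators remain plain `{"@id": ...}` references, and neither the types nor the `created` datatype turn into nodes.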

Instead, it becomes:

{
  "@context": {
    "@vocab": "http://example.org/bib/",
    "@base": "http://library.example.org/",
    "created":{"@type":"http://www.w3.org/2001/XMLSchema#date"}
  },
  "@graph": [
    {"@id": "http://example.org/bib/Book"},
    {"@id": "http://example.org/bib/Movie"},
    {"@id": "http://example.org/bib/Comic"},
    {"@id": "person/one"},
    {"@id": "person/two"},
    {"@id": "person/three"},
    {"@id": "http://www.w3.org/2001/XMLSchema#date"},
    {
      "@id": "work/one",
      "@type": "Book",
      "created": "1901-01-01",
      "creator": {"@id": "person/one"}
    }, {
      "@id": "work/two",
      "@type": "Movie",
      "basedOn": {"@id": "work/one"},
      "created": "1991-01-01",
      "creator": {"@id": "person/two"}
    }, {
      "@id": "work/three",
      "@type": "Comic",
      "basedOn": {"@id": "work/two"},
      "created": "2001-01-01",
      "creator": {"@id": "person/three"}
    }
  ]
}

As you can see, the undescribed creators and the types – including a literal datatype – are all added as well.

There may be some limited value in adding node objects for the persons here as well, but adding the types and datatypes seems speculative and rather wasteful in general (and outright wrong in the case of datatypes).

Also, RDF processors may wish to produce flattened JSON-LD directly from graphs. The current result form requires them to add all nodes (not only described subjects, which is the norm when serializing RDF) and also, quite annoyingly, all datatypes used by literals. (Note also that although rdf:nil is logically used at the end of lists in RDF and is thus part of the graph, it is not to be added as a node here.)
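
To illustrate why subject-only output is the natural result when serializing RDF, here is a rough sketch that groups triples by subject. The triple representation is an assumption made for this example: plain tuples of `(subject IRI, predicate IRI, object)`, where the object is already an expanded JSON-LD value (`{"@id": ...}` for a resource, `{"@value": ..., "@type": ...}` for a typed literal). The function name `nodes_from_triples` is hypothetical, and RDF lists (`rdf:first`/`rdf:rest`) are not converted.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nodes_from_triples(triples):
    """Group triples by subject; objects and datatypes never become nodes."""
    node_map = {}
    for subject, predicate, obj in triples:
        # A node object is created only for something that occurs as a subject.
        node = node_map.setdefault(subject, {"@id": subject})
        if predicate == RDF_TYPE:
            node.setdefault("@type", []).append(obj["@id"])
        else:
            # The datatype travels inside the literal's value object.
            node.setdefault(predicate, []).append(obj)
    return list(node_map.values())


LIB = "http://library.example.org/"
BIB = "http://example.org/bib/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

triples = [
    (LIB + "work/one", RDF_TYPE, {"@id": BIB + "Book"}),
    (LIB + "work/one", BIB + "creator", {"@id": LIB + "person/one"}),
    (LIB + "work/one", BIB + "created",
     {"@value": "1901-01-01", "@type": XSD_DATE}),
]
nodes = nodes_from_triples(triples)
```

Here `nodes` holds a single node object for `work/one`. Emitting extra entries for `person/one`, `Book` and `xsd:date` would require a second pass over all objects and datatypes, which is the additional work described above.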

If strong arguments are made in favor of this "all nodes" behavior, a flag controlling it, off by default, could be an option (though I'd argue that it should still not add datatypes as nodes). Right now, I consider this to be a bug.

@lanthaler
Member

RESOLVED: Fix a bug in the flattening and fromRDF algorithm by not promoting undescribed nodes or datatypes to subjects during the flattening/fromRDF algorithms.

RESOLVED: Graphs are not free-floating nodes and should not be removed during the flattening or fromRDF algorithm.

gkellogg added a commit that referenced this issue Jul 20, 2013
Adapt flatten and fromRdf tests to the resolution of issue #279
lanthaler added a commit that referenced this issue Jul 21, 2013
lanthaler added a commit that referenced this issue Jul 22, 2013
I'm not sure this is what we want but it was a consequence of @gkellogg's change in cd397a5.

/cc @niklasl, @dlongley, @msporny

This addresses #279.
@lanthaler
Member

The algorithms have been fixed to not output any "undescribed nodes", i.e., nodes without any properties, according to our resolution. The relevant test cases have been updated as well.

Unless I hear objections, I will therefore close this issue in 24 hours.
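
For readers who want to check existing flattened output against this behavior, the "nodes without any properties" rule can be approximated as a post-processing filter. This is only a sketch, not the change made to the spec algorithms, and the function name `drop_undescribed` is made up here. It treats a node object as undescribed when `@id` is its only key, so a node that carries `@graph` is kept, in line with the second resolution above.

```python
def drop_undescribed(graph):
    """Keep only node objects that have at least one key besides @id."""
    return [node for node in graph if any(key != "@id" for key in node)]


flattened = [
    {"@id": "http://example.org/bib/Book"},
    {"@id": "person/one"},
    {"@id": "http://www.w3.org/2001/XMLSchema#date"},
    {
        "@id": "work/one",
        "@type": "Book",
        "created": "1901-01-01",
        "creator": {"@id": "person/one"},
    },
]
described = drop_undescribed(flattened)
```

In this sample only the `work/one` node remains in `described`; the entries for the type, the creator, and the datatype are dropped.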
