The current flattening algorithm creates nodes for undescribed objects in the graph, as well as for datatypes.
This seems like an implementation artifact, and it does not seem very useful. It was argued a while ago that users might want this behavior in order to count the total number of nodes in the graph. I can't see that as a valid argument, for several reasons. For one, users might more commonly want to count the subjects in the graph, that is, the resources that are actually described. Also, crucially, the datatypes of literals are not nodes in the graph at all; in RDF they are an intrinsic part of the literal value.
The purpose of Flattening is stated as "By flattening a document, all properties of a node are collected in a single JSON object and all blank nodes are labeled with a blank node identifier. This may drastically simplify the code required to process JSON-LD data in certain applications."
Along with removing any nesting of nodes within other nodes, that is all it should do. Anyone wanting to index and analyze node usage in its entirety needs to process the data further anyway; a flattened source is useful as input to such an algorithm (e.g. the suggested Connect algorithm). Right now, flattening only goes halfway, and in the process it may create lots of unnecessary nodes.
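For instance, consider data like the following (a reconstructed illustration in the spirit of the report; the vocabulary and values here are hypothetical, not the exact example from the original issue):

```json
{
  "@context": {
    "dc": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@id": "http://example.org/book/1",
  "@type": "dc:BibliographicResource",
  "dc:creator": [
    {"@id": "http://example.org/person/alice"},
    {"@id": "http://example.org/person/bob"}
  ],
  "dc:issued": {"@value": "2012-10-26", "@type": "xsd:date"}
}
```

I would prefer the flattened result to be just the one described node:

```json
{
  "@context": {
    "dc": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@graph": [
    {
      "@id": "http://example.org/book/1",
      "@type": "dc:BibliographicResource",
      "dc:creator": [
        {"@id": "http://example.org/person/alice"},
        {"@id": "http://example.org/person/bob"}
      ],
      "dc:issued": {"@value": "2012-10-26", "@type": "xsd:date"}
    }
  ]
}
```

Instead, it becomes (roughly) that node plus a property-less node object for every referenced resource, type, and datatype:

```json
{
  "@context": {
    "dc": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@graph": [
    {
      "@id": "http://example.org/book/1",
      "@type": "dc:BibliographicResource",
      "dc:creator": [
        {"@id": "http://example.org/person/alice"},
        {"@id": "http://example.org/person/bob"}
      ],
      "dc:issued": {"@value": "2012-10-26", "@type": "xsd:date"}
    },
    {"@id": "http://example.org/person/alice"},
    {"@id": "http://example.org/person/bob"},
    {"@id": "dc:BibliographicResource"},
    {"@id": "xsd:date"}
  ]
}
```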
As you can see, the undescribed creators and the types – including a literal datatype – are all added as well.
There may be some limited value in adding the object nodes for the persons here, but adding the types and datatypes seems speculative and rather wasteful in general (and outright wrong in the case of datatypes).
Also, RDF processors may wish to produce flattened JSON-LD directly from graphs. The current result form requires them to add all nodes (not only described subjects, which is the norm when serializing RDF), and also, quite annoyingly, all datatypes used by literals. (Note also that although rdf:nil is logically used at the end of lists in RDF and thus part of the graph, it is not to be added as a node here.)
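A minimal sketch of that direct serialization, in Python, shows how natural the "described subjects only" form is. The helper name and the tuple-based triple representation are hypothetical, not from any JSON-LD library: node objects are created only for resources that appear in the subject position, and literal datatypes stay inside value objects rather than becoming nodes of their own.

```python
import json

def graph_to_flattened(triples):
    """Serialize RDF triples to a flattened JSON-LD @graph array.

    triples: iterable of (subject, predicate, object) tuples, where a
    literal object is a (lexical_value, datatype_iri) pair and an IRI
    or blank-node object is a plain string.
    """
    nodes = {}
    for s, p, o in triples:
        # Only subjects get node objects of their own.
        node = nodes.setdefault(s, {"@id": s})
        if isinstance(o, tuple):
            # Literal: the datatype stays inside the value object.
            value, datatype = o
            obj = {"@value": value, "@type": datatype}
        else:
            # IRI/blank node: a plain reference, not a new node object.
            obj = {"@id": o}
        node.setdefault(p, []).append(obj)
    return sorted(nodes.values(), key=lambda n: n["@id"])

triples = [
    ("http://example.org/book/1",
     "http://purl.org/dc/terms/creator",
     "http://example.org/person/alice"),
    ("http://example.org/book/1",
     "http://purl.org/dc/terms/issued",
     ("2012-10-26", "http://www.w3.org/2001/XMLSchema#date")),
]
flattened = graph_to_flattened(triples)
print(json.dumps(flattened, indent=2))
```

With the current "all nodes" requirement, such a processor would additionally have to scan every object position (and every literal's datatype) and emit empty node objects for them, which is exactly the extra work this issue objects to.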
If strong arguments in favor of this "all nodes" behavior are made, perhaps a flag controlling it, off by default, could be an option (though I'd argue that it should still not add datatypes as nodes). Right now, I consider this to be a bug.
RESOLVED: Fix a bug in the flattening and fromRDF algorithm by not promoting undescribed nodes or datatypes to subjects during the flattening/fromRDF algorithms.
RESOLVED: Graphs are not free-floating nodes and should not be removed during the flattening or fromRDF algorithm.
The algorithms have been fixed to not output any "undescribed nodes", i.e., nodes without any properties, according to our resolution. The relevant test cases have been updated as well.
Unless I hear objections, I will therefore close this issue in 24 hours.