Keeping quoted triples sharing subject within one node object #45

Open
niklasl opened this issue Jun 4, 2023 · 0 comments
Background and motivation

It would be useful to be able to group quoted triples about the same resource within a node object, to eliminate redundancy and make it simple to access, view and manage all data about that resource, whether it is asserted or quoted.

Use cases that would benefit from this include:

  • Managing suggested triples for a resource (such as automatic classification of bibliographic resources).
  • A combined "blame" view of the history of statements about a resource, where each statement is related to versions of source documents.

Example

A simplified example combining the use cases would be:

  • There are two versions of a record describing the resource <7d5d0d651caa>, with version 2 being the current.
  • The resource was typed as just a :Work in version 1, then in version 2 more precisely as a :Text.
  • In version 2 there is also a suggestion by <classifyer> that the resource has the subject <semantics>.

Here are the two versions expressed as named graphs using TriG-star:

prefix : <http://example.org/ns#>
base <http://example.com/>

graph <data?version=1> {
  <7d5d0d651caa> a :Work .
}

graph <data?version=2> {
  <7d5d0d651caa> a :Text .
  << <7d5d0d651caa> :subject <semantics> >> :suggestedBy <classifyer> .
}

Here is the "blame" view of that, combining asserted and quoted facts (utilizing annotation to avoid repeating asserted arcs with additional attached facts):

prefix : <http://example.org/ns#>
base <http://example.com/>

<7d5d0d651caa> a :Text {| :statedIn <data?version=2> |} .
<< <7d5d0d651caa> a :Work >> :statedIn <data?version=1> ; :retractedIn <data?version=2> .
<< <7d5d0d651caa> :subject <semantics> >> :suggestedBy <classifyer> {| :statedIn <data?version=2> |} .
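How such a view could be derived from the two versions can be sketched in Python. This is only an illustration under stated assumptions: the `blame` function, the tuple encoding of triples (with a nested tuple standing in for a quoted triple) and the prefixed-name strings are all hypothetical, and not part of any RDF library or of this proposal.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: a triple is a (subject, predicate, object) tuple.
# A quoted triple used as a subject is simply a nested tuple.
Triple = Tuple[object, str, str]


def blame(versions: List[Tuple[str, Set[Triple]]]) -> Dict[Triple, Dict[str, str]]:
    """Record for each triple the version it first appeared in and,
    if it disappeared later, the version where it was dropped.

    Simplification: a triple that is re-stated after being retracted
    overwrites its earlier history entry.
    """
    history: Dict[Triple, Dict[str, str]] = {}
    previous: Set[Triple] = set()
    for version_id, triples in versions:
        for added in triples - previous:
            history[added] = {"statedIn": version_id}
        for removed in previous - triples:
            history[removed]["retractedIn"] = version_id
        previous = triples
    return history


subject = "7d5d0d651caa"
versions = [
    ("data?version=1", {(subject, "rdf:type", ":Work")}),
    ("data?version=2", {
        (subject, "rdf:type", ":Text"),
        # The suggestion: a quoted triple as the subject of :suggestedBy.
        ((subject, ":subject", "semantics"), ":suggestedBy", "classifyer"),
    }),
]
history = blame(versions)
```

Each key of `history` corresponds to one line of the view above: the :Work typing carries both statedIn and retractedIn, while the :Text typing and the suggestion carry only statedIn.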

Here is that same "blame" view expressed using JSON-LD-star:

[
  {
    "@id": "7d5d0d651caa",
    "@type": "Text"
  },
  {
    "@id": {"@id": "7d5d0d651caa", "@type": "Text"},
    "statedIn": {"@id": "data?version=2"}
  },
  {
    "@id": {"@id": "7d5d0d651caa", "@type": "Work"},
    "statedIn": {"@id": "data?version=1"},
    "retractedIn": {"@id": "data?version=2"}
  },
  {
    "@id": {"@id": "7d5d0d651caa", "subject": {"@id": "semantics"}},
    "suggestedBy": {"@id": "classifyer", "@annotation": {"statedIn": {"@id": "data?version=2"}}}
  }
]

(Note that while the TriG-star annotation form helps to eliminate one assertion/quotation repetition (<7d5d0d651caa> a :Text . << <7d5d0d651caa> a :Text >> :statedIn <data?version=2> .), it is not possible to express that succinctly in JSON-LD-star. So in order to keep predictable compact types, the quotation of an assigned type here repeats the triple, outside of the subject node.)

While repetitive, the above is still more concise (and intended to become semantically different in RDF 1.2) than using plain old reification (here in Turtle):

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix : <http://example.org/ns#>
base <http://example.com/>

<7d5d0d651caa> a :Text .
[] rdf:subject <7d5d0d651caa> ;
  rdf:predicate rdf:type ;
  rdf:object :Text ;
  :statedIn <data?version=2> .

[] rdf:subject <7d5d0d651caa> ;
  rdf:predicate rdf:type ;
  rdf:object :Work ;
  :statedIn <data?version=1> ;
  :retractedIn <data?version=2> .

_:stmt3 rdf:subject <7d5d0d651caa> ;
  rdf:predicate :subject ;
  rdf:object <semantics> ;
  :suggestedBy <classifyer> .

_:stmt4 rdf:subject _:stmt3 ;
  rdf:predicate :suggestedBy ;
  rdf:object <classifyer> .

_:stmt4 :statedIn <data?version=2> .

However, all examples above, from named graphs via quoted triples to old reification, require a lot of repetition of the same subject throughout. For N-Quads that is to be expected, but the purpose of TriG-star, or at least JSON-LD, is to provide ergonomic representations that reduce redundancy as much as possible. (Without turning things obscure, of course, although both of these qualities are subjective and require a wide range of experience and agreement.)

Interestingly, by using JSON-LD with a custom context, the old reification form can actually result in a fairly compact view:

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "stmt": {"@reverse": "rdf:subject", "@container": "@index", "@index": "p"},
    "s": {"@id": "rdf:subject", "@type": "@id"},
    "p": {"@id": "rdf:predicate", "@type": "@vocab"},
    "t": {"@id": "rdf:object", "@type": "@vocab"},
    "o": {"@id": "rdf:object"},
    "@vocab": "http://example.org/ns#",
    "@base": "http://example.com/"
  },
  "@graph": [
    {
      "@id": "7d5d0d651caa",
      "@type": "Text",
      "stmt": {
        "rdf:type": [
          {"t": "Text", "statedIn": {"@id": "data?version=2"}},
          {"t": "Work", "statedIn": {"@id": "data?version=1"}, "retractedIn": {"@id": "data?version=2"} }
        ],
        "subject": {
          "o": {"@id": "semantics"},
          "suggestedBy": {"@id": "classifyer"},
          "stmt": {
            "suggestedBy": {"o": {"@id": "classifyer"}, "statedIn": {"@id": "data?version=2"}}
          }
        }
      }
    }
  ]
}

(Note: While most JSON-LD processors support this form, there is an erratum on the algorithm for this feature: w3c/json-ld-api#565.)

It is still a bit unwieldy (with the odd predicate and object keys), and does not utilize any of the semantics RDF-star will hopefully define (e.g. regarding uniqueness of triples and their relation to named graphs).

Grouping with @included

Note that grouping is already possible in JSON-LD, albeit with the remaining problem of repeating the same @id value:

{
  "@context": {
    "@vocab": "http://example.org/ns#",
    "@base": "http://example.com/"
  },
  "@graph": [

    {
      "@id": "7d5d0d651caa",
      "@type": "Text",
      "@included": [
        {
          "@id": {"@id": "7d5d0d651caa", "@type": "Text"},
          "statedIn": {"@id": "data?version=2"}
        },
        {
          "@id": {"@id": "7d5d0d651caa", "@type": "Work"},
          "statedIn": {"@id": "data?version=1"},
          "retractedIn": {"@id": "data?version=2"}
        },
        {
          "@id": {"@id": "7d5d0d651caa", "subject": {"@id": "semantics"}},
          "suggestedBy": {"@id": "classifyer", "@annotation": {"statedIn": {"@id": "data?version=2"}}}
        }
      ]
    }

  ]
}

While this appears somewhat better than the first blame view, it still requires something or someone to ensure that the subject is kept the same within this group, and would still require processing (e.g. indexing or restructuring) to be concisely managed.
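The kind of restructuring meant here can be sketched in Python. This is a minimal illustration, not a JSON-LD algorithm: `group_by_subject` is a hypothetical helper that operates on plain parsed JSON, assumes each plain subject @id occurs at most once, and leaves alone any node it cannot group (no @id, or a quoted triple whose own subject is another quoted triple).

```python
def group_by_subject(nodes):
    """Move quoted-triple nodes under an "@included" array on the node
    that describes their subject."""
    subjects = {}  # plain nodes, keyed by their string @id
    pending = {}   # subject id -> quoted-triple nodes about that subject
    others = []    # nodes that cannot be grouped by this sketch
    for node in nodes:
        node_id = node.get("@id")
        if isinstance(node_id, str):
            subjects[node_id] = dict(node)  # shallow copy; input is not mutated
        elif isinstance(node_id, dict) and isinstance(node_id.get("@id"), str):
            # An embedded node as "@id" denotes a quoted triple; the inner
            # "@id" is that triple's subject.
            pending.setdefault(node_id["@id"], []).append(node)
        else:
            others.append(node)
    for subject_id, quoted in pending.items():
        target = subjects.setdefault(subject_id, {"@id": subject_id})
        target["@included"] = list(target.get("@included", [])) + quoted
    return list(subjects.values()) + others


# The flat "blame" view from earlier in this issue, as parsed JSON.
flat = [
    {"@id": "7d5d0d651caa", "@type": "Text"},
    {"@id": {"@id": "7d5d0d651caa", "@type": "Text"},
     "statedIn": {"@id": "data?version=2"}},
    {"@id": {"@id": "7d5d0d651caa", "@type": "Work"},
     "statedIn": {"@id": "data?version=1"},
     "retractedIn": {"@id": "data?version=2"}},
    {"@id": {"@id": "7d5d0d651caa", "subject": {"@id": "semantics"}},
     "suggestedBy": {"@id": "classifyer",
                     "@annotation": {"statedIn": {"@id": "data?version=2"}}}},
]
grouped = group_by_subject(flat)
```

Even with such a helper, the subject identifier is still repeated inside every quoted triple, which is the redundancy the proposal below aims to remove.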

Also, this is a simplified example which doesn't scale well. In many real-world cases there can be lots of quoted triples sharing the same subject (e.g. in aggregated bibliographic records from the library domain, or in Wikidata descriptions, where disputed statements might be better kept as unasserted quotes).

Proposal: a @quoted keyword

In this proposal the objects of arcs are quoted using a new @quoted keyword. This means that the entire arc is quoted. Data within the quoted object become assertions about that quoted triple:

{
  "@context": {
    "@vocab": "http://example.org/ns#",
    "@base": "http://example.com/"
  },
  "@graph": [

    {
      "@id": "7d5d0d651caa",
      "@type": [
        "Text",
        {
          "@quoted": "Text",
          "statedIn": {"@id": "data?version=2"}
        },
        {
          "@quoted": "Work",
          "statedIn": {"@id": "data?version=1"},
          "retractedIn": {"@id": "data?version=2"}
        }
      ],
      "subject": {
        "@quoted": {"@id": "semantics"},
        "suggestedBy": {"@id": "classifyer", "@annotation": {"statedIn": {"@id": "data?version=2"}}}
      }
    }

  ]
}

This becomes a kind of complement to @annotation, and behaves very much like it, in that both affect the subject-predicate-object arc.
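To make the intended reading concrete, here is a hedged Python sketch that rewrites a node using the proposed @quoted keyword into the flat JSON-LD-star form shown earlier. Since @quoted is only a proposal, `expand_quoted` is a hypothetical helper reflecting one possible interpretation; it works on plain parsed JSON, ignores contexts and containers, and collapses a single remaining value out of its array.

```python
def expand_quoted(node):
    """Split a node with "@quoted" values into an asserted node plus one
    quoted-triple node (embedded node as "@id") per quoted value."""
    subject_id = node["@id"]
    asserted = {"@id": subject_id}
    quoted_nodes = []
    for key, value in node.items():
        if key == "@id":
            continue
        values = value if isinstance(value, list) else [value]
        kept = []
        for item in values:
            if isinstance(item, dict) and "@quoted" in item:
                # The subject-key-value arc is quoted; the remaining keys
                # become statements about that quoted triple.
                about = {k: v for k, v in item.items() if k != "@quoted"}
                quoted_nodes.append(
                    {"@id": {"@id": subject_id, key: item["@quoted"]}, **about}
                )
            else:
                kept.append(item)
        if kept:
            asserted[key] = kept[0] if len(kept) == 1 else kept
    return [asserted] + quoted_nodes


proposed = {
    "@id": "7d5d0d651caa",
    "@type": [
        "Text",
        {"@quoted": "Text", "statedIn": {"@id": "data?version=2"}},
        {"@quoted": "Work", "statedIn": {"@id": "data?version=1"},
         "retractedIn": {"@id": "data?version=2"}},
    ],
    "subject": {
        "@quoted": {"@id": "semantics"},
        "suggestedBy": {"@id": "classifyer",
                        "@annotation": {"statedIn": {"@id": "data?version=2"}}},
    },
}
flat = expand_quoted(proposed)
```

Under these assumptions, `flat` has the same four nodes as the JSON-LD-star "blame" view above, which suggests the keyword could be treated as syntactic sugar over embedded-node identifiers.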

Of course, as with @annotation, a possible drawback is that values for keys that normally expect strings may become quoted objects. Again, see the @annotation on a @type issue for details. (We may want to similarly support @container: @quoted, just as suggested in that issue, to support partitioning predicates.)

The upshot of @quoted is that, reasonably, values of @quoted would behave as the value for the key itself, avoiding the @id vs. @vocab lexical space part of the problem in the aforementioned issue. It even seems possible to use @quoted instead of @annotation on type, as illustrated above, as long as you accept repeating the type value as quoted. (There is no way around that for simple string values in JSON, so this appears to be an acceptable trade-off for keeping predictable code paths, with the caveat of quoted objects being mixed in unless a partitioning container can be used.)

(Note: I did try to allow @quoted to also be used instead of @id-with-object-value as a way to represent quoted triples as objects (of statements). But due to the way subjects may be objects in JSON-LD, that would become ambiguous in conjunction with the above. It may be possible to support that by requiring the above to combine @quoted with @annotation, and only then let @quoted trigger the "quote the entire arc" meaning.)

(Also, this keyword may still be usable in the suggestion for allowing terms with quotes as values, instead of the @triple keyword suggested there.)
