Add '@annotation' container type #196

Closed
niklasl opened this Issue · 17 comments

6 participants

@niklasl
Collaborator

Given the various needs outlined in issues #84, #133, #159 and #195, it seems there may be a general need for noisy JSON to work as JSON-LD. While not ideal, it may be required for zero-edits.

This is a proposal to add a keyword, tentatively called @annotation. It is only to be used in a context definition, and signals to the processor to skip a part of the JSON but continue recursive processing.

For example, it could be used to provide any kind of application-specific index-objects, like this:

{
  "@context": {
    "author": {"@id": "http://schema.org/author", "@container": "@annotation"}
  },
  "@id": "http://example.org/article",
  "author": {
    "regular": {"@id": "http://example.org/person/1"},
    "guest": {"@id": "http://example.org/guest/cd24f329aa"}
  }
}

The publisher has here decided that authors are to be accessed by some property (here some kind of role), which is not to be exposed as information (interpretable as RDF). To do this, the above shape has an injected artificial object between the author property and the authors, which is to be ignored. Thus, semantically, the above means exactly the same as:

{
  "@context": {
    "author": {"@id": "http://schema.org/author", "@container": "@annotation"}
  },
  "@id": "http://example.org/article",
  "author": [
    {"@id": "http://example.org/person/1"},
    {"@id": "http://example.org/guest/cd24f329aa"}
  ]
}
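As a rough illustration of the equivalence claimed above, here is a minimal Python sketch that discards the keys of such a map and keeps only its values. The function name `drop_annotation_keys` is hypothetical and not part of any JSON-LD API; a real processor would do this inside its expansion algorithm.

```python
from typing import Any


def drop_annotation_keys(value: Any) -> list:
    """Return the values of an annotation map as a plain list, discarding the keys."""
    if isinstance(value, dict):
        # The keys ("regular", "guest", ...) are application-specific noise.
        return list(value.values())
    # Anything that is not a JSON object is passed through, wrapped in a list if needed.
    return value if isinstance(value, list) else [value]


author = {
    "regular": {"@id": "http://example.org/person/1"},
    "guest": {"@id": "http://example.org/guest/cd24f329aa"},
}
flattened = drop_annotation_keys(author)
```

Here `flattened` holds the two author nodes in the same order as the array in the second example.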

The role information itself could of course be included in the information about the author, or in associated account data. In fact, this mechanism enables publishers to experiment with many kinds of special container algorithms (such as the previously suggested @id maps or generalized property maps for language, timestamps, etc.), which may in the future become part of JSON-LD. (At that time the context, and only the context, could be updated with such newly supported container options.) And since objects-as-maps is a fairly common occurrence in JSON in the wild, this annotation mechanism may ease adoption in general.

The @annotation keyword could also be used as the @id for a term (i.e. again only in the context). If used so, the term itself would be ignored (and thus any linkage), but its object value could be processed, just as if it had been a top level object within a @graph array. This may be a separate proposal.

A downside of this proposal is that the shape of compact JSON-LD becomes harder to immediately understand, since "faux" keys may pop up in unexpected places. The upside is that any context using the annotation keyword can immediately be categorized as being for application-specific, idiosyncratic JSON. This lets consumers know that such JSON cannot be automatically created using compaction by itself, but has been composed by some other process. (This is somewhat akin to RDFa in that the JSON syntax can be treated as a carrier with parts picked out as semantically relevant. Also compare this to GRDDL, replacing XML with JSON and XSLT with just the JSON-LD context.)

If these annotations must survive expansion, an intermediate object with only an @annotation key would reasonably have to be put into the expanded form (similar in shape to @list objects). There should be a flag to control whether annotations are to be preserved (default being false).

@msporny
Owner

This proposal is related to 'Decide on language handling for JSON-LD': http://drupal.org/node/1838700

@msporny
Owner

I really like this proposal. It's exactly the sort of compromise that JSON-LD should be making - there are some things (such as application-specific data structure optimizations) that you don't want surfacing in your RDF. It's simple, directly addresses the Drupal use case, and allows applications to use their own application-specific annotation ("language-like maps", @id maps, etc.) without surfacing the annotation in the RDF. I spoke with @linclark about it and she thinks that it would work for Drupal's use case. Any objections from @gkellogg, @dlongley, @lanthaler, @cygri or @tidoust?

@dlongley
Owner

+1 to the proposal. We should also make it possible to specify (in the @context) where the deep data is added -- to cover the microdata use case.

@tidoust

I like the idea as well.

I'm not sure I get @dlongley's comment about depth, so my comment may be a duplicate of his. Would the proposal cover cases where someone comes up with a "container" that is more than one level deep (similar to a multi-column index in a database)?

{
  "@context": {
    "author": {"@id": "http://schema.org/author", "@container": "@annotation"}
  },
  "@id": "http://example.org/article",
  "author": {
    "chapter1": {
      "regular": { "@id": "http://example.org/person/1" },
      "guest": { "@id": "http://example.org/person/2" }
    },
    "chapter2": {
      "regular": { "@id": "http://example.org/person/1" },
      "guest": { "@id": "http://example.org/person/3" }
    }
  }
}

If it does, how? If not, could that be a problem?

On a side, though related, note: while rewriting the grammar, I was somewhat surprised to realize that it is actually pretty strict. I was expecting something more à la GRDDL, as Niklas puts it, i.e. the possibility of having properties meant for internal use only, which would be lost during processing, combined with properties properly flagged as Linked Data, which would be preserved. I suppose this has been discussed in the past. Any pointers to relevant discussions or arguments?

@gkellogg
Owner

I like everything in the proposal but the last paragraph. Maintaining the annotation information through an @annotation property, I think, makes the information worse, not better. It has profound implications for other algorithms such as compact, flatten and frame, not to mention to/fromRDF.

I would rather see the annotations be removed from expansion, yielding a form similar to @niklasl's second example above. This also works best when trying to consume other JSON, such as microdata-JSON, Twitter and GitHub.

> Would the proposal cover cases where someone comes up with a "container" that is more than one level deep (similar to a multi-column index in a database)?

Yes, I think that can work too. Basically, when encountering a property in the expansion algorithm, before step 2.2.2 and after the last sentence of 2.2.1, add the following:

If _property_ has @container @annotation, expand this _value_ recursively using this algorithm, passing copies of the *active context* and *active property*.

Of course, this needs to consider two cases: the one where the top-level property is being consumed with properties left in the RHS, and the one where the LHS property is preserved with values promoted up to that property.
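The recursive step could be sketched roughly as below, applied to the multi-level example from @tidoust. This is only an illustration: the `depth` parameter is a hypothetical stand-in for however the processor would learn how many levels of annotation keys to skip, which the thread leaves open, and the function is not part of any JSON-LD API.

```python
from typing import Any


def flatten_annotation_levels(value: Any, depth: int) -> list:
    """Strip `depth` levels of annotation keys and collect the remaining values."""
    if depth == 0 or not isinstance(value, dict):
        # Reached the real values (or a non-object), so stop descending.
        return [value]
    collected = []
    for inner in value.values():
        # Ignore the key itself and recurse into what it points at.
        collected.extend(flatten_annotation_levels(inner, depth - 1))
    return collected


author = {
    "chapter1": {
        "regular": {"@id": "http://example.org/person/1"},
        "guest": {"@id": "http://example.org/person/2"},
    },
    "chapter2": {
        "regular": {"@id": "http://example.org/person/1"},
        "guest": {"@id": "http://example.org/person/3"},
    },
}
authors = flatten_annotation_levels(author, depth=2)
```

Note that this sketch does not de-duplicate: a node reachable under several keys (person/1 here) appears once per occurrence.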

In the case of microdata-JSON, with a structure like the following:

{
  "type": "http://schema.org/Person",
  "properties": {
    "name": "Gregg Kellogg"
  }
}

We could have a context applied such as the following:

{ "@context": {
  "@vocab": "http://schema.org/",
  "type": "@type",
  "properties": {"@container": "@annotation"}
}}

This would then expand to

[{
  "@type": "http://schema.org/Person",
  "http://schema.org/name": [{"@value": "Gregg Kellogg"}]
}]

If it's important to preserve such "annotation" properties, then I think they need to have meaning in the context of the Linked Data Graph. Perhaps the @container: @graph mechanism preserves this best.

@niklasl
Collaborator

Yes, I agree that the last paragraph doesn't paint a pretty picture at all. It occurred to me however, that we could use the same mechanism which works for @container: @language here instead. So given the original example, if it was expanded to:

{"@graph": [{
  "@id": "http://example.org/article",
  "http://schema.org/author": [
    {"@id": "http://example.org/person/1", "@annotation": "regular"},
    {"@id": "http://example.org/guest/cd24f329aa", "@annotation": "guest"}
  ]
}]}

then those annotation keys would be "out of the way". Still semantic noise, but they wouldn't distort the expanded form in any way. The upside is also that if the compaction mechanism treated such annotations just like it handles @language for literals (but for any kind of object), someone could actually generate or post-process an expanded form to add annotations (e.g. by picking from other values in the object, to get an "index value"). Combined with the example context it could produce the desired idiosyncratic mapping "for free".

I agree that this goes out of its way a bit, but given how #133 works (which we have resolved to do), it's at least an isomorphic design (and also isomorphic to the other mapping ideas, for id or generalized properties, that have come up).
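A minimal Python sketch of this idea, assuming the key of each map entry is simply copied onto its value as an `@annotation` member during expansion. The function name `expand_annotation_map` is made up for illustration and does not come from the spec or any library.

```python
def expand_annotation_map(annotation_map: dict) -> list:
    """Turn {key: node} into a list of nodes that each remember their key."""
    expanded = []
    for key, node in annotation_map.items():
        item = dict(node)           # copy, so the input is left untouched
        item["@annotation"] = key   # keep the key "out of the way" on the node
        expanded.append(item)
    return expanded


author = {
    "regular": {"@id": "http://example.org/person/1"},
    "guest": {"@id": "http://example.org/guest/cd24f329aa"},
}
expanded_authors = expand_annotation_map(author)
```

The result has the same shape as the `http://schema.org/author` array in the expanded example above.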

@niklasl
Collaborator

As for microdata JSON, I would also like to make it work. Not so much for the sake of microdata in and of itself, but since I have also seen its shape in other cases, where an object represents what I'd like to call a "property group". Consider this JSON:

{
  "@id": "http://example.org/book",
  "publishing": {
    "publisher": {"@id": "http://example.org/org/1"},
    "author": {"@id": "http://example.org/person/1"}
  },
  "description": {
    "type": "paperback",
    "size": "110mm x 178mm",
    "pageCount": "204"
  }
}

The publishing and description keys here are "meaningless" in the sense that their role is to group a bunch of properties together by some shared characteristic (i.e. this is "presentational noise" pushed into the data, unfortunately a somewhat common JSON (and XML) pattern as well). Microdata does the same, only that it groups all "proper" properties together under properties.

However, as Gregg notes, this shape is unfortunately an "inverse" of the shape in this proposal. In the proposal example (and in the issues it attempts to solve), the term (LHS) represents a real property and its object keys (RHS) are the "void" annotations. In this microdata/"property group" case, what is needed is to ignore the term and "fold in" the object as if its keys were actually terms of the current object.

Perhaps @id: @annotation could be made to work like this. (Though if this is required, perhaps it's better to be explicit and define e.g. @id: @fold or @id: @group.) In any case, I don't think @container is the right vehicle for the microdata case, since the term is meaningless, so it is its @id we should treat specially. So I suspect this is a separate proposal (and perhaps not as pressing for 1.0 as the other cases are).
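To make the "fold in" behaviour concrete, here is a small Python sketch under the assumption that the set of group terms is known up front (in a real design it would come from the context, e.g. via something like the speculative @id: @fold above). Both `fold_groups` and `group_terms` are hypothetical names.

```python
def fold_groups(node: dict, group_terms: set) -> dict:
    """Drop grouping keys and merge their members into the enclosing object."""
    folded = {}
    for key, value in node.items():
        if key in group_terms and isinstance(value, dict):
            # The term is meaningless: lift its members up one level.
            # On a key collision, the member folded in last wins.
            folded.update(fold_groups(value, group_terms))
        else:
            folded[key] = value
    return folded


book = {
    "@id": "http://example.org/book",
    "publishing": {
        "publisher": {"@id": "http://example.org/org/1"},
        "author": {"@id": "http://example.org/person/1"},
    },
    "description": {
        "type": "paperback",
        "size": "110mm x 178mm",
        "pageCount": "204",
    },
}
folded_book = fold_groups(book, {"publishing", "description"})
```

The same function applied with `{"properties"}` as the group terms would flatten the microdata-JSON shape discussed earlier.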

@lanthaler
Owner

RESOLVED: If '@container': '@annotation' is added to the JSON-LD Syntax, the feature MUST be round-trippable from .compact() to .expand() back to .compact()

RESOLVED: Add '@container': '@annotation' to the JSON-LD Syntax.

@lanthaler
Owner

What's the value space of @annotation in the body of a document? Is something like this allowed:

{
  "@context": {
    "author": "http://schema.org/author"
  },
  "@id": "http://example.org/article",
  "author": [
    {
      "@id": "http://example.org/person/1",
      "@annotation": "regular"
    },
    {
      "@id": "http://example.org/guest/cd24f329aa",
      "@annotation": {
        "role": "regular",
        "office": "XH13"
      }
    }
  ]
}

Or are just strings allowed? Even if this is allowed, such objects wouldn't be compacted by an annotation-container, I guess. I do see some value in having something survive expansion that doesn't map to an IRI... but well, you could easily mint a (temporary) IRI if you need to.

@niklasl
Collaborator

I think the value space should be string only. The @annotation value in (expanded) objects is only used in compaction (to provide the key in the map for a term with @container: @annotation). If @annotation is used elsewhere I think it should be ignored.
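One possible reading of this rule, sketched in Python: during compaction, only string-valued `@annotation` members are used as map keys, and anything else is left out of the map. The function name `compact_annotations`, the choice to return unkeyed items separately, and the use of a list per key are all assumptions made for this illustration, not behaviour defined in the thread.

```python
def compact_annotations(values: list) -> tuple:
    """Group nodes by their string @annotation; return (indexed, unindexed)."""
    indexed = {}
    unindexed = []
    for item in values:
        key = item.get("@annotation") if isinstance(item, dict) else None
        if isinstance(key, str):
            # The annotation becomes the map key and is removed from the node.
            node = {k: v for k, v in item.items() if k != "@annotation"}
            indexed.setdefault(key, []).append(node)
        else:
            # Missing or non-string annotations cannot serve as a key.
            unindexed.append(item)
    return indexed, unindexed


authors = [
    {"@id": "http://example.org/person/1", "@annotation": "regular"},
    {
        "@id": "http://example.org/guest/cd24f329aa",
        "@annotation": {"role": "regular", "office": "XH13"},
    },
]
indexed, unindexed = compact_annotations(authors)
```

With the document from the question above, only the first author ends up in the map; the second, whose annotation is an object, is returned untouched.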

@lanthaler
Owner
@niklasl
Collaborator

True, it must be kept during expansion.

@lanthaler lanthaler referenced this issue from a commit
@lanthaler lanthaler Add @annotation expansion test
This addresses #196.
3fa30af
@lanthaler lanthaler referenced this issue from a commit
@lanthaler lanthaler Only invoke language and annotation map expansion if the value is a JSON object

See Gregg's changes in 8c546b9.

This addresses #133 and #196.
5cb6ba2
@lanthaler lanthaler referenced this issue from a commit
@lanthaler lanthaler Update expansion algorithm
This addresses #185 as well as #203, #142 and #196.
d64c560
@msporny msporny referenced this issue from a commit
@msporny msporny Added Data Annotations to JSON-LD Syntax specification.
This partially addresses #196.
5015a51
@msporny
Owner

I added the basic language to support data annotations in JSON-LD. Having written the text, I think we should rename "@annotation" to "@index", as that's actually what's going on here... the developer is stating that the JSON Object is being used as an index, and that processing should continue deeper into the tree. I think the word 'index' will resonate more with developers than 'annotation'.

PROPOSAL: Change the "@annotation" keyword to "@index".

@lanthaler
Owner
@msporny
Owner

Alright, good point, I withdraw my proposal.

@lanthaler - It looks like we have a number of algorithms that now include this feature, is the algorithm work done now? If so, we should close this issue.

@lanthaler
Owner
@lanthaler
Owner

I've just updated the syntax spec and sent a notification to the mailing list. Unless I hear objections I will close this issue in 24 hours.

@lanthaler lanthaler closed this