Hyper-schema: Link subject/anchor #140

Closed
awwright opened this issue Nov 9, 2016 · 16 comments

Comments

@awwright
Member

awwright commented Nov 9, 2016

JSON Schema should be able to define the subject (anchor, or source) of a link relation.


Suppose I define a link right here (HTML):

<link rel="prev" href="139" />

We can express the information links convey in a format called N-Triples (a subset of the more general grammar Turtle). Since links by default have a subject of the current document, it would look like:

<https://github.com/json-schema-org/json-schema-spec/issues/140> iana:prev <https://github.com/json-schema-org/json-schema-spec/issues/139> .

But we might want to use JSON Schema to let a document describe links from some other document to a different one, that is, links whose subject is not <https://github.com/json-schema-org/json-schema-spec/issues/140>.

Well in that case we need to allow JSON Schema to define what that "subject"/"source"/"anchor" is.

@Relequestual
Member

Relequestual commented Nov 9, 2016

I don't understand the purpose of this at all. Could you expand with a practical use case?

@awwright
Member Author

Suppose I have a JSON document that specifies metadata about a blog post, or another image, or something. In that case, I don't want to specify a link from the current document, I want to specify a link about some third resource.

So I have a document like:

[ {
"post": 34,
"next": 35
}, {
"post": 35,
"next": 36
} ]

and I want to generate the following links:

<34> iana:next <35> .
<35> iana:next <36> .

With the HTTP Link header, you'd specify it like this:

Link: <35>;anchor="34";rel="next"
Link: <36>;anchor="35";rel="next"
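
As a rough illustration of the mapping described above, here is a minimal Python sketch that derives both serializations from the example document. The function names are hypothetical and not part of any specification; the sketch only assumes the RFC 5988 convention that the link target goes in angle brackets while "anchor" carries the context (the subject of the triple).

```python
import json

# Hypothetical instance document, mirroring the example above.
doc = json.loads('[{"post": 34, "next": 35}, {"post": 35, "next": 36}]')


def links_from(entries):
    """Yield (context, relation, target) for each entry in the document."""
    for entry in entries:
        yield str(entry["post"]), "next", str(entry["next"])


def as_triple(context, rel, target):
    # Subject is the link context, object is the link target.
    return f"<{context}> iana:{rel} <{target}> ."


def as_link_header(context, rel, target):
    # The target goes in angle brackets; "anchor" overrides the context.
    return f'Link: <{target}>;anchor="{context}";rel="{rel}"'


triples = [as_triple(*link) for link in links_from(doc)]
headers = [as_link_header(*link) for link in links_from(doc)]
```

The point of the sketch is that each link's subject comes from the "post" value rather than from the URI of the document that contains the entry.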

@handrews
Contributor

handrews commented Mar 25, 2017

@awwright @Relequestual @jdesrosiers this just clicked for me and I understand why it is essential. I endorse the "anchor" keyword (for compatibility with RFC 5988) over the "linkSource" keyword that was proposed in #61, but see further down for concerns over relative positioning.

The simple example is that without this feature, collections with the "collection" and "item" relations simply don't work. The key to understanding this is the following bit from the definition of "rel":

The relation to the target is interpreted as from the instance that the schema (or sub-schema) applies to, not any larger document that the instance may have been found in.

So if a link is attached to the sub-schema, then the context URI of the link (which is what is meant by "from") is the resource to which the sub-schema is attached, not the complete instance document (which corresponds to the complete schema document). Consider this minimal collection schema:

{
    "title": "Generic collection with id references",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "id": {"type": "integer", "minimum": 1}
        },
        "links": [
            {
                "rel": "item",
                "href": "/stuff/element/{id}"
            }
        ]
    },
    "links": [
        {
            "rel": "self",
            "href": "/stuff"
        }
    ]
}

With this representation, retrieved from https://api.example.com/stuff:

[
    {"id": 1},
    {"id": 2}
]

The context URI of the collection's "self" link as applied to this instance is https://api.example.com/stuff#, because it is defined at the top level of the schema document.

However, the context URIs of the "item" link as applied to each of the two collection members are https://api.example.com/stuff#/0 and https://api.example.com/stuff#/1. That is what it means to interpret the relation to the target as being from the sub-schema.

But the "item" link is defined as relating a collection (the context) to one of its members (the target). However, those context URIs do not identify the collection instance. They each identify one array element. So we need a way to adjust the context URI to identify the collection, which would be https://api.example.com/stuff#
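
To make the default behaviour concrete, here is a small Python sketch that walks a schema alongside an instance and reports the context of each link. It is an illustration only: `collect_links` is a hypothetical helper, it handles just "links", "properties", and the single-schema form of "items", and it assumes (as this discussion does) that JSON Pointer fragments identify positions in the instance.

```python
def collect_links(schema, instance, pointer=""):
    """Yield (rel, context_pointer) for every link that applies.

    The context of a link is the instance location that the enclosing
    (sub-)schema applies to. JSON Pointer escaping is not handled here.
    """
    for ldo in schema.get("links", []):
        yield ldo["rel"], pointer
    if isinstance(instance, list) and isinstance(schema.get("items"), dict):
        for index, element in enumerate(instance):
            yield from collect_links(schema["items"], element, f"{pointer}/{index}")
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                yield from collect_links(subschema, instance[name], f"{pointer}/{name}")


schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"id": {"type": "integer", "minimum": 1}},
        "links": [{"rel": "item", "href": "/stuff/element/{id}"}],
    },
    "links": [{"rel": "self", "href": "/stuff"}],
}
instance = [{"id": 1}, {"id": 2}]
base = "https://api.example.com/stuff"

contexts = [(rel, f"{base}#{ptr}") for rel, ptr in collect_links(schema, instance)]
```

Here the "self" link gets the whole collection as its context, while each "item" link gets an individual array element, which is the mismatch described above.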

Why not just use the whole instance as context?

In other situations this turns out to be critically important. If you have a collection of books, and in addition to the "item" link for each element of the collection array, you also define an "author" link, the context URI should be for the book, which is the individual array element.

Similarly, each element in the collection array could have a "self" link, and once again the context should be the array item. The "self" link for the collection is defined at the top level.

You can also get more complicated with nested collection representations. So an "item" link from an inner collection element should have the inner collection as its context, not the collection described by the full schema document.

OK now what?

"anchor", if it follows RFC 5988 conventions, takes a URI reference. That's great if you know the absolute location of something, but what if you only know the relative location?

For instance, you may define a schema with an items link in "definitions", use it as a top-level collection representation in one place, but a nested collection representation in another. The only way I know to solve this is with a Relative JSON Pointer, as proposed for "linkSource" in #61.

I don't think that giving the re-used schema an "$id" would help, because it doesn't uniquely identify a position in the instance. The main schema might even re-use the re-usable sub-schema multiple times in the same representation (no idea why, but it's legal), and that would definitely be ambiguous.

Perhaps we support two keywords:

  • "anchor" takes a URI reference and is analogous to "anchor" in other link serialization formats
  • "anchorPointer" takes a Relative JSON Pointer

(there is no way to do Relative JSON Pointers as URI fragments, which I can explain if anyone doesn't follow)

@jdesrosiers
Member

I think anchor is a great idea. It seems like the kind of thing that I never knew I needed/wanted until I started thinking in those terms. We'll see if it actually works out that way :-).

The issues with "collection"/"item" might be why draft-04 went with "instances"/"full" instead.

@handrews I was with you up until "OK now what?". Can you elaborate on the purpose of anchorPointer?

@handrews
Contributor

:-)

Yeah "anchorPointer" is a weird case, and if there's agreement on the general approach I would be fine with adding "anchor" while continuing to debate "anchorPointer". The main point of contention around "anchorPointer" is that some people really hate Relative JSON Pointer (but notably, have never come up with a better solution).

Both "anchor"'s URI reference and "anchorPointer"'s relative pointer are resolved against the instance, not the schema. The context of the link is the instance.

This means that if a link is deeply nested in an array item, and needs to set a less-deeply-nested object that is still within the same array item as the context, then there is no URI reference that can express that, as they all start from the root of the instance document. Plus, if you re-use a schema that needs to go up a level within the re-used unit, you can't know how the subschema is being re-used so you can't write a URI reference that will always work.

Here's a silly example, which is not even attempting to be a coherent design of anything, just to illustrate what happens:

The "1" is a Relative JSON Pointer, indicating the next enclosing object/array up (like ".." in paths).

{
    "definitions": {
        "foo": {
            "properties": {
                "outer": {
                    "properties": {
                        "inner": {
                            "links": [{"anchorPointer": "1"}]
                        }   
                    }   
                }   
            }   
        },  
        "bar": {"properties": {"x": {"$ref": "#/definitions/foo"}}},
        "baz": {"items": { "$ref": "#/definitions/foo" }}
    }
}

So, if you just had a JSON instance that matched foo on its own:

{"outer": {"inner": null}}

Then "anchorPointer": "1" could be replaced with "anchor": "#/outer"

But with a bar instance:

{"x": {"outer": {"inner": null}}}

you would need to replace it with "anchor": "#/x/outer"

Finally, with a baz instance:

[{"outer": {"inner": null}}, {"outer": {"inner": null}}]

you can't write this one properly with "anchor" at all, because in the first one it would have to be "anchor": "#/0/outer", but it would be "anchor": "#/1/outer" for the second.
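
The three cases above can be checked with a minimal Python sketch of Relative JSON Pointer resolution. This is an assumption-laden illustration, not an implementation of any proposed keyword: `resolve_relative` is a hypothetical helper, it takes the link's default context as a JSON Pointer into the instance, and it supports only the "integer followed by an optional JSON Pointer" form.

```python
def resolve_relative(current, relative):
    """Resolve a Relative JSON Pointer against a JSON Pointer location.

    `current` is the instance location the link is attached to, and
    `relative` is a value such as "1". The "#" suffix form (which asks
    for a key or index rather than a value) is not handled.
    """
    digits = ""
    for ch in relative:
        if ch not in "0123456789":
            break
        digits += ch
    if not digits:
        raise ValueError("relative pointer must start with an integer")
    suffix = relative[len(digits):]
    if suffix.startswith("#"):
        raise ValueError("the '#' form is not supported in this sketch")
    levels = int(digits)
    tokens = current.split("/")[1:] if current else []
    if levels > len(tokens):
        raise ValueError("cannot go above the document root")
    kept = tokens[: len(tokens) - levels]
    return "".join("/" + token for token in kept) + suffix


foo_context = resolve_relative("/outer/inner", "1")
bar_context = resolve_relative("/x/outer/inner", "1")
baz_contexts = [resolve_relative(f"/{i}/outer/inner", "1") for i in range(2)]
```

The same relative pointer "1" yields a different absolute location in each reuse of "foo", which is why no single "anchor" URI reference can stand in for it in the "baz" case.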

@handrews
Contributor

As far as "instances" and "full": All of that dates back to at least 2009 (in Draft 00). The collection and item link relations were registered in 2012. My guess is that either no one realized the duplication, or no one ever got around to resolving it until now.

That said, "instances" is not a direct analogue: the context of "instances" was the schema (same for "create"), which was confusing and (I'm fairly sure) unnecessary if we define the specific usage of "collection" and "item" for application/schema+json.

@dlax
Member

dlax commented Apr 6, 2017

@handrews If I understand the "anchor" proposal correctly, based on your first example at #140 (comment), one should add an "anchor" within the inner "rel": "item" link as:

{
    "title": "Generic collection with id references",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "id": {"type": "integer", "minimum": 1}
        },
        "links": [
            {
                "rel": "item",
                "href": "/stuff/element/{id}",
                "anchor": "#"
            }
        ]
    },
    "links": [
        {
            "rel": "self",
            "href": "/stuff"
        }
    ]
}

is that correct?
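
For illustration, here is a minimal Python sketch of what the links produced by this schema might look like once "anchor": "#" is applied. Everything here is an assumption for the sake of the example: `item_links` is a hypothetical helper, the "{id}" expansion is a stand-in for real URI Template processing, and "#" is treated as a same-document reference that replaces the fragment of the default context.

```python
from urllib.parse import urldefrag, urljoin


def item_links(instance_uri, instance):
    """Build the "item" links for each array element, with context "#"."""
    links = []
    for index, element in enumerate(instance):
        # Default context: the array element the sub-schema applies to.
        default_context = f"{instance_uri}#/{index}"
        # "anchor": "#" keeps the resource and swaps in the empty fragment.
        context = urldefrag(default_context).url + "#"
        # Minimal stand-in for URI Template expansion of "{id}".
        href = "/stuff/element/{id}".format(id=element["id"])
        links.append(
            {"context": context, "rel": "item", "target": urljoin(instance_uri, href)}
        )
    return links


links = item_links("https://api.example.com/stuff", [{"id": 1}, {"id": 2}])
```

With the anchor applied, both "item" links share the collection as their context and differ only in their targets.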

@handrews handrews added this to the draft-07 (wright-*-02) milestone May 16, 2017
@handrews
Contributor

@dlax yes, that is correct! I am going to write up a PR for the "anchor" proposal since no one seriously objected to it.

As for "anchorPointer", it may not be necessary now that I have spent more time thinking about this. I keep forgetting about the ability to declare plain-name fragment ids. So instead of needing relative JSON Pointers when you are unsure of the absolute position, the same effect could be created by naming the subschema that you want to use as an anchor and referencing it with a regular old URI.

I think I'd like to try writing that up as part of the "anchor" PR and see how that feels. If that is accepted, then maybe we will get feedback on whether "anchorPointer" is needed. If it is, I will open a new issue for it based on the new information.

@handrews
Contributor

Hmm... now I'm realizing my point about naming subschemas isn't as useful as I thought.

Writing up a PR, I was quickly reminded that the context URI is an instance URI, so naming subschemas is irrelevant.

Per the "rel" spec (hmm... need to fix those dangling prepositions, and align this with the "context" terminology from RFC 5988):

The relation to the target is interpreted as from the instance that
the schema (or sub-schema) applies to, not any larger document that
the instance may have been found in.

There's another implication here:

Using JSON Pointer-style URI fragments for "anchor" at all is dubious, as application/json does not technically support such fragments. While "anchor" still has uses (as discussed by @awwright earlier), the traditional URI Reference approach of "anchor" to change the context within the document does not work because JSON has no fragment addressing defined.

"anchorPointer" would in this sense be easier and more flexible, as it is not a URI Reference, so its resolution rules can be specified within JSON Hyper-Schema.

@dlax
Member

dlax commented Aug 19, 2017

@handrews Can't we say that the anchor's value must be a URI reference to a (sub)schema, and that it has the effect of overriding the context with the instance that the referenced (sub)schema applies to?

@handrews
Contributor

@dlax I did consider that and worried that it was too convoluted. Although it makes perfect sense to me (however, experience tells me that this is not an effective metric for whether it makes sense to anyone else ;-)

More seriously, "anchor" has existing semantics in RFC 5988 and I would prefer to conform with that. So while I like this idea, it needs its own JSON Hyper-Schema-specific keyword. I actually halfway wrote it up as "anchorSchema" before deleting it in favor of what I did post.

One reason that I did not post "anchorSchema" was that its behavior can easily be ambiguous. For instance, if you point to a subschema value of "items" (the single subschema syntax, not the array syntax), it's unclear which item in the instance should be used as the context. There may be some way to disambiguate it enough to be useful, but it's not immediately obvious to me.

Also, the value could be a URI Reference, but it would only make sense when referencing another part of the schema that is currently being used to validate the instance. I would not want to allow "anchorSchema" to kick off another validation. So that's another complexity to sort out, I think.

@dlax
Member

dlax commented Aug 20, 2017

I did consider that and worried that it was too convoluted. Although it makes perfect sense to me (however, experience tells me that this is not an effective metric for whether it makes sense to anyone else ;-)
More seriously, "anchor" has existing semantics in RFC 5988 and I would prefer to conform with that.

I find this quite symmetrical with how Hyper-Schema's LDOs currently define the context, which already deviates from RFC 5988 because the link is not on the instance. Accordingly, I don't think it would be more convoluted to assume that anything related to the link's context (here, the anchor) should be considered through the indirection of the instance's schema.

Anyways, I'm trying to figure out limitations of this "simple" approach to get convinced that something more complex is needed (e.g. #351). For relative references to subschemas within the same schema (the case of rel="item" link), I see no issue. But clearly, this is different when one wants to refer to a fragment of another schema document; maybe this is what you meant to say in your last paragraph at #140 (comment), @handrews.

@handrews
Contributor

@dlax we do not yet use any keywords that are also used in RFC 5988 in completely different ways. The most different is "href" being a template. But really an LDO only becomes an actual link when the template is resolved, and at that point "href", "rel", "mediaType", and "title" have the same semantics as in RFC 5988.

Since JSON Schema does not control the exact media type of the instance, it is not always possible to calculate a context URI that your definition would produce. There may be no way to associate a URI fragment with the position in the instance that the designated schema matches. If we were able to do that, it would map down into RFC 5988 semantics and I might have been persuadable.

I still don't see how referencing another schema produces unambiguous instance locations. A given sub-schema may validate many points within the instance. It may not always be clear which is intended or desirable.

@handrews
Contributor

I still don't see how referencing another schema produces unambiguous instance locations. A given sub-schema may validate many points within the instance. It may not always be clear which is intended or desirable.

@dlax I didn't really express this well earlier. We may be able to come up with rules that make this unambiguous, in which case I would really like this idea (independent of the keyword we use for it).

This reminds me a bit of the debate on how powerful relative pointers need to be (e.g. do you need to be able to move laterally in a parent array, which the current Relative JSON Pointer spec does not support). I should try to find that discussion, as I think we managed to prove that some of the use cases really had no application. A similar approach could work here.

How would you feel about splitting this off into a new issue, and letting the current "anchor" PR resolve this one? I doubt anyone else will be willing to get involved in an issue this long, and I think there's a really interesting discussion to be had with this idea.

@dlax
Member

dlax commented Aug 21, 2017

How would you feel about splitting this off into a new issue, and letting the current "anchor" PR resolve this one? I doubt anyone else will be willing to get involved in an issue this long, and I think there's a really interesting discussion to be had with this idea.

@handrews That's fine by me.

@handrews
Contributor

handrews commented Sep 2, 2017

"anchor" added in #352, everything else tracked in #381 and #382.

@handrews handrews closed this as completed Sep 2, 2017
@ghost ghost removed the Priority: Critical label Sep 2, 2017