using/defining the "schema" link relation type #522

Open
dret opened this issue Dec 19, 2017 · 28 comments

@dret
Contributor

dret commented Dec 19, 2017

it seems that -07 recommends using schema-typed links (as one method) to link an instance to a schema. this link relation is not registered (yet). this means either there should be some spec doing this, or maybe the JSON schema draft should do it. if the latter and you need some help, let me know.

@handrews
Contributor

@dret thanks! Yes, we were planning to register it as part of the JSON Schema draft. Or motivated by the JSON Schema draft, or whatever the appropriate mental model is. But I have no idea what approach is likely to be successful, and any advice would be appreciated.

Should we try to register it now, or is that something that must wait until we're at RFC or at least adopted by a working group?

@handrews
Contributor

I almost put in an extension link relation like tag:json-schema.org,19-11-2017:schema or something, but that seemed more likely to confuse people, and then folks would have to change it again if/when we got the link relation registered.

@dret
Contributor Author

dret commented Dec 19, 2017 via email

@handrews
Contributor

definitely never ever put identifiers in specs and then change them. that’s the X- disaster all over again...

Hah! Yeah, makes sense. We only changed from "profile" to "schema" due to the discussions here that "profile" really wasn't quite right. Given that "schema" is a pretty common concept applying to many media types, it seems like a good candidate for link relation / media type parameter.

I did just see dret/I-D#92 about maybe using HTTP Prefer or some other proposal instead / in addition to media type parameters for "profile", so I'd be interested in considering similar options for "schema" if that makes sense.

I see "profile" and "schema" as similar, but while "profile" identifies a broadly applicable subset, "schema" identifies a set of values suitable for a much more specific purpose. It can also identify something broad, but the ability to be specific is what (I think) distinguishes it.

@handrews handrews added the core label Dec 19, 2017
@dret
Contributor Author

dret commented Dec 20, 2017 via email

@dret
Contributor Author

dret commented Dec 20, 2017 via email

@handrews
Contributor

a schema sets the basics in how to represent concepts

This seems like a good key distinction: the schema is about representation. You may have different valid and useful representations of an abstract type, which might clarify the distinction between "type" (registered as describing the "abstract semantic type") and "schema" (describing a particular way to represent that type in a data model, which may then be encoded in one or more media types).

So you could potentially negotiate both the schema (describing the data model for a representation) and media type (the concrete encoding of that data model into a JSON/YAML/XML/whatever document).

@dret
Contributor Author

dret commented Dec 30, 2017 via email

@handrews
Contributor

schemas are defining models and their representation, let's say an XML
model in terms of how it is represented in XML concepts. that tells you
everything you need to know in terms of how this will look on the wire.
same for JSON schemas.

JSON Schemas work on a data model that is derived from but not specific to JSON. This is why I view the schema and media type as separate. JSON Schema keeps them separate, and the expectation is that people may use JSON, JSON5, YAML, TOML, CBOR, Protobuf, whatever with them. Some of those are a better fit than others, but the extensibility of JSON Schema allows for describing concepts more precise than can be directly expressed in the data model. You could probably make it work for XML to some degree, but I can't imagine why anyone would.

So the data model and media type are not entirely separate, but they are definitely not identical. Any definition of "schema" that requires them to be would be problematic for JSON Schema.

@dret
Contributor Author

dret commented Dec 30, 2017 via email

@handrews
Contributor

handrews commented Dec 30, 2017

Thanks, @dret this discussion is very helpful.

It's definitely occurred to me that "schema" is not necessarily the best name for JSON Schema, particularly given that people use it for so many things including data model definitions (code generation, doc generation, ui generation) to the extent that we are looking at adding vocabularies specifically for those purposes.

Then again, the project's been called JSON Schema for many years, and changing the name would likely kill any momentum of the larger ecosystem, so... ¯\_(ツ)_/¯

@awwright
Member

awwright commented Jan 1, 2018

I was only casually paying attention, but I wasn't actually aware the specification changed the link relation we were using. No existing link relation was sufficient?

  • "profile"
  • "describedBy"
  • "type"

For example

Link: <http://example.org/Person.schema.json>; rel="type"; type="application/schema+json"

https://tools.ietf.org/html/rfc5988 (HTTP Link header, including description of the "type" attribute)
https://tools.ietf.org/html/rfc6906 (rel="profile")
https://tools.ietf.org/html/rfc6903 (various relation types including rel="type")
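
To make the mechanics concrete, here is a minimal Python sketch of how a client might pull the targets for a given relation type out of a Link header value like the one above. The function names `parse_link_header` and `targets_for_rel` are made up for illustration, and the parsing is deliberately simplified: it does not cover the full RFC 5988 grammar (escaped quotes, for instance).

```python
import re

# One link-value: <target> followed by zero or more ;param=value pairs.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)'
)
_PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Parse a Link header value into a list of (target, params) tuples.

    Simplified sketch: handles quoted and unquoted parameter values only.
    """
    links = []
    for target, raw_params in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted if quoted else bare
        links.append((target, params))
    return links


def targets_for_rel(value, rel):
    """Return the targets whose rel parameter contains the relation type."""
    return [
        target
        for target, params in parse_link_header(value)
        # rel may hold several space-separated relation types.
        if rel.lower() in params.get("rel", "").lower().split()
    ]


if __name__ == "__main__":
    header = (
        '<http://example.org/Person.schema.json>; rel="type"; '
        'type="application/schema+json"'
    )
    print(targets_for_rel(header, "type"))
```

The same helper works whichever relation type ends up being used, since the relation name is just a parameter.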

I know @dret suggested "profile" was incorrect for us but my reading of RFC6906 suggests we were using it exactly correctly. Perhaps I wasn't describing what we're intending to do very clearly?

@handrews
Contributor

handrews commented Jan 1, 2018

@awwright duuuuuuuuude....

  • Here is the PR on which I requested a review from you, tagged you in the issue comments, and held it open for an extra 2.5 weeks beyond the usual 2 week period specifically trying to get a review from you. By name.
  • Here is the issue in which you and I discussed the "profile" relationship extensively with @dret. You even assigned the issue to yourself when I posted that PR, and then didn't comment on it for over a month.

At some point, I can't keep waiting. I know I pinged you on email and slack because I know you are interested in this area. This is the 2nd time in the past month that someone has wanted to revisit something that I begged and pleaded for comments on. I don't know what to do when people won't reply in any meaningful way for over a month. I try direct email, @-mentions, mailing list, IRC, slack, anything and everything I can think of. I announced the final review period everywhere, often repeatedly, and it was open for a solid month. I did everything I could think of. In all seriousness, what am I missing? What do people need in order to make a timely review before publication? Or at least skim the change log, which lists this change?

Regarding "profile", if you want to keep arguing with @dret about it go ahead, but I found his reasoning quite clear and see no reason to revisit it. I don't know how you read him clearly stating that schemas are not profiles and come to a different conclusion.

You also cited HTML's "profile", which does seem much closer to what we would want, but the response to that is that just because HTML uses the same word, that does not mean that it is using it in the RFC 6906 sense, and you never replied to that concern (or made any further replies on the issue at all for the remaining ten months that it was open).

There is also the Accept-Profile proposal which is specifically using a different definition of profile than RFC 6906. So I suppose we could define our own "profile" definition, but since we're talking about a media type parameter and link relation, unlike with HTTP headers, that term is already in use. We can't just re-define it to agree with some other definition somewhere else that suits us better.

Regarding "type", as far as I can tell we both agreed that it was not quite right. It is specified to identify the "abstract semantic type", which is a more generic concept. If I identify something's "type" as "car", there may be numerous schemas that describe different ways of representing a car. In particular, as the representation evolves, new schema versions will be published and used, but the "abstract semantic type" remains the same.

So I see "type" as serving a clear purpose of abstracting away concrete representation details, while the relation we need is specifically about concrete representation.

As for "describedBy", it's still in the spec and still doing exactly what it was doing before. Although I'm not actually sure we should continue using it. AFAICT it is more often used for things like human-readable documentation, which would be useful alongside of schema links. We were also using it because "profile" is specifically an identifier and not a locator, so "describedBy" was to be the locator. But we could define things differently with our own media type parameter / link relation if we want. I'm definitely not dead-set on getting rid of it, but once we settle on the behavior we want from our own relation type we should make sure our usage still makes sense.

@awwright
Member

awwright commented Jan 2, 2018

@handrews Actually I do remember a lot of that now that you mention it.

Have we pinged @RubenVerborgh to get his take on how/if his usage is similar to ours & RFC 6906?

Edit: Yeah you did that too, nice.

Edit: My reading of rel="type" is that it's extremely generic:

The "type" link relation references the payload's abstract semantic type
[...]
If the context can be considered to be an instance of multiple semantic types, multiple "type" link relations can be used.

So while something more specific would be preferable, I still think it could be relevant.

@RubenVerborgh

@handrews
Contributor

handrews commented Jan 2, 2018

@awwright I think that "type" is relevant, just not sufficient. I would actually use it alongside of a more specific link relation, such that in an API that evolves (or even does coarse-grained versioning in the base URI) the resources would have a stable "type" but different schemas.

So I could use the same "type" values across the current kinda-REST-ish version of the API I'm working with, and also with a replacement fully RESTful API, even though the schemas would be very different. I'm not sure how that would be of use to clients off the top of my head, but it seems like a worthwhile distinction.
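
One way to picture this is the rough Python sketch below, in which two generations of an API advertise the same "type" target but different "schema" targets. The URIs and the `link_header` helper are invented for illustration, and "schema" is not a registered relation type at this point in the discussion.

```python
def link_header(type_uri, schema_uri):
    """Format a Link header value carrying both relation types.

    "type" names the stable abstract concept; "schema" names the
    concrete representation, which can change between API versions.
    """
    return '<{}>; rel="type", <{}>; rel="schema"'.format(type_uri, schema_uri)


# Hypothetical URIs: the type stays fixed while the schema changes.
PERSON_TYPE = "https://example.com/types/person"

old_api = link_header(PERSON_TYPE, "https://example.com/schemas/person/1.0.0")
new_api = link_header(PERSON_TYPE, "https://example.com/schemas/person/2.0.0")

if __name__ == "__main__":
    print("Link:", old_api)
    print("Link:", new_api)
```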

I think we need to figure out what level of specificity we want. I have been going for something more specific, particularly to enable content negotiation. But I think that Accept-Profile is the most promising avenue for schema-based content negotiation, which makes me a bit less concerned. Still, we need a media type that will work for us in non-HTTP environments (e.g. will CoAP adopt Accept-Profile as well?)

@dret
Contributor Author

dret commented Jan 2, 2018 via email

@RubenVerborgh

@dret Will this updated version of the RFC explain how to achieve such a preference with existing means, or will it introduce a new header (as we're planning for https://github.com/profilenegotiation/I-D-Accept--Schema/)?

@dret
Contributor Author

dret commented Jan 5, 2018 via email

@handrews
Contributor

handrews commented Jan 5, 2018

@dret but to clarify, your "profile" preference would use your definition of "profile", which excludes schemas, so could not be used for JSON Schema, correct? So our options are:

  • Register a "schema" preference
  • Jump on the "Accept-Profile" bandwagon since it uses a broader definition of "profile" that includes schemas
  • Come up with something else (like our own media type with its own parameters such as a "schema" parameter)

Is this correct?

@dret
Contributor Author

dret commented Jan 5, 2018 via email

@dlax
Member

dlax commented Jan 6, 2018

@handrews

I think we need to figure out what level of specificity we want. I have been going for something more specific, particularly to enable content negotiation. But I think that Accept-Profile is the most promising avenue for schema-based content negotiation, which makes me a bit less concerned.

I don't quite get the "schema-based content negotiation" part. Can you clarify? In my understanding, an instance can only have one schema. This contrasts with profiles (in the sense of "application profiles" a la Dublin Core); an instance may be represented through several profiles.

@handrews
Contributor

handrews commented Jan 6, 2018

@dlax certainly!

What I'm thinking of is a REST API that evolves representations at the per-resource granularity, rather than doing coarse grained URI-based versioning.

In this view, resources (the abstract things on the server) are not versioned. If I have Person resources, the resource is always Person. Person is a concept, it does not change.

However, the representation of a Person in a given API will likely change. Fields are added, enum sets are changed, fields may even be removed in a compatible way if they were not required.

The versioning of the representation is expressed by assigning a new schema to each successive version (and the URIs for the schemas probably do have some sort of version in the URI: semantic versioning, a date-time stamp, whatever).

So let's think particularly of the case where the representations are all compatible with each other, and the schemas are designed to support that, meaning:

  • Schemas never set "additionalProperties": false so that fields can be added without causing validation to fail
  • Schemas avoid required as much as possible, so that fields can be removed without causing validation to fail

There are other constraints you can use on schema design, but these illustrate the point.
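
As a rough illustration, the two constraints above could be checked mechanically. The Python sketch below is only that, a sketch: `evolution_warnings` is a hypothetical helper, and it inspects only the top level of a schema, not nested subschemas.

```python
def evolution_warnings(schema):
    """Return warnings for keywords that make compatible evolution harder.

    Only the top level of the schema is inspected in this sketch.
    """
    warnings = []
    if schema.get("additionalProperties") is False:
        warnings.append(
            '"additionalProperties": false rejects newly added fields'
        )
    if schema.get("required"):
        warnings.append(
            "required fields cannot be removed compatibly: "
            + ", ".join(schema["required"])
        )
    return warnings


if __name__ == "__main__":
    strict = {
        "type": "object",
        "required": ["foo"],
        "additionalProperties": False,
    }
    for warning in evolution_warnings(strict):
        print(warning)
```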

So:

{
    "$id": "https://example.com/schemas/some-entity/1.0.0",
    "type": "object",
    "properties": {
        "foo": {"type": "integer"},
        "bar": {"type": "string"}
    }
}

{
    "$id": "https://example.com/schemas/some-entity/1.1.0",
    "type": "object",
    "properties": {
        "foo": {"type": "integer"},
        "stuff": {"type": "boolean"}
    }
}

The instance:

{
    "foo": 42,
    "bar": "hello",
    "stuff": false,
    "nonsense": null
}

validates against either the 1.0.0 or 1.1.0 schemas. If you're doing something with bar you need 1.0.0. If you're doing something with stuff you need 1.1.0. If you're doing something with foo either will work. And there's no guidance in either version on what nonsense should look like or how to use it.
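
To check that claim without pulling in a library, here is a hand-rolled Python sketch that understands only the "type" and "properties" keywords used in the two schemas. It is not a real JSON Schema implementation, and `validate` and `_type_ok` are names invented for this example.

```python
def _type_ok(value, type_name):
    """Check a value against one JSON Schema primitive type name (a string)."""
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)) or (
            isinstance(value, float) and value.is_integer()
        )
    if type_name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    checks = {
        "object": dict,
        "array": list,
        "string": str,
        "boolean": bool,
        "null": type(None),
    }
    return isinstance(value, checks[type_name])


def validate(instance, schema):
    """Validate against the "type" and "properties" keywords only."""
    if "type" in schema and not _type_ok(instance, schema["type"]):
        return False
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            # Absent properties pass: neither schema uses "required".
            if name in instance and not validate(instance[name], subschema):
                return False
    return True


SCHEMA_1_0_0 = {
    "$id": "https://example.com/schemas/some-entity/1.0.0",
    "type": "object",
    "properties": {"foo": {"type": "integer"}, "bar": {"type": "string"}},
}
SCHEMA_1_1_0 = {
    "$id": "https://example.com/schemas/some-entity/1.1.0",
    "type": "object",
    "properties": {"foo": {"type": "integer"}, "stuff": {"type": "boolean"}},
}
INSTANCE = {"foo": 42, "bar": "hello", "stuff": False, "nonsense": None}

if __name__ == "__main__":
    for schema in (SCHEMA_1_0_0, SCHEMA_1_1_0):
        print(schema["$id"], validate(INSTANCE, schema))
```

Properties that a schema does not mention, such as nonsense, are simply never looked at, which is why the instance passes both versions.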

So if I'm an API client using the "some-entity" resource, I may have started out while it was on version 1.0.0. I would use content negotiation (media type parameter, HTTP Prefer header preference, or a new Accept-Profile or Accept-Schema header) to ask for a representation matching schema version 1.0.0.

The response can come back with the instance above and link to both the 1.0.0 and 1.1.0 schemas, because it is valid according to both and therefore usable as either.

I can use this information to log that the resource has been updated, and then a human can decide whether to start asking for 1.1.0 (because stuff is needed) or stay with 1.0.0 (because bar is still needed).
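
A very rough Python sketch of that client-side logging step follows. It assumes the schema URIs have already been extracted from the response's links, and `check_response` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("schema-watch")


def unexpected_schemas(requested, linked):
    """Return linked schema URIs other than the one the client asked for."""
    return [uri for uri in linked if uri != requested]


def check_response(requested, linked):
    """Log when a representation also conforms to schemas we did not request.

    Returns the extra schema URIs so a human can decide whether to upgrade.
    """
    if requested not in linked:
        logger.warning("response does not link to requested schema %s", requested)
    extras = unexpected_schemas(requested, linked)
    for uri in extras:
        logger.info("resource also links to schema %s; review for upgrade", uri)
    return extras


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    check_response(
        "https://example.com/schemas/some-entity/1.0.0",
        [
            "https://example.com/schemas/some-entity/1.0.0",
            "https://example.com/schemas/some-entity/1.1.0",
        ],
    )
```

The code only reports; whether to move to the newer schema stays a human decision, as described above.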

I can go into more detail but I'll pause here to see if this is making sense.

@dret
Contributor Author

dret commented Jan 7, 2018 via email

@dret
Contributor Author

dret commented Jan 7, 2018 via email

@dlax
Member

dlax commented Jan 7, 2018

@handrews

I can go into more detail but I'll pause here to see if this is making sense.

This makes sense, thanks!

@dret

ever heard of DSDL? there almost by definition each instance has multiple schemas. it’s actually a very clever approach: modularize validation like any other non-trivial task, and allow schemas to be composed of multiple languages with each language having a specific focus.

Actually, I was more thinking about "schema as a data model" rather than "schema as a validation tool".
@handrews explained his intended usage of content negotiation for the former "definition" and I now see how a client may ask for a resource that follows a given data model.
So about multiple validation and DSDL, yes, a resource may have several complementary schemas, but it's not clear to me how negotiation would come into play as far as validation is concerned. Would a client ask for a resource that validates with a particular technology? Would that resource's representation be different if another technology had been asked? Maybe it's just irrelevant and only the data model (or profile) point of view matters for negotiation.

@dret
Contributor Author

dret commented Jan 7, 2018 via email

@garethsb
Contributor

garethsb commented Nov 6, 2019

https://json-schema.org/latest/json-schema-core.html#rfc.section.11.1 proposes:
Link: <https://example.com/my-hyper-schema#>; rel="describedby"

But then the final paragraphs of https://json-schema.org/latest/json-schema-core.html#rfc.section.11.2 have:
Link: </alice>;rel="schema", </bob>;rel="schema"

I don't understand why these use different link relations?

Sorry if I should have raised this as a new issue, @handrews!


6 participants