This repository has been archived by the owner on Mar 19, 2019. It is now read-only.

Proposal for next draft: Remove 'id' keyword #77

Closed
gazpachoking opened this issue Feb 20, 2013 · 11 comments

Comments

@gazpachoking

So, I propose that the "id" keyword be removed from the next draft of JSON Schema. Caveat: I have not implemented many schemas, so maybe I am missing some areas where it is extremely beneficial.

My objections:
Inline dereferencing:

  • inline dereferencing is optional. It does not do anyone any favors to have schemas that are not portable between implementations
  • it is not trivial to scan a json schema and identify which "id" keys are actually keywords, for example:
{
    "x": {
        "id": "it is impossible to know if this is an id keyword or not"
    }
}
{
    "dependencies": {
        "id": "this is not an id keyword"
    }
}
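
A minimal sketch of the ambiguity, assuming a naive scanner that simply reports every key named "id" (the function name `find_id_keys` is hypothetical, not taken from any implementation). It flags both examples above, even though the second is a dependency name and the first sits at a location the scanner knows nothing about:

```python
def find_id_keys(node, path=""):
    """Yield the JSON Pointer of every key named "id", wherever it occurs."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "id":
                yield child
            yield from find_id_keys(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_id_keys(value, f"{path}/{index}")


unknown_keyword = {"x": {"id": "it is impossible to know if this is an id keyword or not"}}
dependency_name = {"dependencies": {"id": "this is not an id keyword"}}

print(list(find_id_keys(unknown_keyword)))  # ['/x/id']
print(list(find_id_keys(dependency_name)))  # ['/dependencies/id']
```

A scanner like this cannot tell keywords from ordinary member names; it needs to know where schemas are allowed to appear.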

Canonical dereferencing:

  • due to the id keyword, it is possible to have $refs that are in a different scope, depending on how they are referenced. consider:
{
    "id": "base",
    "items": {
        "id": "anitem",
        "anyOf": [{"$ref": "#/x"}]
    },
    "properties": {
        "a": {"$ref": "#/items/anyOf/0"}
    },
    "x": {"type": "integer"}
}

the $ref within items has a different scope depending on if we are evaluating it from the items keyword (anitem#/x) or the properties keyword (base#/x). JSON reference does not have this problem, as the scope is always the uri of the document.
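
The two scopes can be reproduced with plain RFC 3986 resolution. This is only a sketch: the retrieval URI `http://example.com/` is an assumption added so that the relative ids have something to resolve against.

```python
from urllib.parse import urljoin

RETRIEVAL_URI = "http://example.com/"  # assumed; not part of the example above

root_scope = urljoin(RETRIEVAL_URI, "base")   # scope set by "id": "base"
items_scope = urljoin(root_scope, "anitem")   # scope set by "id": "anitem"

# The same "$ref": "#/x", resolved in each scope.
via_items = urljoin(items_scope, "#/x")
via_properties = urljoin(root_scope, "#/x")

print(via_items)       # http://example.com/anitem#/x
print(via_properties)  # http://example.com/base#/x
```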

Maybe I just need to see some examples where id is highly beneficial, but it seems to me that wherever you might want to use the id keyword, you could just as easily $ref the external schema. Simple inline referencing to fragment ids might be helpful in some situations, but scanning an arbitrary json object to identify them is not. And given the definitions keyword, it is much more helpful to use json pointer references to subschemas stored within that keyword. I just do not see the benefit of id outweighing the difficulty of implementing it, especially given that support may not be consistent across implementations.

@geraintluff
Member

*sigh*

All good points.

The main benefit as I see it is that it allows you to "bundle" multiple schemas into a single document. In bandwidth-limited scenarios, the time and bandwidth overhead of a second request can be non-trivial, especially if you have a host of small schemas you are referencing.

However, given all the fuss that "id" has produced, removing it might well be a good move. People simply aren't using it properly. Small schemas written for low-bandwidth applications can be joined into a single document and referenced using #/definitions/..., which is not always ideal, but might be good enough.
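
A rough sketch of what that bundling looks like without "id": several small schemas live under "definitions" in one document and are reached with JSON Pointer fragments. The schema names and the `resolve_pointer` helper are hypothetical, and only same-document references are handled.

```python
from urllib.parse import unquote


def resolve_pointer(document, ref):
    """Resolve a same-document reference such as "#/definitions/name"."""
    if not ref.startswith("#"):
        raise ValueError("this sketch only handles same-document references")
    pointer = unquote(ref[1:])  # fragments may be percent-encoded
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("not a JSON Pointer fragment")
    node = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 escapes
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


bundle = {
    "definitions": {
        "positiveInt": {"type": "integer", "minimum": 1},
        "name": {"type": "string"},
    },
    "properties": {
        "count": {"$ref": "#/definitions/positiveInt"},
        "label": {"$ref": "#/definitions/name"},
    },
}

print(resolve_pointer(bundle, bundle["properties"]["count"]["$ref"]))
```

One request fetches everything, at the cost of each bundled schema being addressed through the bundle's URI rather than its own.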

@fge: You probably never thought I'd say that. :p

@fge

fge commented Feb 20, 2013

@geraintluff Indeed I'd have never thought that :p

@gazpachoking I admit I am relieved that someone else came to the same conclusion as I did, independently -- so thank you for that ;)

I was just too upset to be able to be impartial on that matter anymore -- and I still am.

This has the consequence, of course that we need to think about a "clean room" addressing policy for schemas. JSON Reference should be the base, I think nobody disagrees on this; but we need to "anchor" the schemas one way or another. Most of all, I wish for what is decided to be completely unambiguous, that is no implementation can get it wrong, and it is fully testable.

I am still too upset to even come up with something at this moment though :p

@nickl-
Member

nickl- commented Feb 23, 2013

@gazpachoking thank you for your time and the valuable contribution of elaborate analysis you've documented here. Kudos!

I agree with you and precedents certainly exist for the use of an endpoint (physically referenceable or in identification only) to be a logical first choice.

Personally, I've been hoping to see relation types put to use; they seem equally suited to this, yet they remain absent from the current proposals.

@awwright

inline dereferencing is optional.

This isn't a problem with "id" per se, this is just poorly defined behavior.

Choosing to define a particular method of dereferencing, "inline dereferencing" especially, was a particularly bad decision. In practice, dereferencing is an implementation-dependent function that cannot be (re-)defined. And how implementations want to support looking for schemas, given a URI, is up to them. Do you consult a database? (I do this primarily.) Do you go over the network? (Hopefully not, though my schemas are usually available at their advertised URIs for populating a database with.) Do you scan and load into memory a list of files? (I do this too.)

This is good, because although implementations will normally wish to scan schemas for inline schemas to also use, this would not make any sense in my database, where "id" should already be parsed out.

it is not trivial to scan a json schema and identify which "id" keys are actually keywords, for example...

In the given example, /x is not somewhere you would expect to find a schema according to the meta-schema and the specification, therefore it's not an "id".

That is to say, the question isn't 'where do I find "id"s?' It is 'Where do I find schemas?' and this is very well-defined by the meta-schema, and the specification.
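
As a sketch of that idea, the walker below only descends through keywords whose values hold subschemas, and reports an "id" only when it sits directly inside a schema. The keyword lists are my reading of the draft-04 core and validation keywords (hyper-schema keywords are left out), and `find_schema_ids` is a hypothetical name.

```python
# Draft-04 keywords whose values hold subschemas (assumed list, hyper-schema omitted).
SINGLE = ("additionalItems", "additionalProperties", "not")
ARRAYS = ("allOf", "anyOf", "oneOf")
MAPS = ("properties", "patternProperties", "definitions")


def find_schema_ids(schema, path=""):
    """Yield (pointer, value) for each "id" that sits directly in a schema.

    Pointers are for display only; member names are not escaped.
    """
    if not isinstance(schema, dict):
        return  # e.g. "additionalProperties": false
    if isinstance(schema.get("id"), str):
        yield f"{path}/id", schema["id"]
    for keyword in SINGLE:
        if keyword in schema:
            yield from find_schema_ids(schema[keyword], f"{path}/{keyword}")
    items = schema.get("items")
    if isinstance(items, list):
        for index, sub in enumerate(items):
            yield from find_schema_ids(sub, f"{path}/items/{index}")
    elif items is not None:
        yield from find_schema_ids(items, f"{path}/items")
    for keyword in ARRAYS:
        for index, sub in enumerate(schema.get(keyword, [])):
            yield from find_schema_ids(sub, f"{path}/{keyword}/{index}")
    for keyword in MAPS:
        for name, sub in schema.get(keyword, {}).items():
            yield from find_schema_ids(sub, f"{path}/{keyword}/{name}")
    for name, dep in schema.get("dependencies", {}).items():
        if isinstance(dep, dict):  # array-of-strings dependencies are not schemas
            yield from find_schema_ids(dep, f"{path}/dependencies/{name}")


print(list(find_schema_ids({"x": {"id": "not at a schema location"}})))  # []
print(list(find_schema_ids({"id": "base", "items": {"id": "anitem"}})))
# [('/id', 'base'), ('/items/id', 'anitem')]
```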

due to the id keyword, it is possible to have $refs that are in a different scope, depending on how they are referenced. consider:

This is a highly regarded feature! The fact that URIs are resolved against the current document, and that documents are allowed to define their own URI, is a core functionality of Web technology. Any other behavior would definitely be a bug in implementing RFC 3986.

JSON reference does not have this problem

JSON Reference supports this behavior. The hang-up is that most JSON documents don't change their URI within the document. Nonetheless, the practice of changing a URI within a document, and the ability to embed documents inside other documents, is a common one, especially in the Web.

you could just as easily $ref the external schema

The proposal is not that we get rid of the URI, just that all URIs would have to be provided out-of-band. This doesn't seem very helpful, and normally it's expected that documents are allowed to define their own URI or base URI.

it is much more helpful to use json pointer references to subschemas stored within that keyword.

How would this mechanism work, exactly? What if you have multiple nested schemas?

Ideally (perhaps not currently), here's a few of the things you could use "id" for:

(1) Easily building recursive schemas. A schema author may want to write a schema with three nested schemas that serve different functions, and the innermost schema may itself either be an instance of the root schema, or the second-in schema, depending on the context. Without "id", you would need to break this up into three separate documents, and assign each of them an id out-of-band, even though you're using those ids within the document. It does not strike me as prudent that we define how to refer to another schema, but do not define any method of labeling them.

(2) Labeling individual parts of the document. "id" ideally allows for such constructs like being able to give every schema an "id":"#name" and refer to it with a URIRef (like the XML/HTML "id" attribute, and other web technologies... How do user-agents find these ids? Well you have to parse the document, normally against a schema.).

(3) Ability to carelessly embed schemas inside other schemas, to substitute a {$ref:"..."} expression for a {"id":"...", ...} expression, and have it carry the same meaning. (** This comes with an asterisk, that the schema URI must be preserved, by definition, and not merely the URI Reference (URIRef) if any, for the sake of parsing child URIRefs. This should be obvious, this behavior also is the same as in other Web technologies which use relative URIRefs. This can be done by inserting the absolute URI as the URIRef, or preserving the base URI by some other means.)
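
A minimal sketch of the substitution in (3), assuming schemas are held in a plain dict keyed by absolute URI (the `registry`, the `embed` helper and the example URIs are all hypothetical). Each known {"$ref": ...} is replaced by a copy of the target schema whose "id" is the absolute URI, so the embedded copy keeps its identity:

```python
import copy
from urllib.parse import urljoin


def embed(node, registry, base_uri):
    """Return a copy of node with known $refs replaced by id-labelled schemas.

    Simplification: walks every object and array, and does not expand
    references inside the embedded copies (which keeps recursive schemas finite).
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            absolute = urljoin(base_uri, ref)
            if absolute in registry:
                embedded = copy.deepcopy(registry[absolute])
                embedded["id"] = absolute  # absolute URI, not the relative reference
                return embedded
        return {key: embed(value, registry, base_uri) for key, value in node.items()}
    if isinstance(node, list):
        return [embed(value, registry, base_uri) for value in node]
    return node


registry = {"http://example.com/schemas/address": {"type": "object"}}
outer = {"type": "array", "items": {"$ref": "address"}}

print(embed(outer, registry, "http://example.com/schemas/address-list"))
```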

@fge

fge commented Feb 26, 2013

On Tue, Feb 26, 2013 at 9:41 AM, Acubed notifications@github.com wrote:

inline dereferencing is optional.

This isn't a problem with "id" per se, this is just poorly defined behavior.

No.

{
    "id": "foo",
    "subschema": { "id": "bar" }
}

I am sorry, but this is exactly why inline dereferencing had to be "created".

Choosing to define a particular method of dereferencing, "inline dereferencing" especially, was a particularly bad decision. In practice, dereferencing is an implementation-dependent function that cannot be (re-)defined. And how implementations want to support looking for schemas, given a URI, is up to them. Do you consult a database? (I do this primarily.) Do you go over the network? (Hopefully not, though my schemas are usually available at their advertised URIs for populating a database with.) Do you scan and load into memory a list of files? (I do this too.)

See? Plenty of possibilities. Nothing illegal. A mess. KISS!

[...]

due to the id keyword, it is possible to have $refs that are in a different scope, depending on how they are referenced. consider:

This is a highly regarded feature!

By whom? For what?

[...]

JSON reference does not have this problem

JSON Reference supports this behavior. The hang-up is that most JSON documents don't change their URI within the document. Nonetheless, the practice of changing a URI within a document, and the ability to embed documents inside other documents, is a common one, especially in the Web.

No. Remember that JSON References are always supposed to be resolved
against the URI of the document. The fact that "id" redefines URIs
to resolve against is an extension to JSON Reference.

you could just as easily $ref the external schema

The proposal is not that we get rid of the URI, just that all URIs would have to be provided out-of-band. This doesn't seem very helpful, and normally it's expected that documents are allowed to define their own URI or base URI.

it is much more helpful to use json pointer references to subschemas stored within that keyword.

How would this mechanism work, exactly? What if you have multiple nested schemas?

See "definitions".

Ideally (perhaps not currently), here's a few of the things you could use "id" for:

(1) Easily building recursive schemas. A schema author may want to write a schema with three nested schemas that serve different functions, and the innermost schema may itself either be an instance of the root schema, or the second-in schema, depending on the context. Without "id", you would need to break this up into three separate documents, and assign each of them an id out-of-band, even though you're using those ids within the document. It does not strike me as prudent that we define how to refer to another schema, but do not define any method of labeling them.

No. See "definitions".

(2) Labeling individual parts of the document. "id" ideally allows for such constructs like being able to give every schema an "id":"#name" and refer to it with a URIRef (like the XML/HTML "id" attribute, and other web technologies... How do user-agents find these ids? Well you have to parse the document, normally against a schema.).

See "definitions".

(3) Ability to carelessly embed schemas inside other schemas, to substitute a {$ref:"..."} expression for a {"id":"...", ...} expression, and have it carry the same meaning. (** This comes with an asterisk, that the schema URI must be preserved, by definition, and not merely the URI Reference (URIRef) if any, for the sake of parsing child URIRefs. This should be obvious, this behavior also is the same as in other Web technologies which use relative URIRefs. This can be done by inserting the absolute URI as the URIRef, or preserving the base URI by some other means.)

The same effect can be achieved if there is a simple mechanism to
anchor schemas to a given URI (to define its URI, in your own words).
So, "id" is not even needed there.

Francis Galiegue, fgaliegue@gmail.com
JSON Schema in Java: http://json-schema-validator.herokuapp.com

@awwright

awwright commented Mar 1, 2013

"definitions" cannot be a replacement for "id"; it is strictly complementary. Consider a post I just made to the mailing list:

JSON Hyper-schema allows me to give individual objects within an instance a URI. Consider a blog post stored in a document database:

// application/json; profile=http://example.com/blog
{ subject: "/2012/11/20/my-first-blog-post"
, author:"/users/awright"
, posted:"2012-11-20T23:40:22Z"
, body:"<p>Hello</p>"
}

I can say that subject has a rel=self, therefore giving that object a URI when a JSON Hyper-schema parser passes over it:

// application/schema+json; profile=http://json-schema.org/draft-04/hyper-schema#
{ id: "http://example.com/blog"
, links: [ {href:"{subject}", rel:"self"}, {href:"{author}", rel:"author"} ]
}

This is an extremely common use-case. Any time you send JSON documents over HTTP, in request or response, you are exchanging a resource, which often has a URI. Blog posts and other media frequently encode the URL of the resource, or otherwise have some other way to generate a "self" link from the available data. An entire JSON vocabulary, JSON-LD (currently being looked over to become a W3C Recommendation), also deals with exchanging URIs in this manner.

Within this blog post, perhaps I want to refer to the mere contents, the "body". So you have: /2012/11/20/my-first-blog-post#/body
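
To make the mechanics concrete, here is a rough sketch of how a parser could turn those "links" into URIs. It is not a real hyper-schema implementation: `expand_href` does plain `{name}` substitution without the percent-encoding a full URI Template processor would apply, and the base URI `http://example.com/` is an assumption.

```python
import re
from urllib.parse import urljoin


def expand_href(href, instance):
    """Substitute {name} with the instance's value (no percent-encoding)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(instance[m.group(1)]), href)


def links_for(instance, schema, base_uri):
    """Map each link relation to its absolute URI for this instance."""
    return {
        link["rel"]: urljoin(base_uri, expand_href(link["href"], instance))
        for link in schema.get("links", [])
    }


post = {
    "subject": "/2012/11/20/my-first-blog-post",
    "author": "/users/awright",
    "posted": "2012-11-20T23:40:22Z",
    "body": "<p>Hello</p>",
}
schema = {
    "id": "http://example.com/blog",
    "links": [
        {"href": "{subject}", "rel": "self"},
        {"href": "{author}", "rel": "author"},
    ],
}

links = links_for(post, schema, "http://example.com/")
print(links["self"])             # http://example.com/2012/11/20/my-first-blog-post
print(links["self"] + "#/body")  # URI of the post's body
```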

Now maybe I want to export a list of blog posts:

// application/json; profile=http://example.com/bloglist
[ {subject: "/2012/11/20/my-first-blog-post", author:"/users/awright", posted:"2012-11-20T23:40:22Z", body:"<p>Hello</p>"}
, {subject: "/2013/01/04/my-second-blog-post", author:"/users/awright", posted:"2013-01-04T16:14:50Z", body:"<p>Good afternoon</p>"}
]

I can embed the schema for {blog post} inside a schema:

// application/schema+json; profile=http://json-schema.org/draft-04/hyper-schema#
{type:"array", items:{$ref: "http://example.com/blog"}}

And with this, the blog posts retain their semantics, including their respective URIs. Now I want to refer to the same JSON instance as before, but this time all I have is this collection that I exported from my database. You still have: /2012/11/20/my-first-blog-post#/body. Same resource, same URI, as expected. Try doing this with JSON Reference alone. How the resource is encoded doesn't matter, the fact is that given the same URI, it always refers to the same resource. Being able to look up an arbitrary instance from a JSON instance (e.g. "Find me the second item in this JSON array") is but one use-case: Being able to say "This resource is xyz" is also extremely valuable to Web technologies. While HTTP is the quintessential protocol for returning an information resource, they can be embedded anywhere, for a variety of reasons.

All these schemas are stored somewhere, of course. Perhaps I want to define new content-types in addition to blog posts. So these schemas are stored in the database, like blog posts and users are. The meta-schema is also stored with this collection, to describe the collection of schemas.

Now suppose I want to export this entire database, to keep a backup, or maybe someone else wants to import the data into their own database, maybe a document database, or a relational database, or a resource database (e.g. make PUT requests to an HTTP server). We can describe this export with a schema:

// application/schema+json; profile=http://json-schema.org/draft-04/hyper-schema#
{ type: "object"
, properties:
  { "version": {type: "string", "maxLength": 64}
  , "posts": {type: "array", items: {$ref: "http://example.com/blog"}}
  , "users": {type: "array", items: {$ref: "http://example.com/user"}}
  , "types": {type: "array", items: {$ref: "http://json-schema.org/draft-04/hyper-schema#"}}
  }
}

I use $ref here, especially for brevity, but it may be a better idea to simply substitute the schema in, so we have all the schemas together in one coherent document. Maybe my blog post schema and other schemas also contain meta-data on how to map fields to relational database rows and tables, perhaps that's how I've been storing the data.

So now I've just exported my whole database by first reading a JSON Schema, then using that to read the database and generate an export. And any JSON Hyper-schema parser can read this one schema, and then consume my database export, and generate a list of URIs of JSON instances found within it that correspond to my blog post JSON instances, and import it back into the database, into a relational database (if it understands my custom vocabulary), or into a resource database.

But woe is me, while my blog posts and schemas are fully parsed, my schemas and meta-data for those users and blog posts, under this proposal, don't get the same privilege. There would no longer be a way to refer to them with a consistent, uniform URI.

The purpose of URIs is to decouple them from their physical location. Almost all formats with hyperlinking using URIs have methods of defining one's own URI. Even JSON Schema gives this option to instances with the rel=self relation, so excluding JSON Schemas makes no sense. Asking people to change their URI because of context, especially something as ubiquitous as a JSON Schema, is not an option.

@gazpachoking
Author

@ACubed Some of that was a bit over my head, I'm not too knowledgeable on the hyper-schema stuff, so forgive me if I missed something. I agree there should be a way to define the base uri of a document within the document, my gripe is when the id keyword is nested within a document somewhere. I didn't see that being done in your example, can you help me understand where that is needed in your example?

@awwright

awwright commented Mar 1, 2013

First confirming some definitions:

A base URI is the URI of a document, or at least the URI that other URI references are resolved against. By definition it does not have a fragment part (the fragment part of a base URI is never preserved, resolving the blank URI <> against another URI effectively strips the fragment from that URI).

A URI reference is a relative or absolute URI, to be resolved against the base URI. Simply saying URI is considered to be absolute and already resolved against the base URI.
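
These two definitions can be checked with Python's standard URI functions. This is just an illustration; the URIs are made-up examples.

```python
from urllib.parse import urldefrag, urljoin

document_uri = "http://example.com/blog#/body"

# A base URI never carries a fragment.
base_uri, fragment = urldefrag(document_uri)
print(base_uri)  # http://example.com/blog
print(fragment)  # /body

# Resolving a relative reference ignores any fragment on the base.
print(urljoin(document_uri, "users/awright"))    # http://example.com/users/awright

# A fragment-only reference keeps the base and swaps in the new fragment.
print(urljoin(base_uri, "#/definitions/post"))   # http://example.com/blog#/definitions/post
```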

Some implementations let you define a base URI without defining the URI of the document/information resource, and/or let you define the URI of the document without setting the base URI that URI references are resolved against. Generally you want to do both.

The important thing to know about JSON Hyper-schema is that you can define relationships between data. For instance, "This blog post has an author identified by this URI". You can also say "this object is identified by this URI". You can probably imagine going through an array of blog posts, seeing that they have a certain URI, and requesting that URI from an HTTP server (if it's an HTTP URL), and getting back that same object.

The same thing goes for JSON Schema. It's especially important for JSON Schema, schemas of which are often re-used and re-embedded in multiple places. I keep a database just like I described, the same schema will be found embedded inside a schema, or available by itself.

Consider a schema describing a blog post, and a schema that describes an array of blog posts (for example, listing posts on a front page). I want to ship a document describing the latter case, a schema of a list of posts. In this case, downloading the additional schema defining the blog post can't be done (downloading may not be an option), and the proposed method of embedding the schema doesn't make sense -- the URI of the schema that identifies blog posts would necessarily change. This is a problem since, among other things, these posts would be served with Content-Type: application/json; profile=http://example.com/blog, not with Content-Type: application/json; profile=http://example.com/blog-list#/definitions/blog.

@seagreen

seagreen commented Oct 9, 2015

I wonder how many JSON schema implementations currently implement "id" correctly? Until this gets finished: json-schema-org/JSON-Schema-Test-Suite#20 (if ever) we may have to compile the list by hand, but it still wouldn't take too long.

@sgpinkus

@ACubed tl;dr - most of it. Fact is, there is a clear problem with id, but I think in general people get caught up in false dichotomies.

👎 for removing the id keyword
👍 for renaming the id keyword, "$id", to avoid conflicts @gazpachoking mentioned in first post.
👍 for deprecating using the id keyword for base URI resolution. This is hideous.
👍 for 2 types of references: JSON Pointers (#/foo) and flat refs ("#foo") that ref an id (or rather an "$id").
👍 For there being one non optional simplified type of dereferencing - not "canonical" and "inline".
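
A sketch of what that single dereferencing step might look like, under the assumptions in the list above: "$id" only labels a subschema with a flat name such as "#positive", and never changes the base URI. The helper names are hypothetical, and only same-document references are handled.

```python
def collect_named(node, found=None):
    """Index every subschema labelled with a flat "$id" such as "#name"."""
    found = {} if found is None else found
    if isinstance(node, dict):
        label = node.get("$id")
        if isinstance(label, str) and label.startswith("#") and not label.startswith("#/"):
            found[label[1:]] = node
        for value in node.values():
            collect_named(value, found)
    elif isinstance(node, list):
        for value in node:
            collect_named(value, found)
    return found


def resolve(document, ref):
    """Resolve "#", "#/json/pointer" or "#name" within one document."""
    fragment = ref[1:]
    if fragment == "":
        return document
    if fragment.startswith("/"):  # JSON Pointer form
        node = document
        for token in fragment.split("/")[1:]:
            token = token.replace("~1", "/").replace("~0", "~")
            node = node[int(token)] if isinstance(node, list) else node[token]
        return node
    return collect_named(document)[fragment]  # flat-name form


doc = {"definitions": {"positive": {"$id": "#positive", "type": "integer", "minimum": 1}}}

# Both reference styles reach the same subschema.
print(resolve(doc, "#/definitions/positive") is resolve(doc, "#positive"))  # True
```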

@fge analyses this well in https://github.com/json-schema/json-schema/wiki/The-%22id%22-conundrum#how-to-fix-that. Completely agree with that.

As an aside, I would say it's worth looking at how XSD does internal and external refs (much more restrictive than is proposed in json schema v4), and how HTML does base URI adjustment (with a different keyword -- and who actually uses base in HTML anyway...)

@handrews

handrews commented Oct 23, 2016

@gazpachoking a number of things about id and $ref have been tidied up in the latest internet draft. Spec work for that is going on at https://github.com/json-schema-org/json-schema-spec
Could you please close this, and if you still have concerns with the new wording, open issues for those at the new repo?

https://tools.ietf.org/html/draft-wright-json-schema-00
https://tools.ietf.org/html/draft-wright-json-schema-validation-00
https://tools.ietf.org/html/draft-wright-json-schema-hyperschema-00
