Proposal for next draft: Remove 'id' keyword #77
Comments
*sigh* All good points. The main benefit as I see it is that it allows you to "bundle" multiple schemas into a single document. In bandwidth-limited scenarios, the time and bandwidth overhead of a second request can be non-trivial, especially if you have a host of small schemas you are referencing. However, given all the fuss that "id" has produced, removing it might well be a good move. People simply aren't using it properly. Small schemas written for low-bandwidth applications can be joined into a single document and referenced using [...]. @fge: You probably never thought I'd say that. :p
@geraintluff Indeed, I'd never have thought that :p @gazpachoking I admit I am relieved that someone else came to the same conclusion as I did, independently, so thank you for that ;) I was just too upset to be impartial on that matter anymore, and I still am. This has the consequence, of course, that we need to think about a "clean room" addressing policy for schemas. JSON Reference should be the base, I think nobody disagrees on this; but we need to "anchor" the schemas one way or another. Most of all, I wish for whatever is decided to be completely unambiguous, that is, no implementation can get it wrong, and it is fully testable. I am still too upset to even come up with something at this moment though :p
@gazpachoking thank you for your time and the valuable, elaborate analysis you've documented here. Kudos! I agree with you, and precedents certainly exist for using an endpoint (physically referenceable or for identification only) as the logical first choice. Personally, I have been hoping to see relation types put to this purpose; they seem equally suited to it, yet they remain absent from current proposals.
This isn't a problem with "id" per se; this is just poorly defined behavior. Choosing to define a particular method of dereferencing, "inline dereferencing" especially, was a particularly bad decision. In practice, dereferencing is an implementation-dependent function that cannot be (re-)defined. And how implementations want to support looking for schemas, given a URI, is up to them. Do you consult a database? (I do this primarily.) Do you go over the network? (Hopefully not, though my schemas are usually available at their advertised URIs for populating a database with.) Do you scan and load into memory a list of files? (I do this too.) This is good, because although implementations will normally wish to scan schemas for inline schemas to also use, this would not make any sense in my database, where "id" should already be parsed out.
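To make the "implementation-dependent dereferencing" point concrete, here is a minimal Python sketch of a resolver that consults whatever sources an implementation chooses (a database, a set of loaded files) in order, and never goes to the network unless a loader for it is configured. `SchemaResolver`, the `Loader` type and the dict-backed "database" are hypothetical names for illustration, not part of any JSON Schema library.

```python
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag

# Hypothetical loader type: takes an absolute document URI (no fragment) and
# returns the parsed schema, or None if this source does not know that URI.
Loader = Callable[[str], Optional[dict]]


class SchemaResolver:
    """Looks up schemas by URI, consulting each configured source in order."""

    def __init__(self, loaders: List[Loader]) -> None:
        self.loaders = loaders
        self.cache: Dict[str, dict] = {}

    def resolve(self, uri: str) -> dict:
        # Only the document part is looked up; fragment handling is left
        # to the caller in this sketch.
        doc_uri, _fragment = urldefrag(uri)
        if doc_uri in self.cache:
            return self.cache[doc_uri]
        for load in self.loaders:
            schema = load(doc_uri)
            if schema is not None:
                self.cache[doc_uri] = schema
                return schema
        raise LookupError(f"no configured source knows {doc_uri}")


# Hypothetical "database": a plain dict standing in for a real store.
database = {"http://example.com/blog": {"type": "object"}}
resolver = SchemaResolver([database.get])  # no network loader configured
print(resolver.resolve("http://example.com/blog#/properties"))
```

Because sources are just callables, a file-scanning loader or an HTTP loader could be appended without changing how references are resolved.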
In the given example, /x is not somewhere you would expect to find a schema according to the meta-schema and the specification; therefore, it's not an "id". That is to say, the question isn't 'where do I find "id"s?' It is 'Where do I find schemas?', and this is very well defined by the meta-schema and the specification.
This is a highly regarded feature! The fact that URIs are resolved against the current document, and that documents are allowed to define their own URI, is a core functionality of Web technology. Any other behavior would definitely be a bug in implementing RFC 3986.
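A short Python illustration of RFC 3986 reference resolution, using the standard library's `urllib.parse.urljoin`. The URIs are made up; the point is that a nested schema declaring its own (relative) "id" establishes a new base, so identical reference text resolves to a different absolute URI inside it.

```python
from urllib.parse import urljoin

# Base URI of the outer document (e.g. from its top-level "id").
outer_base = "http://example.com/schemas/root.json"

# Relative references inside the outer document resolve against it.
sibling = urljoin(outer_base, "item.json")
# -> "http://example.com/schemas/item.json"
local = urljoin(outer_base, "#/definitions/a")
# -> "http://example.com/schemas/root.json#/definitions/a"

# An embedded schema declaring "id": "v2/item.json" gets a new base,
# itself resolved against the enclosing one...
inner_base = urljoin(outer_base, "v2/item.json")
# -> "http://example.com/schemas/v2/item.json"

# ...so the same reference text now names a different resource.
inner_local = urljoin(inner_base, "#/definitions/a")
# -> "http://example.com/schemas/v2/item.json#/definitions/a"

for uri in (sibling, local, inner_base, inner_local):
    print(uri)
```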
JSON Reference supports this behavior. The hang-up is that most JSON documents don't change their URI within the document. Nonetheless, the practice of changing a URI within a document, and the ability to embed documents inside other documents, is a common one, especially in the Web.
The proposal is not that we get rid of the URI; rather, all URIs would have to be provided out-of-band. This doesn't seem very helpful, and normally documents are expected to be able to define their own URI or base URI.
How would this mechanism work, exactly? What if you have multiple nested schemas? Ideally (perhaps not currently), here are a few of the things you could use "id" for:
(1) Easily building recursive schemas. A schema author may want to write a schema with three nested schemas that serve different functions, where the innermost schema may itself be an instance of either the root schema or the second-level schema, depending on the context. Without "id", you would need to break this up into three separate documents and assign each of them an id out-of-band, even though you're using those ids within the document. It does not strike me as prudent to define how to refer to another schema but not define any method of labeling one.
(2) Labeling individual parts of the document. "id" ideally allows constructs such as giving every schema an "id":"#name" and referring to it with a URIRef (like the XML/HTML "id" attribute and other Web technologies). How do user-agents find these ids? You have to parse the document, normally against a schema.
(3) The ability to carelessly embed schemas inside other schemas, substituting a {$ref:"..."} expression for a {"id":"...", ...} expression and having it carry the same meaning. (** This comes with an asterisk: the schema URI must be preserved, by definition, and not merely the URI Reference (URIRef), if any, for the sake of parsing child URIRefs. This should be obvious; the behavior is the same as in other Web technologies that use relative URIRefs. It can be done by inserting the absolute URI as the URIRef, or by preserving the base URI by some other means.)
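Points (2) and (3), together with the earlier remark that the real question is where schemas are found, can be sketched as an index builder: walk only the positions where the (draft-04) meta-schema says a subschema lives, resolve each "id" against the enclosing resolution scope, and record the result. This is a minimal Python sketch under those assumptions; `index_ids` is a hypothetical helper, and its keyword lists are a deliberately incomplete subset of draft-04.

```python
from typing import Any, Dict
from urllib.parse import urljoin

# Draft-04 keywords whose values hold subschemas (an illustrative subset).
SINGLE = ("not", "additionalProperties", "additionalItems")
MAPS = ("properties", "patternProperties", "definitions", "dependencies")
LISTS = ("allOf", "anyOf", "oneOf")


def index_ids(schema: Any, base: str, index: Dict[str, Any]) -> Dict[str, Any]:
    """Record every "id" (resolved to an absolute URI) and its subschema.

    Only schema positions are visited, so an "id" member inside e.g. an
    "enum" value is never mistaken for a schema identifier.
    """
    if not isinstance(schema, dict):
        return index  # booleans, strings, arrays of property names, ...
    if isinstance(schema.get("id"), str):
        base = urljoin(base, schema["id"])  # new resolution scope
        index[base] = schema
    for key in SINGLE:
        index_ids(schema.get(key), base, index)
    for key in MAPS:
        value = schema.get(key)
        if isinstance(value, dict):
            for sub in value.values():
                index_ids(sub, base, index)
    items = schema.get("items")
    for sub in items if isinstance(items, list) else [items]:
        index_ids(sub, base, index)
    for key in LISTS:
        value = schema.get(key)
        if isinstance(value, list):
            for sub in value:
                index_ids(sub, base, index)
    return index


schema = {
    "id": "http://example.com/root.json",
    "definitions": {
        "post": {"id": "#post", "type": "object"},
        "user": {"id": "user.json", "type": "object"},
    },
    "enum": [{"id": "#not-a-schema"}],
}
for uri in sorted(index_ids(schema, "", {})):
    print(uri)
```

With such an index, a `$ref` to "http://example.com/root.json#post" can be answered without scanning the document again, and the same subschema could be embedded elsewhere without losing its URI.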
On Tue, Feb 26, 2013 at 9:41 AM, Acubed notifications@github.com wrote:
No. [...] I am sorry, but this is exactly why inline dereferencing had to be "created".
See? Plenty of possibilities. Nothing illegal. A mess. KISS! [...]
By whom? For what? [...]
No. Remember that JSON References are always supposed to be resolved [...]
See "definitions".
No. See "definitions".
See "definitions".
The same effect can be achieved if there is a simple mechanism to [...]
Francis Galiegue, fgaliegue@gmail.com
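fge's repeated answer, "definitions", relies on JSON Pointer fragments such as "#/definitions/post" instead of "id" labels. Below is a minimal Python sketch of resolving such a fragment per RFC 6901 (including its "~1"/"~0" escapes); `resolve_pointer` and the sample schema are hypothetical, and only same-document references are handled.

```python
from typing import Any
from urllib.parse import unquote, urldefrag


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve a JSON Pointer (RFC 6901) taken from a URI fragment."""
    pointer = unquote(pointer)  # fragments may be percent-encoded
    if pointer == "":
        return document  # "#" alone refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


schema = {
    "type": "array",
    "items": {"$ref": "#/definitions/post"},
    "definitions": {"post": {"type": "object"}, "a/b": {"type": "string"}},
}
fragment = urldefrag(schema["items"]["$ref"]).fragment  # "/definitions/post"
print(resolve_pointer(schema, fragment))  # {'type': 'object'}
```

Note that the pointer names a location, not an identity: the same subschema moved elsewhere gets a different pointer, which is the trade-off being argued over in this thread.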
"definitions" cannot be a replacement for "id"; it is strictly complementary. Consider a post I just made to the mailing list: JSON Hyper-schema allows me to give individual objects within an instance a URI. Consider a blog post stored in a document database:
// application/json; profile=http://example.com/blog
{ "subject": "/2012/11/20/my-first-blog-post"
, "author": "/users/awright"
, "posted": "2012-11-20T23:40:22Z"
, "body": "<p>Hello</p>"
}
I can say that subject has a rel=self, therefore giving that object a URI when a JSON Hyper-schema parser passes over it:
// application/schema+json; profile=http://json-schema.org/draft-04/hyper-schema#
{ "id": "http://example.com/blog"
, "links": [ {"href": "{subject}", "rel": "self"}, {"href": "{author}", "rel": "author"} ]
}
This is an extremely common use-case. Any time you send JSON documents over HTTP, in a request or a response, you are exchanging a resource, which often has a URI. Blog posts and other media frequently encode the URL of the resource, or otherwise have some way to generate a "self" link from the available data. An entire JSON vocabulary, JSON-LD (currently being considered as a W3C Recommendation), also deals with exchanging URIs in this manner. Within this blog post, perhaps I want to refer to the mere contents, the "body". So you have: [...]
Now maybe I want to export a list of blog posts:
// application/json; profile=http://example.com/bloglist
[ {"subject": "/2012/11/20/my-first-blog-post", "author": "/users/awright", "posted": "2012-11-20T23:40:22Z", "body": "<p>Hello</p>"}
, {"subject": "/2013/01/04/my-second-blog-post", "author": "/users/awright", "posted": "2013-01-04T16:14:50Z", "body": "<p>Good afternoon</p>"}
]
I can embed the schema for {blog post} inside a schema:
// application/schema+json; profile=http://json-schema.org/draft-04/hyper-schema#
{"type": "array", "items": {"$ref": "http://example.com/blog"}}
And with this, the blog posts retain their semantics, including their respective URIs. Now I want to refer to the same JSON instance as before, but this time all I have is this collection that I exported from my database. You still have: [...]
All these schemas are stored somewhere, of course. Perhaps I want to define new content-types in addition to blog posts, so these schemas are stored in the database, just like blog posts and users are. The meta-schema is also stored with this collection, to describe the collection of schemas. Now suppose I want to export this entire database, to keep a backup, or maybe someone else wants to import the data into their own database: a document database, a relational database, or a resource database (e.g. by making PUT requests to an HTTP server). We can describe this export with a schema:
// application/schema+json; profile=http://json-schema.org/draft-04/hyper-schema#
{ "type": "object"
, "properties":
{ "version": {"type": "string", "maxLength": 64}
, "posts": {"type": "array", "items": {"$ref": "http://example.com/blog"}}
, "users": {"type": "array", "items": {"$ref": "http://example.com/user"}}
, "types": {"type": "array", "items": {"$ref": "http://json-schema.org/draft-04/hyper-schema#"}}
}
}
I use $ref here, especially for brevity, but it may be a better idea to simply substitute the schema in, so we have all the schemas together in one coherent document. Maybe my blog post schema and other schemas also contain metadata on how to map fields to relational database rows and tables; perhaps that's how I've been storing the data. So now I've just exported my whole database by first reading a JSON Schema, then using that to read the database and generate an export. Any JSON Hyper-schema parser can read this one schema, consume my database export, generate a list of URIs of the JSON instances within it that correspond to my blog posts, and import them back into the database, into a relational database (if it understands my custom vocabulary), or into a resource database. But woe is me: while my blog posts and schemas are fully parsed, my schemas and the metadata for those users and blog posts don't get the same privilege under this proposal. There would no longer be a way to refer to them with a consistent, uniform URI. The purpose of URIs is to decouple identity from physical location. Almost all formats with URI-based hyperlinking have methods of defining one's own URI. Even JSON Schema gives this option to instances with the rel=self relation, so excluding JSON Schemas makes no sense. Asking people to change their URI because of context, especially for something as ubiquitous as a JSON Schema, is not an option.
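The last step described above (read the export schema, walk the export, and emit the URI of every blog post) might look roughly like this Python sketch. It assumes the `links` from the blog schema earlier in this comment and that schema's "id" as the base URI, and it expands `{subject}` by plain string substitution, a deliberate simplification that ignores RFC 6570's percent-encoding rules. `expand` and `self_uris` are hypothetical helpers.

```python
import re
from typing import Dict, List
from urllib.parse import urljoin

BASE = "http://example.com/blog"  # the blog schema's "id" (assumed base URI)
LINKS = [{"href": "{subject}", "rel": "self"}, {"href": "{author}", "rel": "author"}]


def expand(href: str, instance: Dict[str, str]) -> str:
    # Simplification: substitute {name} verbatim, with no RFC 6570 escaping.
    return re.sub(r"\{(\w+)\}", lambda m: instance[m.group(1)], href)


def self_uris(posts: List[Dict[str, str]]) -> List[str]:
    """Return the absolute rel=self URI of every post in an export."""
    uris = []
    for post in posts:
        for link in LINKS:
            if link["rel"] == "self":
                uris.append(urljoin(BASE, expand(link["href"], post)))
    return uris


export = {
    "version": "1",
    "posts": [
        {"subject": "/2012/11/20/my-first-blog-post", "author": "/users/awright"},
        {"subject": "/2013/01/04/my-second-blog-post", "author": "/users/awright"},
    ],
}
print(self_uris(export["posts"]))
# ['http://example.com/2012/11/20/my-first-blog-post',
#  'http://example.com/2013/01/04/my-second-blog-post']
```

The same walk could not produce stable URIs for the embedded schemas themselves unless they can carry their own identifier, which is the point being made.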
@ACubed Some of that was a bit over my head; I'm not too knowledgeable about the hyper-schema stuff, so forgive me if I missed something. I agree there should be a way to define the base URI of a document within the document; my gripe is when the id keyword is nested somewhere within a document. I didn't see that being done in your example. Can you help me understand where it is needed there?
First, confirming some definitions: A base URI is the URI of a document, or at least the URI that other URI references are resolved against. By definition it does not have a fragment part (the fragment part of a base URI is never preserved; resolving the blank URI <> against another URI effectively strips the fragment from that URI). A URI reference is a relative or absolute URI, to be resolved against the base URI. A plain "URI" is considered to be absolute and already resolved against the base URI. Some implementations let you define a base URI without defining the URI of the document/information resource, and/or let you define the URI of the document without setting the base URI that URI references are resolved against. Generally you want to do both.
The important thing to know about JSON Hyper-schema is that you can define relationships between data. For instance, "This blog post has an author identified by this URI". You can also say "this object is identified by this URI". You can probably imagine going through an array of blog posts, seeing that each has a certain URI, requesting that URI from an HTTP server (if it's an HTTP URL), and getting back that same object.
The same thing goes for JSON Schema. It's especially important for JSON Schema, since schemas are often re-used and re-embedded in multiple places. I keep a database just like the one I described; the same schema will be found embedded inside another schema, or available by itself. Consider a schema describing a blog post, and a schema that describes an array of blog posts (for example, listing posts on a front page). I want to ship a document describing the latter case, a schema of a list of posts. In this case, downloading the additional schema defining the blog post can't be done (downloading may not be an option), and the proposed method of embedding the schema doesn't make sense: the URI of the schema that identifies blog posts would necessarily change.
This is a problem since, among other things, these posts would be served with [...]
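The base-URI definitions above can be checked with a small Python sketch. RFC 3986 (section 5.1) requires any fragment to be stripped before a URI is used as a base, and an empty reference then resolves to that fragment-less base. Python's `urljoin` returns the base unchanged when the reference is empty, so the hypothetical `resolve` helper below strips the fragment from the base first; the example URIs are illustrative.

```python
from urllib.parse import urldefrag, urljoin


def base_uri(document_uri: str) -> str:
    """Drop any fragment: a base URI never carries one (RFC 3986, 5.1)."""
    return urldefrag(document_uri).url


def resolve(reference: str, document_uri: str) -> str:
    """Resolve a URI reference against the document's base URI."""
    return urljoin(base_uri(document_uri), reference)


doc = "http://example.com/blog#/body"
print(resolve("", doc))         # http://example.com/blog
print(resolve("#/body", doc))   # http://example.com/blog#/body
print(resolve("users/1", doc))  # http://example.com/users/1
```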
I wonder how many JSON Schema implementations currently implement "id" correctly? Until this gets finished: json-schema-org/JSON-Schema-Test-Suite#20 (if ever), we may have to compile the list by hand, but it still wouldn't take too long.
@ACubed tl;dr - most of it. Fact is, there is a clear problem with id, but I think in general people get caught up in false dichotomies. 👎 for removing the id keyword. @fge analyses this well in https://github.com/json-schema/json-schema/wiki/The-%22id%22-conundrum#how-to-fix-that. Completely agree with that. As an aside, I would say it's worth looking at how XSD does internal and external refs (much more restrictive than what is proposed in JSON Schema v4), and how HTML does base URI adjustment (with a different keyword, and who actually uses [...]
@gazpachoking a number of things about id and $ref have been tidied up in the latest internet draft. Spec work for that is going on at https://github.com/json-schema-org/json-schema-spec https://tools.ietf.org/html/draft-wright-json-schema-00 |
So, I propose that the "id" keyword be removed from the next draft of JSON Schema. Caveat: I have not implemented many schemas, so maybe I am missing some areas where it is extremely beneficial.
My objections:
Inline dereferencing:
Canonical dereferencing:
With the "id" keyword, it is possible to have "$ref"s that are in a different scope, depending on how they are referenced. Consider: [...] The "$ref" within "items" has a different scope depending on whether we are evaluating it from the "items" keyword (anitem#/x) or the "properties" keyword (base#/x). JSON Reference does not have this problem, as the scope is always the URI of the document.
Maybe I just need to see some examples where "id" is highly beneficial, but it seems to me that anywhere you might want to use the "id" property, you could just as easily "$ref" the external schema. Simple inline referencing to fragment ids might be helpful in some situations, but scanning an arbitrary JSON object to identify them is not. And given the "definitions" keyword, it is much more helpful to use JSON Pointer references to subschemas stored within that keyword. I just do not see the benefit of "id" outweighing the difficulty of implementing it, especially given that support may not be consistent across implementations.
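The example that originally followed "Consider:" did not survive, so here is a hypothetical Python reconstruction of the ambiguity being described: a subschema under "items" declares "id": "anitem", and "properties" reaches the same subschema through the JSON Pointer reference "#/items". `scope_at`, the URIs and the schema are assumptions for illustration; the second computation models an implementation that keeps the referring location's scope, which is one possible reading of draft-04, not the only one.

```python
from typing import Any, List
from urllib.parse import urljoin

# Hypothetical reconstruction; the schema in the original post was lost.
schema = {
    "id": "http://example.com/base",
    "properties": {"foo": {"$ref": "#/items"}},
    "items": {"id": "anitem", "properties": {"bar": {"$ref": "#/x"}}},
}


def scope_at(root: Any, path: List[str], base: str) -> str:
    """Resolution scope after walking `path`, applying every "id" met."""
    nodes = [root]
    for key in path:
        nodes.append(nodes[-1][key])
    for node in nodes:
        if isinstance(node, dict) and isinstance(node.get("id"), str):
            base = urljoin(base, node["id"])
    return base


ref = "#/x"

# Reached via "items": the "id" on the items subschema changes the scope.
via_items = urljoin(scope_at(schema, ["items", "properties", "bar"], ""), ref)

# Reached via "properties" -> {"$ref": "#/items"}: an implementation that keeps
# the referrer's scope never applies "anitem" (the walk from the pointer
# target down to "bar" meets no further "id").
referrer = scope_at(schema, ["properties", "foo"], "")
via_properties = urljoin(referrer, ref)

print(via_items)       # http://example.com/anitem#/x
print(via_properties)  # http://example.com/base#/x
```

With plain JSON References and no "id", both paths would resolve "#/x" against the single document URI, which is the property the proposal wants to keep.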