Improve handling of profiles vs. sets #151

Closed
awwright opened this issue Nov 21, 2016 · 10 comments

@awwright
Member

awwright commented Nov 21, 2016

JSON Schema is conflating two mutually exclusive ideas of class membership, and it's causing a handful of problems around "additionalProperties", especially when you have "additionalProperties": false.
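
A minimal sketch of the kind of problem meant here, assuming the standard behavior that "additionalProperties" only looks at "properties" declared in the same schema object. The `validate` function is a toy written for this illustration (it handles only the keywords shown), and the `base` and `extended` schemas are hypothetical:

```python
def validate(schema, instance):
    """Toy validator: supports type, properties, required,
    additionalProperties (false only) and allOf. Illustration only."""
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in instance:
                return False
        for name, value in instance.items():
            if name in props:
                if not validate(props[name], value):
                    return False
            elif schema.get("additionalProperties", True) is False:
                # Only "properties" in this same schema object are consulted.
                return False
    return all(validate(sub, instance) for sub in schema.get("allOf", []))


base = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "additionalProperties": False,
}

# An attempt to "extend" base with one more property.
extended = {
    "allOf": [base],
    "properties": {"email": {"type": "string"}},
}

instance = {"name": "Ada", "email": "ada@example.com"}

base_ok = validate(base, {"name": "Ada"})
extended_ok = validate(extended, instance)  # base rejects "email"
```

Under these assumptions `extended_ok` is false: the base schema cannot see the "email" property declared next to the "allOf", so no instance carrying it can validate.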

Types of taxonomy relationships

Associating an instance as a member of a group is done one of two ways:

  1. Sets - A JSON schema that identifies a set of JSON documents, where any JSON document that validates against the schema is a member of the set, even if the data has no semantic value or makes no sense.
  2. Profiles - A JSON document has a profile only when explicitly indicated as such. A profile may additionally contain restrictions on what data the JSON document may have.

These forms of association exhibit different behaviors.

Sets

Sets are motivated by the mathematical concept of a set and members of a set.

An instance in a set is uniquely identified by its contents.

JSON Schema can be used to create subsets - in which every member of a subset is by definition also a member of the superset.

If X and Y are sets described by schemas, where Y is a subset of X, then any instance of Y will also be an instance of X.
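
As a rough illustration of the subset relationship, here is a sketch in which membership depends only on a document's contents. The `in_set` helper is a toy that understands just "required" and "allOf", and the schemas `X` and `Y` are hypothetical:

```python
def in_set(schema, instance):
    """Toy membership test: checks only "required" and "allOf"."""
    has_required = all(key in instance for key in schema.get("required", []))
    return has_required and all(
        in_set(sub, instance) for sub in schema.get("allOf", [])
    )


X = {"required": ["id"]}
Y = {"allOf": [X], "required": ["email"]}  # Y can only narrow X

documents = [
    {"id": 1},
    {"id": 2, "email": "a@example.com"},
    {"email": "b@example.com"},
    {"id": "makes no sense", "email": None},  # still a member: only contents count
]

members_of_x = [d for d in documents if in_set(X, d)]
members_of_y = [d for d in documents if in_set(Y, d)]

# Every member of the subset is a member of the superset.
subset_holds = all(d in members_of_x for d in members_of_y)
```

Because Y includes X through "allOf", anything that validates against Y necessarily validates against X, which is the subset property described above.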

Profiles

Profiles are similar in function to media types. A media type is a short, standardized string that associates with a string of octets to form a document. Profiles are somewhat more abstract as a concept.

Profiles are often used to tell people that they may treat a document in a certain way that they otherwise could not, even if it happened to look like a valid document of that kind. For example, a media type or profile lets people know whether the document is executable -- an important security measure to ensure that plain-text documents can't get executed as a program! (See also "polyglot programming", where a program is valid in two different languages, possibly one benign and one malicious.)

An instance of a profile can determine how it is uniquely identified; frequently a URI is used as an identifier, and there can be multiple instances with identical data.

A resource can be described by multiple profiles. Rules can be applied to profiles and implications made about their membership. If X and Y are profiles, where Y is a subclass of X, any instance of Y is also an instance of X.

Sometimes we want to use JSON to describe properties that apply only when the document is an instance of a profile, using this subclass logic. For example, instances of X have property A, and A is only found in instances of X; instances of Y have property B, and property B is only found in instances of Y.
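
A minimal sketch of profile membership by explicit declaration, with the subclass rule applied. Everything here is an assumption made for illustration: the `SUPERCLASSES` table, the resource layout with "uri", "profiles" and "data" fields, and the example URIs:

```python
# Hypothetical subclass table: Y is declared a subclass of X.
SUPERCLASSES = {"X": [], "Y": ["X"]}


def effective_profiles(declared):
    """Return the declared profiles plus every superclass they imply."""
    seen, stack = set(), list(declared)
    while stack:
        profile = stack.pop()
        if profile not in seen:
            seen.add(profile)
            stack.extend(SUPERCLASSES.get(profile, []))
    return seen


def has_profile(resource, profile):
    # Membership comes from the declaration, never from the shape of the data.
    return profile in effective_profiles(resource["profiles"])


q1 = {"uri": "http://example.com/q1", "profiles": ["Y"], "data": {"A": 1, "B": 2}}
q2 = {"uri": "http://example.com/q2", "profiles": [], "data": {"A": 1, "B": 2}}
```

Here `q1` and `q2` carry identical data but are distinct resources, and only `q1` is an instance of Y (and therefore of X), because only `q1` says so.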

How do we describe data like this? There are a few options:

  1. Use one JSON document per profile per resource. If I have a resource Q that is an instance of Y, then have two JSON documents <Q.X.json> and <Q.Y.json>.

  2. Allow additionalItems on a document, and ignore unknown properties. List the document having every profile, including superclasses (though it might sometimes be possible to omit superclasses or any profiles implied from other profiles). Properties across profiles must not overlap.

  3. Prohibit additionalItems on a document. List the document as having only one media-type profile. Duplicate properties from superclasses, and optionally specify each property as copied/inherited from that superclass.
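
The second option above depends on property names not overlapping across profiles. A small sketch of how that constraint could be checked, assuming each profile is represented simply as a list of the property names it defines (a hypothetical representation, not something the issue specifies):

```python
from itertools import combinations


def overlapping_properties(profiles):
    """Map each pair of profile names to the property names they share."""
    clashes = {}
    for (name_a, props_a), (name_b, props_b) in combinations(
        sorted(profiles.items()), 2
    ):
        shared = set(props_a) & set(props_b)
        if shared:
            clashes[(name_a, name_b)] = shared
    return clashes


disjoint = overlapping_properties({"X": ["A"], "Y": ["B"]})
clashing = overlapping_properties({"X": ["A", "name"], "Z": ["name"]})
```

An empty result means the profiles can be listed together on one document without ambiguity about which profile a property belongs to.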

Solutions

How do we solve this?

Option for additional keywords

  1. Create a keyword that explicitly creates a subclass/superclass relationship. "properties" and some other keywords would get imported to the current document according to a well-defined behavior.

  2. Create a keyword that specifies a property matches (was inherited from) a property in another schema.

  3. A keyword to validate the current instance against another schema, with special instructions to ignore all properties not from a certain list - side-stepping additionalProperties: false if it exists.
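
One possible reading of the third keyword option, sketched under heavy assumptions: the issue does not define an algorithm, so the idea of projecting the instance onto a list of property names before validating is a guess, and both `validate_object` (a toy validator) and `validate_projected` are hypothetical helpers:

```python
def validate_object(schema, instance):
    """Toy validator: checks only "required" and "additionalProperties": false."""
    props = schema.get("properties", {})
    if any(key not in instance for key in schema.get("required", [])):
        return False
    if schema.get("additionalProperties", True) is False:
        return all(key in props for key in instance)
    return True


def validate_projected(schema, instance, keep):
    """Validate only the properties named in `keep`, ignoring all others."""
    projected = {key: value for key, value in instance.items() if key in keep}
    return validate_object(schema, projected)


X = {"properties": {"A": {}}, "required": ["A"], "additionalProperties": False}
instance_of_y = {"A": 1, "B": 2}  # carries an extra property from a subclass

direct = validate_object(X, instance_of_y)
projected = validate_projected(X, instance_of_y, keep={"A"})
```

Validated directly, the instance is rejected because of "B"; validated through the projection, X never sees "B", so its "additionalProperties": false no longer gets in the way.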

@awwright
Member Author

@handrews This is a natural follow-up to your Groups post about different uses of "additionalProperties"

@handrews
Contributor

@awwright thanks! Did you look at #119 with $combine and $combinable? It explores one way to implement the sort of property-combination extensibility that people often ask for. My personal opinion after working through the whole thing was "more trouble than it's worth" but it does work.

I'm not 100% certain, but I think that anything that attempts to solve this problem will have the same complexities around negated schemas (not and oneOf) that make $combine so complex. If you stick to simple object schemas, $combine is nearly trivial. So I think $combine at least illustrates some of the challenges of your option 0.

I am leery of option 1 as it gets into splicing bits of validation schemas from one place to another, the same problem I and others have with $merge and $patch (see #98 for a safer form limited to annotation keywords).

I'm not quite sure I understand option 2, could you walk through the algorithm in a bit more detail?

I need to spend a bit more time with your set-vs-profile concept. I'm not entirely sure I'm grasping the difference, particularly in your description of profile usage (set seems straightforward). In the profile discussion are Q.X.json and Q.Y.json two instance documents describing the same resource but conforming to different schemas? Also, do you really mean additionalItems in that section or do you mean additionalProperties? You use additionalItems in several places.

@handrews
Contributor

See #214 for another approach to the reuse-with-additionalProperties aspect of the "profiles" use case. It is a relatively concise workaround using only current keywords, which could perhaps be a starting point for a more elegant solution.

@awwright
Member Author

awwright commented May 9, 2017

Re-reading this, I could probably phrase the issue to be a little clearer and more straightforward.

handrews modified the milestone: draft-07 (wright-*-02) on May 16, 2017
@handrews
Contributor

Assuming you really meant additionalProperties and not additionalItems, I think that your option 1 under Profiles is somewhat related to how I plan to manage versioning of resource representations. The difference is that where you require non-overlapping properties, I would allow overlap.

The schema is versioned, the resource itself is not. A representation would declare all of the schema versions against which it might validate. If a representation validates against a given schema version, then it may be interpreted according to that version's documented semantics.

For this, I have not seen a need for any additional keywords. Either the server ensures that it only connects the instance to schemas against which it validates (so the client can just look and see if it works with a version it understands), or the client is expected to validate against a specific schema version to determine whether that particular instance is usable.
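
A rough sketch of the two approaches just described, with no new keywords involved. The `SCHEMAS` table uses plain predicate functions as hypothetical stand-ins for real schema versions, and the version names and helper functions are assumptions made for illustration:

```python
# Hypothetical stand-ins for schema versions: each maps an instance to True/False.
SCHEMAS = {
    "v1": lambda doc: isinstance(doc.get("name"), str),
    "v2": lambda doc: isinstance(doc.get("name"), str)
    and isinstance(doc.get("email"), str),
}


def trusted_versions(declared, understood):
    """Server guarantees the declared links; the client only intersects."""
    return [version for version in declared if version in understood]


def verified_versions(instance, understood):
    """Client validates the instance itself against each version it understands."""
    return [version for version in understood if SCHEMAS[version](instance)]


representation = {"name": "Ada"}

from_links = trusted_versions(["v1", "v2"], {"v2", "v3"})
from_validation = verified_versions(representation, ["v1", "v2"])
```

In the first case the client relies on the server having linked only schemas that the instance validates against; in the second it checks for itself which understood versions apply.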

Does any of this sound at all relevant here?

@handrews
Contributor

handrews commented Aug 20, 2017

VOTE-A-RAMA!!!

It's time to gauge community support for all re-use/extension/additionalProperties proposals and actually make some decisions for the next draft.

Please use emoji reactions ON THIS COMMENT to indicate your support.

  • You do not need to vote on every proposal
  • If you have no opinion, don't vote; that is also useful data
  • If you've already commented on this issue, please still vote so we know your current thoughts
  • Not all proposals solve exactly the same problem, so we may end up implementing more than one

This is not a binding majority-rule vote, but it will be a very significant input into discussions.

Here are the meanings for the emojis:

  • Celebration / Hooray / Tada!: I support this so strongly that I want to be the primary advocate for it
  • Heart: I think this is an ideal solution
  • Smiley face: I'd be happy with this solution
  • Thumbs up: I'd tolerate this solution
  • Thumbs down: I'd rather we not do this, but it wouldn't be the end of the world
  • Frowny face: I'd be actively unhappy, and may even consider other technologies instead

If you want to explain in more detail, feel free to add another comment, but please also vote on this comment.

The vote should stay open for several weeks; I'll update this comment with specifics once we see how much feedback we are getting and whether any really obvious patterns emerge.

@epoberezkin
Member

I am not sure what the proposal is here.

@handrews
Contributor

> I am not sure what the proposal is here.

I'm hoping @awwright will expand on his earlier comment about clarifying :-)
If he does we can nudge folks who already voted to double-check.

@awwright
Member Author

I'm not actually sure which tangent this is going towards... But if the distinction I made in the original post (the difference between being an instance of a set and being an instance of a profile) makes sense, let's move it to a wiki page.

Then we can figure out some ways to implement or honor it, such as writing some detailed use cases.

Close this out?

@handrews
Contributor

@awwright yes, all of that sounds like a good plan. It's a good distinction to explore, but a wiki page sounds like a better home while figuring it out.


4 participants