conditional selection of alternate schemas (includes "switch" and other options) #64

Closed
handrews opened this issue Sep 19, 2016 · 43 comments

@handrews
Contributor

The Problem (and current workarounds)

A common use case is to select an overall validation schema (or schemas) based on how a small subset of the instance validates. In the simplest case, some property in the instance is checked against a set of literal values, and the overall validation schema(s) are chosen based on that literal value. This is generally implemented with oneOf (or anyOf) and enum (although see also the constant proposal in issue #58). Some forms of this problem may also be solved with dependencies.

Note: Throughout this proposal, the elements of the oneOf/anyOf lists are referred to as branches, as in this case they are being used as implicit (or in some options, explicit) conditionals.

Single selection with oneOf

This can be read as "if foo is firstValue, bar must be present and must be a list of numbers, otherwise if foo is secondValue, buzz must be present and must be a string that is at least 10 characters long":

{
    "type": "object",
    "oneOf": [
        {
            "properties": {
                "foo": {"enum": ["firstValue"]},
                "bar": {
                    "type": "array",
                    "items": {"type": "number"}
                }
            },
            "required": ["foo", "bar"]
        },
        {
            "properties": {
                "foo": {"enum": ["secondValue"]},
                "buzz": {
                    "type": "string",
                    "minLength": 10
                }
            },
            "required": ["foo", "buzz"]
        }
    ]
}

One difficulty with this approach is that the cause and effect are not clear. This could just as easily be read as "If bar is present as a list of numbers, foo must be set to firstValue. Otherwise if buzz is present and a string of at least ten characters, foo must be set to secondValue."

This is both a strength and a weakness. The oneOf construct can capture complex alternatives, but does not clearly express the idea of one part of the schema being the determining factor. In a small schema like this, either interpretation is easy to read, and it’s easy to spot the enum and guess that it is probably the determinant. In a more complex schema, where perhaps two values of the enum select one branch and a third selects the other, the determinant is much harder to spot.
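To make the symmetry concrete, here is a minimal Python sketch. The `is_valid` helper is hypothetical and handles only the keywords used in this example (it is not a conforming validator), and the second branch is written with `required` as a sibling of `properties`. The point is that oneOf only counts how many branches accept the instance; nothing in it marks foo as the determinant.

```python
def is_valid(schema, inst):
    """Hypothetical mini-validator: covers only the keywords used in this example."""
    types = {"object": dict, "array": list, "string": str,
             "number": (int, float), "boolean": bool}
    if "type" in schema and not isinstance(inst, types[schema["type"]]):
        return False
    if "enum" in schema and inst not in schema["enum"]:
        return False
    if "minLength" in schema and isinstance(inst, str) and len(inst) < schema["minLength"]:
        return False
    if isinstance(inst, list) and "items" in schema:
        if not all(is_valid(schema["items"], item) for item in inst):
            return False
    if isinstance(inst, dict):
        if any(name not in inst for name in schema.get("required", [])):
            return False
        # "properties" only constrains properties that are actually present.
        for name, sub in schema.get("properties", {}).items():
            if name in inst and not is_valid(sub, inst[name]):
                return False
    if "oneOf" in schema:
        # oneOf: exactly one branch must accept the instance.
        if sum(is_valid(branch, inst) for branch in schema["oneOf"]) != 1:
            return False
    return True


schema = {
    "type": "object",
    "oneOf": [
        {"properties": {"foo": {"enum": ["firstValue"]},
                        "bar": {"type": "array", "items": {"type": "number"}}},
         "required": ["foo", "bar"]},
        {"properties": {"foo": {"enum": ["secondValue"]},
                        "buzz": {"type": "string", "minLength": 10}},
         "required": ["foo", "buzz"]},
    ],
}


def matching_branches(inst):
    """Indices of the oneOf branches that accept the instance."""
    return [i for i, branch in enumerate(schema["oneOf"]) if is_valid(branch, inst)]
```

Calling `matching_branches` on a candidate instance shows which branch accepted it, but the result is the same whether you think of foo or of bar/buzz as the cause.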

Multiple selection with anyOf

Here is a similar example using anyOf, where foo being set to multiSelect can potentially validate against two branches (although it is only required to validate against one or the other). If foo is set to singleSelect it must validate against the lone schema that accepts that value. Multiple matches are prominently addressed in one of the proposals, so we will use this example to consider the options.

{
    "type": "object",
    "required": ["foo"],
    "anyOf": [
        {"properties": {"foo": {"enum": ["multiSelect"]}, "bar": {"type": "number"}}},
        {"properties": {"foo": {"enum": ["multiSelect"]}, "buzz": {"type": "string"}}},
        {"properties": {"foo": {"enum": ["singleSelect"]}, "zippy": {"type": "boolean"}}}
    ]
}

In order to require validation against both "multiSelect"-designated schemas, an allOf must be introduced:

{
    "type": "object",
    "anyOf": [
        {
            "properties": {"foo": {"enum": ["multiSelect"]}},
            "allOf": [
                {"properties": {"bar": {"type": "number"}}},
                {"properties": {"buzz": {"type": "string"}}}
            ]
        },
        {"properties": {"foo": {"enum": ["singleSelect"]}, "zippy": {"type": "boolean"}}}
    ]
}

Single selection with dependencies

This schema validates the same set of instances as the schema above that uses oneOf:

{
    "type": "object",
    "properties": {
        "bar": {
            "type": "array",
            "items": {"type": "number"}
        },
        "buzz": {
            "type": "string"
        }
    },
    "dependencies": {
        "bar": {
            "properties": {"foo": {"enum": ["firstValue"]}},
            "required": ["foo", "bar"]
        },
        "buzz": {
            "properties": {"foo": {"enum": ["secondValue"]}},
            "required": ["foo", "buzz"]
        }
    }
}

Note that dependencies can only specify things based on the presence or absence of properties, so the "if bar is present, else if buzz is present" interpretation must be used for this approach. In some cases, that is exactly what needs to be expressed, but it seems to be more common to use a value as the determinant rather than the presence or absence of a particular property.

If the difference between foo being set to firstValue or secondValue was a difference in exactly how bar is validated (and buzz was not part of the schema at all), then the oneOf approach still works just fine, but the dependencies approach is impossible.

Multiple selection with dependencies

{
    "type": "object",
    "properties": {
        "bar": {"type": "number"},
        "buzz": {"type": "string"},
        "zippy": {"type": "boolean"}
    },
    "required": ["foo"],
    "dependencies": {
        "bar": {
            "properties": {"foo": {"enum": ["multiSelect"]}}
        },
        "buzz": {
            "properties": {"foo": {"enum": ["multiSelect"]}}
        },
        "zippy": {
            "properties": {"foo": {"enum": ["singleSelect"]}}
        }
    }
}

Again, the logic is inverted from the most intuitive reading, with the presence or absence of the other properties determining the value of foo. Since (in this multi-select example) "foo" is the only required property, it’s just about possible to make out the intention that "foo"’s value determines how "bar", "buzz", or "zippy" is validated. But it is arguably substantially less clear than the anyOf example, and as with the single selection example, dependencies cannot handle selection based purely on a value.

The proposals

There are two possible approaches, one of which has two variants:

  • A switch validation keyword, more or less as seen in many programming languages. Originally proposed by @geraintluff (with additional discussion in the old repo).
  • An annotation keyword that clarifies the author’s intent without changing validation rules. One form of this was proposed by @mrjj as bounding, and I am proposing an alternate syntax here.

Clarifying intent with an annotation property

This approach does not change validation at all. Rather, it adds one or two annotation properties that allow schema readers or documentation generators to understand the intent of the schema author for how branches are selected.

selectWith: pointers from outside the branches

selectWith is an annotation keyword that appears at the same level as a oneOf or anyOf. It is either a single Relative JSON Pointer or a list of them. The pointers indicate which properties (or array indices, for that matter) are intended to determine which branch of the oneOf (or branches of the anyOf) is/are taken.

The values must allow Relative JSON Pointers (which include regular JSON Pointers) in order to allow a schema to be included in another schema as a child schema. Otherwise, the pointer would need to always have the correct full path, severely limiting re-use capabilities. The pointer is resolved with respect to the instance structure.

As an annotation property, selectWith cannot affect validation. Setting it to point to a non-existent property is legal and does not produce an error (following the general principle that nonsensical schemas are valid). Setting it to a property that will only exist on some branches is also possible and to be expected. Unspecified but allowed instance properties/array elements by default have a blank schema, allowing anything.
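As a rough illustration of how a tool might resolve a selectWith value against an instance, here is a Python sketch. The function name, the `MISSING` sentinel, and the representation of "current location" as a list of keys from the instance root are all assumptions made for this example. It handles the integer-prefix form such as "0/foo" and plain JSON Pointers, but not the "#" suffix form, and it does not reject array indices with leading zeros.

```python
import re

MISSING = object()  # sentinel: the pointer does not resolve in this instance


def _step(node, key):
    """Follow one key or array index; return MISSING if it does not exist."""
    if isinstance(node, dict) and key in node:
        return node[key]
    if isinstance(node, list) and str(key).isdigit() and int(key) < len(node):
        return node[int(key)]
    return MISSING


def _tokens(json_pointer):
    """Split a JSON Pointer into unescaped reference tokens."""
    return [t.replace("~1", "/").replace("~0", "~")
            for t in json_pointer.split("/")[1:]]


def resolve_select_with(root, current_path, pointer):
    """Resolve `pointer` starting from the location `current_path` inside `root`."""
    if pointer == "" or pointer.startswith("/"):
        # A plain JSON Pointer is resolved from the instance root.
        path = _tokens(pointer)
    else:
        match = re.fullmatch(r"(0|[1-9][0-9]*)((?:/.*)?)", pointer, flags=re.DOTALL)
        if match is None or int(match.group(1)) > len(current_path):
            return MISSING
        up = int(match.group(1))  # number of levels to climb first
        path = list(current_path[:len(current_path) - up]) + _tokens(match.group(2))
    node = root
    for key in path:
        node = _step(node, key)
    return node


instance = {
    "foo": "firstValue",
    "bar": [1, 2],
    "nested": {"kind": "a", "payload": {"x": 1}},
}
```

Returning a sentinel rather than raising mirrors the point above: a selectWith that points at a non-existent property is legal and simply selects nothing.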

Here is our single-select example rewritten with selectWith:

{
    "type": "object",
    "selectWith": "0/foo",
    "oneOf": [
        {
            "properties": {
                "foo": {"enum": ["firstValue"]},
                "bar": {
                    "type": "array",
                    "items": {"type": "number"}
                }
            },
            "required": ["foo", "bar"]
        },
        {
            "properties": {
                "foo": {"enum": ["secondValue"]},
                "buzz": {
                    "type": "string",
                    "minLength": 10
                }
            },
            "required": ["foo", "buzz"]
        }
    ]
}

Recall that the pointer is resolved relative to the instance structure, so "0/foo" reads as "the schemas used to validate this instance property are the ones that determine which branch is taken."

The selectWith for the multi-select anyOf would be identical.

selector: booleans within each branch

selector is an alternative syntax directly derived from @mrjj’s bounding proposal (so called because it put bounds on what parts of the schema needed to be fully processed, and therefore constrained error reporting only to the most relevant branches of oneOf/anyOf constructs).

Instead of one annotation keyword at the top, selector is a boolean annotation keyword that may appear anywhere within child schemas in a branch. If selector is true, the effect is essentially the same as putting a pointer to that location in selectWith.

The only difference is that selectWith pointers are applied to all branches, while selector can be placed in different locations in different branches (and some branches may not have any selector). However, since unspecified properties/array elements have a blank schema (allowing anything) by default, the end effect is the same. The validation outcome remains unchanged no matter which proposal is used.

Here is the single select example using selector:

{
    "type": "object",
    "oneOf": [
        {
            "properties": {
                "foo": {
                    "enum": ["firstValue"],
                    "selector": true
                },
                "bar": {
                    "type": "array",
                    "items": {"type": "number"}
                }
            },
            "required": ["foo", "bar"]
        },
        {
            "properties": {
                "foo": {
                    "enum": ["secondValue"],
                    "selector": true
                },
                "buzz": {
                    "type": "string",
                    "minLength": 10
                }
            },
            "required": ["foo", "buzz"]
        }
    ]
}

@mrjj’s original purpose with bounding was to narrow the scope of validation and therefore produce more specific errors. The approach is simply to validate anything marked "selector": true first, as anything that fails the selector validation will fail validation of the entire branch, so it is not necessary to proceed further with (or report errors related to) that branch.
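A minimal Python sketch of that short-circuit idea follows. The helper names are hypothetical, only top-level `properties` of each branch are inspected (the proposal allows selector anywhere in the child schemas), and `enum_only` stands in for a real validator.

```python
def selector_matches(branch, instance, is_valid):
    """Check only the property subschemas marked "selector": true."""
    for name, sub in branch.get("properties", {}).items():
        if sub.get("selector") is True and name in instance:
            if not is_valid(sub, instance[name]):
                return False  # this branch cannot validate; skip the rest of it
    return True


def candidate_branches(branches, instance, is_valid):
    """Indices of branches worth validating in full (and reporting errors for)."""
    return [i for i, branch in enumerate(branches)
            if selector_matches(branch, instance, is_valid)]


def enum_only(schema, value):
    """Stand-in validator for the demo: understands nothing but enum."""
    return "enum" not in schema or value in schema["enum"]


branches = [
    {"properties": {"foo": {"enum": ["firstValue"], "selector": True},
                    "bar": {"type": "array", "items": {"type": "number"}}},
     "required": ["foo", "bar"]},
    {"properties": {"foo": {"enum": ["secondValue"], "selector": True},
                    "buzz": {"type": "string", "minLength": 10}},
     "required": ["foo", "buzz"]},
]
```

With an instance whose foo is "secondValue", only the second branch remains a candidate, so any error about buzz can be reported without the noise from the first branch.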

selectWith and selector comparison

While they may produce slightly different short-circuit validation behavior, neither of these changes the validation outcome.

selector appears within the schema doing the selecting, which makes its effect obvious as soon as you spot it. On the other hand, spotting the selectors scattered throughout a complex set of branches is tedious and error-prone, and implementations will need to walk the branches and locate all of the selectors before being able to use them for short-circuit validation or anything else.

selectWith requires a bit more interpretation for humans who may have to eyeball how a long JSON pointer actually lines up with the branches. However, all selectors are gathered in one place and can be used as soon as they are encountered.

It would be possible to use both, for flexibility (which is why I gave them different names: something may be a selector for an outer oneOf while specifying a selectWith for an inner oneOf). However, I feel like having both adds complexity without providing much gain.

I am obviously biased, but I prefer selectWith simply because it reads much more clearly from the top down (assuming you put it above your branches). It clearly says "These fields are intended to determine which branch should validate." You can then look across the branch schemas and see what the selection conditions are. That might be a bit tricky if the branches are complex, but no more so than trying to spot the selector keywords.

selectWith also more closely matches how a documentation generator would use it: the documentation would reference it as part of the description of the whole branch set, so with selector the generator would have to collect them into essentially the selectWith value anyway.

switch

(much of this section’s wording is copied directly from @geraintluff)

The purpose of the switch keyword is to express a series of conditional relations: "If A1 then B1, else if A2 then B2, else ...".

Values for switch

The value of switch is an array. The entries in the array must be objects, each containing:

  • then: a schema or a boolean
  • optional if: a schema
  • optional continue: a boolean

Validation of switch

For each object in the switch array:

  • if if is specified:
    • if data is not valid against the schema in if, then continue to the next item in switch
  • if the value of then is a boolean:
    • if the value of then is false, then validation fails
  • if the value of then is a schema:
    • if the data is not valid against the schema in then, then validation fails
  • if continue is set to boolean true, then:
    • continue to the next item in switch

switch examples

Here is our regular single-select implemented with switch:

{
    "type": "object",
    "switch": [
        {
            "if": {"properties": {"foo": {"enum": ["firstValue"]}}},
            "then": {
                "properties": {
                    "bar": {
                        "type": "array",
                        "items": {"type": "number"}
                    }
                },
                "required": ["foo", "bar"]
            }
        },
        {
            "if": {"properties": {"foo": {"enum": ["secondValue"]}}},
            "then": {
                "properties": {
                    "buzz": {
                        "type": "string",
                        "minLength": 10
                    }
                },
                "required": ["foo", "buzz"]
            }
        }
    ]
}

And here is our regular multi-select. (Since nothing but foo is required and additional properties are allowed, it’s a bit silly to specify "bar" and "buzz" as separate schemas, but pretend they are two schemas that it makes sense to handle this way, because I don’t want to go redo all of the examples.)

{
    "type": "object",
    "required": ["foo"],
    "switch": [
        {
            "if": {"properties": {"foo": {"enum": ["multiSelect"]}}},
            "then": {
                "anyOf": [
                    {"properties": {"bar": {"type": "number"}}},
                    {"properties": {"buzz": {"type": "string"}}}
                ]
            }
        },
        {
            "if": {"properties": {"foo": {"enum": ["singleSelect"]}}},
            "then": {
                "properties": {"zippy": {"type": "boolean"}}
            }
        }
    ]
}

This actually isn’t very interesting: since the two branches associated with a foo of "multiSelect" are more concisely managed with an inner anyOf, the switch can once again only choose one of its conditions. Here is a more complex example adapted from the original proposal:

{
    "type": "object",
    "switch": [
        {
            "if": {
                "properties": {"indicator": {"enum": ["yellow"]}}
            },
            "then": {
                "required": ["warningMessage"]
            },
            "continue": true
        },
        {
            "if": {
                "properties": {
                    "powerLevel": {"minimum": 9000}
                }
            },
            "then": {
                "required": ["disbelief"]
            },
            }
        {
            "then": {
                "required": ["confidence"]
            }
        }
    ]
}

In this example, if there is a yellow indicator, there must also be a warning message.
Whether there is a warning or not, a high enough "powerLevel" requires "disbelief", otherwise it requires "confidence".

Since the "indicator" branch specifies "continue": true, we go ahead and check the other conditions whether we have a yellow indicator or not. And since continue is not specified on the "powerLevel" branch, if we match that condition we will not examine the remaining branches.

Because the last branch does not have an "if" schema, it will always match if we reach it. So the only way we do not match it is if we match the minimum power level, as that will end the processing of the switch before we consider the final branch.
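The validation steps and the walk-through above can be sketched in Python roughly as follows. `validate_switch` is a hypothetical name, and the `is_valid` helper is a stand-in that understands only required, properties, enum and minimum. The sketch assumes, as described above, that processing stops at the first matched entry that does not set continue.

```python
def is_valid(schema, inst):
    """Hypothetical mini-validator: required, properties, enum, minimum only."""
    if "enum" in schema and inst not in schema["enum"]:
        return False
    if "minimum" in schema and isinstance(inst, (int, float)) and inst < schema["minimum"]:
        return False
    if isinstance(inst, dict):
        if any(name not in inst for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in inst and not is_valid(sub, inst[name]):
                return False
    return True


def validate_switch(cases, instance, is_valid):
    """Evaluate a switch array against an instance; return True if it passes."""
    for case in cases:
        if "if" in case and not is_valid(case["if"], instance):
            continue  # condition not met: try the next entry
        then = case["then"]
        if then is False:
            return False
        if isinstance(then, dict) and not is_valid(then, instance):
            return False
        if case.get("continue") is True:
            continue  # explicit fall-through to the next entry
        break  # matched without "continue": remaining entries are skipped
    return True


cases = [
    {"if": {"properties": {"indicator": {"enum": ["yellow"]}}},
     "then": {"required": ["warningMessage"]},
     "continue": True},
    {"if": {"properties": {"powerLevel": {"minimum": 9000}}},
     "then": {"required": ["disbelief"]}},
    {"then": {"required": ["confidence"]}},
]
```

One thing to watch when experimenting with this: an "if" built only from properties also accepts instances where that property is absent, so an instance with no "indicator" at all satisfies the first condition. The test instances used with this sketch therefore always include both "indicator" and "powerLevel".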

Additionally, the point of allowing then to be a boolean is to provide a concise expression to say that the data must be one of the supplied options, e.g.:

{
    "switch": [
        {"if": ..., "then": ...},
        {"if": ..., "then": ...},
        {"then": false}
    ]
}

Comparing the options

selectWith/selector:

  • Geared towards documentation and schema readers
  • Clarifies schema author intent, but does not make the mental model of anyOf/oneOf any more intuitive
  • Does not change validation

switch:

  • Adds a new validation approach
  • Geared towards schema writers and readers
  • More familiar model to many programmers
  • Introduces imperative conditionals to what has previously been a declarative system

@handrews
Contributor Author

Based on past experience working with declarative systems that add conditionals, once you do that, you have a programming language and there's no going back. People will want more and more imperative features and it gets harder to explain why they shouldn't go in.

It's pretty much the same problem you see in web templating systems that try to not be a language in order to keep logic out of the templates. As admirable as their goal is, they always tend towards being a language as time goes on, and just end up being a really unsatisfying one.

In my experience working with teams learning JSON Schema, wrapping one's head around this sort of use of oneOf, etc. is a bit tricky at first, but once folks understand the idiom they use it and recognize it readily enough. Having an annotation like selectWith will encourage, just by its schema documentation, people to think about this sort of oneOf/anyOf idiom. Together with a good set of examples on the web site, I think that would work out well enough.

I also confess that I messed myself up multiple times trying to construct a switch that illustrated both the continue and the empty schema the way I wanted. I think that oneOf/anyOf are actually more clear for that sort of complex "try all of these but stop if one of these validates" type of logic. Since the schemas themselves are not imperative, I find it easier to think of declaratively than mostly-declarative-with-a-conditional-on-top.

@handrews handrews changed the title v6 validation and annotation: conditional selection of alternate schemas v6: conditional selection of alternate schemas (includes "switch" and other options) Oct 26, 2016

@epoberezkin
Member

epoberezkin commented Oct 28, 2016

Pros:

  1. It clearly separates predicate (that selects the schema to use) from the core of the schema that validates the data.
  2. It leads to less verbose syntax than anyOf/oneOf combinations in many cases.

Cons:

  1. It is indeed more imperative
  2. It can be abused

Weighing pros and cons I think this keyword is useful and should be added, again based on my experience of answering users' questions about how to deal with tricky anyOf situations. There are usually two solutions: a verbose one with anyOf, and a succinct and clear one with switch.
As to the abuse part, it can apply to almost everything...

Conditionals and iterations (=imperative) are already here, whether we like it or not: they can be implemented via combinations of allOf/not etc. This switch:

{
  "switch": [
    { "if": { "$ref": "if1" }, "then": { "$ref": "then1" } },
    { "if": { "$ref": "if2" }, "then": { "$ref": "then2" } }
  ]
}

is equivalent to:

{
  "anyOf": [
    { "allOf": [{"$ref": "if1"}, {"$ref": "then1"}] },
    { "allOf": [{"not": {"$ref": "if1"}}, {"$ref": "if2"}, {"$ref": "then2"}] },
    { "allOf": [{"not": {"$ref": "if1"}}, {"not": {"$ref": "if2"}}] }
  ]
}

with the assumption that subschemas in anyOf are validated sequentially (which is always the case, as far as I've seen). Switch just provides a less verbose and more efficient alternative to it.

@epoberezkin
Member

epoberezkin commented Oct 28, 2016

Actually the assumption that anyOf is sequential is not needed... They are equivalent regardless.
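A quick way to check that claim is to treat each referenced subschema as a plain boolean (does the instance validate against it or not) and compare the two forms over every combination. This Python sketch does exactly that; the function names are made up for the illustration, and the switch form assumes two entries without continue.

```python
from itertools import product


def switch_result(if1, then1, if2, then2):
    """Two-entry switch without "continue": the first matching "if" decides."""
    if if1:
        return then1
    if if2:
        return then2
    return True  # no entry matched: the switch imposes no constraint


def any_of_result(if1, then1, if2, then2):
    """The anyOf/allOf/not expansion, evaluated as a boolean expression."""
    return (
        (if1 and then1)
        or (not if1 and if2 and then2)
        or (not if1 and not if2)
    )


# Collect every truth assignment on which the two forms disagree.
mismatches = [combo for combo in product([False, True], repeat=4)
              if switch_result(*combo) != any_of_result(*combo)]
```

Because the three anyOf branches are mutually exclusive, the order in which they are evaluated cannot change the outcome, which is why the sequential assumption is unnecessary.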

@epoberezkin
Member

Also, I suggest clarifying the validation process: https://github.com/epoberezkin/ajv/blob/master/KEYWORDS.md#switch-v5-proposal

Originally filed here: json-schema/json-schema#205

@handrews
Contributor Author

@epoberezkin I definitely consider the imperative aspect to be a more significant problem than you do, so we'll have to get some more folks involved. I think that adding annotations to clarify the intent of the oneOf/anyOf forms is the better route to keep things consistent.

@epoberezkin
Member

epoberezkin commented Oct 29, 2016

I am neutral on imperativeness.
Switch can be seen as both imperative and as a logical expression (with implications).
Maybe instead of adding switch we can add implication (if/then), we already have AND/OR/XOR/NOT and then switch can be constructed from several implications:

{
  "allOf/anyOf": [
    { "if": { "$ref": "if1" }, "then": { "$ref": "then1" } },
    { "if": { "$ref": "if2" }, "then": { "$ref": "then2" } }
  ]
}

Same as allOf/anyOf, they are de facto imperative and switch can already be implemented via these keywords (as you can see from the example), just in a more verbose and less clear way:

p => q === !p || (p && q)
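For illustration, that identity can be turned into a small schema builder and checked against the truth table of implication. This is a Python sketch with hypothetical names; `implication` just assembles the existing anyOf/not/allOf keywords and is not part of any proposal.

```python
from itertools import product


def implication(if_schema, then_schema):
    """Build "if => then" from existing keywords: !p || (p && q)."""
    return {"anyOf": [{"not": if_schema},
                      {"allOf": [if_schema, then_schema]}]}


def implies(p, q):
    """Truth table of p => q: false only when p holds and q does not."""
    return not (p and not q)


# The expanded form and the shorter !p || q agree with implication everywhere.
all_agree = all(
    implies(p, q) == (not p or (p and q)) == (not p or q)
    for p, q in product([False, True], repeat=2)
)
```

The shorter `!p || q` form would also work as a schema (`{"anyOf": [{"not": if}, then]}`); the longer form keeps the two anyOf branches mutually exclusive.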

@epoberezkin
Member

I think I like switch more though :)

@handrews
Contributor Author

Now that const has been accepted I have even less interest in this topic.
@epoberezkin if you want to push for this, go ahead. I'll continue to argue for selectWith over switch, but it's not my top priority to win that argument.

However, if you don't want to adopt this issue, I think we can close it given that it hasn't attracted much commentary. If someone else comes along and wants to champion it they can, of course, re-open it or open a new one.

Anyway, please comment here within two weeks if you're going to take over advocating this, otherwise I'll close it.

@epoberezkin
Member

I think I've posted enough reasons to adopt switch.

To summarise the advantages of using it:

  1. It solves real issues users have
  2. It is deterministic with regards to which subschemas are validated (which is important when you use schemas for anything but validation)
  3. From validation perspective, it is a convenient syntactic sugar for existing keywords that reduces complexity and allows implementing implication
  4. Many Ajv users adopted it already, which is the confirmation that it is needed.

So, why don't we include it? What can I do to help get it accepted? PR?

@handrews
Contributor Author

Many Ajv users adopted it already, which is the confirmation that it is needed.

That's evidence that something is needed, but not evidence that "switch" is the best solution. In order to have evidence for that, users would also need to be offered the other solution(s) and we would see which is preferred.

What I would like to see to move this forward is more discussion of the impact of adding an imperative construct to what is currently a declarative system. I think this is a principle that should not be broken lightly.

@awwright , @Relequestual , @jdesrosiers , @sam-at-github ? ( @epoberezkin feel free to page others)

@epoberezkin
Member

epoberezkin commented Nov 21, 2016

What I would like to see to move this forward is more discussion of the impact of adding an imperative construct to what is currently a declarative system.

As I explained above, switch can be expressed using existing keywords. If that's the case, how is it less declarative than the current vocabulary?

@handrews
Contributor Author

Declarative and imperative programming systems can express the same things. But they are different styles, and I believe that a consistent style is better for JSON Schema in the long run.

@handrews handrews changed the title v6: conditional selection of alternate schemas (includes "switch" and other options) conditional selection of alternate schemas (includes "switch" and other options) Nov 24, 2016

@epoberezkin
Member

epoberezkin commented Nov 25, 2016

@handrews @awwright how would you feel about dropping the switch idea, as being indeed too procedural the way it is defined, and replacing it with two things:

  1. if/then/else - this is a pure boolean algebra, essentially "implication" operation (like ternary in JS or if/then/else in haskell)
  2. switch/select where routing is not based on the algorithm but on the value of a particular data property - also more expression than procedure. We already have such routing on the property name and item index, the idea is also to have routing on the value.

There are ways to achieve the same validation result via existing vocabulary (anyOf etc.), but it is more verbose and it gets really messy from error reporting perspective and also from other usage perspective, beyond validation (no control about which subschemas get validated).

Maybe both deserve separate issues, in which case this can be closed.

@awwright
Member

awwright commented Dec 3, 2016

I don't know how I feel about this yet. It seems like a giant can of worms.

I think we should encourage declarative methods where possible, and take a look at #31 to see if we can get the same effect that most people who want an if/then paradigm are going after.

@epoberezkin
Member

Not sure what is the proposal in #31.

What people want is better control of validation with the use of predicates, whatever the syntax is. anyOf/oneOf/not are no more or less declarative than if/then - they all can be seen either as boolean expressions or as imperative. Same about select that uses some predicate to choose schema to validate - it is quite declarative.

I don't understand how it can be addressed within the semantics of oneOf etc.

@mrjj

mrjj commented Dec 4, 2016

Thank you for raising the problem and making summary!

I see essential point in minimisation of impact, because scope of the new version is now looks very massive, some changes hard to implement would be dropped. Otherwise no one could make a big leap to the next version and we will get more steps to problem solution or some everlasting draft like RFC 6455 without any power to force implementation developers to follow this draft.

So if both solutions (i really appreciate selector as evolution of bounding that was not obvious without documentation) have cons why not to just move toward implementation relief and find better approach solving exactly initial problem without hard questions about declarative/imperative approach and execution order and priorities.

Why not just to tweak how reporting is formed attaching to every failed oneOf/anyOf selector according list of reasons why every of given branches have failed? This will be enough to heuristically detect missing fields, best fitting branch e.t.c. relying on this info.

Pros:

  • no changes in syntax at all
  • minimal validators code impact

Cons:

  • larger output
  • problem to move deeper than one level after failed oneOf/anyOf selectors

After this we will have enough time to think about enhancements having initial issue about gaining details what was wrong in selector generally solved.
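As a rough sketch of what such per-branch reporting could look like, here is a small Python example. Everything in it is an assumption for illustration: `collect_errors` is a toy error-collecting validator that knows only required, enum and properties, and the message format is invented.

```python
def collect_errors(schema, inst, path=""):
    """Toy validator that returns a list of human-readable failure reasons."""
    errors = []
    where = path or "/"
    if "enum" in schema and inst not in schema["enum"]:
        errors.append(f"{where}: {inst!r} is not one of {schema['enum']}")
    if isinstance(inst, dict):
        for name in schema.get("required", []):
            if name not in inst:
                errors.append(f"{where}: missing required property {name!r}")
        for name, sub in schema.get("properties", {}).items():
            if name in inst:
                errors.extend(collect_errors(sub, inst[name], f"{path}/{name}"))
    return errors


def one_of_report(branches, inst):
    """Map each branch index to the reasons that branch rejected the instance."""
    return {i: collect_errors(branch, inst) for i, branch in enumerate(branches)}


branches = [
    {"properties": {"foo": {"enum": ["firstValue"]}}, "required": ["foo", "bar"]},
    {"properties": {"foo": {"enum": ["secondValue"]}}, "required": ["foo", "buzz"]},
]
report = one_of_report(branches, {"foo": "secondValue"})
```

Here the second branch fails for a single reason (the missing buzz) while the first fails for two, which is the kind of information a consumer of the report could use to guess the intended branch.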

@handrews
Contributor Author

handrews commented Dec 4, 2016

@mrjj :

Why not just to tweak how reporting is formed attaching to every failed oneOf/anyOf selector according list of reasons why every of given branches have failed?

That's basically what #31 is proposing (once you get past its slightly confusing first few comments).

scope of the new version is now looks very massive

This change isn't targeted at Draft 06 no matter what happens. Draft 06 core and validation are compatible with Draft 05 and mostly compatible with Draft 04 (the meaning of the "uri" format changed in Draft 05, but that's a pretty minor change). The two potentially breaking changes have been done to still allow the old forms for now, with the expectation that they will be removed in a future draft once folks have had a chance to migrate and/or we can publish a tool to assist with migration.

Draft 06 hyper-schema will probably have more significant changes, but only in areas that were already problematic.

@mrjj

mrjj commented Dec 4, 2016

Perfect, so i vote for #31-like solution, sorry i haven't read all relative branches till end.

If approach with extending existing reporting seems viable not only for me and @awwright I'll try to help with implementation details. Concern about O complexity for all branches output seems reasonable, but it could be a matter of developers choice expressed by some kind of validator (not schema) flag "allBranchesOutput: true" or smth like this.

@epoberezkin
Member

epoberezkin commented Dec 4, 2016

Why not just to tweak how reporting is formed attaching to every failed oneOf/anyOf selector according list of reasons why every of given branches have failed?

It is already what all validators are doing and that generates a large number of errors in real cases that are very difficult to manage.

This will be enough to heuristically detect missing fields, best fitting branch e.t.c. relying on this info.

That is very theoretic and too "heuristic". None of the validators are doing it and it is highly unlikely that any will, as it is:

  • out of scope of validation
  • not deterministic (results would depend on how you measure "best-fitting" and even on the order or nodes)
  • algorithmically complex (I won't go into estimating how much more complex than simple validation, but the gut feeling is that the naive implementation would be worse than O(N^2) where N is the number of nodes)

So it is very easy to vote for "the solution" that does not and most likely never will exist. @mrjj, you call #31 a "solution" and also say there are some other "like" it - I am really waiting to see at least one solution, as #31 is just a vague idea, not a solution really.

So I would wait until a proposal for the spec exists about which algorithm validators should be using to determine the "best-fitting" branch.

But it seems much less trivial problem to solve to me than introducing a keyword that provides an easy to implement, performant (O(N) and faster than anyOf etc.) and deterministic solution within the scope of validation.

@handrews
Contributor Author

handrews commented Dec 4, 2016

@mrjj @epoberezkin I still prefer the selectWith proposal which is a slight modification of @mrjj 's bounding proposal, reworked to be more readable.

The "0/foo" relative JSON Pointer in the selectWith example below simply indicates that the "foo" property of the instance is sufficient to make a decision. I show it used with oneOf here as that is the source of the most confusing errors, but it works with anyOf as well.

The subschema designated with selectWith is essentially the if subschema of the switch proposal, but we avoid the imperative style and (more importantly) the complex fall-through behavior required by switch.

{
    "type": "object",
    "selectWith": "0/foo",
    "oneOf": [
        {
            "properties": {
                "foo": {"enum": ["firstValue"]},
                "bar": {
                    "type": "array",
                    "items": {"type": "number"}
                }
            },
            "required": ["foo", "bar"]
        },
        {
            "properties": {
                "foo": {"enum": ["secondValue"]},
                "buzz": {
                    "type": "string",
                    "minLength": 10
                }
            },
            "required": ["foo", "buzz"]
        }
    ]
}

selectWith is much more along the lines of #31 with a focus on optimizing and reporting errors rather than introducing imperative control-flow.
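A minimal Python sketch of one possible reading of selectWith for error reporting, not a definitive implementation. Each oneOf branch is validated as usual, and branches that produced an error at the instance location named by selectWith are dropped from the report. The helper names `errors` and `report`, the error-path format, and the restriction to pointers of the form "0/<property>" are assumptions made for illustration; the tiny validator covers only the keywords used in the example above.

```python
from typing import Any

# Simplification: Python bools also pass the "number" check here.
TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}

def errors(schema: dict, data: Any, path: str = "") -> list[tuple[str, str]]:
    """Tiny validator for a handful of keywords; returns (instance path, message) pairs."""
    out = []
    if "type" in schema and not isinstance(data, TYPES[schema["type"]]):
        out.append((path, f"expected {schema['type']}"))
    if "enum" in schema and data not in schema["enum"]:
        out.append((path, f"must be one of {schema['enum']}"))
    if "minLength" in schema and isinstance(data, str) and len(data) < schema["minLength"]:
        out.append((path, f"shorter than {schema['minLength']}"))
    if isinstance(data, dict):
        for name in schema.get("required", []):
            if name not in data:
                out.append((f"{path}/{name}", "is required"))
        for name, sub in schema.get("properties", {}).items():
            if name in data:
                out.extend(errors(sub, data[name], f"{path}/{name}"))
    if isinstance(data, list) and "items" in schema:
        for i, item in enumerate(data):
            out.extend(errors(schema["items"], item, f"{path}/{i}"))
    return out

def report(schema: dict, data: Any) -> list[tuple[str, str]]:
    """Errors for a oneOf that carries the hypothetical selectWith hint."""
    # Only the "0/<property>" form of relative JSON Pointer is handled here.
    selector = "/" + schema["selectWith"].split("/", 1)[1]
    top = errors(schema, data)  # sibling keywords such as "type"
    per_branch = [errors(branch, data) for branch in schema["oneOf"]]
    matched = sum(1 for errs in per_branch if not errs)
    if matched == 1:
        return top  # exactly one branch matched, so oneOf is satisfied
    if matched > 1:
        return top + [("", "matches more than one oneOf branch")]
    # Keep only the branches that the selector location did not rule out.
    selected = [
        errs for errs in per_branch
        if not any(p == selector or p.startswith(selector + "/") for p, _ in errs)
    ]
    chosen = selected if selected else per_branch
    return top + [e for errs in chosen for e in errs]

SCHEMA = {
    "type": "object",
    "selectWith": "0/foo",
    "oneOf": [
        {"properties": {"foo": {"enum": ["firstValue"]},
                        "bar": {"type": "array", "items": {"type": "number"}}},
         "required": ["foo", "bar"]},
        {"properties": {"foo": {"enum": ["secondValue"]},
                        "buzz": {"type": "string", "minLength": 10}},
         "required": ["foo", "buzz"]},
    ],
}

print(report(SCHEMA, {"foo": "secondValue", "buzz": "short"}))
# prints [('/buzz', 'shorter than 10')]
```

In this reading the hint never changes whether an instance is valid; it only filters which branch's errors are shown. It also depends on having instance data, since the branch is identified by validating, not by inspecting the schema.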

@epoberezkin
Member

epoberezkin commented Dec 4, 2016

@handrews If selectWith doesn't affect validation it's useless. If it does, it is not clear how it works. How would a validator know which subschema in oneOf to use based on the value in selectWith? Is it supposed to magically guess that, because there is a properties keyword inside with the same property that is used in selectWith, and that property has an enum with some value, it should use this value? Do you realise what kind of abominable violation of the JSON Schema design principle it is (only siblings can be taken into account, not great-great-great-grandchildren of a sibling as in your case)? What if there is a more complex schema inside rather than just properties?

I really don't get why you prefer such a complex solution instead of a simple mapping of values to schemas:

{
    "type": "object",
    "selectWith": "0/foo",
    "selectCases": {
        "firstValue": {
            "properties": {
                "bar": {
                    "type": "array",
                    "items": {"type": "number"}
                }
            },
            "required": ["foo", "bar"]
        },
        "secondValue": {
            "properties": {
                "buzz": {
                    "type": "string",
                    "minLength": 10
                }
            },
            "required": ["foo", "buzz"]
        }
    },
    "selectDefault": false
}

Do you see how much simpler it would be to reason about and to process? Or the theoretic purity is so attractive that the woes of implementers and users don't really matter, do they? /rant :)
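A minimal Python sketch of how the value-to-schema mapping above could be resolved, assuming the proposed selectWith, selectCases and selectDefault keywords behave as the example suggests. The function name `select_schema`, the restriction to pointers of the form "0/<property>", and the choice of `true` as the fallback when selectDefault is absent are assumptions for illustration, not part of any specification.

```python
from typing import Any

def select_schema(schema: dict, instance: Any) -> Any:
    """Return the subschema chosen by the hypothetical select* keywords."""
    # Only the "0/<property>" form of relative JSON Pointer is handled in this sketch.
    prefix, _, name = schema["selectWith"].partition("/")
    if prefix != "0":
        raise ValueError("only pointers of the form 0/<property> are supported here")
    value = instance.get(name) if isinstance(instance, dict) else None
    cases = schema.get("selectCases", {})
    # JSON object keys are strings, so only string values can ever match a case.
    if isinstance(value, str) and value in cases:
        return cases[value]
    # No case matched: fall back to selectDefault.
    # Assumption: a missing selectDefault means "anything passes" (true).
    return schema.get("selectDefault", True)

SCHEMA = {
    "type": "object",
    "selectWith": "0/foo",
    "selectCases": {
        "firstValue": {"required": ["foo", "bar"]},
        "secondValue": {"required": ["foo", "buzz"]},
    },
    "selectDefault": False,
}

chosen = select_schema(SCHEMA, {"foo": "secondValue", "buzz": "long enough value"})
print(chosen)  # prints {'required': ['foo', 'buzz']}
```

The selection is a single dictionary lookup and never inspects the contents of the candidate subschemas. The cost is that only string values can act as selectors, because the cases are keyed by JSON property names.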

@epoberezkin
Member

I don't know why it should be selectWith though and not just select, everything else seems very simple though...

@handrews
Contributor Author

handrews commented Dec 4, 2016

Do you see how much simpler it would be to reason about and to process?

No, because it's not.

There's nothing magical about this. With the exception of the fall-through rules (which are the things to which I object most about switch), everything necessary is already possible in JSON Schema. The only difficulty for reporting is that it is not clear which part of the instance should be checked to provide a clear message. selectWith is a pointer to the thing that should be checked. It's not magical at all.

select vs selectWith: Don't care which.

@epoberezkin
Member

The only difficulty for reporting is that it is not clear which part of the instance should be checked to provide a clear message.

That is not "the only difficulty", it is a major difficulty

selectWith is a pointer to the thing that should be checked. It's not magical at all.

I understand that, but you are ignoring my question: how do you determine which subschema inside oneOf it points to in general case, when absolutely any set of schemas can be inside oneOf? Do you realise that pointer to data cannot be resolved into the pointer to the schema in general case? That it is only possible for the subset of JSON-Schemas?

@epoberezkin
Member

With the exception of the fall-through rules (which are the things to which I object most about switch)

Don't care at all about fallthrough, very happy to kill that idea by the way.

@epoberezkin
Member

And talking about real-life use cases, a mapping of value to schema, as in the example above, is much more useful (and also more deterministic, as the values are unique) than the switch (where technically multiple ifs can be matched, which makes it look more imperative).

@handrews
Contributor Author

handrews commented Dec 4, 2016

Do you realise that pointer to data cannot be resolved into the pointer to the schema in general case? That it is only possible for the subset of JSON-Schemas?

Without instance data, that is correct, although humans can usually eyeball it in all but the most complex schemas. With instance data, though, it is easy enough- otherwise we couldn't validate anything at all.

An alternative would be to use pointers-to-schema instead of pointers-to-data. The same idea works either way, and abstract reasoning would be much easier with pointers-to-schema. I don't actually recall why I wrote it as a data pointer. It does work, but schema pointers would be simpler.

@handrews
Contributor Author

handrews commented Dec 4, 2016

a mapping of value to schema, as in the example above, is much more useful

I don't actually follow your most recent example. If selectWith used schema pointers do they end up being pretty equivalent? I can't figure out what your firstValue and secondValue are doing.

For one thing, JSON property names must be strings, which is overly limiting.

@epoberezkin
Member

epoberezkin commented Dec 4, 2016

firstValue and secondValue are the values in the data instance that 0/foo points to. I thought it's obvious. In your case you have enum and validator is supposed to "eyeball" it three levels down to match the subschema.

@epoberezkin
Member

epoberezkin commented Dec 4, 2016

"eyeball" is not really possible to implement efficiently - it requires deep traversal of all subschemas. You can't really ignore implementations to that extent. In general case it doesn't have solution at all. Imagine oneOf inside oneOf, not, etc. It should be obvious from schema itself which value maps to what, not from "eyeballing".

@epoberezkin
Member

epoberezkin commented Dec 4, 2016

@handrews it is a general observation by the way that when you propose some nice theoretic idea you get so attached to it that you don't think how it is going to be implemented. But without implementations the standard is dead. You really have to change that approach and before proposing something try imagining yourself writing actual code that implements your idea. If you can clearly see that code, then the idea is fine. If it requires some complex and non-deterministic algorithms, tree traversals, etc. then this idea is better forgotten. Your selectWith is one such idea - impossible to implement efficiently, if not at all.

@handrews
Contributor Author

handrews commented Dec 4, 2016

you don't think how it is going to be implemented. But without implementations the standard is dead.

You throw this at me a lot and I'm getting tired of it. You have no basis for it, you just don't like my ideas and find it easier to attack me personally than to reason about my ideas.

I have written and am writing implementations, and I am quite conscious of the challenges involved. I work in a corporate environment so I can't just toss said implementations up on GitHub to placate you.

You are essentially calling me both careless and stupid, and neither is an appropriate negotiating strategy in a project such as this.

@epoberezkin
Member

You are essentially calling me both careless and stupid, and neither is an appropriate negotiating strategy in a project such as this.

Nope. Just preferring theory over practice.

@handrews
Contributor Author

handrews commented Dec 4, 2016

Nope. Just preferring theory over practice.

OK, we're done.

@handrews
Contributor Author

handrews commented Dec 4, 2016

You are insistent on ascribing motivations to me. You insult me for not finding your solution "obvious" or agreeing with you on what is simple. You are not behaving as a good-faith member of this community, and I will not continue to discuss this with you.

@handrews handrews closed this as completed Dec 4, 2016
@epoberezkin
Member

@handrews I am sorry you see it that way. I did not intend to insult you at all. You were using a stronger language on a number of occasions, both towards myself and other people, so I am not sure why you're sensitive in this case.

And you closed your own issue by the way :)

@epoberezkin
Member

You have no basis for it, you just don't like my ideas and find it easier to attack me personally than to reason about my ideas.

You've done it on a number of occasions without me even commenting on it by the way. And I do reason about your ideas. You just avoid answering the questions...

@epoberezkin
Member

Anyway, I am sorry. It wasn't my intention to cause any offence.

@handrews
Contributor Author

handrews commented Dec 4, 2016

Anyway, I am sorry. It wasn't my intention to cause any offence.

Thank you.

I can re-open this if you want. I closed it because I think #31 is sufficient and I did not feel like being attacked further for an issue I don't even care that much about.

You've done it on a number of occasions without me even commenting on it by the way.

Feel free to point any out, past or present, and I will make amends. While I will express technical opinions strongly, and push back on things like user statistics that aren't backed up with research, I do try to avoid ascribing motivations and attacking intelligence. I am generally happy to be called out on such things when they happen, as they are not intended.

I am particularly sensitive about the "theoretical" comment both because of what it implies about me as a person (that I am dismissive of the concerns of others) and the fact that, because of the nature of my work, I cannot currently easily disprove it. It is frustrating not to be able to show examples right now, and to have you hammer me on a topic I cannot address despite the existence of evidence to the contrary makes it very difficult for me to want to discuss anything with you. Particularly when your response to my attempt to explain what I do consider is ignored and you just repeat your accusation.

There aren't that many of us who are fully active on this project- basically you, me, Austin and Ben. I know I am not perfect in my interactions, but I am trying to work out how to make it work for all of us. Please do point out when I fail at this, but please also consider trying to meet me halfway. You don't know what's going on in my head or what work I have done that I cannot publish. Please think about that before characterizing me in a particular way.

@handrews
Contributor Author

handrews commented Dec 5, 2016

You just avoid answering the questions...

I admit that I do ignore some of your questions. This is because you are so combative on every possible point that I have started to try to minimize our discussions. It just seems to me that after a certain point, there is no benefit to replying to you- it will provoke another round of argument without advancing either side. And I'll probably lose my temper and say something regrettable. So instead I just stop replying.

Perhaps we can both work on dialing it down a bit? :-) I would like to keep all of the discussions going, but I am worn down by the forcefulness of every single discussion. I can't do but so much about everyone else, but for myself given the choice between escalating and dropping out, I am increasingly choosing to drop out. Escalating wasn't working.

@epoberezkin
Member

epoberezkin commented Dec 5, 2016

Feel free to point any out, past or present, and I will make amends.

You wrote: "I figured out that you have an excessively complex mental model for $ref but that didn't change my point of view that I find it extremely simple to work with and implement" That was a bit too personal. But that's fine, no offence taken. So far my mental model is the only one that passes all the tests in JS land... ;)

Perhaps we can both work on dialing it down a bit? :-)

Sure, thank you. I will do my best...

This is because you are so combative on every possible point that I have started to try to minimize our discussions.

I will try to be more patient. I was just getting a bit frustrated because our interactions reminded me of your other conversations, when you are asking questions and get some unrelated answers...

I am particularly sensitive about the "theoretical" comment both because of what it implies about me as a person

It doesn't imply anything, from my point of view, at least. Some people start from practical considerations, some from theoretic, I wasn't judgemental about it at all, both approaches work... I was just hoping that highlighting it may help understand where the difference comes from... A bit of "British Parliament style" discourse was unnecessary though, sorry.

I can re-open this if you want. I closed it because I think #31 is sufficient

I am happy to agree with #31 provided some specific and efficient (from both a performance and an implementation perspective) algorithm for mapping a value to a subschema is proposed and agreed. So I'd appreciate it if we considered an explicit mapping, or some other ideas in this issue, a viable alternative until then.
Switch without fallthrough (I use switch a lot and never needed fallthrough) also solves the problem, but for most practical use cases select based on value is more convenient. Although both can be used, switch allows more complex predicates (select can be expressed via switch, but not the other way around).
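The claim that select can be expressed via switch can be illustrated with a small Python sketch. It assumes a switch shaped as an ordered list of if/then clauses with no fall-through, where the first clause whose `if` matches decides the result; that shape, the function name `select_to_switch`, and the "0/<property>" pointer restriction are assumptions for illustration only.

```python
def select_to_switch(schema: dict) -> list[dict]:
    """Rewrite hypothetical selectWith/selectCases/selectDefault as if/then clauses."""
    # Handles only pointers of the form "0/<property>".
    name = schema["selectWith"].partition("/")[2]
    clauses = [
        {
            # Each case becomes a predicate on the selector property.
            "if": {"properties": {name: {"enum": [value]}}, "required": [name]},
            "then": sub,
        }
        for value, sub in schema.get("selectCases", {}).items()
    ]
    if "selectDefault" in schema:
        # The empty schema matches everything, so this acts as a catch-all clause.
        clauses.append({"if": {}, "then": schema["selectDefault"]})
    return clauses

converted = select_to_switch({
    "selectWith": "0/foo",
    "selectCases": {"firstValue": {"required": ["bar"]}},
    "selectDefault": False,
})
print(converted)
```

The reverse translation is not available in general, because an `if` clause may hold an arbitrary schema that does not reduce to a lookup on a single value.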

@handrews
Contributor Author

handrews commented Dec 5, 2016

You wrote: "I figured out that you have an excessively complex mental model for $ref but that didn't change my point of view that I find it extremely simple to work with and implement" That was a bit too personal.

Blah. Not my finest moment. You are right, it was inappropriate and I apologize.

It doesn't imply anything, from my point of view, at least.

I think it's safe to say we often have different points of view :-) But I accept that that was not your intention and will do my best to keep that in mind going forward.

There's a bit more going on this weekend that has me on edge (I'll send you an email- not worth going into on GitHub) and I'm sorry that has made my temper worse than usual.

Let's pick up the select discussion over in #31.

@epoberezkin
Member

By the way, I keep saying that I'd rather have a much simpler if/then pair of keywords than switch, multiple such pairs can be combined with anyOf/allOf/oneOf providing simpler alternative to switch.
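A minimal Python sketch of the if/then idea, assuming each pair is read as an implication (the `then` schema applies only when the `if` schema matches) and that several pairs are combined with allOf. The evaluator `is_valid` is a hypothetical helper that supports only the keywords this example needs; it is not any validator's actual implementation.

```python
from typing import Any

def is_valid(schema: Any, data: Any) -> bool:
    """Tiny evaluator for the few keywords this example needs."""
    if isinstance(schema, bool):
        return schema
    if "enum" in schema and data not in schema["enum"]:
        return False
    if "minLength" in schema and isinstance(data, str) and len(data) < schema["minLength"]:
        return False
    if isinstance(data, dict):
        if any(name not in data for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in data and not is_valid(sub, data[name]):
                return False
    if not all(is_valid(sub, data) for sub in schema.get("allOf", [])):
        return False
    # The pair is an implication: `then` is only consulted when `if` matches.
    if "if" in schema and is_valid(schema["if"], data):
        return is_valid(schema.get("then", True), data)
    return True

SCHEMA = {
    "allOf": [
        {
            "if": {"properties": {"foo": {"enum": ["firstValue"]}}, "required": ["foo"]},
            "then": {"required": ["bar"]},
        },
        {
            "if": {"properties": {"foo": {"enum": ["secondValue"]}}, "required": ["foo"]},
            "then": {"properties": {"buzz": {"minLength": 10}}, "required": ["buzz"]},
        },
    ]
}

print(is_valid(SCHEMA, {"foo": "firstValue", "bar": [1]}))         # prints True
print(is_valid(SCHEMA, {"foo": "secondValue", "buzz": "short"}))   # prints False
```

With allOf, an instance whose foo matches neither predicate passes, since no `then` is triggered. Rejecting such instances would need an extra constraint on foo, such as an enum listing the allowed values.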


4 participants