Validation: if/then/else #180

Closed
epoberezkin opened this issue Dec 5, 2016 · 75 comments

Comments

@epoberezkin
Member

epoberezkin commented Dec 5, 2016

This can be seen as an extension of #31 and #64. I see it as syntactic sugar for the existing boolean/compound keywords.

The validation process is very simple:

  • if the schema in if is valid, then the schema in then should be validated and its outcome determines the instance validation.
  • if the schema in if is invalid then the schema in else should be validated and its outcome determines the instance validation.
  • if if is not present then the schema itself is invalid (metaschema validation will fail).
  • if neither then nor else is present - the same as above
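
The four rules above can be sketched in JavaScript by treating each subschema as a predicate function. This is an illustration only: the helper name `ifThenElse` and the sample predicates are hypothetical, and treating a missing branch as always valid is an assumption rather than something stated in the rules.

```javascript
// Sketch only: subschemas are modelled as predicates (instance) => boolean.
// `ifThenElse` is a hypothetical helper, not an API of any validator.
function ifThenElse({ if: cond, then: thenSchema, else: elseSchema }) {
  // Rules 3 and 4: the schema itself is rejected before any instance is seen.
  if (typeof cond !== "function") {
    throw new Error('"if" is required');
  }
  if (thenSchema === undefined && elseSchema === undefined) {
    throw new Error('at least one of "then" or "else" is required');
  }
  // Assumption: a missing branch behaves like an always-valid schema.
  const pass = () => true;
  const thenFn = thenSchema === undefined ? pass : thenSchema;
  const elseFn = elseSchema === undefined ? pass : elseSchema;
  // Rules 1 and 2: the outcome of "if" selects which branch decides validity.
  return (instance) => (cond(instance) ? thenFn(instance) : elseFn(instance));
}

const isString = (x) => typeof x === "string";
const isShort = (x) => x.length <= 3;
const isPositive = (x) => typeof x === "number" && x > 0;

// Strings must be short; anything else must be a positive number.
const validate = ifThenElse({ if: isString, then: isShort, else: isPositive });

console.log(validate("abc"));  // true
console.log(validate("abcd")); // false
console.log(validate(5));      // true
console.log(validate(-1));     // false
```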

As I've written in another issue, a schema with if/then/else:

{
  "if": {"$ref": "condition" },
  "then": {"$ref": "schema1"},
  "else": {"$ref": "schema2"}
}

is a boolean operation and it is equivalent to the schema below that is possible now:

{
  "anyOf": [
    { "allOf": [ {"$ref": "condition" }, {"$ref": "schema1"} ] },
    { "allOf": [ {"not": {"$ref": "condition" } }, {"$ref": "schema2"} ] }
  ]
}

so if/then/else is as declarative as existing keywords but it provides a more convenient, clear and performance efficient alternative ("condition" will never be validated twice) for a quite common validation scenario.
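
The claimed equivalence can be checked exhaustively over the eight combinations of outcomes. In this sketch, `c`, `s1` and `s2` are assumed to be the boolean results of validating one instance against "condition", "schema1" and "schema2"; the function names are illustrative.

```javascript
// c, s1, s2: boolean validation outcomes for "condition", "schema1", "schema2".
const ifThenElse = (c, s1, s2) => (c ? s1 : s2);
// anyOf [ allOf [condition, schema1], allOf [not condition, schema2] ]
const anyOfForm = (c, s1, s2) => (c && s1) || (!c && s2);

const mismatches = [];
for (const c of [false, true]) {
  for (const s1 of [false, true]) {
    for (const s2 of [false, true]) {
      if (ifThenElse(c, s1, s2) !== anyOfForm(c, s1, s2)) {
        mismatches.push([c, s1, s2]);
      }
    }
  }
}
console.log(mismatches.length === 0 ? "equivalent" : mismatches); // equivalent
```

Note that the anyOf form mentions "condition" twice, which is where the double validation mentioned above comes from.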

Using if/then/else the problem in #31/#64 is also solved.

@handrews
Contributor

handrews commented Dec 5, 2016

so if/then/else is as declarative as existing keywords

While Wikipedia is not always the best resource, this article is well-cited and the following definition is what I usually hear for declarative programming:

In computer science, declarative programming is a programming paradigm—a style of building the structure and elements of computer programs—that expresses the logic of a computation without describing its control flow.

So if/then/else is by definition not declarative. The distinction between declarative and imperative is control flow. If/then/else is control flow. Imperative vs declarative has nothing to do with whether the outcome is the same: all programming styles can produce the same output. It's about how things are processed.

It is certainly worth discussing if we want to add this imperative construct to JSON Schema, but it is the very definition of an imperative construct.

@awwright changed the title from "v7 validation: if/then/else" to "Validation: if/then/else" Dec 5, 2016
@epoberezkin
Member Author

epoberezkin commented Dec 5, 2016

I am not sure why implication is more imperative than and/or/negation - they are all operations from boolean algebra. The fact that the evaluation of implication can be short-circuited (i.e. when "if" is false then "then" doesn't need to be evaluated) doesn't make it more imperative than "and" and "or" - they also can be short-circuited.

Also there are languages that use if/then/else as expression, not as a control flow statement. If you don't like the keywords, some other can be used.

Please see this article: https://en.wikipedia.org/wiki/Material_conditional

I am talking about boolean operation p => q (ignoring else here for simplicity, as it is an additional sugar really) which has the following truth table:

p q p=>q
False False True
False True True
True False False
True True True
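
The material conditional is false only when p is true and q is false, which is the same as (not p) or q. A short script that enumerates the rows (the name `implies` is illustrative):

```javascript
// Material conditional: "p implies q" is false only when p is true and q is false.
const implies = (p, q) => !p || q;

const rows = [];
for (const p of [false, true]) {
  for (const q of [false, true]) {
    rows.push({ p, q, "p=>q": implies(p, q) });
  }
}
console.table(rows);
```

In schema terms, this corresponds to an "if"/"then" pair with no "else": the instance is rejected only when "if" passes and "then" fails.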

What makes you think it to be more imperative than and/or/xor/not that we already have?

@epoberezkin
Member Author

that expresses the logic of a computation without describing its control flow.

That's exactly what the table above is - the logic without control flow.

@handrews
Contributor

handrews commented Dec 5, 2016

Because there is a widely accepted plain-English definition that says so. "Imperative programming" and "declarative programming" have well-defined meanings, which are not about whether things can be expressed in terms of logical predicates. Go find a credible definition of "declarative programming" that includes an if statement and I'll discuss that definition with you. But making up your own approaches to these terms undercuts your primary argument.

@epoberezkin
Member Author

As I said, I don't mind what the keywords are. Let's use "ante" (from "antecedent", the term for "p" in p=>q) and "cons" ("consequent").

@epoberezkin
Member Author

epoberezkin commented Dec 5, 2016

@handrews I can refer you back to the same article you quote: https://en.wikipedia.org/wiki/Declarative_programming#Constraint_programming

Subparadigms
Declarative programming is an umbrella term that includes a number of better-known programming paradigms.
Functional programming, and in particular purely functional programming, attempts to minimize or eliminate side effects, and is therefore considered declarative.

Many functional languages support an if/then/else construct, as its result is pure: it doesn't have side effects and it is not dependent on the order of execution. It's determined by a truth table, same as allOf/anyOf/oneOf (and/or/xor).

Also see https://en.wikipedia.org/wiki/List_of_programming_languages_by_type#Declarative_languages

Many languages in this list support conditionals.

@handrews
Contributor

handrews commented Dec 5, 2016

Very few systems are purely one style or another. And maybe JSON Schema ends up not being purely declarative in the sense I am advocating. Ultimately, whether if-then-else is "declarative" or not (by whoever's definition) is less important than whether you can convince more people here that it is the right direction for JSON Schema.

@epoberezkin
Member Author

epoberezkin commented Dec 6, 2016

An alternative for the same would be:

{
  "conditional": [
    {"$ref": "condition" },
    {"$ref": "schema1"},
    {"$ref": "schema2"}
  ]
}

There should be exactly 2 or 3 items in this array, more or less should make it invalid.

See also #168 (comment)

I think it may be preferable as it doesn't look imperative at all to me.

@HotelDon

HotelDon commented Dec 6, 2016

I would prefer this suggestion to the switch keyword, at the very least. I kept having problems with switch behaving in unexpected ways, usually because of the way "then" and "continue" interact. This feels much simpler, and harder to trip yourself over.

@epoberezkin
Member Author

@HotelDon I agree. I never actually used continue. And you can achieve everything the switch gives you (without continue) by combining the above with the existing keywords.

@Relequestual
Member

Relequestual commented Dec 6, 2016

To be honest, I don't really care what paradigm it falls under. I'm not even sure you can classify JSON Schema under any paradigm, but that's a different debate ("We" don't even all agree if it is code or not anyway).

Things I care about when looking at adding new functionality:

Is there a use case?
Yup, I can think of a few.

Does it make JSON Schema easier to use?
Yup, the example clearly shows.

Does it cause problems for implementors?
I can't answer that one, but implementors may already be intelligently assessing anyOf to construct the same logic structure by looking for the structure shown in the example, which leads me to think probably not.

I'd also much prefer this to a switch, as, as mentioned, it could be unclear or confusing.

@handrews unless you can think of a reason why this would be a BAD thing, then I'm for it.

@epoberezkin
Member Author

By the way, would you prefer conditional or if/then/else? I think I like conditional more, not because it's less imperative looking but because it's a single keyword.

@handrews
Contributor

handrews commented Dec 6, 2016

As I said before:

Ultimately, whether if-then-else is "declarative" or not (by whoever's definition) is less important than whether you can convince more people here that it is the right direction for JSON Schema.

People seem to be for this, so I'll go with that. It's not about my personal preference.

We all seem to agree that switch was too complicated. And yes, too imperative, while if/then/else falls into this "it depends on how you think about it" zone. Having given it a rest overnight, I do see @epoberezkin's point of view here even if it is not my preferred way to consider it.

By the way, would you prefer conditional or if/then/else? I think I like conditional more, not because it's less imperative looking but because it's a single keyword.

I prefer if/then/else because it is just more obvious. I suspect it's because I just woke up but I had to think for a second to realize that the 2nd and 3rd schemas in the "conditional" list behave as "then" and "else" respectively.


There is one property I would like to make sure we preserve, which gets to the real-world implications of declarative vs imperative: I want to make sure that it is always safe to evaluate all schema branches. Right now I believe that is true because in general JSON Schema validation is purely functional, without state or side effects. As long as that remains true, it should be safe to (for instance) pass each subschema off to separate threads or co-routines and decide what to do with the results later.

If it's always safe to evaluate both the "then" and the "else" no matter how the "if" validates, then this really is identical to using the existing boolean keywords but more clear. It should also always be safe to only evaluate either the "then" or the "else" depending on the result of the "if" validation. This is the same as saying that you can either short-circuit "allOf", etc. or you can check all branches even if you know that since one fails the overall result will be a failure. It's up to the implementation to decide.
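
A minimal sketch of this property, assuming subschemas are modelled as pure predicates that are safe to call on any instance. The names `lazy`, `eager` and the sample predicates are illustrative, not taken from any implementation.

```javascript
// Sketch: subschemas are modelled as pure, total predicates (instance) => boolean.
const lazy = (cond, thenFn, elseFn) => (x) =>
  cond(x) ? thenFn(x) : elseFn(x); // evaluates only the branch that matters

const eager = (cond, thenFn, elseFn) => (x) => {
  // Evaluate all three subschemas first (these could run concurrently),
  // then combine the results.
  const c = cond(x);
  const t = thenFn(x);
  const e = elseFn(x);
  return c ? t : e;
};

// Like JSON Schema keywords, each predicate is safe on any instance:
// minItems1 ignores non-arrays instead of throwing.
const isArray = (x) => Array.isArray(x);
const minItems1 = (x) => !Array.isArray(x) || x.length >= 1;
const isNumber = (x) => typeof x === "number";

const samples = [[], [1], 3, "a", null];
const agree = samples.every(
  (x) => lazy(isArray, minItems1, isNumber)(x) === eager(isArray, minItems1, isNumber)(x)
);
console.log(agree); // true
```

The two strategies agree only because no predicate throws or mutates state. A predicate that assumed its input was an array would break the eager version on `null`.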

Does this property make sense? Do others agree it is desirable?

@epoberezkin
Member Author

epoberezkin commented Dec 6, 2016

If it's always safe to evaluate both the "then" and the "else" no matter how the "if" validates, then this really is identical to using the existing boolean keywords but more clear. It should also always be safe to only evaluate either the "then" or the "else" depending on the result of the "if" validation.

I fully agree. That's what I was trying to explain (badly) with my truth tables.

I prefer if/then/else because it is just more obvious. I suspect it's because I just woke up but I had to think for a second to realize that the 2nd and 3rd schemas in the "conditional" list behave as "then" and "else" respectively.

I agree with that as well. The list-based conditional in Clojure makes me think for a second too, while the explicit if/then/else in Haskell is immediately clear. So I am ok with if/then/else.

@handrews thank you

@epoberezkin
Member Author

@handrews if you like I can try making a PR with the description of the keyword for the document.

@handrews
Contributor

handrews commented Dec 6, 2016

@epoberezkin Given that @awwright hasn't weighed in yet, I'd like to keep this out of Draft 6 unless there's full agreement. But I would definitely encourage you to write a PR for Draft 7. Does that seem reasonable? I'm trying (without much success) to keep a focus on resolving Draft 6 issues so we can publish that draft rather than adding more things to the pile.

@epoberezkin
Member Author

Yes, agreed.

@HotelDon

HotelDon commented Dec 6, 2016

I would prefer a containing keyword for these, so you can have multiple ifs for a single entry. Something like this, maybe?

"conditional": [
    {
        "if": {"$ref": "#/definitions/condition1"},
        "then": {"$ref": "#/definitions/schema1"},
        "else": {"$ref": "#/definitions/schema2"}
    },
    {
        "if": {"$ref": "#/definitions/condition2"},
        "then": {"$ref": "#/definitions/schema3"},
        "else": {"$ref": "#/definitions/schema4"}
    }
]   

The order of the array shouldn't impact whether data passes validation or not, so schema writers can focus on making it readable at first glance.

If you only need one conditional statement, then you could just skip the array entirely.

"conditional": {
    "if": {"$ref": "#/definitions/condition1"},
    "then": {"$ref": "#/definitions/schema1"},
    "else": {"$ref": "#/definitions/schema2"}
}

@handrews
Contributor

handrews commented Dec 7, 2016

@HotelDon is there any reason that "allOf" wouldn't work for this?

"allOf": [
    {
        "if": {"$ref": "#/definitions/condition1"},
        "then": {"$ref": "#/definitions/schema1"},
        "else": {"$ref": "#/definitions/schema2"}
    },
    {
        "if": {"$ref": "#/definitions/condition2"},
        "then": {"$ref": "#/definitions/schema3"},
        "else": {"$ref": "#/definitions/schema4"}
    }
] 

@HotelDon

HotelDon commented Dec 7, 2016

@handrews No, I just keep forgetting that "allOf" exists - I tend to use "oneOf" a lot more than "allOf", so it slips my mind a lot.

I would still argue for a single keyword that encompasses "if", "then" and "else", to make it more obvious at first glance what those entries are doing. So like this:

"conditional": {
    "if": {"$ref": "#/definitions/condition1"},
    "then": {"$ref": "#/definitions/schema1"},
    "else": {"$ref": "#/definitions/schema2"}
}

Or this:

"allOf": [
    {
        "conditional": {
            "if": {"$ref": "#/definitions/condition1"},
            "then": {"$ref": "#/definitions/schema1"},
            "else": {"$ref": "#/definitions/schema2"}
        }
    },
    {
        "conditional": {
            "if": {"$ref": "#/definitions/condition2"},
            "then": {"$ref": "#/definitions/schema3"},
            "else": {"$ref": "#/definitions/schema4"}
        }
    }
]

I realize, however, it's a pretty weak argument for adding a small amount of cruft in exchange for a small amount of clarity, so unless someone else feels strongly about it, the three keyword version is fine with me.

@epoberezkin
Member Author

epoberezkin commented Dec 7, 2016

@HotelDon I don't think it adds much clarity. So far JSON-schema has avoided adding two-level keywords, and that is one of the reasons I disliked ideas like switch and patternGroups - they have keywords inside keywords. But those inner things aren't keywords really, as they can't be used on their own. Then what are they?

if/then/else on its own gives a smaller and more convenient building block that can be used to emulate switch without continue:

"anyOf": [
    {
        "if": {"$ref": "#/definitions/condition1"},
        "then": {"$ref": "#/definitions/schema1"},
        "else": false
    },
    {
        "if": {"$ref": "#/definitions/condition2"},
        "then": {"$ref": "#/definitions/schema2"},
        "else": false
    },
    {
        "if": { "not": { "anyOf": [
            {"$ref": "#/definitions/condition1"},
            {"$ref": "#/definitions/condition2"}
        ] } },
        "then": { "$ref": "#/definitions/defaultSchema" },
        "else": false
    }
] 

The boolean form of a schema makes it easy and elegant to make if/then fail if "if" fails. That's one for the cookbook :)
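
The switch emulation can be sketched with predicate combinators. This is a toy model: `ifThenElse`, `anyOf`, `not`, `never` and the sample predicates are all illustrative names, with `never` standing in for the boolean schema `false`.

```javascript
// Sketch: subschemas as predicates; the helper names are illustrative.
const ifThenElse = (c, t, e) => (x) => (c(x) ? t(x) : e(x));
const anyOf = (...schemas) => (x) => schemas.some((s) => s(x));
const not = (s) => (x) => !s(x);
const never = () => false; // stands in for the boolean schema `false`

const isString = (x) => typeof x === "string";
const isNumber = (x) => typeof x === "number";
const nonEmpty = (x) => x.length > 0;
const positive = (x) => x > 0;
const isNull = (x) => x === null;

const validate = anyOf(
  ifThenElse(isString, nonEmpty, never), // case 1
  ifThenElse(isNumber, positive, never), // case 2
  // default case: applies only when no condition above matched
  ifThenElse(not(anyOf(isString, isNumber)), isNull, never)
);

console.log(validate("a"));  // true
console.log(validate(""));   // false
console.log(validate(5));    // true
console.log(validate(null)); // true
console.log(validate([]));   // false
```

Because every non-matching case collapses to `never`, exactly one branch can decide the result for any given instance.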

So I think flat is better...

@HotelDon

HotelDon commented Dec 7, 2016

So far JSON-schema avoided adding two level keywords

Yeah, I had noticed that a few hours ago when I was comparing the formatting for existing keywords. I wouldn't rule it out entirely for other future keywords, but I can see how it'd be better to keep this flat.

@Relequestual added this to the draft-future milestone Dec 7, 2016
@awwright
Member

If we want to fit with the current convention of keywords, a "conditional" keyword would probably be the best bet:

{
    "conditional": {
        "if": {},
        "then": {},
        "else": {}
    }
}

Maybe an array could provide "switch" like functionality, where only the first match is picked:

{
    "conditional": [
        {
            "if": {},
            "then": {}
        },
        {
            "if": {},
            "then": {}
        }
    ]
}

I also want to get an idea of how frequently this is actually necessary... how often would this actually simplify schemas and their output? Are there any use cases anyone is aware of?

@handrews
Contributor

@awwright the thing that got us to the point of consensus at all was dropping the switch functionality, so re-introducing it is counter-productive.

@handrews
Contributor

how often would this actually simplify schemas and their output? Are there any use cases anyone is aware of?

Very frequently. I and/or teams I have worked with have done this sort of thing with "oneOf" many times, and while I argued that "oneOf" was sufficient, if-then-else is undoubtedly more intuitive.

@handrews modified the milestones: draft-07 (wright-*-02), draft-future May 16, 2017
@handrews
Contributor

@epoberezkin have you gotten any interesting feedback on if/then/else or select/selectCases/selectDefault since adding them to Ajv?

I'd like to move ahead with a PR here. In the absence of compelling feedback, I'd suggest we go with if/then/else and if there is still interest in select/selectCases/selectDefault track that elsewhere. But if select/selectCases/selectDefault has proven more useful then let's see if we still want if/then/else at all.

@epoberezkin
Member Author

epoberezkin commented Aug 20, 2017

@handrews People use and ask questions about if/then/else. It is more generic so I think it should be added before select.

select requires $data support and in most real cases it can be implemented via several ifs (very verbose, but without $data), so I'd leave select until the next time regardless of whether we add $data now or not.

@handrews
Contributor

Great, I'll do a PR for if/then/else. Thanks for all your work adding support for these ideas, having real usage feedback is tremendously helpful.

@Anthropic regarding your

considering the schema in it's data definition role (not just for validation)

For those who might not know, between that comment and now we started a project for proposing new JSON Schema vocabularies, including both a UI generation vocabulary and a code generation/data definition one (I think those go together? lmk if I'm confused).

I can't find where this has come up before, but I'd rather allow such new vocabularies to impose restrictions on how they are used with the validation vocabulary, and continue to support useful validation concepts even if they are difficult to impossible to use for data definition.

I think now that we are looking at these as separate vocabularies, it will be easier to explain imposing some restrictions such as "data definition implementations need not support if/then/else, not, *Of, dependencies, etc." We already sort of did this with the most recent Hyper-Schema revision, which excludes links defined under a "not" or within non-validating *Of branches from use. They're syntactically valid, but implementations MUST NOT attempt to do anything with them. There just aren't sensible semantics for those cases. I'll update that part of Hyper-Schema to also cover if/then/else in the PR.

handrews added a commit to handrews/json-schema-spec that referenced this issue Aug 27, 2017
This addresses json-schema-org#180.

I have intentionally allowed for any combination of these
keywords to be present or absent.  While having a "then" or "else"
without "if" is pointless and/or nonsensical, so are many other
possible JSON Schema keywords.
handrews added a commit to handrews/json-schema-spec that referenced this issue Sep 7, 2017
This addresses json-schema-org#180.

I have intentionally allowed for any combination of these
keywords to be present or absent.  While having a "then" or "else"
without "if" is pointless and/or nonsensical, so are many other
possible JSON Schema keywords.
@dlax
Member

dlax commented Sep 8, 2017

Sorry for joining these discussions late. I was wondering if having the if keyword within a (sub-)schema wouldn't make sense and solve some use cases in combination with oneOf or anyOf. For instance:

{
  "oneOf": [
     {
        "if": { "type": "object" },
        "properties": {
           "foo": {"type": "string"}
        }
     },
     {
        "if": { "type": "array" },
        "items": {"type": "string"}
     },
     { "$ref": "defaultCaseSchema" }
  ]
}

This way, we wouldn't need then and else keywords and this would naturally allow elif control. Has this been considered? Does it make sense?

@handrews
Contributor

handrews commented Sep 8, 2017

@dlax It makes sense, but seems less intuitive than if/then/else. And the main point of if/then/else is to offer something intuitive. Everything it does can be done already. And implementing else-if-like (but unordered) logic just looks like this:

{
  "oneOf": [
     {
        "if": { "type": "object" },
        "then": {
          "properties": {
             "foo": {"type": "string"}
          }
        }
     },
     {
        "if": { "type": "array" },
        "then": {
          "items": {"type": "string"}
        }
     },
     { "$ref": "defaultCaseSchema" }
  ]
}

Which is really not awful.

@handrews
Contributor

handrews commented Sep 8, 2017

Also, philosophically, nearly all keywords operate independently, such that:

{
  "x": "a",
  "y": "b"
}

is equivalent to

{
  "allOf": [
    {"x": "a"},
    {"y": "b"}
  ]
}

The exceptions are the additional* keywords and now if/then/else. Requiring the implementation of all keywords to be dependent on if breaks the paradigm too much. Being able to write little independent functions for the vast majority of keywords is one of JSON Schema's strengths.
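
A toy illustration of the "little independent functions" idea: each keyword is a function of its own value and the instance, and a schema passes when every keyword passes. Only a few keywords are modelled, in simplified form, and the structure is an assumption for illustration rather than a description of any particular library.

```javascript
// Sketch: each keyword is an independent function (keywordValue, instance) => boolean.
const keywords = {
  type: (t, x) => typeof x === t, // simplified: handles "number" and "string" only
  minimum: (m, x) => typeof x !== "number" || x >= m,
  maximum: (m, x) => typeof x !== "number" || x <= m,
  allOf: (schemas, x) => schemas.every((s) => validate(s, x)),
};

function validate(schema, instance) {
  // A schema is valid when every keyword it contains passes on its own.
  return Object.entries(schema).every(([key, value]) =>
    Object.prototype.hasOwnProperty.call(keywords, key)
      ? keywords[key](value, instance)
      : true // unknown keywords are ignored
  );
}

const flat = { type: "number", minimum: 0, maximum: 10 };
const split = { allOf: [{ type: "number" }, { minimum: 0 }, { maximum: 10 }] };

const samples = [5, -1, 11, "5"];
const same = samples.every((x) => validate(flat, x) === validate(split, x));
console.log(same); // true
```

Keywords such as "then" and "else" do not fit this table directly, since their effect depends on the result of a sibling keyword.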

@dlax
Member

dlax commented Sep 8, 2017

Requiring the implementation of all keywords to be dependent on if breaks the paradigm too much. Being able to write little independent functions for the vast majority of keywords is one of JSON Schema's strengths.

Makes sense, thanks!

@handrews
Contributor

Merged PR #375

@AndreKR

AndreKR commented Oct 16, 2017

I'm trying out the syntax from this comment but I can't get it to work:

var Ajv = require('ajv');

var schema = {
	"properties": {
		"foo": {
			"type": "integer"
		},
		"pets": {
			"type": "array",
			"items": {
				"oneOf": [
					{
						"if": { "properties": { "type": { "const": "cat" } } },
						"then": { "$ref": "#/definitions/cat_pet" }
					},
					{
						"if": { "properties": { "type": { "const": "snake" } } },
						"then": { "$ref": "#/definitions/snake_pet" }
					}
				]
			}
		}
	},
	"definitions": {
		"cat_pet": {
			"type": "object",
			"properties": {
				"type": { "type": "string", "const": "cat" },
				"fur_color": { "type": "string", "enum": ["black", "white", "orange"], "default": "black" }
			}
		},
		"snake_pet": {
			"type": "object",
			"properties": {
				"type": { "type": "string", "const": "snake" },
				"overall_length": { "type": "integer", "minimum": 1 }
			}
		}
	}
};

var data = {
	"foo": 123,
	"pets": [
		{
			"type": "cat",
			"fur_color": "white"

		},
		{
			"type": "snake",
			"overall_length": 42
		}
	]
};

var ajv = new Ajv();
require('ajv-keywords')(ajv, 'if');
var validate = ajv.compile(schema);
var valid = validate(data);
console.log(data);
if (!valid) console.log(validate.errors);

{ foo: 123,
  pets: 
   [ { type: 'cat', fur_color: 'white' },
     { type: 'snake', overall_length: 42 } ] }
[ { keyword: 'oneOf',
    dataPath: '.pets[0]',
    schemaPath: '#/properties/pets/items/oneOf',
    params: {},
    message: 'should match exactly one schema in oneOf' } ]

@epoberezkin
Member Author

@AndreKR When "if" fails the whole schema passes. So in your case both subschemas in oneOf pass. What you miss is "else": false in both subschemas inside "oneOf".
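
The effect can be reproduced with a toy evaluator that supports only boolean schemas, "const", "properties", "oneOf" and "if"/"then"/"else". This is a sketch written for illustration, not Ajv's implementation, and `then: true` stands in for the per-type pet schemas.

```javascript
// Toy evaluator for a handful of keywords; a sketch, not Ajv's implementation.
function validate(schema, data) {
  if (typeof schema === "boolean") return schema; // boolean schemas
  const isObject = data !== null && typeof data === "object" && !Array.isArray(data);
  if ("const" in schema && data !== schema.const) return false; // primitives only
  if (schema.properties && isObject) {
    for (const [key, sub] of Object.entries(schema.properties)) {
      if (key in data && !validate(sub, data[key])) return false;
    }
  }
  if (schema.oneOf && schema.oneOf.filter((s) => validate(s, data)).length !== 1) {
    return false;
  }
  if ("if" in schema) {
    // A missing "then" or "else" means that branch passes.
    const branch = validate(schema.if, data) ? schema.then : schema.else;
    if (branch !== undefined && !validate(branch, data)) return false;
  }
  return true;
}

// Builds one oneOf branch; `then: true` stands in for the per-type schema.
const branch = (type, withElse) => ({
  if: { properties: { type: { const: type } } },
  then: true,
  ...(withElse ? { else: false } : {}),
});

const withoutElse = { oneOf: [branch("cat", false), branch("snake", false)] };
const withElseFalse = { oneOf: [branch("cat", true), branch("snake", true)] };
const pet = { type: "cat", fur_color: "white" };

console.log(validate(withoutElse, pet));   // false: both branches pass, so oneOf fails
console.log(validate(withElseFalse, pet)); // true: only the "cat" branch passes
```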

@epoberezkin
Member Author

@handrews that's actually the case where "select" with "$data" would have been nicer. So maybe let's try to get it in 08?

@AndreKR

AndreKR commented Oct 16, 2017

I also saw something about discriminator, is that still a thing? I didn't use it here because my condition schema will get a second field.

@epoberezkin
Member Author

I don't think it ever was a thing... The problem with "discriminator" (the way it is defined in openapi) is that it does implicit mapping directly from property value to the schema key inside definitions, not relying on any existing conventions in JSON-Schema (e.g., such as $ref).

The problem it solves is real, but I'd rather we agree on a solution that is both more flexible (e.g. allows mapping a sub-property, allows mapping to any sub-schema, possibly in a different file) and aligned with the rest of the spec.

@handrews
Contributor

@handrews that's actually the case where "select" with "$data" would have been nicer. So maybe let's try to get it in 08?

Probably draft-09 will be the earliest $data will get considered given the level of controversial topics already attached to draft-08.

@btiernay

Please see #1082 for related work in this area.
