This repository has been archived by the owner on Mar 19, 2019. It is now read-only.

allOf with object schemas and additionalProperties = false #116

Closed
dawidcha opened this issue Feb 11, 2014 · 23 comments

Comments

@dawidcha

As I understand it, the pattern for reusing base schema definitions across multiple schemas is to use 'allOf' to include the base schemas we're mixing in. If this is to validate successfully, all such base schemas must have additionalProperties omitted (or set to true), which means that neither they nor the derived schema have any way of limiting the allowed property set.

Since I can't actually think of any case where using allOf over several object schemas with additionalProperties set to false would make sense, I'd like to propose that the draft spec be updated to say that when allOf is used with object schemas, additionalProperties is treated as 'true' for the purposes of validating each subschema, but that if all schemas in the allOf list actually have additionalProperties set to false, the derived schema inherits that setting.
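
For concreteness, a minimal sketch of the failing pattern (the schema ids here are hypothetical):

{
    "id": "base#",
    "type": "object",
    "properties": { "name": { "type": "string" } },
    "additionalProperties": false
}

{
    "allOf": [ { "$ref": "base#" } ],
    "properties": { "age": { "type": "number" } }
}

An instance like { "name": "x", "age": 1 } can never validate against the second schema: "age" trips the base schema's "additionalProperties": false, even though the derived schema declares it.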

Did that make any sense?

@gazpachoking

The desired results make sense, but the logic validators would have to implement sounds horrible. It would be nice to have some way to achieve these results, though.

@dawidcha
Author

It doesn't feel like something that would be too hard. I was planning on having a go at forking Francis Galiegue's Java validator to see.

I'm currently working professionally on an application that uses JSON in AJAX requests (pretty common) and uses JSON Schema to validate the requests (also not uncommon, I suspect). However, right now we have no alternative but to cut and paste common bits of schemas, because our risk-control chaps are, quite understandably, against permitting schemas that can't flag unexpected content as incorrect.

We've talked about writing a preprocessor to do the merging, but that feels like an admission of defeat to me.

@geraintluff
Member

The solution I like best is actually a "ban unknown properties" mode. This would be a mode/config for the validator, where the presence of an undocumented property (one not accounted for by any of the schemas) is raised as an error.

The idea is that during development you could throw a wobbly at any unexpected input, during production you could log a warning but continue, or whatever - all using the same schema.

It is also very simple to implement - simply maintain two sets ("properties I have seen", "properties I have validated") and compare the two at the end, after all schema validation.
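
A rough sketch of that bookkeeping (the function and parameter names here are illustrative, not tv4's actual internals):

// Illustrative "ban unknown properties" check. Assumes the validator has
// already run every schema (including each branch of allOf/anyOf/oneOf),
// collecting the property names it accounted for into `validated`.
function findUnknownProperties(
  instance: Record<string, unknown>,
  validated: Set<string>,
): string[] {
  const seen = new Set(Object.keys(instance)); // "properties I have seen"
  return [...seen].filter((key) => !validated.has(key));
}

// Example: only "name" was covered by some schema.
const unknown = findUnknownProperties({ name: "x", age: 1 }, new Set(["name"]));
// unknown === ["age"] -- raise an error in development, log a warning in production.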

At least one validator (tv4, my own) already supports this, and it will likely be documented explicitly in v5. If you're modifying Francis's Java validator, it might be a better move to add support for a "ban unknown properties" mode instead.

@fge

fge commented Feb 12, 2014

At least one validator (tv4, my one) already supports this, and it will likely be documented explicitly in v5.

That doesn't solve the allOf problem, though. Each schema in allOf is independent of the others, so you still cannot "extend" schemas that way.

Schema "patching" still seems like the best solution to me, especially since JSON Patch is an RFC.

@geraintluff
Member

@fge - What case doesn't it solve?

The idea is that the rule "no properties I don't understand" is set at the validator-level, and applied after all schema validation. If a property is accounted for by any schema (inside an allOf, whatever), then it won't count as "unknown".

@fge

fge commented Feb 12, 2014

@geraintluff the problem here is that you define this at the validator level, and this is information unavailable in the schema itself. Another party may choose to use the schema but not this validation mode.

What is more, it is uncomfortably tied to JSON object instances only.

Compare that with my approach, which does not suffer from any of these problems ;) Which is why I proposed it in the first place.

@geraintluff
Member

Patching/merging is basically generating schemas at runtime, which (as well as being conceptually complicated and fairly inelegant) causes problems if you want to reference parts of the generated schema.

And yes - "ban unknown properties" is defined at the validator level. But the requirement always seems to be "I don't care which schema accounts for which properties, but I want to know they're all covered" - and this handles that fairly well.

(Also, we're talking about typo'd property names - I reckon it's fair to limit the discussion to objects).

@jdorn

jdorn commented Feb 13, 2014

Given these two JSON objects:

{
  "prop1": 1,
  "prop2": 2
}
{
  "prop1": 1,
  "prop2": "2"
}

It seems like I should be able to use the following schema to represent this:

{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "prop1": { "type": "number" }
  },
  "requires": ["prop1"],
  "oneOf": [
    {
      "properties": {
        "prop2": { "type": "number" }
      },
      "requires": ["prop2"]
    },
    {
      "properties": {
        "prop2": { "type": "string" }
      },
      "requires": ["prop2"]
    }
  ]
}

According to version 4 of the spec, though, nothing can ever validate against this schema, since the oneOf requires prop2 to be defined while additionalProperties requires prop2 not to be defined. This is unintuitive behavior. It seems like additionalProperties should, in this case, take oneOf into account when determining which properties have been validated.

The "ban unknown properties" mode can partly fix this issue, but what if you want to embed the above schema into another one where you don't want the "ban unknown properties" mode to apply? There doesn't seem to be a way to do that.

@rage-shadowman

I heard someone mention elsewhere that the type parameter should accept an array: "type": [ "string", "number" ]

See: http://json-schema.org/latest/json-schema-validation.html#anchor79

In section 5.5.2.1:
"The value of this keyword MUST be either a string or an array. If it is an array, elements of the array MUST be strings and MUST be unique."

@fge

fge commented Feb 13, 2014

@jdorn

According to version 4 of the spec though, nothing can ever validate against this schema since the oneOf requires prop2 to be defined and the additionalProperties requires prop2 to not be defined. This is unintuitive behavior. It seems like additionalProperties in this case should take oneOf into account when looking at which properties have been validated.

This would mean "hacking" {one,all,any}Of to behave differently depending on whether you validate JSON Objects or not. And that is the whole problem.

@geraintluff's solution is to allow two validation modes -- for objects only. And that is also a problem to me: it is not a behaviour enforced by the schema itself.

Sure enough, my solution requires generating schemas at runtime, but honestly I don't see the problem. After all, if you use $ref, you are also free to inline schemas at runtime if you want (although you must be careful, since this can lead to expansion loops and/or information loss).

I still think my idea is the better one because it presents no ambiguity ;)

@dawidcha
Author

Thanks for the responses - I acknowledge that this is seen as a concern.

As I read it, there are three possible solutions proposed:

  1. Adjust the meaning of 'allOf' in the spec so that it can be used with object schemas with "additionalProperties": false
  2. Add an (optional) flag to validators so that the default for 'additionalProperties' becomes false
  3. Implement a schema preprocessor/patcher which makes it possible to merge schemas 'on the fly'.

So laying out my stall, as it were: I can see the attraction of 2, because it means that individual validator implementations can make up their own minds about whether to support this, but it also means that a schema could validate successfully or unsuccessfully depending on a setting on the validator (which may or may not be available). This feels like a bad thing. Going the preprocessor route has similar challenges, I think: the attraction is that we don't have to change the spec, but the downside is that we lose the certainty that the spec brings. It also seems to me that preprocessing is actually just doing the same thing as 1 while avoiding explicitly altering the spec.

Meanwhile, I totally understand that it is a big deal to break the symmetry afforded by the simple definition of 'allOf' - all schemas must validate, no exceptions. However, there's clearly a need to do something here - and at least there's no risk of anyone relying on the extant behavior, because as it stands, including more than one object schema with "additionalProperties": false in an 'allOf' clause would always fail.

Incidentally, this topic is very much related to the 'extends' keyword that was dropped after version 3 of the schema. I believe, and correct me if I'm wrong, that this was done to avoid a use of polymorphism in which agreeing to validate a base schema implicitly commits you to validating any schema that derives from it (bad). However, in practice, notwithstanding the risks of using 'mixin types' in data modelling, the alternative often means a lot of duplicated code that must be kept in sync.

Another way round this, without altering the spec, that's been suggested to me would be to insist on aggregation instead of mixing in. For example, instead of:

{
    "allOf": [
        { "$ref": "schema1" },
        { "$ref": "schema2" }
    ]
}

we would use:

{
    "type": "array",
    "items": [
        { "$ref": "schema1" },
        { "$ref": "schema2" }
    ]
}

Logically, this amounts to almost the same thing - the difference is that overlapping properties would necessarily appear twice in the second schema and only once in the first. However, there are three reasons to prefer the first:

  1. Some client libraries (I'm thinking of Backbone) struggle with nested JavaScript objects.
  2. Sometimes you want overlapping properties to be represented only once in the overall schema.
  3. The first way feels more natural.

@fge

fge commented Feb 14, 2014

OK, I have another proposal. That would be a new keyword, propertiesFrom. Example:

{
    "type": "object",
    "propertiesFrom": { "$ref": "link://to/other/schema#" },
    "properties": { "other": { "description": "more properties defined here" } },
    "additionalProperties": false
}

The value of propertiesFrom would be a JSON Schema. Target: object instances only. In effect, this keyword would compel the validator to extract the properties from said schema -- and no other keyword (not even required) -- and validate them alongside the other defined properties. It also has an effect on additionalProperties, since the recognized property set is then the union of propertiesFrom and properties.

Another possible keyword: strictProperties. Value: boolean. If false, no effect; if true, it would be the equivalent of combining additionalProperties and required.
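
For instance (a hypothetical sketch; neither keyword exists in any published draft):

{
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "age": { "type": "number" }
    },
    "strictProperties": true
}

would behave as if the schema also contained "required": ["name", "age"] and "additionalProperties": false.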

Comments?

@dawidcha
Author

That would work.

The negative is that it introduces a couple more keywords into the standard. So, assuming we go by the maxim that less is more, which creates more complexity: asymmetry in the logic for 'allOf', or extra keywords?

I'd argue that since 'allOf' has little utility when used with object schemas that have additionalProperties set to false, the equation comes down on the side of changing the meaning of 'allOf' for objects.

However, that said, I'd support anything that presents a workable solution.

@wking

wking commented Jan 8, 2015

Any progress on this? propertiesFrom sounds fine to me, as does special additionalProperties handling for allOf (using something like the "ban unknown properties" semantics when any schema in the allOf has additionalProperties set to false).

@fge

fge commented Jan 8, 2015

@wking well, since then I have taken another route; and I strongly disagree with the "ban unknown properties" mode: it's a hack (note that it cannot be enforced at the schema level).

My proposal is instead to implement schema patching via either "$merge" or "$patch"; please see the Google group for more information, I have posted threads recently about this.
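
In short, the $merge variant would look something like this (a sketch only; the keyword is a proposal, not part of any published draft, and "base#" is a placeholder):

{
    "$merge": {
        "source": { "$ref": "base#" },
        "with": {
            "properties": { "age": { "type": "number" } }
        }
    }
}

The validator would resolve "source", apply "with" to it as a merge patch, and validate the instance against the result -- so the base schema can keep "additionalProperties": false.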

@wking

wking commented Jan 8, 2015

On Thu, Jan 08, 2015 at 12:10:39PM -0800, Francis Galiegue wrote:

My proposal is instead to implement schema patching via either
"$merge" or "$patch"; please see the Google group for more
information, I have posted threads recently about this.

Links to recent threads about $merge and $patch:

  • How is the proposal for $merge and $patch going? 1
  • Status of $merge and $patch, please 2.
  • $patch and $merge, an explanation -- plus strictProperties 3.

@wking

wking commented Jan 8, 2015

And the thread kicking off $merge and $patch seems to be 1, although
I'm not sure why it moved to the mailing list from here ;).

@JulienPalard

I just got hit by this problem, and I'm not a big fan of trying to stick with allOf, nor of the "merge": {"source": ..., "with": ...} form.

First: allOf seems semantically false; we're clearly not requiring "all of the given schemas" to be true, as that's impossible due to their "additionalProperties", and I'm against "adding an if to cope with this".

Second: I first thought of merge before reading the mailing list, but in this form:

"merge": [
    { "$ref": "schema1" },
    { "$ref": "schema2" }
]

This way we'd be able to merge more than two schemas, which may have its uses.

Also it looks a lot like:

{
    "type": "array",
    "items": [
        { "$ref": "schema1" },
        { "$ref": "schema2" }
    ]
}

but clearer / more explicit / more readable (personally, as a newcomer to JSON Schema, I don't understand why the "array / items" construct would do the trick; I'll have to read the docs again ^-^)

@epoberezkin

@dawidcha I agree that the current definition of additionalProperties in the standard causes considerable confusion. See, for example, this recent issue about it. The problem is that we try to assign language semantics to rules that are defined in a very strict, shallow way (i.e. not affected by anything but siblings), and in doing so we confuse ourselves.

I nevertheless think that bending the standard to solve the confusion problem is an absolutely terrible idea. There are a lot of applications and schemas relying on this strictly defined behaviour, regardless of how intuitive it is, and making the same keywords mean different things in the next version of the standard would make migration very difficult for a lot of people. So I think all currently defined keywords should keep their meaning, and the confusion problem should be solved with education - there are plenty of materials around. I think even removing keywords is a bad idea, but changing them is just terrible...

We can extend the standard to let it solve problems that are currently difficult to solve. I think the "unknown properties" concept proposed by @geraintluff could be a good idea, although I have yet to see a case where the schema cannot be refactored to achieve what you want without it. At the same time, refactoring often hides the meaning of the schema, which is not a good thing.

I agree with @fge that the fact this is a "mode of validation" rather than part of the schema makes it a bit strange and almost out of scope for the standard. I think we could introduce a keyword "unknownProperties" that can only be boolean (no need to make it accept a schema, I think :). It would take into account all properties defined in properties/patternProperties/patternGroups, both at the same level and in all subschemas inside allOf, anyOf, oneOf, and maybe even the "not" and "switch" keywords.
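
A hypothetical illustration (this keyword does not exist in any draft):

{
    "type": "object",
    "allOf": [
        { "properties": { "name": { "type": "string" } } },
        { "properties": { "age": { "type": "number" } } }
    ],
    "unknownProperties": false
}

Here "name" and "age" would both count as known, because they are collected from the allOf subschemas; any other property would be an error.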

The downside is that this approach would require using the keyword at every level where such a restriction is necessary. The upside is that it allows flexibility, whereas a validation mode would not. Validators could still implement it as a mode, as z-schema did with additionalProperties and many other keywords, but that would be outside the standard.

The complexity comes from the question of what to do with properties defined in $refs used inside allOf etc. @geraintluff, does your current implementation of "unknown properties" take properties from refs into account? What if they are recursive?

So all this complexity makes me love the current shallowness of the standard, and it holds me back from implementing unknown properties in Ajv.

@fge

fge commented Feb 23, 2016

OK, I am not an active member of the process anymore, but might I just mention that my patch proposal would just solve all of these miseries?

@epoberezkin

I missed those ideas. I think it works, though it could probably be simpler.

@j-funk

j-funk commented Apr 14, 2016

+1 for the propertiesFrom solution proposed by @fge above.

@handrews

@dawidcha
This repository has been deprecated; could you please close this issue? (None of us has permissions for it anymore.)

There has been further discussion of this general topic here: https://groups.google.com/forum/#!forum/json-schema

You can join the current work at https://github.com/json-schema-org/json-schema-spec if you'd like to continue discussing this.

Julian closed this as completed Dec 29, 2016
kanitw referenced this issue in vega/vega-lite Feb 4, 2017
cc: @domoritz -- I don't understand why the output for this one correctly validates examples while using `&` is wrong. Maybe this is a bug.