Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support declaring that all defined properties are required #659

Closed
mfulton26 opened this issue Sep 28, 2018 · 22 comments
Closed

Support declaring that all defined properties are required #659

mfulton26 opened this issue Sep 28, 2018 · 22 comments

Comments

@mfulton26
Copy link

I would like a way to make all properties defined in a schema to be required without having to explicitly list them.

@mfulton26 mfulton26 changed the title Support declaring that all define properties are required Support declaring that all defined properties are required Sep 28, 2018
@mfulton26
Copy link
Author

Supporting "required": true on a property schema would work for me because what I am really trying to avoid is simply needing to repeat the property names.

@handrews
Copy link
Contributor

There are concerns over this, including:

  • We have avoided keywords that depend on each other, as they are generally more difficult to work with
  • It's not entirely clear how that would interact with patternProperties
  • People then want an optional keyword to carve out exceptions, and it gets messier from there- past discussions have bogged down on this IIRC.

On the plus side:

  • We now have a way (annotations) to formally describe keyword interactions rather than just making them all special cases, which was part of the problem in the past. We already now define a "properties" annotation and define additionalProperties and unevaluatedProperties in terms of that annotation (see "unevaluatedProperties" to facilitate re-use and better schema organization #556 and the current spec on master). So we could do that with "required": true
  • Saying that patternProperties is not affected by this would not be that weird. It's rare to try to get required and patternProperties to interact anyway, usually required just goes with properties.

I'm at least tentatively open to this idea. Let's see how others chime in.

@jgonzalezdr
Copy link
Contributor

I've been using JSON schemas for defining JSON-based communication protocols for a couple of years, and one of the more common and recurrent problem that I face is with mandatory properties. Most of the time we have JSON objects with dozens if not hundreds of properties, all of then mandatory, and maintaining the "required" keyword is a nightmare: New schemas for properties are added under "properties" but left out of "required" by mistake, properties are renamed under "properties" but not in "required", etc.

In short: If I could vote, adding a simple way to indicate that all the properties under "properties" are required would have top priority.

Extending the "required" keyword to indicate that it accepts an array or a boolean, and that if it is a boolean with value true then all the properties under "properties" are required, is simple, straightforward and even backwards compatible.

Using annotations for the interaction between "required" : true and "properties" would be very "elegant", but currently the annotation for "properties" is only the set of matched properties, not the set of all properties defined.

As a consequence, either the annotation info will have to be extended to additionally include some kind of optional "type" to differentiate between "matched" and "defined" annotations for "properties", or the "properties" annotation result will have to become something more complex like a "tuple of sets of names".

I personally think that either option of using annotations would end up with a entangled natural language specification in every place were the annotation for "properties" would be used, unless annotation values abstraction is removed by specifying formal data structures for annotations (as proposed in #643 ).

Finally, another option could be defining a new keyword "requiredProperties" exactly like "properties" but which validates successfully only if all the names within the keyword value to appear in the instance. That way, using "requiredProperties" instead of "properties" would fulfill the need. However, the drawback would be the interaction mess of property-related annotations if this keyword was defined in the validation specs instead of the core specs, but defining it in the core specs has no sense.

@jgonzalezdr
Copy link
Contributor

BTW, if just modifying slightly the "required" keyword specs, they could be something like:

The value of this keyword MUST be either a boolean or an array. If it is an array, elements of this array, if any, MUST be strings, and MUST be unique.

If this keyword is an array, an object instance is valid against this keyword if every item in the array is the name of a property in the instance.

If this keyword has boolean value true, and the "properties" keyword is present in the same subschema, an object instance is valid against this keyword if for every name within the "properties" keyword's value a property with the same name is present in the instance.

Omitting this keyword, setting this keyword with boolean value false, or setting this keyword with boolean value true when the "properties" keyword is not present in the same subschema, has the same behavior as an empty array.

@Relequestual
Copy link
Member

I've actually seen more than one instance where someone has put required: true in properties and expected that to work. Obviously that is NOT what we want.

Cany anyone see any issues with allowing required to also have a boolean true value to mean "all keys in properties are required"?

I haven't read @jgonzalezdr suggested patch in detail, but this IS more than an n = 1 problem, which seems to have a reasonably simple solution.

@jgonzalezdr
Copy link
Contributor

As commented in the PR:

Maybe the solution could be creating a new "requiredProperties" keyword with the semantic that I just proposed above for "required", which by the way would have a "naming convention" homogeneous with the rest of validation keywords for properties.

With this more meaningful keyword name the expected / implicit semantics would be a bit more evident (also explicitly indicating that the keyword applies exclusively to properties): "requiredProperties" : true can be easily understood as "all defined properties are required", "requiredProperties" : false as "no properties are required", and "requiredProperties" : ["a", "b"...] as "the listed properties are required".

The "required" keyword could be deprecated at some point, as it is a bit misleading. Anyway both could coexist during some time if needed.

@awwright
Copy link
Member

I mentioned elsewhere

Unfortunately this was previously defined as a boolean, but with different semantics, and this behavior is implemented in more than a few packages for reverse compatibility, e.g. tdegrunt/jsonschema.

I've once suggested having a "requiredProperties" keyword that functions the same as "properties" does, except that all the keywords there are also required. There was an argument against this too and I forget what it was.

@handrews Do you remember what you commented on that idea, if any?

@jgonzalezdr
Copy link
Contributor

I've actually seen more than one instance where someone has put required: true in properties and expected that to work. Obviously that is NOT what we want.

@Relequestual: I've also seen schema writers misinterpret it that way. If the keyword was "requiredProperties", maybe the plural form will be a hint strong enough to realize that it does not apply inside the schema for properties but a higher level instead.

@handrews
Copy link
Contributor

handrews commented Nov 24, 2018

@awwright I probably objected to the fact that requiredProperties would be a hybrid applicator + assertion, and my general feeling is that hybrid keywords are vexing.

However, we've worked with keyword classification a lot more since then, and the idea is worth revisiting.

@jgonzalezdr I'm not sure I understand this concern:

Using annotations for the interaction between "required" : true and "properties" would be very "elegant", but currently the annotation for "properties" is only the set of matched properties, not the set of all properties defined.

In terms of validation, there is no practical difference between using the matched vs defined properties.

For code or UI generation, there is no concept of matched properties (matched against what? the whole point of these use cases is that there is not yet an instance). We will need to address that issue for what I call "generative" use cases anyway, and I don't think it would be any harder here than it already is.

Additionally, much of the work on generative vocabularies will be adding keywords that disambiguate problematic validation keywords that really only function usefully in the context of an instance. This may be another case where generative use cases simply use an additional keyword that has no effect in instance-based use cases.

@handrews
Copy link
Contributor

Ugh, I meant that requiredProperties would be a hybrid applicator + assertion (not annotation as I originally wrote- I edited the text but commenting here for folks reading this through notification emails).

Addressing @jgonzalezdr 's point:

However, the drawback would be the interaction mess of property-related annotations if this keyword was defined in the validation specs instead of the core specs, but defining it in the core specs has no sense.

I'm not entirely sure where I'd put it. My first impulse was core, but then reading this I can see the argument for validation. This is a bit of a warning "design smell", but I'm not sure that the problem is with the keyword itself. I'll think on this more.

@awwright
Copy link
Member

awwright commented Nov 25, 2018

I don't have any problem with a little bit of sugar for authoring convenience.

We typically try to factor apart keywords as much as possible (e.g. there's a reason you have to specify { type:"string", minLength:1 }, as two keywords, to assert a non-empty string).

If we wanted to take this to its logical conclusion, we could specify a keyword like { minLengthString: 1 } that also implies type:"string" but there's almost zero demand for such a feature.

Unlike my minLengthString hypothetical, duplicating property names seems like the code smell, and as such, a keyword that reduces duplication of keyword names seems reasonable to me.

@handrews
Copy link
Contributor

@awwright @jgonzalezdr my thought right now is to collect more community feedback here on the two approaches: a "magic" boolean value for required vs requiredProperties. And possibly other ideas, now that we have started considering this seriously.

So I'd probably prefer not to put it into draft-08 unless we quickly get a lot of feedback. I want to focus on finishing $vocabulary and getting draft-08 out the door as soon as we can (my end-of-year deadline is already looking unlikely).

Draft-09 will be much smaller, as draft-08 really dug into the hardest problems. I expect it will be a bug fix / feedback incorporation draft, plus features like this that just need a little more discussion.

I know we slipped in contentSchema very late recently, but the content* keywords are an obscure corner of the spec, while anything regarding object structure is heavily used. Also, contentSchema is a direct analogue of two other schema-for-a-media-type-instance keywords in Hyper-Schema, so it's not really introducing anything novel.

@awwright I strongly agree with your code smell comment for setting the bar for this sort of keyword- the problem of keeping required in sync is well-documented, and its contents are literally present in another keyword. minLength's value is distinct from that of type, and there is no synchronization problem, really.

@handrews
Copy link
Contributor

BTW I am open to putting a solution into draft-08, I'm just unsure of which and would like to be cautious in such a core area.

@daniwelter
Copy link

We have come across the a similar use case as @jgonzalezdr a couple of times recently in the HCA metadata:

During schema edits, properties were removed or renamed but the update was not propagated to the required array. We therefore had properties declared as required that were not defined under properties{}.

These schemas were not flagged as invalid but as all our schemas use additionalProperties:false, no document can be successfully validated against such a schema as it would either be missing a required property or contain an undefined additional property.

We would appreciate any advice on how to handle this short of hand-rolling hacks into our validation process.

@jgonzalezdr
Copy link
Contributor

Another option for solving this issue, maybe a bit more less disruptive, would be extending the current behaviour such that required array elements are considered regular expressions.

This approach would be a lot more flexible, being almost backwards compatible (except rare property names clashing with regex special characters).

However, some dependency on the contents of a sibling properties keyword would still exist.

Some examples:

  • All properties are required: "required": [".*"]
  • All properties begining with "js-" are required: "required": ["^js-.*"]
  • All properties not begining with "x-" are required: "required": ["^(?!x-).*"]

@Clemens-U
Copy link

For me, this is something that should be solved at the tools level. JSONBuddy as a schema editor gives you a warning if a required property is not defined at the related schema level. This way you will see immediately the issue while you are editing the schema and this helps saving time. Another advantage, you can catch this error even if you don't have all possible combinations of sample data available (which will rarely be the case) and validated. Maybe your test data doesn't cover all of the schema definitions? From a practical perspective, you need a tool to solve this.

Anything else can only be handled at the time of validating the JSON data which is already at production level in the worst case.

@jgonzalezdr
Copy link
Contributor

jgonzalezdr commented Dec 9, 2018

A mistyped property can be detected by a authoring helper tool, but forgetting to mark a property as required when you already have several dozen properties cannot be detected by a tool.

My personal experience when defining protocols or storage formats not only with JSON, but also with other languages like XML or ASN.1, is that 95% of the time properties (or their equivalents in other languages) are required. I think that not by chance, most schema languages by default consider that defined elements are mandatory, unless explicitly defined as optional (e.g. DTD, XSD, ASN.1).

However, JSON Schema design is to make properties optional. This is fine, it's a design decision that has most probably a good reason behind. However, given that many schema authors will need many times all of the properties to be required, I think it is very interesting to let that design decision being reflected in a formal and machine verifiable way in the schema.

I also think that we should take in account that many times schemas are also read by humans, so keeping verbosity reduced is also good: If I have to review a schema and understand which of the 83 properties are required and which not...

IMO, it is much more dangerous going to production with something that should be required marked as optional, than the other way around. As I said before, many times, because of the lack of time, only positive cases ("happy path") are tested. When that happens, properties that should be optional are detected, but properties that shouldn't be optional are not detected.

@jgonzalezdr
Copy link
Contributor

For those following this thread through email, edited: "IMO, it is much more dangerous going to production with something that should be optional marked as required, than the other way around" => "IMO, it is much more dangerous going to production with something that should be required marked as optional, than the other way around".

@awwright
Copy link
Member

@jgonzalezdr I think there's multiple use-cases for JSON Schema and that's one of them. Especially in comparison to a database, where you have to specifically label a column as NULLable.

But there's also many cases where the default is nothing is required, all unknown . Especially protocols, including the biggest application protocols like MIME and HTTP.

I just want to be careful to not prefer one style over the other.

Also note, in previous versions of JSON Schema, you could do this:

{
    patternProperties: {
        "^js-": { required: true }
    }
}

@handrews
Copy link
Contributor

@awwright what did "required": true even mean within "patternProperties"? Was it a pattern-specific "minProperties": 1? (There's an issue for patter-related minimums/maximums but it bogged down well over a year ago and no one wanted it enough to sort it out).

@awwright
Copy link
Member

... good question
You're right, nothing happened when you did that.

@awwright
Copy link
Member

awwright commented Feb 6, 2020

This issue is re-summarized and reopened at #846

@awwright awwright closed this as completed Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants