default specification (6.2) needs fleshing out #204

Closed
pipobscure opened this issue Dec 25, 2016 · 15 comments

Comments

@pipobscure

pipobscure commented Dec 25, 2016

The specification for default is insufficient in terms of its requirements for implementations.

At minimum we have to think about the implications for further validation. The current specification permits a value for default that does not validate against the provided schema. However, default has further implications, for example its effect when the property is also required. If default does not satisfy required, then its purpose is unclear at best and misleading at worst.

I have created a PR against the test-suite with this in mind.

In my opinion the specification should be clearer by stating that the value for default MUST be valid according to the provided schema. The presence of default should also satisfy other schema criteria such as required.

I am maintaining a JS implementation (draft3 & draft4) for corporate in-house use (open-sourcing is in progress). It has the option to actually modify the original JSON-Document, or alternatively keep it untouched but continue validation as if the default value had been there all along. We have yet to come across a situation where this was insufficient or unclear.

@handrews
Contributor

@pipobscure I agree that we need more clarification.

An important point that needs addressing somewhere (whether in the spec or on the web site in supplementary documentation) is that default (like title, description, and examples) has no direct impact on validation. An implementation may choose to use the default value in whatever situations (pre-populating a UI input form or filling in missing values in a representation in a request or response), but that is application-specific and independent of validation.

Non-validating defaults

Obviously, if you use a default value in an instance but it does not validate, that would be strange. But so would several other things that are possible with JSON Schema. The right model here is probably linters vs compilers/interpreters. A schema linter should warn about a non-validating default or example, but it's not technically incorrect and may be useful in some odd circumstance. This was previously discussed in #125.
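The linter model described above could look roughly like the following sketch. It assumes a deliberately tiny validator (only "type", "properties" and "required" are understood); `is_valid` and `lint_defaults` are hypothetical names, not part of any real JSON Schema library. The point is that a non-validating default produces a warning, not a validation failure.

```python
# Hypothetical mini-validator: understands only "type", "properties", "required".
TYPES = {"object": dict, "array": list, "string": str, "integer": int,
         "number": (int, float), "boolean": bool, "null": type(None)}

def is_valid(instance, schema):
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types
        if expected in ("integer", "number") and isinstance(instance, bool):
            return False
        if not isinstance(instance, TYPES[expected]):
            return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], sub):
                return False
    return True

def lint_defaults(schema, path="#"):
    """Return warnings (not errors) for defaults that fail their own schema."""
    warnings = []
    if "default" in schema and not is_valid(schema["default"], schema):
        warnings.append(f"{path}: default {schema['default']!r} does not validate")
    for name, sub in schema.get("properties", {}).items():
        warnings.extend(lint_defaults(sub, f"{path}/properties/{name}"))
    return warnings

# Made-up schema: "port" carries an instructional default that is not an integer.
schema = {
    "type": "object",
    "properties": {
        "port": {"type": "integer", "default": "auto"},
        "host": {"type": "string", "default": "localhost"},
    },
}
print(lint_defaults(schema))
```

The schema itself is still usable for validation; only the lint pass complains about "port".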

The main reason to avoid a MUST here is to avoid imposing an implementation requirement during validation when this is really not a validation keyword (see also "Related topics" below).

default and required

As for required, since a default is not necessarily used by every application, the presence of a default does not automatically satisfy required.

The same application may use a default to pre-populate user input (in which case it will end up satisfying the requirement if the user does not change it), but also decline to use the default at the request/response processing layer. So if a representation hits that layer without a field, some applications may prefer to raise an error rather than writing in the default.

Presumably, the desired usage would be documented in description.
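A minimal sketch of the two layers described above, assuming a made-up schema with a "language" property. `initial_form_values` and `missing_required` are hypothetical helpers: the first stands in for a UI layer that uses default, the second for a request layer that ignores it when checking required.

```python
# Illustrative schema; the "language" property is a made-up example.
schema = {
    "type": "object",
    "properties": {"language": {"type": "string", "default": "en"}},
    "required": ["language"],
}

def initial_form_values(schema):
    """UI layer: pre-populate inputs from the default of each property."""
    return {
        name: sub["default"]
        for name, sub in schema.get("properties", {}).items()
        if "default" in sub
    }

def missing_required(body, schema):
    """Request layer: report absent required properties, ignoring defaults."""
    return [name for name in schema.get("required", []) if name not in body]

form = initial_form_values(schema)
print(missing_required(form, schema))  # the form kept the default
print(missing_required({}, schema))    # a bare request still lacks "language"
```

Both layers read the same schema; only the first one gives default any effect.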

Related topics

A separate metadata/annotation vocabulary has been proposed in #136, and one topic to consider there would be whether to leave any of the metadata/annotation keywords in the validation vocabulary. It might make things more clear to separate them.

There is also an ongoing discussion about wording related to defaults for the schema keywords themselves in PR #171.

Suggested path forward.

We should add language stating that whether and how the value of default is used is application-defined. I think that will make it clear enough why required is not automatically satisfied. It's less clear to me how to concisely explain why it doesn't necessarily need to validate. That may need to go in a more verbose guide on the web site.

@Relequestual
Member

No work to be done on the 25th!!! Tsk!

p.s @handrews I'm not in the office next week.

@handrews
Contributor

@Relequestual good! :-D
Yesterday was celebratory, today I'm pretty much not getting out of bed. Why that includes responding to GitHub issues I don't know but apparently it does :-)

@pipobscure
Author

@handrews in the case that default is just metadata/annotational, then there shouldn't be anything in the test-suite for it, since it's pretty-much-ignored/simply-informational. (Same as there is nothing in the test-suite for title). I can live with that.

In my validator, I have an option that makes default actually modify the document. With pure validation, default is ignored (treated as an annotation), but when the default switch is turned on, the default value is copied into the original document. In that mode, the validator can be used to create a valid document (or augment an existing one), and it also validates the default value.

I think this is a sensible way to implement this, but I'd really like to clarify that this is "legitimate".
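The switch described above might be sketched as follows. This is not the author's implementation: `is_valid`, `fill_defaults`, `validate` and the `use_defaults` flag are hypothetical names, the validator only understands "type", "properties" and "required", and this sketch works on a copy rather than modifying the caller's document.

```python
import copy

# Hypothetical mini-validator: understands only "type", "properties", "required".
TYPES = {"object": dict, "array": list, "string": str, "integer": int,
         "number": (int, float), "boolean": bool, "null": type(None)}

def is_valid(instance, schema):
    expected = schema.get("type")
    if expected is not None:
        if expected in ("integer", "number") and isinstance(instance, bool):
            return False
        if not isinstance(instance, TYPES[expected]):
            return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], sub):
                return False
    return True

def fill_defaults(instance, schema):
    """Copy defaults for missing properties into instance, recursively, in place."""
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name not in instance and "default" in sub:
                instance[name] = copy.deepcopy(sub["default"])
            if name in instance:
                fill_defaults(instance[name], sub)
    return instance

def validate(instance, schema, use_defaults=False):
    """Return (ok, document). With use_defaults off, default is a pure annotation."""
    doc = fill_defaults(copy.deepcopy(instance), schema) if use_defaults else instance
    return is_valid(doc, schema), doc

# Made-up schema for illustration.
schema = {
    "type": "object",
    "properties": {"retries": {"type": "integer", "default": 3}},
    "required": ["retries"],
}
print(validate({}, schema))                     # annotation-only mode
print(validate({}, schema, use_defaults=True))  # defaults applied, then validated
```

Because the filled-in document is validated as a whole, a default that does not match its schema surfaces as an ordinary validation failure in the second mode.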

If someone lets me know the procedure (or points me to a place where I can look it up), I'd volunteer to make an attempt at a PR for the spec.

(Also: isn't the point of celebratory days to mess with stuff like github issues for the pure fun of it? 😄 )

@handrews
Contributor

I think this is a sensible way to implement this, but I'd really like to clarify that this is "legitimate".

We don't (yet) do a great job of separating the different things going on under the general umbrella of "JSON Schema", although the split into three specs was a good start that just needs more follow-through.

There are a couple of things one can implement with respect to JSON Schema:

A validator, strictly speaking, just uses the validation keywords (everything except the meta-data section) to deliver a validation result.

Beyond that, we can implement various applications and helpers, including but not limited to:

Each of those might use default in a different way. The documentation use is obvious. The UI use case probably pre-populates an input field, so it uses the default before the user is involved in the system. Once the user supplies instance data, the default is not examined further: either the user accepted it or they didn't, and the input processing won't change it back even if the user deleted it (which is legal if the field is not also required).

In the API case, the default might be used after the user (which may be a program rather than a human) sets up the data, to fill in whatever data is missing. This might be done to reduce the representation weight on the wire (although the actual benefit of this "optimization" is usually so small as to be irrelevant unless you're in a really constrained environment).

So is what you're doing legit? Well, yes, but in the sense of it being one legit way of using the default, and not the only one. Up to you what use cases you are targeting with your work.

For instance, I am working on a generic hyperschema client. In addition to running a validator module, it will use the defaults to fill in under-specified messages. The schemas that this client works with will be written and documented such that it is never ambiguous as to whether the default should be used or the field was intentionally omitted. But that use of the default will be in the representation management part of the code (the actual application-specific code), not in the validation module (which, depending on the language, might actually be a 3rd-party module).

isn't the point of celebratory days to mess with stuff like github issues for the pure fun of it?

😄😄 Indeed!

@pipobscure
Author

@handrews the differing uses for default make me really queasy. In that case it becomes possible that differing assumptions are made as to what the use will be. Rather than specify default as basically unrestricted making it essentially useless for all use-cases, I'd rather create some name-spaced extensions that allow for concurrent different use-cases.

At least that would allow schemas to remain portable. In the current case, schemas lose their portability, since an app that requires default to be valid and an app that requires default to be invalid would require different schemas.

@Julian
Member

Julian commented Dec 26, 2016

That is not lack of portability, that is adherence to a spec, perfect or not.

@pipobscure if I haven't said it already please don't misinterpret anything as ingratitude, your input is more than welcome. But you need to understand first what is, and then why it is, and then whether it needs changing.

@pipobscure
Author

pipobscure commented Dec 27, 2016

@Julian I agree that it's adherence to spec, but I do have an issue with "perfect or not". I was under the impression that the point of the repo was to move the spec closer to perfect 😄

I understand the what is: the default is RECOMMENDED to be valid, which means it should be valid unless you have a good reason for it not to be. (See other linked issues and their discussions with reference to the meaning of RECOMMENDED in RFCs.)

I also understand why it is: the intent was to allow a bit of flexibility in the use of the keyword, especially with regard to the use of the schema. If used to pre-fill forms/UI, there might be different needs than when it's used for documentation, and those again may be different from generating a "defaulted" complete document.

I believe that the answer to whether it needs changing is yes.

Just to be clear: I am NOT arguing for a simple change from RECOMMENDED to MUST. That has been discussed all over the place ad-nauseam. I suggest we need to make clearer what values for default are acceptable.

My personal preference would be that the value for default must be such that, if the property is undefined, the document that results from substituting the default value MUST validate when taking into account the schema for the property. This sounds very similar to replacing RECOMMENDED with MUST, but there is an important difference.

Simply replacing RECOMMENDED with MUST would mean that:

{
  "type": "object",
  "properties": {
    "foo": {
      "bar": { "type": "integer", "default": 1 }
    }
  },
  "required" : [ "foo" ],
  "default": {}
}

would be invalid, since the default value itself would not be strictly valid. My recommendation would be that default, if present at all, must be usable as a substitute such that the resulting document validates.
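The difference between the two rules can be sketched in code. The schema below is adapted from the example above under two labeled assumptions: "bar" is nested under "properties" inside "foo", and "foo" is given its own empty-object default so that substitution has something to build on. `is_valid` and `fill_defaults` are hypothetical helpers that only understand "type", "properties" and "required".

```python
import copy

# Hypothetical mini-validator: understands only "type", "properties", "required".
TYPES = {"object": dict, "array": list, "string": str, "integer": int,
         "number": (int, float), "boolean": bool, "null": type(None)}

def is_valid(instance, schema):
    expected = schema.get("type")
    if expected is not None:
        if expected in ("integer", "number") and isinstance(instance, bool):
            return False
        if not isinstance(instance, TYPES[expected]):
            return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], sub):
                return False
    return True

def fill_defaults(instance, schema):
    """Copy defaults for missing properties into instance, recursively, in place."""
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name not in instance and "default" in sub:
                instance[name] = copy.deepcopy(sub["default"])
            if name in instance:
                fill_defaults(instance[name], sub)
    return instance

# Adapted schema (assumptions: "properties" wrapper for "bar", default on "foo").
schema = {
    "type": "object",
    "properties": {
        "foo": {
            "type": "object",
            "properties": {"bar": {"type": "integer", "default": 1}},
            "default": {},
        }
    },
    "required": ["foo"],
    "default": {},
}

# Rule 1: the default value itself MUST validate.
strict_ok = is_valid(schema["default"], schema)

# Rule 2: the document obtained by substituting defaults MUST validate.
substituted = fill_defaults(copy.deepcopy(schema["default"]), schema)
substituted_ok = is_valid(substituted, schema)

print(strict_ok, substituted, substituted_ok)
```

Under the first rule the top-level default fails because "foo" is required; under the second, nested defaults are filled in first and the resulting document passes.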

Now back to the question of whether it needs changing.

As I said, I believe the answer to be yes.

I believe that the current wording suggests that, while there is a clear preference for the value of default to be valid, there is nothing to prevent you from supplying a value that is entirely different.
This might be OK when your perspective is that of someone writing a JSON-Schema for a specific purpose such as validation or documentation. However, when writing a schema generically this becomes insufficiently clear. Differing use-cases will have different expectations of what the value is, and none can assume anything specific about the value.

  • The documentation engine cannot assume it's something suitable to use in output
  • The validation engine cannot assume it's something suitable to use to validate
  • The default generation engine cannot assume it's something that will be useful in generating a valid document

In other words, none of the engines/use-cases are actually able to rely on the value in any way. Unless you specifically craft the schema to fulfil the use-case and you are even aware of the use-case a-priori you have no sensible way to use that value.

That in turn leads to the situation that if you want to be able to use a value you have to add more specification, either by restricting default further (which would cause valid schemata to become invalid) or by adding a distinct custom keyword that fits the need of your use-case.

That in turn leads to the default keyword being unusable for any use-case, rather than being the keyword used for all similar use-cases. Which then leads to the specification being worse than if it weren't there at all.

So in my view, the current definition of default is so deficient as to be worse than useless and a blotch on an otherwise good specification. There are two ways forward in my view:

  1. remove the default keyword entirely from the specification
  2. make the default keyword actually something well-defined

Right now, I think it's just a case of being too chicken to say "actually this thing MUST be valid", because there might be some theoretical case where someone would want it not to be valid.

P.S.: If I haven't said it already: Please don't misinterpret anything I write as a personal attack on you or your work. I believe having an open debate is the way to produce the best OSS and Specs. I deeply respect all the work that has already gone into this and all those who have done that work. The only reason I am being argumentative is out of a desire to make this thing better. Once there is a clearer agreed way forward, I am volunteering to actually do the work as well.

@handrews
Contributor

@pipobscure There is nothing wrong with a system that makes use of JSON Schema putting extra constraints on how it is used. That is to be expected. A UI generation system will have different needs from hypermedia description, and both are different from static documentation generation. There is no "generic" way to use schemas and a given schema might not be suitable for all three.

There is also the question of: why does the value of default need to be reliable? What is wrong with just raising an error if a specific application finds that the default does not meet its requirements?

There are many similar cases around JSON Schema. Structural validation may pass, as may whatever semantic validation can be achieved through "format", but the application may produce an error because, while syntactically valid, the document is not suitable in some other way (it conflicts with the application state, or some complex constraint that is either not possible to express in JSON Schema or just too complicated to add to the standard).

Just because the spec allows a nonsense value there doesn't mean your application or tool has to do anything specific. If your system is documented to write the default into the instance before validation, then obviously anyone expecting to use that system needs to use a valid default. And your system in that case can easily validate each default and raise a sensible error if it fails validation.

@handrews
Contributor

@pipobscure I would be more in favor of individual vocabularies defining how to use "default" or something similar. To the extent that I have a problem with "default" it is not that it doesn't necessarily validate, it is that there is no real guidance on how to use it. In its current location (in the validation spec) there is nothing that we can say about how to use it.

But in the hyper-schema spec we could say things like:

  • In a request (such as given by "schema" in the LDO), a default will be used on the server side to fill in missing portions of the request. Such a default MUST validate.
  • In a response (such as provisionally given by "targetSchema" and canonically set by the response's profile media type parameter or describedBy link), the client will use the default to fill in missing portions of the response. Such a default MUST validate.

In a UI Schema spec, we could say:

  • In a UI form generating schema, input will be pre-populated with the default value. The default need not validate, but it SHOULD be clear whether it is a usable default or a placeholder instructional value.

In a documentation schema spec, we could say:

  • The default MAY be a default value or MAY be a description of the default behavior. Default behavior need not match any specific possible value. It SHOULD be clear whether the given default is a value or description.

Alternatively, we could drop the notion of a global default and instead create specific keywords for specific behavior. The driving consideration there should be how likely it would be for someone to need to re-use the same schema in such a way that two conflicting uses of "default" would be required.
For instance, in a UI vocabulary, rather than a literal default value you might want placeholder text.
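As a rough sketch of the "specific keywords for specific behavior" alternative: the keyword names "x-ui-placeholder" and "x-fill-value" below are hypothetical, invented purely for illustration, and do not come from any spec or proposal. The idea is that one schema can serve a UI consumer and a request-filling consumer without the two competing for a single default.

```python
# "x-ui-placeholder" and "x-fill-value" are hypothetical keyword names
# used only for illustration; the schema itself is also made up.
schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "x-ui-placeholder": "you@example.com"},
        "retries": {
            "type": "integer",
            "x-ui-placeholder": "0 to 5",  # instructional text, not a valid integer
            "x-fill-value": 3,             # value a server may write into a request
        },
    },
}

def collect(schema, keyword):
    """Gather one keyword's values across the top-level properties."""
    return {
        name: sub[keyword]
        for name, sub in schema.get("properties", {}).items()
        if keyword in sub
    }

placeholders = collect(schema, "x-ui-placeholder")  # read by a UI generator
fill_values = collect(schema, "x-fill-value")       # read by a request-filling layer
print(placeholders)
print(fill_values)
```

Each consumer reads only its own keyword, so a placeholder that would never validate does not interfere with a fill value that has to.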

@epoberezkin
Member

I think shrug label ¯\_(ツ)_/¯ is timely here :)

@pipobscure
Author

Alternatively, we could drop the notion of a global default and instead create specific keywords for specific behavior. The driving consideration there should be how likely it would be for someone to need to re-use the same schema in such a way that two conflicting uses of "default" would be required.

I'd be very much in favour of this. At a minimum it would make things a lot clearer and make schema compatibility a lot easier.

Just for background, and why I might be a bit more passionate about default than is reasonable 😄 . We do have an application where we need both behaviours at the same time now. When we started, we used default for setting a value since we were only validating and creating objects by setting defaults. We then merged with another team that was building a UI. Now we have the situation that we are basically maintaining 2 almost identical schemas with the only difference being the values in default. We have the situation under control, but it would have been nice if this was better specified.

@handrews
Contributor

@pipobscure that situation with the two teams is very good to know about, thanks! It's always helpful to see how things happen in the real world, especially unexpected things.

I also just wrote up the beginnings of an idea for splitting the definition of annotation keywords like "default" from requirements on how they are used in various situations in #136. I know that's not your preferred solution but if you have any particular thoughts on it please comment there.

I'm going to (as @epoberezkin suggested) tag this issue with the shrug label (meaning "too long and wandering for most people to read"). If the idea of splitting up "default" is appealing, I can file a PR to specifically track that idea (or you're welcome to). If there's other stuff to keep discussing here we can do that, otherwise best to close this one out, I think.

@Relequestual
Member

Closing. If someone really wants this and can clearly address the use cases and issues @handrews raises in this issue, please raise a new issue.
