This repository has been archived by the owner on Mar 19, 2019. It is now read-only.

[v5 proposal] choices (improvement to enum) #211

Closed
nemesifier opened this issue Dec 1, 2015 · 57 comments

Comments

@nemesifier

Problem

The enum attribute for strings is really useful, but it has a severe limitation: it does not support providing a human-readable label for raw values.

I tried workaround solutions, but I prefer to stick to enum for its simplicity and because UI implementations (like JSON Editor) can very easily generate an HTML select with the allowed values.

Unfortunately, though, enum cannot attach human-readable values for decoration and documentation purposes, the way the title and description properties included in the json-schema validation RFC do. This makes it difficult to work with a long list of raw values that differ greatly from the corresponding human-readable ones. For example, in a list of timezone settings for the entire globe, how can a programmer remember that "SAST-2" means "Africa/Mbabane" and "MST7MDT,M3.2.0,M11.1.0" means "America/Cambridge Bay"?

The human-readable value can be used for different purposes:

  • decoration
  • documentation
  • UI generation
  • validation error message generation

Solution proposed

Many popular frameworks like Django (Python), Ruby on Rails (Ruby) and Symfony (PHP) support specifying a list of choices, usually either a list of tuples or a dictionary (associative array), in which one element is the raw value and the other is the corresponding human-readable label.

An example of a subset of choices for timezones:

[
    ["UTC", "Coordinated Universal Time"],
    ["GMT0", "Africa/Abidjan"],
    ["GMT0", "Africa/Accra"],
    ["EAT-3", "Africa/Addis Ababa"],
    ["WAT-1", "Africa/Bangui"]
]

Backward compatibility

Adding a "choices" (or equivalent) attribute can be backward compatible with the current draft (v4):
  • if both enum and choices are present, choices would be used;
  • if only enum is present, implementations can still support it for backward compatibility.
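The two compatibility rules above can be sketched as a tiny validator. This is a minimal Python sketch, assuming choices entries are [raw_value, label] arrays; the helper names allowed_values and is_valid are hypothetical, choices was only ever a proposal, and the sample enum is deliberately narrower than choices so the precedence rule is visible.

```python
# Sketch of the *proposed* "choices" keyword (never part of any JSON Schema draft).
# choices entries are assumed to be [raw_value, label, ...extra] arrays.


def allowed_values(schema):
    """Return the raw values a schema permits, or None if it has no enumeration."""
    if "choices" in schema:
        # Rule 1: when both keywords are present, choices takes precedence.
        return [entry[0] for entry in schema["choices"]]
    if "enum" in schema:
        # Rule 2: a plain v4 enum keeps working unchanged.
        return list(schema["enum"])
    return None


def is_valid(instance, schema):
    """Check only the enumeration constraint; other keywords are ignored here."""
    values = allowed_values(schema)
    # Caveat: Python's == treats True == 1, whereas JSON Schema does not.
    return values is None or any(instance == value for value in values)


schema = {
    "enum": ["UTC", "GMT0"],
    "choices": [
        ["UTC", "Coordinated Universal Time"],
        ["GMT0", "Africa/Abidjan"],
        ["EAT-3", "Africa/Addis Ababa"],
    ],
}
print(is_valid("EAT-3", schema))                     # True: choices wins over enum
print(is_valid("EAT-3", {"enum": ["UTC", "GMT0"]}))  # False
```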

Fully backward-compatible example:

{
  "enum": ["UTC", "GMT0", "EAT-3"],
  "choices": [
    ["UTC", "Coordinated Universal Time"],
    ["GMT0", "Africa/Abidjan"],
    ["EAT-3", "Africa/Addis Ababa"]
  ]
}
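To illustrate the UI-generation use case, here is a sketch of how a form renderer might turn this backward-compatible example into an HTML select, preferring choices and falling back to raw enum values. The functions select_options and render_select are hypothetical and not taken from JSON Editor or any other library.

```python
from html import escape


def select_options(schema):
    """Return (value, label) pairs, preferring choices and falling back to enum."""
    if "choices" in schema:
        # Extra elements after the label (allowed by the proposal) are ignored here.
        return [(value, label) for value, label, *_extra in schema["choices"]]
    # Without choices, the raw value doubles as its own label.
    return [(value, str(value)) for value in schema.get("enum", [])]


def render_select(name, schema):
    """Render an HTML <select>, escaping values and labels."""
    options = "".join(
        f'<option value="{escape(str(value))}">{escape(label)}</option>'
        for value, label in select_options(schema)
    )
    return f'<select name="{escape(name)}">{options}</select>'


schema = {
    "enum": ["UTC", "GMT0", "EAT-3"],
    "choices": [
        ["UTC", "Coordinated Universal Time"],
        ["GMT0", "Africa/Abidjan"],
        ["EAT-3", "Africa/Addis Ababa"],
    ],
}
print(render_select("timezone", schema))
```

An older renderer that only knows enum would still produce a working select, just with raw values as labels.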

Advantages

Using a list of tuples (as opposed to a list of objects) has the advantage of being very simple, portable, and concise.

It's easier to maintain, faster to type and easier to read. Additional attributes can be allowed from the third element onwards if needed.

Another advantage is that this pattern is very well known from many MVC frameworks, so it will feel familiar to a lot of developers.

Downsides

The main downside is duplication of effort when backward compatibility is needed: during an eventual transition from v4 to v5, implementations should support both enum and choices. Popular schemas that aim for maximum compatibility should also use both.

@nemesifier nemesifier changed the title [Proposal v5] choices (similar to enum) [v5 proposal] choices (improvement to enum) Dec 1, 2015
@RicoSuter

And I think this idea is better, as you can have a name (for code generation) and a description:
https://groups.google.com/d/msg/json-schema/w_5mVYB7OHg/RrmYLJL_B_QJ

The problem with code generation:
RicoSuter/NSwag#14

@nemesifier
Author

Too verbose and overcomplicated for my taste, but as things stand now, almost anything would be better than enum.

@sgpinkus

sgpinkus commented Feb 7, 2016

I like https://github.com/json-schema/json-schema/wiki/enumNames-%28v5-proposal%29 better. It's easier to maintain backward compatibility (BWC) with it. Suppose you were to follow the recommendation on the front end: the back end cares not for enumNames, and the front end could just optionally use enumNames as a lookup table. I would further propose that enumNames does not have to be one-to-one: the front end gives preference to enumNames and falls back to the actual enum value.
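For comparison, a front end following the enumNames proposal would match labels to enum values by position. The sketch below (hypothetical helper enum_label) assumes the non-one-to-one variant suggested in this comment, where a missing name falls back to the raw value; note that the only link between a value and its label is the index.

```python
def enum_label(value, schema):
    """Resolve a label via enumNames (matched to enum by index), else the raw value."""
    enum = schema.get("enum", [])
    names = schema.get("enumNames", [])
    for index, candidate in enumerate(enum):
        if candidate == value:
            # enumNames may be shorter than enum (not one-to-one): fall back.
            return names[index] if index < len(names) else str(value)
    raise ValueError(f"{value!r} is not listed in enum")


schema = {
    "enum": ["UTC", "GMT0", "EAT-3"],
    "enumNames": ["Coordinated Universal Time", "Africa/Abidjan"],
}
print(enum_label("GMT0", schema))   # Africa/Abidjan
print(enum_label("EAT-3", schema))  # EAT-3 (no name at index 2)
```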

@nemesifier
Author

@sam-at-github suppose you have a list of 50, 100 or 200 entries with raw values that differ quite a lot from the human-readable values: with the enumNames proposal, the schema would become unreadable at best. How can you tell which human-readable value is associated with the 39th element of the enum list? You can't! You have to resort to complexity. Complexity is bad: it leads to bugs and increased maintenance costs, and it slows down adoption.
The solution proposed here is nothing new; it is in fact the status quo in most modern web development frameworks: you have a raw value and, beside it, the human-readable value.

@sgpinkus

sgpinkus commented Feb 7, 2016

I like the fact that the enum and the human-readable representation are orthogonal in the alternative proposal, although I don't really like using an implicit index match to find the associated human-readable value, as is done there. I would prefer something like:

{
  "enum": [0, 1, 2, 3],
  "enumLut": [[0, "zero"], [2, "two"]]
}

The advantage is that it's backward compatible. Since JSON Schema ignores keywords it does not understand, the front end can exploit enumLut, and front ends that don't implement that feature can ignore it. On the wire, enumLut is irrelevant: it's an optional rendering instruction. That is the aspect I like about the alternative.
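The enumLut idea can be sketched as follows: validation reads only enum, while a renderer optionally consults enumLut and falls back to the raw value. The helpers lut_label and enum_valid are hypothetical, and enumLut is the keyword suggested in this comment, not a standardized one.

```python
def lut_label(value, schema):
    """Return the enumLut label for value, or the raw value if it has no entry."""
    # A linear scan (rather than a dict) also works for unhashable JSON values
    # such as arrays and objects, which enum permits.
    for raw, label in schema.get("enumLut", []):
        if raw == value:
            return label
    return str(value)


def enum_valid(instance, schema):
    """Validation looks only at enum; enumLut never affects the result."""
    return any(instance == candidate for candidate in schema["enum"])


schema = {"enum": [0, 1, 2, 3], "enumLut": [[0, "zero"], [2, "two"]]}
print([lut_label(v, schema) for v in schema["enum"]])  # ['zero', '1', 'two', '3']
```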

@nemesifier
Author

Could you explain to me the difference between:

{
  "enum": [0, 1, 2, 3],
  "enumLut": [[0, "a"], [1, "b"], [2, "b"]]
}

and my proposal:

{
  "enum": ["UTC", "GMT0", "EAT-3"],
  "choices": [
    ["UTC", "Coordinated Universal Time"],
    ["GMT0", "Africa/Abidjan"],
    ["EAT-3", "Africa/Addis Ababa"]
  ]
}

They're identical; the difference is just in the wording: enumLut versus choices.

@sgpinkus

sgpinkus commented Feb 7, 2016

If you say they are identical then they are. Sorry I misunderstood this part of your proposal:

if both enum and choices would be present, choices would be used.

When you put enum and the optional choices side by side as above, it's more obvious what you mean. So yeah, 👍 for choices as above. I'm not sure about choices as the keyword to use, though. Then again, how valuable is it to actually encapsulate this in the spec?

@nemesifier
Author

@sam-at-github I did not understand your last question.

Regarding the keyword, choices is widely used in the web development arena. For example, Ruby on Rails uses the same concept as choices but with the word options, which IMHO is still way better than enum, a word that comes from technical jargon. "Choices" or "Options" are natural-language words that newer generations of coders can also learn faster.

@sgpinkus

sgpinkus commented Feb 7, 2016

@sam-at-github I did not understand your last question.

Hmm, OK, so I think maybe your proposal and my interpretation are not actually identical, then. Given:

{
  "enum": ["UTC", "GMT0", "EAT-3"],
  "choices": [
    ["UTC", "Coordinated Universal Time"],
    ["GMT0", "Africa/Abidjan"],
    ["EAT-3", "Africa/Addis Ababa"]
  ]
}

You say "if both enum and choices would be present, choices would be used." I was thinking enum would always have to be present. A UI can choose to use choices to render the enum values in a more convenient way for humans. Validators do not have to understand choices; it's metadata.

I think you're saying, given the schema:

{
  "choices": [
    ["UTC", "Coordinated Universal Time"],
    ["GMT0", "Africa/Abidjan"],
    ["EAT-3", "Africa/Addis Ababa"]
  ]
}

And json:

"UTC"

A v5 json schema validator should validate the json successfully because it knows what choices is?

Ruby on rails which uses the same concept of choices but using the word options, which IMHO is still way better than enum, a word which comes from technical jargon.

Hmmm, yeah, but there is a difference. enum is strictly about data structure (and so is JSON Schema for the most part), whereas choices is caught up with UI concerns. There is no choices in C. I'm pretty sure there is no choices in XSD either, but it does have enum.

And yeah, actually I think "choices" is the best name for the concept, but again, it's not exactly the same as enum.

@saibotsivad

I'm mostly a lurker, not a contributor, just trying to follow this discussion.

I don't think this is in the current schema, but I would have thought/proposed that you might get something like this for an enum with choices-style metadata:

"timezone": {
  "enum": [{
    "type": "string",
    "property": "UTC",
    "description": "Coordinated Universal Time"
  },{
    "type": "string",
    "property": "GMT0",
    "description": "Africa/Abidjan"
  }]
}

(Since the metadata isn't really part of the validation, I would have thought it'd be in a description property.)

Which would then have something like this as valid:

"timezone": "GMT0"

Hopefully I'm not way off base on what's being discussed.


@sgpinkus

sgpinkus commented Feb 8, 2016

@saibotsivad Yeah, that is similar to the current workaround mentioned in enumNames:

{
    "oneOf": [
        {"enum": ["value1"], "title": "Value #1"},
        {"enum": ["value2"], "title": "Value #2"}
    ]
}
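For reference, a UI that wants labels from this oneOf workaround has to walk the branches itself. A minimal sketch (hypothetical function options_from_one_of), assuming each branch pins exactly one value with a single-element enum and carries an optional title; keep in mind that a validator must also evaluate every branch to confirm that exactly one matches.

```python
def options_from_one_of(schema):
    """Collect (value, title) pairs from oneOf branches that each pin a single value."""
    options = []
    for branch in schema.get("oneOf", []):
        enum = branch.get("enum", [])
        if len(enum) == 1:  # skip branches that are not single-value enums
            value = enum[0]
            options.append((value, branch.get("title", str(value))))
    return options


schema = {
    "oneOf": [
        {"enum": ["value1"], "title": "Value #1"},
        {"enum": ["value2"], "title": "Value #2"},
    ]
}
print(options_from_one_of(schema))  # [('value1', 'Value #1'), ('value2', 'Value #2')]
```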

@Ixrec

Ixrec commented Feb 8, 2016

Another mostly-lurker here.

  1. My main concern is that however this feature gets implemented (if at all), I would like the "unnamed" enums currently provided by the enum keyword to remain in JSON schema with their current semantics. Since most of my enums are human-readable, the current behavior is already ideal for me. The "backwards compatibility" section of this proposal strongly implies that it would eventually make enum obsolete in favor of choices. Is that the intent?

  2. @saibotsivad Your example schema is similar to one way I might have liked this feature to be implemented. Sadly, I'm pretty sure the exact format you've suggested would be backwards incompatible because enum values can be of any type, including objects. More specifically, if you put your example timezone schema into http://jsonschemalint.com/draft4/, you'll find that the object {"type": "string", "property": "UTC", "description": "Coordinated Universal Time"} validates successfully, as it should under v4 of the spec, which I assume is not at all what you intended.

  3. What is the best way to present an alternative proposal for a feature? Just leave a comment here? Open another issue? Create a new page on the wiki? (going with this comment for now since I assume we don't want to split the discussion)

  4. In brief, my alternative proposal is that since enum itself can only be an array in v4, we could also allow it to be an object.

{
    "enum": ["UTC", "GMT0", "EAT-3"]
}

{
    "enum": {
        "UTC": "Coordinated Universal Time",
        "GMT0": "Africa/Abidjan",
        "EAT-3": "Africa/Addis Ababa"
    }
}

Specifically, I propose that these two schemas be considered valid JSON Schemas with identical semantics. I believe that this (along with many slight variations thereof) achieves the desired goals of having the human-readable names clearly visually associated with their corresponding values, maintaining complete backwards compatibility, and making it easy for code to either ignore or utilize the human-readable values as they wish.
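A sketch of how an implementation might read both forms proposed in point 4, and of why point 2 holds: under v4 an enum array of objects simply lists those objects as allowed instances. The helpers are hypothetical and use plain Python equality, a simplification of JSON Schema's equality rules (for example, Python treats True == 1). One limitation worth noting: JSON object keys are always strings, so the object form could only label string values.

```python
def enum_values(enum):
    """Allowed values for the proposed forms: an array, or an object mapping value -> label."""
    # JSON object keys are always strings, so the object form only suits string enums.
    return list(enum.keys()) if isinstance(enum, dict) else list(enum)


def enum_labels(enum):
    """Labels exist only in the object form."""
    return dict(enum) if isinstance(enum, dict) else {}


def enum_accepts(instance, enum):
    """Simplified enum check by equality."""
    return any(instance == value for value in enum_values(enum))


array_form = ["UTC", "GMT0", "EAT-3"]
object_form = {
    "UTC": "Coordinated Universal Time",
    "GMT0": "Africa/Abidjan",
    "EAT-3": "Africa/Addis Ababa",
}
# Both forms accept the same instances...
print(sorted(enum_values(array_form)) == sorted(enum_values(object_form)))  # True
print(enum_labels(object_form)["GMT0"])  # Africa/Abidjan

# ...whereas under v4 an array of objects lists those objects as allowed values,
# so metadata objects placed inside an enum array change what validates.
v4_enum = [{"type": "string", "property": "UTC", "description": "Coordinated Universal Time"}]
print(enum_accepts("UTC", v4_enum))       # False
print(enum_accepts(v4_enum[0], v4_enum))  # True
```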

@YurySk

YurySk commented Feb 8, 2016

Hi,

I am very new to this discussion, but I would caution against introducing UI-oriented things in the schema. We tried to introduce UI annotations in a generic XSD-based model layer, and it didn't take at all, for the simple reason that the UI had to be localized, i.e. literal labels did not make sense at all. We thought about resource IDs, but even then that would presume that the client used an ID-based localization mechanism.
So it basically didn't go anywhere. In the end, we understood that the UI and the underlying data structures should not be mixed.
If an application wants UI annotations then it can be achieved via non-schema attributes in a fashion that suits the application the best.

Hope I understand the issue correctly.

@nemesifier
Author

@YurySk well, there are quite a few other proposals which your suggestion would be against, like #119.
I'm sorry, but I strongly disagree with you. Most modern web frameworks implement a similar solution for their models, and it has worked perfectly. You are not presenting any real use case that would make the proposed solution impractical or counterproductive. Citing abstract, dogma-like concepts is not a solid argument.

@nemesifier
Author

@Ixrec

  1. Yes I'm implying a deprecation of enum.

  2. I agree that this alternative solution, which overrides enum, is not backward compatible; we would need a new keyword.

  3. I've created a wiki page: https://github.com/json-schema/json-schema/wiki/choices-(v5-proposal-to-enhance-enum) in which at the bottom I linked to this discussion.

  4. Your alternative proposal looks beautiful! But it has a flaw, which is a mistake I have committed myself many times in the past:

You cannot and should not rely on the ordering of elements within a JSON object.

From the JSON specification at http://www.json.org/:

"An object is an unordered set of name/value pairs"

As a consequence, JSON libraries are free to rearrange the order of the elements as they see fit. This is not a bug.

What is currently missing from JSON Schema is an ordered list of values, each with its own human-readable label.
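The ordering concern, and a related one about duplicate keys, can be checked with Python's standard json module. The opening timezone example lists "GMT0" twice (Africa/Abidjan and Africa/Accra); written as an object, one of those labels would be lost. The behaviour shown is specific to Python's json module (RFC 7159 only says object names SHOULD be unique, and other parsers may reject duplicates or reorder members), so treat it as an illustration rather than a guarantee.

```python
import json

object_form = '{"GMT0": "Africa/Abidjan", "GMT0": "Africa/Accra"}'
pairs_form = '[["GMT0", "Africa/Abidjan"], ["GMT0", "Africa/Accra"]]'

# Python's json module silently keeps only the last value for a repeated key.
as_object = json.loads(object_form)
print(as_object)  # {'GMT0': 'Africa/Accra'}

# A list of pairs keeps every entry, in document order.
as_pairs = json.loads(pairs_form)
print(as_pairs)  # [['GMT0', 'Africa/Abidjan'], ['GMT0', 'Africa/Accra']]

# object_pairs_hook exposes the raw (key, value) pairs before they become a dict,
# which is one way a parser could detect duplicated keys.
raw_pairs = json.loads(object_form, object_pairs_hook=lambda pairs: pairs)
print(raw_pairs)  # [('GMT0', 'Africa/Abidjan'), ('GMT0', 'Africa/Accra')]
```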

@mitar

mitar commented Feb 8, 2016

I, on the other hand, do not like the idea of the schema containing user-readable strings, especially because those strings are often useless: they have to be translated, and in some languages translations depend on the context (like number or verb case). So you end up with a mapping table from those secondary strings to translations. Why not simply use the raw enum values as translation keys to begin with?

So I would claim that if user-friendly strings are needed, you should use your platform's translation engine with the raw enum values as translation keys.
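The approach suggested here can be sketched as a lookup keyed by raw enum value, with a fallback locale and finally the raw value itself. The catalogs and the localized_label helper are hypothetical illustrations, not part of any framework's API.

```python
# Hypothetical translation catalogs keyed by raw enum values; the labels are
# illustrative only and not taken from any real localization project.
CATALOGS = {
    "en": {"EAT-3": "Africa/Addis Ababa", "GMT0": "Africa/Abidjan"},
    "it": {"EAT-3": "Africa/Addis Abeba"},
}


def localized_label(value, locale, fallback_locale="en"):
    """Look the raw value up in the requested locale, then the fallback locale."""
    for loc in (locale, fallback_locale):
        label = CATALOGS.get(loc, {}).get(value)
        if label is not None:
            return label
    return str(value)  # No translation anywhere: show the raw value.


print(localized_label("EAT-3", "it"))  # Africa/Addis Abeba
print(localized_label("GMT0", "it"))   # Africa/Abidjan (falls back to "en")
print(localized_label("UTC", "it"))    # UTC
```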

@erosb

erosb commented Feb 8, 2016

I also discourage introducing the choices keyword. As far as I can understand, it provides something very close to the semantics of enum, and the oneOf keyword also provides somewhat similar functionality. Adding choices to these existing solutions would just confuse both schema authors and validator implementation maintainers.

The alternative proposal of @Ixrec is a better idea, but I'm still not sure it is needed. The JSON Schema spec already provides 2 ways to add metadata to the schema that don't directly affect the validation process:

  • additional properties in the schema (already mentioned above)
  • the description keyword is also a similar solution.

@YurySk

YurySk commented Feb 8, 2016

@nemesisdesign
Human readable strings often need to be translated, so listing them in a schema has a limited usefulness. Also, capitalization sometimes depends on specific use.

@nemesifier
Author

Another argument for considering this part of the validation spec is that, if validation fails, the human-readable value can be included in the validation error message.

@YurySk the human-readable string in English can be passed to the i18n framework; this is how Django currently does it. Not every framework might do it this way, but generally speaking, if the framework uses gettext, the common practice is to write the software in English and provide translations in .po files (see Gettext, O'Reilly Media).
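As a sketch of both points, the snippet below builds a validation error message that includes the human-readable labels from a choices list and passes each label through gettext. It uses Python's gettext.NullTranslations, which returns messages unchanged, as a stand-in for real .po/.mo catalogs; choices_error is a hypothetical helper, and choices remains a proposed keyword.

```python
import gettext

# NullTranslations returns every message unchanged; a real application would
# load compiled .po/.mo catalogs (e.g. with gettext.translation) instead.
_ = gettext.NullTranslations().gettext


def choices_error(instance, schema):
    """Return None if instance is an allowed raw value, else a readable error message."""
    choices = schema.get("choices", [])
    if any(instance == value for value, *_rest in choices):
        return None
    allowed = ", ".join(f"{_(label)} ({value})" for value, label, *_rest in choices)
    return _("{instance!r} is not one of: {allowed}").format(instance=instance, allowed=allowed)


schema = {"choices": [["UTC", "Coordinated Universal Time"], ["GMT0", "Africa/Abidjan"]]}
print(choices_error("GMT0", schema))  # None
print(choices_error("CET", schema))
# 'CET' is not one of: Coordinated Universal Time (UTC), Africa/Abidjan (GMT0)
```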

@nicklasb

nicklasb commented Apr 3, 2016

Hi!
I am also new here, but first a disclaimer: I am involved in the angular-schema-form/json-schema-form project, but I am just expressing my own opinion here.

For me, JSON schema has always been about definition and validation, to a reasonable degree. This has kept it from immediately growing to the size and complexity of XML schema, a kind of complexity that has very few use cases where it is actually necessary.
And having been involved in such projects, I think that even in those cases, that level of complexity is not rational.

So I am not supporting this, for these reasons:

  1. There is a really big opportunity here: if kept neat and tidy, JSON schema may be the next big thing, as it can be used for validation and definition at all levels of a system while still being simple and not encouraging complexity. But if JSON schema starts including non-validation aspects like this, I don't really know what should not be included. Why not then have a hint property? Descriptions of the items? L10n? And going the other way, why not have information on storage details?
  2. This functionality is already handled by several UI-form-frameworks (like angular-schema-form).
    It fits nicely there, as it is addressing a human UI issue, not a data issue.
  3. I must agree with @YurySk: mapping an ID to a name is not inherently the job of validation or of the data definition. There are a myriad of ways of doing this, so why lock everybody into one?
  4. The point with enum is not its simplicity, but the fact that it is an enumeration. An enumeration is just an enumeration, it is a clear concept that does not involve any meta structure.

@nemesifier
Author

@nicklasb have you actually read the current JSON-Schema validation RFC?

See the following section (bold added by me):

6.1. "title" and "description"

[...]

6.1.2. Purpose

Both of these keywords can be used to decorate a user interface with information about the data produced by this user interface. A title will preferrably be short, whereas a description will provide explanation about the purpose of the instance described by this schema.

Both of these keywords MAY be used in root schemas, and in any subschemas.

The same applies to the default property, which is irrelevant from the point of view of validation; that type of information is used by user interfaces to provide default values.

Precisely because JSON-Schema is GREAT, we should improve it to make these kinds of features simpler to achieve.

The good thing about choices is that it can provide a human-readable value for validation purposes, e.g. when there's an error.

@nicklasb

nicklasb commented Apr 8, 2016

I do not agree: that one thing can be used in a UI does not in any way imply that something else that is strictly for the UI should be added.

@nemesifier
Author

@nicklasb I did not understand your last message; please explain it more clearly. I spent time answering your comment, and I would like you to do the same. Thanks.

@nicklasb

nicklasb commented Apr 8, 2016

@nemesisdesign
First, I think that we perhaps could lower the intensity here a bit.

And second, that is my answer.
I thought it was pretty well put, but perhaps a bit condensed.

What I am saying is that I do not agree with your reasoning.

That the title and description of a field can be used in a user interface, doesn't make it right to add things that are specifically for UI functionality.
Title and description are the minimal amount of information on each entity that still explains the schema. Enums are usually easy to describe in the description.

WRT default values, they are certainly not only of use in a UI: a default value is part of the data definition, just like the default values of fields in an RDBMS.

I think that it is really important to define a clear scope for a standard and stick to it.
I don't know of any standard that has survived scope creep.

So far, JSON schema is easy to implement and compact. I am not saying that enum descriptions would kill it. I am just saying that there is no use for it, and that this problem is completely natural to solve in other places than the schema definition.

WRT the earlier l10n comparison, one should not equate l10n of static UI elements in a site, like "menu" and so on, which will never change and don't have an underlying ID, with IDs in a data definition, which might change. I would not use the UI l10n functionality to translate those, since they are likely defined in the database. But that is a minor objection.

@YurySk

YurySk commented Apr 8, 2016

@nicklasb, well put. I have been trying to say the same thing: the separation of concerns needs to be kept crystal clear. JSON schema is (or rather should be) first and foremost an expression of constraints on a data structure. Creeping all sorts of cool features in is a sure way to blur the boundaries, increase the implementation effort and decrease clarity of purpose. The more of that is done, the greater the chance of people abandoning it, like many other great things the industry has created over the years.

BTW, it's a mistake to think that even v4 is super-easy to implement. If you want user-friendly tooling, it's not easy at all, since the spec allows you to create contradictory and unsatisfiable schemas in all sorts of weird and wonderful ways. Trapping all that and issuing warnings adds to the effort significantly. I am definitely not looking forward to implementing obscure stuff that my users have no use for just to remain spec-compliant.

@nicklasb

nicklasb commented Apr 8, 2016

@YurySk

I would certainly agree that it doesn't seem like an easy implementation, but I would say it is far easier than the full extent of, say, XML schema.

If I may digress, the cool thing about JSON schema is exactly its looseness. I am not sure that warnings, however helpful they may be, are in that vein, because I don't think one should define huge and complex schemas in JSON schema.

For example, I recently (approximately and tentatively) converted the schema.org definitions to json schema, and basically, that is going too far. On the other hand, one could say that schema.org goes too far. :-)

But anyway, when you start defining really complex formats, I'd say you move out of the JSON form realm and into XML schema. And to some extent you are also moving back about 15 years: I have been involved in those kinds of implementations (especially those), and I think they would have benefited from being broken up instead of bundled. It doesn't feel like how I would choose to work today.

@YurySk

YurySk commented Apr 8, 2016

I think warnings are important for an interactive tool, they just keep your life easier.
Of course, one can go too far with that (perhaps this is my failing), but there are a lot of cases where you actually want them. For instance, it's very easy to define a bunch of "enum" values and later modify the schema so that none are valid anymore, i.e. your schema will never validate. You would probably like to know that the second you change your schema :) I find there are quite a few cases like that.
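One warning of the kind described above can be sketched as a lint pass that flags enum values contradicting the schema's own type keyword. This is a simplified, hypothetical check (enum_warnings and matches_type are not from any real tool): it ignores other constraints such as minimum or pattern, and treats only Python int values as integers.

```python
# Mapping from JSON Schema type names to the Python types json.loads produces.
JSON_TYPES = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "null": (type(None),),
    "array": (list,),
    "object": (dict,),
}


def matches_type(value, type_name):
    # bool is a subclass of int in Python, but true/false are not JSON numbers.
    if isinstance(value, bool) and type_name in ("integer", "number"):
        return False
    return isinstance(value, JSON_TYPES.get(type_name, ()))


def enum_warnings(schema):
    """Warn about enum values that can never pass the schema's own "type" keyword."""
    enum, declared = schema.get("enum"), schema.get("type")
    if enum is None or declared is None:
        return []
    types = declared if isinstance(declared, list) else [declared]
    dead = [v for v in enum if not any(matches_type(v, t) for t in types)]
    if len(dead) == len(enum):
        return ["no enum value matches 'type': the schema can never validate"]
    if dead:
        return [f"enum value {v!r} can never validate" for v in dead]
    return []


print(enum_warnings({"type": "integer", "enum": ["UTC", "GMT0"]}))
print(enum_warnings({"type": "string", "enum": ["UTC", 0]}))
```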

I am not sure I like the 'go to XML when your stuff gets complex' reasoning. Oftentimes, there's simply no XML support in your environment. Or your tool may be so general-purpose that you don't even control the schemas at all, users contribute them. Or you grow into complexity gradually...

@nicklasb

nicklasb commented Apr 9, 2016

Well, there are some that may be useful.

But I'd wager that those warnings take a fair amount of time to implement, perhaps even close to the time taken to implement the actual validation. So perhaps it is not fair to include that time in how long it takes to implement a format, as how you want to warn depends on your requirements, not on the standard itself.

I mean, I could think up a million* such cases for XML schema as well; in addition, with more complex standards you have security vulnerabilities to guide people around.

Sorry, I just realized that I am so far off topic now. I should not discuss everything. :-)

* Not literally :-)

@YurySk

YurySk commented Apr 9, 2016

Sure, it's not fair to include all the user-friendliness in the equation. It's just that sooner or later it needs to be done, and depending on the spec it may or may not be easy. In JSON schema it's rather on the 'not' side.
Anyway... considering this, it's already a healthy chunk of work.

@nicklasb

nicklasb commented Apr 9, 2016

I bet it is.

@nemesifier
Author

I insist that the argument on this point is logically invalid: the title, description and default properties have no use in the validation process, yet they are included in the json-schema validation RFC.
They have been included by the authors because they are really useful properties, even if they do not indicate that something will fail its validation depending on their content.
It's a way to decorate, document and make the schema more readable.
They would also suffer from the same translation problem, but that is not a real issue because it's easily solved with gettext.

Same applies for choices: it's a way to decorate and document the schema that can be used by UI implementations to decorate the UI too, if needed, but its main purpose is to explain to the future programmer coming to read the schema the meaning of the values.

Other UI-specific proposals would be better off in a separate UI-centric specification, but I believe this is not such a case, unless we supported a different proposal like enumTitles, which would be a separate keyword that should go in a separate (new) UI-centric specification.

@mitar

mitar commented Apr 9, 2016

What about a compromise? JSON schema allows extra properties to exist in schema objects, so we could simply have "pure validation" and "extended" JSON schema?

@nemesifier
Author

@mitar, yes, that is possible; there is a JSON-schema implementation that does this. It works, but I dislike that solution because it doesn't bring the advantage of decoration and documentation of the schema. I'll try to explain what I mean with the following example (timezone values used by OpenWRT):

{
    "type": "object",
    "properties": {
        "timezone": {
            "type": "string",
            "enum": [
                "AMT4AMST,M10.3.0/0,M2.3.0/0",
                "CST6CDT,M4.1.0,M10.5.0",
                "VET4:30",
                "GFT3",
                "EST5",
                "CST6CDT,M3.2.0,M11.1.0",
                "MST7MDT,M4.1.0,M10.5.0",
                "CST6",
                "AMT4AMST,M10.3.0/0,M2.3.0/0",
                "AST4",
                "GMT0",
                "PST8PDT,M3.2.0,M11.1.0",
                "MST7"
            ]
        }
    }
}

I had to cut the schema for brevity, so I did not include the timezones of the entire globe, but I ask you to use your imagination for a moment and picture the timezones of the entire globe there.
Adding an enumTitles or equivalent property would not give us the advantage of decoration and documentation: we wouldn't be able to visually associate the description of each value with the relevant raw value.

Choices, instead, would give us this advantage (just like the title and description properties, which aren't used for validation but for decoration/documentation purposes, which are not of secondary importance when writing a schema):

{
    "type": "object",
    "properties": {
        "timezone": {
            "type": "string",
            "choices": [
                ["AMT4AMST,M10.3.0/0,M2.3.0/0", "America/Campo Grande"],
                ["CST6CDT,M4.1.0,M10.5.0", "America/Cancun"],
                ["VET4:30", "America/Caracas"],
                ["GFT3", "America/Cayenne"],
                ["EST5", "America/Cayman"],
                ["CST6CDT,M3.2.0,M11.1.0", "America/Chicago"],
                ["MST7MDT,M4.1.0,M10.5.0", "America/Chihuahua"],
                ["CST6", "America/Costa Rica"],
                ["AMT4AMST,M10.3.0/0,M2.3.0/0", "America/Cuiaba"],
                ["AST4", "America/Curacao"],
                ["GMT0", "America/Danmarkshavn"],
                ["PST8PDT,M3.2.0,M11.1.0", "America/Dawson"],
                ["MST7", "America/Dawson Creek"]
            ]
        }
    }
}
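To make the proposal concrete, here is a minimal Python sketch of how a consumer might interpret the proposed `choices` keyword. `choices` is not part of draft 4, and the helper names (`allowed_values`, `labels_for`, `check`) are hypothetical. The point it illustrates: validation only needs the raw values, while the labels serve documentation and error messages. Note that one raw value can carry several labels, as with Campo Grande and Cuiaba above.

```python
"""Sketch only: interpreting the proposed (hypothetical) "choices" keyword."""
import json

SCHEMA = json.loads("""
{
    "type": "object",
    "properties": {
        "timezone": {
            "type": "string",
            "choices": [
                ["AMT4AMST,M10.3.0/0,M2.3.0/0", "America/Campo Grande"],
                ["VET4:30", "America/Caracas"],
                ["AMT4AMST,M10.3.0/0,M2.3.0/0", "America/Cuiaba"],
                ["MST7", "America/Dawson Creek"]
            ]
        }
    }
}
""")


def allowed_values(choices):
    """Raw values accepted by validation: the first element of each pair."""
    return {value for value, _label in choices}


def labels_for(choices, value):
    """All human-readable labels for a raw value (several cities may share one)."""
    return [label for raw, label in choices if raw == value]


def check(choices, value):
    """Return None if the value is allowed, else an error message built from labels."""
    if value in allowed_values(choices):
        return None
    # Labels read far better in an error message than raw POSIX TZ strings.
    options = ", ".join(label for _raw, label in choices)
    return f"{value!r} is not one of: {options}"


choices = SCHEMA["properties"]["timezone"]["choices"]
print(labels_for(choices, "AMT4AMST,M10.3.0/0,M2.3.0/0"))  # ['America/Campo Grande', 'America/Cuiaba']
print(check(choices, "VET4:30"))  # None
print(check(choices, "XYZ"))
```

Because `allowed_values` recovers exactly the set an `enum` would list, a tool could in principle derive a draft-4 `enum` from `choices`; whether the spec should define that mapping is the open question in this thread.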

But I admit that if we cannot get consensus on something like choices, having a standardized enumTitles or equivalent property would still be better than what we have now.

@mitar

mitar commented Apr 9, 2016

OK, you convinced me. :-) It is not just a question of the schema being used for UI, but also of the schema being descriptive enough to be understandable. Consumers of schemas can also be people.

@nemesifier
Author

@mitar wow, that's an achievement :-)

@nicklasb

nicklasb commented Apr 9, 2016

Well, I cannot speculate on the original intentions of the authors of the RFC, so I won't.

But I do know that a schema is not only about validation, but also about definition.
That is the purpose of title and description: they provide definition. They basically make it possible to navigate the schema and help when you want to make a user interface to edit the schema (like @YurySk is doing). But that is as much metadata as there should be IMO; there is no use for more. Enums are usually pretty self-explanatory or part of some standard, and lookup tables are stored elsewhere. Knowing what they mean is therefore a non-issue.

default is important completely regardless of a UI, because knowing the default value of something is important. In databases it may be an identity, a UUID or just "sofa", or whatever default you want a record to have. Defaults are important in data.

The same does not apply to choices. They are purely for human UI consumption and unimportant for managing the data (and they are not seldom huge lists; what facilities are there for handling really long lists?).

@mitar
Yes, an extended version would be an alternative; however, I do not think it is a very good one, as:

  1. It should preferably be ordered, not unordered. You want an inherent order in a UI definition unless you add extra "weight" properties and similar. That works for, say, Drupal or Joomla, where you always design the UI in a UI, but you don't always do that with a form format, so you would rather express it as an array to be able to predict the layout.
  2. A UI is usually one of many "views" of the data; it is not a one-to-one relation.
  3. It would have a huge number of settings that, because they too are unordered, would mix with the schema settings. It would be practically impossible to make any sense of the schemas outside a UI.

In the json-schema-form project we are currently trying to create a form standard to be used by the different implementations (angular-schema-form, react-schema-form and future *-schema-form implementations, some of which will actually be desktop implementations), and when you do that, the importance of clearly separating data and UI definitions becomes evident. With good, clean separation, a schema can be reused all over an organization (or even a sector) and UI forms become optional. Without it, well, it won't work.

@nicklasb

nicklasb commented Apr 9, 2016

And then there are the practical reasons:

Defining the schema and defining the UI are usually two completely different projects, often involving different people. You want to be able to freely change the UI without having to change the data definition.

It would drive people crazy to have the schema version increment constantly without any actual difference, not to mention that the commit history of the schema would be filled with UI changes. And if you want to make breaking changes to the UI definitions, should you make a new major version of the schema too?

They are two different things, and for many reasons.

@nicklasb

nicklasb commented Apr 9, 2016

@mitar That is not my point, this happens regardless of versioning scheme.
UI changes would drive schema versions so those depending on the schemas but not the UI would have to wade through unrelated schema changes.

It is a plain and clear separation of concerns.

(with regards to choices, they are actually often not even defined in the UI-definition but fetched from the backend, so that is how far from the schema they end up)

@YurySk

YurySk commented Apr 9, 2016

Regarding default, I don't think it's so much a UI thing. It's just an indication of a value that can be used to fill in a missing piece of info. That can just as easily happen in batch mode when data is being processed.

Is there a particular reason enum labels cannot be expressed as a non-schema attribute of type array at the application level? Why standardize it? Not every application will even use a construct like that. In fact, I bet that most won't. The task of standardizing UI definitions is a separate one. After all, why not standardize everything? For example, I could probably make a convincing case that we need to put data storage attributes in as well, such as the table name and field where an element should be stored by a generic DBMS storage facility.
In fact, we created such a facility using XSD and tooling that is driven off an annotated schema. It works really nicely.
Another example is binary storage type. For instance, we could want to say that a particular number has a storage length of 2, 4 or 8 bytes and is big-endian. I can see how that can be useful in some applications.

My point is that all of that can already be achieved successfully in JSON Schema as well as XSD. All that needs to be said at the spec level is that non-schema information in a schema is OK. Then a number of extending specs can be created as public standards or application-level conventions.

@YurySk

YurySk commented Apr 9, 2016

Having said all this, I cannot disagree with the issue of documenting enum values. This issue is made worse by lack of comments in JSON.
However, I would do it differently. I would add a flavor of "enum" which is an object with "value" and "description" properties, "value" being required and "description" optional. Obviously "description" is a string and "value" is {}, i.e. anything. A similar overloading approach is used in "dependencies", for example.
V4 schemas use an array flavor and new schemas may use the object flavor. No competing keyword is introduced and, in my mind, it is clearer than implying the meaning of an array element by its position.

EDIT: obviously this wouldn't work, because we would need an array of such objects, which would make it indistinguishable from the current flavor.
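The ambiguity in the EDIT can be shown directly: draft 4 already allows any JSON value, objects included, as an enum element, so an array of {"value", "description"} objects is already a valid v4 enum with a different meaning. The Python sketch below contrasts the two readings; both function names are hypothetical, and the "BEN" description is borrowed from later in this thread.

```python
import json

# A schema that is valid draft 4 AND looks like the proposed object flavor.
SCHEMA = json.loads("""
{"enum": [{"value": "BEN", "description": "The beneficiary pays all charges"}]}
""")


def draft4_enum_valid(schema, instance):
    """Draft-4 reading: the instance must equal one of the elements as a whole."""
    return instance in schema["enum"]


def object_flavor_valid(schema, instance):
    """Proposed reading: the instance must equal one element's "value"."""
    return any(item["value"] == instance for item in schema["enum"])


# The same schema gives opposite answers under the two readings.
print(draft4_enum_valid(SCHEMA, "BEN"))              # False
print(object_flavor_valid(SCHEMA, "BEN"))            # True
print(draft4_enum_valid(SCHEMA, SCHEMA["enum"][0]))  # True
```

A validator cannot tell which reading the schema author intended without an extra signal, which is why a separate keyword (or a different structure) keeps coming up in this discussion.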

@YurySk

YurySk commented Apr 9, 2016

A question. Say 'choices' are added. I presume enum is still supported. How would we reconcile it if both are present?

@nicklasb

nicklasb commented Apr 9, 2016

@YurySk

I have no issue at all with not documenting each enum value individually. A description is sufficient. If it is not, there is something wrong.

In normal coding, the typical enum notation is ["BEN", "SHA", "OUR"], in almost all languages. Documentation is just some descriptive text, equivalent to a comment in the code. In my 20 years of coding I have never once had any problem describing enums in such cases.

This is because either:

  • Enum values are old and weird sector-specific stuff, for example BEN, SHA, OUR (because who the heck uses those kinds of abbreviations today and gets away with it?).
    Perhaps 3-5 items, where a simple description linking to documentation will show what they are. Actually, in that case the customer knew what they meant and didn't want them described; it was faster to find BEN than "The beneficiary pays all charges".
  • Or they are not, and then their names are usually readable enough to be obvious.
  • Or there are a whole lot of them, in which case they should not be in the schema at all; that would be abusing the concept. Instead they should be stored outside the schema.
    JSON Schema isn't supposed to guarantee referential integrity; that is for other mechanisms.

For example country codes. Are they good in an enum? No. Because in any non-trivial system design, country codes are used all over and have their own storage location, be it a table or top node or whatever.

Either way, there is no point in putting extra features into the schema to provide that information as:

  1. Putting it there would be horrible system design.
  2. There is no real use-case value beyond trivial-case convenience.

No UI related elements in JSON Schema. Please.

Look! You are making me say "please" now. :-)

@YurySk

YurySk commented Apr 9, 2016

@nicklasb

I tried to make a distinction between documentation purposes and UI one. I don't like the idea of standardizing UI bits.
It's really a shame JSON has no comments.
On the other hand, "enum" and "description" are both schema-wide, so I suppose all the text can go there. It's not as pretty as documenting each value in place, but altering metadata just for that seems rather silly now that I have had some time to think about it.

@mitar

mitar commented Apr 9, 2016

However, I would do it differently. I would add a flavor of "enum" which is an object with "value" and "description" properties, "value" being required and "description" optional.

That is an interesting idea. So, does enum mean that all values are of the same type? Does choices mean the same? We could really go the way where, if the values of an enum are just strings, you can add them as they are, but otherwise you can have a subschema which defines not just the value, but also its type and description.

@nicklasb

nicklasb commented Apr 9, 2016

Actually, I have started commenting my JSON schemas lately, as I realized all the parsers I used accepted it.

@YurySk

YurySk commented Apr 9, 2016

How?

@nicklasb

nicklasb commented Apr 9, 2016

{
// Comment
"this": "value"
}

At first I just did it temporarily, but now I am thinking of not removing it.

@YurySk

YurySk commented Apr 9, 2016

This is useful, thanks!
I presume this makes the whole line a comment.

@nicklasb

nicklasb commented Apr 9, 2016

Actually, I was partially wrong there: JSmin was run before the parser, and I didn't see that, sorry.
https://pypi.python.org/pypi/jsmin

But it is kind of useful.

(it makes the schemas non-portable....but on the other hand I cache them in the system, so they aren't visible there anyway)
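For readers who want the same effect without JSMin, here is a minimal Python sketch that drops whole-line `//` comments before handing the text to the standard `json` module, which rejects comments on its own. The function name is hypothetical, and this is deliberately narrower than JSMin: a `//` after a value on the same line is left in place and will still fail. Whole-line stripping is safe because JSON strings cannot contain raw newlines, so a line starting with `//` can never be inside a string.

```python
import json


def loads_with_line_comments(text):
    """Parse JSON after dropping lines whose first non-blank characters are "//".

    Only whole-line comments are handled; trailing "// ..." after a value is
    left alone and will still make json.loads raise.
    """
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


DOC = """{
// Comment
"this": "value"
}"""

print(loads_with_line_comments(DOC))  # {'this': 'value'}
```

As noted above, the commented file itself is still not portable JSON; only the stripped result is.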

@nicklasb

nicklasb commented Apr 9, 2016

Apparently Crockford himself is ok with it:
https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaGSr

(and he is developing a JSmin: http://crockford.com/javascript/jsmin
Slight WTF there. :-) )

@themihai

themihai commented Jun 25, 2016

I'm currently using a custom[0] implementation of JSON Schema similar to @saibotsivad's proposal. Separate fields for the enum description/metadata seem very odd and not orthogonal to the rest of the schema specification. You don't want the "description" and "title" fields in a separate object from the property object itself, so why would you want that for enum properties? I would propose making enum an array of objects (EnumSchema). This way it becomes future-proof for additional features/fields (e.g. internationalisation, additional UI metadata, etc.).

[0]
"timezone": {
    "enum": [{
        "type": "string",
        "format": "ISO8601",
        "value": "UTC_ISO",
        "description": "Coordinated Universal Time formatted using ISO8601"
    }, {
        "type": "string",
        "format": "date-time",
        "value": "UTC_Date",
        "description": "UTC time using rfc3339 format (e.g. 1990-12-31T23:59:60Z)"
    }]
}
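In the EnumSchema idea from the [0] example, each entry acts as a tiny schema with its own "type" rather than a bare value. The Python sketch below shows one way a custom validator might match an instance against such entries and return the matching entry's description. It is an assumption-laden illustration, not the commenter's implementation. The type map covers only a few JSON types; "number"/"integer" are omitted because Python's bool subclasses int. "format" is not checked, since "ISO8601" is the commenter's custom name rather than a draft-4 format.

```python
ENUM = [
    {"type": "string", "format": "ISO8601", "value": "UTC_ISO",
     "description": "Coordinated Universal Time formatted using ISO8601"},
    {"type": "string", "format": "date-time", "value": "UTC_Date",
     "description": "UTC time using rfc3339 format (e.g. 1990-12-31T23:59:60Z)"},
]

# Partial map from JSON Schema type names to Python types (sketch only).
PY_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool}


def entry_matches(entry, instance):
    """Treat one EnumSchema entry as a small schema: type check first, then value."""
    expected = PY_TYPES.get(entry.get("type"))
    if expected is not None and not isinstance(instance, expected):
        return False
    # "format" is intentionally not validated in this sketch.
    return entry["value"] == instance


def find_entry(enum, instance):
    """Return the first matching entry (description included), or None."""
    return next((e for e in enum if entry_matches(e, instance)), None)


print(find_entry(ENUM, "UTC_Date")["description"])
print(find_entry(ENUM, "UTC"))  # None
```

Because the description travels with the value, documentation and error messages come for free; the cost is that the enum array is no longer a plain list of literals.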

@tad-lispy

Hello. Very interesting thread.

We are using JSON Schema to generate complex UIs. So far we have been using a custom labels field next to enum, but while reading this thread I've realised that oneOf together with constant may work better:

{
  "oneOf": [
    {
      "title": "One",
      "description": "One is the magic number.",
      "constant": 1
    },
    {
      "title": "Two",
      "description": "It takes two hands to clap.",
      "constant": 2
    },
    {
      "title": "Three",
      "description": "Three is a company.",
      "constant": 3
    }
  ]
}

It's basically the same as @themihai's solution, but within the spec. So I would vote against this proposal: simple cases are covered by enum, and complex ones by oneOf* + constant.

* or anyOf if we need different labels for the same value.
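The oneOf + constant pattern can be sketched in a few lines of Python. The evaluator below is hypothetical and supports only the "constant" keyword as spelled in the comment (later drafts standardized the keyword as `const`, starting with draft-06). It shows both sides: validation requires exactly one matching branch, and a form generator can read (value, title) pairs straight from the branches. The second schema illustrates the footnote. When two labels share one raw value, as with Campo Grande and Cuiaba earlier in the thread, oneOf rejects the value, but anyOf accepts it.

```python
def json_equal(a, b):
    """JSON equality for scalars; Python's True == 1 must not leak through."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    return a == b


def count_matches(branches, instance):
    # Only the "constant" keyword is supported in this sketch.
    return sum(json_equal(branch["constant"], instance) for branch in branches)


def validate_one_of(schema, instance):
    """oneOf: valid only if exactly one branch matches."""
    return count_matches(schema["oneOf"], instance) == 1


def validate_any_of(schema, instance):
    """anyOf: valid if at least one branch matches."""
    return count_matches(schema["anyOf"], instance) >= 1


def ui_options(schema):
    """(value, label) pairs a form generator could turn into a <select>."""
    return [(branch["constant"], branch["title"]) for branch in schema["oneOf"]]


SCHEMA = {"oneOf": [
    {"title": "One", "description": "One is the magic number.", "constant": 1},
    {"title": "Two", "description": "It takes two hands to clap.", "constant": 2},
    {"title": "Three", "description": "Three is a company.", "constant": 3},
]}

TZ = "AMT4AMST,M10.3.0/0,M2.3.0/0"
BRANCHES = [
    {"title": "America/Campo Grande", "constant": TZ},
    {"title": "America/Cuiaba", "constant": TZ},
]
SHARED_ONE_OF = {"oneOf": BRANCHES}
SHARED_ANY_OF = {"anyOf": BRANCHES}

print(ui_options(SCHEMA))  # [(1, 'One'), (2, 'Two'), (3, 'Three')]
print(validate_one_of(SCHEMA, 2), validate_one_of(SCHEMA, True))  # True False
print(validate_one_of(SHARED_ONE_OF, TZ), validate_any_of(SHARED_ANY_OF, TZ))  # False True
```

The trade-off compared with a dedicated labels keyword is verbosity: each value needs its own subschema, but no new keyword is required.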

@handrews

This discussion should be moved to the new (active) JSON Schema specification repository:
json-schema-org/json-schema-spec#57

While I tried to capture the basics in the new issue, I am sure I left some significant points from this long discussion out. Feel free to copy them over.

@handrews

@nemesisdesign Could you please close this since we have an active discussion in the new repository?
There is no one left with the project who has permissions to close other people's issues anymore so we can't close it ourselves.

@Julian Julian closed this as completed Dec 29, 2016
@nemesifier
Author

@handrews sorry, I was so slow to respond. I'll try to participate in the discussion again.
