v6 validation: "contains" #63

Closed
handrews opened this issue Sep 19, 2016 · 19 comments

Comments

@handrews
Contributor

Originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/contains-(v5-proposal)

Proposed keywords

  • contains

We also might want an equivalent for objects (like containsProperty).

Purpose

Specifying that an array must contain at least one matching item is awkward. It can currently be done, but only using some inside-out syntax:

{
    "type": "array",
    "not": {
        "items": {
            "not": {... whatever ...}
        }
    }
}

This would replace it with the much neater:

{
    "type": "array",
    "contains": {... whatever ...}
}
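Both schemas express the same condition: "not every item fails the sub-schema" is exactly "at least one item passes it". Below is a minimal Python sketch of that equivalence. It assumes each sub-schema is modelled as a plain predicate on a single item. The names `not_items_not`, `contains` and `is_string` are hypothetical and are not a validator API:

```python
from typing import Any, Callable, Sequence

# Assumption: a sub-schema is modelled as a predicate on one array item.
Predicate = Callable[[Any], bool]


def not_items_not(arr: Sequence[Any], schema: Predicate) -> bool:
    """Inside-out form: {"not": {"items": {"not": schema}}}."""
    every_item_fails = all(not schema(item) for item in arr)  # "items": {"not": ...}
    return not every_item_fails                                # outer "not"


def contains(arr: Sequence[Any], schema: Predicate) -> bool:
    """Proposed form: at least one item matches the sub-schema."""
    return any(schema(item) for item in arr)


def is_string(item: Any) -> bool:
    """Stands in for {"type": "string"}."""
    return isinstance(item, str)


for sample in (["foo"], [5, None, "foo"], [], [5, None]):
    # De Morgan: not all(not p) == any(p), including for the empty array.
    assert not_items_not(sample, is_string) == contains(sample, is_string)
```

The empty array is the edge case worth noticing. `all()` over nothing is true, so the double negation yields false, which matches `any()` over nothing.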

It would also enable us to specify multiple schemas that must be matched by distinct items (which is currently not supported).

Values

The value of contains would be either a schema, or an array of schemas.

Validation

If the value of contains is a schema, then validation would only succeed if at least one of the items in the array matches the provided sub-schema.

If the value of contains is an array, then validation would only succeed if it is possible to map each sub-schema in contains to a distinct array item matching that sub-schema. Two sub-schemas in contains cannot be mapped to the same array index.

Example

Plain schema

{
    "type": "array",
    "contains": {
        "type": "string"
    }
}

Valid: ["foo"], [5, null, "foo"]
Invalid: [], [5, null]

Array of schemas

{
    "type": "array",
    "items": {"type": "object"},
    "contains": [
        {"required": ["propA"]},
        {"required": ["propB"]}
    ]
}

Valid:

  • [{"propA": true}, {"propB": true}]
  • [{"propA": true}, {"propA": true, "propB": true}]

Invalid:

  • []
  • [{"propA": true}] - no match for second entry
  • [{"propA": true, "propB": true}] - entries in contains must describe different items
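The examples above can be checked with a direct brute-force search over assignments of distinct array indices to sub-schemas. Below is a minimal Python sketch that assumes sub-schemas are modelled as predicates. `requires` and `contains_distinct` are hypothetical helpers. The search is exponential in the number of entries, so it only illustrates the semantics:

```python
from itertools import permutations
from typing import Any, Callable, Sequence

# Assumption: a sub-schema is modelled as a predicate on one array item.
Predicate = Callable[[Any], bool]


def requires(prop: str) -> Predicate:
    """Stands in for {"required": [prop]} on object items."""
    return lambda item: isinstance(item, dict) and prop in item


def contains_distinct(arr: Sequence[Any], schemas: Sequence[Predicate]) -> bool:
    """Array form of "contains": each sub-schema needs its own array item.

    permutations() yields ordered tuples of distinct indices, so no two
    sub-schemas can ever be assigned the same item.
    """
    for indices in permutations(range(len(arr)), len(schemas)):
        if all(schema(arr[i]) for schema, i in zip(schemas, indices)):
            return True
    return False  # also reached when there are fewer items than sub-schemas


schemas = [requires("propA"), requires("propB")]
assert contains_distinct([{"propA": True}, {"propB": True}], schemas)
assert contains_distinct([{"propA": True}, {"propA": True, "propB": True}], schemas)
assert not contains_distinct([], schemas)
assert not contains_distinct([{"propA": True}], schemas)
assert not contains_distinct([{"propA": True, "propB": True}], schemas)
```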

Concerns

Implementation

The plain-schema case is simple.

The array case is a bipartite matching problem, and Hall's Marriage Theorem gives the condition for a valid assignment to exist. There are relatively efficient solutions for the general problem, but I suspect a brute-force search will be surprisingly effective and efficient (due to the relatively small number of entries in contains).

It may or may not be worth warning schema authors about stuffing hundreds of entries into contains, because a naive implementation could easily end up having O(n3m) complexity.
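A polynomial alternative to brute force treats the array form as maximum bipartite matching, with sub-schemas on one side and array indices on the other. Hall's theorem says when a full matching exists, and an augmenting-path search finds one. The sketch below uses Kuhn's augmenting-path algorithm. This is a swapped-in technique, not something the proposal specifies. `contains_distinct` and `enum_of` are hypothetical names, and sub-schemas are again assumed to be predicates:

```python
from typing import Any, Callable, List, Optional, Sequence

# Assumption: a sub-schema is modelled as a predicate on one array item.
Predicate = Callable[[Any], bool]


def contains_distinct(arr: Sequence[Any], schemas: Sequence[Predicate]) -> bool:
    """True iff every sub-schema can be matched to a different array item."""
    # Validate each (sub-schema, item) pair exactly once: m * n checks.
    edges: List[List[int]] = [
        [i for i, item in enumerate(arr) if schema(item)] for schema in schemas
    ]
    owner: List[Optional[int]] = [None] * len(arr)  # item index -> sub-schema index

    def assign(s: int, seen: List[bool]) -> bool:
        """Try to give sub-schema s an item, re-routing earlier owners if needed."""
        for i in edges[s]:
            if seen[i]:
                continue
            seen[i] = True
            current = owner[i]
            # Take a free item, or one whose owner can move to another item.
            if current is None or assign(current, seen):
                owner[i] = s
                return True
        return False

    # If any sub-schema cannot be placed, the maximum matching is too small.
    return all(assign(s, [False] * len(arr)) for s in range(len(schemas)))


def enum_of(*values: Any) -> Predicate:
    """Stands in for {"enum": [...]}."""
    return lambda item: item in values


schemas = [enum_of("A", "B"), enum_of("A", "B", "C"), enum_of("A", "D")]
print(contains_distinct(["A", "B", "C"], schemas))  # True
print(contains_distinct(["B", "B", "C"], schemas))  # False: nothing matches {"enum": ["A", "D"]}
```

The pairwise validations cost m·n. The augmenting-path search on top is O(V·E) in this simple form, which stays cheap for the small number of entries the proposal expects.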

Complexity of understanding (for humans)

Behaviour for the array form may be slightly complicated. For example:

{
    "type": "array",
    "contains": [
        {"enum": ["A", "B"]},
        {"enum": ["A", "B", "C"]},
        {"enum": ["A", "D"]}
    ]
}

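In this case, ["A", "B", "C"] is valid: {"enum": ["A", "D"]} can only take "A", which leaves "B" for {"enum": ["A", "B"]} and "C" for {"enum": ["A", "B", "C"]}.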

However, this is not due to the syntax - it's simply a complex constraint.

@epoberezkin
Member

I think the array syntax should not be part of the standard. What is the real-life use case for it (one that cannot be handled with two contains)? The use case where we want different items to match different schemas seems theoretical, and it can almost always be achieved by using schemas that won't match the same data, such as {type: 'string'} and {type: 'number'}.

@awwright
Member

I think the point of the array form is that two entries in "contains" can't match the same item in the array, whereas with "allOf" they could.

But I'm not sure who in the world even needs this feature.
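That difference can be made concrete. The hypothetical Python sketch below models sub-schemas as predicates. `all_of_contains` stands in for an "allOf" of single-schema "contains" entries, and `contains_distinct` stands in for the proposed array form (brute-force version). A single object carrying both properties satisfies the first reading but not the second:

```python
from itertools import permutations
from typing import Any, Callable, Sequence

Predicate = Callable[[Any], bool]  # assumption: a sub-schema as a predicate


def requires(prop: str) -> Predicate:
    """Stands in for {"required": [prop]} on object items."""
    return lambda item: isinstance(item, dict) and prop in item


def all_of_contains(arr: Sequence[Any], schemas: Sequence[Predicate]) -> bool:
    """An "allOf" of single-schema "contains": one item may satisfy several."""
    return all(any(s(item) for item in arr) for s in schemas)


def contains_distinct(arr: Sequence[Any], schemas: Sequence[Predicate]) -> bool:
    """Proposed array form: each sub-schema needs a different item."""
    return any(
        all(s(arr[i]) for s, i in zip(schemas, idx))
        for idx in permutations(range(len(arr)), len(schemas))
    )


schemas = [requires("propA"), requires("propB")]
both_in_one = [{"propA": True, "propB": True}]
print(all_of_contains(both_in_one, schemas))    # True: one item satisfies both
print(contains_distinct(both_in_one, schemas))  # False: two different items needed
```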

@Relequestual
Member

I would hold off until someone can come up with a concrete, useful, real-life example. I'd be tempted to start tagging issues with "requires real life use case"... =]

@epoberezkin
Member

@awwright I understand the idea; the concern is exactly as @Relequestual writes: it is theoretical. Even in cases where you want 2 distinct items, you can achieve it by putting mutually exclusive schemas (as in my simple example) in allOf, rather than allowing a generic requirement that would have a high performance cost in any implementation.

@handrews
Contributor Author

Does anyone want to advocate for even the one-schema version of this feature? If not, I will close it based on @awwright's "if you don't want to champion it, you shouldn't have migrated it" principle.

@epoberezkin
Member

@handrews I think the single-item version can be useful for heterogeneous arrays, which are useful in some cases (e.g. a UI description should be an array).

Although in many cases you can get away with positional items (using the items keyword with an array value), there are cases where you can't. This is especially true when validation is used not just for a valid/invalid answer but as a predicate for other decisions (which seems to be quite common, by the way).

So I'd like to see this feature in the standard. Not with multiple schema values though.

@handrews
Contributor Author

@epoberezkin that works for me. I can see the use for the single-item version.

The only thing that the multiple-item version does that you couldn't do with an "allOf" is ensure that each item matches a distinct element of the array, and I find that condition a little weird to begin with.

handrews added a commit to handrews/json-schema-spec that referenced this issue Oct 28, 2016
This addresses the enhancement requested in issues json-schema-org#32 and json-schema-org#63.
Only the single-schema form from json-schema-org#63 is added here as the multi-schema form
did not gather significant support in the absence of a clear use case.
@handrews
Contributor Author

handrews commented Nov 7, 2016

OK, single 'contains' is in! The consensus was that the array form should not be added. If we come up with a real use case for it, we can file that again on its own.

@cmrd-senya

Hello. I'm developing a JSON schema for the following object:

{
"data": [
{"guid1": "..."},
{"guid2": "..."},
{"text": "dfsdfsdf"}
],
"signature": "ASFddf...sdsa=="
}

So, data is an array. I can't use a plain object with just guid1, guid2 and text as its properties instead, because for me the order of the properties is important (see below for an example). Each of the sub-objects in the array has only one property.

I want to allow extra elements in the array, so I can't set constraints on "type". But I want to require this array to contain all of "guid1", "guid2" and "text".

Here is a basic schema for this:

{
  "type": "object",
  "properties": {
    "data": {
      "type": "array",
      "uniqueItems": true,
      "items": {
        "type": "object"
      },
      "contains": ...something...
    },
    "signature": {"type": "string"}
  },
  "required": ["data", "signature"]
}

So I'd like this to be valid:

{
"data": [
{"guid1": "..."},
{"guid2": "..."},
{"text": "dfsdfsdf"}
],
"signature": "ASFddf...sdsa=="
}

This is also valid:

{
"data": [
{"guid1": "..."},
{"guid2": "..."},
{"text": "dfsdfsdf"},
{"newAndUnexpectedProperty": "a value"}
],
"signature": "ASFddf...sdsa=="
}

And these are invalid, because some of the required properties are missing:

{
"data": [
{"guid1": "..."},
{"text": "dfsdfsdf"},
{"newAndUnexpectedProperty": "a value"}
],
"signature": "ASFddf...sdsa=="
}
{
"data": [
{"guid2": "..."},
{"text": "dfsdfsdf"},
{"newAndUnexpectedProperty": "a value"}
],
"signature": "ASFddf...sdsa=="
}
{
"data": [
{"newAndUnexpectedProperty": "a value"}
],
"signature": "ASFddf...sdsa=="
}
{
"data": [
{"guid2": "..."},
{"guid1": "dfsdfsdf"},
{"newAndUnexpectedProperty": "a value"}
],
"signature": "ASFddf...sdsa=="
}

This document should also pass validation. The difference from the first one is that the objects in "data" are in a different order, so the two "data" arrays are not equal:

{
"data": [
{"text": "dfsdfsdf"},
{"guid2": "..."},
{"guid1": "..."}
],
"signature": "ASFddf...sdsa=="
}

and that is the important reason I can't just use a plain object with properties like this:

{
"data": {
"text": "dfsdfsdf",
"guid2": "...",
"guid1": "..."
},
"signature": "ASFddf...sdsa=="
}

Isn't this a use case for the multiple-item version of contains? I can't use the single-schema (object) version of contains, because verifying "at least one" element isn't enough for me: I want to verify that each of "guid1", "guid2" and "text" is present. I also want to allow additional objects. And I want ordering, because this data is going to be signed and then verified against a signature, so it's important that the values stay in the same order when I serialize and deserialize the object.

@handrews
Contributor Author

handrews commented Dec 15, 2016

You don't actually need contains for this at all. You want the tuple form of array+items:

{
    "type": "object",
    "properties": {
        "data": {
            "type": "array",
            "items": [
                {
                    "type": "object",
                    "properties": {"guid1": {...}},
                    "required": ["guid1"]
                },
                {
                    "type": "object",
                    "properties": {"guid2": {...}},
                    "required": ["guid2"]
                },
                {
                    "type": "object",
                    "properties": {"text": {...}},
                    "required": ["text"]
                }
            ],
            "additionalItems": true,
            "uniqueItems": true
        },
        "signature": {"type": "string"}
    },
    "required": ["data", "signature"]
}

@cmrd-senya

Ah, alright, thanks! Gonna try that!

@cmrd-senya

cmrd-senya commented Dec 15, 2016

the tuple form of array+items:

Is this form described somewhere in the documentation, and what is the minimum JSON Schema version that supports it?

@cmrd-senya

I tried your schema with this validator, but it doesn't work for me in one case: when I change the order of the elements:

{
  "data": [
    {
      "guid1": "..."
    },
    {
      "text": "dfsdfsdf"
    },
    {
      "guid2": "..."
    }
  ],
"signature": "..."
}

It doesn't validate any order except the one defined in the schema, whereas what I want is to accept any order but still keep the order information.
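This follows from how the array form of "items" works: it validates position by position, so it encodes one fixed order. Below is a hypothetical Python model of that rule. The `requires` and `tuple_items` names are illustrative, and each positional schema is reduced to a required-property check:

```python
from typing import Any, Callable, Sequence

# Assumption: each positional sub-schema is modelled as a predicate.
Predicate = Callable[[Any], bool]


def requires(prop: str) -> Predicate:
    """Stands in for {"type": "object", "required": [prop]}."""
    return lambda item: isinstance(item, dict) and prop in item


def tuple_items(arr: Sequence[Any], positional: Sequence[Predicate]) -> bool:
    """Array form of "items" with "additionalItems": true.

    Item k is checked only against positional[k]; items beyond the tuple
    are unconstrained. zip() stops at the shorter sequence, so missing items
    are not an error: the array form alone does not require them (minItems would).
    """
    return all(schema(item) for schema, item in zip(positional, arr))


positional = [requires("guid1"), requires("guid2"), requires("text")]
in_order = [{"guid1": "..."}, {"guid2": "..."}, {"text": "dfsdfsdf"}]
reordered = [{"guid1": "..."}, {"text": "dfsdfsdf"}, {"guid2": "..."}]
print(tuple_items(in_order, positional))   # True
print(tuple_items(reordered, positional))  # False: position 1 holds "text", not "guid2"
```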

@handrews
Contributor Author

Oh, I see. I missed that part. You can do this with contains, then, but you still only need the single value version:

{
    "type": "object",
    "properties": {
        "data": {
            "type": "array",
            "allOf": [
                {
                    "contains": {
                        "type": "object",
                        "properties": {"guid1": {...}},
                        "required": ["guid1"]
                    }
                },
                {
                    "contains": {
                        "type": "object",
                        "properties": {"guid2": {...}},
                        "required": ["guid2"]
                    }
                },
                {
                    "contains": {
                        "type": "object",
                        "properties": {"text": {...}},
                        "required": ["text"]
                    }
                }
            ],
            "uniqueItems": true
        },
        "signature": {"type": "string"}
    },
    "required": ["data", "signature"]
}

BTW the array form of "items" (which I tend to think of as defining a tuple rather than a list) has existed since at least draft 4. "contains" will be in draft 6.
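Here is a rough Python model of the "data" subschema above, showing why order no longer matters. Each "contains" becomes an `any()` over the items, "allOf" becomes an `all()` over those, and "uniqueItems" becomes a pairwise equality check. The helper names are hypothetical and the `{...}` property schemas are ignored. The equality check is simplified: Python treats `True == 1`, but JSON Schema does not.

```python
from typing import Any, Sequence


def unique_items(arr: Sequence[Any]) -> bool:
    """Rough model of "uniqueItems": true via pairwise ==."""
    return all(arr[i] != arr[j] for i in range(len(arr)) for j in range(i + 1, len(arr)))


def data_is_valid(data: Any) -> bool:
    """Models "type": "array" + "allOf" of three "contains" + "uniqueItems"."""
    if not isinstance(data, list):
        return False
    has_all = all(
        any(isinstance(item, dict) and key in item for item in data)  # one "contains"
        for key in ("guid1", "guid2", "text")                         # the "allOf"
    )
    return has_all and unique_items(data)


g1 = {"guid1": "..."}
g2 = {"guid2": "..."}
txt = {"text": "dfsdfsdf"}
extra = {"newAndUnexpectedProperty": "a value"}

assert data_is_valid([g1, g2, txt])
assert data_is_valid([g1, g2, txt, extra])
assert data_is_valid([txt, g2, g1])           # any order is accepted
assert data_is_valid([g1, txt, g2])
assert not data_is_valid([g1, txt, extra])    # guid2 missing
assert not data_is_valid([g2, txt, extra])    # guid1 missing
assert not data_is_valid([extra])
```

The Python list itself keeps whatever order it arrived in, so the order information needed for signing is preserved. Validation simply stops caring about it.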

@cmrd-senya

Thanks! I guess this could work.

@handrews
Contributor Author

Thanks! I guess this could work.

Cool. The only difference between this and the array form proposed originally (aside from the array form being a bit more concise) is that technically these different contains could all be met by a single element in the array. I don't think that's a big concern here. We would only need to reconsider the array form of contains if someone came up with a compelling use case for requiring distinct elements.

@epoberezkin
Member

epoberezkin commented Dec 20, 2016

technically these different contains could all be met by a single element in the array

For most practical use cases, like the one above, the schemas for items will be mutually exclusive.

EDIT: actually not the one above, but it can be made so :)
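One way to make the three sub-schemas above mutually exclusive is to add "maxProperties": 1 to each, so that no single object can satisfy two of them. This is an assumption for illustration, not something proposed in the thread. The hypothetical Python check below compares the "allOf"-of-"contains" reading with the distinct-item reading on every array of up to three items drawn from a small pool. With exclusive sub-schemas the two readings never disagree. Without exclusivity, an object carrying both "guid1" and "guid2" makes them diverge:

```python
from itertools import permutations, product
from typing import Any, Callable, Sequence

# Assumption: a sub-schema is modelled as a predicate on one array item.
Predicate = Callable[[Any], bool]


def requires(prop: str, exclusive: bool) -> Predicate:
    """{"required": [prop]}, plus "maxProperties": 1 when exclusive (assumed)."""
    def check(item: Any) -> bool:
        if not isinstance(item, dict) or prop not in item:
            return False
        return not exclusive or len(item) <= 1
    return check


def all_of_contains(arr: Sequence[Any], schemas: Sequence[Predicate]) -> bool:
    return all(any(s(item) for item in arr) for s in schemas)


def contains_distinct(arr: Sequence[Any], schemas: Sequence[Predicate]) -> bool:
    return any(
        all(s(arr[i]) for s, i in zip(schemas, idx))
        for idx in permutations(range(len(arr)), len(schemas))
    )


POOL = [{"guid1": "a"}, {"guid2": "b"}, {"text": "c"}, {"guid1": "a", "guid2": "b"}]


def disagreements(exclusive: bool) -> int:
    """Count arrays (length 0-3, items from POOL) where the two readings differ."""
    schemas = [requires(key, exclusive) for key in ("guid1", "guid2", "text")]
    return sum(
        all_of_contains(arr, schemas) != contains_distinct(arr, schemas)
        for n in range(4)
        for arr in product(POOL, repeat=n)
    )


print(disagreements(exclusive=True))   # 0
print(disagreements(exclusive=False))  # a nonzero count
```

The zero is not a coincidence. When no item can satisfy two sub-schemas, the witnesses found for each "contains" are necessarily different items.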

@ericbrown30

Hey! That's a great solution, but could you please tell me how I can validate the length of string elements? Here we have used contains, so if any one item has a valid length, the JSON passes. What should I do if I have to validate the length of each and every string element in the array?

@karenetheridge
Member

@asaqib27 { "type": "array", "items": { "additionalProperties": { "maxLength": 10 } } }
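Below is a rough Python model of what that schema checks, assuming the array holds single-property objects like the ones in this thread. `every_string_short` is a hypothetical name, and the limit of 10 comes from the answer above. "items" applies its schema to every element. "additionalProperties" (with no "properties" keyword) applies to every property of an object. "maxLength" only constrains strings.

```python
from typing import Any


def every_string_short(data: Any, max_length: int = 10) -> bool:
    """Models {"type": "array", "items": {"additionalProperties": {"maxLength": 10}}}."""
    if not isinstance(data, list):                 # "type": "array"
        return False
    for item in data:                              # "items": applies to every element
        if not isinstance(item, dict):
            continue                               # additionalProperties ignores non-objects
        for value in item.values():                # no "properties", so all are "additional"
            if isinstance(value, str) and len(value) > max_length:
                return False                       # maxLength only constrains strings
    return True


assert every_string_short([{"guid1": "short"}, {"text": "0123456789"}])
assert not every_string_short([{"text": "01234567890"}])
assert every_string_short([{"count": 12345678901}])   # numbers are not length-checked
```

Bare strings placed directly in the array are not objects, so this particular schema leaves them unchecked.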

7 participants