$data #51

Open
handrews opened this Issue Sep 16, 2016 · 18 comments

handrews commented Sep 16, 2016

Originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/%24data-(v5-proposal)

NOTE: JSON Relative Pointer is defined as an extension of JSON Pointer, which means that an absolute JSON pointer is legal anywhere that a relative pointer is mentioned (but not vice versa).

Absolute JSON Pointers always begin with /, while Relative JSON Pointers always begin with a digit. A pointer beginning with / resolves the same way whether or not it is being resolved "relative" to a specific location, just as the URI reference "/foo/bar" resolves to the same path whether or not the base URI has an existing path component.

Proposed keywords

  • $data

This keyword would be available:

  • inside any schema
  • contained in an object ({"$data": ...}) for the following schema properties:
    • minimum/maximum
    • exclusiveMinimum/exclusiveMaximum
    • minItems/maxItems
    • enum
    • more...
  • contained in an object ({"$data": ...}) for the following LDO properties:
    • href
    • rel
    • title
    • mediaType
    • more...

Purpose

This keyword would allow schemas to use values from the data, specified using Relative JSON Pointers.

This allows more complex behaviour, including interaction between different parts of the data.

When used inside LDOs, this allows extraction of many more link attributes/parameters from the data.

Values

Wherever it is used, the value of $data is a Relative JSON Pointer.

Behaviour

If the $data keyword is defined in a schema, then before any further processing of the schema:

  • The value of $data is interpreted as a Relative JSON Pointer.
  • The pointer is resolved relative to the current instance being validated/processed/etc.
  • The resolved value is taken to be the value of the schema for all further processing.

When used in one of the permitted schema/LDO properties, then before any further processing of the schema/LDO:

  • The value of $data is interpreted as a Relative JSON Pointer.
  • The pointer is resolved relative to the current instance being validated/processed/etc.
  • The resolved value is substituted as the property value.
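
The resolution and substitution steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the proposal: the function names (resolve_json_pointer, resolve_relative, substitute_data) are hypothetical, and representing the "current instance" as a list of keys/indices from the document root is a choice made for this sketch.

```python
import re

def resolve_json_pointer(doc, pointer):
    """Resolve an absolute JSON Pointer (RFC 6901) against doc."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError(f"not an absolute JSON Pointer: {pointer!r}")
    node = doc
    for raw in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires.
        token = raw.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

def resolve_relative(doc, current_path, pointer):
    """Resolve a Relative JSON Pointer.

    current_path is an assumption of this sketch: the list of keys/indices
    leading from the document root to the instance being processed.
    """
    if pointer.startswith("/"):
        # An absolute pointer is legal wherever a relative one is expected.
        return resolve_json_pointer(doc, pointer)
    match = re.fullmatch(r"(0|[1-9][0-9]*)(#|(?:/.*)?)", pointer, re.DOTALL)
    if match is None:
        raise ValueError(f"not a Relative JSON Pointer: {pointer!r}")
    up, rest = int(match.group(1)), match.group(2)
    if up > len(current_path):
        raise ValueError("pointer climbs above the document root")
    base = current_path[:len(current_path) - up]
    if rest == "#":
        if not base:
            raise ValueError("the document root has no key or index")
        return base[-1]  # the key/index of the referenced location
    node = doc
    for key in base:
        node = node[key]
    return resolve_json_pointer(node, rest)

def substitute_data(value, doc, current_path):
    """Replace a {"$data": pointer} object with the value it points to."""
    if isinstance(value, dict) and set(value) == {"$data"}:
        return resolve_relative(doc, current_path, value["$data"])
    return value

doc = {"smaller": 3, "larger": 5}
minimum = substitute_data({"$data": "1/smaller"}, doc, ["larger"])
```

Here the instance at "larger" is the current location, so "1/smaller" steps up one level to the parent object and then down to "smaller".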

Example

{
    "type": "object",
    "properties": {
        "smaller": {"type": "number"},
        "larger": {
            "type": "number",
            "minimum": {"$data": "1/smaller"},
            "exclusiveMinimum": true
        }
    },
    "required": ["larger", "smaller"]
}

In the above example, the "larger" property must be strictly greater than the "smaller" property.
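
As a rough worked illustration of that example, the following sketch builds the schema that would effectively apply to one particular instance and checks the "larger" property against it. It hard-codes the resolution of "1/smaller" (up one level from "larger", then down to "smaller") rather than implementing a general resolver, and it assumes the boolean form of exclusiveMinimum used in the example. The names effective_schema and larger_is_valid are hypothetical.

```python
import copy

SCHEMA = {
    "type": "object",
    "properties": {
        "smaller": {"type": "number"},
        "larger": {
            "type": "number",
            "minimum": {"$data": "1/smaller"},
            "exclusiveMinimum": True,
        },
    },
    "required": ["larger", "smaller"],
}

def effective_schema(instance):
    """Return a copy of SCHEMA with the $data reference replaced by data."""
    resolved = copy.deepcopy(SCHEMA)  # never mutate the original schema
    # Hand-resolved "1/smaller": the sibling property of "larger".
    resolved["properties"]["larger"]["minimum"] = instance["smaller"]
    return resolved

def larger_is_valid(instance):
    """Check only the minimum/exclusiveMinimum constraint on "larger"."""
    larger_schema = effective_schema(instance)["properties"]["larger"]
    value = instance["larger"]
    if larger_schema.get("exclusiveMinimum") is True:
        return value > larger_schema["minimum"]
    return value >= larger_schema["minimum"]

ok = larger_is_valid({"smaller": 3, "larger": 5})
equal = larger_is_valid({"smaller": 5, "larger": 5})
```

With these inputs, ok is True and equal is False, since the bound is strict.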

Concerns

Theoretical purity

Currently, validation is "context-free", meaning that one part of the data has minimal effect on the validation of another part. This has an effect on things like referencing sub-schemas. Changing this is a big issue, and should not be done lightly.

Some interplay of different parts of the data can currently be specified using oneOf (and the proposed switch) - but crucially, these constraints are specified in the schema for a common parent node, meaning that sub-schema referencing is still simple.

The use of $data also (in some cases) limits the amount of static analysis that can be done on schemas, because their behaviour becomes much more data-dependent. However, the expressive power it opens up is quite substantial.

Not available for all keywords

It's also tempting to allow its use for all schema keywords - however, not only is that a bad idea for keywords such as properties/id, but it also might present an obstacle to anybody extending the standard.

Not available inside enum values

It should be noted that while {"enum": {"$data":...}} would extract a list of possible values from the data, {"enum": [{"$data":...}]} would not - it would in fact specify that there is only one valid value: {"$data":...}.
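
The distinction can be sketched as below, assuming substitution happens only when a keyword's value is itself a {"$data": ...} object. The helper resolve_keyword and the lookup callable (a stand-in for a real Relative JSON Pointer resolver) are hypothetical names used only for this illustration.

```python
def resolve_keyword(value, lookup):
    """Substitute only when the keyword value itself is {"$data": pointer}."""
    if isinstance(value, dict) and set(value) == {"$data"}:
        return lookup(value["$data"])
    # Anything else, including a {"$data": ...} nested in a list,
    # is kept as a literal value.
    return value

instance = {"allowed": ["red", "green"], "colour": "red"}

def lookup(pointer):
    # Stand-in resolver: only knows the one pointer used in this sketch.
    return {"1/allowed": instance["allowed"]}[pointer]

from_data = resolve_keyword({"$data": "1/allowed"}, lookup)
literal = resolve_keyword([{"$data": "1/allowed"}], lookup)
```

In this sketch from_data becomes the list taken from the instance, while literal stays a one-element list whose only permitted value is the object {"$data": "1/allowed"} itself.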

Similar concerns would exist with an extra keyword like constant - what if you want the constant value to be a literal {"$data":...}? However, perhaps constant could be given this data-templating ability, and if you want a literal {"$data":...}, then you can still use enum.

Describing using the meta-schema

The existing mechanics of $ref can be nicely described using a rel="full" link relation.

The mechanics of $data, however, would be impossible to even approach in the meta-schema. We could describe the syntax, but nothing more. Is this a problem?

handrews Sep 17, 2016

Member

I feel that there are several use cases here, and it might be best to split them up.

URI Template resolution

For hypermedia interactions, where the instance data must be referenced in order for hyperlinking to work at all, the extended templating syntax from issue #52 covers the necessary use cases. It narrowly targets hyperlinking and does not involve instance data in any other aspect of JSON Schema, as it applies only in situations where we were already referencing instance data.

Link title

I can see how a link title might reference data in the same way as the URI Template. The template may include the id of a related thing, while the title may include the related thing's name. Either way, the URI and the title are both things that are presented back to the user, which should be affected by the instance data as they are describing a relation involving that data.

Therefore I would prefer to see the URI Template extended syntax (with "vars") be used here rather than a more generic approach that applies to more than just hypermedia values.

rel and mediaType

The use cases for mediaType and rel are not immediately obvious to me. The relation type should not, in my mind, change based on the instance data. Only the specific instance to which the relation points should change. A mediaType specified at runtime from instance data would not be of use in planning what a program can do with different representations. It's not clear to me why you wouldn't just list out the possibilities. If some media type links may only be present some of the time, there are other ways to express that using "oneOf" (or possibly "switch") to associate links with only certain variations of the content.

Interactions during validation

I feel like this should somehow express the constraints in terms of the relations among the fields rather than pulling in data that will produce the desired result. So somehow explicitly saying that "larger" should be strictly greater than "smaller" without loading the data into the schema before validation.

I'm not 100% sure what that would look like. In this case, possibly very much like the $data example, as either way you need to reference the related field. But I would be more comfortable with something that clearly states "this is describing relationships among data" rather than "this loads a value from instance data, and treats it as part of the schema, whatever that happens to mean."

I don't feel like I'm articulating this well, but I'm going to go ahead and post this comment in the interest of provoking discussion :-)

HotelDon Sep 19, 2016

@handrews
So, for me, my use cases sit entirely in the interactions during validation category, so that's what I'm going to speak to.

I think I have a basic understanding of what you're trying to say - you'd prefer a solution where JSON Schema validators use $data (or some other similar feature) as a pointer whose value is checked during validation, instead of recompiling the schema before validation even begins, where it inserts the value of those pointers directly into the schema.

Would it be possible to modify the proposal to remove the portions about modifying the schema directly, and include language elsewhere in v6+ that validation shouldn't modify schemas for any reason? Then, $data would be functionally identical to the way it works now, but wouldn't encourage "bad behavior" among the various JSON Schema validators.

handrews Sep 19, 2016

Member

@HotelDon this is what I mean by "I'm not articulating this well"

It's not so much the reading/loading of the data (which likely has to be done lazily because of $refs), it's what you can do with it after it is loaded. Although thinking about this more I may be OK with it.

The way the proposal is written, with a list of allowed properties that trails off with "more...", left me very concerned about the scope. However, what's not explicitly called out, but I think would be better than listing the fields, is that all of the fields proposed for $data take a literal value, and not a schema. We really need to make sure that data is never interpreted as a schema. That would be a security nightmare: just use it to shove in links to all of your favorite malware sites!

But I think the intent here is that $data is only used to load data in place of a literal value. I can get behind that.

I still think it is valuable to separate the hypermedia template resolution cases out and use vars as specified in issue #52 for that. The values in vars are already assumed to be pointers into the instance, so requiring them to be little {"$data": "/pointer"} objects instead of just pointers is overkill.

HotelDon Sep 19, 2016

I had never considered the possibility of someone trying to load schemas with $data, so I guess that is what got me confused.

So maybe fix it to sound more like this:

This keyword would be available:

  • inside any schema
  • contained in an object ({"$data": ...}) for most schema properties that accept literal values. For example:
    • minimum/maximum
    • minItems/maxItems
    • pattern
    • enum
    • etc...

I'm still having a hard time wrapping my head around the hypermedia/LDO portion of this proposal, so I don't have much of an opinion on it. It might be helpful if @geraintluff chimed in to defend his original proposal a bit, assuming he still has any interest in doing so.

epoberezkin Oct 28, 2016

Member

My 2¢: people seem to use it a lot with Ajv, judging by the questions. So it must be useful.

I think relative JSON pointer should be extended to allow navigating array items (see #115)

handrews Oct 31, 2016

Member

I've become more receptive to this proposal while working with some of the more difficult hyper-schema problems, such as those discussed in #108.

@handrews handrews changed the title from v6 validation and hyper-schema: $data to validation and hyper-schema: $data Nov 24, 2016

awwright Dec 3, 2016

Member

I'm solidly of the opinion that checking data consistency is out of scope for JSON Schema, although it's certainly an option for validators that do want to offer the feature.

And if it's a popular feature then... maybe it's something we have to look into, perhaps as a separate document though.

handrews Dec 3, 2016

Member

@awwright $data has important uses in hyper-schema whether it is available in general validation or not.

Relequestual Jan 5, 2017

Member

I can see this could be useful. A few clear use cases might be helpful if anyone has the time or inclination.

handrews Jan 5, 2017

Member

@Relequestual I'd like to see whether PR #179 is accepted or not before digging into use cases here. If it is accepted, that will clarify how to present the future use cases. If it is not, I'll need to come up with a different approach anyway.

@timgdavies timgdavies referenced this issue in open-contracting/standard Jan 19, 2017

Open

Articulating cross-references in the schema #414

@handrews handrews added this to the draft-07 (wright-*-02) milestone May 16, 2017

@mrkvon mrkvon referenced this issue in ditup/ditapi Aug 3, 2017

Merged

Refactoring validation to json-schema #14

handrews Aug 30, 2017

Member

I'm moving this out of draft-07/wright-*-02. It is a huge topic that has seen no progress and almost no real discussion in the past year. And there is no clear advocate with time available to move it forward.

@handrews handrews modified the milestones: draft-future, draft-07 (wright-*-02) Aug 30, 2017

@handrews handrews removed the hypermedia label Sep 2, 2017

@handrews handrews changed the title from validation and hyper-schema: $data to $data Sep 2, 2017

handrews Sep 26, 2017

Member

Random thought: Would it make sense to define $data as part of a separate vocabulary for data interaction? (for lack of a better term)

If we went this route, it would also add to the use cases for #314 for understanding multiple vocabularies in use simultaneously.

epoberezkin Oct 6, 2017

Member

Separate vocabulary seems overkill...

@koegel koegel referenced this issue in eclipsesource/jsonforms Oct 10, 2017

Closed

Validate multiple fields - and custom validation #662

exist3nz Nov 15, 2017

Hi everyone, I can see this issue is closed but it's the most relevant for what I'm looking for.
We use JSON schemas to define configurations, and some of our fields have a min and a max field. In this situation we would want the schema to enforce that the max cannot be less than the min, and it seemed this $data solution would have been perfect.

After researching the topic I am very confused about whether this is still on the roadmap, or whether there are alternative solutions for pointing to other values during validation. Please let me know what the status is, as keeping to the standard and not resorting to workaround hacks would be my first choice.

handrews Nov 15, 2017

Member

@exist3nz this issue is still open. Several linked issues are closed, which can look confusing the way GitHub displays it, but this one is still very much open.

It's in the "draft-future" milestone which means that we do intend to consider it, but not in the next two drafts (draft-07, which will likely go out next week, or draft-08 which will focus on issues around re-usability). At some point I will make a draft-09 milestone, and assuming we can resolve the target set of issues for draft-08 in that one draft, $data will likely be the focus for draft-09. Although that's not set in stone by any means.

This would be an expansion of the scope of JSON Schema, which is why we are trying to nail down some key things within the current scope first.

@anweiss anweiss referenced this issue in usnistgov/OSCAL Dec 2, 2017

Open

JSON schema gaps in "implementation" layer #77

gregsdennis Dec 19, 2017

Collaborator

Please note that this remark comes from the point of view of a hardcore .NET developer.

I'm not sure about other frameworks/languages, but on the web side of .NET, we have data contracts. These are implemented via attributes (annotations) that we can place on properties of our DTO models. This mechanism has no way to reference other properties in the way that we're thinking of here. (To be honest I think it's rather limited in other ways, too.)

Given that it's built into the framework, though, it is very widely used, and it raises a concern that some languages would not be able to model this kind of feature.

Now, that's not to say that it shouldn't be done. Those languages would just be more reliant upon validating via schema than via the framework, which is counter to having these annotations included in the first place.

ghost Dec 21, 2017

Those languages would just be more reliant upon validating via schema than via the framework

I have to make sure in my schema that a user-inputted max is larger than a certain minimum. Though I can gladly use $data with the Ajv package/framework, I want to see this feature in JSON Schema itself.

This would enhance my code's maintainability: if I ever stop using Ajv or move to another package/framework for whatever reason, I don't want to lose this ability, or have to take on the burden of rewriting the code that checks for data consistency.

handrews May 25, 2018

Member

Bringing over some commentary from #541 and #549 since this is the oldest issue in this area of functionality:

I'm proposing that all of the possible $data-tagged features (including but not limited to loading instance data into the schema, loading external data into the schema, and asserting relationships among instance locations) be worked on as a new vocabulary. All of the proposals in this area add substantial complexity, and also do not need to change the existing core and validation specification concepts. With vocabulary support being added in draft-08, having one or more vocabularies for this area would allow it to develop independently of core and validation, which we hope are approaching a final draft. I expect this area would be pretty active with new ideas and feedback, which would delay finalization significantly if added to core or validation (and it's unrelated to hyper-schema, which has its own mechanisms for working with instance data in URI Templates).
