Event processing software would like to have value-based schema dispatch #652

Open
timbray opened this Issue Sep 11, 2018 · 5 comments

5 participants
@timbray

timbray commented Sep 11, 2018

I work at AWS on event-driven software, e.g. CloudWatch Events - see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatchEventsandEventPatterns.html - and with the CNCF CloudEvents spec - see https://github.com/cloudevents/spec/blob/master/json-format.md

Events tend to be JSON texts, to come in streams, and be heterogeneous. That is to say, lots of types of events succeed each other in a stream. In general, there will be some common fields, often in a top-level "envelope" wrapper, and then some type-specific "payload" data. There is heavy use of "type" fields, for example the "type" field in CW Events and the "eventType" field in CloudEvents. All processing tends to be of the form "look at the Type field and figure out what to do".
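The "look at the Type field and figure out what to do" pattern can be sketched in Python as follows. This is a sketch under stated assumptions: the input is taken to be newline-delimited JSON, and the envelope is a hypothetical one with a lowercase "type" field plus a "payload" object. The handler names are illustrative and are not part of CloudWatch Events or CloudEvents.

```python
import json
from typing import Callable, Dict, Iterable, List


# Hypothetical handlers, one per event type.
def on_foo(payload: dict) -> str:
    return f"foo event with {len(payload)} payload field(s)"


def on_bar(payload: dict) -> str:
    return f"bar event with {len(payload)} payload field(s)"


# Dispatch table keyed on the value of the envelope's "type" field.
HANDLERS: Dict[str, Callable[[dict], str]] = {"Foo": on_foo, "Bar": on_bar}


def process_stream(lines: Iterable[str]) -> List[str]:
    """Parse each JSON text and route it on the envelope's "type" field."""
    results = []
    for line in lines:
        event = json.loads(line)
        event_type = event.get("type")
        # Only string values can name a handler; anything else is unhandled.
        handler = HANDLERS.get(event_type) if isinstance(event_type, str) else None
        if handler is None:
            results.append(f"unhandled type {event_type!r}")
        else:
            results.append(handler(event.get("payload", {})))
    return results
```

The point of the table is that the type value is read once and selects everything that follows, which is exactly the behavior the rest of this issue asks of a schema.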

I've found it very hard to use JSON Schema for this kind of data. Basically, I want to switch schemas based on the value of a field. The rules for something with a top-level "Type": "Foo" are different from something with "Type": "Bar".

The current "dependencies" keyword can change things based on the presence of a field, which is not what we want.
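To see why presence-based dispatch falls short, here is a hedged Python model of the schema form of "dependencies". Each dependent subschema is reduced to a plain predicate, an assumption made only to keep the sketch self-contained; the "Type"/"Timestamp" rule is hypothetical.

```python
from typing import Callable, Dict

Predicate = Callable[[dict], bool]


def check_dependencies(instance: dict, dependencies: Dict[str, Predicate]) -> bool:
    """Model schema-form "dependencies": each entry applies if its key is present.

    The key's *value* plays no part in choosing which subschema runs.
    """
    return all(sub(instance) for key, sub in dependencies.items() if key in instance)


# Hypothetical rule: any event that has a "Type" must also have a "Timestamp".
deps: Dict[str, Predicate] = {"Type": lambda inst: "Timestamp" in inst}
```

Here `{"Type": "Foo"}` and `{"Type": "Bar"}` fail for the same reason; nothing lets "Foo" and "Bar" pull in different rules.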

You can sort of get what you want with JSON Schema by using "oneOf", where your schema ends up looking something like this:

"additionalProperties": {
  "oneOf": [
    { "$ref": "#/definitions/FooEvent" },
    { "$ref": "#/definitions/BarEvent" },
    { "$ref": "#/definitions/BazEvent" },
    ...
  ]
},
"definitions": {
  "FooEvent": {
    "properties": {
      "Type": { "enum": [ "Foo" ] },
      ... lots of rules ...
    }
  },
  "BarEvent": {
    "properties": {
      "Type": { "enum": [ "Bar" ] },
      ... lots of rules ...
    }
  }
}

The problem with this is that it's really hard for a schema processor to produce good error messages. It runs through all the oneOf options and explains why each one of them fails to match. What we'd like is for some "Type" field to be magic, so that the processor knows the rest of the schema depends on the value of that field. That way, the schema would be more idiomatic, and the error messages could be super helpful: "Type 'FooEvent' lacks required field 'Timestamp'" or some such.
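A small Python sketch can make the contrast concrete. It assumes, for illustration only, that each event type's rules reduce to a list of required fields; `RULES` and the message formats are hypothetical and are not what any particular validator prints.

```python
# Hypothetical per-type rules, reduced to required field names.
RULES = {
    "Foo": ["Type", "Timestamp"],
    "Bar": ["Type", "Region"],
}


def branch_errors(event: dict, type_name: str) -> list:
    """Reasons one oneOf branch (e.g. #/definitions/FooEvent) rejects the event."""
    errors = []
    if event.get("Type") != type_name:
        errors.append(f"{type_name}Event: Type is not {type_name!r}")
    for field in RULES[type_name]:
        if field not in event:
            errors.append(f"{type_name}Event: missing required field {field!r}")
    return errors


def one_of_report(event: dict) -> list:
    """oneOf style: if no branch passes, every branch's failures are reported."""
    per_branch = [branch_errors(event, name) for name in RULES]
    # Each branch pins a different Type, so at most one branch can pass.
    if any(not errs for errs in per_branch):
        return []
    return [err for errs in per_branch for err in errs]


def dispatch_report(event: dict) -> list:
    """Value-based dispatch: read Type first, then check only that type's rules."""
    type_name = event.get("Type")
    if not isinstance(type_name, str) or type_name not in RULES:
        return [f"unknown Type {type_name!r}"]
    return [
        f"Type {type_name!r} lacks required field {field!r}"
        for field in RULES[type_name]
        if field not in event
    ]
```

For `{"Type": "Foo"}`, `one_of_report` returns three messages, two of them about the irrelevant Bar branch, while `dispatch_report` returns only the Timestamp complaint.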

Is it possible I'm just missing an idiomatic, clean, obvious way to do what I want with JSON Schema? That would make me happy.

@gregsdennis

Collaborator

gregsdennis commented Sep 12, 2018

With Draft-07 you can use the if/then/else keywords to build a chain. It can get pretty deeply nested, though, if you have a lot of cases.

Something like this:

{
  "if":{
    "properties":{
      "type":{"const":"EventA"}
    }
  },
  "then":{
    "$ref": "#/definitions/EventA"
  },
  "else":{
    "if":{
      "properties":{
        "type":{"const":"EventB"}
      }
    },
    "then":{
      "$ref": "#/definitions/EventB"
    },
    "else":{
      "if":{
        "properties":{
          "type":{"const":"EventC"}
        }
      },
      "then":{
        "$ref": "#/definitions/EventC"
      },
      "else": false
    }
  }
}

(I like the const keyword over a single-valued enum.)

Not really sure if that helps your error messaging much, but it has the benefit that you don't have to fully evaluate every subschema. You just evaluate the type property over and over until you find a value that matches, and then you apply that one schema.
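The short-circuit behavior of the if/then/else chain can be modeled in a few lines of Python. This is a sketch under stated assumptions: the EventA/EventB/EventC definitions are reduced to hypothetical lists of required keys, and only the `const` test on "type" is modeled.

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-ins for #/definitions/EventA..C, reduced to required keys.
DEFINITIONS: dict[str, list[str]] = {
    "EventA": ["type", "a_field"],
    "EventB": ["type", "b_field"],
    "EventC": ["type", "c_field"],
}

# The "if" conditions in the order the nested chain tests them.
CHAIN = ["EventA", "EventB", "EventC"]


def if_matches(instance: dict[str, Any], const: str) -> bool:
    """Mirror {"properties": {"type": {"const": const}}}.

    "properties" only constrains keys that are present, so an instance
    with no "type" key satisfies every one of these "if" schemas.
    """
    return "type" not in instance or instance["type"] == const


def validate_chain(instance: dict[str, Any]) -> tuple[bool, list[str]]:
    """Return (valid, names of the definitions actually applied)."""
    for name in CHAIN:
        if if_matches(instance, name):
            # "then": apply this one definition and skip the rest of the chain.
            ok = all(key in instance for key in DEFINITIONS[name])
            return ok, [name]
    # Every "if" failed, so the innermost "else": false rejects the instance.
    return False, []
```

An event with no "type" at all lands in the first branch and is judged against EventA. That is why such chains usually add `"required": ["type"]` to each `if`.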

@Relequestual

Member

Relequestual commented Sep 12, 2018

Hey @timbray, thanks for coming by to ask! It's great to have someone from Amazon engaging.

@gregsdennis is right here. Using if / then / else is your best bet. You'll have to be using at least draft-07, as those keywords aren't in draft-06 or draft-04. It's best if you declare the draft version you're using in your schema (using the $schema keyword), as many validators now support multiple drafts.
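Declaring the draft matters because validators generally ignore keywords they do not recognize, so a draft-04 validator would silently skip if/then/else rather than reject the schema. The sketch below is a hypothetical pre-flight check, not part of any validator's API, that reads `$schema` and flags top-level keywords newer than the declared draft. The table reflects that `const` arrived in draft-06 and `if`/`then`/`else` in draft-07.

```python
# Draft identifiers as they appear in "$schema".
DRAFTS = {
    "http://json-schema.org/draft-04/schema#": 4,
    "http://json-schema.org/draft-06/schema#": 6,
    "http://json-schema.org/draft-07/schema#": 7,
}

# The draft that introduced each keyword discussed in this thread.
INTRODUCED_IN = {"const": 6, "if": 7, "then": 7, "else": 7}


def keywords_too_new(schema: dict) -> list:
    """List top-level keywords that the declared draft does not define.

    Only the top level is scanned; a real checker would walk subschemas too.
    """
    draft = DRAFTS.get(schema.get("$schema"))
    if draft is None:
        raise ValueError("missing or unrecognized $schema")
    return sorted(k for k in schema if INTRODUCED_IN.get(k, 0) > draft)
```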

I'd like to actively encourage you to join our JSON Schema Slack server!! There's a Discussion link on http://json-schema.org

@timbray

timbray commented Sep 12, 2018

Thanks for the guidance! Using if/then/else hadn’t occurred to me.

If you already have an if/then/else construct, is it reasonable to wonder about having a switch/case one as well? That would be a very idiomatic fit for the very common case where a JSON document has a "Type" field whose enumerated value should select the right schema for the event.

@handrews

Member

handrews commented Sep 13, 2018

@timbray we discussed it, but it got complicated due to all of the different ways programming languages do or don't implement fall-through. Currently no JSON Schema keyword that takes a list of schemas depends on the order of the schemas in the list, except for the array form of items, which matches schema and instance positions. We did not want to add ordered processing. I'm glossing over a lot here, but the AJV validator experimented with a switch keyword, which was deemed more complicated than we wanted given that there are alternatives.

For example, this idiom:

{
  "anyOf": [
    {
      "if": {...},
      "then": {...}
    },
    {
      "if": {...},
      "then": {...}
    },
    ...
  ]
}

implements an unordered switch with something resembling fall-through. The same construct with oneOf implements a mutually exclusive switch, which is the same as a nested if/then/else chain but without the deep-nesting nightmare you get into without oneOf.

Given those options, you can fiddle around with the *Of and if keywords to implement quite a few sorts of switches.
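A minimal Python model of these combinator idioms may help, under the simplifying assumption that each subschema is a plain predicate; the helper names and per-type rules are hypothetical. One subtlety it makes visible: an if/then with no else passes whenever its if fails, so with several branches a bare anyOf of them is satisfied by almost any instance. The sketch therefore uses allOf for the fall-through form and gives each branch `"else": false` in the oneOf form.

```python
from typing import Callable, Optional

Check = Callable[[dict], bool]


def if_then(cond: Check, then: Check, else_: Optional[Check] = None) -> Check:
    """Model one {"if": ..., "then": ...[, "else": ...]} subschema.

    With no "else", a failing "if" makes the whole subschema pass.
    """
    def check(instance: dict) -> bool:
        if cond(instance):
            return then(instance)
        return else_(instance) if else_ is not None else True
    return check


def all_of(*branches: Check) -> Check:
    return lambda inst: all(b(inst) for b in branches)


def any_of(*branches: Check) -> Check:
    return lambda inst: any(b(inst) for b in branches)


def one_of(*branches: Check) -> Check:
    return lambda inst: sum(b(inst) for b in branches) == 1


def type_is(name: str) -> Check:
    # Like {"required": ["type"], "properties": {"type": {"const": name}}}.
    return lambda inst: inst.get("type") == name


def requires(*keys: str) -> Check:
    return lambda inst: all(k in inst for k in keys)


def never(_: dict) -> bool:
    # Stands in for the schema `false`.
    return False


# Hypothetical per-type rules.
A = if_then(type_is("EventA"), requires("a_field"))
B = if_then(type_is("EventB"), requires("b_field"))

fall_through = all_of(A, B)  # every branch whose "if" matches must hold
bare_any = any_of(A, B)      # vacuous: B passes whenever type != "EventB"
exclusive = one_of(
    if_then(type_is("EventA"), requires("a_field"), never),
    if_then(type_is("EventB"), requires("b_field"), never),
)
```

With these definitions, `{"type": "EventA"}` is rejected by `fall_through` and `exclusive` but accepted by `bare_any`. An unrecognized type such as `{"type": "EventZ"}` is accepted by the allOf form unless something else constrains "type", while the oneOf form with `"else": false` rejects it.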

@awwright

Member

awwright commented Oct 31, 2018

I wrote a little about a similar problem at https://stackoverflow.com/questions/49823500/how-to-validate-a-json-object-against-a-json-schema-based-on-objects-type-descr/49996397#49996397

And in turn, #31 is a related issue tracking improvement of error reporting, even without any use of if/then. (But then again, maybe if/then is the solution to this problem.)

This is probably a frequent enough issue that it should get some sort of treatment in the spec.
