Hierarchical output structure needs clarification around which nodes are kept #1319

gregsdennis · 2022-09-26T23:32:39Z

Follow-up to #939

Currently, the specification says that all nodes are to be kept for the hierarchical structure. This needs to be clarified to state more precisely which nodes are intended.

This structure was meant to be a replacement for the 2019-09/2020-12 verbose structure. Therefore all nodes, including those which do not impact validation would be retained.

However, I realize this makes for a significantly larger structure in many cases. As such, I would like to propose that we default to only including those output units which affect the final outcome, but allowing for implementations to include all of them upon request (or configuration)

I'm open to opposing arguments and alternatives.

The text was updated successfully, but these errors were encountered:

handrews · 2022-09-27T00:43:34Z

@gregsdennis I've always been a little unclear about how verbose works. I recently noticed this (I'm sure I saw it before, but I'd forgotten):

This includes sub-schema validation results that would otherwise be removed (e.g. annotations for failed validations, successful validations inside a not keyword, etc.).

The spec mandates that annotations for failed validation scopes are deleted, so I don't understand what's happening here. On the other hand, I can see the value of such annotations being preserved for debugging, but technically doing so would require modifying the spec (unless those annotations are re-categorized as errors somehow).

gregsdennis · 2022-09-27T02:03:07Z

I think its primary use is debugging. It can be helpful knowing why, e.g., unevaluatedProperties processed foo, which you can find out by following annotations (another reason I output even the "internal" annotations).

But I think that use case is probably less common, and we should default to a more common use case, which is people want what affects final outcome.

handrews · 2022-09-27T02:31:26Z

another reason I output even the "internal" annotations

yes, I think most people do output them, if they support it at all. And I definitely see the debugging use.

Are you including annotations from failed subschemas, though? That is the part that I'm a bit confused about. Because the spec requires deleting them, but I can see them being useful for debugging as well so I wasn't sure what happened with verbose in that case.

gregsdennis · 2022-09-27T02:41:38Z

I output them where they were produced, yes, but it should be apparent from the validation results that they don't "flow down" (or up, depending on your tree orientation) or factor into the outcome.

handrews · 2022-09-27T02:48:38Z

I output them where they were produced, yes, but it should be apparent from the validation results that they don't "flow down" (or up, depending on your tree orientation) or factor into the outcome.

I don't really know what this means. Does this mean that you save them for output but not for use in parent schemas? Does this mean that sometimes there are both annotations and errors in the output unit?

gregsdennis · 2022-09-28T00:11:12Z

Yeah, it's really confusing and definitely needs clarification. The thing is, I'm not sure what it should be, so this discussion is useful.

(This will use the new output structures because they're simpler.)

The way I currently have my implementation set up, I only output annotations on nodes that pass, including for hierarchical. That said, it does include annotations from nested locations that passed.

Example:

// schema
{
  "anyOf": [
    {
      "title": "bar is required, foo is specified",
      "type": "object",
      "properties": {
        "foo": {
          "title": "found foo"
        }
      },
      "required": [
        "bar"
      ]
    },
    {
      "title": "foo is required, bar is specified",
      "type": "object",
      "properties": {
        "bar": {
          "title": "found bar"
        }
      },
      "required": [
        "foo"
      ]
    }
  ]
}

// instance
{ "foo": "value" }

// output with nested passing nodes (my current setup)
{
  "valid": true,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything/base#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": false,
      "evaluationPath": "/anyOf/0",
      "schemaLocation": "https://json-everything/base#/anyOf/0",
      "instanceLocation": "",
      "errors": {
        "required": "Required properties [\"bar\"] were not present"
      },
      "nested": [
        {
          "valid": true,
          "evaluationPath": "/anyOf/0/properties/foo",
          "schemaLocation": "https://json-everything/base#/anyOf/0/properties/foo",
          "instanceLocation": "/foo",
          "annotations": {
            "title": "found foo"
          }
        }
      ]
    },
    {
      "valid": true,
      "evaluationPath": "/anyOf/1",
      "schemaLocation": "https://json-everything/base#/anyOf/1",
      "instanceLocation": "",
      "annotations": {
        "title": "foo is required, bar is specified",
        "properties": []
      }
    }
  ]
}

// output with all annotations, including those in failed nodes
{
  "valid": true,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything/base#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": false,
      "evaluationPath": "/anyOf/0",
      "schemaLocation": "https://json-everything/base#/anyOf/0",
      "instanceLocation": "",
      "annotations": {
        "title": "bar is required, foo is specified",
        "properties": [
          "foo"
        ]
      },
      "errors": {
        "required": "Required properties [\"bar\"] were not present"
      },
      "nested": [
        {
          "valid": true,
          "evaluationPath": "/anyOf/0/properties/foo",
          "schemaLocation": "https://json-everything/base#/anyOf/0/properties/foo",
          "instanceLocation": "/foo",
          "annotations": {
            "title": "found foo"
          }
        }
      ]
    },
    {
      "valid": true,
      "evaluationPath": "/anyOf/1",
      "schemaLocation": "https://json-everything/base#/anyOf/1",
      "instanceLocation": "",
      "annotations": {
        "title": "foo is required, bar is specified",
        "properties": []
      }
    }
  ]
}

I think for debugging purposes, showing all of the annotations (the second output example) is arguably more useful, but it's also really confusing if you don't understand how annotations work or how they're used internally.

I'm not really sure that hiding annotations the annotations at failed nodes (the first output example) is terribly useful for debugging, which kinda makes me wonder if even the nested nodes of a failure need to be shown.

Note that this probably also impacts what's included in basic, and it may also relate to #632.

Just for fun (and probably completeness, etc, whatever), here are a couple more options:

// output omitting children of failed nodes
{
  "valid": true,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything/base#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": false,
      "evaluationPath": "/anyOf/0",
      "schemaLocation": "https://json-everything/base#/anyOf/0",
      "instanceLocation": "",
      "errors": {
        "required": "Required properties [\"bar\"] were not present"
      }
    },
    {
      "valid": true,
      "evaluationPath": "/anyOf/1",
      "schemaLocation": "https://json-everything/base#/anyOf/1",
      "instanceLocation": "",
      "annotations": {
        "title": "foo is required, bar is specified",
        "properties": []
      }
    }
  ]
}

// output omitting children that don't impact validation result
{
  "valid": true,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything/base#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": true,
      "evaluationPath": "/anyOf/1",
      "schemaLocation": "https://json-everything/base#/anyOf/1",
      "instanceLocation": "",
      "annotations": {
        "title": "foo is required, bar is specified",
        "properties": []
      }
    }
  ]
}

It could be argued that something like a oneOf would need to show all of its children to prove that only one validated successfully (also what were the errors for the ones that didn't?). For something like not, presumably you'd always want the subtree.

gregsdennis · 2022-09-28T00:23:34Z

Looking at the "include all annotations" one again, I think it could be made clearer by having another property (annotationsDropped here) indicating that the annotations were dropped.

{
  "valid": true,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything/base#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": false,
      "evaluationPath": "/anyOf/0",
      "schemaLocation": "https://json-everything/base#/anyOf/0",
      "instanceLocation": "",
      "annotationsDropped": true,
      "annotations": {
        "title": "bar is required, foo is specified",
        "properties": [
          "foo"
        ]
      },
      "errors": {
        "required": "Required properties [\"bar\"] were not present"
      },
      "nested": [
        {
          "valid": true,
          "evaluationPath": "/anyOf/0/properties/foo",
          "schemaLocation": "https://json-everything/base#/anyOf/0/properties/foo",
          "instanceLocation": "/foo",
          "annotations": {
            "title": "found foo"
          }
        }
      ]
    },
    {
      "valid": true,
      "evaluationPath": "/anyOf/1",
      "schemaLocation": "https://json-everything/base#/anyOf/1",
      "instanceLocation": "",
      "annotations": {
        "title": "foo is required, bar is specified",
        "properties": []
      }
    }
  ]
}

That way there's an explicit indication (as opposed to the implication of valid: false) that the annotations of that subtree aren't considered higher up in the tree.

gregsdennis · 2022-09-29T20:56:46Z

A subtle alternative would be to list the annotations in a droppedAnnotation property (instead of having annotations and a separate property that indicated they were dropped).

{
  "valid": true,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything/base#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": false,
      "evaluationPath": "/anyOf/0",
      "schemaLocation": "https://json-everything/base#/anyOf/0",
      "instanceLocation": "",
      "droppedAnnotations": {
        "title": "bar is required, foo is specified",
        "properties": [
          "foo"
        ]
      },
      "errors": {
        "required": "Required properties [\"bar\"] were not present"
      },
      "nested": [
        {
          "valid": true,
          "evaluationPath": "/anyOf/0/properties/foo",
          "schemaLocation": "https://json-everything/base#/anyOf/0/properties/foo",
          "instanceLocation": "/foo",
          "annotations": {
            "title": "found foo"
          }
        }
      ]
    },
    {
      "valid": true,
      "evaluationPath": "/anyOf/1",
      "schemaLocation": "https://json-everything/base#/anyOf/1",
      "instanceLocation": "",
      "annotations": {
        "title": "foo is required, bar is specified",
        "properties": []
      }
    }
  ]
}

I think I like this better.

This actually affects basic as well. Perhaps, with default settings, only output units that directly affect the validation outcome or contain annotations that weren't dropped should be kept in either format. The spec could then have a provision to allow an implementation to provide an option that includes all output units and dropped annotations, and the dropped annotations would need to be in a droppedAnnotations property to provide clarity.

That would mean that you could have nodes that have:

failed validation
an error message
a list of dropped annotations

handrews · 2022-09-29T20:59:17Z

@gregsdennis I like the droppedAnnotations idea.

gregsdennis · 2022-09-29T22:27:27Z

I think the primary question I have for the default behavior (don't show everything) is how to specify what's considered to "affect validation output" for keywords that have multiple subschemas.

allOf - Failures could just show all of the failed subschemas. Passes would show only nodes with annotations.
anyOf - Failures would need to show all of the failed subschemas. Passes would show only nodes with annotations (same as allOf).
oneOf - Failures would need to show either all of the subschemas (if all failed) or the multiple passed subschemas (would failures need to be shown for a multiple-pass case?). Passes would only show the passed subschema.
properties & prefixItems would need to show all of the failed subschemas. Passes would show only nodes with annotations (same as allOf).

It seems that oneOf is the real outlier in the multiple-pass case. I suppose we could just say that a failure would show all subschemas (all failed subschemas wouldn't show the oneOf multiple-pass case). But then if only one subschema of an allOf fails, that's a lot of unnecessary information.

I'm not sure what to do with that. I'd prefer that the keywords themselves not specify what the output is, however they already do that for annotations. I'm open to ideas.

gregsdennis · 2022-11-07T21:16:06Z

not is another interesting one. I might expect that the results of its subschema are always shown. If not passes, you'll likely want to see what failed inside of it. If it fails... ? Is it valuable to see a passed subschema validation result in this case? You're just going to get a lot of annotations that are ultimately dropped by not.

handrews · 2022-11-07T21:25:20Z

I think it's worth keeping in mind the possible interaction of this issue and discussion json-schema-org/community#236, particularly since @marksparkza is taking a crack at implementing something based on it.

That proposal has to do with enabling/disabling annotation collection on a more fine-grained level, and while I intend for it to be separate from the output system, both annotations and the more general concept of a node of data are things that are kept or not, which (if kept) show up in output.

I feel like having a general filtering layer would a.) allow for a wider variety of implementations and use cases, such as a more interactive approach with callbacks at various points of validation (don't ask for details, I literally just made that up right now) and b.) would allow the output part of the specification to stay simple. It outputs whatever it gets. It doesn't care whether there was filtering, or on what criteria, etc. Decisions about what types of filters should be easy or on by default (e.g. drop all nodes that fail validation) can be part of the filtering system.

gregsdennis · 2022-11-07T21:30:27Z

I think that conversation is somewhat orthogonal to this one since annotations are merely a property in the output unit. This conversation is about the structure, i.e. "Which nested results are included?"

Edit: this just sank in...

Decisions about what types of filters should be easy or on by default (e.g. drop all nodes that fail validation) can be part of the filtering system.

handrews · 2022-11-07T21:49:16Z

Glad to see the edit (and yeah why do points always click right after we hit "send"? It's like they hide until you've already commented).

The other side of this is that both annotation collection and output nodes in general take up space, and potentially a lot of it. There are performance reasons to filter both. With something on the scale of the FHIR mega-schema, it might only be feasible to store output notes for a portion of the schema, so you might have to run with different filters in order to narrow the errors down and get the information you need. I'm thinking of this b/c I know someone (maybe you?) talked about error collection running out of memory.

gregsdennis · 2023-05-07T16:41:53Z

Just an update. I'm implementing some basic pruning logic on a few keywords in JsonSchema.Net, and it looks promising.

From the test suite:

// schema
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "items": [
    {}
  ],
  "additionalItems": {
    "type": "integer"
  }
}

// instance
[null,2,3,"foo"]

// without pruning
{
  "valid": false,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything.net/fc30dcb71c",
  "instanceLocation": "",
  "details": [
    {
      "valid": true,
      "evaluationPath": "",
      "schemaLocation": "https://json-everything.net/fc30dcb71c",
      "instanceLocation": ""
    },
    {
      "valid": true,
      "evaluationPath": "/additionalItems",
      "schemaLocation": "https://json-everything.net/fc30dcb71c#/additionalItems",
      "instanceLocation": "/1"
    },
    {
      "valid": true,
      "evaluationPath": "/additionalItems",
      "schemaLocation": "https://json-everything.net/fc30dcb71c#/additionalItems",
      "instanceLocation": "/2"
    },
    {
      "valid": false,
      "evaluationPath": "/additionalItems",
      "schemaLocation": "https://json-everything.net/fc30dcb71c#/additionalItems",
      "instanceLocation": "/3",
      "errors": {
        "type": "Value is \"string\" but should be \"integer\""
      }
    }
  ]
}

// with pruning
{
  "valid": false,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything.net/263f35c2f9",
  "instanceLocation": "",
  "details": [
    {
      "valid": true,
      "evaluationPath": "/0",
      "schemaLocation": "https://json-everything.net/263f35c2f9#/0",
      "instanceLocation": "/0"
    },
    {
      "valid": false,
      "evaluationPath": "/additionalItems",
      "schemaLocation": "https://json-everything.net/263f35c2f9#/additionalItems",
      "instanceLocation": "/3",
      "errors": {
        "type": "Value is \"string\" but should be \"integer\""
      }
    }
  ]
}

With pruning enabled, it's much easier to see what the problem is.

jam01 · 2024-06-03T04:08:46Z

Hey all. I'm wrapping up a validation and annotation collection library I started from scratch a couple of months back, and just found this issue plus the new doc regarding the new output format.

I based my output on the 2020-12 format and have some questions/concerns about the new one. What I'm not sure is what is the right gh issue or discussion to bring it up...? I found some discussions but they're 2+ years old now.

Anyway, please let me know where makes sense to comment a bit and ask questions on that :)

gregsdennis · 2024-06-03T05:46:17Z

Hey @jam01. Feel free to open a new issue or a discussion on the community repo. Happy to answer questions.

In short, 2020-12 is the latest published. It's recommended, not required, by the specification. In my lib, I use the format that's in the new doc. (It's in a new doc because we've decided to extract it to allow others to devise other formats for non-machine audiences.)

gregsdennis added Priority: Medium output labels Sep 26, 2022

gregsdennis mentioned this issue Oct 18, 2022

add droppedAnnotations #1329

Merged

gregsdennis added this to the draft-next milestone Jan 22, 2023

handrews mentioned this issue Feb 20, 2023

Document existing support for "hierarchical" output marksparkza/jschon#52

Closed

gregsdennis mentioned this issue Mar 3, 2023

Gregsdennis/output spec #1385

Closed

gregsdennis mentioned this issue Apr 27, 2023

Validation of object type with anyOf/oneOf json-everything/json-everything#440

Closed

gregsdennis mentioned this issue Aug 4, 2023

Bring back the option to exclude EvaluationResults json-everything/json-everything#500

Closed

1 task

gregsdennis mentioned this issue Sep 1, 2023

Extract output to its own spec #1429

Merged

gregsdennis mentioned this issue Sep 12, 2023

JsonSchema.GenerateCode fails with "This schema is not in a supported form." with no additional details json-everything/json-everything#523

Closed

1 task

gregsdennis mentioned this issue Nov 7, 2023

Valid ValidationResults has detail with error [when usage of const in if statement] json-everything/json-everything#556

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hierarchical output structure needs clarification around which nodes are kept #1319

Hierarchical output structure needs clarification around which nodes are kept #1319

gregsdennis commented Sep 26, 2022

handrews commented Sep 27, 2022

gregsdennis commented Sep 27, 2022

handrews commented Sep 27, 2022

gregsdennis commented Sep 27, 2022

handrews commented Sep 27, 2022

gregsdennis commented Sep 28, 2022

gregsdennis commented Sep 28, 2022 •

edited

Loading

gregsdennis commented Sep 29, 2022

handrews commented Sep 29, 2022

gregsdennis commented Sep 29, 2022

gregsdennis commented Nov 7, 2022

handrews commented Nov 7, 2022

gregsdennis commented Nov 7, 2022 •

edited

Loading

handrews commented Nov 7, 2022

gregsdennis commented May 7, 2023 •

edited

Loading

jam01 commented Jun 3, 2024

gregsdennis commented Jun 3, 2024

Hierarchical output structure needs clarification around which nodes are kept #1319

Hierarchical output structure needs clarification around which nodes are kept #1319

Comments

gregsdennis commented Sep 26, 2022

handrews commented Sep 27, 2022

gregsdennis commented Sep 27, 2022

handrews commented Sep 27, 2022

gregsdennis commented Sep 27, 2022

handrews commented Sep 27, 2022

gregsdennis commented Sep 28, 2022

gregsdennis commented Sep 28, 2022 • edited Loading

gregsdennis commented Sep 29, 2022

handrews commented Sep 29, 2022

gregsdennis commented Sep 29, 2022

gregsdennis commented Nov 7, 2022

handrews commented Nov 7, 2022

gregsdennis commented Nov 7, 2022 • edited Loading

handrews commented Nov 7, 2022

gregsdennis commented May 7, 2023 • edited Loading

jam01 commented Jun 3, 2024

gregsdennis commented Jun 3, 2024

gregsdennis commented Sep 28, 2022 •

edited

Loading

gregsdennis commented Nov 7, 2022 •

edited

Loading

gregsdennis commented May 7, 2023 •

edited

Loading