Add built-in function to support JSON Schema Validation #1449

rbuckland · 2019-05-23T08:36:05Z

Desired Feature

Validate input JSON document against a set of pre-configured schemas.

json-schema draft 4 and 7 support
json-schema URI to be "local" or remote

The interface I propose is simply a new built-in function.

result := schema.validate(<uri to schema>, <data-to-validate>)

the result should be

{ 
   valid: true|false,
   errors: [ 
      ..errors... if it was not valid
    ]
}

srenatus · 2019-05-23T13:03:01Z

😄 s/Ass/Add/, I guess? (PR title)

BenderScript · 2019-05-23T20:51:03Z

There is an interesting library for it: https://github.com/xeipuuv/gojsonschema

rbuckland · 2019-05-24T03:54:33Z

@repenno yes - this is the library I modelled the "result" above on.
Three criteria for the schema library we choose are:

License compatibility
JSON Schema spec support (e.g. draft-04, draft-06, draft-07)
Active library development (community, supporting entities, etc.)

BenderScript · 2019-05-24T15:50:00Z

The feature itself needs some discussion but ok. The issue would be the vendoring of the library. If @tsandall is okay with the dependency it should be ok to implement.

tsandall · 2019-05-29T18:01:36Z

We've talked about adding support for input (and contextual data) schema validation for a while. It's probably time to start putting a plan in place.

Before we commit to adding a JSON Schema validation built-in function, we need to figure out (a) how it would relate to the existing type inferencing & checking we do for virtual documents and (b) what syntactic changes we could introduce to elevate schema to a first-class feature.

/cc @timothyhinrichs

timothyhinrichs · 2019-06-22T01:08:56Z

Some thoughts on schema checking, as a first-class citizen. (Potentially complementary to the builtin approach.)

Declaring schema for input

Schemas can be defined simply as objects under /data, so if you want to use Rego to define that schema (e.g. sharing parts across multiple schemas) you can. (Not sure if JSON-schema already allows this.) Then annotate Rego with the proper schema. Here we're considering schema for just the input object.

Examples from k8s use-case, but applicable to all use cases. There are different levels of granularity at which you might apply schema. The most important seems to be Rule-level schema.

Rule-level schema
This is important for authz cases because there are often multiple rules with allow/deny.

@ruleschema = data.schemas.io.k8s.v1.pod
deny[msg] {
	input.request.kind.kind == "Pod"
	input.request.object.spec.containers[_].image == "nginx"
}

Document-level schema

@docschema = data.schemas.bar
foo {
	input.a.b
}
foo {
	input.a.b
}

Module-level schema and package level schema are floating in the file. Could have keywords like moduleschema and packageschema; or make them look like annotations for consistency and apply them possibly to the 'package' directive.

@moduleschema = data.schema.bar
@packageschema = data.schema.qux

Multiple Schemas
People often introduce abstractions where input may have multiple different schemas. We could consider allowing multiple schema declarations.

@docschema=data.k8s.workload.daemonset
@docschema=data.k8s.workload.deployment
container[x] {
	x := input.request.spec.template.spec.containers[_]
}

On the other hand, this will happen most often at the helper-level, and type/schema-inference from upper-level decisions may handle this anyway. Typically every top-level decision (or at least every top-level rule) has one kind of input it handles.

Functionality

Schema failures could be compile-time failures. The downside would be that you couldn't load/run policies on their own without also having the schemas, which makes sharing difficult. Could have a strict/relaxed mode.
Schema failures could be compile-time warnings with an evaluation flag indicating they should be turned into errors.
Schema failures could be checked at eval time when an input is provided; if none of the decision rules match that schema, the system could return undefined immediately.

Roadmap

I'd think about just supporting rule-level schema to start. We could even just use the keyword @schema and introduce @docschema and @packageschema later if required.
Suggestions for alternative syntax other than annotation are worth considering. We don't have annotations today. Perhaps

schema data.foo.bar 
deny[msg] { ... }

anderseknert · 2021-06-02T12:26:14Z

With JSON schema support now in OPA, adding a built-in to do this on arbitrary data seems a lot more viable, and might be an alternative approach to do on-the-fly validation of input or data objects as well 🤔

stale · 2021-11-22T19:20:02Z

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

jkulvich · 2022-11-28T11:01:40Z

Ok, let me copy my message with explanation from #5417 :)

Hi! Let me explain our case.
We would like to validate a JSON object against a JSON schema and get an error if validation fails.
Both JSON documents (schema and doc) are passed in via input or data.

So the following points are important for us:

Validation flag.
Validation error.
Performance (a single function call for both values above).

I can imagine, and implement, one of the following contracts:

jsonschema.match(schema, doc): string | nil - Just like regexp.match. In this case a nil result signals successful validation.
jsonschema.validate(schema, doc): { valid: bool, error: string | nil } - A more convenient way to get only what we need with a single built-in call. But I couldn't find any existing built-in with a similar return signature, and I'm not sure whether this is acceptable under OPA conventions.

Of course, we could follow the classic approach with two built-ins, .is_valid(): bool and .validate(): string, but that would cost performance.

Can I have any advice? Thanks!

anderseknert · 2022-11-28T11:11:14Z

As for performance, I can imagine this would be another built-in that would benefit from caching, in order to avoid having to re-parse the schema with each request. @philipaconrad was that done for some other similar built-in recently, or was there just talk about it? I can't remember 😅

srenatus · 2022-11-28T11:17:15Z

@anderseknert the issue you're thinking about is probably #5377.

anderseknert · 2022-11-28T11:27:29Z

Indeed, thanks @srenatus 👍 Let's keep this one in mind for that enhancement as well.

olegroom · 2022-11-28T12:54:23Z

So the function we are OK with is jsonschema.match(schema, doc): string | nil
Can we start developing it and open a pull request?

anderseknert · 2022-11-28T13:01:28Z

Returning null or undefined to signal success doesn't align well with the rest of the built-in functions, IMO. I haven't thought too much about it, but I'd probably rather return a set of errors, and have an empty set signal success. I guess we can think more about it as you work on it, as that should be a minor detail to update later, or so I'd assume.

Also, I'd prefer the jsonschema.validate name over jsonschema.match. A jsonschema.is_valid that took only the schema itself and checked it for validity sounds like it could be useful too.

anderseknert · 2022-11-28T13:03:54Z

As for where to start, see the docs on contributing, contributing code, development and adding built-in functions.

jkulvich · 2022-11-29T07:13:48Z

Well, what's our decision? Is it OK if I implement the following built-ins?

jsonschema.validate(schema, doc): []string // List of errors
jsonschema.is_valid(schema, doc): bool // OK or Not OK

Next, I will be grateful for examples with OPA built-ins shared cache. Thanks!

srenatus · 2022-11-29T09:07:49Z

jsonschema.validate(schema, doc): []string // List of errors

Strings have no structure. So, the callers of this method would have to use the error strings as-is. I could imagine having a set of objects instead, pointing to a part of the schema or doc...? But that depends on what the underlying json schema library gives us, and on what a caller actually needs.

You've clearly got a use case in mind here -- is it really sufficient for you to have an array of error messages?

💭 Array or set? WDYT?

I will be grateful for examples with OPA built-ins shared cache.

There are two relevant caches in OPA:

intra-query cache: two calls in a single policy evaluation should not need two validation runs
inter-query cache: two calls in two subsequent policy evaluations should not need two validation runs
Both of these would only apply to the same schema input, of course. Concretely, they would cache the parsed schema, and re-use that to speed up schema checking on subsequent calls.

However, using the inter-query cache for this has no precedent, and is still something we're trying to figure out. (That was in the context of the graphql discussion.)

We should also check if the underlying jsonschema lib already does some sort of caching on its own.
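
To make the "cache the parsed schema" idea concrete, here is a minimal Go sketch that uses only the standard library. It is not OPA's cache API: SchemaCache, ParsedSchema, Get and the Parses counter are hypothetical names, and plain decoded JSON stands in for whatever compiled form the schema library would produce. The cache is keyed by a hash of the raw schema text, so only calls with the same schema input share an entry.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"sync"
)

// ParsedSchema is a stand-in (assumption) for the compiled schema a real
// JSON Schema library would return. Here it is just the decoded JSON object.
type ParsedSchema map[string]interface{}

// SchemaCache memoizes parsed schemas, keyed by a SHA-256 hash of the raw
// schema text. Hypothetical type, not part of OPA.
type SchemaCache struct {
	mu      sync.Mutex
	entries map[[sha256.Size]byte]ParsedSchema
	Parses  int // how many times a schema was actually parsed
}

// NewSchemaCache returns an empty cache.
func NewSchemaCache() *SchemaCache {
	return &SchemaCache{entries: make(map[[sha256.Size]byte]ParsedSchema)}
}

// Get returns the parsed form of raw, parsing it only on a cache miss.
func (c *SchemaCache) Get(raw []byte) (ParsedSchema, error) {
	key := sha256.Sum256(raw)

	c.mu.Lock()
	defer c.mu.Unlock()

	if parsed, ok := c.entries[key]; ok {
		return parsed, nil // cache hit: no re-parse
	}

	var parsed ParsedSchema
	if err := json.Unmarshal(raw, &parsed); err != nil {
		return nil, fmt.Errorf("parse schema: %w", err)
	}
	c.Parses++
	c.entries[key] = parsed
	return parsed, nil
}

func main() {
	cache := NewSchemaCache()
	schema := []byte(`{"type": "object", "required": ["veggieName"]}`)

	// Three lookups with the same schema text trigger a single parse.
	for i := 0; i < 3; i++ {
		if _, err := cache.Get(schema); err != nil {
			panic(err)
		}
	}
	fmt.Println("parses:", cache.Parses) // prints "parses: 1"
}
```

A cache that lives for one policy evaluation would correspond to the intra-query case, and one that outlives it to the inter-query case. Eviction and size limits are left out of this sketch.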

jkulvich · 2022-11-29T10:08:12Z

OK, a list of objects instead of strings looks like a better decision.

The gojsonschema library proposed above looks like a fine choice to me.
So I put together a small example with the fields that the validation built-in could return.
Here is a usage example for the proposed built-ins:

allow {
  jsonschema.is_valid(data.schema)
  # ...

  errors := jsonschema.validate(data.schema, input.document)
  # [
  #   {
  #     "error": "vegetables.0: veggieName is required",
  #     "type": "required",
  #     "field": "vegetables.0",
  #     "desc": "veggieName is required"
  #   },
  #   ...
  # ]

  count(errors) == 0
  # ...
}

What do you think?

srenatus · 2022-11-29T10:28:16Z

What do you think?

This looks good to me -- but ultimately, you're the ones who'd like to use it. 😎 I assume it would fit to your requirements?

olegroom · 2022-11-29T12:44:41Z

I assume it would fit to your requirements?

Exactly. We use OPA to check different users' JSON files and, in case of errors, return an understandable error message to the user.

srenatus · 2022-11-29T12:56:22Z

@anderseknert any objections from your end? ☝️

anderseknert · 2022-11-29T13:00:52Z

LGTM 👍

rbuckland · 2022-11-29T23:08:23Z

This would be awesome. Spot on

anderseknert · 2023-03-06T10:48:40Z

4 years later 😅 But I'm happy to see this fixed in #5486

Will be included in the next OPA release (v0.50.0) 🎉

rbuckland changed the title ~~Ass support JSON Schema Validation~~ Add support JSON Schema Validation May 23, 2019

tsandall added the design label May 29, 2019

benc-uk mentioned this issue Jun 2, 2020

[Proposal][WIP] Auth and Policy Design dapr/dapr#1617

Closed

15 tasks

OR13 mentioned this issue Jul 9, 2020

Examine other Rules Expression Language before building our own for Submission Requirement decentralized-identity/presentation-exchange#22

Closed

aavarghese mentioned this issue Feb 2, 2021

Adding schema type checking support for OPA (opa eval -s) #3060

Merged

itaysk mentioned this issue Feb 2, 2021

Create rego code-documentation support to support generating documentation about policies #2180

Closed

aavarghese mentioned this issue Feb 3, 2021

Adding rule annotations with schema type checking support for OPA #3123

Merged

stale bot added the inactive label Nov 22, 2021

This was referenced Jan 17, 2022

[Snyk] Security upgrade node-fetch from 2.6.0 to 3.1.1 qsays/opa#7

Open

[Snyk] Security upgrade node-fetch from 2.6.1 to 3.1.1 laozhudetui/opa#6

Open

[Snyk] Security upgrade node-fetch from 2.6.1 to 3.1.1 verygoodsecurity/opa#9

Closed

philipaconrad added this to Backlog in Open Policy Agent via automation Sep 30, 2022

philipaconrad mentioned this issue Sep 30, 2022

annotations: Allow applying schemas to rule/function outputs #5201

Open

anderseknert added the builtins label Oct 2, 2022

stale bot removed the inactive label Oct 2, 2022

anderseknert changed the title ~~Add support JSON Schema Validation~~ Add built-in funtion to support JSON Schema Validation Oct 2, 2022

srenatus mentioned this issue Nov 28, 2022

built-in function to check is schema valid. is_schema_valid(jsonObj, jsonShema) returning true/false #5417

Closed

snyk-bot mentioned this issue Feb 8, 2023

[Snyk] Upgrade node-fetch from 2.6.1 to 2.6.8 pcaruana/opa#2

Open

This was referenced Feb 26, 2023

Add JSON Schema validation built-ins #5702

Closed

#1449: JSON Schema validation and verification #5486

Merged

anderseknert closed this as completed Mar 6, 2023

Open Policy Agent automation moved this from Backlog to Done Mar 6, 2023
