Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#1449: JSON Schema validation and verification #5486

Merged
merged 7 commits into from
Mar 2, 2023

Conversation

jkulvich
Copy link
Contributor

@jkulvich jkulvich commented Dec 15, 2022

Added new built-in functions:

# Accepts 1 argument which can be string or object and checks if it is valid JSON schema.
jsonschema.is_valid(schema): bool

# Accepts 1 argument which can be string or object and checks if it is valid JSON schema.
# Returns string in case of error or empty string otherwise.
jsonschema.validate(schema): string

# Accepts 2 arguments both can be string or object and verifies if the document matches the JSON schema.
# Returns true if the document matches the schema.
json.is_match_schema(document, schema): bool

# Accepts 2 arguments both can be string or object and verifies if the document matches the JSON schema.
# Returns an array of errors or empty array.
# Returns an empty array if no errors are found.
# In case of internal error returns empty array.
json.match_schema(document, schema): []struct{ error: string, type: string, field: string, desc: string }

schema and document can be JSON string or object.

Signed-off-by: Yuri Kulagin jkulvichi@gmail.com

@netlify
Copy link

netlify bot commented Dec 15, 2022

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 1330cdf
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/6400791c359a4c000826ae42
😎 Deploy Preview https://deploy-preview-5486--openpolicyagent.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing. Some inline comments, and some tests would be required, too. See test/cases/testdata

topdown/jsonschema.go Outdated Show resolved Hide resolved
topdown/jsonschema.go Outdated Show resolved Hide resolved
Signed-off-by: Yuri Kulagin <jkulvichi@gmail.com>
@jkulvich
Copy link
Contributor Author

@srenatus, I have covered the functions with tests, and added a bit more comments. Can I have your review? :)

Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I wasn't explicit enough -- it would be preferable to have the tests in test/cases/testdata, which is a bunch of yaml files. They are also used by other projects to ensure conformance when the builtins are implemented in other ways -- e.g. in Wasm, or in IR-based projects.

That said, I'm not completely certain about the error-returning there; I don't believe we've done that in any other place. @philipaconrad, what do you think? I suppose having some alignment between the graphql schema verification builtins we already have, and the new ones here, would make sense.

topdown/jsonschema.go Outdated Show resolved Hide resolved
topdown/jsonschema.go Outdated Show resolved Hide resolved
topdown/jsonschema.go Outdated Show resolved Hide resolved
topdown/jsonschema.go Outdated Show resolved Hide resolved
topdown/jsonschema.go Outdated Show resolved Hide resolved
topdown/jsonschema_test.go Outdated Show resolved Hide resolved
ast/builtins.go Outdated Show resolved Hide resolved
),
}

// JSONMatchSchema returns empty array if the document matches the JSON schema,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both json.is_match_schema and json.match_schema then? It seems like this rego snippet would do to implement the former:

is_match_schema(doc, schema) if json.match_schema(doc, schema) == []

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Do we also need jsonschema.is_valid?
This can be implemented in the same way using jsonschema.verify.

@philipaconrad
Copy link
Contributor

philipaconrad commented Jan 17, 2023

That said, I'm not completely certain about the error-returning there; I don't believe we've done that in any other place.

@srenatus You are correct-- no other builtins return error messages as part of their normal output. We've had requests in the past for something like this, for example as part of the GraphQL builtins.

I believe we've avoided implementing any such feature so far, because doing so could expose a lot of unnecessary implementation details, and returning error messages does not mesh with the existing style of error-handling/validation logic for policies. 🙁

@ashutosh-narkar
Copy link
Member

@jkulvich these builtins would be a great addition! When you get a chance please do address the comments in this review and also about adding tests in test/cases/testdata as suggested here. Please let us know if can help. Thanks for contribution!

Signed-off-by: Yuri Kulagin <jkulvichi@gmail.com>
@jkulvich
Copy link
Contributor Author

jkulvich commented Jan 27, 2023

I believe we've avoided implementing any such feature so far, because doing so could expose a lot of unnecessary implementation details, and returning error messages does not mesh with the existing style of error-handling/validation logic for policies. 🙁

@philipaconrad, indeed, I have not seen previously implemented functions return error information.

However, in our case, it is required to provide the user not only a binary decision of the rule, but also technical info about why the error occurred.

For this reason, simplified is_ functions are proposed. However, they can be replaced with snippets, as @srenatus mentioned

Signed-off-by: Yuri Kulagin <jkulvichi@gmail.com>
asJSON, err = ast.JSON(value)
if err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably use gojsonschema.NewGoLoader

// Returns an array of errors or empty array.
// Returns an empty array if no errors are found.
// In case of internal error returns empty array.
func builtinJSONMatchSchema(_ BuiltinContext, operands []*ast.Term, iter func(*ast.Term) error) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to specify in the docs that the first argument is the document to match with the schema and second is the schema itself.

Comment on lines 103 to 113
// Make new schema instance to provide validations.
schema, err := gojsonschema.NewSchema(schemaLoader)
if err != nil {
return err
}

// Use the schema instance to validate the document.
result, err := schema.Validate(documentLoader)
if err != nil {
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also do if you like:

gojsonschema.Validate(schemaLoader, documentLoader)

@ashutosh-narkar
Copy link
Member

However, in our case, it is required to provide the user not only a binary decision of the rule, but also technical info about why the error occurred.

Directly returning an error string from the builtin does not seem the right approach. How about returning an object from the builtin and one of the fields could be an error field whose value will be an object.

@ashutosh-narkar
Copy link
Member

@jkulvich if you like we are happy to carry forward your great work and add tests etc. These are useful builtins and if you need our help to move this along we're happy to do it. Let us know what you think. Thanks!

Copy link
Contributor

@johanfylling johanfylling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken the liberty to update this PR according to review comments.

@@ -8726,6 +8753,26 @@
},
"wasm": true
},
"jsonschema.verify": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this perhaps be json.verify_schema?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes json.match_schema and json.verify_schema would be better.

@mjungsbluth
Copy link
Contributor

To link this to the actual issue, this PR implements issue #1449 .

* Merging `jsonschema.is_valid` into `jsonschema.verify`, and `json.is_match_schema` into `json.match_schema`
* Adding yaml test cases in testdata

Signed-off-by: Johan Fylling <johan.dev@fylling.se>
Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. Few comments.

ast/builtins.go Outdated
),
),
}, nil)).
Description("`output` is of the form `[match, errors]`. If the document is valid given the schema, then `match` is `true`, and `errors` is an empty array. Otherwise, `match` is `true` and `errors` is an array of objects describing the error(s)."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "Otherwise, match is false" ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, of course 🙄 .

@@ -8726,6 +8753,26 @@
},
"wasm": true
},
"jsonschema.verify": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes json.match_schema and json.verify_schema would be better.


// builtinJSONSchemaValidate accepts 1 argument which can be string or object and checks if it is valid JSON schema.
// Returns array [false, <string>] with error string at index 1, or [true, ""] with empty string at index 1 otherwise.
func builtinJSONSchemaValidate(_ BuiltinContext, operands []*ast.Term, iter func(*ast.Term) error) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: lets update the function name to builtinJSONSchemaVerify

[...]*ast.Term{ast.StringTerm("error"), ast.StringTerm(re.String())},
[...]*ast.Term{ast.StringTerm("type"), ast.StringTerm(re.Type())},
[...]*ast.Term{ast.StringTerm("field"), ast.StringTerm(re.Field())},
[...]*ast.Term{ast.StringTerm("desc"), ast.StringTerm(re.Description())},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both re.String() and re.Description()? I mean do they contain different enough that can help in debug etc. or is one them enough ? https://sourcegraph.com/github.com/open-policy-agent/opa@7b8fa09ff8fb3d447abeefa5d257fbc00e4a2ee6/-/blob/internal/gojsonschema/result.go?L159

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the string representation contains field and description, which we include in the error object; but it also includes context and value. Unless the formatting of the error string makes it much more readable, perhaps we could skip including it, and instead include context and value(?).
Although, I don't see harm in including it, as the formatting of the string might make the data more human readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good 👍


// Check that schema is correct and parses without errors.
if _, err = gojsonschema.NewSchema(loader); err != nil {
return iter(newResultTerm(false, ast.StringTerm("jsonschema: "+err.Error())))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning a string here, what if we return an object with an error field describing the error similar to what http.send does?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, we're returning an array with a boolean and an error string, similar (but kinda reversed) to what graphql.parse_and_verify does. E.g.

deny[err] {
   [false, err] = jsonschema.verify(input.schema)
}

We could return an object instead, if that is preferable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine. My reasoning behind using an object was it gives us flexibility to add more fields if needed.

Signed-off-by: Johan Fylling <johan.dev@fylling.se>

// Check that schema is correct and parses without errors.
if _, err = gojsonschema.NewSchema(loader); err != nil {
return iter(newResultTerm(false, ast.StringTerm("jsonschema: "+err.Error())))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine. My reasoning behind using an object was it gives us flexibility to add more fields if needed.

@johanfylling johanfylling merged commit 334f27e into open-policy-agent:main Mar 2, 2023
@rbuckland
Copy link

Yes well done. This will be super helpful.

@hajdukda
Copy link

Is there any official documentation in OPA about using this feature ?

@jkulvich
Copy link
Contributor Author

jkulvich commented Aug 19, 2023

Is there any official documentation in OPA about using this feature ?

@hajdukda, please, follow next OPA docs sections json.match_schema, and json.verify_schema

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants