-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add meta schema: a JSON Schema which matches recap schemas #346
Conversation
tests/spec/invalid_field_type.json
Outdated
{ | ||
"name": "string_field", | ||
"doc": "doc", | ||
"type": "invalidtype", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included this in the initial draft PR to show what the failing test UX is like:
E jsonschema.exceptions.ValidationError: {'name': 'string_field', 'doc': 'doc', 'type': 'invalidtype', 'bytes': 255} is not valid under any of the given schemas
E
E Failed validating 'oneOf' in schema['items']:
E {'oneOf': [{'allOf': [{'$ref': '#/$defs/RecapNull'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapBool'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapInt'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapFloat'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapString'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapBytes'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapList'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapMap'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapStruct'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapEnum'},
E {'$ref': '#/$defs/RecapCommonFields'}]},
E {'allOf': [{'$ref': '#/$defs/RecapUnion'},
E {'$ref': '#/$defs/RecapCommonFields'}]}]}
E
E On instance[0]:
E {'bytes': 255,
E 'doc': 'doc',
E 'name': 'string_field',
E 'type': 'invalidtype'}
.venv/lib/python3.10/site-packages/jsonschema/validators.py:1302: ValidationError
==================================================================== short test summary info =====================================================================
FAILED tests/unit/meta_schema/test_json_schema.py::test_valid_schemas[tests/spec/invalid_field_type.json] - jsonschema.exceptions.ValidationError: {'name': 'string_field', 'doc': 'doc', 'type': 'invalidtype', 'bytes': 255} is not valid under any of the given schemas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Can you add functionality to let us ignore expected failed tests (i.e. xfail)? I think it complicates the logic in test_json_schema.py, but it'll be useful as we build out and validate the harness, I think. If it's a pain, though, we can hack it (e.g. .json.failing filename)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the harness to find .unimplemented.invalid.json
files and expect that they will throw as invalid schemas, but mark that test as expected to fail while unimplemented.
@@ -0,0 +1,377 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this JSON Schema is reviewed and approved in this PR, I will remove this from this PR and add it to recap-website/specs/type/0.2.0.json
, then switch from a file reference to a network call.
I'm including it in this PR draft to make it easier to iterate on the spec and tests in parallel.
tests/0.2.0.json
Outdated
"$ref": "#/$defs/RecapNull" | ||
}, | ||
{ | ||
"$ref": "#/$defs/RecapCommonFields" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose to put the common fields alias
and doc
in its own def instead of repeating them in each of the base types (like RecapNull
here).
I also chose to put the allOf
in this RecapType
oneOf
union instead of inside each of the base types. I did this both to mirror the DRY way the existing spec introduces these fields, and to emphasize the unique fields when looking at the base types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up repeating myself so that the error messages are easier to read when validation fails.
tests/0.2.0.json
Outdated
"minimum": 1, | ||
"maximum": 64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the correct ranges for some of these integer values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer bits
field is a 32-bit signed int, so min is 1 and max is 2_147_483_647.
Same for bits
in float.
bytes
for string and bytes is 64-bit signed. The attributes
section in the spec has some info there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated max for these fields
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I have represented Logical types in this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No prob.
I think there are two things to do.
- Allow "logical" to be defined as a string attribute in RecapCommonFields
- Add defs for built-in logical types (the ones defined in the spec)
I think (1) is worth doing in this PR. I think (2) should be put in a follow-on GH issue. I think it requires a bit more thought, since other non-builtin logical types might exist, and those wouldn't get validated (beyond just checking that logical
is set).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done 1 and made a note to make an issue for 2 after this PR is finished
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good. I left a couple comments here and there, but I agree with your decisions.
One thing I didn't catch that I think is a common gotcha: disallowing alias
to be set when type
reference an alias. See the "But alias inheritance is not allowed" section of the spec for an example.
tests/spec/invalid_field_type.json
Outdated
{ | ||
"name": "string_field", | ||
"doc": "doc", | ||
"type": "invalidtype", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Can you add functionality to let us ignore expected failed tests (i.e. xfail)? I think it complicates the logic in test_json_schema.py, but it'll be useful as we build out and validate the harness, I think. If it's a pain, though, we can hack it (e.g. .json.failing filename)
One other thought: I could see an argument for being more explicit about the recap types (removing allOf/oneOf), even if it means repeating yourself. The argument for this would be that it makes the JSON schema way more readable to humans. I'm indifferent on that argument at this point, but I could see the spec being a bit confusing to humans that are trying to read (vs. validators like |
Re: the |
…nd end range ints
…d of factored out for better error messages, fix the definition based on examples
Did this by not repeating the
I ended up doing as I was iterating on this because the validation errors were harder to understand. |
This is awesome 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 This looks great.
Only request:
- Can you move unit/meta_schema/test_json_schema.py to tests/spec
- And then move spec/*.json to a subdir like
schemas
or something? - And then update the .github/workflows/ci.yaml to have an extra job called
spec-tests
that runspdm run pytest tests/spec -vv
I want to keep the spec enforcement separate from the standard unit tests.
…he same as jsonschema; make aliases FQNs
I also realized that these tests should test that recap is validating the schemas the same way as the JSON Schema, so I added the recap validation to each. This made me realize that there's no reason we shouldn't take every positive and negative validation example in the recap test suites and evaluate that against the jsonschema validation as well. Given the current structure of the tests, how feasible does that sound? |
Ahh, yes I've been thinking about this. What I was thinking was to replace the proto/avro/json schema/recap CST tests with this suite. You've already begun to define Recap CSTs (in JSON). We should be able to define matching .proto, .avsc, and .schema.json files that we expect to get when we call types.py :: .from_dict followed by *Converter.from_recap(). We could then go the other way as well, from proto/avsc/json schema back to Recap's CST to verify bidirectional conversion is as expected. This doesn't let us verify every detail we might want to verify at the AST level, so I think we'll still want to have some unit tests for that part. It should get us VERY comfortable with Recap's behavior, though. WDYT? Not for this PR, but more GH issues to open if you're into it. |
💥 Merged! Thanks!! Follow-on things I found:
|
Another thought: this test suite could be extended to read specs defined in YAML or TOML. There ¿might? be some situations where those languages handle values differently like |
Yup, totally. I think it's worth defining in YAML, TOML, and JSON, and validating the CST using the python-jsonschema lib. One gotcha with TOML, in particular, is that it doesn't have a null AFAIK. Makes null defaults tricky. At least we added |
What
This adds a JSON Schema describing valid recap schemas.
Why
There is a robust ecosystem around JSON Schema that provides tools like validation, example generation, code generation, etc out of the schema definitions. Defining a JSON Schema definition for Recap will allow Recap and its users to more easily hook into that ecosystem.
Closes #342
How
I manually created the JSON Schema file by reading closely the Recap Spec.
Known limitations
As mentioned in the issue, I don't know of a way to require that the
alias
strings are referencing defined types elsewhere in the document, so some invalid documents will pass. No documents that are valid should fail.Validation
In order to help iterate on the JSON Schema definition (which I expect isn't perfect), I test up some unit tests. I define a few valid recap schema documents and use jsonschema to test the documents against the schema definition.
Potential future work