DM-42446: Add RSP schema validation #31
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #31 +/- ##
==========================================
+ Coverage 92.62% 93.35% +0.73%
==========================================
Files 18 20 +2
Lines 1952 2092 +140
Branches 381 420 +39
==========================================
+ Hits 1808 1953 +145
+ Misses 87 80 -7
- Partials 57 59 +2
This adds a new validation module which initially contains a schema extension for RSP data. The rules were modeled after the script in the sdm_schemas repository under tap-schema/validator. The base Pydantic data model classes are sub-classed to implement additional requirements.
The compatible and read_compatible fields were updated to be empty lists by default rather than None.
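As a rough illustration of why an empty-list default is convenient, the sketch below uses a standard-library dataclass rather than the project's Pydantic models. The `Version` class and its `current` field are assumptions made for the example; only the `compatible` and `read_compatible` names come from the commit message. In Pydantic, the analogous default is `Field(default_factory=list)`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    """Hypothetical stand-in for a model carrying the two list fields."""

    current: str
    # default_factory gives each instance its own empty list instead of None,
    # so callers can iterate or append without a None check.
    compatible: List[str] = field(default_factory=list)
    read_compatible: List[str] = field(default_factory=list)


first = Version("1.0.0")
second = Version("2.0.0")
first.compatible.append("0.9.0")  # does not affect `second`
```

With a `None` default, every consumer would need an `if version.compatible is not None` guard before looping.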
The first argument in the validators was changed to type Any for all relevant functions. This is the recommended pattern from the Pydantic documentation.
The List type is invariant, which means a list of a subclass is not accepted where a list of the base class is declared. Instead use the Sequence type, which is covariant and accepts child classes in the declaration.
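The typing issue can be sketched with plain classes. `List[T]` is invariant because lists are mutable, while the read-only `Sequence[T]` is covariant. The `Column` and `RspColumn` classes and both functions below are hypothetical names for illustration, not the project's actual models.

```python
from typing import List, Sequence


class Column:
    def __init__(self, name: str) -> None:
        self.name = name


class RspColumn(Column):
    """Hypothetical subclass standing in for an extended column type."""


def names_from_list(columns: List[Column]) -> List[str]:
    # A static checker such as mypy rejects a List[RspColumn] argument here,
    # because List is invariant in its element type.
    return [column.name for column in columns]


def names_from_sequence(columns: Sequence[Column]) -> List[str]:
    # Sequence is covariant, so a List[RspColumn] argument type-checks.
    return [column.name for column in columns]


rsp_columns: List[RspColumn] = [RspColumn("ra"), RspColumn("dec")]
names = names_from_sequence(rsp_columns)
```

Both functions behave identically at runtime; the difference only shows up under static type checking.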
By default, the description field is optional. A flag was added to the BaseObject.Config class to allow a user to make the field required. A command-line option was added supporting this feature.
This allows objects within the schema to be retrieved by their ID using subscripting. Whether an ID is present in the schema can be checked using "id in schema" syntax.
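A minimal sketch of how ID-based subscripting and membership tests could work, assuming the schema keeps an internal dictionary from ID to object. The class layout, the `_by_id` attribute, and the example IDs are assumptions for illustration, not the code from this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Column:
    id: str


@dataclass
class Table:
    id: str
    columns: List[Column] = field(default_factory=list)


class Schema:
    def __init__(self, id: str, tables: List[Table]) -> None:
        self.id = id
        self.tables = tables
        # Hypothetical index built once so lookups do not walk the tree.
        self._by_id: Dict[str, Any] = {id: self}
        for table in tables:
            self._by_id[table.id] = table
            for column in table.columns:
                self._by_id[column.id] = column

    def __getitem__(self, id: str) -> Any:
        # Enables schema["#some.id"].
        if id not in self._by_id:
            raise KeyError(f"Object with ID '{id}' not found in schema")
        return self._by_id[id]

    def __contains__(self, id: str) -> bool:
        # Enables "#some.id" in schema.
        return id in self._by_id


schema = Schema("#sch", [Table("#sch.tbl", [Column("#sch.tbl.col")])])
column = schema["#sch.tbl.col"]
```

Defining `__contains__` explicitly keeps `id in schema` from falling back to iteration through `__getitem__` with integer indexes.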
Requested by @gpdf in: #31 (comment)
This type requires that a description be non-empty after it is stripped of whitespace. Several tests were added to confirm that this works correctly for RSP columns and tables.
The Pydantic documentation indicates that using the nested Config class is deprecated in favor of using ConfigDict. The flag for requiring a description is moved into its own renamed nested class called ValidationConfig.
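The nested-class flag pattern can be sketched without Pydantic. The flag name `require_description` and the constructor logic are assumptions for illustration; only the `BaseObject` and `ValidationConfig` names come from the commit messages.

```python
from typing import Optional


class BaseObject:
    class ValidationConfig:
        # Hypothetical flag name; off by default so descriptions stay optional.
        require_description = False

    def __init__(self, name: str, description: Optional[str] = None) -> None:
        # Strip whitespace first so a blank string counts as missing.
        if description is not None:
            description = description.strip()
        if not description:
            if self.ValidationConfig.require_description:
                raise ValueError(f"Description is required for '{name}'")
            description = None
        self.name = name
        self.description = description


table = BaseObject("my_table")  # accepted while the flag is off

BaseObject.ValidationConfig.require_description = True
try:
    BaseObject("my_table", "   ")  # whitespace only, now rejected
    error = None
except ValueError as exc:
    error = str(exc)
```

Keeping the flag on a separately named nested class avoids colliding with the `Config` name that Pydantic itself reserves.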
Always strip whitespace from all strings and use Field rather than constr for defining description fields.
Looks good, thanks for responses to comments!
Few minor comments, but I see that you merged already, so it is all fine.
This makes the changes suggested in the review comments here: #31 (review). The PR was closed and merged before the review occurred, so this is being done on a separate branch.
These should all be resolved on this PR.
RSP schema validation is implemented using a set of extension classes in a new validation module. The validation rules were based on those in the sdm_schemas repository under tap-schema/validator. These new rules check that:
Tests for the new module and its functionality were added under tests/test_validation.py. A schema type can now be selected from the command line using a syntax such as:
I tested the new validation on the above schema and found some expected errors where at least one TAP_SCHEMA principal column needed to be defined and was missing.
I have also added a flag that will make description required for all objects in the schema. It can be enabled as follows:
This will require descriptions on every object, including the schema itself, tables, columns, and constraints. (RSP validation will only add the extra description requirement for columns.)