Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-42446: Add RSP schema validation #31

Merged
merged 13 commits into from Feb 15, 2024
Merged

DM-42446: Add RSP schema validation #31

merged 13 commits into from Feb 15, 2024

Conversation

JeremyMcCormick
Copy link
Collaborator

@JeremyMcCormick JeremyMcCormick commented Jan 12, 2024

RSP schema validation is implemented using a set of extension classes in a new validation module. The validation rules were based on those in the sdm_schemas repository under tap-schema/validator.

These new rules check that:

  • All columns have a non-empty description
  • All tables have a non-empty description, a TAP_SCHEMA table index, and at least one column flagged as TAP_SCHEMA principal
  • The schema has a non-empty description and a set of unique (non-duplicated) TAP_SCHEMA table indices

Tests for the new module and its functionality were added under tests/test_validation.py.

A schema type can be selected from the command-line now using a syntax such as:

felis validate -s RSP ../sdm_schemas/yml/dp02_dc2.yaml

I tested the new validation on the above schema and found some expected errors where at least one TAP_SCHEMA principal column needed to be defined and was missing.

I have also added a flag that will make description required for all objects in the schema. It can be enabled as follows:

felis validate --require-description ../sdm_schemas/yml/dp02_dc2.yaml

This will require descriptions on every object, including the schema itself, tables, columns, and constraints. (RSP validation will only add the extra description requirement for columns.)

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (5837a4e) 92.62% compared to head (60060cf) 93.35%.

Files Patch % Lines
python/felis/datamodel.py 95.00% 1 Missing and 1 partial ⚠️
python/felis/validation.py 97.72% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #31      +/-   ##
==========================================
+ Coverage   92.62%   93.35%   +0.73%     
==========================================
  Files          18       20       +2     
  Lines        1952     2092     +140     
  Branches      381      420      +39     
==========================================
+ Hits         1808     1953     +145     
+ Misses         87       80       -7     
- Partials       57       59       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JeremyMcCormick JeremyMcCormick marked this pull request as draft January 12, 2024 23:29
@JeremyMcCormick JeremyMcCormick marked this pull request as ready for review January 13, 2024 00:31
@JeremyMcCormick JeremyMcCormick force-pushed the tickets/DM-42446 branch 2 times, most recently from cf991f3 to e3f16f3 Compare January 13, 2024 00:33
@JeremyMcCormick JeremyMcCormick force-pushed the tickets/DM-42446 branch 2 times, most recently from 1234303 to bd48618 Compare January 29, 2024 22:34
@JeremyMcCormick JeremyMcCormick force-pushed the tickets/DM-42446 branch 6 times, most recently from 823f4bc to 5f2f878 Compare January 29, 2024 23:49
@JeremyMcCormick JeremyMcCormick removed the request for review from timj January 30, 2024 00:00
@JeremyMcCormick JeremyMcCormick force-pushed the tickets/DM-42446 branch 9 times, most recently from ad492fc to 4249f76 Compare February 2, 2024 21:26
@JeremyMcCormick JeremyMcCormick force-pushed the tickets/DM-42446 branch 3 times, most recently from 9e31cdd to e475f12 Compare February 12, 2024 19:54
This adds a new validation module which initially contains a schema
extension for RSP data. The rules were modeled after the script in
the sdm_schemas repository under tap-schema/validator. The base
Pydantic data model classes are sub-classed to implement additional
requirements.
The compatible and read_compatible fields were updated to be empty
lists by default rather than None.
The first argument in the validators was changed to type Any for all
relevant functions. This is the recommended pattern from the Pydantic
documentation.
The List type is considered immutable, which means polymorphism does
not work for typing. Instead use the Sequence type which is not
considered immutable and accepts child classes in the declaration.
By default, the description field is optional. A flag was added to
the BaseObject.Config class to allow a user to make the field required.
A command-line option was added supporting this feature.
This allows objects within the schema to be retrieved by their ID using
subscripting. Whether an ID is present in the schema can be checked
using "id in schema" syntax.
This type requires that a description be non-empty after it is stripped
of whitespace. Several tests were added to confirm that this works
correctly for RSP columns and tables.
The Pydantic documentation indicates that using the nested Config class
is deprecated in favor of using ConfictDict. The flag for requiring a
description is moved into its own renamed nested class called
ValidationConfig.
Always strip whitespace from all strings and use Field rather than
constr for defining description fields.
Copy link

@gpdf gpdf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for responses to comments!

@JeremyMcCormick JeremyMcCormick merged commit 12de453 into main Feb 15, 2024
10 checks passed
@JeremyMcCormick JeremyMcCormick deleted the tickets/DM-42446 branch February 15, 2024 00:03
Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments, but I see that you merged already, so it is all fine.

python/felis/cli.py Show resolved Hide resolved
python/felis/cli.py Show resolved Hide resolved
python/felis/datamodel.py Show resolved Hide resolved
python/felis/datamodel.py Show resolved Hide resolved
python/felis/datamodel.py Show resolved Hide resolved
python/felis/datamodel.py Show resolved Hide resolved
python/felis/validation.py Show resolved Hide resolved
python/felis/validation.py Show resolved Hide resolved
python/felis/validation.py Show resolved Hide resolved
tests/test_cli.py Show resolved Hide resolved
JeremyMcCormick added a commit that referenced this pull request Feb 15, 2024
This make changes suggested in review comments here:

#31 (review)

The PR was closed and merged before the review occurred, so this is
being done on a separate branch.
JeremyMcCormick added a commit that referenced this pull request Feb 15, 2024
This make changes suggested in review comments here:

#31 (review)

The PR was closed and merged before the review occurred, so this is
being done on a separate branch.
JeremyMcCormick added a commit that referenced this pull request Feb 15, 2024
This make changes suggested in review comments here:

#31 (review)

The PR was closed and merged before the review occurred, so this is
being done on a separate branch.
@JeremyMcCormick
Copy link
Collaborator Author

Few minor comments, but I see that you merged already, so it is all fine.

These should all be resolved on this PR.

JeremyMcCormick added a commit that referenced this pull request Feb 15, 2024
This make changes suggested in review comments here:

#31 (review)

The PR was closed and merged before the review occurred, so this is
being done on a separate branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants