Problem Description
As a user, it would be useful to be able to validate whether my metadata is formatted correctly.
Expected behavior
Add validate method
Validation consists of validating three separate parts of the metadata. The full details are in the Additional context section.
Validating the columns
Validating the keys
Validating the constraints
If the metadata is not valid, the method raises an InvalidMetadataError with a description of all the errors found.
>>> metadata.validate()
InvalidMetadataError: The metadata is not valid
Error: Invalid values ("pii") for datetime column "start_date".
Error: Invalid regex format string "[A-{6}" for text column "user_id"
Error: Unknown key value 'uuid'. Keys should be columns that exist in the table.
Error: A Unique constraint is being applied to column "user_id". This column is already a key for that table.
Error: Invalid increment value (0.5) in a FixedIncrements constraint. Increments must be positive integers.
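A minimal sketch of how the three validation passes could be aggregated into one exception, so that every error is reported at once. The function signature and the idea of passing in pre-collected lists of error strings are assumptions made for illustration, not the SDV implementation.

```python
class InvalidMetadataError(Exception):
    """Raised when one or more metadata validation checks fail."""


def validate(column_errors, key_errors, constraint_errors):
    # Hypothetical helper: each argument is a list of error strings
    # produced by the column, key, and constraint checks respectively.
    errors = [*column_errors, *key_errors, *constraint_errors]
    if errors:
        details = "\n".join(f"Error: {message}" for message in errors)
        raise InvalidMetadataError(f"The metadata is not valid\n{details}")
```

Collecting the messages first and raising once matches the expected output above, where a single InvalidMetadataError lists all the problems found.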
Additional context
Column validation: Each sdtype has different validation rules, listed below.
numerical
Required attributes are: representation
Throw an error if any attributes besides representation are present Error: Invalid values ("pii") for numerical column "age".
datetime
Required attributes are: datetime_format
"datetime_format" must be a valid, parsable format string. Error: Invalid datetime format string "%O" for datetime column "start_date"
There should be no other attributes present. Error: Invalid values ("pii") for datetime column "start_date".
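The datetime rules could be sketched roughly as follows. The function name and the `attributes` dict are hypothetical, and the set of accepted directives is an assumption taken from the directives documented for Python's `strftime`/`strptime`. The sketch covers only the two checks that have an error message above.

```python
import re

# Directive letters documented for Python's strftime/strptime.
# Treating this as the accepted set is an assumption of this sketch.
KNOWN_DIRECTIVES = set("aAwdbBmyYHIpMSfzZjUWcxXGuV%")


def validate_datetime_column(column_name, attributes):
    """Return a list of error strings for a datetime column (hypothetical helper)."""
    errors = []
    fmt = attributes.get("datetime_format")
    if fmt is not None:
        # Every "%" must be followed by a known directive ("%%" is a literal percent).
        directives = re.findall(r"%(.?)", fmt, flags=re.DOTALL) if isinstance(fmt, str) else [""]
        if any(directive not in KNOWN_DIRECTIVES for directive in directives):
            errors.append(
                f'Invalid datetime format string "{fmt}" for datetime column "{column_name}"'
            )
    extra = sorted(set(attributes) - {"datetime_format"})
    if extra:
        values = ", ".join(f'"{name}"' for name in extra)
        errors.append(f'Invalid values ({values}) for datetime column "{column_name}".')
    return errors
```

Scanning the directives explicitly avoids relying on `strftime`, whose handling of unknown directives varies by platform.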
categorical
Required attributes are either: order or order_by
"order" and "order_by" cannot both be present. You can only have 0 or 1 of these attributes Error: Categorical column "education" has both an "order" and "order_by" attribute. Only 1 is allowed.
If present, "order_by" must be either set to "numerical_value" or "alphabetical" Error: Unknown ordering method "testing" provided for categorical column "education". Ordering method must be "numerical_value" or "alphabetical"
If present, "order" must be a list with 1 or more elements Error: Invalid order value provided for categorical column "education". The "order" must be a list with 1 or more elements.
No other attributes can be present Error: Invalid values ("pii") for categorical column "education".
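As an illustration, the categorical rules above might translate into a check like the one below. The function name and the `attributes` dict are assumptions; the messages are copied from the rules.

```python
def validate_categorical_column(column_name, attributes):
    """Return a list of error strings for a categorical column (hypothetical helper)."""
    errors = []
    if "order" in attributes and "order_by" in attributes:
        errors.append(
            f'Categorical column "{column_name}" has both an "order" and "order_by" '
            "attribute. Only 1 is allowed."
        )
    if "order_by" in attributes:
        order_by = attributes["order_by"]
        if order_by not in ("numerical_value", "alphabetical"):
            errors.append(
                f'Unknown ordering method "{order_by}" provided for categorical column '
                f'"{column_name}". Ordering method must be "numerical_value" or "alphabetical"'
            )
    if "order" in attributes:
        order = attributes["order"]
        if not isinstance(order, list) or len(order) == 0:
            errors.append(
                f'Invalid order value provided for categorical column "{column_name}". '
                'The "order" must be a list with 1 or more elements.'
            )
    extra = sorted(set(attributes) - {"order", "order_by"})
    if extra:
        values = ", ".join(f'"{name}"' for name in extra)
        errors.append(f'Invalid values ({values}) for categorical column "{column_name}".')
    return errors
```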
boolean
No required attributes
Throw an error if any attributes are present Error: Invalid value ("pii") for boolean column "is_subscribed".
text
Required attributes are: regex_format
If "regex_format" is present, it must be a valid regex string that can be parsed. Error: Invalid regex format string "[A-{6}" for text column "user_id"
Throw an error if any other attributes are present Error: Invalid values ("pii") for text column "user_id".
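One way to test whether a regex string can be parsed is to try compiling it with Python's `re` module, sketched below. The function name is hypothetical, and using `re.compile` as the parser is an assumption about how the check would be done.

```python
import re


def validate_regex_format(column_name, regex_format):
    """Return a list with one error string if the regex cannot be compiled."""
    try:
        re.compile(regex_format)
    except re.error:
        return [
            f'Invalid regex format string "{regex_format}" for text column "{column_name}"'
        ]
    return []
```

For example, "[A-{6}" fails to compile because the character set is never closed, while "[A-Z]{6}" compiles cleanly.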
Real World (Semantic) Types (e.g. phone_number)
No required parameters
pii is an optional parameter
If "pii" exists but it is not True or False, throw an error Error: Invalid pii value provided for phone_number column "user_cell". The "pii" value must be set to True or False.
Throw an error if any other attributes are present Error: Invalid values ("datetime_format") for phone_number column "user_cell".
Raise a warning if the sdtype isn't fully supported. Warning: sdtype 'location' is not fully supported. The SDV will model this as a categorical variable. Warning: sdtype 'location' is not fully supported. The SDV will anonymize this column using random characters.
Key validation
"primary_key" must be a string or list of strings
"sequence_key" must be a string or list of strings
"alternate_keys" must be a list of strings or a nested list of strings
"sequence_index" must be a string
The strings must correspond to the column names as specified in the other part of the Metadata Error: Unknown key value 'uuid'. Keys should be columns that exist in the table.
"sequence_index" cannot be the same as "sequence_key" Error: sequence_index and sequence_key have the same value ('patient_id'). These columns must be different.
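Constraint validation
Use each constraint's _validate_inputs method and surface those errors (see "Add _validate_inputs class method to each constraint", #878).
Unique
Each of the column names in "column_names" must be a column that is present in the "columns" specification Error: A Unique constraint is being applied to invalid column names ("age", "weight"). The columns must exist in the table.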
"column_names" must include at least 1 column that is NOT a primary key or alternate key. Primary keys and alternate keys will already be guaranteed to be unique, so there's no need to add it in as a constraint. Error: A Unique constraint is being applied to column "age". This column is already a key for that table.
FixedCombinations
Each of the column names in "column_names" must be a column that is present in the "columns" specification Error: A FixedCombinations constraint is being applied to invalid column names ("C", "D"). The columns must exist in the table.
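The column-existence rule recurs across most constraints (Unique, FixedCombinations, Range, OneHotEncoding, and so on), so it could be factored into one shared check. The helper below is a hypothetical sketch; its name and parameters are not part of the SDV API.

```python
def validate_constraint_columns(constraint_name, column_names, table_columns):
    """Return an error list naming constraint columns that are missing from the table."""
    missing = [name for name in column_names if name not in table_columns]
    if not missing:
        return []
    quoted = ", ".join(f'"{name}"' for name in missing)
    return [
        f"A {constraint_name} constraint is being applied to invalid column names "
        f"({quoted}). The columns must exist in the table."
    ]
```

Only the missing names are reported, which keeps the message actionable when some of the listed columns are valid.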
Inequality
The string in "high_column_name" and "low_column_name" must be a column that is present in the "columns" specification Error: An Inequality constraint is being applied to invalid column names ("C", "D"). The columns must exist in the table.
Both high and low columns must be either type "numerical" or type "datetime" Error: An Inequality constraint is being applied to mismatched sdtypes ("C", "D"). Both columns must be either numerical or datetime.
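A rough sketch of the two Inequality checks follows. It assumes `columns` maps each column name to a dict with an "sdtype" entry, and it reads the "mismatched sdtypes" rule as requiring both columns to share the same type (both numerical or both datetime); both points are assumptions.

```python
def validate_inequality(low_column_name, high_column_name, columns):
    """Return a list of error strings for an Inequality constraint (hypothetical helper)."""
    names = [low_column_name, high_column_name]
    missing = [name for name in names if name not in columns]
    if missing:
        quoted = ", ".join(f'"{name}"' for name in missing)
        return [
            f"An Inequality constraint is being applied to invalid column names "
            f"({quoted}). The columns must exist in the table."
        ]
    # Both columns must be numerical, or both must be datetime.
    sdtypes = {columns[name]["sdtype"] for name in names}
    if sdtypes not in ({"numerical"}, {"datetime"}):
        quoted = ", ".join(f'"{name}"' for name in names)
        return [
            f"An Inequality constraint is being applied to mismatched sdtypes "
            f"({quoted}). Both columns must be either numerical or datetime."
        ]
    return []
```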
ScalarInequality
"column_name" must refer to a column in the table Error: A ScalarInequality constraint is being applied to invalid column names ("C"). The columns must exist in the table.
The "value" must make sense based on the column type
If the column is "numerical", then "value" must be an int or float
If the column is "datetime", then "value" must be a datetime string of the right format
No other types are compatible Error: A ScalarInequality constraint is being applied to mismatched sdtypes. Numerical columns must be compared to integer or float values. Datetimes column must be compared to datetime strings.
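The value check for ScalarInequality could look something like this. The function name and the `datetime_format` parameter are hypothetical, "the right format" is taken to mean the column's own datetime format, and rejecting booleans as numerical values is an extra assumption.

```python
from datetime import datetime

MISMATCH_ERROR = (
    "A ScalarInequality constraint is being applied to mismatched sdtypes. "
    "Numerical columns must be compared to integer or float values. "
    "Datetimes column must be compared to datetime strings."
)


def validate_scalar_value(sdtype, value, datetime_format=None):
    """Return a list with the mismatch error if `value` does not fit the column sdtype."""
    if sdtype == "numerical":
        # Assumption: booleans are rejected even though bool subclasses int.
        valid = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif sdtype == "datetime":
        try:
            datetime.strptime(value, datetime_format)
            valid = True
        except (TypeError, ValueError):
            valid = False
    else:
        # No other column types are compatible with this constraint.
        valid = False
    return [] if valid else [MISMATCH_ERROR]
```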
Range
The strings in each of the column names must be a column that is present in the "columns" specification Error: A Range constraint is being applied to invalid column names ("C", "D"). The columns must exist in the table.
All columns must be either type "numerical" or type "datetime" Error: A Range constraint is being applied to mismatched sdtypes ("C", "D", "E"). All columns must be either numerical or datetime.
ScalarRange
"column_name" must refer to a column in the table Error: A ScalarRange constraint is being applied to invalid column names ("C"). The columns must exist in the table.
The high and low values must make sense based on the column type
If the column is "numerical", then the values must be floats/ints
If the column is "datetime", then the values must be a datetime string of the right format
No other types are compatible Error: A ScalarRange constraint is being applied to mismatched sdtypes. Numerical columns must be compared to integer or float values. Datetimes column must be compared to datetime strings.
Positive
"column_name" must refer to a column in the table Error: A Positive constraint is being applied to invalid column names ("C"). The columns must exist in the table.
The column must be of type "numerical" Error: A Positive constraint is being applied to an invalid column ("C"). This constraint is only defined for numerical columns.
Negative
"column_name" must refer to a column in the table Error: A Negative constraint is being applied to invalid column names ("C"). The columns must exist in the table.
The column must be of type "numerical" Error: A Negative constraint is being applied to an invalid column ("C"). This constraint is only defined for numerical columns.
FixedIncrements
Column name should refer to a column defined in the metadata Error: A FixedIncrements constraint is being applied to invalid column names ("C"). The columns must exist in the table.
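The example output under Expected behavior also shows an increment check for FixedIncrements ("Increments must be positive integers"). A small sketch of that check, with a hypothetical function name:

```python
def validate_increment(increment_value):
    """Return a list with one error string unless the increment is a positive integer."""
    is_positive_int = (
        isinstance(increment_value, int)
        and not isinstance(increment_value, bool)  # assumption: True/False are not increments
        and increment_value > 0
    )
    if is_positive_int:
        return []
    return [
        f"Invalid increment value ({increment_value}) in a FixedIncrements constraint. "
        "Increments must be positive integers."
    ]
```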
OneHotEncoding
Column names must be valid columns (present in the "columns" part of the metadata) Error: A OneHotEncoding constraint is being applied to invalid column names ("C", "D", "E"). The columns must exist in the table.
CustomConstraint
Column names must be valid columns (present in the "columns" part of the metadata) Error: A <module>.<name> constraint is being applied to invalid column names ("C", "D"). The columns must exist in the table.
Misc
If the constraint isn't found, throw an error: Error: Invalid constraints ('Other').