Add validate method to MultiTableMetadata #888

Closed
amontanez24 opened this issue Jul 8, 2022 · 0 comments
Labels
feature request Request for a new feature

Problem Description

As a user, I want to make sure the multi-table metadata I've specified is formatted correctly.

Expected behavior

  • Add a validate method
  • The validation should have 2 steps
    1. First, run the SingleTableMetadata.validate method for each table and group the error messages.
    2. Validate the relationships (described in detail in the Additional context section)
  • If any errors are found, it should raise an InvalidMetadataError with all of the errors grouped by table:
>>> metadata.validate()
InvalidMetadataError: The metadata is not valid

Table: 'users'
Error: Invalid values ("pii") for datetime column "start_date".
Error: A Unique constraint is being applied to column "age". This column is already a key for that table.

Table: 'transactions'
Error: Invalid regex format string "[A-{6}" for text column "transaction_id"
Error: Unknown key value 'ttid'. Keys should be columns that exist in the table.
Error: Invalid increment value (0.5) in a FixedIncrements constraint. Increments must be positive integers.

Relationships:
Error: Relationship between tables ('users', 'transactions') contains an unknown primary key 'userr_id'.
Error: The relationships in the dataset are disjointed. Tables ('sessions') are not connected to any of the other tables.
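
One possible way to assemble the grouped report above is sketched below. It is only an illustration: the issue does not say how SingleTableMetadata.validate reports its errors, so the sketch assumes hypothetical zero-argument callables that each return a list of error strings. The names `format_errors` and `validate_multi_table` are made up for this example, and `InvalidMetadataError` is a local stand-in for the error type named above.

```python
class InvalidMetadataError(Exception):
    """Local stand-in for the error type named in this issue."""


def format_errors(table_errors, relationship_errors):
    """Build one report, grouped by table, in the layout shown above."""
    sections = []
    for table_name, errors in table_errors.items():
        if errors:  # tables without errors are left out of the report
            lines = [f"Table: '{table_name}'"]
            lines += [f"Error: {error}" for error in errors]
            sections.append("\n".join(lines))
    if relationship_errors:
        lines = ["Relationships:"]
        lines += [f"Error: {error}" for error in relationship_errors]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


def validate_multi_table(table_checks, relationship_check):
    """Run every check, then raise once with all collected errors.

    table_checks: dict of table name -> callable returning a list of
        error strings (hypothetical interface, assumed for this sketch).
    relationship_check: callable returning a list of error strings.
    """
    # Step 1: per-table validation, errors kept under their table name.
    table_errors = {name: check() for name, check in table_checks.items()}
    # Step 2: relationship validation.
    relationship_errors = relationship_check()

    report = format_errors(table_errors, relationship_errors)
    if report:
        raise InvalidMetadataError("The metadata is not valid\n\n" + report)
```

Collecting everything before raising means the user sees all problems in a single pass rather than fixing them one at a time.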

Additional context

Relationship validation

  • "parent_table_name" and "child_table_name" should refer to tables that are defined in the "tables" portion of the metadata
    Error: Relationship contains an unknown table 'userss'.

  • "parent_primary_key" value should be the same as "primary_key" in the parent table's definition
    Error: Relationship between tables ('users', 'transactions') contains an unknown primary key 'userr_id'.

  • "child_foreign_key" should refer to column(s) that are defined for the child table in the "tables" portion of the metadata
    Error: Relationship between tables ('users', 'transactions') contains an unknown foreign key 'transactions_idd'.

  • The length of "child_foreign_key" should match the length of "parent_primary_key" (you cannot have a single column foreign key referencing a multi-column composite primary key)
    Error: Relationship between tables ('users', 'transactions') is invalid. Primary key has length 2 but the foreign key has length 1.

  • The sdtype & attributes of the primary key should exactly match the sdtype & attributes of the foreign key
    Error: Relationship between tables ('users', 'transactions') is invalid. The primary and foreign key columns are not the same type.

    • For a composite key, match the columns by their order in the list; e.g., check that the columns at index 0 have the same definition, then the columns at index 1
  • The relationships should be acyclic (e.g., you cannot have a cycle of dependencies A → B → C → A)
    Error: The relationships in the dataset describe a circular dependency between tables ('users', 'sessions', 'transactions').

  • All tables in the metadata must be connected. At most one error of this type should be raised.
    Error: The relationships in the dataset are disjointed. Tables ('users') are not connected to any of the other tables.
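
The rules above could be checked roughly as follows. This is a minimal sketch, not the proposed implementation. It assumes the metadata is a plain dict with a "tables" mapping (each table holding a "columns" dict of column definitions and a "primary_key") and a "relationships" list using the four keys named above. The "columns" layout and all function names are assumptions made for illustration. For the connectivity rule, the sketch reports the tables outside the largest connected group, which is one possible reading of "disjointed".

```python
def _as_list(key):
    """Treat single-column and composite keys uniformly."""
    return list(key) if isinstance(key, (list, tuple)) else [key]


def _names(tables):
    """Format table names the way the messages in this issue do."""
    return "(" + ", ".join(repr(t) for t in tables) + ")"


def check_keys(tables, rel):
    """Return the error strings for one relationship entry."""
    parent, child = rel["parent_table_name"], rel["child_table_name"]
    unknown = [t for t in (parent, child) if t not in tables]
    if unknown:
        return [f"Relationship contains an unknown table {t!r}." for t in unknown]
    prefix = f"Relationship between tables {_names([parent, child])}"
    pk, fk = _as_list(rel["parent_primary_key"]), _as_list(rel["child_foreign_key"])
    parent_cols, child_cols = tables[parent]["columns"], tables[child]["columns"]
    errors = []
    if pk != _as_list(tables[parent].get("primary_key")):
        errors.append(f"{prefix} contains an unknown primary key {rel['parent_primary_key']!r}.")
    if any(col not in child_cols for col in fk):
        errors.append(f"{prefix} contains an unknown foreign key {rel['child_foreign_key']!r}.")
    if len(pk) != len(fk):
        errors.append(f"{prefix} is invalid. Primary key has length {len(pk)} "
                      f"but the foreign key has length {len(fk)}.")
    # Compare definitions position by position (index 0 with index 0, and so on).
    if not errors and any(parent_cols.get(p) != child_cols[f] for p, f in zip(pk, fk)):
        errors.append(f"{prefix} is invalid. The primary and foreign key "
                      "columns are not the same type.")
    return errors


def find_cycle(children):
    """Return the tables on one cycle (depth-first search), or [] if acyclic."""
    state, path = {}, []

    def visit(node):
        state[node] = "visiting"
        path.append(node)
        for nxt in sorted(children[node]):
            if state.get(nxt) == "visiting":
                return path[path.index(nxt):]
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        state[node] = "done"
        path.pop()
        return []

    for start in children:
        if start not in state:
            cycle = visit(start)
            if cycle:
                return cycle
    return []


def disconnected_tables(neighbors):
    """Tables outside the largest connected component (an assumed reading)."""
    components, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return sorted(set(neighbors) - max(components, key=len)) if components else []


def validate_relationships(metadata):
    """Collect every relationship error; report disconnection at most once."""
    tables, relationships = metadata["tables"], metadata.get("relationships", [])
    errors = [e for rel in relationships for e in check_keys(tables, rel)]
    children = {name: set() for name in tables}
    neighbors = {name: set() for name in tables}
    for rel in relationships:
        parent, child = rel["parent_table_name"], rel["child_table_name"]
        if parent in tables and child in tables:
            children[parent].add(child)
            neighbors[parent].add(child)
            neighbors[child].add(parent)
    cycle = find_cycle(children)
    if cycle:
        errors.append("The relationships in the dataset describe a circular "
                      f"dependency between tables {_names(cycle)}.")
    isolated = disconnected_tables(neighbors)
    if isolated:
        errors.append("The relationships in the dataset are disjointed. Tables "
                      f"{_names(isolated)} are not connected to any of the other tables.")
    return errors
```

The key-level checks run per relationship, while the acyclicity and connectivity checks look at the graph of all tables, so they are evaluated once after the per-relationship pass.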
