Multi table quality report should handle multi-foreign keys (to same parent) #406

Closed
npatki opened this issue Jul 26, 2023 · 0 comments · Fixed by #495
Assignees
Labels
data:multi-table (Related to multi-table, relational datasets), feature:reports (Related to any of the generated reports), feature request (Request for a new feature)
Milestone

Comments

@npatki
Contributor

npatki commented Jul 26, 2023

Problem Description

Currently, the Cardinality property in the multi-table quality report assumes that there is only one connection between each parent table and child table. This is not always true.

It's possible that a child table has multiple foreign keys that point to the same primary key column in the parent. For example: I can have a parent table banks and a child table transactions. Then for bank-to-bank transactions, there should be 2 foreign keys in transactions that point to banks (they represent the payor and payee).
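To make the problem concrete, here is a minimal sketch of the banks/transactions case written as a list of relationship dictionaries in the style of SDV multi-table metadata. The column names `bank_id`, `payor_id`, and `payee_id` are assumptions made for illustration; they do not come from this issue. The sketch shows that indexing relationships by (parent, child) alone silently drops one of the two foreign keys, while including the foreign key in the index keeps both.

```python
# Hypothetical relationships: column names are illustrative assumptions.
relationships = [
    {
        "parent_table_name": "banks",
        "parent_primary_key": "bank_id",
        "child_table_name": "transactions",
        "child_foreign_key": "payor_id",
    },
    {
        "parent_table_name": "banks",
        "parent_primary_key": "bank_id",
        "child_table_name": "transactions",
        "child_foreign_key": "payee_id",
    },
]

# Keyed by (parent, child) only: the second relationship overwrites the first.
by_table_pair = {
    (rel["parent_table_name"], rel["child_table_name"]): rel
    for rel in relationships
}

# Keyed by (parent, child, foreign key): both relationships are preserved.
by_foreign_key = {
    (
        rel["parent_table_name"],
        rel["child_table_name"],
        rel["child_foreign_key"],
    ): rel
    for rel in relationships
}

print(len(by_table_pair))   # one entry survives
print(len(by_foreign_key))  # both entries survive
```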

Expected behavior

The Quality Report should be updated to account for this case.

In get_details, we expect to show a DataFrame for each breakdown. This table should include a Foreign Key column to distinguish relationships that have the same parent and child tables. (Note that we can still use table_name to select the portions of the dataframe that match either the parent or child table.)
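As a rough sketch of what such a details table could look like, the pandas snippet below builds a DataFrame with a `Foreign Key` column and filters it by table name. The column names other than `Foreign Key`, the extra `accounts` table, the helper `select_table`, and all score values are placeholders assumed for illustration; they are not the report's actual output.

```python
import pandas as pd

# Placeholder details table. Scores are made-up values, not real results.
details = pd.DataFrame(
    {
        "Child Table": ["transactions", "transactions", "accounts"],
        "Parent Table": ["banks", "banks", "banks"],
        "Foreign Key": ["payor_id", "payee_id", "bank_id"],
        "Metric": ["CardinalityShapeSimilarity"] * 3,
        "Score": [0.9, 0.8, 0.7],
    }
)


def select_table(details, table_name):
    """Hypothetical helper: keep rows where table_name is the parent or the child."""
    is_child = details["Child Table"] == table_name
    is_parent = details["Parent Table"] == table_name
    return details[is_child | is_parent]


# Both transactions -> banks relationships remain distinguishable by foreign key.
transactions_rows = select_table(details, "transactions")
print(transactions_rows)
```

Filtering on `transactions` returns two rows that share the same parent and child tables and differ only in the `Foreign Key` column, which is the distinction the issue asks for.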

image

In get_visualization, each bar is currently labeled with the child and parent table names. We should also update the label to include the name of the foreign key, e.g. transactions (payor) -> banks
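One way to build labels in that format is sketched below. The function `relationship_label` is a hypothetical helper written for this example, not part of the library; it only illustrates the proposed label layout.

```python
def relationship_label(child_table, foreign_key, parent_table):
    """Hypothetical helper: format a bar label as 'child (foreign key) -> parent'."""
    return f"{child_table} ({foreign_key}) -> {parent_table}"


# The two relationships from the banks example now get distinct labels.
labels = [
    relationship_label("transactions", "payor", "banks"),
    relationship_label("transactions", "payee", "banks"),
]
print(labels)
```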

image
@npatki added the feature request, new, data:multi-table, and feature:reports labels and removed the new label on Jul 26, 2023
@amontanez24 added this to the 0.12.0 milestone on Sep 14, 2023
@npatki removed this from the 0.12.0 milestone on Oct 24, 2023
@amontanez24 added this to the 0.13.0 milestone on Nov 30, 2023
3 participants