
Visualize cardinality of foreign key columns #283

Closed
npatki opened this issue Dec 14, 2022 · 1 comment · Fixed by #384
Labels: feature request (Request for a new feature)

npatki commented Dec 14, 2022

I'm filing this issue on behalf of a user request on our Slack.

Problem Description

Currently, users can plot the data in statistical columns (numerical, categorical, etc.). For instance, utils.get_relationship_plot only supports columns that are numerical, categorical, boolean or datetime.

It would be nice to support a visualization of the foreign key/primary key relationship, specifically its cardinality.

Expected behavior

Create a new visualization utils.get_cardinality_plot. This should plot the cardinality (# of children) that each parent row has, colored by real vs. synthetic data.
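To make "cardinality" concrete, here is a minimal pandas sketch that counts the children of every parent row. It assumes the parent's primary key column name is already known (passed as the hypothetical parent_primary_key argument), and the helper name count_children is illustrative only. Parents that no child references are kept with a count of 0, since they are part of the distribution being plotted.

```python
import pandas as pd


def count_children(parent_df, child_df, parent_primary_key, child_foreign_key):
    """Return the number of child rows for each parent row (hypothetical helper).

    Parents that no child row references get a count of 0. Child rows whose
    foreign key matches no parent are ignored.
    """
    # How many child rows point at each foreign key value (NaN keys are dropped).
    counts = child_df[child_foreign_key].value_counts()
    # Align the counts to every parent key; keys missing from `counts` become 0.
    cardinality = (
        parent_df[parent_primary_key]
        .map(counts)
        .fillna(0)
        .astype(int)
    )
    # Index the result by the parent's primary key values.
    cardinality.index = parent_df[parent_primary_key]
    return cardinality.rename('cardinality')


users = pd.DataFrame({'user_id': [1, 2, 3]})
transactions = pd.DataFrame({'user_id': [1, 1, 3]})
print(count_children(users, transactions, 'user_id', 'user_id').tolist())  # [2, 0, 1]
```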

Parameters:

  • (required) real_data: A dictionary mapping each table name to a pandas.DataFrame containing the data. This dictionary corresponds to real data.
  • (required) synthetic_data: A dictionary mapping each table name to a pandas.DataFrame containing the data. This dictionary corresponds to synthetic data.
  • (required) child_table_name: The string name of the child table
  • (required) parent_table_name: The string name of the parent table
  • (required) child_foreign_key: The string name of the child's foreign key column (that links to the parent's primary key)
  • (required) metadata: A dictionary of Multi Table Metadata
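The proposed signature does not take the parent's primary key directly, so the plot would presumably need to look it up in metadata. The sketch below assumes an SDV-style multi-table metadata dictionary with a relationships list whose entries carry parent_table_name, parent_primary_key, child_table_name and child_foreign_key; that layout and the helper name find_parent_primary_key are assumptions for illustration, not a confirmed part of the sdmetrics API.

```python
def find_parent_primary_key(metadata, parent_table_name, child_table_name, child_foreign_key):
    """Look up the parent primary key for one foreign key relationship.

    ASSUMPTION: `metadata` follows an SDV-style multi-table layout, i.e. it has a
    'relationships' list of dicts with the four keys compared below.
    """
    for relationship in metadata.get('relationships', []):
        if (
            relationship['parent_table_name'] == parent_table_name
            and relationship['child_table_name'] == child_table_name
            and relationship['child_foreign_key'] == child_foreign_key
        ):
            return relationship['parent_primary_key']

    # No matching relationship: the arguments do not describe a known foreign key.
    raise ValueError(
        f"No relationship from '{child_table_name}.{child_foreign_key}' "
        f"to '{parent_table_name}' found in metadata."
    )


# Hypothetical metadata for the users/transactions example.
metadata = {
    'tables': {
        'users': {'primary_key': 'user_id', 'columns': {'user_id': {'sdtype': 'id'}}},
        'transactions': {'columns': {'user_id': {'sdtype': 'id'}}},
    },
    'relationships': [{
        'parent_table_name': 'users',
        'parent_primary_key': 'user_id',
        'child_table_name': 'transactions',
        'child_foreign_key': 'user_id',
    }],
}
print(find_parent_primary_key(metadata, 'users', 'transactions', 'user_id'))  # user_id
```

Raising a ValueError for an unknown relationship doubles as input validation for the three required name parameters.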

Output: A plotly.graph_objects.Figure object with a bar graph. The graph shows the # of children that each parent row has. The color represents real vs. synthetic data.

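```python
from sdmetrics.reports import utils

# real_tables / synthetic_tables: dicts mapping each table name to a pandas.DataFrame
# my_multi_table_metadata: the multi-table metadata dictionary
fig = utils.get_cardinality_plot(
    real_data=real_tables,
    synthetic_data=synthetic_tables,
    parent_table_name='users',
    child_table_name='transactions',
    child_foreign_key='user_id',
    metadata=my_multi_table_metadata
)

fig.show()
```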
npatki added the feature request label on Dec 14, 2022
npatki changed the title from "Visualize cardinality when given foreign key columns" to "Visualize cardinality of foreign key columns" on Dec 20, 2022

npatki commented Jun 29, 2023

Example

See an example below for the final visualization.

[Image: example of the final cardinality visualization]

The code is available in a private notebook.
