Summarized metadata visualization #1525

Closed
amontanez24 opened this issue Aug 1, 2023 · 0 comments · Fixed by #1534
Labels
feature:metadata Related to describing the dataset feature request Request for a new feature
Milestone

Comments

@amontanez24
Contributor

amontanez24 commented Aug 1, 2023

Description

Many enterprise datasets have a very large number of columns, so it is infeasible to print out or even visualize the full list. In this case, it would be nice to summarize the metadata concisely, allowing you to see at a glance what it contains.

API

In visualize, update the show_table_details parameter to accept 3 values:

  • (default) 'full': Show the full column names and sdtype (equivalent to today's show_table_details=True)
  • None: Don't show any of the column names (equivalent to today's show_table_details=False)
  • 'summarized': Show the number of columns for each sdtype (new!)
metadata.visualize(show_table_details='summarized')

The output shows only the total number of columns of each sdtype. Something like this (not exactly!):
[image: example of the summarized output]

Other Details

  • The main sdtypes are: id, numerical, categorical, datetime and boolean.
  • The user may have other sdtypes that represent PII values. Count all of these under other.
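
The counting rule above can be sketched as follows. This is only an illustration, not the SDV implementation: `summarize_sdtypes` and `MAIN_SDTYPES` are hypothetical names, and the input is assumed to be a dict mapping each column name to a dict with an 'sdtype' key.

```python
from collections import Counter

# The five main sdtypes named in this issue; anything else is counted as 'other'.
MAIN_SDTYPES = ('id', 'numerical', 'categorical', 'datetime', 'boolean')


def summarize_sdtypes(columns):
    """Count columns per sdtype, folding non-main sdtypes into 'other'.

    Hypothetical helper: `columns` is assumed to map
    column name -> {'sdtype': <str>, ...}.
    """
    counts = Counter()
    for column_metadata in columns.values():
        sdtype = column_metadata.get('sdtype')
        counts[sdtype if sdtype in MAIN_SDTYPES else 'other'] += 1

    # Report main sdtypes in a fixed order, then 'other', skipping zero counts.
    return {
        sdtype: counts[sdtype]
        for sdtype in (*MAIN_SDTYPES, 'other')
        if counts[sdtype] > 0
    }


columns = {
    'user_id': {'sdtype': 'id'},
    'age': {'sdtype': 'numerical'},
    'income': {'sdtype': 'numerical'},
    'signup_date': {'sdtype': 'datetime'},
    'email': {'sdtype': 'email'},
    'phone': {'sdtype': 'phone_number'},
}
print(summarize_sdtypes(columns))
# {'id': 1, 'numerical': 2, 'datetime': 1, 'other': 2}
```

Skipping sdtypes with a zero count is a design choice made here to keep the summary short; the issue does not say whether empty categories should be shown.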

Backwards Compatibility

If a user still inputs show_table_details=True or show_table_details=False, do not crash. Instead:

  1. Convert it to the corresponding new value ('full' or None) and
  2. Throw a FutureWarning
# if show_table_details=True
FutureWarning: Using True or False for 'show_table_details' is deprecated. Use show_table_details='full' to show all table details.
# if show_table_details=False
FutureWarning: Using True or False for 'show_table_details' is deprecated. Use show_table_details=None to hide table details.
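
A minimal sketch of this conversion, assuming a standalone helper (the name `normalize_show_table_details` is hypothetical and not part of SDV). The warning text here names the parameter `show_table_details` throughout, matching the parameter in the API section. Raising ValueError for any other unsupported value is an assumption; the issue only specifies the behavior for True and False.

```python
import warnings

# Values accepted by the updated parameter, per the API section.
VALID_VALUES = ('full', 'summarized', None)


def normalize_show_table_details(show_table_details='full'):
    """Map deprecated booleans to the new values and validate the result."""
    if isinstance(show_table_details, bool):
        if show_table_details:
            new_value = 'full'
            hint = "Use show_table_details='full' to show all table details."
        else:
            new_value = None
            hint = 'Use show_table_details=None to hide table details.'

        message = f"Using True or False for 'show_table_details' is deprecated. {hint}"
        # stacklevel=2 points the warning at the caller rather than this helper.
        warnings.warn(message, FutureWarning, stacklevel=2)
        return new_value

    if show_table_details not in VALID_VALUES:
        # Assumption: the issue does not say how other values should be handled.
        raise ValueError(
            f"'show_table_details' must be 'full', 'summarized' or None, "
            f'got {show_table_details!r}.'
        )

    return show_table_details
```

With this helper, `normalize_show_table_details(True)` returns 'full' and emits a FutureWarning, while 'full', 'summarized' and None pass through silently.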