Update TableFormat metric to TableStructure + fix its computation #518

Closed
npatki opened this issue Nov 14, 2023 · 0 comments · Fixed by #499
Labels: feature request

npatki commented Nov 14, 2023

Problem Description

This metric checks both the column names and the dtypes of the data. The dual purpose is confusing, especially since (a) it makes the computation more difficult, and (b) other metrics already check the validity of the data.

Let's simplify the metric.

Expected behavior

  1. Rename the metric to TableStructure
  2. Stop checking dtypes in this metric. We only need to check the column names.
    • We can get rid of the parameter for ignoring dtypes
    • The 'Structure' property in the Diagnostic Report no longer has to compute or pass in the parameter
  3. Fix the computation to be
score = |R ∩ S| / |R ∪ S|

Where R is the set of real column names, S is the set of synthetic column names, and |X| is the number of names in set X.
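
For illustration, the proposed score could be sketched roughly as below. This is not the library's implementation: the function name `table_structure_score` is hypothetical, it takes plain collections of column names rather than tables, and returning 1.0 when both inputs have no columns is an assumption, since the issue does not cover that case.

```python
def table_structure_score(real_columns, synthetic_columns):
    """Share of column names common to both tables (hypothetical helper).

    Computes the size of the intersection divided by the size of the
    union of the two sets of column names.
    """
    real = set(real_columns)
    synthetic = set(synthetic_columns)
    union = real | synthetic

    if not union:
        # Assumption: the issue does not define this case. Two tables
        # with no columns at all are treated here as a perfect match.
        return 1.0

    return len(real & synthetic) / len(union)


# Two of the four distinct column names appear in both tables.
real = ['age', 'income', 'city']
synthetic = ['age', 'income', 'zip']
print(table_structure_score(real, synthetic))  # 0.5
```

Because only names are compared, dtypes play no part in the result, and a missing column and an extra column are penalized in the same way.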
