ColumnPairTrends score depends on the data index #582

Closed
R-Palazzo opened this issue Jun 7, 2024 · 0 comments · Fixed by #594

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDMetrics version: 0.14.1

Error Description

By design, metric and property scores should be independent of the indexes of the real and synthetic data.
However, this is currently not the case for the ColumnPairTrends property, as shown below. The issue comes from the discretization step: when numerical and datetime columns are converted to categorical, the indexes are not preserved.
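
To illustrate how losing the index during discretization can change the data, here is a minimal pandas-only sketch. It is not the SDMetrics implementation: the helpers `discretize_dropping_index` and `discretize_keeping_index` and the bin edges are hypothetical, chosen only to show how pandas aligns on index labels when a Series is assigned back into a DataFrame.

```python
import numpy as np
import pandas as pd


def discretize_dropping_index(column, bin_edges):
    # Hypothetical helper: the result gets a fresh RangeIndex (0, 1, 2, ...),
    # so the original index labels are lost.
    return pd.Series(np.digitize(column.to_numpy(), bin_edges))


def discretize_keeping_index(column, bin_edges):
    # Hypothetical helper: same binning, but the original index is reused.
    return pd.Series(np.digitize(column.to_numpy(), bin_edges), index=column.index)


synthetic_data = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'a']
}, index=[0, 4, 2])
bin_edges = np.array([1.5, 2.5])  # assumed bin edges, for illustration only

# Assigning a Series aligns on index labels: label 4 has no match in the
# RangeIndex-based Series, so that row becomes NaN.
broken = synthetic_data.copy()
broken['A'] = discretize_dropping_index(synthetic_data['A'], bin_edges)

# With the index preserved, every row keeps its bin.
fixed = synthetic_data.copy()
fixed['A'] = discretize_keeping_index(synthetic_data['A'], bin_edges)

print(broken)
print(fixed)
```

In this sketch the row labelled 4 ends up with a missing value in the first case. A distortion of that kind would be one way for a score to depend on the index, although the actual code path in SDMetrics may differ.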

Steps to reproduce

The code below should output a metric score of 1.0, since the real and synthetic data are the same (they only have different indexes).

import pandas as pd
from sdmetrics.reports.single_table._properties import ColumnPairTrends

real_data = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'a']
}, index=[0, 1, 2])

synthetic_data = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'a']
}, index=[0, 4, 2])

metadata = {
    'columns': {
        'A': {'sdtype': 'numerical'},
        'B': {'sdtype': 'categorical'}
    }
}

# Avoid shadowing the built-in `property` name.
column_pair_trends = ColumnPairTrends()
column_pair_trends._generate_details(real_data, synthetic_data, metadata)

The current output is:

[Screenshot 2024-06-07 at 10 42 22]
@R-Palazzo R-Palazzo added the bug Something isn't working label Jun 7, 2024
@R-Palazzo R-Palazzo changed the title ColumnPairTrends property depends on the data index ColumnPairTrends score depends on the data index Jun 7, 2024
@R-Palazzo R-Palazzo self-assigned this Jun 20, 2024
@R-Palazzo R-Palazzo modified the milestones: 0.14.1, 0.14.2 Jun 20, 2024