Better data validation message for auto_assign_transformers #1509

Closed
npatki opened this issue Jul 18, 2023 · 1 comment · Fixed by #2021

Comments

@npatki
Contributor

npatki commented Jul 18, 2023

Problem Description

If I call auto_assign_transformers with invalid data*, I receive an error that does not describe the actual problem.

*Invalid data is any data that does not match the metadata.

from sdv.single_table import GaussianCopulaSynthesizer
from sdv.metadata import SingleTableMetadata
import numpy as np
import pandas as pd

metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'a': { 'sdtype': 'categorical' },
    }
})

synthesizer = GaussianCopulaSynthesizer(metadata)

# input data that does not match the metadata
data = pd.DataFrame({'b': list(np.random.choice(['M', 'F'], size=10)) })
synthesizer.auto_assign_transformers(data)

Output:

AttributeError: 'NoneType' object has no attribute 'get'

Expected behavior

I expect an error that describes the problem more clearly. We should reuse the error message that fit raises on invalid data.

synthesizer.fit(data)
InvalidDataError: The provided data does not match the metadata:
The columns ['b'] are not present in the metadata.

The metadata columns ['a'] are not present in the data.
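
A message in this shape can come from a plain comparison of the data columns against the metadata columns. The sketch below is illustrative only and does not reproduce SDV's internals: `InvalidDataError` is redefined locally as a stand-in, and `check_columns_match` is a hypothetical helper name.

```python
class InvalidDataError(Exception):
    """Local stand-in for SDV's error class (assumption, not the real one)."""


def check_columns_match(data_columns, metadata_columns):
    """Hypothetical helper: raise if the two sets of column names differ."""
    data_columns = list(data_columns)
    metadata_columns = list(metadata_columns)

    # Columns found in the data but not declared in the metadata.
    extra = sorted(c for c in data_columns if c not in metadata_columns)
    # Columns declared in the metadata but absent from the data.
    missing = sorted(c for c in metadata_columns if c not in data_columns)

    errors = []
    if extra:
        errors.append(f'The columns {extra} are not present in the metadata.')
    if missing:
        errors.append(f'The metadata columns {missing} are not present in the data.')

    if errors:
        raise InvalidDataError(
            'The provided data does not match the metadata:\n' + '\n\n'.join(errors)
        )
```

Calling `check_columns_match(['b'], ['a'])` raises the stand-in error with both lines of the message, while matching column lists pass silently.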

Additional context

It appears that fit (and fit_processed_data) are actually running a validation check between the data and metadata. It seems that the auto_assign_transformers method is NOT running the check.

Should we run the check in this method? If so, maybe the fit functions don't need it (since they internally call this method first).
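
To make the question concrete, here is a toy sketch of that arrangement, assuming (as suggested above) that fit calls auto_assign_transformers internally. `SketchSynthesizer`, `_validate`, and `validation_calls` are hypothetical names for illustration and are not SDV code; the check is reduced to a column-name comparison.

```python
class SketchSynthesizer:
    """Toy stand-in for a synthesizer; not SDV code."""

    def __init__(self, metadata_columns):
        self.metadata_columns = set(metadata_columns)
        self.validation_calls = 0  # counts how often the check runs
        self.transformers = {}
        self.fitted = False

    def _validate(self, data_columns):
        # Simplified check: column names must match the metadata exactly.
        self.validation_calls += 1
        if set(data_columns) != self.metadata_columns:
            raise ValueError('The provided data does not match the metadata.')

    def auto_assign_transformers(self, data_columns):
        # Validate first, so invalid data fails with a descriptive error.
        self._validate(data_columns)
        # Placeholder for the real transformer assignment.
        self.transformers = {column: 'placeholder' for column in data_columns}

    def fit(self, data_columns):
        # No separate validation here: the call below already performs it.
        self.auto_assign_transformers(data_columns)
        self.fitted = True
```

With this layout, a single `fit` call validates exactly once, and a direct call to `auto_assign_transformers` with mismatched columns fails with the descriptive error instead of an `AttributeError`.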

@npatki npatki added the feature request Request for a new feature label Jul 18, 2023
@srinify
Contributor

srinify commented Apr 2, 2024

Potentially related to #1883


4 participants