Improve error message when trying to sample before fitting (single table) #1978

Closed
npatki opened this issue May 2, 2024 · 0 comments · Fixed by #1992
Labels: feature request (Request for a new feature)
npatki commented May 2, 2024

Problem Description

If I accidentally try to sample synthetic data before fitting my synthesizer, I currently get an error message that is not helpful in diagnosing the problem. SDV synthesizers should be able to detect that they have not yet been fitted and raise a more appropriate error with a clear description.

Current error message: This error only indicates that sampling was attempted; it does not say that the synthesizer has not been fitted.

NotFittedError: 

During handling of the above exception, another exception occurred:

NotFittedError                            Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/sdv/single_table/utils.py in handle_sampling_error(is_tmp_file, output_file_path, sampling_error)
    110 
    111     if error_msg:
--> 112         raise type(sampling_error)(error_msg + '\n' + str(sampling_error))
    113 
    114     raise sampling_error

NotFittedError: Error: Sampling terminated. Partial results are stored in a temporary file: .sample.csv.temp. This file will be overridden the next time you sample. Please rename the file if you wish to save these results.
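
To illustrate why the message above is misleading, here is a minimal sketch of the re-raise pattern visible in lines 111-114 of the traceback. It does not use SDV itself: `NotFittedError` is a local stand-in class, `handle_sampling_error_sketch` is a hypothetical simplification, and how the real `error_msg` is built is not shown in the traceback, so it is passed in as a parameter here.

```python
class NotFittedError(Exception):
    """Local stand-in for the exception type seen in the traceback."""


def handle_sampling_error_sketch(error_msg, sampling_error):
    # Hypothetical simplification of lines 111-114 of the traceback:
    # when a message is set, re-raise the same exception type with the
    # message prepended to the original error text.
    if error_msg:
        raise type(sampling_error)(error_msg + '\n' + str(sampling_error))

    raise sampling_error


def demo():
    """Raise an empty NotFittedError and return the text of the wrapped error."""
    try:
        try:
            # The original error carries no text, as in the traceback.
            raise NotFittedError()
        except NotFittedError as error:
            handle_sampling_error_sketch('Error: Sampling terminated.', error)
    except NotFittedError as wrapped:
        return str(wrapped)


wrapped_message = demo()
```

Because the original `NotFittedError` has an empty message, the only text the user sees is the prepended sampling message, which says nothing about fitting.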

Expected behavior

Instead of attempting to sample, the single table synthesizer should check first whether the synthesizer has been fitted. If it has not been fitted, we should proactively show a SamplingError explaining to the user what they must do.

from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer

data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)

synth = GaussianCopulaSynthesizer(metadata)
synth.sample(10)
SamplingError: This synthesizer has not been fitted. Please fit your synthesizer first before 
sampling synthetic data.
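
The proactive check described above could look roughly like the following sketch. It is not SDV's implementation: `SamplingError` is defined locally as a stand-in, and `SketchSynthesizer` with its `_fitted` flag is a hypothetical toy class that "samples" by repeating the rows it was fitted on.

```python
from itertools import cycle, islice


class SamplingError(Exception):
    """Local stand-in for the error the issue proposes to raise."""


class SketchSynthesizer:
    """Hypothetical toy synthesizer that tracks whether fit() was called."""

    def __init__(self):
        self._fitted = False
        self._rows = []

    def fit(self, rows):
        # Store the training rows and record that fitting has happened.
        self._rows = list(rows)
        self._fitted = True

    def sample(self, num_rows):
        # Check the fitted state first, before any sampling work starts.
        if not self._fitted:
            raise SamplingError(
                'This synthesizer has not been fitted. Please fit your '
                'synthesizer first before sampling synthetic data.'
            )

        # Toy "sampling": repeat the fitted rows until num_rows are produced.
        return list(islice(cycle(self._rows), num_rows))


synth = SketchSynthesizer()
try:
    synth.sample(10)
    message = None
except SamplingError as error:
    message = str(error)

synth.fit([{'guest': 'a'}, {'guest': 'b'}])
rows = synth.sample(3)
```

Checking the flag at the top of `sample` means the error is raised before any temporary-file handling can wrap it in an unrelated message.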

Additional context

This issue applies to the single-table synthesizers that require fitting: GaussianCopula, CTGAN, CopulaGAN, and TVAE.
