Missing error message if the user forgets to add a sequence_key when using PARSynthesizer #1883

Closed
srinify opened this issue Apr 2, 2024 · 0 comments · Fixed by #1909
Labels: data:sequential, feature request


srinify commented Apr 2, 2024

Problem Description

If a user tries to run preprocess() or fit() using PARSynthesizer without specifying a sequence_key, they currently get a less useful error:

[Screenshot of the current error message; image not included]

Expected behavior

We should instead be returning:

SynthesizerInputError: The PARSynthesizer is designed for multi-sequence data, identifiable through a sequence key. Your metadata does not include a sequence key.
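
As a rough illustration of the requested check, here is a minimal, library-free sketch. It does not use SDV itself: the `SynthesizerInputError` class below is a local stand-in, and `validate_sequence_key` and the plain-dict metadata layout are hypothetical names chosen for this example, not SDV's actual internals.

```python
class SynthesizerInputError(Exception):
    """Local stand-in for the error type named above (not imported from SDV)."""


MISSING_SEQUENCE_KEY_MESSAGE = (
    'The PARSynthesizer is designed for multi-sequence data, identifiable through a '
    'sequence key. Your metadata does not include a sequence key.'
)


def validate_sequence_key(metadata_dict):
    """Raise a descriptive error when the metadata has no sequence key.

    `metadata_dict` is a hypothetical plain-dict view of the metadata,
    used only for this sketch.
    """
    if metadata_dict.get('sequence_key') is None:
        raise SynthesizerInputError(MISSING_SEQUENCE_KEY_MESSAGE)


# Metadata as it might look right after detection: columns only, no sequence key.
detected = {
    'columns': {
        'transaction_date': {'sdtype': 'datetime'},
        'key': {'sdtype': 'numerical'},
    }
}

try:
    validate_sequence_key(detected)
except SynthesizerInputError as error:
    print(f'SynthesizerInputError: {error}')

# Once a sequence key is declared, the check passes silently.
validate_sequence_key({**detected, 'sequence_key': 'key'})
```

Running the check at the start of both preprocess() and fit() would surface the message before any lower-level failure, which is the behavior this issue asks for.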

Additional context

Code to reproduce this error.

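import pandas as pd
from sdv.metadata.single_table import SingleTableMetadata
from sdv.sequential import PARSynthesizer

# Two sequences (key 0 and key 1), each with three monthly rows.
data_json = {
    'transaction_date': ['2024-01-01', '2024-02-01', '2024-03-01', '2024-01-01', '2024-02-01', '2024-03-01'],
    'key': [0, 0, 0, 1, 1, 1]
}

df1 = pd.DataFrame(data_json)

# Detect the metadata from the data; no sequence key is set afterwards.
metadata1 = SingleTableMetadata()
metadata1.detect_from_dataframe(df1)

# Without a sequence key, this call currently fails with the unhelpful error.
synth1 = PARSynthesizer(metadata1)
synth1.preprocess(df1)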

srinify added the feature request, new, and data:sequential labels Apr 2, 2024
srinify changed the title from "Better error message if the user forgets to add a sequence_key when using PARSynthesizer" to "Missing error message if the user forgets to add a sequence_key when using PARSynthesizer" Apr 2, 2024
npatki removed the new label Apr 4, 2024
amontanez24 added this to the 1.12.0 milestone Apr 11, 2024