Quality report is printing out a long warning message (hundreds of lines) #448

Closed
npatki opened this issue Sep 13, 2023 · 0 comments · Fixed by #449

npatki commented Sep 13, 2023

Environment Details

  • SDMetrics version: 0.11.0 (latest)
  • Python version: 3.10
  • Operating System: Linux (Colab Notebook)

Error Description

When running the quality report, the "Column Pair Trends" property now prints a warning message that runs for hundreds of lines. The warning is a pandas FutureWarning raised while converting datetime columns to Unix timestamps.

Steps to reproduce

The code below uses the SDV to evaluate quality, which calls the Quality Report.

from sdv.evaluation.multi_table import evaluate_quality
from sdv.datasets.demo import download_demo
from sdv.metadata import MultiTableMetadata
from sdv.multi_table import HMASynthesizer

data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)

metadata.update_column(
    table_name='guests',
    column_name='credit_card_number',
    sdtype='credit_card_number'
)

synth = HMASynthesizer(metadata)
synth.fit(data)
synthetic_data = synth.sample()

quality_report = evaluate_quality(
    real_data=data,
    synthetic_data=synthetic_data,
    metadata=metadata)

Output:

Generating report ...
(1/3) Evaluating Column Shapes: : 100%|██████████| 15/15 [00:00<00:00, 345.64it/s]
(2/3) Evaluating Column Pair Trends: :  20%|██        | 11/55 [00:00<00:00, 95.07it/s]/usr/local/lib/python3.10/dist-packages/sdmetrics/reports/single_table/_properties/column_pair_trends.py:58: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '[1609027200000000000 1609286400000000000 1600300800000000000
 1609113600000000000 1586044800000000000 1602979200000000000
 1606003200000000000 1583280000000000000 1578268800000000000
 1579651200000000000 1592179200000000000 1603411200000000000
 1583539200000000000 1597190400000000000 1601510400000000000
 1595116800000000000 1594684800000000000 1583107200000000000
 1594857600000000000 1590883200000000000 1590796800000000000
 1602547200000000000 1590105600000000000 1599955200000000000
 1595808000000000000 1606694400000000000 1598400000000000000
 1593907200000000000 1607385600000000000 1604534400000000000
...
 '] has dtype incompatible with datetime64[ns], please explicitly cast to a compatible dtype first.
  data.loc[~pd.isna(data[column_name]), column_name] = pd.to_numeric(
/usr/local/lib/python3.10/dist-packages/sdmetrics/reports/single_table/_properties/column_pair_trends.py:58: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '[1609113600000000000 1609372800000000000 1600473600000000000

Additional Context

We think this happens because only the non-null items are overwritten with numerical values, while the null items are left as pd.NaT, which is a datetime type. The assignment therefore puts numeric values into a datetime64[ns] column, making it mixed type (datetime and numeric).
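
If that diagnosis is right, one possible way to avoid the mixed-type assignment is to convert the whole column at once and turn pd.NaT into NaN, so the result is a single float column. The sketch below is only an illustration of that idea and not necessarily the change made in #449. The helper name `datetime_to_unix` and the column name `checkin_date` are hypothetical.

```python
import numpy as np
import pandas as pd


def datetime_to_unix(series):
    """Hypothetical helper: datetime Series -> float nanoseconds since the epoch.

    Missing values (NaT) become NaN, so the result has one dtype (float64)
    instead of a mix of datetime and numeric values.
    """
    values = series.to_numpy(dtype='datetime64[ns]')
    # Casting datetime64[ns] to int64 yields nanoseconds since the epoch;
    # NaT maps to a sentinel integer, which is overwritten with NaN below.
    numeric = values.astype('int64').astype('float64')
    numeric[np.isnat(values)] = np.nan
    return pd.Series(numeric, index=series.index, name=series.name)


# Illustrative data only: a datetime column with one missing value.
data = pd.DataFrame({
    'checkin_date': pd.to_datetime(['2020-12-27', None, '2020-12-30'])
})

# Replace the whole column rather than assigning into a subset of rows
# with .loc, so pandas never has to fit numbers into a datetime64[ns] column.
data['checkin_date'] = datetime_to_unix(data['checkin_date'])
```

Because the column is reassigned as a whole, its dtype simply changes from datetime64[ns] to float64, and the null rows stay null as NaN.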

@npatki npatki added the bug Something isn't working label Sep 13, 2023
@amontanez24 amontanez24 added this to the 0.11.1 milestone Sep 13, 2023
@amontanez24 amontanez24 self-assigned this Sep 13, 2023