Warning printed too many times (RuntimeWarning: invalid value encountered in scalar divide ....) #364

Closed
npatki opened this issue Jun 8, 2023 · 0 comments · Fixed by #371
Labels
bug There is an error in the code that needs to be fixed

npatki commented Jun 8, 2023

Environment Details

  • SDV version: 1.2.0 (latest)
  • Python version: 3.10
  • Operating System: Darwin (MacOS)

Error Description

Sometimes, I see a RuntimeWarning repeated many times during the fitting phase.

  1. In HMASynthesizer, it interrupts the progress bar
  2. The warning is not useful to me. It seems to be related to the mathematics in copulas, so there's nothing I can do to get rid of it.

We should silence this warning since the software is still working as intended. We could consider logging it at the INFO level (logger.info) instead.
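
As a rough illustration of the "log it instead" idea, the sketch below records warnings raised during a call and re-emits any RuntimeWarning as an INFO log record. The helper name `call_logging_warnings` is hypothetical and is not part of SDV or copulas; it only uses the standard `warnings` and `logging` modules.

```python
import logging
import warnings

logger = logging.getLogger(__name__)


def call_logging_warnings(func, *args, **kwargs):
    """Call func, turning any RuntimeWarning it raises into an INFO log record.

    Hypothetical helper for illustration; not an SDV or copulas API.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # record every occurrence
        result = func(*args, **kwargs)

    for w in caught:
        if issubclass(w.category, RuntimeWarning):
            logger.info('%s:%s: %s', w.filename, w.lineno, w.message)
        else:
            # Re-issue other warning categories so they are not lost.
            warnings.warn_explicit(w.message, w.category, w.filename, w.lineno)

    return result
```

With this shape, a call such as `call_logging_warnings(synthesizer.fit, real_data)` would keep the console clean unless the logger is configured to show INFO messages.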

Root Cause

I suspect this is coming from the Gaussian Copula synthesizer. In this synthesizer, we are silencing warnings coming from scipy in this line. For some reason, the RuntimeWarning is still coming through.

This only appears to happen for the 'truncnorm' distribution.
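
The two lines named in the output are plain divisions by `scale`, so the warnings look like NumPy floating-point warnings raised before scipy is involved. A minimal sketch of that behavior, assuming the operands are NumPy scalars and `scale` is zero: `truncnorm_bounds` is a hypothetical stand-in for those two lines, and `np.errstate` is one possible way to silence them locally.

```python
import warnings

import numpy as np


def truncnorm_bounds(min_value, max_value, loc, scale):
    """Standardize the bounds with the same divisions shown in the output.

    Hypothetical stand-in; not the copulas implementation.
    """
    # Assumption: the real operands behave like NumPy scalars.
    min_value, max_value = np.float64(min_value), np.float64(max_value)
    loc, scale = np.float64(loc), np.float64(scale)

    # errstate silences NumPy's floating-point warnings for this block only.
    with np.errstate(divide='ignore', invalid='ignore'):
        a = (min_value - loc) / scale
        b = (max_value - loc) / scale

    return a, b
```

With a zero `scale`, a zero numerator gives NaN (the "invalid value" case) and a nonzero numerator gives infinity (the "divide by zero" case), which matches the two kinds of message in the output.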

Steps to reproduce

from sdv.datasets.demo import download_demo
from sdv.multi_table import HMASynthesizer

real_data, metadata = download_demo(
    modality='multi_table', dataset_name='fake_hotels')

synthesizer = HMASynthesizer(metadata)

synthesizer.set_table_parameters(
    table_name='hotels',
    table_parameters={'default_distribution': 'truncnorm'})

synthesizer.fit(real_data)

Output:

Preprocess Tables: 100%|███████████████| 2/2 [00:00<00:00, 19.58it/s]

Learning relationships:
(1/1) Tables 'hotels' and 'guests' ('hotel_id'): 100%|███████████████| 10/10 [00:01<00:00,  6.78it/s]

Modeling Tables:   0%|                                                                                                                                        | 0/1 [00:00<?, ?it/s]/Users/npatki/Documents/DataCebo/SDV/misc/env/lib/python3.10/site-packages/copulas/univariate/truncated_gaussian.py:45: RuntimeWarning: invalid value encountered in scalar divide
  a = (self.min - loc) / scale
/Users/npatki/Documents/DataCebo/SDV/misc/env/lib/python3.10/site-packages/copulas/univariate/truncated_gaussian.py:46: RuntimeWarning: divide by zero encountered in scalar divide
  b = (self.max - loc) / scale
/Users/npatki/Documents/DataCebo/SDV/misc/env/lib/python3.10/site-packages/copulas/univariate/truncated_gaussian.py:45: RuntimeWarning: divide by zero encountered in scalar divide
  a = (self.min - loc) / scale
/Users/npatki/Documents/DataCebo/SDV/misc/env/lib/python3.10/site-packages/copulas/univariate/truncated_gaussian.py:46: RuntimeWarning: invalid value encountered in scalar divide
  b = (self.max - loc) / scale
Modeling Tables: 100%|███████████████| 1/1 [00:00<00:00, 13.64it/s]
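
Until this is fixed, a possible user-side workaround is to filter the warnings around the `fit` call. This is only a sketch: `fit_quietly` is a hypothetical helper, and the module pattern assumes the warnings are attributed to `copulas.univariate.truncated_gaussian`, as the file paths in the output suggest.

```python
import warnings

# Assumption: the warnings are attributed to this module, based on the
# file paths printed in the output.
NOISY_MODULE = r'copulas\.univariate\.truncated_gaussian'


def fit_quietly(synthesizer, data):
    """Fit while ignoring RuntimeWarnings attributed to NOISY_MODULE.

    Hypothetical helper for illustration; not an SDV API.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            'ignore', category=RuntimeWarning, module=NOISY_MODULE)
        synthesizer.fit(data)
```

Calling `fit_quietly(synthesizer, real_data)` in place of `synthesizer.fit(real_data)` leaves RuntimeWarnings from other modules visible, and the filter is removed again when the call returns.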

npatki added the bug label on Jun 8, 2023
npatki transferred this issue from sdv-dev/SDV on Sep 26, 2023
amontanez24 added this to the 0.10.1 milestone on Mar 13, 2024