Conditional Sampling: batch_size is being set to None by default? #889

Closed
npatki opened this issue Jul 8, 2022 · 0 comments · Fixed by #901
Labels
  • bug (Something isn't working)
  • data:single-table (Related to tabular datasets)
  • feature:sampling (Related to generating synthetic data after a model is built)
npatki (Contributor) commented Jul 8, 2022

Environment Details

  • SDV version: 0.16.0.dev1

Error Description

A warning is printed when the model cannot conditionally sample all of the requested rows. The message reports `batch_size` as "currently: None", which suggests the parameter is being set to None by default. That doesn't explain what is actually happening.

(This may be related to #886.)

Sampling conditions:  96%|█████████▌| 192/200 [01:21<00:03,  2.37it/s]
/usr/local/lib/python3.7/dist-packages/sdv/tabular/utils.py:216: UserWarning: Only able to sample 192 rows
for the given conditions. To sample more rows, try increasing `max_tries_per_batch` (currently: 100) or
increasing `batch_size` (currently: None. Note that increasing these values will also increase the sampling time.
  warnings.warn(user_msg)

Expected Output

The warning message should be more descriptive. Changing `batch_size` won't solve this problem because we never set it to be larger than num_rows, so the message should not suggest it.

Sampling conditions:  96%|█████████▌| 192/200 [01:21<00:03,  2.37it/s]
UserWarning: Only able to sample 192 rows for the given conditions. To sample more rows, try increasing
`max_tries_per_batch` (currently: 100). Note that increasing this value will also increase the sampling time.
  warnings.warn(user_msg)
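
For illustration, here is a minimal sketch of how a message like the one above could be produced. It is not the code in `sdv/tabular/utils.py`: the helper names `effective_batch_size`, `build_sampling_warning`, and `warn_if_incomplete` are hypothetical, and the capping rule is an assumption based only on the statement that the batch size is never set larger than num_rows.

```python
import warnings


def effective_batch_size(batch_size, num_rows):
    """Hypothetical helper: cap the batch size at the number of requested rows.

    Assumption: a batch_size of None falls back to num_rows.
    """
    if batch_size is None:
        return num_rows
    return min(batch_size, num_rows)


def build_sampling_warning(num_sampled, max_tries_per_batch):
    """Hypothetical helper: build the message proposed in this issue.

    The batch size is left out on purpose, since raising it past
    num_rows has no effect under the capping rule above.
    """
    return (
        f'Only able to sample {num_sampled} rows for the given conditions. '
        'To sample more rows, try increasing `max_tries_per_batch` '
        f'(currently: {max_tries_per_batch}). Note that increasing this value '
        'will also increase the sampling time.'
    )


def warn_if_incomplete(num_sampled, num_rows, max_tries_per_batch):
    """Emit the warning only when fewer rows than requested were sampled."""
    if num_sampled < num_rows:
        message = build_sampling_warning(num_sampled, max_tries_per_batch)
        warnings.warn(message, UserWarning)
```

With the numbers from this report, `warn_if_incomplete(192, 200, 100)` would emit the proposed text, while a complete sample of 200 rows would emit nothing.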

npatki added the bug, data:single-table, and feature:sampling labels on Jul 8, 2022
pvk-developer added this to the 0.16.0 milestone on Jul 18, 2022