Enable Batch Sampling + Progress Bar #693

npatki · 2022-01-27T17:01:56Z

Problem Description

Sampling all of the rows at once is the most efficient, but batch sampling is useful for tracking progress and limiting memory consumption. Let's enable batch sampling for each of our sampling methods (#690, #691, #692).

Expected behavior

In sample():

Add batch_size param (default: same as num_rows to yield only 1 batch)
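
To illustrate the proposed default, here is a minimal sketch of how a batch_size equal to num_rows collapses to a single batch. The helper name iter_batch_sizes is hypothetical and is not part of the library; it only shows the arithmetic.

```python
def iter_batch_sizes(num_rows, batch_size=None):
    """Yield the size of each batch needed to produce num_rows rows.

    Hypothetical helper for illustration only; not part of the SDV API.
    """
    if num_rows <= 0:
        return
    # Proposed default: one batch that covers every requested row.
    if batch_size is None:
        batch_size = num_rows
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    remaining = num_rows
    while remaining > 0:
        current = min(batch_size, remaining)
        yield current
        remaining -= current


print(list(iter_batch_sizes(1000)))       # [1000]
print(list(iter_batch_sizes(1000, 300)))  # [300, 300, 300, 100]
```

The last batch is simply whatever remains, so the total always equals num_rows.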

In sample_conditions() and sample_remaining_columns():

Batch size determined by existing batch_size_per_try param

For all methods:

Add output_file_path: Name of file to write to (default: None)
Show progress bar when sampling
Periodically write to output_file_path. If None, then periodically write to a temp file.

# works for all methods: sample, sample_conditions, sample_remaining_columns

# show progress bar while sampling
# write to a temp file that we can later delete
>>> synthetic_data = model.sample(num_rows=1000)
76%|████████████████████████████         | 756/1000 [00:33<00:10, 229.00it/s]

# write to file path while also returning the samples; show progress
>>> synthetic_data = model.sample(num_rows=1000, output_file_path="./results/sample.csv")
76%|████████████████████████████         | 756/1000 [00:33<00:10, 229.00it/s]
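
One way the periodic write could work is to append each batch to the output file as soon as it is sampled. The sketch below uses only the standard library; sample_in_batches and the sample_batch callable are hypothetical names, and the plain-text progress line stands in for a real progress bar.

```python
import csv
import sys


def sample_in_batches(sample_batch, num_rows, batch_size, output_file_path):
    """Sample num_rows rows in batches, appending each batch to a CSV file.

    sample_batch is a hypothetical callable: given n, it returns a list of
    n dict rows that all share the same keys.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    rows = []
    done = 0
    header_written = False
    # Truncate any previous file so it only holds rows from this run.
    open(output_file_path, "w", newline="").close()
    while done < num_rows:
        batch = sample_batch(min(batch_size, num_rows - done))
        # Append after every batch so partial results survive a crash.
        with open(output_file_path, "a", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(batch[0].keys()))
            if not header_written:
                writer.writeheader()
                header_written = True
            writer.writerows(batch)
        rows.extend(batch)
        done += len(batch)
        # Plain-text progress on stderr; a progress-bar library could draw the bar.
        sys.stderr.write(f"\r{done}/{num_rows} rows sampled")
    sys.stderr.write("\n")
    return rows
```

Because the file is reopened and closed for every batch, everything sampled so far is on disk before the next batch starts.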

Error States

If the system crashes or the user exits in the middle of sampling, the partial results should remain available.

# works for all methods: sample, sample_conditions, sample_remaining_columns

# Partial results available in requested file path
>>> synthetic_data = model.sample(output_file_path='./results/synthetic.csv')

76%|████████████████████████████         | 756/1000 [00:33<00:10, 229.00it/s]
^C
Error: Sampling terminated. Partial results are stored in './results/synthetic.csv'

# If no file path, partial results are in a temp file
# Temp file will be overwritten on next sample, so tell the user to save it
>>> synthetic_data = model.sample()
^C
Error: Sampling terminated. Partial results are stored in a temporary file: '.sample.csv.temp'.
This file will be overwritten the next time you sample. Rename the file if you wish to save
these results.
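
The user-exit case could be handled by catching KeyboardInterrupt around the sampling call and reporting where the partial results live. This is only a sketch under stated assumptions: run_with_partial_results and sample_fn are hypothetical names, and a hard system crash would not reach this handler, so it relies on batches already having been written to disk.

```python
TEMP_FILE_PATH = ".sample.csv.temp"  # temp file name taken from the message above


def run_with_partial_results(sample_fn, output_file_path=None):
    """Run sample_fn(path); on Ctrl-C, report where partial results are stored.

    sample_fn is a hypothetical callable that writes rows to the given path
    as it goes and returns the full result when it finishes.
    """
    path = output_file_path or TEMP_FILE_PATH
    try:
        return sample_fn(path)
    except KeyboardInterrupt:
        if output_file_path is None:
            # No path was requested, so warn that the temp file is reused.
            message = (
                "Error: Sampling terminated. Partial results are stored in a "
                f"temporary file: '{path}'.\n"
                "This file will be overwritten the next time you sample. "
                "Rename the file if you wish to save these results."
            )
        else:
            message = (
                "Error: Sampling terminated. Partial results are stored in "
                f"'{path}'"
            )
        print(message)
        return None
```

Returning None on interrupt is a design choice made for this sketch; the issue does not say what the call should return in that case.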

npatki added the "feature request" and "data:single-table" labels Jan 27, 2022

npatki mentioned this issue Jan 27, 2022

Is it possible to generate data with new set of primary keys? #686

Closed

katxiao mentioned this issue Feb 10, 2022

Enable batch sampling #709

Merged

katxiao mentioned this issue Feb 28, 2022

Add default output file behavior #719

Merged

katxiao self-assigned this Mar 3, 2022

katxiao closed this as completed Mar 4, 2022

katxiao added this to the 0.14.0 milestone Mar 15, 2022
