If no filepath is provided, do not create a file during sample
#2042
Labels: feature request (request for a new feature), feature:sampling (related to generating synthetic data after a model is built)
Problem Description
As described in issue #1310, the single table `sample` method has the following functionality:
- If you provide an `output_filepath`, the sampling function writes to that filepath after every batch, ensuring that you do not lose previously sampled data if there is a crash.
- If you do not provide an `output_filepath`, the sampling function still saves batched results to `.sample.csv.temp` (so that you still have access to some data in the event of a crash). Once sampling completes, we delete this file.

Problems:
- Even with `output_filepath=None`, SDV creates a temporary file anyway. This is unintuitive for users who assume that `None` means that no output file will be used. It is also unintuitive that SDV deletes the file after the fact.
- The temporary file is named `.sample.csv.temp`, which is unexpected. It was meant to give the user some data rather than nothing, but in practice there are always 0 rows in this file, because the default `batch_size` is kept equal to the sample size, so sampling runs as a single batch and nothing is written before it completes.

Expected behavior
- If `output_filepath=None`, we should no longer create or write to a temporary file. This aligns with user expectations and enables more forms of usage. In the event of a crash, the error message should be updated, since there will no longer be a file to reference.
- If an `output_filepath` is provided, we should create and periodically write to the file as we do today. In the event of a crash, the error message should continue to ask the user to check the file.

Default: We can make the default `None`. In the future, we should figure out a better default for `batch_size` for very large samples. In such a case, it would make sense to have an output filepath supplied as a default.
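
To make the expected behavior concrete, here is a minimal sketch of the proposed logic. It is not SDV's actual implementation: `sample_in_batches` and `generate_batch` are hypothetical names, rows are plain lists, and the error wording is only illustrative. It assumes `generate_batch(n)` returns exactly `n` rows.

```python
import csv


def sample_in_batches(generate_batch, num_rows, batch_size=None, output_filepath=None):
    """Hypothetical stand-in for the sampling loop (an assumption, not SDV code)."""
    # Mirror the current default: one batch that covers the whole sample.
    batch_size = batch_size or num_rows
    rows = []
    try:
        while len(rows) < num_rows:
            batch = generate_batch(min(batch_size, num_rows - len(rows)))
            rows.extend(batch)
            if output_filepath is not None:
                # Append each completed batch so earlier batches survive a crash.
                with open(output_filepath, 'a', newline='') as f:
                    csv.writer(f).writerows(batch)
            # With output_filepath=None, nothing is written to disk at all.
    except Exception as error:
        if output_filepath is not None:
            hint = f'Check {output_filepath} for the rows sampled before the failure.'
        else:
            hint = 'No output_filepath was provided, so no partial results were saved.'
        raise RuntimeError(f'Sampling failed: {error}. {hint}') from error
    return rows
```

With `output_filepath=None` the function never touches the filesystem, so there is no temporary file to create or delete, and the error message only points to a file when one was requested.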