Do not include the original real data in the trained model .pkl file #1156
Labels: feature request, resolution:resolved
A further consideration would be to not include the original real data in the trained model .pkl file. If a user only needs to share the final synthetic data, for example as a .csv, this is not a problem. But if they want to give a trained model .pkl file to another user, so that user can generate as much synthetic data as they want, it is a potential problem that the original real PII data is accessible from the .pkl.
Here is an example that replicates the point:
which outputs:
containing the original names ["Peter", "John", "Mary", "Susan"]
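For readers who want to see the effect in isolation, here is a minimal, library-independent sketch. It does not use the SDV API: `ToySynthesizer`, `StrippedSynthesizer`, and the `_training_rows` attribute are hypothetical names invented for illustration. It assumes only that a model object keeps a reference to its training rows, and shows that `pickle` then writes those rows into the file, while a `__getstate__` override can leave them out.

```python
import pickle


class ToySynthesizer:
    """Hypothetical stand-in for a trained model (not the SDV API)."""

    def __init__(self):
        self.fitted_params = None
        self._training_rows = None

    def fit(self, rows):
        # Assumption for this sketch: the model caches the raw training rows.
        self._training_rows = list(rows)
        # A toy "learned" summary that contains no individual values.
        self.fitted_params = {
            "n_rows": len(rows),
            "mean_name_length": sum(len(r) for r in rows) / len(rows),
        }


class StrippedSynthesizer(ToySynthesizer):
    """Same toy model, but the cached rows are dropped when pickling."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_training_rows"] = None  # exclude the real data from the .pkl
        return state


names = ["Peter", "John", "Mary", "Susan"]

# Default pickling: the cached rows travel with the model.
leaky = ToySynthesizer()
leaky.fit(names)
leaky_bytes = pickle.dumps(leaky)
restored = pickle.loads(leaky_bytes)
print(restored._training_rows)  # ['Peter', 'John', 'Mary', 'Susan']

# With __getstate__: the names no longer appear in the serialized bytes.
stripped = StrippedSynthesizer()
stripped.fit(names)
stripped_bytes = pickle.dumps(stripped)
print(any(n.encode() in stripped_bytes for n in names))  # False
```

Dropping a cached copy of the data only removes that one copy. Whether the remaining learned parameters still reveal real values depends on what the model stores, so this is a sketch of the mechanism rather than a privacy guarantee.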
Originally posted by @PJPRoche in #439 (comment)