Is it possible to generate data with new set of primary keys? #686

abimarticio · 2022-01-21T05:10:58Z

I want to generate synthetic data that I can append to my original data. Is there a way to exclude the primary keys that are already in the original data when generating synthetic data?

I want to generate a large amount of data, but I have limited resources, so I am planning to generate small sets of data and append them to my original data.

npatki · 2022-01-27T17:29:13Z

Hi @abimarticio,

> Is there a way to exclude the primary keys that are already in the original data when generating synthetic data?

This isn't explicitly supported, so I've filed #697 for tracking progress.

In the meantime, we have found that the defaults work for most cases: if your primary key is a string, the SDV will generate 'a', 'b', 'c', etc. by default. (If it's numerical, it'll generate 0, 1, 2, ...) Is this causing conflicts for you?
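One way to check whether the default keys actually collide with your existing ones is to compare the two sets of values. This is a minimal sketch in plain Python; `find_key_collisions` and the sample ids are hypothetical and not part of the SDV.

```python
def find_key_collisions(original_keys, synthetic_keys):
    """Return the sorted key values that appear in both collections."""
    return sorted(set(original_keys) & set(synthetic_keys))


# Hypothetical data: the original table already uses ids 0..4, and the
# synthetic rows were numbered from 0 again.
original_ids = [0, 1, 2, 3, 4]
synthetic_ids = [0, 1, 2]

collisions = find_key_collisions(original_ids, synthetic_ids)
print(collisions)  # an empty list would mean the tables can be appended as-is
```

If the list is non-empty, the synthetic keys need to be changed before appending.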

Other options:

- Write metadata and specify a different regex for the primary key (see guide)
- Manually overwrite the column after sampling with whatever you want
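For the manual overwrite, a rough sketch could look like the following. It assumes integer primary keys and represents rows as plain dicts to stay dependency-free; the same idea applies to a DataFrame column. `assign_fresh_keys` and the column names are hypothetical, not SDV functions.

```python
def assign_fresh_keys(original_keys, synthetic_rows, key_column):
    """Return copies of synthetic_rows whose key_column continues
    counting after the largest key in original_keys."""
    # Assumption: keys are integers. Start at 0 if there are no original keys.
    next_key = max(original_keys, default=-1) + 1
    fresh_rows = []
    for offset, row in enumerate(synthetic_rows):
        # Copy the row so the sampled data is left untouched.
        fresh_rows.append({**row, key_column: next_key + offset})
    return fresh_rows


# Hypothetical data: the sampled rows reuse ids 0 and 1.
original_ids = [0, 1, 2]
sampled = [{"id": 0, "age": 31}, {"id": 1, "age": 45}]

renumbered = assign_fresh_keys(original_ids, sampled, "id")
print([row["id"] for row in renumbered])
```

Counting on from the current maximum keeps the new keys disjoint from the original ones without having to test each candidate.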

> I want to generate a large amount of data, but I have limited resources, so I am planning to generate small sets of data and append them to my original data.

I have filed #693 for handling batch sampling internally -- so you would not have to do this manually. You can follow that issue for updates.
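Until then, the manual version might look roughly like this sketch, which writes one small batch at a time so only a single batch is held in memory. `sample_batch` is a stand-in for whatever call produces a batch of rows; `append_in_batches` and `fake_sampler` are hypothetical names, not SDV functions.

```python
import csv
import io


def append_in_batches(sample_batch, total_rows, batch_size, out_file, fieldnames):
    """Write total_rows rows to out_file as CSV, batch_size rows at a time."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    written = 0
    while written < total_rows:
        # The last batch may be smaller than batch_size.
        num_rows = min(batch_size, total_rows - written)
        writer.writerows(sample_batch(num_rows))
        written += num_rows
    return written


def fake_sampler(num_rows):
    # Stand-in for a real sampler; returns rows as plain dicts.
    return [{"value": i * 10} for i in range(num_rows)]


buffer = io.StringIO()  # an open file on disk would work the same way
rows_written = append_in_batches(
    fake_sampler, total_rows=5, batch_size=2, out_file=buffer, fieldnames=["value"]
)
print(rows_written)
```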

Let me know if that helps or if you have any follow ups!

abimarticio · 2022-01-28T07:44:08Z

Hi @npatki! Thank you very much for the reply. This is really helpful.

In the meantime, I overwrote the column after sampling and created a mapping for the foreign keys. I will also try the first option you suggested, and I will follow the issue you filed.
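Overwriting the keys and mapping the foreign keys can be sketched roughly as follows. This is an illustration in plain Python with rows as dicts, not the exact code used here; `remap_keys`, the table contents, and the column names are all hypothetical.

```python
def remap_keys(parent_rows, child_rows, pk_column, fk_column, start):
    """Give each parent row a new integer primary key counting from start,
    and rewrite the children's foreign keys with the same old-to-new mapping."""
    mapping = {}
    new_parents = []
    for offset, row in enumerate(parent_rows):
        mapping[row[pk_column]] = start + offset
        new_parents.append({**row, pk_column: start + offset})
    # Every child must point at a parent that was remapped above.
    new_children = [
        {**row, fk_column: mapping[row[fk_column]]} for row in child_rows
    ]
    return new_parents, new_children, mapping


# Hypothetical synthetic tables: users (parent) and orders (child).
users = [{"user_id": "a"}, {"user_id": "b"}]
orders = [
    {"order_id": 0, "user_id": "b"},
    {"order_id": 1, "user_id": "a"},
    {"order_id": 2, "user_id": "b"},
]

new_users, new_orders, key_map = remap_keys(
    users, orders, pk_column="user_id", fk_column="user_id", start=100
)
print(key_map)
```

Building the mapping once from the parent table and applying it to every child table keeps the relationships intact after the keys change.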

Thank you very much!

npatki · 2022-02-01T18:52:08Z

Glad it helped! I'll close this issue in favor of the new feature request. Feel free to re-open if you have any follow ups.

npatki added the "question" label Jan 27, 2022

npatki added the "under discussion" label Jan 27, 2022

npatki removed the "under discussion" label Feb 1, 2022

npatki closed this as completed Feb 1, 2022

katxiao added this to the 0.14.0 milestone Mar 21, 2022
