If using regex to generate values, scramble them #1921

Closed
npatki opened this issue Apr 17, 2024 · 0 comments · Fixed by #1932


npatki commented Apr 17, 2024

Problem Description

For any column that has sdtype id with a provided regex, the SDV currently generates the matching values in sequential (alphanumeric) order. The resulting data doesn't look realistic.

Expected behavior

For any id columns with a regex, ensure that the synthetic data does not have values in sequential order. A simple way to do this is to scramble them.
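To make the difference concrete, here is a minimal standard-library sketch of sequential versus scrambled ID values. It does not use SDV or RDT; the helper names `sequential_ids` and `scrambled_ids`, the `ID_` prefix, and the `seed` parameter are all hypothetical and exist only for illustration.

```python
import random


def sequential_ids(prefix, count):
    """Return IDs in the sequential order the issue describes (hypothetical helper)."""
    return [f"{prefix}{i:03d}" for i in range(count)]


def scrambled_ids(prefix, count, seed=0):
    """Return the same set of IDs in shuffled order (hypothetical helper).

    The values stay unique and still follow the same pattern; only the row
    order changes.
    """
    values = sequential_ids(prefix, count)
    random.Random(seed).shuffle(values)  # seeded only so the sketch is repeatable
    return values


ordered = sequential_ids("ID_", 5)
shuffled = scrambled_ids("ID_", 5)
print(ordered)
print(shuffled)
```

Scrambling keeps exactly the same values as the sequential run, so uniqueness and the format are unaffected.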

In technical terms: The SDV assigns RDT's RegexGenerator to these columns. By default, it should assign the RegexGenerator with generation_order='scrambled'.

Additional context

  • This change only applies if a column is sdtype 'id' AND there is a user-provided 'regex_format'. If there is no regex provided in the metadata, then no changes will be made for this issue.
  • This issue depends on RDT changes to add the generation_order parameter. See RDT issue #800
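The condition described above could be sketched roughly as follows. This is not the SDV implementation: the helper `regex_generator_kwargs` and the plain-dict column metadata are assumptions made for illustration. Only the names `sdtype`, `regex_format`, `generation_order`, and the value `'scrambled'` come from the issue text.

```python
def regex_generator_kwargs(column_metadata):
    """Return keyword arguments for a RegexGenerator, or None if the rule does not apply.

    Hypothetical helper: `column_metadata` is assumed to be a plain dict
    describing one column, e.g. {'sdtype': 'id', 'regex_format': 'ID_[0-9]{3}'}.
    """
    # The change only applies to columns with sdtype 'id'.
    if column_metadata.get("sdtype") != "id":
        return None

    # It also requires a user-provided regex; without one, nothing changes.
    regex_format = column_metadata.get("regex_format")
    if regex_format is None:
        return None

    return {"regex_format": regex_format, "generation_order": "scrambled"}


columns = {
    "user_id": {"sdtype": "id", "regex_format": "ID_[0-9]{3}"},
    "session_id": {"sdtype": "id"},           # no regex: left unchanged
    "country": {"sdtype": "categorical"},     # not an id column: left unchanged
}

for name, column_metadata in columns.items():
    print(name, regex_generator_kwargs(column_metadata))
```

With an RDT version that includes the generation_order parameter, the returned dictionary would presumably be passed to the RegexGenerator constructor; that step is omitted here so the sketch runs without RDT installed.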
@npatki npatki added the feature request Request for a new feature label Apr 17, 2024
@amontanez24 amontanez24 added this to the 1.12.1 milestone Apr 17, 2024
@amontanez24 amontanez24 self-assigned this Apr 19, 2024