Ensure unicity on primary keys on different calls #63

Closed
ManuelAlvarezC opened this issue Sep 24, 2018 · 0 comments · Fixed by #99
Labels: internal (the issue doesn't change the API or functionality), question (general question about the software)

Comments

@ManuelAlvarezC
Contributor

Currently, primary keys are generated using the exrex module and the regex from the meta.json file. As implemented, a single sampling call is guaranteed to return unique primary keys. However, if we sample more than once, a later call may return keys that were already returned by a previous call.
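
A minimal sketch of how this can happen, assuming each call starts a fresh pass over the regex matches. The `MATCHES` list and the `sample_keys` function are hypothetical stand-ins, not the project's actual code, and the real generator may pick keys in a different order:

```python
import itertools

# Hypothetical stand-in for the strings matched by the regex [1-5]{1}.
MATCHES = ["1", "2", "3", "4", "5"]


def sample_keys(num_rows):
    """Return num_rows keys, starting a fresh pass over the matches on every call."""
    fresh = iter(MATCHES)  # new iterator per call, so earlier calls are forgotten
    return list(itertools.islice(fresh, num_rows))


first = sample_keys(3)   # unique within this call
second = sample_keys(1)  # unique within this call, but not across calls
overlap = set(first) & set(second)
print(first, second, overlap)
```

Keys are unique inside each call because a single pass never repeats a match, but nothing carries over between calls, so `overlap` is not empty.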

Should we ensure uniqueness in this scenario?
Note that if we do, we will only be able to sample as many rows as the regex has distinct matches. After that, we will need a way to reset the database before sampling anything else.

For example, suppose we had a dataset consisting of a single table with a single column, the primary key, with regex [1-5]{1}. Then the following could happen:

>>> ...
>>> first_samples = sampler.sample_all(num_rows=3)
>>> first_samples
   primary_key
0            1
1            2
2            3

>>> # There is no guarantee that sampling one more row yields primary key 4 or 5
>>> second_sample = sampler.sample_all(num_rows=1)
>>> second_sample
   primary_key
0            3
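
One possible way to guarantee uniqueness across calls is to keep the key iterator alive between calls and expose an explicit reset. The sketch below is only an illustration under that assumption: `UniqueKeyGenerator`, `sample`, and `reset` are hypothetical names, and the list of matches stands in for what the regex [1-5]{1} would produce.

```python
class UniqueKeyGenerator:
    """Hand out each matching key at most once until reset() is called."""

    def __init__(self, matches):
        self._matches = list(matches)
        self.reset()

    def reset(self):
        """Forget all issued keys and start again from the first match."""
        self._remaining = iter(self._matches)
        self._issued = 0

    def sample(self, num_rows):
        """Return num_rows keys that no earlier call has returned since the last reset."""
        available = len(self._matches) - self._issued
        if num_rows > available:
            raise ValueError(
                "requested {} keys but only {} unused keys remain".format(num_rows, available)
            )
        keys = [next(self._remaining) for _ in range(num_rows)]
        self._issued += num_rows
        return keys


generator = UniqueKeyGenerator(["1", "2", "3", "4", "5"])
first = generator.sample(3)
second = generator.sample(1)  # never repeats a key from `first`

try:
    generator.sample(2)  # only one unused key is left
    exhausted = False
except ValueError:
    exhausted = True

generator.reset()
after_reset = generator.sample(5)  # the full range is available again
```

This makes the trade-off from the question concrete: once every match has been issued, further sampling fails until the state is reset.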
ManuelAlvarezC added the question and internal labels on Sep 24, 2018
ManuelAlvarezC added this to the 0.1.2 milestone on Apr 25, 2019