Currently, primary keys are generated using the `exrex` module and the regex from the `meta.json` file. As implemented, a single call to sample is guaranteed to produce unique primary keys; however, if we sample more than once, it is possible to obtain keys that were already returned by a previous call.
Should we ensure uniqueness in this scenario?
Note that if we enforce this, we will only be able to sample as many rows as the regex has distinct matches; after that, we will need a way to reset the database before sampling anything else.
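As a sketch of what cross-call uniqueness could look like (this is not the actual implementation; `generate_matches` is a tiny dependency-free stand-in for `exrex.generate`, which handles arbitrary regexes, and the `UniqueKeyGenerator` class and its names are hypothetical):

```python
import itertools

def generate_matches(chars, length):
    """Yield every string a pattern like [1-5]{1} can match: all
    combinations of the given characters at the given length.
    (A stand-in for exrex.generate, to keep this sketch runnable.)"""
    for combo in itertools.product(chars, repeat=length):
        yield ''.join(combo)

class UniqueKeyGenerator:
    """Hypothetical sketch: hand out each match at most once, so keys
    stay unique across calls until the pattern is exhausted."""

    def __init__(self, chars, length):
        self._chars = chars
        self._length = length
        self._matches = generate_matches(chars, length)

    def sample(self, num_rows):
        keys = []
        for _ in range(num_rows):
            try:
                keys.append(next(self._matches))
            except StopIteration:
                raise ValueError(
                    'All unique keys for this pattern have been used; '
                    'reset before sampling more rows.'
                )
        return keys

    def reset(self):
        # Start over: previously returned keys may be handed out again.
        self._matches = generate_matches(self._chars, self._length)
```

With the `[1-5]{1}` pattern from the example below, such a generator could serve at most five rows in total across all calls, then raise until `reset()` is called, which matches the trade-off described above.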
For example, if we had a dataset consisting of a single table with a single column, which is the primary key with regex `[1-5]{1}`, then the following could happen:
```python
>>> ...
>>> first_samples = sampler.sample_all(num_rows=3)
>>> first_samples.T
   primary_key
0            1
1            2
2            3
# Then it is not guaranteed that, if we sample one more row,
# its primary key will be 4 or 5
>>> second_sample = sampler.sample_all(num_rows=1)
>>> second_sample
   primary_key
0            3
```
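The collision in this example is not an edge case: any sampler that draws each key independently from a finite match set must eventually repeat. A dependency-free sketch (`sample_keys` and `MATCHES` are hypothetical, standing in for `sampler.sample_all` and the regex matches):

```python
import random

# The five strings matched by the regex [1-5]{1}
MATCHES = ['1', '2', '3', '4', '5']

def sample_keys(num_rows):
    # Mimics independent sampling per call: each key is drawn
    # uniformly from all matches, with no memory of earlier calls.
    return [random.choice(MATCHES) for _ in range(num_rows)]

first = sample_keys(3)
second = sample_keys(3)
# 6 draws from only 5 possible values: by the pigeonhole principle
# at least two collide, so cross-call uniqueness cannot hold.
assert len(set(first + second)) < len(first + second)
```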