In upgrade_metadata index primary keys should be converted to text #1318

Closed
npatki opened this issue Mar 17, 2023 · 0 comments
Labels: feature request (Request for a new feature)


npatki commented Mar 17, 2023

Environment Details

  • SDV version: 1.0.0b0
  • Python version: 3.7
  • Operating System: Linux (Colab Notebook)

Background

I have the following metadata (written in the old format, pre-SDV 1.0):

{
  "primary_key": "user_id",
  "fields": {
    "user_id": { "type": "id", "subtype": "integer" },
    ...
  }
}

This conveys that the user id is an index that increments 0, 1, 2, ...

Observed

When I call upgrade_metadata, I get the following:

{
  "primary_key": "user_id",
  "columns": {
    "user_id": { "sdtype": "numerical" },
    ...
  }
}

Technically this works, but "numerical" is a misleading sdtype here because the column is not statistically meaningful as a number: taking its average, correlation, etc. does not make sense.

Expected

Instead, I propose that the upgrade_metadata script make this a "text" sdtype with a regex that generates indices. This matches our expectation that ID columns contain structured text.

{
  "primary_key": "user_id",
  "columns": {
    "user_id": { "sdtype": "text", "regex_format": "\\d{30}" },
    ...
  }
}
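
The proposed rule could be sketched roughly as follows. This is not SDV's actual upgrade code: the helper names `upgrade_field` and `upgrade_table` are hypothetical, and passing every other field type through unchanged is a simplifying assumption made only for illustration.

```python
# Hypothetical sketch of the proposed conversion; not SDV's implementation.
ID_REGEX = r"\d{30}"  # regex proposed in this issue for integer index keys


def upgrade_field(name, field, primary_key):
    """Map one old-format field entry to a new-format column entry."""
    is_integer_id = field.get("type") == "id" and field.get("subtype") == "integer"
    if is_integer_id and name == primary_key:
        # Proposed behavior: integer index primary keys become structured text.
        return {"sdtype": "text", "regex_format": ID_REGEX}
    # Assumption: other types keep their name as the sdtype (simplified).
    return {"sdtype": field.get("type")}


def upgrade_table(old_metadata):
    """Convert an old-format single-table metadata dict to the new layout."""
    primary_key = old_metadata.get("primary_key")
    columns = {
        name: upgrade_field(name, field, primary_key)
        for name, field in old_metadata["fields"].items()
    }
    new_metadata = {"columns": columns}
    if primary_key is not None:
        new_metadata["primary_key"] = primary_key
    return new_metadata


old = {
    "primary_key": "user_id",
    "fields": {
        "user_id": {"type": "id", "subtype": "integer"},
        "age": {"type": "numerical", "subtype": "integer"},
    },
}
new = upgrade_table(old)
```

Here `new["columns"]["user_id"]` carries the "text" sdtype and the regex, while the illustrative `age` column stays "numerical". When such a dict is written out as JSON, the backslash in the regex has to be escaped.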

When applied to an int column, the synthesizer will return integer values that increase with every row: 0, 1, 2, ... (up to a max of 10^30 rows, which is enough for most purposes).
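
To illustrate why a 30-digit regex can still yield 0, 1, 2, ... in an int column, here is a minimal sketch. It assumes the generator enumerates matches of `\d{30}` in increasing order and that the values are cast back to integers afterwards; `index_strings` is a hypothetical helper, not an SDV function.

```python
import re
from itertools import islice


def index_strings(width=30):
    """Yield zero-padded digit strings of the given width, in increasing order."""
    for i in range(10 ** width):
        yield f"{i:0{width}d}"


first_three = list(islice(index_strings(), 3))

# Every generated string matches the proposed regex.
all_match = all(re.fullmatch(r"\d{30}", s) for s in first_three)

# Casting to int drops the leading zeros, giving an incrementing index.
as_ints = [int(s) for s in first_three]
```

The zero padding is what makes each string exactly 30 digits long, and there are 10^30 distinct strings of that form, which is where the row limit comes from.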
