Generate enum classes as part of DataSchema #118

Open
koperagen opened this issue Jun 6, 2022 · 1 comment
Labels
research This requires a deeper dive to gather a better understanding

@koperagen
Collaborator

Imagine a CSV with a "day_of_week" column holding string values such as "monday", "friday", etc. If this column were converted to an enum, code completion could help when, for example, filtering on it.
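To make the idea concrete, here is a minimal sketch in plain Kotlin of what a generated enum would enable. Everything in it is hypothetical: `DayOfWeek`, `Visit`, and `parseDay` are illustrative names, the full list of weekdays is assumed, and no DataFrame API is used.

```kotlin
// Hypothetical enum that a generator could emit for the "day_of_week" column.
enum class DayOfWeek { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

// Hypothetical row type standing in for a generated data schema.
data class Visit(val dayOfWeek: DayOfWeek, val count: Int)

// Converts a raw string cell into the enum.
// Throws IllegalArgumentException if the value has no matching constant.
fun parseDay(raw: String): DayOfWeek = DayOfWeek.valueOf(raw.trim().uppercase())

fun main() {
    val raw = listOf("monday" to 10, "friday" to 25, "monday" to 7)
    val visits = raw.map { (day, count) -> Visit(parseDay(day), count) }

    // The IDE can complete DayOfWeek.MONDAY; a typo fails at compile time, not at runtime.
    val mondays = visits.filter { it.dayOfWeek == DayOfWeek.MONDAY }
    println(mondays.sumOf { it.count }) // prints 17
}
```

With a plain string column the same filter would compare against the literal "monday", and a misspelling would silently match nothing.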

It can be done the same way as generating data schemas:

  1. after cell execution in notebooks
  2. on data schema import in a Gradle project
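A rough sketch of the generation step itself, assuming the generator only sees the distinct string values of a column and emits Kotlin source as text. `generateEnumSource` is a hypothetical helper, not an existing function of the library, and it assumes every value becomes a valid identifier once uppercased.

```kotlin
// Builds Kotlin source for an enum from the values of a string column.
// Assumption: each value is a valid identifier after uppercasing.
fun generateEnumSource(enumName: String, columnValues: List<String>): String {
    val constants = columnValues.map { it.uppercase() }.distinct()
    return "enum class $enumName {\n" +
        constants.joinToString(",\n") { "    $it" } +
        "\n}\n"
}

fun main() {
    // Emits an enum with two constants, MONDAY and FRIDAY.
    print(generateEnumSource("DayOfWeek", listOf("monday", "friday", "monday")))
}
```

The same function could in principle be called from either place: on the values seen after a notebook cell runs, or on the sample data read at schema import time.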

There are some design questions:

  1. What if I don't need an enum?

  2. What about normalization? "monday" and "Monday" aren't the same value.
    In Jupyter, you can normalize values however you want and get a nice enum.
    In a Gradle project, code generation happens once, at build time, so the values have to be normalized. How?

  3. How many values in the enum is too many?

  4. What if not all possible values are present in the column? What should happen if generated schema knows about 2 enum values, but the actual column at runtime has more?
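To have something to discuss, here is one possible set of answers to questions 2 to 4, sketched in plain Kotlin. None of it is decided: `MAX_ENUM_VALUES`, `normalize`, `enumConstantsOrNull`, and `parseDayOrNull` are hypothetical names, the limit of 20 is arbitrary, and mapping unseen values to `null` is only one of several options (throwing or a dedicated fallback constant would be others).

```kotlin
// Hypothetical cap on distinct values; the real limit is an open question.
const val MAX_ENUM_VALUES = 20

// One possible normalization: trim, uppercase, replace runs of other characters with '_'.
fun normalize(raw: String): String =
    raw.trim().uppercase().replace(Regex("[^A-Z0-9]+"), "_")

// Returns the normalized constant names, or null when no enum should be generated
// (no values at all, or more distinct values than the cap allows).
fun enumConstantsOrNull(values: List<String>, maxValues: Int = MAX_ENUM_VALUES): List<String>? {
    val constants = values.map(::normalize).distinct()
    return if (constants.isEmpty() || constants.size > maxValues) null else constants
}

// Stand-in for a generated enum that only knows the two values seen at generation time.
enum class DayOfWeek { MONDAY, FRIDAY }

// Returns null instead of throwing when the runtime column holds a value
// the generator never saw.
fun parseDayOrNull(raw: String): DayOfWeek? =
    DayOfWeek.values().firstOrNull { it.name == normalize(raw) }

fun main() {
    println(enumConstantsOrNull(listOf("monday", "Monday", " friday "))) // prints [MONDAY, FRIDAY]
    println(parseDayOrNull("Monday")) // prints MONDAY
    println(parseDayOrNull("sunday")) // prints null
}
```

Applying the same `normalize` function at generation time and at parse time is what keeps "monday" and "Monday" from diverging; it does not settle how a Gradle user would configure it.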

@zaleslaw zaleslaw added this to the 0.11.0 milestone Apr 25, 2023
@zaleslaw zaleslaw self-assigned this Apr 25, 2023
@zaleslaw zaleslaw added the research This requires a deeper dive to gather a better understanding label Apr 25, 2023
@zaleslaw zaleslaw modified the milestones: 0.11.0, Backlog Apr 25, 2023
@zaleslaw
Collaborator

I need to answer this question

@zaleslaw zaleslaw removed their assignment Jun 23, 2023

2 participants