Issue with case sensitivity in enums #399

Closed
robbertvanwaveren opened this issue Aug 28, 2023 · 5 comments

Comments

@robbertvanwaveren

Describe the bug
Python fields that represent enum values are converted directly to UPPERCASE, regardless of their original casing.
This results in duplicate fields when symbols in the schema differ only in casing.

I encountered this issue in our Avro models, which are derived from standards that use SI units. There are quite a few cases where this is a problem, like M(ega) vs m(illi), M(onth) vs m(inute), etc.

{
  "name": "UnitMultiplier",
  "type": "enum",
  "symbols": [
    "p",
    "P",
     ....
  ]
}

results in

class UnitMultiplier(enum.Enum):
    P = "p"
    P = "P"
    ... 
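A minimal sketch of why this fails, assuming the generator derives each member name with a plain `str.upper()` (the maintainer confirms below that the key is uppercased). The symbol list is a shortened, hypothetical stand-in for the real schema. Python's `enum` refuses to define the same member name twice, so the generated class cannot even be created:

```python
import enum
import json

# Shortened, hypothetical version of the schema from the report.
SCHEMA = json.loads("""
{
  "name": "UnitMultiplier",
  "type": "enum",
  "symbols": ["p", "P", "M", "m"]
}
""")


def naive_members(symbols):
    """Map every symbol to an uppercased member name (the buggy behaviour)."""
    return [(symbol.upper(), symbol) for symbol in symbols]


members = naive_members(SCHEMA["symbols"])
print(members)  # [('P', 'p'), ('P', 'P'), ('M', 'M'), ('M', 'm')]

try:
    # The functional Enum API rejects a member name that is reused.
    enum.Enum(SCHEMA["name"], members)
except TypeError as exc:
    print("cannot build enum:", exc)
```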

Perhaps, when duplicates are found, rename the field whose symbol was not originally uppercase (here the one for 'p') to 'P_' or something similar.

To Reproduce
See above

Expected behavior
Support for case-sensitive uniqueness, so that any valid Avro schema results in a valid generated Python model.

@marcosschroh
Owner

Hi @robbertvanwaveren

That makes sense. I think we should use the symbols directly, without any transformation. Currently we uppercase the key.

@marcosschroh
Owner

Looking at the enum module documentation, the recommendation is to use uppercase member names (which makes sense), so we should keep doing the same. If we go with your solution of appending _, we can end up with situations like:

{
  "name": "UnitMultiplier",
  "type": "enum",
  "symbols": [
    "p",
    "P",
    "P_",
     ....
  ]
}

and the result would be:

class UnitMultiplier(enum.Enum):
    P = "p"
    P_ = "P"
    P__ = "P_"
    ... 

which is confusing. Ideally, symbols should be properly defined and descriptive, but sometimes we cannot control that.
I will think of another strategy.
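The suffixing scheme discussed above can be sketched as a first-come, first-served loop, which reproduces the example. `suffixed_members` is a hypothetical helper for illustration, not part of dataclasses-avroschema:

```python
def suffixed_members(symbols):
    """Uppercase each symbol, appending "_" until the member name is unused."""
    used = set()
    members = []
    for symbol in symbols:
        name = symbol.upper()
        while name in used:
            name += "_"  # a symbol already ending in "_" pushes later names further
        used.add(name)
        members.append((name, symbol))
    return members


print(suffixed_members(["p", "P"]))        # [('P', 'p'), ('P_', 'P')]
print(suffixed_members(["p", "P", "P_"]))  # [('P', 'p'), ('P_', 'P'), ('P__', 'P_')]
```

The names are unique, but which symbol gets `P`, `P_` or `P__` depends on symbol order, and nothing in the name hints at the original casing.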

@marcosschroh
Owner

After taking a look, the only way to solve this is to use the symbol exactly as it appears in the schema when its uppercased name is repeated. Following your example, the generated enum will be:

class UnitMultiplier(enum.Enum):
    P = "P"
    p = "p"
    ... 

For future reference, a note to other users: avoid this, because it is hard to understand and the enum will not use proper uppercase member names.

The documentation: https://marcosschroh.github.io/dataclasses-avroschema/pr-preview/pr-402/model_generator/#enums-and-case-sensitivity
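With the raw symbols used as names, the class from the example is valid Python: member names are case-sensitive, so `P` and `p` are distinct members, and lookup by value still works. A minimal check, keeping only the two symbols from the example:

```python
import enum


class UnitMultiplier(enum.Enum):
    P = "P"  # uppercase symbol keeps its natural name
    p = "p"  # lowercase twin keeps its raw spelling


assert UnitMultiplier.P is not UnitMultiplier.p
assert UnitMultiplier("p") is UnitMultiplier.p
print([member.name for member in UnitMultiplier])  # ['P', 'p']
```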

@robbertvanwaveren
Author

I'm afraid your fix only works if the uppercase symbol comes before the lowercase one.
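The problem can be reproduced with a sketch of the rule described in the previous comment (uppercase the symbol, but fall back to the raw symbol when the uppercased name is already taken). `raw_on_repeat_members` is a hypothetical reconstruction, not the library's code:

```python
def raw_on_repeat_members(symbols):
    """Uppercase each symbol; reuse the raw symbol if that name is taken."""
    used = set()
    members = []
    for symbol in symbols:
        name = symbol.upper()
        if name in used:
            name = symbol  # fall back to the symbol exactly as in the schema
        if name in used:
            raise ValueError(f"duplicate member name {name!r} for symbol {symbol!r}")
        used.add(name)
        members.append((name, symbol))
    return members


print(raw_on_repeat_members(["P", "p"]))  # [('P', 'P'), ('p', 'p')]

try:
    raw_on_repeat_members(["p", "P"])  # "p" claims the name "P" first...
except ValueError as exc:
    print(exc)  # ...so the uppercase symbol "P" has nowhere to go
```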

marcosschroh added a commit that referenced this issue Aug 31, 2023
Co-authored-by: Marcos Schroh <marcos.schroh@kpn.com>
@marcosschroh
Owner

@robbertvanwaveren The next release is fixing it! Thanks for reporting it!
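One order-independent way to handle this, offered only as a sketch and not necessarily what the release implements: first count how many symbols share each uppercased form, then keep the raw symbol as the name for every symbol whose uppercased form is shared. Because Avro enum symbols must be unique, the raw names in a clashing group cannot collide with each other or with the uppercased names of other symbols. `case_safe_members` is a hypothetical helper:

```python
from collections import Counter


def case_safe_members(symbols):
    """Uppercase symbols, keeping the raw symbol for case-only clashes.

    Which names get the raw spelling is decided from the whole symbol list
    up front, so the result does not depend on symbol order.
    """
    if len(set(symbols)) != len(symbols):
        raise ValueError("Avro enum symbols must be unique")
    clashes = Counter(symbol.upper() for symbol in symbols)
    return [
        (symbol if clashes[symbol.upper()] > 1 else symbol.upper(), symbol)
        for symbol in symbols
    ]


print(case_safe_members(["p", "P", "k"]))  # [('p', 'p'), ('P', 'P'), ('K', 'k')]
print(case_safe_members(["P", "p", "k"]))  # [('P', 'P'), ('p', 'p'), ('K', 'k')]
```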
