avro_schema method does not respect original ordering of unions or enums #413

Closed
robbertvanwaveren opened this issue Sep 7, 2023 · 6 comments

Comments

@robbertvanwaveren

Describe the bug
The order of enum symbols and union types directly determines the serialized bytes, just like the order of fields.
However, the avro_schema method does not seem to preserve that order.

To Reproduce

Generate a model from the following Avro schema, then use the avro_schema method to retrieve the schema again. I expect the fields, symbols, and union types to stay in exactly the same order within their arrays. If they do not, the resulting Avro data will not be binary compatible with the original schema.

{
  "type": "record",
  "name": "TestModel",
  "fields": [
    {
      "name": "enum_field",
      "type": {
        "type": "enum",
        "name": "SomeEnum",
        "symbols": [
          "a",
          "A",
          "B",
          "b"
        ],
        "default": "Unknown"
      }
    },
    {
      "name": "optional_field",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}

However, the enum symbols appear to be reordered by case, and the union types appear to be ordered randomly.
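One way to check the expectation above is to compare every order-sensitive list in the original schema against the schema returned by the model. The following is a minimal sketch, not part of dataclasses_avroschema: `ordered_lists` and `order_mismatches` are hypothetical helpers, they only cover records, enums, and unions (not array items or map values), and the `roundtrip` dict is copied from the output reported later in this thread.

```python
from typing import Any, Iterator, List, Optional, Tuple


def ordered_lists(schema: Any, path: str = "$") -> Iterator[Tuple[str, List[str]]]:
    """Yield (path, names) for each order-sensitive list in a parsed Avro schema."""
    if isinstance(schema, list):  # a union: branch order selects the branch index
        labels = [b if isinstance(b, str) else str(b.get("name", b.get("type"))) for b in schema]
        yield path, labels
        for label, branch in zip(labels, schema):
            yield from ordered_lists(branch, f"{path}<{label}>")
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "enum":
            yield f"{path}.symbols", list(schema["symbols"])
        elif kind == "record":
            yield f"{path}.fields", [f["name"] for f in schema["fields"]]
            for f in schema["fields"]:
                yield from ordered_lists(f["type"], f"{path}.{f['name']}")
        elif isinstance(kind, (dict, list)):  # nested type definition
            yield from ordered_lists(kind, path)


def order_mismatches(expected: Any, actual: Any) -> List[Tuple[str, List[str], Optional[List[str]]]]:
    """Return (path, expected_order, actual_order) for every list whose order differs."""
    actual_lists = dict(ordered_lists(actual))
    return [
        (path, names, actual_lists.get(path))
        for path, names in ordered_lists(expected)
        if actual_lists.get(path) != names
    ]


original = {
    "type": "record",
    "name": "TestModel",
    "fields": [
        {"name": "enum_field",
         "type": {"type": "enum", "name": "SomeEnum", "symbols": ["a", "A", "B", "b"]}},
        {"name": "optional_field", "type": ["null", "string"], "default": None},
    ],
}

# Schema as reported back by the generated model in this thread.
roundtrip = {
    "type": "record",
    "name": "TestModel",
    "fields": [
        {"name": "enum_field",
         "type": {"type": "enum", "name": "SomeEnum", "symbols": ["A", "B", "a", "b"]},
         "default": "A"},
        {"name": "optional_field", "type": ["null", "string"], "default": None},
    ],
}

for path, want, got in order_mismatches(original, roundtrip):
    print(f"{path}: expected {want}, got {got}")
```

An empty result means the field, symbol, and union orders all survived the round trip; the comparison says nothing about defaults or other attributes.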

Expected behavior
See above

@marcosschroh
Owner

I cannot reproduce this issue.

The schema is invalid, because the default Unknown is not included in the symbols. In any case, if you change it to a valid default, the generated model matches the schema.

schema:

from dataclasses_avroschema import ModelGenerator

model_generator = ModelGenerator()

schema = {
  "type": "record",
  "name": "TestModel",
  "fields": [
    {
      "name": "enum_field",
      "type": {
        "type": "enum",
        "name": "SomeEnum",
        "symbols": [
          "a",
          "A",
          "B",
          "b"
        ],
        "default": "a"
      }
    },
    {
      "name": "optional_field",
      "type": [
        "null",
        "string"
      ],
      "default": None
    }
  ]
}

result = model_generator.render(schema=schema)

print(result)

generates:

from dataclasses_avroschema import AvroModel
import dataclasses
import enum
import typing


class SomeEnum(enum.Enum):
    A = "A"
    B = "B"
    a = "a"
    b = "b"


@dataclasses.dataclass
class TestModel(AvroModel):
    enum_field: SomeEnum = SomeEnum.A
    optional_field: typing.Optional[str] = None

Getting the schema back from the model then matches the initial one:

print(TestModel.avro_schema_to_python())

{'type': 'record', 'name': 'TestModel', 'fields': [{'name': 'enum_field', 'type': {'type': 'enum', 'name': 'SomeEnum', 'symbols': ['A', 'B', 'a', 'b']}, 'default': 'A'}, {'name': 'optional_field', 'type': ['null', 'string'], 'default': None}]}

Are you using the latest version, @robbertvanwaveren?

@marcosschroh
Owner

Closing this because it is not reproducible.

@robbertvanwaveren
Author

If you look at the generated schema or Python file, you'll see that the order of the enum symbols changed from aABb to ABab. This means that, when serialized, the value B becomes 0x02 instead of 0x04, and deserializing that Avro data with the original schema will interpret the sent value as A.

The same happens with unions, but that seems to occur less frequently.
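The byte values in the enum case follow from how Avro's binary encoding writes an enum: as the zero-based position of the symbol, encoded as a zigzag varint. Below is a minimal standard-library sketch of that arithmetic; `zigzag_varint`, `decode_zigzag_varint`, `encode_enum`, and `decode_enum` are illustrative names, not functions from dataclasses_avroschema or any Avro library.

```python
from typing import List


def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro does for int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps 0, -1, 1, -2, 2 ... to 0, 1, 2, 3, 4 ...
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_zigzag_varint(data: bytes) -> int:
    """Inverse of zigzag_varint for a single encoded value."""
    z = 0
    shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return (z >> 1) ^ -(z & 1)


def encode_enum(symbols: List[str], value: str) -> bytes:
    """An enum value is written as the position of its symbol in the schema."""
    return zigzag_varint(symbols.index(value))


def decode_enum(symbols: List[str], data: bytes) -> str:
    """Read a position and look it up in the reader's symbol list."""
    return symbols[decode_zigzag_varint(data)]


original = ["a", "A", "B", "b"]   # symbol order in the source schema
reordered = ["A", "B", "a", "b"]  # symbol order after the round trip

payload = encode_enum(reordered, "B")
print(payload.hex())                       # 02
print(encode_enum(original, "B").hex())    # 04
print(decode_enum(original, payload))      # A
```

A producer using the reordered schema and a consumer using the original one therefore disagree silently: the payload is valid under both schemas, so no error is raised.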

@marcosschroh
Owner

Ah, I see, @robbertvanwaveren.

Working on the fix.

@marcosschroh
Owner

This should be fixed for enums with the latest changes.

@robbertvanwaveren
Author

I will check it out asap.
