Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecordField default_to_avro produces non serializable output #382

Closed
dada-engineer opened this issue Aug 5, 2023 · 9 comments · Fixed by #383
Closed

RecordField default_to_avro produces non serializable output #382

dada-engineer opened this issue Aug 5, 2023 · 9 comments · Fixed by #383

Comments

@dada-engineer
Copy link
Contributor

dada-engineer commented Aug 5, 2023

Describe the bug
When having a nested model the avro_schema function of the model fails if the nested attribute contains a non json serializable field e.g. datetime.

To Reproduce

import dataclasses
from dataclasses_avroschema import AvroModel
from datetime import datetime

@dataclasses.dataclass
class Metadata(AvroModel):
     event_time: datetime = dataclasses.field(default_factory=datetime.utcnow, default=dataclasses.MISSING)

@dataclasses.dataclass
class Record(AvroModel):
    metadata: Metadata = dataclasses.field(default_factory=lambda: Metadata(), default=dataclasses.MISSING)

Record()
# Out: Record(metadata=Metadata(event_time=datetime.datetime(2023, 8, 5, 7, 32, 27, 798392)))

Record.avro_schema()

# Fails with: TypeError: Object of type datetime is not JSON serializable

Expected behavior
Should produce the avor schema with default of datetime as string

As for my understanding it is due to RecordField.default_to_avro simply calling the to_dict function on the value provided.
As far as I can tell the default_to_avro is only called on rendering, so it should be fine to do json.loads(value.to_json()) here.

Edit: fixed suggested fix to use to_json()

@dada-engineer
Copy link
Contributor Author

Maybe not so urgent as this one should think if a dynamic value such as the result of datetime.now (probably often a default_factory) is really what you want in the schema (changes every time it is generated)

@michal-rogala
Copy link
Contributor

michal-rogala commented Aug 9, 2023

We are seeing the same issue with static Decimal fields. It stopped working on 0.45.0 -> 0.45.1 version change.

Our schema:

 minimum_step: Decimal = Field(
        default=Decimal("0.10"),
    )

The 'default' key of OrderedDict AVRO schema contains:
'minimum_step': Decimal('0.10'),

which is not JSON serializable

@dada-engineer
Copy link
Contributor Author

@michal-rogala what would you expect to happen? Parse as string into avro as it is more reliable than a float or render a float from decimal? Is the decimal type rendering a number?

@michal-rogala
Copy link
Contributor

String is the most reliable way - I believe this issue is only about schema definition, right? I tested your fix and it works in our case too, but I didn't check what was the decimal representation there.

@dada-engineer
Copy link
Contributor Author

I am sorry, checked avro spec and this projects spec and both are rendering it to bytes, I have no issues using decimal defaults. can you comment an example that raises?

@michal-rogala
Copy link
Contributor

Yeah, I believe this bug can occur when there is a several nested records and the lowest one has a default decimal value. I tried to follow an execution trace and while individual sub-records schema are generated properly - the "stiched" final one has this issue.

@michal-rogala
Copy link
Contributor

@dabdada

ok, I did some testing and the problem happens when there is a default factory specified for a subrecord:

class Level1(AvroBaseModel):
    minimum_step: Decimal = Field(
        default=Decimal("0.10"),
    )

class Level2(AvroBaseModel):
    l1: Level1 = Field(default_factory=Level1)

print(Level2.avro_schema())

@dada-engineer
Copy link
Contributor Author

K thank you. Can you check if this is fixed by my branch as well? I do believe so

@michal-rogala
Copy link
Contributor

yes, it seems so :)

dada-engineer added a commit to dada-engineer/dataclasses-avroschema that referenced this issue Aug 14, 2023
marcosschroh added a commit that referenced this issue Aug 15, 2023
* fix: record fields avro schema creation (#382)

* fix: render avro data types correctly for nested default values (#382)

* fix: convert record field default values to avro compatable value (#382)

* chore: use field_map to get recordfield value default (#382)

---------

Co-authored-by: Marcos Schroh <2828842+marcosschroh@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants