default_factory values leaking into compiled schema #400
Comments
@michal-rogala That is the expected behavior. You should not do it if you always want the same default value (which I expect you do). If you want a default value which is a default uuid just return the
What we want to achieve is to have the application automatically fill non-optional values, such as UUIDs or timestamps (which cannot be static by design), when an object is created. Those values should not propagate to the schema "default" field because a baked-in value (a timestamp, for example) makes no sense there. One workaround is to fill those values in a custom constructor, but that does not scale to objects with thousands of nested fields. Do you have an idea how to achieve that? Maybe we could add a field annotation to skip adding those values to the schema.
@michal-rogala Ah, now I understand what you need. There is no proper way to do it, but: do you register the schema every time that you encounter a new class with a different default value? I am asking because even though the default value changes every time, it should still be possible to
So far we don't use a schema registry. I was wondering whether this behaviour is normal: passing a factory as the default implies that the value is dynamic (uuid, timestamp, etc.), and putting it into the default value in the schema seemed odd. I believe this issue can be closed now :). Thanks for the explanation.
You're welcome
I agree that it is odd, but if you do not register a schema every time then it is fine. Even if the field has a default value in the schema it won't be a problem because you also have it in the If you register a schema, then having a default value on fields like updated_on ( Let's close this issue for now. If you or someone else experiences another issue or believes that this is not the right way to go, let's reopen it.
Didn't take too long to reopen this :). Now with a serious use case for it.

Background: schemaless_read/write does not support backward schema compatibility. Adding an optional field to the schema results in a fastavro exception when trying to deserialize an older message. This is a known limitation and design problem with fastavro. The solution is to use the regular fastavro read/write, which embeds the original writer schema along with the message. I deliberately omit the solution with a schema repository because it requires additional infrastructure and schema fields indicating the version, and with existing systems it's often too late. And according to the "flawed" Confluent docs, such compatibility should be achievable without a schema repository. Using

Problem: Custom serialize/deserialize methods built on fastavro read/write are possible, but because the schema contains unnecessary "default" fields (timestamps, uuids, etc.), the embedded schema is larger, adding considerable overhead to the message size.

Solution: when using default_factory, have an option to decide whether this value should be included in the schema.
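To make the size argument above concrete, here is a minimal sketch that compares the JSON size of a schema with and without baked-in defaults. It uses only the standard library; the record name `Event`, the field names, and the `make_schema` helper are assumptions for illustration and are not generated by dataclasses-avroschema or fastavro.

```python
import json
import uuid
from datetime import datetime, timezone


def make_schema(include_defaults):
    """Build a hand-written Avro-style schema dict (hypothetical fields)."""
    now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
    candidates = [
        ("event_uuid", {"type": "string", "logicalType": "uuid"}, str(uuid.uuid4())),
        ("created_at", {"type": "long", "logicalType": "timestamp-millis"}, now_ms),
    ]
    fields = []
    for name, avro_type, value in candidates:
        field = {"name": name, "type": avro_type}
        if include_defaults:
            # This is the part that a leaked default_factory value adds.
            field["default"] = value
        fields.append(field)
    return {"type": "record", "name": "Event", "fields": fields}


with_defaults = len(json.dumps(make_schema(True)).encode())
without_defaults = len(json.dumps(make_schema(False)).encode())

# Extra bytes carried whenever the writer schema is embedded with the data.
print(with_defaults - without_defaults)
```

The difference grows with the number of fields that use a factory, which matters most when every message carries its own copy of the writer schema.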
We can add an extra option to the |
Yes, this would work. Another option would be to define it at field level: if you have nested dataclasses it would be difficult to manage the Meta class. |
That sounds good but it would make much more sense to configure that in the Field, whether the default_factory value can be exported. |
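A field-level switch like the one discussed here could be carried in the `metadata` argument of `dataclasses.field`. The following is only a sketch of the idea using the standard library: the metadata key `exclude_default` and the helper `exported_defaults` are hypothetical names chosen for illustration, not a confirmed part of the library's API.

```python
import dataclasses
import typing
import uuid

EXCLUDE_KEY = "exclude_default"  # hypothetical metadata key, for illustration only


@dataclasses.dataclass
class UserAdvance:
    # Static factory result: reasonable to export as a schema default.
    favourites_numbers: typing.List[int] = dataclasses.field(
        default_factory=lambda: [7, 13]
    )
    # Dynamic factory result: flagged so that it is not exported.
    event_uuid: uuid.UUID = dataclasses.field(
        default_factory=uuid.uuid4, metadata={EXCLUDE_KEY: True}
    )


def exported_defaults(model):
    """Return {field name: default} for factory fields that are not flagged."""
    exported = {}
    for f in dataclasses.fields(model):
        if f.default_factory is dataclasses.MISSING:
            continue
        if f.metadata.get(EXCLUDE_KEY, False):
            continue
        exported[f.name] = f.default_factory()
    return exported


print(exported_defaults(UserAdvance))  # {'favourites_numbers': [7, 13]}
```

Because the flag lives on the field itself, nested dataclasses can each declare their own behaviour without a shared Meta class.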
Hello! I've encountered the same issue and found this discussion helpful. 😉
It's true that everything goes smoothly if the schema is published only once. However, in my own (albeit unrepresentative) experience with running in Kubernetes or its analogs, this behavior will continuously bump the schema version to an excessive level. This is because pods are expected to restart frequently, and each restart publishes a new default value. The same happens on any restart of the producer process (e.g., local runs against a development cluster). In the case of Kafka, the producer has to push its schema to the schema registry after initialization to get the schema ID. As far as I know, this is the default behavior for Java producers and the confluent-kafka-python producer. So, I agree with @michal-rogala. @marcosschroh, what is the specific use case for utilizing the value from the factory? In my past projects, I used Would it be better to retain the option to initialize the default value for the schema from the factory, but turn it off by default?
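The version bumping described above can be simulated without Kafka. This sketch assumes a toy in-memory registry that hands out a new version for every schema text it has not seen before; `ToyRegistry` and `schema_for_process` are hypothetical stand-ins and do not reproduce the Confluent Schema Registry client.

```python
import hashlib
import json
import uuid


def schema_for_process():
    """Stand-in for schema generation at producer start-up (hypothetical)."""
    return {
        "type": "record",
        "name": "UUIDLogicalTypes",
        "fields": [
            {
                "name": "event_uuid",
                "type": {"type": "string", "logicalType": "uuid"},
                # Fresh value on every start, as a default_factory would produce.
                "default": str(uuid.uuid4()),
            }
        ],
    }


class ToyRegistry:
    """Assigns a new version whenever the schema text was not seen before."""

    def __init__(self):
        self.versions = {}

    def register(self, schema):
        text = json.dumps(schema, sort_keys=True)
        digest = hashlib.sha256(text.encode()).hexdigest()
        return self.versions.setdefault(digest, len(self.versions) + 1)


registry = ToyRegistry()
# Three simulated restarts of the same producer.
versions = [registry.register(schema_for_process()) for _ in range(3)]
print(versions)  # [1, 2, 3]
```

With the "default" entry removed from the field, all three registrations would hash to the same text and the registry would stay at version 1.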
Hi all, sorry for the delay. The reason to have Example: Have a schema with
@dataclasses.dataclass
class UserAdvance(AvroModel):
    ...
    favourites_numbers: typing.List[int] = dataclasses.field(default_factory=lambda: [7, 13])
resulting in:
{
  "type": "record",
  "name": "UserAdvance",
  "fields": [
    ...
    {
      "name": "favourites_numbers",
      "type": {
        "type": "array",
        "items": "long",
        "name": "favourites_number"
      },
      "default": [7, 13]
    }
  ]
}
Then the question is: Should If end users want to have a default value for Because it is not possible to use Then:
Example:
import dataclasses
import uuid

from dataclasses_avroschema import AvroModel

@dataclasses.dataclass
class UUIDLogicalTypes(AvroModel):
    event_uuid: uuid.UUID = dataclasses.field(default_factory=uuid.uuid4)

In [9]: UUIDLogicalTypes()
Out[9]: UUIDLogicalTypes(event_uuid=UUID('c0c48c38-6cf1-4fb8-ab13-dd3e9563dfd9'))
The resulting schema must be:
{
  "type": "record",
  "name": "UUIDLogicalTypes",
  "fields": [
    {
      "name": "event_uuid",
      "type": {"type": "string", "logicalType": "uuid"}
    }
  ]
}
@dataclasses.dataclass
class UUIDLogicalTypes(AvroModel):
    event_uuid: uuid.UUID = uuid.UUID('c0c48c38-6cf1-4fb8-ab13-dd3e9563dfd9')
What do you think? |
Thank you for the detailed answer! Now I see what the case is: Then, applying And I strongly agree on types like My only concern then remains whether the same behavior should be applied to
from typing import List
from dataclasses_avroschema import AvroBaseModel
from pydantic import Field

# This just works and should be passed to the schema
class PydanticUserAdvance(AvroBaseModel):
    ...
    favourites_numbers: List[int] = Field(default=[7, 13])
    ...

# At first glance, it should work as in dataclasses, but why should the schema generator
# consider special cases to skip/pass the default factory's value to the schema?
class PydanticUserAdvance(AvroBaseModel):
    ...
    favourites_numbers: List[int] = Field(default_factory=lambda: [7, 13])
What are your thoughts on this? |
I think it won't be a problem Dataclasses:
Pydantic:
This leads us to: Usually, for For |
Hi all, I tried to follow the approach that I describe I followed the marshmallow_dataclass approach, which is what @joaoe suggested. It will be possible to use the config Let me know whether this satisfies your requirements. |
Non-optional field with default_factory leaks default value into compiled schema:
results in schema:
As the default value is dynamically generated, it makes no sense to have it statically compiled into the schema. Also, does having a "default" value baked in make the field optional from the application's point of view?
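The leak described above can be reproduced with a toy generator that calls the factory while building the schema. This is a sketch under stated assumptions: `naive_avro_fields` is a hypothetical helper written for illustration (with the uuid type hard-coded), not the library's actual implementation.

```python
import dataclasses
import uuid


@dataclasses.dataclass
class Event:
    # Non-optional field whose value is generated for each instance.
    event_uuid: uuid.UUID = dataclasses.field(default_factory=uuid.uuid4)


def naive_avro_fields(model):
    """Hypothetical generator that evaluates default_factory once and stores it."""
    fields = []
    for f in dataclasses.fields(model):
        entry = {"name": f.name, "type": {"type": "string", "logicalType": "uuid"}}
        if f.default_factory is not dataclasses.MISSING:
            # The factory runs at schema-generation time, freezing a one-off value.
            entry["default"] = str(f.default_factory())
        fields.append(entry)
    return fields


first = naive_avro_fields(Event)
second = naive_avro_fields(Event)

# The frozen "default" differs between two generations of the same class.
print(first[0]["default"] != second[0]["default"])
```

Each instance of `Event` still receives its own UUID at construction time, so the value stored under "default" never matches what the application actually produces.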
cc: @ddevlin