-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update for Avro 1.10 #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed review! A few questions mainly for my own edification.
super.merge(default_value) | ||
extensions = {} | ||
extensions[:default] = field.default if field.default? | ||
if field.respond_to?(:aliases) && field.aliases && !field.aliases.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this respond_to?
check if we're only supporting Avro 1.10+?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct we can skip this one for Avro 1.10+
# TODO: Override normalize_named_type once the Avro Ruby gem supports aliases | ||
# def normalized_named_type(schema, attributes = {}) | ||
def add_logical_type(schema, serialized) | ||
if schema.respond_to?(:logical_type) && schema.logical_type == DECIMAL_LOGICAL_TYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this respond_to?
check? Looks like that method exists on Avro::Schema
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're correct here too, we can drop it.
def add_logical_type(schema, serialized) | ||
if schema.respond_to?(:logical_type) && schema.logical_type == DECIMAL_LOGICAL_TYPE | ||
extensions = { logicalType: DECIMAL_LOGICAL_TYPE } | ||
extensions[:precision] = schema.precision if schema.respond_to?(:precision) && schema.precision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the respond_to?
checks on this line and the next line because decimals can be Avro::Schema::BytesSchema
or Avro::Schema::FixedSchema
right? Do we need to handle the Avro::Schema::FixedSchema#size
attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixed size
attribute is already handled as part of the standard schema normalization: https://github.com/apache/avro/blob/master/lang/ruby/lib/avro/schema_normalization.rb#L54
And yes, we need the respond_to?
checks because the fixed type supports the decimal logical type in the Avro spec, but this is not yet implemented in the Ruby library. The other option would to check that the logical_type
is decimal
and the type
is bytes
for now. I implemented it this way so that no update would be required to support fixed decimals once the Ruby implementation catches up.
end | ||
end | ||
|
||
def normalize_named_type(schema, attributes = {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the respond_to?
checks in here if schema
will be a Avro::Schema::NamedSchema
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The respond_to?(:default)
check is necessary because only the enum schema has a default. I implemented it this way to be permissive if a default
attribute is added to other schemas in the future.
It could instead be changed to check that schema.type_sym = :enum
.
The check for the :fullname_aliases
is not necessary if we are only supporting Avro 1.10+.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! There was no rush on it.
super.merge(default_value) | ||
extensions = {} | ||
extensions[:default] = field.default if field.default? | ||
if field.respond_to?(:aliases) && field.aliases && !field.aliases.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct we can skip this one for Avro 1.10+
# TODO: Override normalize_named_type once the Avro Ruby gem supports aliases | ||
# def normalized_named_type(schema, attributes = {}) | ||
def add_logical_type(schema, serialized) | ||
if schema.respond_to?(:logical_type) && schema.logical_type == DECIMAL_LOGICAL_TYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're correct here too, we can drop it.
def add_logical_type(schema, serialized) | ||
if schema.respond_to?(:logical_type) && schema.logical_type == DECIMAL_LOGICAL_TYPE | ||
extensions = { logicalType: DECIMAL_LOGICAL_TYPE } | ||
extensions[:precision] = schema.precision if schema.respond_to?(:precision) && schema.precision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixed size
attribute is already handled as part of the standard schema normalization: https://github.com/apache/avro/blob/master/lang/ruby/lib/avro/schema_normalization.rb#L54
And yes, we need the respond_to?
checks because the fixed type supports the decimal logical type in the Avro spec, but this is not yet implemented in the Ruby library. The other option would to check that the logical_type
is decimal
and the type
is bytes
for now. I implemented it this way so that no update would be required to support fixed decimals once the Ruby implementation catches up.
end | ||
end | ||
|
||
def normalize_named_type(schema, attributes = {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The respond_to?(:default)
check is necessary because only the enum schema has a default. I implemented it this way to be permissive if a default
attribute is added to other schemas in the future.
It could instead be changed to check that schema.type_sym = :enum
.
The check for the :fullname_aliases
is not necessary if we are only supporting Avro 1.10+.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, has there been any movement on getting canonical resolution added to the Ruby Avro library? Is there an issue we can link to from this gem's README?
Good question. I opened a ticket for it a while ago: https://issues.apache.org/jira/browse/AVRO-2258 There is another ticket (https://issues.apache.org/jira/browse/AVRO-2299) that I believe aims to generalize the normalization. I brought up the resolution canonical form in that context. That has been hanging out for a while. I should probably take a look again: apache/avro#805 |
Ruby Avro v1.10 now supports some features that affect resolution: aliases, enum defaults, decimal logical types.
I debated maintaining compatibility with earlier versions, but this gem has changed very little. Instead I think it makes sense to focus on just the current major Avro release.
As a result of these changes, some schemas could get a different fingerprint. But since these features were not previously supported by Ruby, it is unlikely that many schemas will be affected if only Ruby is used with Avro.
If the updated canonical form/fingerprint is used with a schema registry, then the impact may be that an additional version is registered for an existing schema but the change should not introduce any incompatibility between the versions.
If a schema was previously registered with a compatibility break, then depending on how the "after compatibility" was set, it may be necessary to register a compatibility break with the new fingerprint.
The tests for this gem are pretty repetitive between the canonical form and the fingerprint. I kept the separation for now, but it may make sense to consolidate them in a later change.