Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update for Avro 1.10 #4

Merged
merged 3 commits into from
Jul 20, 2020
Merged

Update for Avro 1.10 #4

merged 3 commits into from
Jul 20, 2020

Conversation

tjwp
Copy link
Collaborator

@tjwp tjwp commented Jul 10, 2020

Ruby Avro v1.10 now supports some features that affect resolution: aliases, enum defaults, decimal logical types.

I debated maintaining compatibility with earlier versions, but this gem has changed very little. Instead I think it makes sense to focus on just the current major Avro release.

As a result of these changes, some schemas could get a different fingerprint. But since these features were not previously supported by Ruby, it is unlikely that many schemas will be affected if only Ruby is used with Avro.

If the updated canonical form/fingerprint is used with a schema registry, then the impact may be that an additional version is registered for an existing schema but the change should not introduce any incompatibility between the versions.

If a schema was previously registered with a compatibility break, then depending on how the "after compatibility" was set, it may be necessary to register a compatibility break with the new fingerprint.

The tests for this gem are pretty repetitive between the canonical form and the fingerprint. I kept the separation for now, but it may make sense to consolidate them in a later change.

@tjwp tjwp requested a review from jturkel July 16, 2020 13:31
Copy link
Member

@jturkel jturkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed review! A few questions mainly for my own edification.

super.merge(default_value)
extensions = {}
extensions[:default] = field.default if field.default?
if field.respond_to?(:aliases) && field.aliases && !field.aliases.empty?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this respond_to? check if we're only supporting Avro 1.10+?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct we can skip this one for Avro 1.10+

# TODO: Override normalize_named_type once the Avro Ruby gem supports aliases
# def normalized_named_type(schema, attributes = {})
def add_logical_type(schema, serialized)
if schema.respond_to?(:logical_type) && schema.logical_type == DECIMAL_LOGICAL_TYPE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this respond_to? check? Looks like that method exists on Avro::Schema.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're correct here too, we can drop it.

def add_logical_type(schema, serialized)
if schema.respond_to?(:logical_type) && schema.logical_type == DECIMAL_LOGICAL_TYPE
extensions = { logicalType: DECIMAL_LOGICAL_TYPE }
extensions[:precision] = schema.precision if schema.respond_to?(:precision) && schema.precision
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the respond_to? checks on this line and the next line because decimals can be Avro::Schema::BytesSchema or Avro::Schema::FixedSchema right? Do we need to handle the Avro::Schema::FixedSchema#size attribute?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixed size attribute is already handled as part of the standard schema normalization: https://github.com/apache/avro/blob/master/lang/ruby/lib/avro/schema_normalization.rb#L54

And yes, we need the respond_to? checks because the fixed type supports the decimal logical type in the Avro spec, but this is not yet implemented in the Ruby library. The other option would to check that the logical_type is decimal and the type is bytes for now. I implemented it this way so that no update would be required to support fixed decimals once the Ruby implementation catches up.

end
end

def normalize_named_type(schema, attributes = {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the respond_to? checks in here if schema will be a Avro::Schema::NamedSchema?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The respond_to?(:default) check is necessary because only the enum schema has a default. I implemented it this way to be permissive if a default attribute is added to other schemas in the future.

It could instead be changed to check that schema.type_sym = :enum.

The check for the :fullname_aliases is not necessary if we are only supporting Avro 1.10+.

Copy link
Collaborator Author

@tjwp tjwp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! There was no rush on it.

super.merge(default_value)
extensions = {}
extensions[:default] = field.default if field.default?
if field.respond_to?(:aliases) && field.aliases && !field.aliases.empty?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct we can skip this one for Avro 1.10+

# TODO: Override normalize_named_type once the Avro Ruby gem supports aliases
# def normalized_named_type(schema, attributes = {})
def add_logical_type(schema, serialized)
if schema.respond_to?(:logical_type) && schema.logical_type == DECIMAL_LOGICAL_TYPE
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're correct here too, we can drop it.

def add_logical_type(schema, serialized)
if schema.respond_to?(:logical_type) && schema.logical_type == DECIMAL_LOGICAL_TYPE
extensions = { logicalType: DECIMAL_LOGICAL_TYPE }
extensions[:precision] = schema.precision if schema.respond_to?(:precision) && schema.precision
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixed size attribute is already handled as part of the standard schema normalization: https://github.com/apache/avro/blob/master/lang/ruby/lib/avro/schema_normalization.rb#L54

And yes, we need the respond_to? checks because the fixed type supports the decimal logical type in the Avro spec, but this is not yet implemented in the Ruby library. The other option would to check that the logical_type is decimal and the type is bytes for now. I implemented it this way so that no update would be required to support fixed decimals once the Ruby implementation catches up.

end
end

def normalize_named_type(schema, attributes = {})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The respond_to?(:default) check is necessary because only the enum schema has a default. I implemented it this way to be permissive if a default attribute is added to other schemas in the future.

It could instead be changed to check that schema.type_sym = :enum.

The check for the :fullname_aliases is not necessary if we are only supporting Avro 1.10+.

Copy link
Member

@jturkel jturkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, has there been any movement on getting canonical resolution added to the Ruby Avro library? Is there an issue we can link to from this gem's README?

@tjwp
Copy link
Collaborator Author

tjwp commented Jul 20, 2020

Out of curiosity, has there been any movement on getting canonical resolution added to the Ruby Avro library? Is there an issue we can link to from this gem's README?

Good question. I opened a ticket for it a while ago: https://issues.apache.org/jira/browse/AVRO-2258

There is another ticket (https://issues.apache.org/jira/browse/AVRO-2299) that I believe aims to generalize the normalization. I brought up the resolution canonical form in that context. That has been hanging out for a while. I should probably take a look again: apache/avro#805

@tjwp tjwp merged commit ff51299 into master Jul 20, 2020
@tjwp tjwp deleted the tjwp-avro-1.10 branch July 20, 2020 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants