Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding or removing a default does not register a new schema version #38

Closed
Bandes opened this issue Mar 8, 2017 · 6 comments
Closed

Comments

@Bandes
Copy link

Bandes commented Mar 8, 2017

I've been noticing some confusing behavior with the schema registry. Specifically, we have some schemas which have added default values for some fields. Our expectation was that adding these values would register a new version with the registry. But it appears that the defaults are being stripped out, and the schema is being equated to the version with no defaults and therefore not getting registered.

Is this the expected behavior? If so, how is it possible to maintain FULL-TRANSITIVE compatibility? It was our understanding that defaults were required for that compatibility level.

Steps to reproduce -

  • Register a schema with all required fields and no defaults (id = 1)
  • Add defaults to one or more fields
  • Attempt to register the new schema
  • Returned id will still be 1
@tjwp
Copy link
Collaborator

tjwp commented Mar 9, 2017

That's a great question.

From the perspective of why is this happening, the answer is that this schema registry internally identifies schemas based on the fingerprint of the Parsing Canonical Form that is defined in the Avro specification.

Fingerprints based on these schemas can be used to "uniquely identify schemas", but this uniqueness only applies to a reader's ability to distinguish between different schemas used to write data.

The Parsing Canonical Form discards any attributes that are not relevant to parsing data, and this includes default and aliases. The exclusion of defaults is explicitly discussed in AVRO-2002.

Basing the schema identification on the fingerprint for this form was likely a mistake.

Like us, you probably want the assurance from the schema registry that you're not inadvertently introducting compatibility breaks. Right now, I don't see a workaround for that.

If your reader has a local copy of the schema it is using and it is only getting the writer's schema from the schema registry, which is the usual setup, then it should not matter that the schema from the registry does not include the default.

Obviously we should try to fix this. I think that a solution would look like something along these lines:

  • define a modification of the canonical form that also includes the resolution attributes, i.e. default. (Note: aliases are not currently supported in the Avro Ruby implementation -- so that's a separate issue.)
  • add a new fingerprint, based on this resolution canonical form, for schemas
  • add a configuration setting, settable via the environment, that controls which fingerprint version is used, including an 'all' option during the transition to generate and check both versions
  • set the fingerprint version setting to all and restart the app
  • provide a rake task to populate the new fingerprint for all existing schemas
  • update any clients that specify the fingerprint
  • set the fingerprint version setting to 2 and restart the app

Something like that would work for our usage of the registry. How does that sound for you?

Are you using the endpoint that checks by fingerprint to see if a schema is registered? If you happen to be using avromatic too then it uses it by default.

I should have some time to work on implementing a fix tomorrow (3/10).

@Bandes
Copy link
Author

Bandes commented Mar 9, 2017

I suspected that it had to do with the canonical parsing, good to know! (And thank you for the reference links.)

We are using AvroTurf as our client, I don't think it uses the fingerprint endpoint - do we need to be using that endpoint in order to take advantage of your fix? (We are not using avromatic and would prefer not to add it if we don't have to)

@tjwp
Copy link
Collaborator

tjwp commented Mar 9, 2017

AvroTurf does not use the fingerprint endpoint. avromatic monkey patches it into the schema registry client that AvroTurf provides.

You do not need to use the fingerprint endpoint to take advantage of the proposed fix. In fact, not using it makes the rollout of the fix easier because the change will just be internal to the server; no need to coordinate with client versions.

I'll let you know when a fix is available.

@tjwp
Copy link
Collaborator

tjwp commented Mar 13, 2017

@Bandes I have a branch and PR#39 to address this issue.

Could you take a look at the README section on updating, and let me know if that will work for you?

@Bandes
Copy link
Author

Bandes commented Mar 13, 2017

Yes, I tried out the branch with a copy of our data and I think that it will solve our problems, thank you!

@tjwp
Copy link
Collaborator

tjwp commented Mar 16, 2017

@Bandes Thank you for testing that branch! I've now merged the change to master. We're using this in production now and I'm going to close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants