schema evolution support #103

Closed
sudeepdino008 opened this issue Apr 18, 2023 · 7 comments

Comments

@sudeepdino008

Hi, this is a question around schema evolution support in avrora. I want to basically:

  • delete older fields in the reader schema
  • add new fields with default values in the reader schema

I tried the first one in erlavro, and it didn't seem to work there. Perhaps the schema evolution support isn't there in avrora as well given that erlavro doesn't have it? Is there a way to get around it?

I've added the issue in erlavro too:
klarna/erlavro#117

@Strech
Owner

Strech commented Apr 18, 2023

Hey @sudeepdino008, thanks for the question. Off the top of my head it should work, but I need to check it. I think if you introduce a new schema and mark it as BACKWARD or BACKWARD_TRANSITIVE it should work, because you will have to update the consumers first and they will read the latest schema from the registry, I guess.

Could you try to emulate it in this order, to confirm that updating the consumer didn't help:

  1. Register full version of the schema
  2. Use writer to write a message
  3. Use reader to read the message (should use the full schema)
  4. Register new version of the schema with fields removed
  5. Update the reader (restart it, or drop its cache)
  6. Use writer to write a message (with new schema)
  7. Use reader from step 5 and read new message (should use new schema)

I think that's the way it should work. Of course, you will have to configure the schema registry compatibility for the schema before registering the new version (or, if the default is already correct, you are good).

And if that doesn't work, let's dig deeper.
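
The seven steps above can be walked through with a toy, in-memory model. This is only a sketch of the order of operations, not avrora code: `ToyRegistry`, `ToyReader` and `write` are hypothetical names, JSON stands in for Avro binary encoding, and a schema is reduced to a list of field names.

```python
import json


class ToyRegistry:
    """In-memory stand-in for a schema registry (hypothetical, not a real client)."""

    def __init__(self):
        self._versions = []

    def register(self, schema):
        self._versions.append(schema)
        return len(self._versions)  # 1-based version number

    def latest(self):
        return self._versions[-1]


class ToyReader:
    """Reader that caches the schema until its cache is dropped."""

    def __init__(self, registry):
        self.registry = registry
        self.cached = None

    def drop_cache(self):
        self.cached = None

    def read(self, message):
        if self.cached is None:
            self.cached = self.registry.latest()
        record = json.loads(message)
        # Keep only the fields named by the schema the reader currently holds.
        return {k: record[k] for k in self.cached["fields"] if k in record}


def write(registry, record):
    """Encode the record using the latest registered schema (JSON as a stand-in)."""
    schema = registry.latest()
    return json.dumps({k: record[k] for k in schema["fields"]})


registry = ToyRegistry()
reader = ToyReader(registry)

registry.register({"name": "User", "fields": ["id", "name", "legacy"]})   # step 1
first = reader.read(write(registry, {"id": 1, "name": "a", "legacy": "x"}))  # steps 2-3

registry.register({"name": "User", "fields": ["id", "name"]})             # step 4
reader.drop_cache()                                                       # step 5
second = reader.read(write(registry, {"id": 2, "name": "b", "legacy": "y"}))  # steps 6-7
```

Here `first` still carries the `legacy` field, while `second` does not, because both writer and reader picked up the trimmed schema after the cache was dropped.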

@sudeepdino008
Author

sudeepdino008 commented Apr 21, 2023

Hi @Strech,

I've created this test - https://github.com/sudeepdino008/avrora/blob/master/test/avrora/schema/evolution_test.exs
Let me know if I'm not thinking about it right. This should be possible to do, right?

  1) test from_json/2 schema evolution (Avrora.Schema.EvolutionTest)
     test/avrora/schema/evolution_test.exs:10
     ** (MatchError) no match of right hand side value: {:error, :schema_mismatch}
     code: {:ok, dpayload} = Avrora.Codec.Plain.decode(epayload, schema: newschema)
     stacktrace:
       test/avrora/schema/evolution_test.exs:27: (test)

@sudeepdino008
Author

If it helps, my use case is that I'm updating the schema on the reader side (with backward compatibility ensured, so only changes like deleting a field or adding new fields with a default value), and I expect the reader to be able to read data that was Avro-encoded with an older schema.

@sudeepdino008
Author

Ok, so I came to know about schema registries. Can you tell if I'm understanding this correctly?

The reader must know the schema with which the writer encoded the data. You can't expect the reader to use only the latest schema, even one that evolved in a backward-compatible way, to decode data that was encoded by the writer using some earlier schema version.

Avro needs schema registries to support evolution decoding. The reader and writer both have access to such a registry, and "register" new schema versions while also maintaining a local cache. The encoding also adds a "schema version identifier", which is used on the reader side to figure out the schema version, and fetch it from cache/schema registry.

However, this means that the read can happen off the older schema version. If the evolved schema had a field deleted OR a field added with default value (since evolution is backwards compatible), this modification has to be done and maintained separately from the decoding process.
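
The separate post-decode adjustment described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not avrora or erlavro code: `project_record` is a hypothetical helper, the record is assumed to be already decoded with the writer's schema, and only flat records are handled.

```python
def project_record(record, reader_schema):
    """Project a decoded record onto the reader schema (flat records only).

    Fields missing from the reader schema are dropped; fields missing from
    the record are filled from the reader schema's declared default.
    """
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return result


# Record as decoded with the older (writer) schema.
writer_record = {"id": 1, "name": "alice", "legacy": "x"}

# Evolved reader schema: "legacy" deleted, "email" added with a default.
reader_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string", "default": ""},
    ],
}

projected = project_record(writer_record, reader_schema)
```

With these inputs, `projected` keeps `id` and `name`, drops `legacy`, and gains `email` with its default value.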

@Strech
Owner

Strech commented Apr 26, 2023

Yes, the writer will register the new schema in the schema registry (or you will register it separately), and it will be given an ID. That ID is later stored in the binary message the writer generates.

When the reader tries to decode such a message, it will check for the ID if the schema registry is enabled.

But you can emulate evolution with the scenario I presented, where the local cache is busted in order to obtain the latest version of the schema from the local file.
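
How an ID can travel inside the binary message is sketched below, assuming the Confluent Schema Registry wire format: one zero "magic" byte, a 4-byte big-endian schema ID, then the Avro-encoded body. The helper names `frame` and `unframe` are hypothetical and are not part of avrora.

```python
import struct

MAGIC_BYTE = 0  # first byte of a registry-framed message (Confluent wire format)


def frame(schema_id, body):
    """Prefix an encoded body with the magic byte and the 4-byte schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + body


def unframe(message):
    """Split a framed message into (schema_id, body); reject unframed input."""
    if len(message) < 5 or message[0] != MAGIC_BYTE:
        raise ValueError("not a registry-framed message")
    (schema_id,) = struct.unpack(">I", message[1:5])
    return schema_id, message[5:]


framed = frame(42, b"avro-body")
schema_id, body = unframe(framed)
```

A reader would use `schema_id` to look up the writer's schema in its cache or in the registry before decoding `body`.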

@Strech
Owner

Strech commented Apr 27, 2023

You are welcome

@sudeepdino008
Author

sudeepdino008 commented Apr 28, 2023

Thank you for the time :)
I'm experimenting with the OCF format + hooks in avrora now to skip certain fields in the latest schema (which is linked to the original use case).
