fix(ingestion): avoid Avro sample double-deserialization in Kafka topics#28309
Conversation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
This PR fixes Kafka sample-data decoding for Avro topics by preventing a second Avro deserialization when the Kafka DeserializingConsumer has already returned a decoded Python object (e.g., dict). It also makes the non-Avro fallback more robust for bytes-like vs already-decoded values, and adds unit tests covering the expected behaviors.
Changes:
- Update
CommonBrokerSource.decode_messageto short-circuit Avro decoding whenrecordis already deserialized (non-bytes-like). - Harden non-Avro decoding to support both bytes-like payloads and already-decoded objects.
- Add unit tests for Avro decoded vs Avro bytes paths and for non-Avro bytes/decoded handling.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py | Avoids Avro double-deserialization and improves bytes-like handling in message decoding for sample data. |
| ingestion/tests/unit/source/messaging/test_common_broker_source.py | Adds focused tests validating Avro short-circuiting, Avro bytes deserialization, and non-Avro decoding behavior. |
|
@open-metadata/ingestion @ayush-shah could someone please add safe to test on this PR so CI can run? Thanks! |
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
🟡 Playwright Results — all passed (15 flaky)✅ 4144 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 90 skipped
🟡 15 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
6ea0b0d to
7d2a778
Compare
Code Review ✅ ApprovedUpdates CommonBrokerSource.decode_message to skip AvroDeserializer for pre-deserialized payloads while maintaining raw bytes support. Unit tests confirm correct handling of both Avro and non-Avro data formats. OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
|
@harshach @ayush-shah all checks are green on latest commit 7d2a778. Could you please re-review and merge when you get a moment? |
|
Failed to cherry-pick changes to the 1.12.10 branch. |
|
Failed to cherry-pick changes to the 1.13 branch. |



Related Issue
Closes #28195
What Changed
CommonBrokerSource.decode_messageto handle already-deserialized Avro payloads (e.g.dict) without re-runningAvroDeserializer.AvroDeserializerValidation
python -m ruff check ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py ingestion/tests/unit/source/messaging/test_common_broker_source.pypython -m compileall ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py ingestion/tests/unit/source/messaging/test_common_broker_source.pyNotes
pytestexecution in this local environment is blocked becauseingestion/src/metadata/generatedis not present in this checkout, causingmetadata.generatedimport failures before tests start. CI should execute full project checks as usual oncesafe to testis applied.