
fix(ingestion): avoid Avro sample double-deserialization in Kafka topics #28309

Merged: ulixius9 merged 2 commits into open-metadata:main from PRADDZY:hackathon/fix-kafka-avro-sample-decode on Jun 2, 2026

Conversation

Contributor

@PRADDZY PRADDZY commented May 20, 2026

Related Issue

Closes #28195

What Changed

  • Updated CommonBrokerSource.decode_message to handle already-deserialized Avro payloads (e.g. dict) without re-running AvroDeserializer.
  • Kept Avro bytes path intact for raw payloads.
  • Hardened non-Avro fallback to handle both bytes-like and already-decoded payloads.
  • Added focused unit tests for:
    • Avro decoded payloads skipping re-deserialization
    • Avro bytes payloads still using AvroDeserializer
    • Non-Avro bytes/decoded payload handling
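
The branching described above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: `decode_sample`, `BYTES_LIKE`, and the injected `avro_deserialize` callable are hypothetical names, and the JSON/UTF-8 output format is an assumption made only so the example is runnable.

```python
import json
from typing import Any, Callable

# Assumption: "bytes-like" means these three built-in types.
BYTES_LIKE = (bytes, bytearray, memoryview)


def decode_sample(
    record: Any,
    is_avro: bool,
    avro_deserialize: Callable[[bytes], Any],
) -> str:
    """Hypothetical helper: turn one Kafka message value into a sample string."""
    if is_avro:
        if isinstance(record, BYTES_LIKE):
            # Raw payload: run the Avro deserializer exactly once.
            decoded = avro_deserialize(bytes(record))
        else:
            # Already deserialized upstream (e.g. a dict): do not deserialize again.
            decoded = record
        return json.dumps(decoded, default=str)

    # Non-Avro fallback: accept both bytes-like and already-decoded values.
    if isinstance(record, BYTES_LIKE):
        return bytes(record).decode("utf-8", errors="replace")
    if isinstance(record, str):
        return record
    return json.dumps(record, default=str)
```

Passing the deserializer in as a plain callable keeps the sketch independent of any Kafka client library; the real method presumably builds its AvroDeserializer from the topic's schema.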

Validation

  • python -m ruff check ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py ingestion/tests/unit/source/messaging/test_common_broker_source.py
  • python -m compileall ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py ingestion/tests/unit/source/messaging/test_common_broker_source.py

Notes

  • Full pytest execution is blocked in this local environment because ingestion/src/metadata/generated is not present in this checkout, so metadata.generated imports fail before any tests start. CI should run the full project checks as usual once the "safe to test" label is applied.

@PRADDZY PRADDZY requested a review from a team as a code owner May 20, 2026 16:01
Copilot AI review requested due to automatic review settings May 20, 2026 16:01
@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes Kafka sample-data decoding for Avro topics by preventing a second Avro deserialization when the Kafka DeserializingConsumer has already returned a decoded Python object (e.g., dict). It also makes the non-Avro fallback more robust for bytes-like vs already-decoded values, and adds unit tests covering the expected behaviors.

Changes:

  • Update CommonBrokerSource.decode_message to short-circuit Avro decoding when record is already deserialized (non-bytes-like).
  • Harden non-Avro decoding to support both bytes-like payloads and already-decoded objects.
  • Add unit tests for Avro decoded vs Avro bytes paths and for non-Avro bytes/decoded handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| ingestion/src/metadata/ingestion/source/messaging/common_broker_source.py | Avoids Avro double-deserialization and improves bytes-like handling in message decoding for sample data. |
| ingestion/tests/unit/source/messaging/test_common_broker_source.py | Adds focused tests validating Avro short-circuiting, Avro bytes deserialization, and non-Avro decoding behavior. |
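
For readers without the diff at hand, the two Avro test cases listed in the review could look something like the sketch below. It is not the test file from this PR: `decode_value` is a hypothetical stand-in for the Avro branch of decode_message, and calling the deserializer as `deserializer(data, None)` is an assumption about how it is invoked.

```python
import unittest
from unittest.mock import MagicMock


def decode_value(record, deserializer):
    """Hypothetical stand-in: deserialize only bytes-like input."""
    if isinstance(record, (bytes, bytearray, memoryview)):
        # Assumption: the deserializer is a callable taking (data, context).
        return deserializer(bytes(record), None)
    # Already-decoded values are returned unchanged.
    return record


class DecodeValueTest(unittest.TestCase):
    def test_decoded_payload_skips_deserializer(self):
        deserializer = MagicMock()
        payload = {"id": 1}
        self.assertIs(decode_value(payload, deserializer), payload)
        deserializer.assert_not_called()

    def test_bytes_payload_uses_deserializer(self):
        deserializer = MagicMock(return_value={"id": 1})
        self.assertEqual(decode_value(b"\x00\x01", deserializer), {"id": 1})
        deserializer.assert_called_once_with(b"\x00\x01", None)
```

Mocking the deserializer lets the tests assert on call counts directly, which is the behavior the fix is about, without needing a schema registry or a real Avro schema.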

Contributor Author

PRADDZY commented May 20, 2026

@open-metadata/ingestion @ayush-shah could someone please add safe to test on this PR so CI can run? Thanks!

harshach added the "safe to test" label (Add this label to run secure Github workflows on PRs) May 20, 2026
harshach previously approved these changes May 20, 2026
harshach added the "To release" label (Will cherry-pick this PR into the release branch) May 20, 2026
@github-actions
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Contributor

github-actions Bot commented May 20, 2026

🟡 Playwright Results — all passed (15 flaky)

✅ 4144 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 90 skipped

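| Shard | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| 🟡 Shard 1 | 298 | 0 | 1 | 4 |
| 🟡 Shard 2 | 779 | 0 | 2 | 11 |
| 🟡 Shard 3 | 773 | 0 | 6 | 8 |
| 🟡 Shard 4 | 787 | 0 | 1 | 12 |
| ✅ Shard 5 | 773 | 0 | 0 | 47 |
| 🟡 Shard 6 | 734 | 0 | 5 | 8 |
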
🟡 15 flaky test(s) (passed on retry)
  • Features/EntityRenameConsolidation.spec.ts › Glossary - rename then update description should preserve terms (shard 1, 1 retry)
  • Features/BulkImportWithDotInName.spec.ts › Database service with dot in name - export and reimport (shard 2, 1 retry)
  • Features/KnowledgeCenter.spec.ts › Article mentions in description should working for Knowledge Center (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
  • Features/Table.spec.ts › Tags term should be consistent for search (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should not display soft deleted assets in search suggestions (shard 3, 1 retry)
  • Flow/LineageSettings.spec.ts › Verify global lineage config (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Chart (shard 4, 1 retry)
  • Pages/GlossaryImportExport.spec.ts › Glossary CSV import preserves typed relations (shard 6, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for searchIndex -> dashboard (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

PRADDZY force-pushed the hackathon/fix-kafka-avro-sample-decode branch from 6ea0b0d to 7d2a778 May 21, 2026 09:23

gitar-bot Bot commented May 21, 2026

Code Review ✅ Approved

Updates CommonBrokerSource.decode_message to skip AvroDeserializer for pre-deserialized payloads while maintaining raw bytes support. Unit tests confirm correct handling of both Avro and non-Avro data formats.

Contributor Author

PRADDZY commented May 21, 2026

@harshach @ayush-shah all checks are green on latest commit 7d2a778. Could you please re-review and merge when you get a moment?

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@ayush-shah ayush-shah requested a review from a team June 2, 2026 14:01
@ulixius9 ulixius9 merged commit 014a7f9 into open-metadata:main Jun 2, 2026
45 checks passed
Contributor

github-actions Bot commented Jun 2, 2026

Failed to cherry-pick changes to the 1.12.10 branch.
Please cherry-pick the changes manually.
You can find more details here.

Contributor

github-actions Bot commented Jun 2, 2026

Failed to cherry-pick changes to the 1.13 branch.
Please cherry-pick the changes manually.
You can find more details here.

Labels

safe to test: Add this label to run secure Github workflows on PRs
To release: Will cherry-pick this PR into the release branch

Development

Successfully merging this pull request may close these issues.

Kafka Avro deserializer fails generating sample data

4 participants