feat(clickhouse): support cross-database and dictionary lineage (#26095) #27551
mohitjeswani01 wants to merge 3 commits into open-metadata:main from
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help!
Pull request overview
Adds missing ClickHouse lineage capabilities to address #26095 by introducing cross-database lineage (Trino-style matching) and dictionary-based lineage derived from `system.dictionaries`.
Changes:
- Added ClickHouse dictionary discovery (ingested as `TableType.External`) and a new `system.dictionaries` query for lineage extraction.
- Implemented ClickHouse cross-database lineage resolution using `crossDatabaseServiceNames`.
- Added unit tests for dictionary source-string parsing.
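A rough sketch of what dictionary source-string parsing involves, assuming the input is a `CREATE DICTIONARY` statement whose `SOURCE(CLICKHOUSE(...))` clause carries `DB` and `TABLE` parameters. `parse_dict_source` and its fallback to a caller-supplied default database are hypothetical; the PR's actual `_parse_clickhouse_dict_source` may take a different input (for example, the `source` column of `system.dictionaries`) and return a different shape.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not the PR's _parse_clickhouse_dict_source.
# Captures the parameter list inside SOURCE(CLICKHOUSE(...)).
_SOURCE_RE = re.compile(
    r"SOURCE\s*\(\s*CLICKHOUSE\s*\((?P<params>.*?)\)\s*\)",
    re.IGNORECASE | re.DOTALL,
)
# Matches DB 'name' / TABLE 'name' pairs (keywords are case-insensitive).
_PARAM_RE = re.compile(r"\b(DB|TABLE)\s+'([^']*)'", re.IGNORECASE)


def parse_dict_source(ddl: str, default_db: str) -> Optional[Tuple[str, str]]:
    """Return (upstream_database, upstream_table), or None if not resolvable."""
    match = _SOURCE_RE.search(ddl)
    if match is None:
        return None  # source is not CLICKHOUSE(...), e.g. HTTP, FILE or MYSQL
    params = {key.upper(): value for key, value in _PARAM_RE.findall(match.group("params"))}
    table = params.get("TABLE")
    if not table:
        return None  # e.g. a QUERY-based source with no single upstream table
    # DB is optional in the DDL; fall back to the caller's default database.
    return params.get("DB") or default_db, table


ddl = (
    "CREATE DICTIONARY default.geo_location_dict (id UInt64, city String) "
    "PRIMARY KEY id "
    "SOURCE(CLICKHOUSE(TABLE 'geo_locations' DB 'default')) "
    "LAYOUT(FLAT()) LIFETIME(300)"
)
print(parse_dict_source(ddl, default_db="default"))  # ('default', 'geo_locations')
```

The non-greedy match stops at the first `))`, which is enough for typical DDL but would need a real tokenizer for sources whose parameters contain nested parentheses.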
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/database/clickhouse/lineage.py | Implements cross-database lineage and dictionary lineage extraction, plus parsing helper. |
| ingestion/src/metadata/ingestion/source/database/clickhouse/metadata.py | Registers dictionary engine objects as TableType.External during metadata ingestion. |
| ingestion/src/metadata/ingestion/source/database/clickhouse/utils.py | Adds SQLAlchemy inspector/dialect helpers to list dictionary names. |
| ingestion/src/metadata/ingestion/source/database/clickhouse/queries.py | Refactors query-string formatting and adds CLICKHOUSE_DICTIONARY_LINEAGE. |
| ingestion/src/metadata/ingestion/source/database/clickhouse/usage.py | Minor formatting-only change. |
| ingestion/tests/unit/topology/database/test_clickhouse_lineage.py | Adds unit tests for _parse_clickhouse_dict_source. |
…d duplicate ingestion
Code Review ✅ Approved (2 resolved / 2 findings)

Adds cross-database and dictionary lineage support for ClickHouse. Resolves issues regarding dictionary table duplication and generator handling in lineage collection.

✅ 2 resolved
- ✅ Bug: Dictionary tables may be duplicated in `regular_tables` list
- ✅ Edge Case: `yield_dictionary_lineage`
Gitar
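The duplication bug resolved above is essentially a set-difference problem: if the table listing also returns `Dictionary`-engine objects, they must be excluded from the regular tables before dictionaries are registered as `TableType.External`. A minimal sketch with a hypothetical `split_tables` helper, assuming both name lists are already available:

```python
from typing import Iterable, List, Tuple


def split_tables(
    table_names: Iterable[str], dictionary_names: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (regular_tables, dictionaries) so that no name appears in both."""
    # dict.fromkeys removes repeats while preserving the original order.
    dictionaries = list(dict.fromkeys(dictionary_names))
    dictionary_set = set(dictionaries)
    regular_tables = [
        name for name in dict.fromkeys(table_names) if name not in dictionary_set
    ]
    return regular_tables, dictionaries


regular, dicts = split_tables(
    ["orders", "geo_location_dict", "users"], ["geo_location_dict"]
)
print(regular, dicts)  # ['orders', 'users'] ['geo_location_dict']
```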
@harshach All bot comments have been addressed. Could you please add a
Description:
Fixes #26095
What changes did you make?
- Updated `metadata.py` and `utils.py` to ingest ClickHouse `Dictionary` engines as `TableType.External`.
- Added `CLICKHOUSE_DICTIONARY_LINEAGE` to query `system.dictionaries`, plus a robust regex parser (`_parse_clickhouse_dict_source`) that extracts the upstream database and table/view from the `SOURCE()` clause.
- Emitted dictionary lineage as `Source.ViewLineage` edges.
- Implemented `yield_cross_database_lineage()` in `ClickhouseLineageSource`, following the established Trino pattern to resolve FQNs from `crossDatabaseServiceNames`.

Why did you make them?
To resolve the missing cross-database lineage blocker (#26095) and to fulfill the explicit hackathon request from @agusosimani to support upstream lineage for ClickHouse dictionaries (e.g., correctly mapping
`geo_location_dict` to its source view `geo_locations`).

How did you test your changes?
Added unit tests in `test_clickhouse_lineage.py`; all of them pass (100% pass rate).

Screenshots of passing test suite:


Type of change:
Checklist:
I have read the CONTRIBUTING document.
My PR title is `Fixes <issue-number>: <short explanation>`
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
I have added tests around the new logic.
The issue properly describes why the new feature is needed, what's the goal, and how we are building it. Any discussion
or decision-making process is reflected in the issue.