fix: resolve path-based lineage for Databricks external tables (#27561) #27648
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
…-FQN resolution

Reverts the path-based fallback in the DATABRICKS_GET_TABLE_LINEAGE and DATABRICKS_GET_COLUMN_LINEAGE queries, since DatabricksClient lacks the external_path_to_fqn map needed to resolve paths to FQNs. Without this map, relaxing the IS NOT NULL constraints creates dict keys containing None values that never match downstream lookups.
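The failure mode described in this commit message can be illustrated with a minimal sketch. The row shape, the tuple cache key, and the `has_lineage` helper are assumptions made for illustration and are not taken from `DatabricksClient`; only the two column names come from the PR description.

```python
# Hypothetical rows, shaped like lineage query results once the
# IS NOT NULL filter is relaxed (the second row is path-only, so its
# source name is None).
rows = [
    {"source_table_full_name": "main.sales.orders",
     "target_table_full_name": "main.sales.daily"},
    {"source_table_full_name": None,
     "target_table_full_name": "main.sales.ext_copy"},
]

# Assumed cache layout: keyed by (source, target) table names.
lineage_cache = {}
for row in rows:
    key = (row["source_table_full_name"], row["target_table_full_name"])
    lineage_cache[key] = True


def has_lineage(source_fqn: str, target_fqn: str) -> bool:
    """Downstream lookups always pass real table names, never None."""
    return (source_fqn, target_fqn) in lineage_cache
```

Under these assumptions, the `(None, "main.sales.ext_copy")` entry sits in the cache but no lookup by real table names can reach it, so the relaxed filter adds rows without adding lineage.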
Code Review ✅ Approved 1 resolved / 1 findings

Resolves path-based lineage for Databricks external tables by enabling path fallback during column lineage caching. No issues found.

✅ 1 resolved

✅ Bug: DatabricksClient column lineage caching ignores path fallback
Describe your changes:
Fixes #27561
External tables in Databricks are referenced using cloud storage paths (e.g. `` delta.`abfss://...` ``) instead of table names. In this case, Databricks system tables populate `source_path`/`target_path` and leave `source_table_full_name`/`target_table_full_name` as null. The lineage processor was filtering out these rows entirely, resulting in missing lineage for all external tables.

Changes:

- `databricks/queries.py` + `unitycatalog/queries.py`: Added `source_path` and `target_path` to SELECT; relaxed WHERE filter from hard `IS NOT NULL` on name columns to `(name IS NOT NULL OR path IS NOT NULL)`
- `databricks/client.py`: Pass `source_path` and `target_path` through the lineage cache dict
- `unitycatalog/lineage.py`: Build a reverse `path → table_fqn` map from the external locations cache; fall back to path resolution when `full_name` is null; ensure `_cache_external_locations()` runs before `_cache_lineage()` so the reverse map is available
- `test_unity_catalog_lineage.py`: Updated mock row definitions to include path fields; added tests for path resolution, unresolvable path skipping, and reverse map construction

Type of change:
Checklist:
Fixes #27561: resolve path-based lineage for Databricks external tables
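The path fallback listed under "Changes" can be sketched roughly as follows. This is a minimal illustration, not the code in `unitycatalog/lineage.py`: the function names, the `{table_fqn: storage_path}` cache shape, the example storage path, and the trailing-slash normalization are all assumptions.

```python
from typing import Dict, Optional


def build_path_to_fqn(external_locations: Dict[str, str]) -> Dict[str, str]:
    """Invert an assumed {table_fqn: storage_path} cache into {path: fqn}."""
    return {
        path.rstrip("/"): fqn
        for fqn, path in external_locations.items()
        if path  # skip tables with no recorded storage path
    }


def resolve_table(
    full_name: Optional[str],
    path: Optional[str],
    path_to_fqn: Dict[str, str],
) -> Optional[str]:
    """Prefer the table name; fall back to the path; None if unresolvable."""
    if full_name:
        return full_name
    if path:
        return path_to_fqn.get(path.rstrip("/"))
    return None


# Hypothetical external location, used only for this example.
external_locations = {
    "main.sales.ext_orders": "abfss://data@acct.dfs.core.windows.net/orders/",
}
path_to_fqn = build_path_to_fqn(external_locations)

resolved = resolve_table(
    None, "abfss://data@acct.dfs.core.windows.net/orders", path_to_fqn
)
unresolved = resolve_table(None, "abfss://data@acct.dfs.core.windows.net/other", path_to_fqn)
```

In this sketch a row whose path has no entry in the reverse map resolves to `None`, so the caller can skip it, which mirrors the "unresolvable path skipping" case covered by the tests. The reverse map has to be built before any lineage rows are resolved, which is why the ordering of the two cache steps matters.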