Fixes #27419: Trino cross-database lineage for case-insensitive table names #27495

Open
hassaansaleem28 wants to merge 13 commits into open-metadata:main from hassaansaleem28:fix/27419-lineage-ingestion

Conversation

@hassaansaleem28 hassaansaleem28 commented Apr 17, 2026

Describe your changes:

Fixes #27419

Trino lowercases identifiers in query history, but OpenMetadata was matching cross-database table names too strictly. That caused the upstream Postgres CUSTOMER table to be dropped from the lineage graph.

What I worked on

I added a regression test and updated the cross-database matching logic so that the Postgres CUSTOMER table now links into the Trino lineage graph instead of being dropped.

Before

  • The lineage graph stopped at customer.
  • The upstream Postgres CUSTOMER node was missing.

[Screenshot from 2026-04-17 21-49-05]

After

  • Cross-database lineage now matches table names case-insensitively.
  • The lookup is scoped to the correct schema, so we avoid broad database-wide scans.
  • The full path now appears in the UI: CUSTOMER -> customer -> customer_copy.

[Screenshot from 2026-04-18 06-03-17]

What changed

  • Added a regression test for the exact Trino/Postgres case-mismatch scenario.
  • Updated Trino lineage matching to resolve schema and table names case-insensitively.
  • Kept the fallback schema-scoped and cached to avoid extra metadata queries.
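
As an illustration of what case-insensitive matching of table and column names can look like, here is a minimal sketch. It is not the code from this PR: `TableStub` and `same_table_ignoring_case` are hypothetical names, and requiring the two column-name sets to be equal is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TableStub:
    """Hypothetical stand-in for a table entity: a name plus column names."""

    name: str
    columns: list[str] = field(default_factory=list)


def same_table_ignoring_case(left: TableStub, right: TableStub) -> bool:
    """Return True when table names and column-name sets match, ignoring case."""
    if left.name.casefold() != right.name.casefold():
        return False
    # Assumption for this sketch: both sides must expose the same set of columns.
    left_cols = {col.casefold() for col in left.columns}
    right_cols = {col.casefold() for col in right.columns}
    return left_cols == right_cols


trino_side = TableStub("customer", ["id", "name"])
postgres_side = TableStub("CUSTOMER", ["ID", "NAME"])
print(same_table_ignoring_case(trino_side, postgres_side))  # True
```

`str.casefold()` is used instead of `str.lower()` because it is the comparison Python recommends for caseless matching; for plain ASCII identifiers the two behave the same.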

Validation

  • Targeted Trino lineage unit test passed.
  • Live ingestion verification confirmed the lineage edge appears in the UI.

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes #27419: Trino cross-database lineage for case-insensitive table names
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • I have added a test that covers the exact scenario we are fixing. For complex issues, comment the issue number in the test for future reference.

Summary by Gitar

  • Logic optimization:
    • Scoped the case-insensitive lookup to the schema level using _get_cross_database_schema_fqn to prevent excessive metadata queries.
    • Added cross_database_table_schema_mapping to cache entities during the lineage generation process.
  • Testing improvements:
    • Added test_check_same_table_is_case_insensitive_for_names_and_columns to verify case-insensitive matching logic.
    • Updated test_yield_cross_database_lineage_finds_uppercase_source_table to include schema validation and verify entity filtering via databaseSchema params.

This will update automatically on new commits.

@github-actions

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py
Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py Outdated
… schema

Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
@hassaansaleem28 hassaansaleem28 marked this pull request as ready for review April 18, 2026 01:17
@hassaansaleem28 hassaansaleem28 requested a review from a team as a code owner April 18, 2026 01:17
Copilot AI review requested due to automatic review settings April 18, 2026 01:17
Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py Outdated

Copilot AI left a comment

Pull request overview

This PR fixes Trino cross-database lineage resolution when the upstream/source database uses case-sensitive (e.g., uppercase) identifiers, while Trino normalizes identifiers to lowercase—preventing valid lineage edges from being created.

Changes:

  • Updates Trino cross-database table matching to be case-insensitive for table and column names, with a schema-scoped fallback lookup and caching.
  • Adds a regression unit test covering case-insensitive matching and ensuring schema-level lookup is used for cross-database resolution.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| ingestion/src/metadata/ingestion/source/database/trino/lineage.py | Implements case-insensitive matching and schema-scoped cached fallback lookup for cross-database lineage. |
| ingestion/tests/unit/source/database/trino/test_lineage.py | Adds regression tests validating case-insensitive matching and schema-scoped cross-db lookup behavior. |

Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py Outdated
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 18, 2026 01:33

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread ingestion/tests/unit/source/database/trino/test_lineage.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 18, 2026 01:38

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Comment thread ingestion/tests/unit/source/database/trino/test_lineage.py
Comment thread ingestion/tests/unit/source/database/trino/test_lineage.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py Outdated
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Copilot AI review requested due to automatic review settings April 18, 2026 08:24

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py
Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py Outdated
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Copilot AI review requested due to automatic review settings April 19, 2026 01:06

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread ingestion/src/metadata/ingestion/source/database/trino/lineage.py Outdated
Comment thread ingestion/tests/unit/source/database/trino/test_lineage.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 19, 2026 01:12
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

gitar-bot bot commented Apr 19, 2026

Code Review ✅ Approved 3 resolved / 3 findings

Trino cross-database lineage now correctly handles case-insensitive table matching by addressing schema FQN lookup accuracy and optimizing the cache miss fallback strategy. No remaining issues found.

✅ 3 resolved
Bug: Case-insensitive fallback ignores schema, may match wrong table

📄 ingestion/src/metadata/ingestion/source/database/trino/lineage.py:125-133
_get_case_insensitive_cross_database_table iterates ALL tables in the cross-database and matches only on table name (+ columns). It does not verify the schema segment of the FQN matches. If a Postgres database has schema_a.CUSTOMER and schema_b.CUSTOMER, the fallback will return whichever it encounters first, potentially creating incorrect lineage.

The exact-match path (get_by_name) constructs the full FQN including schema, so it correctly scopes. The fallback should do the same.
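
The ambiguity described in this finding can be reproduced with a small, self-contained sketch. It models FQNs as plain `service.database.schema.table` strings, which is a simplification (real FQNs may contain quoted dots), and `match_by_name_only` and `match_within_schema` are hypothetical helpers rather than functions from the PR.

```python
# Assumption: FQNs are plain dot-separated strings with no quoted segments.
TABLE_FQNS = [
    "pg.sales.schema_a.CUSTOMER",
    "pg.sales.schema_b.CUSTOMER",
]


def match_by_name_only(table_name):
    """Name-only fallback: returns the first table whose name matches."""
    for fqn in TABLE_FQNS:
        if fqn.rsplit(".", 1)[-1].casefold() == table_name.casefold():
            return fqn
    return None


def match_within_schema(schema_fqn, table_name):
    """Schema-scoped fallback: the parent schema FQN must match as well."""
    for fqn in TABLE_FQNS:
        parent, _, name = fqn.rpartition(".")
        if parent == schema_fqn and name.casefold() == table_name.casefold():
            return fqn
    return None


# The name-only version picks schema_a even when schema_b was intended.
print(match_by_name_only("customer"))
print(match_within_schema("pg.sales.schema_b", "customer"))
```

With two same-named tables in different schemas, only the schema-scoped variant returns the table the query actually referenced.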

Performance: Fallback lists all tables per database on every cache miss

📄 ingestion/src/metadata/ingestion/source/database/trino/lineage.py:174-186 📄 ingestion/src/metadata/ingestion/source/database/trino/lineage.py:128-129
_get_case_insensitive_cross_database_table calls list_all_entities(entity=Table, params={"database": cross_database_fqn}) for each Trino table that doesn't get an exact FQN match. For databases with thousands of tables and many Trino tables without exact matches, this results in repeated full-database scans via API pagination.

The cache on line 174 prevents re-lookup for the same derived FQN, but different Trino tables produce different derived FQNs, so each unique miss triggers a new full scan of the same cross-database.
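
One way to avoid repeated scans is to key the cache by schema rather than by derived table FQN, so each schema is listed at most once. The sketch below illustrates that idea under stated assumptions: `SchemaTableIndex` and `fake_list_tables` are hypothetical, and the fake stands in for whatever paginated metadata call the real code makes.

```python
class SchemaTableIndex:
    """Caches one case-insensitive table index per schema FQN."""

    def __init__(self, list_tables):
        # list_tables is a hypothetical callable: schema_fqn -> iterable of table names.
        self._list_tables = list_tables
        self._by_schema = {}

    def find(self, schema_fqn, table_name):
        index = self._by_schema.get(schema_fqn)
        if index is None:
            # First miss for this schema: list it once and index by casefolded name.
            index = {name.casefold(): name for name in self._list_tables(schema_fqn)}
            self._by_schema[schema_fqn] = index
        return index.get(table_name.casefold())


calls = []


def fake_list_tables(schema_fqn):
    """Stand-in for a paginated metadata API call; records each invocation."""
    calls.append(schema_fqn)
    return ["CUSTOMER", "ORDERS"]


index = SchemaTableIndex(fake_list_tables)
print(index.find("pg.sales.public", "customer"))  # CUSTOMER
print(index.find("pg.sales.public", "orders"))    # ORDERS
print(len(calls))                                  # 1
```

A caveat of this design: if a schema contains two tables whose names differ only by case, the dictionary keeps just one of them, so a real implementation would need to decide how to handle that collision.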

Edge Case: Schema FQN lookup may still be case-sensitive

📄 ingestion/src/metadata/ingestion/source/database/trino/lineage.py:125-136
The new _get_cross_database_schema_fqn constructs the schema FQN using the Trino schema name (e.g. trino_table.databaseSchema.name.root), which Trino lowercases. If the cross-database service stores the schema with different casing (e.g. Source_Schema vs source_schema), the list_all_entities(params={"databaseSchema": ...}) call may return no results, causing the fallback to silently find nothing.

This is the same class of case-sensitivity issue that this PR fixes for table names, just one level up. It's an incremental improvement opportunity, not a regression.
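
A possible follow-up for this edge case is to resolve the schema name itself without regard to case before doing the schema-scoped lookup. The following is only a sketch of that idea: `resolve_schema_name` is a hypothetical helper, and the list of stored schema names is assumed to have been fetched already.

```python
def resolve_schema_name(available_schemas, trino_schema_name):
    """Return the stored schema name for a lowercased Trino schema name.

    An exact match wins; otherwise a unique case-insensitive match is used.
    Ambiguous or missing matches return None instead of guessing.
    """
    if trino_schema_name in available_schemas:
        return trino_schema_name
    wanted = trino_schema_name.casefold()
    candidates = [name for name in available_schemas if name.casefold() == wanted]
    return candidates[0] if len(candidates) == 1 else None


print(resolve_schema_name(["Source_Schema", "public"], "source_schema"))  # Source_Schema
```

Returning `None` on ambiguity keeps the lookup from silently linking lineage to the wrong schema when two schema names differ only by case.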

hassaansaleem28 commented Apr 19, 2026

Hello @pmbrull @harshach, could you please add the safe to test label when you get a chance? Thanks!

Development

Successfully merging this pull request may close these issues.

Respect Trino case insensitivy in Lineage ingestion

2 participants