Fixes #27538: feat(openlineage) add AWS Glue, Kusto, and Cosmos DB dataset naming support by mohittilala · Pull Request #27533 · open-metadata/OpenMetadata

mohittilala · 2026-04-20T05:24:20Z

Describe your changes:

OpenLineage events from AWS Glue EMR, Azure Data Explorer (Kusto), and Azure Cosmos DB use non-standard dataset name formats that the connector couldn't parse, producing no lineage edges. This adds namespace-aware dispatch in _get_table_details to handle each format before falling back to the existing dot-split logic. All new parsers are sourced from OpenLineage's Naming.java and covered by unit tests.

Type of change:

Improvement

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
I have added tests around the new logic.
For connector/ingestion changes: I updated the documentation.


gitar-bot · 2026-04-20T05:25:56Z

Code Review ✅ Approved

Expands OpenLineage dataset naming support to include AWS Glue, Kusto, and Cosmos DB. No issues found.


Copilot

Pull request overview

This PR improves the OpenLineage ingestion connector’s ability to parse non-standard dataset naming formats emitted by AWS Glue EMR, Azure Data Explorer (Kusto), and Azure Cosmos DB, so lineage edges can be created instead of dropped due to unparseable dataset names.

Changes:

Add namespace-aware parsing dispatch in `OpenlineageSource._get_table_details` for Glue/Kusto/Cosmos naming formats.
Introduce dedicated parsers for Glue (`table/{db}/{table}`), Kusto (`{db}/{table}`), and Cosmos (`/dbs/{db}` + `colls/{collection}`).
Add unit tests covering the new parsers and namespace dispatch behavior.
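
The dispatch described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `TableDetails` is a stand-in dataclass, all function names are hypothetical, and the `arn:aws:glue:` and `azurekusto://` prefixes are assumptions based on the OpenLineage naming conventions (only `azurecosmos://` appears in this PR's discussion). It also assumes that a matching namespace returns the dedicated parser's result directly and that the pre-existing fallback keeps the last two dot-separated parts.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableDetails:
    """Hypothetical stand-in for the connector's TableDetails model."""

    name: str
    schema: str


def parse_slash(name: str) -> Optional[TableDetails]:
    """Kusto-style names: {db}/{table}; keeps the last two path segments."""
    parts = name.split("/")
    if len(parts) < 2:
        return None
    return TableDetails(name=parts[-1].lower(), schema=parts[-2].lower())


def parse_glue(name: str) -> Optional[TableDetails]:
    """Glue-style names: table/{db}/{table}."""
    if not name.startswith("table/"):
        return None
    return parse_slash(name[len("table/"):])


def parse_cosmos(namespace: str, name: str) -> Optional[TableDetails]:
    """Cosmos: database from the /dbs/{db} namespace suffix, name colls/{collection}."""
    db_match = re.search(r"/dbs/([^/]+)$", namespace)
    coll_match = re.fullmatch(r"colls/([^/]+)", name)
    if not db_match or not coll_match:
        return None
    return TableDetails(
        name=coll_match.group(1).lower(), schema=db_match.group(1).lower()
    )


def parse_dotted(name: str) -> Optional[TableDetails]:
    """Assumed shape of the pre-existing fallback: last two dot-separated parts."""
    parts = name.split(".")
    if len(parts) < 2:
        return None
    return TableDetails(name=parts[-1], schema=parts[-2])


def get_table_details(namespace: str, name: str) -> Optional[TableDetails]:
    # The Glue and Kusto prefixes are assumptions, not taken from the PR.
    if namespace.startswith("arn:aws:glue:"):
        return parse_glue(name)
    if namespace.startswith("azurekusto://"):
        return parse_slash(name)
    if namespace.startswith("azurecosmos://"):
        return parse_cosmos(namespace, name)
    return parse_dotted(name)
```

Checking the namespace first keeps the format-specific rules out of the generic path, so names from other sources still go through the dot-split logic unchanged.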

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `ingestion/src/metadata/ingestion/source/pipeline/openlineage/metadata.py` | Adds namespace-based dataset-name parsing and new parser helpers for Glue/Kusto/Cosmos. |
| `ingestion/tests/unit/topology/pipeline/test_openlineage.py` | Adds unit tests validating the new parsing logic and dispatch behavior. |

Copilot · 2026-04-20T05:27:25Z

+        if not name.startswith("table/"):
+            return None
+        parts = name[len("table/") :].split("/")
+        if len(parts) < 2:
+            return None
+        return TableDetails(name=parts[-1].lower(), schema=parts[-2].lower())


The Glue name parser can return an empty schema or table when the input has empty path segments (e.g., a trailing slash as in `table/db/table/`, or double slashes). That would later build an invalid FQN and potentially create or look up the wrong entities. Consider filtering out empty segments (or stripping trailing slashes) and returning `None` when the database or table is missing.
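
One way to apply the filtering suggested in this comment is sketched below. It is an illustration under assumptions rather than the PR's final code: `TableDetails` is a stand-in dataclass and `parse_glue_table_name` is a hypothetical standalone version of the Glue parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableDetails:
    """Hypothetical stand-in for the connector's TableDetails model."""

    name: str
    schema: str


def parse_glue_table_name(name: str) -> Optional[TableDetails]:
    """Parse table/{db}/{table}, ignoring empty path segments."""
    prefix = "table/"
    if not name.startswith(prefix):
        return None
    # Drop empty or whitespace-only segments caused by trailing or doubled slashes.
    parts = [part for part in name[len(prefix):].split("/") if part.strip()]
    if len(parts) < 2:
        return None
    return TableDetails(name=parts[-1].lower(), schema=parts[-2].lower())
```

With this variant, `table/db/table/` still resolves to the `db` schema and `table` table, while `table/db/` returns `None` because only one non-empty segment remains.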

Copilot · 2026-04-20T05:27:25Z

+        parts = name.split("/")
+        if len(parts) < 2:
+            return None
+        return TableDetails(name=parts[-1].lower(), schema=parts[-2].lower())


`_parse_slash_table_name` has the same empty-segment issue as the Glue parser: inputs like `db/table/` or `db//table` can yield an empty schema or table, since it blindly takes the last two split parts. Consider normalizing by stripping or filtering empty segments and returning `None` when the required parts are missing.
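
An alternative to filtering is to reject any name that contains an empty segment, which avoids guessing what a malformed name was meant to be. The sketch below takes that stricter route; `TableDetails` and `parse_slash_table_name_strict` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableDetails:
    """Hypothetical stand-in for the connector's TableDetails model."""

    name: str
    schema: str


def parse_slash_table_name_strict(name: str) -> Optional[TableDetails]:
    """Parse {db}/{table}; return None if any path segment is empty."""
    parts = name.split("/")
    # A trailing or doubled slash produces an empty string in `parts`.
    if len(parts) < 2 or not all(part.strip() for part in parts):
        return None
    return TableDetails(name=parts[-1].lower(), schema=parts[-2].lower())
```

Whether to filter or reject is a design choice: filtering tolerates sloppy producers, while rejecting makes it less likely that a lineage edge is attached to the wrong table.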

Copilot · 2026-04-20T05:27:26Z

+    def test_parse_cosmos_table_name_happy_path(self):
+        """Cosmos OL naming: db from namespace /dbs/{db}, name colls/{coll} — source: Naming.java CosmosNaming."""
+        result = OpenlineageSource._parse_cosmos_table_name(
+            "azurecosmos://myaccount.documents.azure.com/dbs/mydb",
+            "colls/mycollection",
+        )
+        self.assertEqual(result.name, "mycollection")
+        self.assertEqual(result.schema, "mydb")
+
+    def test_parse_cosmos_table_name_normalizes_to_lowercase(self):
+        """Cosmos database and collection names are normalized to lowercase for FQN matching."""
+        result = OpenlineageSource._parse_cosmos_table_name(
+            "azurecosmos://host/dbs/MyDB", "colls/MyCollection"
+        )
+        self.assertEqual(result.name, "mycollection")
+        self.assertEqual(result.schema, "mydb")
+


The new dataset-name parsers are tested for happy paths, but there are no tests asserting that they reject malformed inputs that would currently yield an empty schema or table (e.g., trailing slashes) or, for Cosmos, names that don't match the documented `colls/{collection}` pattern. Adding these negative tests would help prevent incorrect lineage edges when events contain unexpected naming variants.
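
The negative tests requested here might look like the sketch below. To stay self-contained it tests two hypothetical, already-hardened parsers defined in the same block (`parse_slash_table_name` and `parse_cosmos_table_name`) instead of the real `OpenlineageSource` static methods, so the exact assertions would need adapting to the connector's actual behavior.

```python
import re
import unittest
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableDetails:
    """Hypothetical stand-in for the connector's TableDetails model."""

    name: str
    schema: str


def parse_slash_table_name(name: str) -> Optional[TableDetails]:
    """Hypothetical hardened {db}/{table} parser."""
    parts = name.split("/")
    if len(parts) < 2 or not all(part.strip() for part in parts):
        return None
    return TableDetails(name=parts[-1].lower(), schema=parts[-2].lower())


def parse_cosmos_table_name(namespace: str, name: str) -> Optional[TableDetails]:
    """Hypothetical hardened Cosmos parser requiring colls/{collection}."""
    db_match = re.search(r"/dbs/([^/]+)$", namespace)
    coll_match = re.fullmatch(r"colls/([^/]+)", name)
    if not db_match or not coll_match:
        return None
    return TableDetails(
        name=coll_match.group(1).lower(), schema=db_match.group(1).lower()
    )


class MalformedDatasetNameTests(unittest.TestCase):
    def test_slash_parser_rejects_empty_segments(self):
        for bad in ("db/table/", "db//table", "/table", "table"):
            with self.subTest(name=bad):
                self.assertIsNone(parse_slash_table_name(bad))

    def test_cosmos_parser_rejects_unexpected_names(self):
        namespace = "azurecosmos://host/dbs/mydb"
        for bad in ("mycollection", "colls/", "colls/a/b", "docs/mycollection"):
            with self.subTest(name=bad):
                self.assertIsNone(parse_cosmos_table_name(namespace, bad))

    def test_cosmos_parser_rejects_namespace_without_database(self):
        self.assertIsNone(parse_cosmos_table_name("azurecosmos://host", "colls/c"))
```

Saved as a test module, these cases can be run with `python -m unittest`; `subTest` reports each malformed input separately, so one failing variant does not hide the others.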

Copilot · 2026-04-20T05:27:26Z

+        database = match.group(1).lower()
+        collection = name.split("/")[-1].lower() if "/" in name else name.lower()


`_parse_cosmos_table_name` currently returns a `TableDetails` for any name value (including ones not in the documented `colls/{collection}` format). Because `_get_table_details` dispatches on the `azurecosmos://` namespace, this can mis-parse unrelated Cosmos dataset names and produce incorrect lineage. Consider validating the name prefix/pattern (e.g., require `colls/` with a non-empty collection) and returning `None` when it doesn't match.

Suggested change

-        database = match.group(1).lower()
-        collection = name.split("/")[-1].lower() if "/" in name else name.lower()
+        collection_match = re.fullmatch(r"colls/([^/]+)", name)
+        if not collection_match:
+            return None
+        database = match.group(1).lower()
+        collection = collection_match.group(1).lower()
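
Put into a complete function, the suggested validation could look like the sketch below. The namespace pattern (`/dbs/{db}` at the end of the namespace) is inferred from the test inputs quoted earlier in this review, and `TableDetails` and the standalone function name are hypothetical stand-ins for the connector's own definitions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableDetails:
    """Hypothetical stand-in for the connector's TableDetails model."""

    name: str
    schema: str


def parse_cosmos_table_name(namespace: str, name: str) -> Optional[TableDetails]:
    """Database comes from the namespace suffix, collection from colls/{collection}."""
    # Assumed namespace shape: azurecosmos://{host}/dbs/{db}
    match = re.search(r"/dbs/([^/]+)$", namespace)
    if not match:
        return None
    # fullmatch rejects bare names, empty collections, and extra path segments.
    collection_match = re.fullmatch(r"colls/([^/]+)", name)
    if not collection_match:
        return None
    database = match.group(1).lower()
    collection = collection_match.group(1).lower()
    return TableDetails(name=collection, schema=database)
```

Using `re.fullmatch` instead of splitting on `/` means a name such as `colls/a/b` is rejected outright, where the original code would have silently taken `b` as the collection.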

github-actions · 2026-04-20T05:29:59Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

sonarqubecloud · 2026-04-20T06:26:54Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
94.6% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-04-20T07:14:21Z

🟡 Playwright Results — all passed (22 flaky)

✅ 3665 passed · ❌ 0 failed · 🟡 22 flaky · ⏭️ 89 skipped

| Shard | Passed | Flaky | Skipped |
| --- | --- | --- | --- |
| 🟡 Shard 1 | 478 | 3 | 4 |
| 🟡 Shard 2 | 652 | 1 | 7 |
| 🟡 Shard 3 | 654 | 5 | 1 |
| 🟡 Shard 4 | 630 | 4 | 27 |
| 🟡 Shard 5 | 610 | 1 | 42 |
| 🟡 Shard 6 | 641 | 8 | 8 |

🟡 22 flaky test(s) (passed on retry)

Features/CustomizeDetailPage.spec.ts › Ml Model - customization should work (shard 1, 1 retry)
Pages/Customproperties-part1.spec.ts › Hyperlink (shard 1, 1 retry)
Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 2 retries)
Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 2 retries)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Features/UserProfileOnlineStatus.spec.ts › Should show "Active recently" for users active within last hour (shard 3, 1 retry)
Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
Pages/DataContracts.spec.ts › Create Data Contract and validate for Directory (shard 4, 1 retry)
Pages/Domains.spec.ts › Rename domain with tags and glossary terms preserves associations (shard 4, 1 retry)
Pages/DomainUIInteractions.spec.ts › Add expert to domain via UI (shard 4, 1 retry)
Pages/Glossary.spec.ts › Add and Remove Assets (shard 5, 1 retry)
Features/AutoPilot.spec.ts › Create Service and check the AutoPilot status (shard 6, 1 retry)
Pages/HyperlinkCustomProperty.spec.ts › should display URL when no display text is provided (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
Pages/LoginConfiguration.spec.ts › update login configuration should work (shard 6, 1 retry)
Pages/Tag.spec.ts › Verify Owner Add Delete (shard 6, 1 retry)
Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

feat(openlineage): add AWS Glue, Kusto, and Cosmos DB dataset naming …

a08b714

…support

mohittilala self-assigned this Apr 20, 2026

Copilot AI review requested due to automatic review settings April 20, 2026 05:24

mohittilala requested a review from a team as a code owner April 20, 2026 05:24

mohittilala added the enhancement (New feature or request), Ingestion, safe to test (Add this label to run secure Github workflows on PRs), To release (Will cherry-pick this PR into the release branch), and Openlineage labels Apr 20, 2026

Copilot started reviewing on behalf of mohittilala April 20, 2026 05:24

Copilot AI reviewed Apr 20, 2026

mohittilala temporarily deployed to test April 20, 2026 05:34 — with GitHub Actions Inactive

mohittilala changed the title ~~feat(openlineage): add AWS Glue, Kusto, and Cosmos DB dataset naming support~~ Fixes #27538: feat(openlineage) add AWS Glue, Kusto, and Cosmos DB dataset naming support Apr 20, 2026

mohittilala wants to merge 1 commit into main from feat/openlineage-glue-kusto-cosmos-naming

