Skip to content

Fixes #26199: Fix Snowflake duplicate constraint name collisions spanning different tables#27266

Open
aniruddhaadak80 wants to merge 6 commits intoopen-metadata:mainfrom
aniruddhaadak80:fix-snowflake-duplicate-constraint-ingestion
Open

Fixes #26199: Fix Snowflake duplicate constraint name collisions spanning different tables#27266
aniruddhaadak80 wants to merge 6 commits intoopen-metadata:mainfrom
aniruddhaadak80:fix-snowflake-duplicate-constraint-ingestion

Conversation

@aniruddhaadak80
Copy link
Copy Markdown

@aniruddhaadak80 aniruddhaadak80 commented Apr 11, 2026

Fixes the Snowflake ingestion pipeline that crashes when multiple tables within the same schema share equivalent constraint names. During ingestion, snowflake.sqlalchemy internally polls unique constraint metadata. Its default logic blindly utilizes the parsed constraint_name as a primary dictionary key. If multiple tables share a constraint with identical names inside a schema, their metadata is overridden or wrongfully merged. This PR patches _get_schema_unique_constraints. By appending the table_name natively as a secondary composite key before returning the aggregate mapping block, all duplicate constraints maintain full table-segregated integrity. Fixes issue #26199

@aniruddhaadak80 aniruddhaadak80 requested a review from a team as a code owner April 11, 2026 04:30
Copilot AI review requested due to automatic review settings April 11, 2026 04:30
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Snowflake ingestion crashes caused by duplicate unique-constraint names reused across different tables within the same schema by patching the Snowflake SQLAlchemy dialect’s schema-level unique-constraint reflection to key constraints by (constraint_name, table_name).

Changes:

  • Added a Snowflake dialect override for _get_schema_unique_constraints that uses a composite key (constraint_name, table_name) to prevent cross-table constraint merging.
  • Wired the new override into the SnowflakeDialect initialization so get_unique_constraints() benefits from the corrected schema aggregation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
ingestion/src/metadata/ingestion/source/database/snowflake/utils.py Implements the patched _get_schema_unique_constraints aggregation logic to avoid collisions across tables.
ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py Registers the new _get_schema_unique_constraints override on SnowflakeDialect.

Comment on lines +622 to +628
def _get_schema_unique_constraints(self, connection, schema, **kw):
result = connection.execute(
text(
f"SHOW /* sqlalchemy:_get_schema_unique_constraints */ "
f"UNIQUE KEYS IN SCHEMA {schema}"
)
)
Comment on lines +633 to +637

# OpenMetadata Patch: Append the table_name into the uniqueness dictionary
# to support DBs that allow duplicate constraint names across tables under the same schema
constraint_key = (name, table_name)

Comment on lines +622 to +623
def _get_schema_unique_constraints(self, connection, schema, **kw):
result = connection.execute(
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@aniruddhaadak80
Copy link
Copy Markdown
Author

aniruddhaadak80 commented Apr 11, 2026

Hello. I am participating in the WeMakeDevs hackathon. Could a maintainer please assign the safe to test label so the GitHub Actions workflows can validate my fixes? Thank you.

@aniruddhaadak80
Copy link
Copy Markdown
Author

aniruddhaadak80 commented Apr 13, 2026

Hi there. I believe this is ready for the automated checks. Could you apply the safe to test label when you have a moment?

Copilot AI review requested due to automatic review settings April 13, 2026 14:05
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Snowflake ingestion failures when multiple tables in the same schema share the same unique constraint name by patching Snowflake SQLAlchemy reflection to keep unique constraints segregated per table.

Changes:

  • Override Snowflake dialect _get_schema_unique_constraints to key constraints by (constraint_name, table_name) to avoid cross-table merges.
  • Monkeypatch SnowflakeDialect._get_schema_unique_constraints to use the patched implementation during ingestion.
  • Adds a new backend test file (currently appears to be committed with an invalid/non-text encoding).

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

File Description
openmetadata-service/src/test/java/org/openmetadata/service/DummyBackendTest.java Adds a new test file, but content appears garbled / wrong-encoded and likely breaks compilation.
ingestion/src/metadata/ingestion/source/database/snowflake/utils.py Implements patched _get_schema_unique_constraints to prevent duplicate constraint-name collisions across tables; introduces a potential N+1 reflection query pattern and needs unit coverage.
ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py Wires the patched constraint reflection into SnowflakeDialect via monkeypatching.

Comment on lines +622 to +628
def _get_schema_unique_constraints(self, connection, schema, **kw):
result = connection.execute(
text(
f"SHOW /* sqlalchemy:_get_schema_unique_constraints */ "
f"UNIQUE KEYS IN SCHEMA {schema}"
)
)
Comment on lines +622 to +626
def _get_schema_unique_constraints(self, connection, schema, **kw):
result = connection.execute(
text(
f"SHOW /* sqlalchemy:_get_schema_unique_constraints */ "
f"UNIQUE KEYS IN SCHEMA {schema}"
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@harshach
Copy link
Copy Markdown
Collaborator

@aniruddhaadak80 we cannot accept PRs without tests. This one requires testing with real snowflake, you can use trial database to do this with snowflake. Reproduce the issue before trying to fix it

Copilot AI review requested due to automatic review settings April 17, 2026 04:46
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@aniruddhaadak80
Copy link
Copy Markdown
Author

aniruddhaadak80 commented Apr 17, 2026

Thanks for the feedback. I have added a unit test that strictly mocks and reproduces the issue where SQLAlchemy incorrectly overrides unique constraint dictionaries spanning different tables. This proves the patched _get_schema_unique_constraints correctly decouples them without needing a live Snowflake environment in the CI since it is a reflection parsing bug on sqlalchemy end. Let me know what you think.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a Snowflake metadata ingestion failure caused by duplicate unique constraint names across different tables in the same schema, by overriding Snowflake dialect reflection to keep constraints correctly table-scoped.

Changes:

  • Add a Snowflake dialect override for _get_schema_unique_constraints that keys constraints by (constraint_name, table_name) to prevent cross-table merging.
  • Register the new override in the Snowflake dialect monkeypatch setup.
  • Add a unit test covering the duplicate-constraint-name collision scenario across multiple tables.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
ingestion/src/metadata/ingestion/source/database/snowflake/utils.py Introduces the patched schema-level unique constraint reflection logic to avoid constraint name collisions across tables.
ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py Wires the new _get_schema_unique_constraints override into SnowflakeDialect.
ingestion/tests/unit/topology/database/test_snowflake_constraints.py Adds a regression unit test ensuring constraints are split per table even when names collide.

Comment on lines +622 to +656
def _get_schema_unique_constraints(self, connection, schema, **kw):
result = connection.execute(
text(
f"SHOW /* sqlalchemy:_get_schema_unique_constraints */ "
f"UNIQUE KEYS IN SCHEMA {schema}"
)
)
unique_constraints = {}
for row in result:
name = self.normalize_name(row._mapping["constraint_name"])
table_name = self.normalize_name(row._mapping["table_name"])

# OpenMetadata Patch: Append the table_name into the uniqueness dictionary
# to support DBs that allow duplicate constraint names across tables
constraint_key = (name, table_name)

if constraint_key not in unique_constraints:
unique_constraints[constraint_key] = {
"column_names": [self.normalize_name(row._mapping["column_name"])],
"name": name,
"table_name": table_name,
}
else:
unique_constraints[constraint_key]["column_names"].append(
self.normalize_name(row._mapping["column_name"])
)

ans = {}
for constraint in unique_constraints.values():
t_name = constraint.pop("table_name")
if t_name not in ans:
ans[t_name] = []
ans[t_name].append(constraint)

return ans
Comment on lines +1 to +8
from unittest.mock import Mock
from metadata.ingestion.source.database.snowflake.utils import _get_schema_unique_constraints

def test_snowflake_unique_constraint_collision():
# Mocking self (SnowflakeDialect)
dialect_mock = Mock()
dialect_mock.normalize_name = lambda name: name.lower() if name else name

@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@@ -0,0 +1,53 @@
from unittest.mock import Mock
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: UTF-8 BOM added to test file

The latest commit introduces a UTF-8 BOM (byte order mark, \xEF\xBB\xBF) at the start of test_snowflake_constraints.py. While Python's parser tolerates this, it is unconventional for Python source files and no other test file in the repository uses a BOM. The commit message says "set proper UTF-8 encoding" but a BOM is not needed for UTF-8 Python files and is discouraged by PEP 263. This should be removed to stay consistent with the rest of the codebase.

Suggested fix:

Remove the BOM character from the file. In most editors, save as "UTF-8 without BOM". Or:
  sed -i '1s/^\xEF\xBB\xBF//' ingestion/tests/unit/topology/database/test_snowflake_constraints.py

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Apr 17, 2026

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Unique constraint name handling now correctly accounts for Snowflake table scopes. Remove the accidental UTF-8 byte order mark introduced in the test file to ensure cross-platform compatibility.

💡 Quality: UTF-8 BOM added to test file

📄 ingestion/tests/unit/topology/database/test_snowflake_constraints.py:1

The latest commit introduces a UTF-8 BOM (byte order mark, \xEF\xBB\xBF) at the start of test_snowflake_constraints.py. While Python's parser tolerates this, it is unconventional for Python source files and no other test file in the repository uses a BOM. The commit message says "set proper UTF-8 encoding" but a BOM is not needed for UTF-8 Python files and is discouraged by PEP 263. This should be removed to stay consistent with the rest of the codebase.

Suggested fix
Remove the BOM character from the file. In most editors, save as "UTF-8 without BOM". Or:
  sed -i '1s/^\xEF\xBB\xBF//' ingestion/tests/unit/topology/database/test_snowflake_constraints.py
🤖 Prompt for agents
Code Review: Unique constraint name handling now correctly accounts for Snowflake table scopes. Remove the accidental UTF-8 byte order mark introduced in the test file to ensure cross-platform compatibility.

1. 💡 Quality: UTF-8 BOM added to test file
   Files: ingestion/tests/unit/topology/database/test_snowflake_constraints.py:1

   The latest commit introduces a UTF-8 BOM (byte order mark, `\xEF\xBB\xBF`) at the start of `test_snowflake_constraints.py`. While Python's parser tolerates this, it is unconventional for Python source files and no other test file in the repository uses a BOM. The commit message says "set proper UTF-8 encoding" but a BOM is not needed for UTF-8 Python files and is discouraged by PEP 263. This should be removed to stay consistent with the rest of the codebase.

   Suggested fix:
   Remove the BOM character from the file. In most editors, save as "UTF-8 without BOM". Or:
     sed -i '1s/^\xEF\xBB\xBF//' ingestion/tests/unit/topology/database/test_snowflake_constraints.py

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants