Fixes #26199: Fix Snowflake duplicate constraint name collisions spanning different tables#27266
Conversation
…ropping uniqueness in schema
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
Fixes Snowflake ingestion crashes caused by duplicate unique-constraint names reused across different tables within the same schema by patching the Snowflake SQLAlchemy dialect’s schema-level unique-constraint reflection to key constraints by (constraint_name, table_name).
Changes:
- Added a Snowflake dialect override for
_get_schema_unique_constraintsthat uses a composite key(constraint_name, table_name)to prevent cross-table constraint merging. - Wired the new override into the SnowflakeDialect initialization so
get_unique_constraints()benefits from the corrected schema aggregation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
ingestion/src/metadata/ingestion/source/database/snowflake/utils.py |
Implements the patched _get_schema_unique_constraints aggregation logic to avoid collisions across tables. |
ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py |
Registers the new _get_schema_unique_constraints override on SnowflakeDialect. |
| def _get_schema_unique_constraints(self, connection, schema, **kw): | ||
| result = connection.execute( | ||
| text( | ||
| f"SHOW /* sqlalchemy:_get_schema_unique_constraints */ " | ||
| f"UNIQUE KEYS IN SCHEMA {schema}" | ||
| ) | ||
| ) |
|
|
||
| # OpenMetadata Patch: Append the table_name into the uniqueness dictionary | ||
| # to support DBs that allow duplicate constraint names across tables under the same schema | ||
| constraint_key = (name, table_name) | ||
|
|
| def _get_schema_unique_constraints(self, connection, schema, **kw): | ||
| result = connection.execute( |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Hello. I am participating in the WeMakeDevs hackathon. Could a maintainer please assign the |
|
Hi there. I believe this is ready for the automated checks. Could you apply the |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
This PR fixes Snowflake ingestion failures when multiple tables in the same schema share the same unique constraint name by patching Snowflake SQLAlchemy reflection to keep unique constraints segregated per table.
Changes:
- Override Snowflake dialect
_get_schema_unique_constraintsto key constraints by(constraint_name, table_name)to avoid cross-table merges. - Monkeypatch
SnowflakeDialect._get_schema_unique_constraintsto use the patched implementation during ingestion. - Adds a new backend test file (currently appears to be committed with an invalid/non-text encoding).
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| openmetadata-service/src/test/java/org/openmetadata/service/DummyBackendTest.java | Adds a new test file, but content appears garbled / wrong-encoded and likely breaks compilation. |
| ingestion/src/metadata/ingestion/source/database/snowflake/utils.py | Implements patched _get_schema_unique_constraints to prevent duplicate constraint-name collisions across tables; introduces a potential N+1 reflection query pattern and needs unit coverage. |
| ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py | Wires the patched constraint reflection into SnowflakeDialect via monkeypatching. |
| def _get_schema_unique_constraints(self, connection, schema, **kw): | ||
| result = connection.execute( | ||
| text( | ||
| f"SHOW /* sqlalchemy:_get_schema_unique_constraints */ " | ||
| f"UNIQUE KEYS IN SCHEMA {schema}" | ||
| ) | ||
| ) |
| def _get_schema_unique_constraints(self, connection, schema, **kw): | ||
| result = connection.execute( | ||
| text( | ||
| f"SHOW /* sqlalchemy:_get_schema_unique_constraints */ " | ||
| f"UNIQUE KEYS IN SCHEMA {schema}" |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
@aniruddhaadak80 we cannot accept PRs without tests. This one requires testing with real snowflake, you can use trial database to do this with snowflake. Reproduce the issue before trying to fix it |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Thanks for the feedback. I have added a unit test that strictly mocks and reproduces the issue where SQLAlchemy incorrectly overrides unique constraint dictionaries spanning different tables. This proves the patched _get_schema_unique_constraints correctly decouples them without needing a live Snowflake environment in the CI since it is a reflection parsing bug on sqlalchemy end. Let me know what you think. |
There was a problem hiding this comment.
Pull request overview
This PR addresses a Snowflake metadata ingestion failure caused by duplicate unique constraint names across different tables in the same schema, by overriding Snowflake dialect reflection to keep constraints correctly table-scoped.
Changes:
- Add a Snowflake dialect override for
_get_schema_unique_constraintsthat keys constraints by(constraint_name, table_name)to prevent cross-table merging. - Register the new override in the Snowflake dialect monkeypatch setup.
- Add a unit test covering the duplicate-constraint-name collision scenario across multiple tables.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
ingestion/src/metadata/ingestion/source/database/snowflake/utils.py |
Introduces the patched schema-level unique constraint reflection logic to avoid constraint name collisions across tables. |
ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py |
Wires the new _get_schema_unique_constraints override into SnowflakeDialect. |
ingestion/tests/unit/topology/database/test_snowflake_constraints.py |
Adds a regression unit test ensuring constraints are split per table even when names collide. |
| def _get_schema_unique_constraints(self, connection, schema, **kw): | ||
| result = connection.execute( | ||
| text( | ||
| f"SHOW /* sqlalchemy:_get_schema_unique_constraints */ " | ||
| f"UNIQUE KEYS IN SCHEMA {schema}" | ||
| ) | ||
| ) | ||
| unique_constraints = {} | ||
| for row in result: | ||
| name = self.normalize_name(row._mapping["constraint_name"]) | ||
| table_name = self.normalize_name(row._mapping["table_name"]) | ||
|
|
||
| # OpenMetadata Patch: Append the table_name into the uniqueness dictionary | ||
| # to support DBs that allow duplicate constraint names across tables | ||
| constraint_key = (name, table_name) | ||
|
|
||
| if constraint_key not in unique_constraints: | ||
| unique_constraints[constraint_key] = { | ||
| "column_names": [self.normalize_name(row._mapping["column_name"])], | ||
| "name": name, | ||
| "table_name": table_name, | ||
| } | ||
| else: | ||
| unique_constraints[constraint_key]["column_names"].append( | ||
| self.normalize_name(row._mapping["column_name"]) | ||
| ) | ||
|
|
||
| ans = {} | ||
| for constraint in unique_constraints.values(): | ||
| t_name = constraint.pop("table_name") | ||
| if t_name not in ans: | ||
| ans[t_name] = [] | ||
| ans[t_name].append(constraint) | ||
|
|
||
| return ans |
| from unittest.mock import Mock | ||
| from metadata.ingestion.source.database.snowflake.utils import _get_schema_unique_constraints | ||
|
|
||
| def test_snowflake_unique_constraint_collision(): | ||
| # Mocking self (SnowflakeDialect) | ||
| dialect_mock = Mock() | ||
| dialect_mock.normalize_name = lambda name: name.lower() if name else name | ||
|
|
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
| @@ -0,0 +1,53 @@ | |||
| from unittest.mock import Mock | |||
There was a problem hiding this comment.
💡 Quality: UTF-8 BOM added to test file
The latest commit introduces a UTF-8 BOM (byte order mark, \xEF\xBB\xBF) at the start of test_snowflake_constraints.py. While Python's parser tolerates this, it is unconventional for Python source files and no other test file in the repository uses a BOM. The commit message says "set proper UTF-8 encoding" but a BOM is not needed for UTF-8 Python files and is discouraged by PEP 263. This should be removed to stay consistent with the rest of the codebase.
Suggested fix:
Remove the BOM character from the file. In most editors, save as "UTF-8 without BOM". Or:
sed -i '1s/^\xEF\xBB\xBF//' ingestion/tests/unit/topology/database/test_snowflake_constraints.py
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
Code Review 👍 Approved with suggestions 0 resolved / 1 findingsUnique constraint name handling now correctly accounts for Snowflake table scopes. Remove the accidental UTF-8 byte order mark introduced in the test file to ensure cross-platform compatibility. 💡 Quality: UTF-8 BOM added to test file📄 ingestion/tests/unit/topology/database/test_snowflake_constraints.py:1 The latest commit introduces a UTF-8 BOM (byte order mark, Suggested fix🤖 Prompt for agentsOptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
Fixes the Snowflake ingestion pipeline that crashes when multiple tables within the same schema share equivalent constraint names. During ingestion, snowflake.sqlalchemy internally polls unique constraint metadata. Its default logic blindly utilizes the parsed constraint_name as a primary dictionary key. If multiple tables share a constraint with identical names inside a schema, their metadata is overridden or wrongfully merged. This PR patches _get_schema_unique_constraints. By appending the table_name natively as a secondary composite key before returning the aggregate mapping block, all duplicate constraints maintain full table-segregated integrity. Fixes issue #26199