Skip to content

Add methods for data contract retrieval and validation in SDK and API…#26082

Merged
ayush-shah merged 15 commits intoopen-metadata:mainfrom
ayush-shah:datacontracts-sdk
Mar 9, 2026
Merged

Add methods for data contract retrieval and validation in SDK and API…#26082
ayush-shah merged 15 commits intoopen-metadata:mainfrom
ayush-shah:datacontracts-sdk

Conversation

@ayush-shah
Copy link
Member

@ayush-shah ayush-shah commented Feb 25, 2026

This pull request adds several new methods to support advanced data contract management and validation, particularly for working with entities and ODCS YAML, and exposes these capabilities in both the client mixin and the SDK. It also introduces corresponding integration tests to ensure these new features work as expected.

New data contract management and validation methods:

  • Added methods in data_contract_mixin.py to:
    • Retrieve a data contract by entity ID and type (get_data_contract_by_entity_id)
    • Validate a data contract for an entity (validate_data_contract_by_entity_id)
    • Validate data contract creation requests, both as objects and as YAML (validate_data_contract_request, validate_data_contract_request_yaml)
    • Validate ODCS YAML for an entity without importing, and parse ODCS YAML (validate_odcs_yaml, parse_odcs_yaml)
    • Delete all data contract results before a specified timestamp (delete_data_contract_results_before)

SDK enhancements:

  • Exposed the new mixin methods in the SDK class datacontracts.py for easier usage, including get_by_entity, validate_by_entity, validate_request, validate_request_yaml, validate_odcs_yaml, parse_odcs_yaml, and delete_results_before.
  • Updated method signatures to use UUID instead of the previous UuidLike for clarity and type safety [1] [2].

Testing improvements:

  • Added integration tests to verify:
    • Fetching a data contract by entity ID works and is robust to missing associations.
    • Validation of a data contract by entity ID returns the expected result type.
    • Deletion of data contract results before a timestamp returns a boolean and doesn't error.

Other minor changes:

  • Improved imports and typing for better code clarity and correctness [1] [2] [3].

These changes collectively improve the flexibility and robustness of data contract operations, especially for workflows involving entity-based contracts and ODCS YAML.

@github-actions github-actions bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Feb 25, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the ingestion OpenMetadata client mixin and the Python SDK DataContracts entity wrapper to support additional data contract operations (retrieve/validate by entity, validate requests/YAML/ODCS YAML, parse ODCS YAML, and delete results before a timestamp), and adds integration tests covering the new endpoints.

Changes:

  • Added new DataContract endpoints to OMetaDataContractMixin (get/validate by entity, request/YAML validation, ODCS YAML validate/parse, delete results before timestamp).
  • Exposed the new endpoints on the SDK DataContracts class.
  • Added integration tests for the new operations.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
ingestion/src/metadata/ingestion/ometa/mixins/data_contract_mixin.py Adds new REST wrapper methods for entity-based retrieval/validation, request/YAML validation, ODCS YAML utilities, and deletion by timestamp.
ingestion/src/metadata/sdk/entities/datacontracts.py Exposes the new mixin methods through the SDK DataContracts class.
ingestion/tests/integration/ometa/test_ometa_data_contract_api.py Adds integration tests for retrieving/validating contracts by entity and deleting results before a timestamp.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 25, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.12.7 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.13.4 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.15.2 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (9)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6, 2.11.1
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Copy link
Contributor

github-actions bot commented Feb 25, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpam-modules CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-modules-bin CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-runtime CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam0g CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (38)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.12.7 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.13.4 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.15.2 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.16.1 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (22)

Package Vulnerability ID Severity Installed Version Fixed Version
Authlib CVE-2026-28802 🚨 HIGH 1.6.6 1.6.7
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6, 2.11.1
azure-core CVE-2026-21226 🚨 HIGH 1.37.0 1.38.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
google-cloud-aiplatform CVE-2026-2472 🚨 HIGH 1.130.0 1.131.0
google-cloud-aiplatform CVE-2026-2473 🚨 HIGH 1.130.0 1.133.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyasn1 CVE-2026-23490 🚨 HIGH 0.6.1 0.6.2
python-multipart CVE-2026-24486 🚨 HIGH 0.0.20 0.0.22
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (3)

Package Vulnerability ID Severity Installed Version Fixed Version
stdlib CVE-2025-68121 🔥 CRITICAL v1.25.5 1.24.13, 1.25.7, 1.26.0-rc.3
stdlib CVE-2025-61726 🚨 HIGH v1.25.5 1.24.12, 1.25.6
stdlib CVE-2025-61728 🚨 HIGH v1.25.5 1.24.12, 1.25.6

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

Copy link
Contributor

@edg956 edg956 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I'd prefer if we don't ignore errors (even simply logging them), making it harder for client code to identify what went wrong

Comment on lines +164 to +176
def get_data_contract_by_entity_id(
self, entity_id: Uuid, entity_type: str
) -> Optional[DataContract]:
"""
Get the effective data contract for an entity

Args:
entity_id: UUID of the entity
entity_type: Type of the entity (e.g., 'table', 'topic')

Returns:
DataContract if successful, None otherwise
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] I'd prefer to make this meet OpenMetadata.get_by_name implementation allowing client code to choose whether to ignore errors with nullable param when the response is 404 and not ignoring real errors

Comment on lines +480 to +485
except Exception as err:
logger.debug(traceback.format_exc())
logger.warning(
f"Error validating data contract for entity {model_str(entity_id)}: {err}"
)
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] same here

Comment on lines +500 to +503
except Exception as err:
logger.debug(traceback.format_exc())
logger.warning(f"Error validating data contract request: {err}")
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] and here

@ayush-shah ayush-shah marked this pull request as draft March 6, 2026 07:26
)
return None

def validate_data_contract_by_entity_id(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Unhandled exceptions in validate methods break Optional contract

validate_data_contract_by_entity_id (line 474) and validate_data_contract_request (line 488) lack try/except error handling, unlike every other method in this mixin class. If the API call raises an APIError (e.g., 404, 500, network error), the exception propagates uncaught to callers that expect Optional[DataContractResult] (i.e., they expect None on failure, not an exception).

The SDK methods DataContracts.validate_by_entity and DataContracts.validate_request delegate directly to these, so end users will get unexpected crashes instead of None.

For consistency with the rest of the mixin (e.g., validate_data_contract_request_yaml on line 502 which does wrap in try/except), these should handle errors the same way.

Suggested fix:

def validate_data_contract_by_entity_id(
    self, entity_id: Uuid, entity_type: str
) -> Optional[DataContractResult]:
    """
    Validate a data contract for an entity
    """
    try:
        resp = self.client.post(
            f"{self.get_suffix(DataContract)}/entity/validate"
            f"?entityId={model_str(entity_id)}&entityType={entity_type}"
        )
        if resp:
            return DataContractResult(**resp)
    except Exception as err:
        logger.debug(traceback.format_exc())
        logger.warning(
            f"Error validating data contract for entity {model_str(entity_id)}: {err}"
        )
    return None

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

@gitar-bot
Copy link

gitar-bot bot commented Mar 6, 2026

Code Review ⚠️ Changes requested 5 resolved / 6 findings

Adds data contract retrieval and validation methods to the SDK and API, resolving five issues with exception handling, type hints, and parameter encoding. However, the validate methods lack try/except error handling that other mixin methods implement, breaking the Optional contract pattern.

⚠️ Bug: Unhandled exceptions in validate methods break Optional contract

📄 ingestion/src/metadata/ingestion/ometa/mixins/data_contract_mixin.py:474 📄 ingestion/src/metadata/ingestion/ometa/mixins/data_contract_mixin.py:488

validate_data_contract_by_entity_id (line 474) and validate_data_contract_request (line 488) lack try/except error handling, unlike every other method in this mixin class. If the API call raises an APIError (e.g., 404, 500, network error), the exception propagates uncaught to callers that expect Optional[DataContractResult] (i.e., they expect None on failure, not an exception).

The SDK methods DataContracts.validate_by_entity and DataContracts.validate_request delegate directly to these, so end users will get unexpected crashes instead of None.

For consistency with the rest of the mixin (e.g., validate_data_contract_request_yaml on line 502 which does wrap in try/except), these should handle errors the same way.

Suggested fix
    def validate_data_contract_by_entity_id(
        self, entity_id: Uuid, entity_type: str
    ) -> Optional[DataContractResult]:
        """
        Validate a data contract for an entity
        """
        try:
            resp = self.client.post(
                f"{self.get_suffix(DataContract)}/entity/validate"
                f"?entityId={model_str(entity_id)}&entityType={entity_type}"
            )
            if resp:
                return DataContractResult(**resp)
        except Exception as err:
            logger.debug(traceback.format_exc())
            logger.warning(
                f"Error validating data contract for entity {model_str(entity_id)}: {err}"
            )
        return None
✅ 5 resolved
Bug: Dead code: rest_client check has identical branches

📄 ingestion/src/metadata/sdk/entities/datacontracts.py:271 📄 ingestion/src/metadata/sdk/entities/datacontracts.py:275
In get_by_entity(), both the if rest_client is None branch and the else branch return the exact same expression:

rest_client = getattr(client, "client", None)
if rest_client is None:
    return client.get_data_contract_by_entity_id(entity_id, entity_type)
return client.get_data_contract_by_entity_id(entity_id, entity_type)

This looks like an incomplete refactoring. Other methods in the same file (ODCSExportOperation.execute(), ODCSImportOperation.execute()) properly use the rest_client check — they raise a RuntimeError when rest_client is None and then use rest_client directly for API calls. The BaseEntity._get_rest_client() utility follows the same validated pattern.

This method either:

  1. Should raise RuntimeError when rest_client is None (matching the established pattern), or
  2. Should remove the dead rest_client check entirely, since the mixin method doesn't need the low-level REST client.

Given that get_data_contract_by_entity_id() is a mixin method that uses self.client internally, option 2 (removing the dead code) is likely correct — the other new methods in this PR (e.g., validate_by_entity, validate_request) don't have this check either.

Edge Case: object_name query param not URL-encoded

📄 ingestion/src/metadata/ingestion/ometa/mixins/data_contract_mixin.py:532
The object_name parameter is directly interpolated into the URL without encoding:

url += f"&objectName={object_name}"

Unlike entity_id (UUID, safe characters only) and entity_type (constrained set like "table"/"topic"), object_name is a user-supplied string that could contain spaces, ampersands, or other special characters that would break URL parsing. The codebase already has urllib.parse.quote_plus and urllib.parse.urlencode available (used in es_mixin.py and tests_mixin.py respectively).

While this follows the existing pattern of unencoded f-string interpolation in other mixins, object_name is at higher risk than UUIDs or enum-like type strings.

Quality: validate_data_contract_request typed Any but requires Pydantic model

📄 ingestion/src/metadata/ingestion/ometa/mixins/data_contract_mixin.py:484 📄 ingestion/src/metadata/sdk/entities/datacontracts.py:286
The validate_data_contract_request method accepts create_request: Any but immediately calls create_request.model_dump_json() at line 491, which requires a Pydantic BaseModel. If a non-Pydantic object is passed, the resulting AttributeError will be caught by the broad except Exception and silently swallowed — returning None with only a warning log.

This makes debugging harder for SDK users who pass the wrong type. Consider using a more specific type annotation (e.g., CreateDataContractRequest or BaseModel) to provide clear IDE feedback and fail-fast behavior.

The same Any return type also means callers get no IDE assistance on the response structure.

Quality: validate_by_entity return type widens DataContractResult to Any

📄 ingestion/src/metadata/sdk/entities/datacontracts.py:273 📄 ingestion/src/metadata/ingestion/ometa/mixins/data_contract_mixin.py:466
The mixin method validate_data_contract_by_entity_id correctly returns Optional[DataContractResult], but the SDK wrapper validate_by_entity declares Optional[Any] as its return type. This loses type information for SDK consumers, who won't get IDE autocompletion or type-checker validation on the result.

This is part of a broader pattern where several of the new SDK methods use Optional[Any] return types even when the underlying mixin has a concrete type.

Edge Case: UuidLike alias excludes basic.Uuid, mismatching ensure_uuid

📄 ingestion/src/metadata/sdk/types.py:13 📄 ingestion/src/metadata/sdk/types.py:18
UuidLike is defined as str | UUID, but ensure_uuid(value: UuidLike) explicitly handles _basic.Uuid instances (line 20) and the docstring advertises this. SDK methods like DataContracts.get_by_entity accept UuidLike and pass it through ensure_uuid, so passing a basic.Uuid works at runtime but is a type error according to the annotation.

Including _basic.Uuid in the alias keeps the static type and the runtime behaviour consistent, and lets callers pass schema UUIDs without a type-ignore.

🤖 Prompt for agents
Code Review: Adds data contract retrieval and validation methods to the SDK and API, resolving five issues with exception handling, type hints, and parameter encoding. However, the validate methods lack try/except error handling that other mixin methods implement, breaking the Optional contract pattern.

1. ⚠️ Bug: Unhandled exceptions in validate methods break Optional contract
   Files: ingestion/src/metadata/ingestion/ometa/mixins/data_contract_mixin.py:474, ingestion/src/metadata/ingestion/ometa/mixins/data_contract_mixin.py:488

   `validate_data_contract_by_entity_id` (line 474) and `validate_data_contract_request` (line 488) lack try/except error handling, unlike every other method in this mixin class. If the API call raises an `APIError` (e.g., 404, 500, network error), the exception propagates uncaught to callers that expect `Optional[DataContractResult]` (i.e., they expect `None` on failure, not an exception).
   
   The SDK methods `DataContracts.validate_by_entity` and `DataContracts.validate_request` delegate directly to these, so end users will get unexpected crashes instead of `None`.
   
   For consistency with the rest of the mixin (e.g., `validate_data_contract_request_yaml` on line 502 which does wrap in try/except), these should handle errors the same way.

   Suggested fix:
   def validate_data_contract_by_entity_id(
       self, entity_id: Uuid, entity_type: str
   ) -> Optional[DataContractResult]:
       """
       Validate a data contract for an entity
       """
       try:
           resp = self.client.post(
               f"{self.get_suffix(DataContract)}/entity/validate"
               f"?entityId={model_str(entity_id)}&entityType={entity_type}"
           )
           if resp:
               return DataContractResult(**resp)
       except Exception as err:
           logger.debug(traceback.format_exc())
           logger.warning(
               f"Error validating data contract for entity {model_str(entity_id)}: {err}"
           )
       return None

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 6, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants