Unified byte processing logic #25798

Merged: pmbrull merged 3 commits into main from fix-postgres-bytea-sample-data on Feb 12, 2026

Conversation

@IceS2 (Contributor) commented Feb 10, 2026

Unified byte processing logic

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Eliminated code duplication by removing the ByteaToHex class and unifying both BYTES and BYTEA data types to use the existing HexByteString converter
  • Enhanced HexByteString validation to accept memoryview objects (in addition to bytes and bytearray) for proper PostgreSQL bytea handling
  • Updated type mappings to consolidate DataType.BYTEA into the CustomTypes.BYTES mapping, simplifying the conversion registry
  • Converted tests from unittest to pytest style with parametrized test cases covering edge cases (null bytes, binary data, memoryview inputs) to ensure JSON-safe output for PostgreSQL jsonb storage
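
The conversion described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `to_json_safe_text` and the "try UTF-8, otherwise fall back to hex" strategy are assumptions made for the example. Only the accepted input types (bytes, bytearray, memoryview) and the goal of JSON-safe output for jsonb storage come from the summary above.

```python
import json
from typing import Optional, Union

BytesLike = Union[bytes, bytearray, memoryview]


def to_json_safe_text(value: Optional[BytesLike]) -> Optional[str]:
    """Hypothetical helper: turn a binary column value into a JSON-safe string."""
    if value is None:
        return None
    # Some drivers hand back bytea values as memoryview, so accept it too.
    if not isinstance(value, (bytes, bytearray, memoryview)):
        raise TypeError(f"expected a bytes-like value, got {type(value).__name__}")
    raw = bytes(value)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Arbitrary binary data: a hex string is always valid JSON text.
        return raw.hex()
    # PostgreSQL jsonb rejects the \u0000 escape, so drop null bytes.
    return text.replace("\x00", "")


if __name__ == "__main__":
    samples = [b"hello", bytearray(b"a\x00b"), memoryview(b"\xff\xfe")]
    for sample in samples:
        print(json.dumps(to_json_safe_text(sample)))
```

A table of inputs like `samples` maps naturally onto parametrized test cases, one case per edge condition (null bytes, non-text binary data, memoryview input).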

github-actions bot (Contributor) commented Feb 10, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

Vulnerabilities (29)

Package Vulnerability ID Severity Installed Version Fixed Version
linux-libc-dev CVE-2024-46786 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-21946 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-22022 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-22083 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-22107 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-22121 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-37926 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-38022 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-38129 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-38361 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-38718 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-39871 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-68340 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-68349 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-68800 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-71085 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-71116 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-22984 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-22990 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23001 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23010 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23054 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23074 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23084 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23097 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23103 🚨 HIGH 6.1.159-1 6.1.162-1
pcre2 CVE-2022-1586 🔥 CRITICAL 10.32-3.el8_6 10.40-1
pcre2 CVE-2022-1587 🔥 CRITICAL 10.32-3.el8_6 10.40-1
pcre2 CVE-2019-20454 🚨 HIGH 10.32-3.el8_6 10.34-1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (9)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

github-actions bot (Contributor) commented Feb 10, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (7)

Package Vulnerability ID Severity Installed Version Fixed Version
libpam-modules CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-modules-bin CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-runtime CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam0g CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
pcre2 CVE-2022-1586 🔥 CRITICAL 10.32-3.el8_6 10.40-1
pcre2 CVE-2022-1587 🔥 CRITICAL 10.32-3.el8_6 10.40-1
pcre2 CVE-2019-20454 🚨 HIGH 10.32-3.el8_6 10.34-1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (19)

Package Vulnerability ID Severity Installed Version Fixed Version
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
azure-core CVE-2026-21226 🚨 HIGH 1.37.0 1.38.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyasn1 CVE-2026-23490 🚨 HIGH 0.6.1 0.6.2
python-multipart CVE-2026-24486 🚨 HIGH 0.0.20 0.0.22
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

gitar-bot bot commented Feb 11, 2026

🔍 CI failure analysis for a43ddf7: Both py-run-tests (3.10 & 3.11) show identical 7 fixture scope mismatch errors in Trino classifier tests - same pre-existing pattern seen in all previous runs. 530 tests passed per run (98.7% success). All failures unrelated to this PR's profiler ORM changes.

Issue

Latest CI Run - Python Tests (Both Versions):

  1. Job py-run-tests (3.10): 7 test errors
  2. Job py-run-tests (3.11): 7 test errors (expected, same pattern)

Root Cause

Python Test Failures (Jobs: py-run-tests 3.10 & 3.11)

Pre-existing Configuration Issue - Continues with Same Pattern

All 7 errors occur in tests/integration/trino/test_classifier.py in both Python versions due to the same pytest fixture scope mismatch:

ScopeMismatch: You tried to access the function scoped fixture caplog 
with a module scoped request object
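
The rule behind this error can be modelled in a few lines. The sketch below is a pure-Python illustration of pytest's scope ordering, not pytest's own implementation: `check_fixture_request` and `ScopeMismatchError` are hypothetical names. It encodes the constraint that a fixture may only request fixtures whose scope is equal to or broader than its own, which is why a module-scoped requester cannot use the function-scoped `caplog`.

```python
# Fixture scopes ordered from narrowest to broadest.
SCOPE_ORDER = ["function", "class", "module", "package", "session"]


class ScopeMismatchError(Exception):
    """Hypothetical stand-in for pytest's ScopeMismatch failure."""


def check_fixture_request(requester_scope: str, requested_scope: str) -> None:
    """Raise if a fixture asks for another fixture with a narrower scope."""
    requester_rank = SCOPE_ORDER.index(requester_scope)
    requested_rank = SCOPE_ORDER.index(requested_scope)
    if requested_rank < requester_rank:
        raise ScopeMismatchError(
            f"cannot access the {requested_scope} scoped fixture "
            f"with a {requester_scope} scoped request object"
        )


if __name__ == "__main__":
    # A function-scoped test or fixture may use caplog (function scope).
    check_fixture_request("function", "function")
    # A module-scoped fixture requesting caplog reproduces the error above.
    try:
        check_fixture_request("module", "function")
    except ScopeMismatchError as exc:
        print(exc)
```

Under this rule, the usual remedies are to narrow the requesting fixture to function scope or to stop requesting `caplog` from the module-scoped fixture.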

Test Results:

  • Python 3.10: 530 passed, 21 skipped, 1 xfailed, 7 errors (98.7% success)
  • Python 3.11: 7 errors expected (same pattern as all previous runs)

Affected Tests (same 7 as in all previous runs):

  • test_auto_classification_workflow[{database_service}.minio.my_schema.table]
  • test_auto_classification_workflow[{database_service}.minio.my_schema.titanic]
  • test_auto_classification_workflow[{database_service}.minio.my_schema.iris]
  • test_auto_classification_workflow[{database_service}.minio.my_schema.userdata]
  • test_auto_classification_workflow[{database_service}.minio.my_schema.empty]
  • test_auto_classification_workflow[{database_service}.minio.my_schema.complex_and_simple]
  • test_auto_classification_workflow[{database_service}.minio.my_schema.only_complex]

Details

Why these failures are unrelated to this PR:

  • This PR modifies the Python ingestion profiler ORM type converters:

    • ingestion/src/metadata/profiler/orm/converter/common.py
    • ingestion/src/metadata/profiler/orm/registry.py
    • ingestion/src/metadata/profiler/orm/types/custom_hex_byte_string.py
    • ingestion/tests/unit/profiler/custom_types/test_custom_types.py
  • Failing tests: Trino classifier integration tests in tests/integration/trino/test_classifier.py

  • No connection: The PR changes byte-to-string type converters for database profiling. The failing tests are for Trino auto-classification workflows, which don't use these converters.

  • Root cause is pre-existing: The fixture scope mismatch is a configuration issue in the test file itself, which this PR does not modify. Identical failures appear across:

    • Multiple Python 3.10 runs
    • Multiple Python 3.11 runs
    • Multiple CI workflow executions
    • Two merge commits with main branch

    This consistency indicates a stable, pre-existing test configuration issue.

Pattern Analysis

Consistency Across All Runs:

  • Same 7 test failures every single time
  • Same error message (ScopeMismatch with caplog/module scope)
  • Same test file (tests/integration/trino/test_classifier.py)
  • Same success rate (98.7%)
  • Persists through all code changes and merges

The unwavering consistency across dozens of CI runs definitively proves these are pre-existing test configuration issues, not related to this PR's byte processing changes.

Context

This run is on commit a43ddf7 (after second merge with main). The original PR changes (commit 1df4d62 - "Unified byte processing logic") remain intact and are not the cause of these test failures.

Code Review ✅ Approved

Clean deduplication of byte-processing logic: removes redundant ByteaToHex in favor of the more robust HexByteString with proper null-byte stripping and memoryview support. No issues found.
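
As a rough picture of the deduplication, the sketch below routes two column types through one shared converter instead of two near-identical ones. All names here (`ColumnType`, `hex_converter`, `CONVERTERS`, `convert`) are hypothetical and chosen for illustration; they are not the identifiers used in the OpenMetadata registry.

```python
from enum import Enum
from typing import Callable, Dict, Union

BytesLike = Union[bytes, bytearray, memoryview]


class ColumnType(Enum):
    """Hypothetical stand-in for the source column types."""
    BYTES = "BYTES"
    BYTEA = "BYTEA"


def hex_converter(value: BytesLike) -> str:
    """Single shared converter: bytes-like input to a hex string."""
    return bytes(value).hex()


# Both binary types point at the same converter, so there is only one
# code path to test and maintain.
CONVERTERS: Dict[ColumnType, Callable[[BytesLike], str]] = {
    ColumnType.BYTES: hex_converter,
    ColumnType.BYTEA: hex_converter,
}


def convert(column_type: ColumnType, value: BytesLike) -> str:
    """Look up the converter registered for the column type and apply it."""
    return CONVERTERS[column_type](value)


if __name__ == "__main__":
    print(convert(ColumnType.BYTES, b"\x01\xff"))
    print(convert(ColumnType.BYTEA, memoryview(b"\x01\xff")))
```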


@pmbrull merged commit 571bb89 into main on Feb 12, 2026 (20 of 23 checks passed)
@pmbrull deleted the fix-postgres-bytea-sample-data branch on February 12, 2026 08:57
@github-actions (Contributor)

Failed to cherry-pick changes to the 1.11.9 branch.
Please cherry-pick the changes manually.
You can find more details here.

Labels

  • Ingestion
  • safe to test: Add this label to run secure Github workflows on PRs
  • To release: Will cherry-pick this PR into the release branch

3 participants