feat(profiler): enable complex data type profiling (#15627) by david-mamani · Pull Request #27529 · open-metadata/OpenMetadata

david-mamani · 2026-04-19T20:44:43Z

Summary

Resolves #15627
Enable the profiler to compute a safe subset of metrics (nullCount, valuesCount) for complex data types (JSON, arrays, geo, structs) that were previously fully excluded from profiling.

Problem

OpenMetadata's profiler maintained a NOT_COMPUTE set that completely excluded complex data types from profiling. This meant no metrics at all were collected for columns with types like JSON, ARRAY, GEOMETRY, STRUCT, MAP, etc. - even universal metrics like null counts that work on any column type.

Solution

Architecture: Two-tier type classification

1. NOT_COMPUTE (reduced) - Contains only truly unprofileable types: NullType, UndeterminedType
2. COMPLEX_TYPES (new) - Contains complex types that receive a restricted, safe subset of metrics

Changes

File	Change
orm/registry.py	Split NOT_COMPUTE into NOT_COMPUTE + COMPLEX_TYPES; added COMPLEX_TYPE_METRICS set and is_complex_type() helper
processor/core.py	Updated _prepare_column_metrics() to route complex columns to limited metrics; updated compute_metrics() to skip composed/hybrid metrics only for fully excluded types
metrics/static/unique_count.py	Extended guard to exclude COMPLEX_TYPES (GROUP BY/subqueries unsafe on complex data)
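
The two-tier split listed in the table can be sketched roughly as follows. This is a minimal illustration, not the contents of orm/registry.py: the set members are placeholder strings taken from the PR description, the signature of is_complex_type() is assumed, and metrics_for() is a hypothetical helper that does not exist in the PR.

```python
# Illustrative stand-ins only; the real sets live in orm/registry.py
# and may be keyed differently (for example by type class names).
NOT_COMPUTE = {"NullType", "UndeterminedType"}
COMPLEX_TYPES = {"JSON", "ARRAY", "GEOMETRY", "STRUCT", "MAP"}
COMPLEX_TYPE_METRICS = {"nullCount", "valuesCount"}


def is_complex_type(type_name: str) -> bool:
    """Return True when a type should get only the restricted metric subset.

    The signature is an assumption; the PR only names the helper.
    """
    return type_name in COMPLEX_TYPES


def metrics_for(type_name: str, requested: set) -> set:
    """Hypothetical helper: choose which requested metrics may run."""
    if type_name in NOT_COMPUTE:
        return set()  # fully excluded, nothing is computed
    if is_complex_type(type_name):
        return requested & COMPLEX_TYPE_METRICS  # restricted tier
    return set(requested)  # regular types keep everything
```

Under these assumptions, a JSON column that requests nullCount, mean and uniqueCount would be left with nullCount only, while an Integer column keeps all three.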

Safe metrics for complex types

- nullCount - Works universally via COUNT(*) - COUNT(col)
- valuesCount - Works universally via COUNT(col)
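
The two counting expressions can be tried on a small in-memory table. This is a sketch using Python's built-in sqlite3 module; the table and column names are made up, and SQLite stores the JSON payloads as TEXT, so it only demonstrates the COUNT arithmetic, not how a particular warehouse handles native JSON or ARRAY columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, '{"a": 1}'), (2, None), (3, "[1, 2, 3]"), (4, None), (5, "{}")],
)

# COUNT(col) ignores NULLs, COUNT(*) counts every row.
values_count, null_count = conn.execute(
    "SELECT COUNT(payload), COUNT(*) - COUNT(payload) FROM events"
).fetchone()

print(values_count, null_count)  # prints: 3 2
```

Neither expression inspects the column's contents, which is why the description treats both as safe for any column type.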

Test coverage

48 test assertions covering:
- NOT_COMPUTE set content (7 tests)
- COMPLEX_TYPES set content (11 tests)
- is_complex_type() helper function (9 tests)
- COMPLEX_TYPE_METRICS content (10 tests)
- Set isolation / no overlap (11 tests)

Risk assessment

- Zero risk to existing types: Integer, String, Float, Boolean, Date columns are completely unaffected
- Minimal scope: Only nullCount and valuesCount are enabled for complex types - no numeric, orderable, or aggregation metrics
- Backward compatible: Types previously in NOT_COMPUTE that move to COMPLEX_TYPES will now get MORE metrics, not fewer

github-actions · 2026-04-19T20:45:11Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Split NOT_COMPUTE into NOT_COMPUTE (truly unprofileable: NullType, UndeterminedType) and COMPLEX_TYPES (JSON, arrays, geo, structs, etc.) that receive a restricted safe subset of metrics (nullCount, valuesCount).

Changes:
- registry.py: new COMPLEX_TYPES set, COMPLEX_TYPE_METRICS, is_complex_type()
- core.py: _prepare_column_metrics() now routes complex columns to limited metrics
- unique_count.py: guard extended to exclude COMPLEX_TYPES
- Added 48 unit tests validating the registry refactoring

Copilot

Pull request overview

Enables profiler support for complex column data types by allowing a limited, “safe” subset of metrics (nullCount, valuesCount) to be computed for types that were previously fully excluded.

Changes:

Split the previous “do not compute” type bucket into truly-unprofileable types vs. complex types eligible for limited metrics.
Updated profiler processor column-metric preparation to route complex columns to COMPLEX_TYPE_METRICS.
Updated UniqueCount to skip complex types.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
ingestion/src/metadata/profiler/orm/registry.py	Introduces `COMPLEX_TYPES`, `COMPLEX_TYPE_METRICS`, and `is_complex_type()`; narrows `NOT_COMPUTE`.
ingestion/src/metadata/profiler/processor/core.py	Routes complex columns to limited static metrics and skips composed/hybrid only for fully excluded types.
ingestion/src/metadata/profiler/metrics/static/unique_count.py	Prevents uniqueCount computation on complex types.
ingestion/tests/unit/observability/profiler/test_complex_type_profiling.py	Adds unit coverage for the registry split and safe metrics set.
ingestion/tests/unit/observability/profiler/run_complex_type_tests.py	Adds a standalone runner for manual/local checks.

Copilot · 2026-04-19T20:52:20Z

+class DataType(str, Enum):
+    INT="INT"; BIGINT="BIGINT"; SMALLINT="SMALLINT"; TINYINT="TINYINT"
+    NUMBER="NUMBER"; NUMERIC="NUMERIC"; DECIMAL="DECIMAL"
+    DOUBLE="DOUBLE"; FLOAT="FLOAT"; JSON="JSON"; ARRAY="ARRAY"
+    MAP="MAP"; STRUCT="STRUCT"; UNION="UNION"; SET="SET"
+    GEOGRAPHY="GEOGRAPHY"; GEOMETRY="GEOMETRY"; ENUM="ENUM"
+    STRING="STRING"; TEXT="TEXT"; CHAR="CHAR"; VARCHAR="VARCHAR"
+    BOOLEAN="BOOLEAN"; DATE="DATE"; DATETIME="DATETIME"
+    TIMESTAMP="TIMESTAMP"; TIME="TIME"; BINARY="BINARY"
+    VARBINARY="VARBINARY"; BLOB="BLOB"; BYTEA="BYTEA"
+    MEDIUMTEXT="MEDIUMTEXT"; NULL="NULL"; SUPER="SUPER"
+    INTERVAL="INTERVAL"; XML="XML"; FIXED="FIXED"
+    LONG="LONG"; BYTES="BYTES"
+
+class MetricType(str, Enum):
+    valuesCount="valuesCount"; nullCount="nullCount"
+    nullProportion="nullProportion"; uniqueCount="uniqueCount"
+    distinctCount="distinctCount"; distinctProportion="distinctProportion"
+    min="min"; max="max"; mean="mean"; sum="sum"; stddev="stddev"
+    median="median"; firstQuartile="firstQuartile"
+    thirdQuartile="thirdQuartile"
+    interQuartileRange="interQuartileRange"
+    nonParametricSkew="nonParametricSkew"
+    columnCount="columnCount"; columnNames="columnNames"
+    rowCount="rowCount"; histogram="histogram"
+    uniqueProportion="uniqueProportion"
+    duplicateCount="duplicateCount"
+    nullMissingCount="nullMissingCount"; system="system"
+


This file is not formatted to the repo’s Python formatting standard (Black/pycln/isort run on ingestion/ via make py_format_check). Examples include multiple statements per line and compact Enum definitions; black --check will fail on this file as-is. Please run the formatter and commit the formatted output, or drop this script from the repo if it’s only for local ad-hoc runs.

Copilot · 2026-04-19T20:52:21Z

+Standalone test runner for complex type profiling changes.
+
+Uses a sys.meta_path finder to intercept ALL metadata.generated.*
+imports, returning permissive stubs. The 'metadata' package itself is
+replaced with a bare module so its __init__.py never runs.
+
+Usage: python run_complex_type_tests.py
+See: https://github.com/open-metadata/OpenMetadata/issues/15627
+"""
+
+import sys
+import os
+import logging
+import importlib
+from types import ModuleType
+from enum import Enum
+
+# ── Prevent script dir from shadowing real packages ──────────────────
+_script_dir = os.path.dirname(os.path.abspath(__file__))
+sys.path = [p for p in sys.path if os.path.abspath(p) != _script_dir]
+
+# ── Ensure ingestion/src is on path ──────────────────────────────────
+_src_dir = os.path.normpath(os.path.join(_script_dir, "..", "..", "..", "..", "src"))
+if _src_dir not in sys.path:
+    sys.path.insert(0, _src_dir)
+
+
+# ════════════════════════════════════════════════════════════════════════
+# 1) Install 'metadata' as a bare package (skip its __init__.py)
+# ════════════════════════════════════════════════════════════════════════
+_meta_pkg = ModuleType("metadata")
+_meta_pkg.__path__ = [os.path.join(_src_dir, "metadata")]
+_meta_pkg.__package__ = "metadata"
+sys.modules["metadata"] = _meta_pkg
+
+
+# ════════════════════════════════════════════════════════════════════════
+# 2) Meta-path finder: auto-stub ALL metadata.generated.* imports
+# ════════════════════════════════════════════════════════════════════════
+_null_logger = logging.getLogger("stub")
+
+
+class _StubModule(ModuleType):
+    """A stub module whose attributes are either explicitly set or
+    fall back to a dummy class that has .__name__, is iterable, etc."""
+
+    class _Dummy:
+        __name__ = "_Dummy"
+        def __init_subclass__(cls, **kw): pass
+        def __init__(self, *a, **kw): pass
+        def __call__(self, *a, **kw): return self
+        def __iter__(self): return iter([])
+        def __bool__(self): return False
+        def __str__(self): return "_Dummy"
+        def items(self): return []
+        def values(self): return []
+        def keys(self): return []
+
+    def __getattr__(self, name):
+        # Return the class (not an instance) so it can be used as a
+        # type annotation, base class, or called to construct instances.
+        return _StubModule._Dummy
+
+
+class _GeneratedFinder:
+    """Intercepts `import metadata.generated.*` and returns stubs."""
+    PREFIX = "metadata.generated"
+
+    def find_module(self, fullname, path=None):
+        if fullname == self.PREFIX or fullname.startswith(self.PREFIX + "."):
+            return self
+        return None
+
+    def load_module(self, fullname):
+        if fullname in sys.modules:
+            return sys.modules[fullname]
+        mod = _StubModule(fullname)
+        mod.__path__ = []
+        mod.__package__ = fullname
+        mod.__loader__ = self
+        sys.modules[fullname] = mod
+        return mod
+
+
+sys.meta_path.insert(0, _GeneratedFinder())
+
+
+# ════════════════════════════════════════════════════════════════════════


This script lives under ingestion/tests/unit/ but won’t be picked up by pytest (it doesn’t match the test_*.py pattern) and it duplicates what the proper unit test file already validates. To avoid dead/duplicated test logic, consider removing it or converting its assertions into a real pytest/unittest test module.

Suggested change

Deprecated standalone test runner.

This file intentionally no longer contains executable test logic.
The canonical assertions for complex type profiling belong in the
proper pytest test module that is already collected by the unit test
suite, which avoids keeping duplicated or non-discoverable tests
under ``ingestion/tests/unit``.
"""

# ════════════════════════════════════════════════════════════════════════

Copilot · 2026-04-19T20:52:21Z

@@ -443,6 +458,28 @@ def _prepare_column_metrics(self) -> List:
                    )
                )

+        # Add safe metrics for complex type columns
+        for column in complex_columns:
+            safe_metrics = [
+                metric
+                for metric in self.metric_filter.get_column_metrics(
+                    StaticMetric,
+                    column,
+                    self.profiler_interface.table_entity.serviceType,
+                )
+                if not metric.is_window_metric()
+                and metric.name() in COMPLEX_TYPE_METRICS
+            ]
+            if safe_metrics:
+                column_metrics_for_thread_pool.append(
+                    ThreadPoolMetrics(
+                        metrics=safe_metrics,
+                        metric_type=MetricTypes.Static,
+                        column=column,
+                        table=self.table,
+                    )
+                )
+


Core behavior change (routing complex columns to COMPLEX_TYPE_METRICS) isn’t exercised by tests. Add/extend an existing profiler unit test to assert that ARRAY/JSON/GEOGRAPHY columns only schedule nullCount & valuesCount and do not schedule query/window metrics (and that regular columns remain unaffected).
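
One possible shape for such a test, sketched without the profiler itself: the filter below mirrors the list comprehension from the diff, while FakeMetric, make_metric() and select_safe_metrics() are hypothetical stand-ins. A real test would build a Profiler with a complex column and inspect what _prepare_column_metrics() returns.

```python
COMPLEX_TYPE_METRICS = {"nullCount", "valuesCount"}  # names from the PR


class FakeMetric:
    """Hypothetical stand-in exposing the two methods the filter relies on."""

    metric_name = ""
    window = False

    @classmethod
    def name(cls) -> str:
        return cls.metric_name

    @classmethod
    def is_window_metric(cls) -> bool:
        return cls.window


def make_metric(metric_name: str, window: bool = False) -> type:
    """Build a fake metric class with the given name and window flag."""
    return type(
        metric_name, (FakeMetric,), {"metric_name": metric_name, "window": window}
    )


def select_safe_metrics(candidates):
    """Same condition as the diff: no window metrics, allow-listed names only."""
    return [
        metric
        for metric in candidates
        if not metric.is_window_metric() and metric.name() in COMPLEX_TYPE_METRICS
    ]


def test_complex_column_gets_only_safe_metrics():
    candidates = [
        make_metric("nullCount"),
        make_metric("valuesCount"),
        make_metric("uniqueCount"),
        make_metric("median", window=True),
    ]
    selected = [metric.name() for metric in select_safe_metrics(candidates)]
    assert selected == ["nullCount", "valuesCount"]
```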

Copilot · 2026-04-19T20:52:21Z

@@ -135,8 +143,16 @@ class Dialects(metaclass=EnumAdapter):
    CustomTypes.ARRAY.value.__name__,
    CustomTypes.SQADATETIMERANGE.value.__name__,
    DataType.XML.value,
-    CustomTypes.UNDETERMINED.value.__name__,
 }


COMPLEX_TYPES is missing the Geo type name "GEOGRAPHY" (e.g., created via create_sqlalchemy_type("GEOGRAPHY") in multiple dialects), so GEOGRAPHY columns will be treated as regular types and may still attempt unsafe metrics (e.g., DISTINCT/GROUP BY). Add DataType.GEOGRAPHY.value (or an equivalent "GEOGRAPHY" entry) to COMPLEX_TYPES to ensure they only get COMPLEX_TYPE_METRICS.
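
The gap can be illustrated with a name-based lookup. The sketch assumes COMPLEX_TYPES is checked against the column type's class name, as the NOT_COMPUTE check in compute_metrics() does; make_type() is a hypothetical stand-in for a factory such as create_sqlalchemy_type, and the set contents are an illustrative subset.

```python
def make_type(name: str) -> type:
    """Hypothetical stand-in for a factory that builds a type class by name."""
    return type(name, (object,), {})


# Illustrative subset that, like the reviewed set, has no "GEOGRAPHY" entry.
COMPLEX_TYPES = {"GEOMETRY", "SQASGeography"}

geography_type = make_type("GEOGRAPHY")()
type_name = geography_type.__class__.__name__

# Without the entry, the column falls through to the regular-type path.
treated_as_complex_before = type_name in COMPLEX_TYPES

COMPLEX_TYPES.add("GEOGRAPHY")
treated_as_complex_after = type_name in COMPLEX_TYPES
```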

Copilot · 2026-04-19T20:52:22Z

+See: https://github.com/open-metadata/OpenMetadata/issues/15627
+"""
+
+import importlib


This test file has an unused import (importlib) which will be flagged by pycln during make py_format_check (it runs on the whole ingestion/ directory). Remove the unused import to keep formatting checks passing.

Suggested change (remove this line)

import importlib

Copilot · 2026-04-19T20:52:22Z

+    def test_sqa_geography_in_complex_types(self):
+        """SQASGeography should be in COMPLEX_TYPES."""
+        self.assertIn(SQASGeography.__name__, COMPLEX_TYPES)
+
+    def test_geometry_in_complex_types(self):
+        """GEOMETRY should be in COMPLEX_TYPES."""
+        self.assertIn("GEOMETRY", COMPLEX_TYPES)
+
+    def test_xml_in_complex_types(self):
+        """XML should be in COMPLEX_TYPES."""
+        self.assertIn("XML", COMPLEX_TYPES)
+
+    def test_datetimerange_in_complex_types(self):
+        """CustomDateTimeRange should be in COMPLEX_TYPES."""
+        self.assertIn(CustomDateTimeRange.__name__, COMPLEX_TYPES)
+


The new registry behavior for complex geo types isn’t covered for the common SQLAlchemy type class name "GEOGRAPHY" (created via create_sqlalchemy_type in Snowflake/Redshift/BigQuery/etc.). Add assertions here that "GEOGRAPHY" is included in COMPLEX_TYPES (and not in NOT_COMPUTE) so the test suite catches regressions for geo columns.

github-actions · 2026-04-20T01:56:14Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

github-actions · 2026-04-20T03:59:19Z

🔴 Playwright Results — 1 failure(s), 17 flaky

✅ 3669 passed · ❌ 1 failed · 🟡 17 flaky · ⏭️ 89 skipped

Shard	Passed	Failed	Flaky	Skipped
🔴 Shard 1	478	1	2	4
🟡 Shard 2	652	0	1	7
🟡 Shard 3	654	0	5	1
🟡 Shard 4	631	0	3	27
🟡 Shard 5	610	0	1	42
🟡 Shard 6	644	0	5	8

Genuine Failures (failed on all attempts)

❌ Pages/SearchIndexApplication.spec.ts › Search Index Application (shard 1)

Error: expect(received).toEqual(expected) // deep equality

Expected: StringMatching /success|activeError/g
Received: "failed"

🟡 17 flaky test(s) (passed on retry)

Pages/Customproperties-part1.spec.ts › no duplicate card after update (shard 1, 1 retry)
Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Flow/CustomizeWidgets.spec.ts › Domains Widget (shard 3, 1 retry)
Flow/SchemaTable.spec.ts › schema table test (shard 3, 1 retry)
Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
Pages/Domains.spec.ts › Domain Rbac (shard 4, 1 retry)
Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 4, 1 retry)
Pages/ExplorePageRightPanel.spec.ts › Should allow Data Consumer to view all tabs for topic (shard 5, 1 retry)
Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Container (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

gitar-bot · 2026-04-20T06:40:20Z

Code Review ✅ Approved

Enables complex data type profiling to improve instrumentation depth. No issues found.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Copilot · 2026-04-20T06:44:50Z

+class _GeneratedFinder:
+    """Intercepts `import metadata.generated.*` and returns stubs."""
+
+    PREFIX = "metadata.generated"
+
+    def find_module(self, fullname, path=None):
+        if fullname == self.PREFIX or fullname.startswith(self.PREFIX + "."):
+            return self
+        return None
+
+    def load_module(self, fullname):
+        if fullname in sys.modules:
+            return sys.modules[fullname]
+        mod = _StubModule(fullname)
+        mod.__path__ = []
+        mod.__package__ = fullname
+        mod.__loader__ = self
+        sys.modules[fullname] = mod
+        return mod
+


_GeneratedFinder implements the deprecated find_module/load_module import hook API, which is discouraged and can break with newer Python importlib behavior. If this runner is kept, consider using importlib.abc.MetaPathFinder + importlib.abc.Loader (find_spec/exec_module) instead.

Copilot · 2026-04-20T06:44:50Z

+See: https://github.com/open-metadata/OpenMetadata/issues/15627
+"""
+
+import importlib


importlib is imported but never used in this test module. Please remove the unused import to keep the test clean.

Suggested change (remove this line)

import importlib

Copilot · 2026-04-20T06:44:50Z

+def _bootstrap_generated_stub(src_dir):
+    """Creates minimal stubs for metadata.generated so that the
+    orm.registry module can be imported in environments where
+    the full code-generation pipeline has not been run.
+    """


_bootstrap_generated_stub takes a src_dir parameter but doesn’t use it. Either remove the parameter or use it (e.g., for locating/creating the stub package paths) to avoid dead arguments.

Copilot · 2026-04-20T06:44:51Z

+    # Wire up stub modules in sys.modules
+    for mod_name in [
+        "metadata.generated",
+        "metadata.generated.schema",
+        "metadata.generated.schema.entity",
+        "metadata.generated.schema.entity.data",
+        "metadata.generated.schema.entity.data.table",
+        "metadata.generated.schema.configuration",
+        "metadata.generated.schema.configuration.profilerConfiguration",
+        "metadata.generated.schema.api",
+        "metadata.generated.schema.api.data",
+        "metadata.generated.schema.api.data.createTableProfile",
+        "metadata.generated.schema.entity.services",
+        "metadata.generated.schema.entity.services.databaseService",
+        "metadata.generated.schema.entity.services.connections",
+        "metadata.generated.schema.entity.services.connections.database",
+        "metadata.generated.schema.entity.services.connections.database.sqliteConnection",
+        "metadata.generated.schema.entity.services.connections.metadata",
+        "metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection",
+        "metadata.generated.schema.settings",
+        "metadata.generated.schema.settings.settings",
+        "metadata.generated.schema.tests",
+        "metadata.generated.schema.tests.customMetric",
+        "metadata.generated.schema.type",
+        "metadata.generated.schema.type.basic",
+    ]:
+        if mod_name not in sys.modules:
+            stub = ModuleType(mod_name)
+            sys.modules[mod_name] = stub
+


When the generated-schema import is missing, this test mutates sys.modules by installing large metadata.generated.* stubs and doesn’t restore them afterward. That can leak into other tests in the same run and change their behavior. Consider scoping the stubs to the test (e.g., using a context manager/fixture that cleans up sys.modules entries after import) or skipping the test when metadata.generated is unavailable.
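
One way to scope the stubs is unittest.mock.patch.dict, which restores sys.modules when the block exits. A minimal sketch, where stubbed_modules() and the package names in the usage note are hypothetical:

```python
import sys
from contextlib import contextmanager
from types import ModuleType
from unittest.mock import patch


@contextmanager
def stubbed_modules(names):
    """Temporarily install empty stub modules for names not yet imported."""
    stubs = {name: ModuleType(name) for name in names if name not in sys.modules}
    # patch.dict snapshots sys.modules and restores the snapshot on exit,
    # so the stubs (and anything imported inside the block) are dropped.
    with patch.dict(sys.modules, stubs):
        yield stubs
```

Used as `with stubbed_modules(["hypothetical_pkg", "hypothetical_pkg.generated"]): ...`, the module under test can be imported inside the block without the stubs leaking into later tests. Skipping the test when metadata.generated is unavailable remains the simpler option.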

Copilot · 2026-04-20T06:44:51Z

+        # Add safe metrics for complex type columns
+        for column in complex_columns:
+            safe_metrics = [
+                metric
+                for metric in self.metric_filter.get_column_metrics(
+                    StaticMetric,
+                    column,
+                    self.profiler_interface.table_entity.serviceType,
+                )
+                if not metric.is_window_metric()
+                and metric.name() in COMPLEX_TYPE_METRICS
+            ]
+            if safe_metrics:
+                column_metrics_for_thread_pool.append(
+                    ThreadPoolMetrics(
+                        metrics=safe_metrics,
+                        metric_type=MetricTypes.Static,
+                        column=column,
+                        table=self.table,
+                    )
+                )
+


The new complex-type path in _prepare_column_metrics() (routing complex columns to COMPLEX_TYPE_METRICS) isn’t covered by existing profiler unit tests. Please add a unit test that builds a table with a complex column (e.g., JSON/ARRAY) and asserts that only nullCount/valuesCount are scheduled/computed for it (and that unsafe metrics like uniqueCount/distinctCount are not).

Copilot · 2026-04-20T06:44:51Z

        for column in self.columns:
+            # Skip composed/hybrid metrics for columns that are fully excluded
+            if column.type.__class__.__name__ in NOT_COMPUTE:
+                continue
            self.run_composed_metrics(column)
            self.run_hybrid_metrics(column)


The new logic still runs composed metrics for complex columns (since only NOT_COMPUTE is skipped), which will compute at least nullProportion (and potentially add other composed/hybrid keys as None). This doesn’t match the PR description of restricting complex types to only nullCount/valuesCount. Consider either skipping composed/hybrid metrics for complex types as well, or updating the PR description/implementation to explicitly allow additional derived metrics for complex types.
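
The first option (also skipping derived metrics for complex types) could look roughly like this. The function and its skip_complex flag are hypothetical, and the sets are illustrative stand-ins for the registry sets named in the PR.

```python
NOT_COMPUTE = {"NullType", "UndeterminedType"}  # from the PR description
COMPLEX_TYPES = {"JSON", "ARRAY"}  # illustrative subset


def columns_for_derived_metrics(column_type_names, skip_complex=True):
    """Return the type names for which composed/hybrid metrics would run.

    skip_complex=False reproduces the behavior in the diff, where only
    NOT_COMPUTE types are skipped.
    """
    excluded = (NOT_COMPUTE | COMPLEX_TYPES) if skip_complex else NOT_COMPUTE
    return [name for name in column_type_names if name not in excluded]
```

With skip_complex=True a JSON column would no longer receive nullProportion, which matches the description of limiting complex types to nullCount and valuesCount.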

Copilot · 2026-04-20T06:44:51Z

+"""
+Standalone test runner for complex type profiling changes.
+
+Uses a sys.meta_path finder to intercept ALL metadata.generated.*
+imports, returning permissive stubs. The 'metadata' package itself is
+replaced with a bare module so its __init__.py never runs.
+
+Usage: python run_complex_type_tests.py
+See: https://github.com/open-metadata/OpenMetadata/issues/15627
+"""


This file lacks the standard repository license header present in other ingestion Python files (e.g., ingestion/src/metadata/profiler/metrics/static/unique_count.py:1). Please add the appropriate header or remove the script from the repo if it’s only for local/manual runs.

sonarqubecloud · 2026-04-20T07:41:52Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
75.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot AI review requested due to automatic review settings April 19, 2026 20:44

david-mamani requested a review from a team as a code owner April 19, 2026 20:44

Copilot started reviewing on behalf of david-mamani April 19, 2026 20:45 View session

david-mamani force-pushed the feat/15627-profiler-complex-data-types branch from 047143e to fdd2841 Compare April 19, 2026 20:50

Copilot AI reviewed Apr 19, 2026

View reviewed changes

harshach added the safe to test Add this label to run secure Github workflows on PRs label Apr 20, 2026

harshach temporarily deployed to test April 20, 2026 02:00 — with GitHub Actions Inactive

harshach had a problem deploying to test April 20, 2026 02:00 — with GitHub Actions Failure

Merge branch 'main' into feat/15627-profiler-complex-data-types

9448b3f

david-mamani had a problem deploying to test April 20, 2026 05:18 — with GitHub Actions Error

david-mamani temporarily deployed to test April 20, 2026 05:18 — with GitHub Actions Inactive

david-mamani had a problem deploying to test April 20, 2026 05:18 — with GitHub Actions Error

david-mamani temporarily deployed to test April 20, 2026 05:18 — with GitHub Actions Inactive

david-mamani had a problem deploying to test April 20, 2026 05:18 — with GitHub Actions Error

style: apply black/isort formatting for py-checkstyle CI

a451825

Copilot AI review requested due to automatic review settings April 20, 2026 06:38

Copilot started reviewing on behalf of david-mamani April 20, 2026 06:39 View session

Copilot AI reviewed Apr 20, 2026

View reviewed changes

david-mamani had a problem deploying to test April 20, 2026 06:49 — with GitHub Actions Failure

david-mamani temporarily deployed to test April 20, 2026 06:49 — with GitHub Actions Inactive
