ISSUE #3027 - Better Default by TeddyCr · Pull Request #26158 · open-metadata/OpenMetadata

TeddyCr · 2026-02-27T21:27:46Z

Describe your changes:

This PR implements better default metrics for the profiler, removing system, cardinality and histogram from the default metrics. We are also removing the computeMetrics config as this is no longer relevant for the profiler and are adding in the advance setting the opportunity for users to configure which metrics to compute at the workflow level.

Finally, we renamed the metrics from the Enum registry to aligned with the name of the MetricTypes

Type of change:

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Improvement

I have added tests around the new logic.
For connector/ingestion changes: I updated the documentation.

Copilot

Pull request overview

This PR improves the profiler's default metric configuration by removing computeMetrics (deprecated toggle) and introducing a metrics array field at the workflow level that lets users control which metrics are computed. Histogram and cardinality distribution are removed from defaults to reduce cost. It also renames all Metrics enum keys from UPPER_SNAKE_CASE to camelCase to align with the MetricType schema.

Changes:

Removes computeMetrics from the profiler pipeline schema (JSON, generated TypeScript) and unconditionally invokes compute_metrics() in the profiler core
Adds a metrics array field to databaseServiceProfilerPipeline.json with a curated default set, wiring it through Java (ProfilerWorkflowConfig) and Python (profiler_source.py)
Renames all Metrics enum entries from UPPER_SNAKE_CASE to camelCase and adds schema_metric_type to every metric class; adds valueRank and nullMissingCount to the MetricType enum

Reviewed changes

Copilot reviewed 111 out of 173 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`openmetadata-spec/.../databaseServiceProfilerPipeline.json`	Removes `computeMetrics`, adds `metrics` array with default set, updates `useStatistics` title
`openmetadata-spec/.../profilerConfiguration.json`	Adds `valueRank` and `nullMissingCount` to the `metricType` enum
`openmetadata-ui/.../workflow.ts`, `databaseServiceProfilerPipeline.ts`, `ingestionPipeline.ts`, `createIngestionPipeline.ts`	Generated TypeScript: removes `computeMetrics` field
`openmetadata-ui/.../Services.constant.ts`	Adds `'metrics'` to `ADVANCED_PROPERTIES`
`openmetadata-ui/.../profiler.md`	Removes Compute Metrics section, adds Metrics section
`openmetadata-service/.../ProfilerWorkflowConfig.java`	Builds processor with metrics list from pipeline config
`openmetadata-service/.../IngestionPipelineResourceTest.java`	Adds test for profiler pipeline with metrics
`openmetadata-service/.../WorkflowConfigBuilderTest.java`	Removes `computeMetrics: true` from expected YAML
`ingestion/.../profiler_source.py`	Applies workflow-level metrics override when computing profiler runner
`ingestion/.../default.py`	Removes SYSTEM, HISTOGRAM, CARDINALITY_DISTRIBUTION from default metrics; renames to camelCase
`ingestion/.../core.py`	Removes `computeMetrics` guard, always computes metrics
`ingestion/.../registry.py` (metrics)	Renames all Metrics enum entries from UPPER_SNAKE_CASE to camelCase
All metric classes (static, composed, window, hybrid, system)	Adds `schema_metric_type` attribute to each metric class
All data quality validations	Updates all references from `Metrics.X_Y` to `Metrics.xY` naming
`bootstrap/sql/migrations/.../postDataMigrationSQLScript.sql`	Removes `computeMetrics` from existing profiler pipeline configs
New Python test files	Tests for metric registry, profiler models, entity fetcher, converter, sampling across DBs

ingestion/src/metadata/profiler/processor/default.py

ingestion/src/metadata/profiler/source/database/base/profiler_source.py

ingestion/src/metadata/profiler/metrics/static/column_names.py

Copilot · 2026-02-27T21:31:31Z

ingestion/src/metadata/profiler/metrics/static/null_missing_count.py

    ```
    """

+    schema_metric_type = MetricType.nullCount


The NullMissingCount metric has schema_metric_type = MetricType.nullCount, which is the same value assigned to NullCount. These two distinct metric classes should map to different MetricType values. Looking at profilerConfiguration.json, nullMissingCount was added as a separate enum value, so NullMissingCount.schema_metric_type should be MetricType.nullMissingCount, not MetricType.nullCount.

@copilot open a new pull request to apply changes based on this feedback

openmetadata-ui/src/main/resources/ui/src/constants/Services.constant.ts

...a-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json

bootstrap/sql/migrations/native/2.0.0/postgres/postDataMigrationSQLScript.sql

...a-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json

github-actions · 2026-02-27T21:37:58Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion-base-slim:trivy (debian 12.13)`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (34)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.airlift:aircompressor`	CVE-2025-67721	🚨 HIGH	0.27	2.0.3
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (9)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`apache-airflow`	CVE-2025-68438	🚨 HIGH	3.1.5	3.1.6
`apache-airflow`	CVE-2025-68675	🚨 HIGH	3.1.5	3.1.6, 2.11.1
`cryptography`	CVE-2026-26007	🚨 HIGH	42.0.8	46.0.5
`jaraco.context`	CVE-2026-23949	🚨 HIGH	6.0.1	6.1.0
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/extended_sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/lineage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data_aut.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage_aut.yaml`

No Vulnerabilities Found

github-actions · 2026-02-27T21:38:27Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion:trivy (debian 12.12)`

Vulnerabilities (4)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`libpam-modules`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2
`libpam-modules-bin`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2
`libpam-runtime`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2
`libpam0g`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (34)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.airlift:aircompressor`	CVE-2025-67721	🚨 HIGH	0.27	2.0.3
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (21)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`Werkzeug`	CVE-2024-34069	🚨 HIGH	2.2.3	3.0.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.12.12	3.13.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.13.2	3.13.3
`apache-airflow`	CVE-2025-68438	🚨 HIGH	3.1.5	3.1.6
`apache-airflow`	CVE-2025-68675	🚨 HIGH	3.1.5	3.1.6, 2.11.1
`azure-core`	CVE-2026-21226	🚨 HIGH	1.37.0	1.38.0
`cryptography`	CVE-2026-26007	🚨 HIGH	42.0.8	46.0.5
`google-cloud-aiplatform`	CVE-2026-2472	🚨 HIGH	1.130.0	1.131.0
`google-cloud-aiplatform`	CVE-2026-2473	🚨 HIGH	1.130.0	1.133.0
`jaraco.context`	CVE-2026-23949	🚨 HIGH	5.3.0	6.1.0
`jaraco.context`	CVE-2026-23949	🚨 HIGH	6.0.1	6.1.0
`protobuf`	CVE-2026-0994	🚨 HIGH	4.25.8	6.33.5, 5.29.6
`pyasn1`	CVE-2026-23490	🚨 HIGH	0.6.1	0.6.2
`python-multipart`	CVE-2026-24486	🚨 HIGH	0.0.20	0.0.22
`ray`	CVE-2025-62593	🔥 CRITICAL	2.47.1	2.52.0
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: `usr/bin/docker`

Vulnerabilities (4)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`stdlib`	CVE-2025-68121	🔥 CRITICAL	v1.25.5	1.24.13, 1.25.7, 1.26.0-rc.3
`stdlib`	CVE-2025-61726	🚨 HIGH	v1.25.5	1.24.12, 1.25.6
`stdlib`	CVE-2025-61728	🚨 HIGH	v1.25.5	1.24.12, 1.25.6
`stdlib`	CVE-2025-61730	🚨 HIGH	v1.25.5	1.24.12, 1.25.6

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO`

No Vulnerabilities Found

ingestion/src/metadata/profiler/processor/default.py

ingestion/tests/unit/observability/data_quality/test_metric_registry.py

gitar-bot · 2026-02-27T21:38:43Z

...a-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json

      "title": "Timeout (in sec.)"
+    },
+    "metrics": {
+      "description": "List of metrics to compute. If empty, then all metrics will be computed",


💡 Edge Case: Schema description contradicts new default behavior

The metrics field description says "List of metrics to compute. If empty, then all metrics will be computed", but the field now has a non-empty default list of 20 metrics. This means new pipelines will never have an empty metrics list (they'll get the defaults). Users expecting "empty = all metrics" may be confused since the field description implies they can clear the list to compute all metrics, but that behavior would need verification in the profiler source code. Consider updating the description to reflect the new behavior, e.g., "List of metrics to compute. Defaults to a core set of metrics excluding expensive ones like histogram and cardinalityDistribution."

_{Was this helpful? React with 👍 / 👎}

github-actions · 2026-02-27T22:00:20Z

Jest test Coverage

UI tests summary

Lines	Statements	Branches	Functions
	65.8% (56730/86217)	45.3% (29748/65673)	47.98% (8946/18645)

gitar-bot · 2026-02-28T01:27:16Z

🔍 CI failure analysis for bdcb8f1: Multiple CI failures: (1) generate-types job failing due to infrastructure issues (3 failures), (2) maven-collate-ci jobs failing due to fork PR limitations (2 failures), (3) Python unit tests failing across 4 jobs due to outdated metric names in test fixtures - THIS MUST BE FIXED.

Issue 1: generate-types CI Job - Persistent Infrastructure Failure

The generate-types CI job has failed three times during the repository checkout phase before any code could be analyzed.

Root Cause: Persistent infrastructure failure. The GitHub Actions runner repeatedly fails to fetch branch ISSUE-3027 with exit code 1.

Failure occurrences:

Commit a2ddea2e (job 65199599741): Failed at 21:27:56 UTC (Feb 27)
Commit dfc61cfb (job 65204617941): Failed at 22:18:18 UTC (Feb 27)
Commit bdcb8f12 (job 65217769863): Failed at 01:20:19 UTC (Feb 28)

Issue 2: maven-collate-ci Jobs - Fork PR Limitation

The maven-collate-ci jobs have triggered collate workflows that consistently fail when trying to checkout PR commits in the OpenMetadata submodule.

Root Cause: Git submodules cannot checkout commits from forks - they can only reference commits that exist in the configured submodule repository.

Failure occurrences:

Commit dfc61cfb → collate run 22506067101
Commit bdcb8f12 → collate run 22510247136

This is a known limitation for fork PRs.

Issue 3: Unit Tests - Metric Name Validation Failures ⚠️ REQUIRES FIX

The Python unit tests are failing across 4 jobs due to incompatible metric names in test fixtures.

Affected jobs:

Unit Tests & Static Checks (3.10) - job 65217769805
Unit Tests & Static Checks (3.11) - job 65217769819
py-run-tests (3.10) - job 65217769945
py-run-tests (3.11) - job 65217769935 (ran for 69 minutes, likely same failures)

Root Cause: The PR renamed Metrics enum values from uppercase/snake_case to camelCase:

ROW_COUNT → rowCount
NULL_COUNT → nullCount
COUNT → valuesCount
NULL_MISSING_COUNT → nullMissingCount
COUNT_IN_SET → countInSet

However, test fixtures still reference the old metric names, causing Pydantic validation errors:

pydantic_core._pydantic_core.ValidationError: 3 validation errors for ProfilerProcessorConfig
  Value error, Metric name row_count is not a proper metric name from the Registry
  Value error, Metric name COUNT is not a proper metric name from the Registry
  Value error, Metric name null_count is not a proper metric name from the Registry

Test results:

3 failed tests across all affected jobs
5 error tests (all in ProfilerPartitionUnitTest, failing at setup)
4069 passed, 12 skipped, 1 xfailed

Failed/Error tests:

tests/unit/observability/profiler/test_workflow.py::test_profile_def
tests/unit/observability/profiler/test_workflow.py::test_default_profile_def
tests/unit/observability/profiler/test_profiler_models.py::test_valid_metrics
tests/unit/observability/profiler/test_profiler_partitions.py::ProfilerPartitionUnitTest (5 tests, all failing at setup)

Files requiring updates:

Test configuration fixtures in tests/unit/observability/profiler/ that hardcode metric names
Any test data using old metric names like row_count, null_count, COUNT, etc.

Summary

Infrastructure failures (unrelated to PR changes):

✓ generate-types job - infrastructure issues (can be ignored or retried)
✓ maven-collate-ci jobs - fork PR limitation (expected for fork PRs)

Test failures (RELATED to PR changes - MUST BE FIXED):
3. ❌ Python unit tests (4 jobs) - test fixtures must be updated to use new camelCase metric names to match the renamed Metrics enum

Code Review ⚠️ Changes requested 1 resolved / 4 findings

Large-scale metric naming refactor (SCREAMING_SNAKE → camelCase) is mechanically correct across 180+ files. However, the new workflow-level metrics configuration has an inverted precedence bug where profiler-specific config overrides source config, contradicting the intended behavior. Previous findings about the schema description mismatch and silent system metric removal remain unresolved.

⚠️

Bug: System metric silently removed from default profiler metrics

📄 ingestion/src/metadata/profiler/processor/default.py:35

The system metric (which collects DML operation counts, affected rows, and freshness metrics for Snowflake, BigQuery, Redshift, and Databricks) was removed from get_default_metrics() along with the camelCase rename. The PR description and JSON schema defaults only mention removing histogram and cardinalityDistribution as expensive metrics, but the system metric removal appears to be an accidental omission during the refactor rather than an intentional exclusion.

The system metric is still registered in the Metrics enum, still in the MetricType schema, and all its infrastructure (_prepare_system_metrics, SystemProfile, database-specific implementations) remains in place. Without it in defaults, system profiles will no longer be collected for any database unless explicitly configured.

If this removal is intentional, it should be documented. If not, it should be restored.

Suggested fix

    # Table Metrics
    metrics_registry.rowCount.value,
    add_props(table=table)(metrics_registry.columnCount.value),
    add_props(table=table)(metrics_registry.columnNames.value),
    # System Metrics
    add_props(table=table, ometa_client=ometa_client, db_service=db_service)(
        metrics_registry.system.value
    ),
    # Column Metrics
    metrics_registry.median.value,

⚠️

Bug: Metric precedence inverted: profiler config overrides source config

📄 ingestion/src/metadata/profiler/source/database/base/profiler_source.py:238

The reference_metrics assignment at lines 238-242 gives profiler_config.profiler.metrics precedence over source_metrics (workflow-level config):

reference_metrics = (
    profiler_config.profiler.metrics
    if profiler_config.profiler
    else source_metrics
)

This contradicts the PR's stated intent: "When metrics are specified in the source configuration, they take precedence over profiler-specific config."

Additionally, when profiler_config.profiler exists but has metrics=None (e.g., a ProfilerDef with only a name set), reference_metrics becomes None — discarding the user's explicit source_metrics and falling through to get_default_metrics(). This means a user who configures metrics at the workflow level would have them silently ignored if a profiler definition (without metrics) also exists.

Suggested fix

        reference_metrics = (
            source_metrics
            if source_metrics
            else profiler_config.profiler.metrics
            if profiler_config.profiler
            else None
        )

💡 Edge Case: Schema description contradicts new default behavior

📄 openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json:121

The metrics field description says "List of metrics to compute. If empty, then all metrics will be computed", but the field now has a non-empty default list of 20 metrics. This means new pipelines will never have an empty metrics list (they'll get the defaults). Users expecting "empty = all metrics" may be confused since the field description implies they can clear the list to compute all metrics, but that behavior would need verification in the profiler source code. Consider updating the description to reflect the new behavior, e.g., "List of metrics to compute. Defaults to a core set of metrics excluding expensive ones like histogram and cardinalityDistribution."

✅ 1 resolved

✅ Bug: Missing __init__.py in observability/ breaks test discovery

📄 ingestion/tests/unit/observability/data_quality/test_metric_registry.py:1
Tests were moved from ingestion/tests/unit/data_quality/ and ingestion/tests/unit/profiler/ into the new ingestion/tests/unit/observability/ directory. The child directories have __init__.py files (carried over from the original locations), but the new parent directory ingestion/tests/unit/observability/ is missing its __init__.py.

With pytest's default prepend import mode (no custom importmode is configured in pyproject.toml), Python's package import system requires __init__.py files for proper package discovery. Without it at the observability/ level, test modules that require imports across this package boundary (e.g., conftest fixtures, shared utilities) may fail to resolve correctly. This could cause all moved profiler and data quality unit tests to silently not run or to fail with import errors.

Fix: Add an empty ingestion/tests/unit/observability/__init__.py file.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

`Auto-apply`	`Compact`
`gitar auto-apply:on`	`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

gitar-bot · 2026-02-28T01:28:53Z

ingestion/src/metadata/profiler/source/database/base/profiler_source.py

-            [metrics_registry.get(name) for name in profiler_config.profiler.metrics]
-            if profiler_config.profiler.metrics
-            else get_default_metrics(
+        reference_metrics = (


⚠️ Bug: Metric precedence inverted: profiler config overrides source config

The reference_metrics assignment at lines 238-242 gives profiler_config.profiler.metrics precedence over source_metrics (workflow-level config):

reference_metrics = ( profiler_config.profiler.metrics if profiler_config.profiler else source_metrics )

This contradicts the PR's stated intent: "When metrics are specified in the source configuration, they take precedence over profiler-specific config."

Additionally, when profiler_config.profiler exists but has metrics=None (e.g., a ProfilerDef with only a name set), reference_metrics becomes None — discarding the user's explicit source_metrics and falling through to get_default_metrics(). This means a user who configures metrics at the workflow level would have them silently ignored if a profiler definition (without metrics) also exists.

Suggested fix:

reference_metrics = ( source_metrics if source_metrics else profiler_config.profiler.metrics if profiler_config.profiler else None )

_{Was this helpful? React with 👍 / 👎}

sonarqubecloud · 2026-02-28T01:55:11Z

Quality Gate passed for 'open-metadata-ui'

Issues
101 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

sonarqubecloud · 2026-02-28T02:24:08Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
68.4% Coverage on New Code
0.1% Duplication on New Code

See analysis details on SonarQube Cloud

edg956

LGTM though I'd check Gitar's suggestions in case they make sense, specially the one about the changing priorities

TeddyCr added 6 commits February 26, 2026 10:55

feat(metric default): move profiler and dq to obs folder

a2dd368

feat(metric default): validate metric registry and schema

9f44b8a

feat(metric default): map metric type name to enum name

ba0910b

Merge remote-tracking branch 'upstream/main' into ISSUE-3027

89ae857

feat(metric default): updated default metrics in profiler

7707333

feat(metric default): migration to remove computeMetrics setting

a2ddea2

TeddyCr requested review from a team as code owners February 27, 2026 21:27

TeddyCr added the Backward-Incompatible-Change label Feb 27, 2026

TeddyCr had a problem deploying to test February 27, 2026 21:27 — with GitHub Actions Error

TeddyCr requested a deployment to test February 27, 2026 21:27 — with GitHub Actions In progress

TeddyCr had a problem deploying to test February 27, 2026 21:27 — with GitHub Actions Error

TeddyCr requested a review from Copilot February 27, 2026 21:27

github-actions bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Feb 27, 2026

Copilot started reviewing on behalf of TeddyCr February 27, 2026 21:28 View session

Copilot AI reviewed Feb 27, 2026

View reviewed changes