Fixes #26200: Fix BigQuery string bindings on uniqueCount CTE for binary columns by aniruddhaadak80 · Pull Request #27256 · open-metadata/OpenMetadata

aniruddhaadak80 · 2026-04-10T17:15:52Z

What it does

Fixes a crash in the BigQuery profiler pipeline on BYTES / BINARY columns during the uniqueCount calculation. BigQuery rejects the query with "No matching signature for operator = for argument types: INT64, STRING".

How it does it

The SQLAlchemy BigQuery metric runner carries the original column type (such as STRING) into the COUNTIF(col == 1) check. However, in sqa_profiler_interface.py, BigQuery runs the labeled metric query through a wrapping CTE in which the column holds an INT64 COUNT output. SQLAlchemy therefore compares the INT64 count returned by the subquery against a bound STRING '1'. Using a plain, untyped column(col.name) instead avoids inheriting the original data type and resolves the BigQuery type mismatch.
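A minimal sketch of the type inference described above, assuming the profiled column uses a TypeDecorator-based type. The HexBytes class is hypothetical and is not an OpenMetadata type. By default a TypeDecorator applies its own type to the literal it is compared with, whereas an untyped column() lets SQLAlchemy infer Integer from the Python value:

```python
from sqlalchemy import Column, Integer, LargeBinary, column
from sqlalchemy.types import NullType, TypeDecorator


class HexBytes(TypeDecorator):
    """Hypothetical stand-in for a custom binary column type."""

    impl = LargeBinary
    cache_ok = True


# Typed column: the bound literal 1 inherits the column's own type.
typed_cmp = Column("payload", HexBytes()) == 1

# Untyped column: the literal is typed from its Python value instead.
untyped_col = column("payload")
untyped_cmp = untyped_col == 1

typed_bind_type = typed_cmp.right.type
untyped_bind_type = untyped_cmp.right.type

# The untyped column itself carries no type information.
assert isinstance(untyped_col.type, NullType)
print(type(typed_bind_type).__name__, type(untyped_bind_type).__name__)
```

How a dialect renders the typed bind depends on the dialect; the sketch only shows which type SQLAlchemy attaches to the literal.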

Fixes #26200

github-actions · 2026-04-10T17:16:17Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot

Pull request overview

This PR aims to fix the BigQuery profiler’s uniqueCount calculation on BYTES/BINARY columns by avoiding an incorrect STRING-typed bind in the COUNTIF(... = 1) comparison when the metric is executed via the “Label + wrapping subquery” path.

Changes:

Adjust BigQuery UniqueCount SQLAlchemy expression to use an untyped column reference for the COUNTIF(col == 1) comparison.
Update CollectionDAO (UserDAO) list queries to pass filter.getQueryParams() into the underlying JDBI queries.
Add tag-pruning logic in TableRepository.addDataModel(...) for table and column tags.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| ingestion/src/metadata/profiler/metrics/static/unique_count.py | Uses an untyped `column(self.col.name)` for BigQuery `COUNTIF` to prevent STRING binding when comparing against `1`. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java | Adds `filter.getQueryParams()` binding to UserDAO list methods and propagates it into the DAO query signatures. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java | Prunes AUTOMATED tags not present in incoming DataModel tags for tables and columns before persisting/applying tags. |

Copilot · 2026-04-10T17:20:57Z

+    if (table.getTags() != null) {
+      java.util.List<String> incomingTags = dataModel.getTags() != null 
+          ? dataModel.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList())
+          : java.util.Collections.emptyList();
+      mergedTableTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingTags.contains(t.getTagFQN()));
+    }


Current logic will remove all existing AUTOMATED table tags whenever table.getTags() is non-null and dataModel.getTags() is null/omitted, because incomingTags becomes empty and the removeIf predicate matches all automated tags. If the caller omits tags (vs explicitly sending an empty list), this is an unintended behavior change. Consider only pruning automated tags when dataModel.getTags() is explicitly provided (non-null), or distinguish between null and empty to preserve existing tags when tags aren't part of the update payload.
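The null-versus-empty distinction suggested here could look roughly like the following Python sketch. The Tag class, the function name, and the tag names are hypothetical simplifications for illustration, not the repository's Java code:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical, simplified stand-in for TagLabel."""

    fqn: str
    label_type: str  # "AUTOMATED" or "MANUAL"


def prune_stale_automated(merged: List[Tag], incoming: Optional[List[Tag]]) -> List[Tag]:
    # None means tags were not part of the payload: keep everything.
    if incoming is None:
        return list(merged)
    # An explicit list (even an empty one) is authoritative for automated tags.
    incoming_fqns = {tag.fqn for tag in incoming}
    return [
        tag
        for tag in merged
        if tag.label_type != "AUTOMATED" or tag.fqn in incoming_fqns
    ]


existing = [Tag("PII.Sensitive", "AUTOMATED"), Tag("Tier.Tier1", "MANUAL")]
kept_when_omitted = prune_stale_automated(existing, None)
kept_when_empty = prune_stale_automated(existing, [])
```

With this gating, an omitted tag list leaves the automated tag in place, while an explicitly empty list removes it and keeps only the manual tag.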

Copilot · 2026-04-10T17:20:57Z

+      if (stored.getTags() != null) {
+        java.util.List<String> incomingColTags = modelColumn.getTags() != null 
+            ? modelColumn.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList())
+            : java.util.Collections.emptyList();
+        mergedColumnTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingColTags.contains(t.getTagFQN()));
+      }


Similar to table tags: if modelColumn.getTags() is null/omitted but stored.getTags() is non-null, incomingColTags becomes empty and this removes all existing AUTOMATED column tags. If tags are not being updated for the column, this likely wipes automated tags unintentionally. Consider gating the prune on modelColumn.getTags() != null (or otherwise distinguishing null vs empty).

Copilot · 2026-04-10T17:20:57Z

        return EntityDAO.super.listCount(filter);
      }
      return listCount(
-          getTableName(), mySqlCondition, postgresCondition, team, Relationship.HAS.ordinal());
+          getTableName(), mySqlCondition, postgresCondition, team, Relationship.HAS.ordinal(), filter.getQueryParams());
    }


filter.getQueryParams() likely contains a team key (and potentially other keys) that overlaps with separately bound parameters (e.g., @BindFQN("team") String team). Passing it via @BindMap risks overriding the intended binding for :team in the query, which could break filtering (te.nameHash = :team). Consider passing a copy of queryParams with overlapping keys removed (e.g., remove "team" and any other explicitly-bound names) before binding the map.
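The collision risk can be illustrated with a toy Python model. It assumes map entries are applied after the explicit binds and win on conflict; whether JDBI resolves a conflict this way is not established here, and the function names, the "domain" key, and the values are hypothetical:

```python
from typing import Dict, Set


def bind_params(explicit: Dict[str, str], query_params: Dict[str, str]) -> Dict[str, str]:
    """Toy model: map entries are applied last and overwrite explicit binds."""
    bound = dict(explicit)
    bound.update(query_params)
    return bound


def without_keys(query_params: Dict[str, str], reserved: Set[str]) -> Dict[str, str]:
    # Return a copy so the caller's filter parameters are left untouched.
    return {key: value for key, value in query_params.items() if key not in reserved}


explicit = {"team": "hash-of-team-fqn"}
query_params = {"team": "Engineering", "domain": "Finance"}

collided = bind_params(explicit, query_params)
safe = bind_params(explicit, without_keys(query_params, {"team"}))
```

In this model the unfiltered map replaces the hashed team value, while the filtered copy preserves it and still passes the remaining parameters through.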

Copilot · 2026-04-10T17:20:57Z


    List<TagLabel> mergedTableTags =
        mergeTagsWithIncomingPrecedence(table.getTags(), dataModel.getTags());
+    if (table.getTags() != null) {
+      java.util.List<String> incomingTags = dataModel.getTags() != null 
+          ? dataModel.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList())


The PR description focuses on fixing BigQuery uniqueCount binding, but this file also introduces tag pruning behavior changes for tables/columns. If these changes are intentional, the PR description should cover them; otherwise consider splitting into a separate PR to keep scope and review risk contained.

Copilot · 2026-04-10T17:20:58Z

        if session.get_bind().dialect.name == Dialects.BigQuery:
-            return func.countif(col == 1).label(self.name())
+            # We are querying against the subquery output (which is a COUNT), so the type is numeric.
+            # Use an untyped column to avoid passing the original metric type (like STRING or BYTES) into the COUNTIF comparison.
+            count_col = column(self.col.name)
+            return func.countif(count_col == 1).label(self.name())


Please add a regression test for the BigQuery path to ensure the generated SQL compares the COUNT subquery output as a numeric (e.g., no :STRING binding for the literal 1 when the original column type is STRING/BYTES). There are existing unit tests for UniqueCount, but they don’t appear to cover the BigQuery Label hotfix flow in SQAProfilerInterface.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copilot · 2026-04-11T07:39:54Z

          beforeId,
-          Relationship.HAS.ordinal());
+          Relationship.HAS.ordinal(),
+          filter.getQueryParams());


Same binding-collision issue as listCount: listBefore now passes filter.getQueryParams() into @BindMap params while also binding :team via @BindFQN("team"). If queryParams contains "team", it can override the hashed :team value expected by te.nameHash = :team, causing the filter to stop matching. Remove colliding keys from the map (e.g., "team") before binding, or bind the extra params with a prefix.

Copilot · 2026-04-11T07:39:54Z

+          filter.getQueryParams());
    }



Same binding-collision issue as listCount/listBefore: listAfter passes filter.getQueryParams() into @BindMap params while also binding :team via @BindFQN("team"). If queryParams contains "team", it can override the hashed value used by te.nameHash = :team. Consider removing colliding keys from the map or binding the extra params with a prefix.

Suggested change

-          filter.getQueryParams());
-    }
+          getListAfterQueryParams(filter));
+    }
+    private Map<String, String> getListAfterQueryParams(ListFilter filter) {
+      Map<String, String> queryParams = new HashMap<>(filter.getQueryParams());
+      queryParams.remove("team");
+      return queryParams;
+    }

aniruddhaadak80 · 2026-04-11T13:47:50Z

Hello! I am participating in the WeMakeDevs hackathon. Could a maintainer please assign the safe to test label so the GitHub Actions workflows can validate my fixes? Thank you!

aniruddhaadak80 · 2026-04-13T13:49:15Z

Could someone help trigger the CI by adding the safe to test label here? Much appreciated.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copilot · 2026-04-13T14:15:40Z

+      java.util.List<String> incomingTags = dataModel.getTags() != null 
+          ? dataModel.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList())
+          : java.util.Collections.emptyList();
+      mergedTableTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingTags.contains(t.getTagFQN()));


These newly added lines are not formatted to the repository's standard (Spotless/google-java-format) and rely on fully-qualified names inside the method body, making the code harder to read/maintain. Please apply the standard formatter and use existing imports (e.g., TagLabel::getTagFQN, Collectors/toList) to keep the style consistent and avoid CI formatting failures.

Copilot · 2026-04-13T14:15:41Z

          afterId,
-          Relationship.HAS.ordinal());
+          Relationship.HAS.ordinal(),
+          filter.getQueryParams());


Same binding-collision risk as listCount/listBefore: passing filter.getQueryParams() via @BindMap can re-bind :team and override the @BindFQN("team") hashed value (or cause duplicate binding). Please pass a cleaned params map with conflicting keys removed (at minimum team).

aniruddhaadak80 · 2026-04-13T14:45:55Z

Absolutely! I just refactored TableRepository.java over on this branch to extract the stale-tag removal logic into the simple removeStaleAutomatedTags helper method, replaced the fully qualified class names with standard top-of-file imports, and improved efficiency by leveraging Collectors.toSet() over List.contains. Thanks for pointing that out!

harshach · 2026-04-13T15:01:27Z

@aniruddhaadak80 this shouldn't be touching TableRepository.java. Not sure I follow the changes you are making here.

Copilot

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

Copilot · 2026-04-13T17:16:46Z

@@ -68,7 +68,10 @@ def query(self, sample: Optional[type], session: Optional[Session] = None):

        # TODO: Move all connectors from subquery to COUNT(IF) or COUNTIF for peformance


Typo in TODO comment: “peformance” → “performance”.

Suggested change

-        # TODO: Move all connectors from subquery to COUNT(IF) or COUNTIF for peformance
+        # TODO: Move all connectors from subquery to COUNT(IF) or COUNTIF for performance

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Copilot · 2026-04-13T17:26:24Z

+def test_bigquery_unique_count():
+    # Mocking session binding
+    session_mock = Mock()
+    session_mock.get_bind().dialect.name = Dialects.BigQuery
+
+    unique_count_metric = UniqueCount(Column("test_col"))
+    result = unique_count_metric.fn(session_mock)
+
+    assert "countif" in str(result).lower()


The test only asserts that COUNTIF appears in the rendered SQL, but it doesn’t verify the regression being fixed (i.e., that the comparison is numeric and not bound/typed as a string) and it doesn’t cover the problematic BYTES/BINARY column scenario described in the PR. Consider constructing the metric with a binary column type (e.g., LargeBinary/BINARY) and asserting against the compiled expression (BigQuery dialect) that the = 1 comparison is treated as numeric (e.g., literal 1 or an integer-typed bindparam), so this test fails under the previous buggy behavior.
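A stricter test along these lines might inspect the comparison inside COUNTIF instead of the rendered string. This is a sketch only: bigquery_unique_count_expr is a hypothetical stand-in that mirrors the patched BigQuery branch, not the actual UniqueCount class, and the column name is illustrative:

```python
from sqlalchemy import Column, Integer, LargeBinary, column, func
from sqlalchemy.types import NullType


def bigquery_unique_count_expr(col: Column, label: str = "uniqueCount"):
    """Hypothetical stand-in mirroring the patched BigQuery branch."""
    count_col = column(col.name)  # untyped on purpose
    return func.countif(count_col == 1).label(label)


def test_countif_compares_numerically_for_binary_column():
    labeled = bigquery_unique_count_expr(Column("payload", LargeBinary))
    # The labeled element is the COUNTIF function; its only argument
    # is the comparison expression.
    comparison = list(labeled.element.clauses)[0]

    # The column in the comparison carries no type from the source column.
    assert isinstance(comparison.left.type, NullType)
    # The literal 1 is bound as an integer, not as a binary or string value.
    assert isinstance(comparison.right.type, Integer)
    assert not isinstance(comparison.right.type, LargeBinary)


test_countif_compares_numerically_for_binary_column()
```

Checking the expression tree avoids depending on how a particular dialect renders bind parameters.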

gitar-bot · 2026-04-13T17:40:35Z

Code Review ✅ Approved 3 resolved / 3 findings

Fixes BigQuery string bindings on the uniqueCount CTE for binary columns. The findings about fully qualified class names in TableRepository and incorrect test method calls have all been addressed.

✅ 3 resolved

✅ Quality: Fully qualified class names instead of imports in TableRepository

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java:1411-1414 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java:1434-1437 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java:1410-1415 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java:1433-1438
The new code at lines 1411-1414 and 1434-1437 uses fully qualified class names (java.util.List, java.util.stream.Collectors, java.util.Collections, org.openmetadata.schema.type.TagLabel) instead of using imports at the top of the file. Most of these classes are likely already imported. This hurts readability and is inconsistent with the rest of the codebase.

✅ Bug: Test calls non-existent fn() instead of query()

📄 ingestion/tests/unit/profiler/metrics/test_unique_count.py:15
The test calls unique_count_metric.fn(session_mock) but UniqueCount extends QueryMetric, which defines query() not fn(). The fn() method is only on StaticMetric. This test will raise an AttributeError at runtime.

Additionally, query() requires a sample parameter (first positional arg after self), so the correct call should pass both sample and session.

✅ Quality: Test only checks string output, not the untyped column fix

📄 ingestion/tests/unit/profiler/metrics/test_unique_count.py:17
The test asserts "countif" in str(result).lower() which would pass even with the old buggy code (which also used countif). Consider asserting that the generated SQL does NOT contain the original column type (e.g., STRING or BYTES), or inspect the clause elements to verify an untyped column is used in the comparison. This would make the test actually validate the fix.

aniruddhaadak80 · 2026-04-13T17:44:56Z

All feedback incorporated!

The accidental TableRepository.java and CollectionDAO.java files are completely reverted. This PR now exclusively touches the Python BigQuery unique_count.py fix.
Fixed the test: it now correctly calls query(sample=None, session=session_mock) instead of the non-existent fn() method.
Updated the test to explicitly verify that the generated column is untyped (NullType) inside the countif expression to prevent the BigQuery type mismatch error.

Looks like CI is failing on Verify PR labels. Could someone re-add the safe to test label so the workflows can run? Thanks!

aniruddhaadak80 added 3 commits April 10, 2026 20:08

fix(dbt): remove stale automated tags absent from incoming schema (open-metadata#26737)

8934b5f

Fix UserDAO missing parameter domainEntityType open-metadata#27190

0dd985e

Fixes open-metadata#26200: Fix BigQuery Profiler type mismatch on \uniqueCount\ for binary columns

59bd3e8

aniruddhaadak80 requested a review from a team as a code owner April 10, 2026 17:15

aniruddhaadak80 mentioned this pull request Apr 10, 2026

Binary Column Cause Profiler Agent to Fail in BigQury #26200

test: add backend test file

dde639e

chore: remove placeholder java test breaking CI

831af48

refactor: extract tag removal logic, use basic imports, and utilize Set for fast lookup

b41d598

fix: remove accidentally included Java backend changes meant for other PRs

3431c41

test: add unit test for BigQuery uniqueCount CTE binding

132cb21

aniruddhaadak80 added 2 commits April 13, 2026 22:53

fix(tests): correct text encoding of test_unique_count.py from bad powershell redirect

d282fe8

chore: wipe binary file

d470c69

test: rewrite unique_count bigquery mock as proper python utf8 file

2973b8a

fix(tests): address maintainer bugs on non-existent fn() call and properly validate untyped column typing for BigQuery

40e4344
