Skip to content

Conversation

@suddendust
Copy link
Contributor

@suddendust suddendust commented Dec 10, 2025

Optimize IN/NOT IN Queries with= ANY(ARRAY[])

Overview

Replaced IN () syntax with = ANY(ARRAY[]) for better PostgreSQL query performance across flat and nested JSON collections.

SQL Examples

Flat/Top-level Scalar Fields

Before:

"item" IN (?, ?, ?)                    -- 3 parameters
"price" IN (?, ?, ?)                   -- 3 parameters

After:

"item" = ANY(?)                        -- 1 parameter (text array)
"price" = ANY(?)                       -- 1 parameter (int4 array)

Array Fields (overlap operator)

Before:

"tags" && ARRAY[?, ?, ?]::text[]       -- 3 parameters
"numbers" && ARRAY[?, ?]               -- 2 parameters

After:

"tags" && ?                            -- 1 parameter (text array)
"numbers" && ?                         -- 1 parameter (int4 array)

NOT IN (flat collections)

Before:

"item" NOT IN (?, ?)                   -- 2 parameters

After:

"item" IS NULL OR NOT ("item" = ANY(?)) -- 1 parameter (text array)

JSONB Array Fields

Before:

"props_dot_tags" = ANY(ARRAY[?::jsonb, ?::jsonb])  -- 2 parameters with explicit casts

After:

"props_dot_tags" = ANY(ARRAY[?::jsonb, ?::jsonb])  -- unchanged (JSONB requires explicit casts)

Testing

[x] Add integration tests.
[x] Tested in a live env.

@codecov
Copy link

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 77.90698% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.41%. Comparing base (4e8e31c) to head (6f0874e).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...ld/PostgresInRelationalFilterParserArrayField.java 63.63% 4 Missing and 4 partials ⚠️
...PostgresInRelationalFilterParserJsonPrimitive.java 50.00% 3 Missing and 2 partials ⚠️
...d/PostgresInRelationalFilterParserScalarField.java 68.75% 2 Missing and 3 partials ⚠️
...resContainsRelationalFilterParserNonJsonField.java 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #258      +/-   ##
============================================
- Coverage     80.48%   80.41%   -0.07%     
- Complexity     1330     1337       +7     
============================================
  Files           231      231              
  Lines          6020     6060      +40     
  Branches        537      545       +8     
============================================
+ Hits           4845     4873      +28     
- Misses          808      813       +5     
- Partials        367      374       +7     
Flag Coverage Δ
integration 80.41% <77.90%> (-0.07%) ⬇️
unit 58.21% <47.67%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

/**
* Generates SQL for scalar IN operator (used when array field has been unnested). Example:
* "tags_unnested" IN (?, ?, ?)
* "tags_unnested" = ANY(ARRAY[?, ?, ?])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suddendust Why do we have multiple placeholders in ANY? This should be just one placeholder. If the placeholders are variable query plan will not be cached.

Also, even if we use IN, postgres query planner is automatically converting queries to ANY.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I am actually testing this. Basically, using just one placeholder for the entire array.

@suddendust suddendust changed the title Use ARRAY instead of IN queries for Better Perf in Flat PG Collections [Draft] Use ARRAY instead of IN queries for Better Perf in Flat PG Collections Dec 10, 2025
@suddendust suddendust changed the title [Draft] Use ARRAY instead of IN queries for Better Perf in Flat PG Collections Use ARRAY instead of IN queries for Better Perf in Flat PG Collections Dec 11, 2025
*/
public static String inferSqlTypeFromValue(Object[] values) {
if (values.length == 0) {
return "text";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this cause issues when type of the column is not text?

Copy link
Contributor Author

@suddendust suddendust Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there'd be an issue, found this in texting. Fixed this now.

* @param values Array of values to infer type from
* @return PostgreSQL internal type name: "int4", "float8", "bool", or "text"
*/
public static String inferSqlTypeFromValue(Object[] values) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not very happy about this. How big is the lift to pass type information from entity service?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now just need to start passing in the type info in IdentifierExpression. That'd entail changes in document-store and then the corresponding changes in entity-service.


String placeholders =
if (values.length == 0) {
return "1 = 0";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious, what is this for?

@suresh-prakash suresh-prakash merged commit 35fac78 into hypertrace:main Dec 18, 2025
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants