Skip to content

feat(common-utils): apply direct_read KV items optimization to SQL filters#2405

Merged
kodiakhq[bot] merged 4 commits into
mainfrom
aaron/direct-read-filters
Jun 4, 2026
Merged

feat(common-utils): apply direct_read KV items optimization to SQL filters#2405
kodiakhq[bot] merged 4 commits into
mainfrom
aaron/direct-read-filters

Conversation

@knudtty

@knudtty knudtty commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

SQL type: 'sql' filters on Map columns now get the same has()/hasAny() rewrite that Lucene filters already use when a KV items column with a text(tokenizer=array) skip index exists.

  • Map['k'] = 'v' → has(Items, concat('k', '=', 'v'))
  • Map['k'] IN ('a','b') → hasAny(Items, array(concat(...), ...))

Extracts buildKvItemsLookup from CustomSchemaSQLSerializerV2 into a shared export. Removes debug console.log('AVK filter').

References

This accomplishes the direct_read optimization for filters instead of the approach reverted by #2401

…lters

SQL `type: 'sql'` filters on Map columns now get the same has()/hasAny()
rewrite that Lucene filters already use when a KV items column with a
text(tokenizer=array) skip index exists.

- Map['k'] = 'v' → has(Items, concat('k', '=', 'v'))
- Map['k'] IN ('a','b') → hasAny(Items, array(concat(...), ...))

Extracts buildKvItemsLookup from CustomSchemaSQLSerializerV2 into a
shared export. Removes debug console.log('AVK filter').
@changeset-bot

changeset-bot Bot commented Jun 2, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: 15b35ce

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@hyperdx/common-utils Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel

vercel Bot commented Jun 2, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment Jun 4, 2026 2:52pm
hyperdx-storybook Ready Ready Preview, Comment Jun 4, 2026 2:52pm

Request Review

@github-actions github-actions Bot added the review/tier-3 Standard — full human review required label Jun 2, 2026
@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

🟡 Tier 3 — Standard

Introduces new logic, modifies core functionality, or touches areas with non-trivial risk.

Why this tier:

  • Diff size: 318 production lines changed (Tier 2 max: < 250)

Review process: Full human review — logic, architecture, edge cases.
SLA: First-pass feedback within 1 business day.

Stats
  • Production files changed: 2
  • Production lines changed: 318 (+ 475 in test files, excluded from tier calculation)
  • Branch: aaron/direct-read-filters
  • Author: knudtty

To override this classification, remove the review/tier-3 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

E2E Test Results

All tests passed • 194 passed • 3 skipped • 1314s

Status Count
✅ Passed 194
❌ Failed 0
⚠️ Flaky 5
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Deep Review

No critical issues found. No P0/P1 ship-blockers: the change is well-tested, falls back safely to valid (if unoptimized) SQL on any parse failure, and reaches parity with the existing Lucene rewrite. The recommendations below center on one new broad round-trip path, escaping/encoding edges, and release hygiene.

🟡 P2 -- recommended

  • packages/common-utils/src/core/renderChartConfig.ts:346 -- rewriteSqlFilterWithKvItems only early-returns when the lookup is empty, so when any KV-items column exists every type:'sql' filter is re-serialized through astify/sqlify even when it contains no map subscript; since direct_read columns now ship by default this round-trips unrelated ClickHouse-specific filter syntax through the Postgresql dialect, risking silent mis-serialization.
    • Fix: Track whether tryOptimize actually rewrote a node and return the original condition string verbatim when nothing matched.
    • adversarial, performance, testing
  • packages/common-utils/src/core/renderChartConfig.ts:420 -- SqlString.format emits MySQL-style backslash escaping for keys/values, but the replacement is reparsed and re-emitted via the Postgresql dialect, so values containing a single quote or backslash can either bail out of the rewrite or be silently altered.
    • Fix: Escape values for the actual target dialect consistently and add coverage for quote/backslash values.
    • security, testing, adversarial
  • packages/common-utils/src/core/renderChartConfig.ts:424 -- Map['a'] = 'b=c' rewrites to has(items, concat('a','=','b=c')), which also matches a row whose key is a=b and value is c, returning rows the exact map subscript would not (a value/key containing the separator char collides).
    • Fix: Bail on the rewrite when the map key or any value contains the configured separator string.
    • adversarial
  • .changeset/many-comics-rule.md:2 -- the changeset declares a patch bump, but the feature was hand-written into the already-versioned ## 0.20.0 Minor Changes section of CHANGELOG.md (package is already at 0.20.0), so changeset version will emit a separate 0.20.1 section and duplicate the entry.
    • Fix: Remove the manual CHANGELOG.md edit and set the bump type to match intent (minor for a feature) so the changelog is generated consistently.
    • project-standards
  • packages/common-utils/src/__tests__/rewriteSqlFilterWithKvItems.test.ts:93 -- pass-through cases (no subscript, numeric/column RHS, map not in lookup) assert with toContain/not.toContain, which would not catch sqlify round-trip mutations of an otherwise-unchanged filter.
    • Fix: Assert toBe(condition) for non-rewritten inputs and add cases for quote/backslash and separator-char values.
    • testing, maintainability, security
  • packages/common-utils/src/queryParser.ts:1093 -- the extracted buildKvItemsLookup has no direct unit test; its ALIAS-vs-MATERIALIZED version gating and array text-index matching are exercised only indirectly through CustomSchemaSQLSerializerV2.
    • Fix: Add direct unit tests with a mock Metadata covering the version gate, the strategy loop, and index matching.
    • kieran-typescript, testing
  • packages/common-utils/src/__tests__/renderChartConfig.test.ts:987 -- the new renderWhere guards (subquery CTE, missing databaseName/tableName) are not asserted for the KV-items path, so a regression that fetches metadata or rewrites under a CTE would go uncaught.
    • Fix: Add a test with a subquery CTE plus a map-subscript SQL filter asserting no rewrite, and spy that the metadata fetches are skipped when tableName is empty.
    • testing
🔵 P3 nitpicks (8)
  • packages/common-utils/src/core/renderChartConfig.ts:470 -- parser.sqlify(ast).slice(prefix.length) silently corrupts the output if sqlify normalizes the synthetic prefix differently from the literal string, and a wrong slice would not throw.
    • Fix: Assert the output starts with the known prefix (or split on WHERE) and fall through to the catch otherwise.
    • reliability, kieran-typescript
  • packages/common-utils/src/core/renderChartConfig.ts:1112 -- the rewrite is applied only to filters[] of type:'sql'; the top-level where and select.aggCondition SQL clauses are not rewritten, unlike the Lucene path which optimizes all three (performance asymmetry only, results unaffected).
    • Fix: Route the SQL where/aggCondition strings through the same rewrite if full parity is intended.
    • correctness
  • packages/common-utils/src/core/renderChartConfig.ts:452 -- traverse only descends into binary_expr/expr_list, so a subscript under NOT (...) (unary_expr) is left unoptimized (correct, just not accelerated).
    • Fix: Optionally recurse into unary_expr with the same empty-value bail logic.
    • correctness
  • packages/common-utils/src/core/renderChartConfig.ts:372 -- map-subscript fields are read via bracket access (left['column']?.expr?.value, left['array_index']), bypassing the type checker into implicit any.
    • Fix: Define a local augmented ColumnRef shape (as fastifySQL does) and narrow once.
    • kieran-typescript
  • packages/common-utils/src/core/renderChartConfig.ts:430 -- the hasAny(...) replacement is built by interleaving template literals with nested SqlString.format calls, making the quoting structure hard to audit.
    • Fix: Extract a named buildHasExpr/buildHasAnyExpr helper.
    • kieran-typescript, maintainability
  • packages/common-utils/src/core/renderChartConfig.ts:424 -- the has()/hasAny() emission duplicates the equivalent emission in the Lucene serializer in queryParser.ts instead of sharing it.
    • Fix: Extract a shared formatter alongside buildKvItemsLookup and call it from both paths.
    • maintainability
  • packages/common-utils/src/queryParser.ts:1141 -- the inner .catch(() => []) on getSkipIndices is redundant given the function's outer try/catch and subtly changes partial-failure behavior (partial lookup vs empty Map).
    • Fix: Remove the inner catch or document the intended partial-failure semantics.
    • reliability
  • packages/common-utils/src/queryParser.ts:1215 -- the private one-line buildKvItemsLookup wrapper only forwards to the shared function and adds indirection.
    • Fix: Inline the call in the constructor.
    • maintainability

Reviewers (9): correctness, security, adversarial, performance, reliability, kieran-typescript, testing, maintainability, project-standards.

Testing gaps:

  • No toBe pass-through assertion guarding the sqlify round-trip of non-rewritten SQL filters; no quote/backslash/separator-char value coverage.
  • buildKvItemsLookup is tested only indirectly (no direct unit test of version gating / index matching / metadata-failure fallback).
  • Subquery-CTE and empty databaseName/tableName guards for the KV-items path are not asserted.

Note on a dropped finding: Two reviewers flagged a "duplicate getSkipIndices RPC" (P1) from the extraction. Verified against MetadataCache.getOrFetch (packages/common-utils/src/core/metadata.ts:49), which dedups both cached values and in-flight pending promises by key — the extracted buildKvItemsLookup reuses the same fetch, so there is no extra network round-trip. Dropped as a non-issue.

@knudtty knudtty requested review from a team and teeohhem and removed request for a team June 3, 2026 15:42
Comment thread packages/common-utils/src/queryParser.ts Outdated
Comment thread packages/common-utils/src/core/renderChartConfig.ts Outdated
Comment thread packages/common-utils/src/core/renderChartConfig.ts Outdated

@wrn14897 wrn14897 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kodiakhq kodiakhq Bot merged commit 747352f into main Jun 4, 2026
19 checks passed
@kodiakhq kodiakhq Bot deleted the aaron/direct-read-filters branch June 4, 2026 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge review/tier-3 Standard — full human review required

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants