perf: Improve getKeyValues query performance for JSON keys #1284

pulpdrew · 2025-10-21T11:15:47Z

Closes HDX-2623

Summary

This change improves the performance of getKeyValues when getting values of a JSON key.

Generally, columns that are not referenced outside of a CTE will be pruned by the query planner. For JSON however, if the outer select references one field in a JSON column, then the inner select will read (it seems) the entire JSON object.

This PR also adds integration tests for getKeyValues to ensure that the function generates queries that work as expected in ClickHouse.

Performance impact (on single JSON Dashboard Filter)

Original: 15.03s

Optimized: 0.443s
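
For a sense of scale, the two timings above can be turned into a speedup factor and a percentage reduction. The snippet below is only illustrative arithmetic on the reported numbers; `summarizeSpeedup` is a made-up name for this example, not something from the codebase.

```typescript
// Illustrative arithmetic only; `summarizeSpeedup` is a hypothetical helper.
function summarizeSpeedup(beforeSec: number, afterSec: number) {
  return {
    // How many times faster the optimized query is.
    speedup: beforeSec / afterSec,
    // Share of the original wall-clock time that was eliminated.
    reductionPct: ((beforeSec - afterSec) / beforeSec) * 100,
  };
}

const timing = summarizeSpeedup(15.03, 0.443);
console.log(
  `${timing.speedup.toFixed(1)}x faster, ${timing.reductionPct.toFixed(1)}% less time`,
);
// prints "33.9x faster, 97.1% less time"
```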

changeset-bot · 2025-10-21T11:15:51Z

🦋 Changeset detected

Latest commit: 4945803

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@hyperdx/common-utils	Patch

vercel · 2025-10-21T11:15:53Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
hyperdx-v2-oss-app	Ready	Preview	Comment	Oct 27, 2025 4:46pm

claude · 2025-10-21T11:17:40Z

PR Review: Performance Optimization for getKeyValues

Summary

This PR significantly improves query performance (from 15s to 0.443s - 97% improvement) by optimizing how JSON fields are selected in ClickHouse queries. The optimization is well-implemented with comprehensive integration tests.

✅ No critical issues found.

What Was Changed

Core optimization (metadata.ts:683-688): Changed from selecting all columns to selecting only requested keys in the CTE, avoiding full JSON column reads
Integration tests added: Comprehensive test suite covering JSON columns, materialized columns, and edge cases
Unit tests cleaned up: Removed obsolete tests for old column expansion logic
CI infrastructure: Added integration test support for common-utils package

Code Quality Observations

✅ Strengths:

Excellent test coverage with real ClickHouse integration tests
Clear comments explaining the performance rationale
Proper separation of unit vs integration tests
Early return for empty keys list prevents unnecessary queries

Minor Suggestions (non-blocking):

metadata.ts:688 - The fallback to || '*' appears unreachable since keys.length === 0 returns early (line 671). Consider removing the fallback.
Integration test cleanup - metadata.int.test.ts:79: Consider awaiting client.close() in an afterAll at the outer describe level to ensure proper cleanup even if tests fail.
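
The first suggestion can be illustrated with a minimal sketch. The actual expression at metadata.ts:688 is not shown in this thread, so `selectExpressions` below is a hypothetical reconstruction of the pattern being described: a joined list of aliased expressions with a `|| '*'` fallback.

```typescript
// Hypothetical reconstruction; the real code in metadata.ts may differ.
function selectExpressions(keys: string[]): string {
  // Every mapped entry contains " AS paramN", so the joined string is
  // non-empty (truthy) whenever `keys` has at least one element.
  return keys.map((k, i) => `${k} AS param${i}`).join(', ') || '*';
}

console.log(selectExpressions(['stringCol'])); // prints "stringCol AS param0"
console.log(selectExpressions([])); // prints "*"
```

Only the empty list reaches the fallback, so once an early return handles `keys.length === 0`, the `|| '*'` branch can no longer be taken.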

Testing Notes

Integration tests properly handle both disableRowLimit modes
Test data includes edge cases (empty strings, missing fields, empty JSON objects)
Proper test isolation with table creation/teardown

Recommendation: Approve and merge ✅

The performance improvement is substantial, the implementation is solid, and the test coverage is excellent. The minor suggestions above are optional refinements that do not block merging.

github-actions · 2025-10-21T11:22:00Z

E2E Test Results

✅ All tests passed • 26 passed • 3 skipped • 197s

Status	Count
✅ Passed	26
❌ Failed	0
⚠️ Flaky	0
⏭️ Skipped	3

pulpdrew · 2025-10-23T09:04:39Z

packages/common-utils/src/metadata.ts

-        const selectClause = keys
-          .map((k, i) => `groupUniqArray(${limit})(${k}) AS param${i}`)
-          .join(', ');
+        if (keys.length === 0) return [];


All the functional changes are in this file.

This check was added because, previously, the query would generate an empty select clause when no keys were provided, resulting in a query error (e.g. SELECT FROM table...).
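
To illustrate the failure mode, here is a minimal sketch, assuming the select clause is built by mapping over the keys as in the removed lines above. `buildSelectClause` and `queryForKeys` are hypothetical names, not functions from metadata.ts, and the `null` return stands in for the early `return []`.

```typescript
const LIMIT = 20;

// Mirrors the shape of the removed lines: one aggregate per key.
function buildSelectClause(keys: string[]): string {
  return keys
    .map((k, i) => `groupUniqArray(${LIMIT})(${k}) AS param${i}`)
    .join(', ');
}

// Without a guard, an empty key list yields an empty select clause.
console.log(`SELECT ${buildSelectClause([])} FROM table`);
// prints "SELECT  FROM table"

// Hypothetical guard in the spirit of `if (keys.length === 0) return [];`
function queryForKeys(keys: string[]): string | null {
  if (keys.length === 0) return null; // nothing to query, skip the round trip
  return `SELECT ${buildSelectClause(keys)} FROM table`;
}
```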

pulpdrew · 2025-10-27T12:37:18Z

Makefile

+.PHONY: dev-int-common-utils
+dev-int-common-utils:
+	docker compose -p int -f ./docker-compose.ci.yml up -d
+	npx nx run @hyperdx/common-utils:dev:int $(FILE)
+	docker compose -p int -f ./docker-compose.ci.yml down
+


@teeohhem You mentioned our filter queries are fragile. This PR adds integration tests so that we can actually test the queries against real ClickHouse data, which will hopefully reduce some of that fragility. I think it would be great if we could extend these tests to cover more of our query generation code from common-utils in the future (e.g. all of renderChartConfig).

teeohhem · 2025-10-27T14:48:30Z

packages/common-utils/src/metadata.ts

-                databaseName: chartConfig.from.databaseName,
-                tableName: chartConfig.from.tableName,
-                connectionId: chartConfig.connection,
-              });


What's the reasoning behind this change (just so I understand)?

pulpdrew · 2025-10-27

Consider a case where we are trying to get filter values for stringCol and jsonCol.nested.field.

Before these changes, the query would have been:

WITH sampledData AS (
  SELECT
    `stringCol`,
    `jsonCol`, -- This is bad for performance
    ... every other column in the table
  FROM table
  ...sampling condition
)
SELECT
  groupUniqArray(20)(stringCol) as param0,
  groupUniqArray(20)(jsonCol.nested.field) as param1
  -- None of the other columns are used out here, so they don't need to be selected in the CTE
FROM sampledData

There's no need to select ... every other column in the table in the CTE, and selecting an entire JSON column instead of just the sub-column / path we need is bad for performance.

So now with this change we do:

WITH sampledData AS (
  SELECT
    stringCol as param0,
    jsonCol.nested.field as param1 -- This is better for performance
  FROM table
  ...sampling condition
)
SELECT
  groupUniqArray(20)(param0) as param0,
  groupUniqArray(20)(param1) as param1
FROM sampledData
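
The rewrite described above can be sketched as a small string builder. This is a sketch under stated assumptions, not the code in metadata.ts: `buildKeyValuesQuery` is a hypothetical name, the sampling condition is omitted, and the caller is assumed to have already handled an empty key list.

```typescript
// Hypothetical helper; assumes `keys` is non-empty and already safe to inline.
function buildKeyValuesQuery(
  keys: string[],
  table: string,
  limit: number,
): string {
  // Inner select: read only the requested expressions, aliased param0..paramN,
  // so a JSON path such as jsonCol.nested.field is selected on its own.
  const innerSelect = keys.map((k, i) => `${k} AS param${i}`).join(', ');

  // Outer select: aggregate the aliases rather than the original columns.
  const outerSelect = keys
    .map((_, i) => `groupUniqArray(${limit})(param${i}) AS param${i}`)
    .join(', ');

  return (
    `WITH sampledData AS (SELECT ${innerSelect} FROM ${table}) ` +
    `SELECT ${outerSelect} FROM sampledData`
  );
}

console.log(buildKeyValuesQuery(['stringCol', 'jsonCol.nested.field'], 'logs', 20));
```

Because the outer select references only the aliases, the CTE never needs to expose the full JSON column.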

teeohhem · 2025-10-27

Great info, thanks! This seems like an important thing to comment in the code (and also remove the comments below):

// Build select expression that includes all columns by name
// This ensures materialized columns are included

pulpdrew · 2025-10-27

Thanks for pointing that out! The comments have been fixed.

vercel bot deployed to Preview October 21, 2025 11:19 View deployment

pulpdrew force-pushed the drew/optimize-filter-sampling branch from d2aaa67 to b332faa Compare October 21, 2025 11:27

vercel bot deployed to Preview October 21, 2025 11:30 View deployment

pulpdrew force-pushed the drew/optimize-filter-sampling branch from b332faa to 4c54135 Compare October 22, 2025 21:23

vercel bot deployed to Preview October 22, 2025 21:28 View deployment

pulpdrew force-pushed the drew/optimize-filter-sampling branch from 4c54135 to 5c6327b Compare October 22, 2025 21:36

vercel bot deployed to Preview October 22, 2025 21:39 View deployment

pulpdrew force-pushed the drew/optimize-filter-sampling branch from 5c6327b to 1d8c5f0 Compare October 23, 2025 08:16

vercel bot deployed to Preview October 23, 2025 08:19 View deployment

pulpdrew force-pushed the drew/optimize-filter-sampling branch from 1d8c5f0 to 3297abd Compare October 23, 2025 09:03

pulpdrew commented Oct 23, 2025

vercel bot deployed to Preview October 23, 2025 09:07 View deployment

pulpdrew added 2 commits October 23, 2025 11:15

perf: Improve getKeyValues query performance for JSON keys

e5c77da

test: Add integration tests for common-utils/metadata

23f37db

pulpdrew force-pushed the drew/optimize-filter-sampling branch from 3297abd to 23f37db Compare October 23, 2025 09:16

pulpdrew marked this pull request as ready for review October 23, 2025 09:16

vercel bot deployed to Preview October 23, 2025 09:20 View deployment

fix: Close client in metadata integration test

d417159

vercel bot deployed to Preview October 25, 2025 10:21 View deployment

pulpdrew requested review from a team and teeohhem and removed request for a team October 27, 2025 12:28

pulpdrew commented Oct 27, 2025

teeohhem reviewed Oct 27, 2025

fix: Improve getKeyValues comments

2dc77ff

vercel bot deployed to Preview October 27, 2025 15:32 View deployment

teeohhem self-requested a review October 27, 2025 16:22

teeohhem approved these changes Oct 27, 2025

pulpdrew added the automerge label Oct 27, 2025

Merge branch 'main' into drew/optimize-filter-sampling

4945803

kodiakhq bot merged commit 8190ee8 into main Oct 27, 2025
8 of 9 checks passed

vercel bot deployed to Preview October 27, 2025 16:46 View deployment

kodiakhq bot deleted the drew/optimize-filter-sampling branch October 27, 2025 16:46
