feat(charts): make the series limit opt-in and consistent across chunks#2449
Conversation
Each time-window chunk's __hdx_series_limit CTE ranked the top N within its own window, so the union across chunks could exceed seriesLimit and adjacent windows disagreed on which series to keep. Chunked queries now carry the full chart range as seriesLimitDateRange and the CTE ranks over that pinned range, so every chunk keeps the identical top-N set. Follow-up to HDX-4499.
🦋 Changeset detectedLatest commit: 137c338 The changes in this PR will be included in the next version bump. This PR includes changesets to release 4 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🔴 Tier 4 — CriticalTouches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD. Why this tier:
Review process: Deep review from a domain expert. Synchronous walkthrough may be required. Stats
|
Greptile SummaryThis PR replaces the workspace-wide team
Confidence Score: 5/5Safe to merge — all changed paths are well-tested, the chunking logic is verified by both unit and integration tests, and the removed team-level setting had a clear per-chart replacement added in the same PR. The core chunking fix is tightly scoped: only the CTE's WHERE/GROUP BY are re-rendered against the pinned range; the outer query and aggregate expressions are untouched. The null/undefined normalization for seriesLimit is handled consistently at every layer (Zod schema, convertToTimeChartConfig, form reset, URL round-trip). The byte-identical CTE assertion in tests is a strong correctness signal. No open logic gaps were found. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Chart config with seriesLimit] --> B{enableQueryChunking\n& shouldUseChunking?}
B -- No --> C[windows = undefined\nno seriesLimitDateRange]
B -- Yes --> D[getGranularityAlignedTimeWindows\nwindows newest→oldest]
D --> E[rankingDateRange = windows 0 .dateRange\nnewest window]
E --> F{seriesLimit != null\n& rankingDateRange != null?}
F -- No --> G[windowedConfig = chunk window only]
F -- Yes --> H[windowedConfig = chunk window\n+ seriesLimitDateRange = rankingDateRange]
H --> I[renderSeriesLimitCte\nre-renders WHERE & GROUP BY\nagainst seriesLimitDateRange]
I --> J[CTE byte-identical\nacross all chunks]
J --> K[Union of chunks = exactly N series]
C --> L[CTE uses outer query dateRange\nunchunked path unchanged]
Reviews (9): Last reviewed commit: "Merge branch 'main' into warren/fix-seri..." | Re-trigger Greptile |
E2E Test Results✅ All tests passed • 199 passed • 3 skipped • 1343s
Tests ran across 4 shards in parallel. |
buildChartConfigForExplanations called convertToTimeChartConfig without the team's seriesLimit, so the Generated SQL section always rendered the default LIMIT 100 even when the team setting was lower. Pass the same value DBTimeChart uses so the preview matches the executed query.
Pinning the __hdx_series_limit ranking to the full chart range made every chunk re-scan the whole range. Rank over the newest window instead: still one fixed range shared by all chunks (so the top-N set stays consistent), but the scan is bounded by the smallest window. Trade-off: series are picked by recent activity, so groups with no events in the newest window are dropped from the chart.
The team seriesLimit setting no longer falls back to 100: when unset, charts fetch every series and no __hdx_series_limit CTE is emitted. The team settings page shows the setting as "Disabled" by default and adds a Disable button to clear a configured value back to undefined.
pulpdrew
left a comment
There was a problem hiding this comment.
Change itself LGTM in isolation. I have some thoughts though:
- I think opt-in is a good idea, in the future I could see it being useful to opt-in at the chart level instead of team-wide, so that charts without too many series don't pay the performance penalty of this new approach, while charts that do need series limiting can opt-in individually. This could be another "display setting"
- Also I am wondering if you've considered a two-query solution: One query to fetch the top-N series over the whole time range; then the main query, without a CTE, just an additional
WHERE <series> IN (...)condition?- This would address "Series are selected by activity in the newest window, so a group with no events there is dropped from the chart even if it was active earlier in the range."
- I would imagine this could also make the query building a bit simpler too, but there are probably some complexities around making sure that the
tuples identifying the top N series are rendered correctly, so maybe not. - This could also be used to fix the complaint that the chart seems to change drastically as new chunks load. The "pre flight" query could be used to get the max series value that will be rendered, and the chart y-axis could be scaled according to that value, instead of being re-scaled as each new chunk is loaded in.
- Not related to this change but something I notice about original implementation which is still present: When using multiple series, or when using ratio mode, only the first series is aggregated to choose the "top" series, instead of the max of the series or the ratio value.
|
…ile Display Settings The series limit is now configured per chart in the Display Settings drawer instead of as a workspace-wide team setting. The value already persisted on the chart config, so convertToTimeChartConfig now reads config.seriesLimit directly; the team seriesLimit field, its ClickHouse settings schema, and the Team Settings UI control are removed. The chunk-consistency behavior (seriesLimitDateRange pinned to the newest window) is unchanged and now driven per tile. HDX-4499
Clearing the per-chart Series Limit in Display Settings updated the chart query but reappeared when the drawer was reopened. In Chart Explorer the config round-trips through the URL query state (JSON), which drops keys with undefined values; react-hook-form's `values` prop then leaves the stale field in place because the key is absent. Clear to null instead, which survives JSON serialization and forces RHF to reset the field. seriesLimit is now nullish in the schema, and convertToTimeChartConfig normalizes a cleared null back to undefined (no CTE). HDX-4499
Summary
Reworks the top-N series cap from #2429. The limit moves from a workspace-wide team setting to a per-chart control in the Display Settings drawer, and now holds correctly across chunked queries.
Per-chart, opt-in (default: disabled). A chart's
seriesLimitis set in its Display Settings drawer — it already persisted on the chart config, so no new persistence was needed. The control only renders for builder line/bar charts. When unset, charts fetch every series and no__hdx_series_limitCTE is emitted. The old workspace-wide teamseriesLimitsetting is removed (team settings UI, theseriesLimitClickHouse setting schema, and the mongoose field).Value sourced from the chart, not the team.
convertToTimeChartConfigno longer takes a team limit; it readsconfig.seriesLimit(clamped to ≥ 1).DBTimeChart,SearchTotalCountChart, and the "Generated SQL" preview (buildChartConfigForExplanations) all drop theme.team.seriesLimitplumbing, so the preview matches the executed query — including no CTE when the limit is disabled.Consistent across chunks. Time charts fetch in time-windowed chunks, and each chunk previously ranked its top-N within its own window — so the union across chunks could exceed the limit and adjacent windows disagreed. Chunked queries now pin the ranking to the newest chunk window via a runtime-only
seriesLimitDateRange, andrenderSeriesLimitCterenders the CTE's WHERE/GROUP BY against that pinned range (boundary inclusivity normalized) while the outer query still fetches only its window. The CTE is byte-identical across chunks, so every chunk keeps the same top-N set and the union equals N. Trade-off: a group with no events in the newest window is dropped even if it was active earlier. Unchunked consumers never set the field and are unchanged.Tests
ChartUtils/buildChartConfigForExplanations: per-chart limit applied / omitted when disabled.ChartDisplaySettingsDrawer: control shown for builder line/bar charts only; entered value setsseriesLimit; emptying clears it to disabled.useChartConfig: serial and parallel chunked fetches pin the ranking to the newest window; unchunked fetches don't set it.renderChartConfig: CTE pinned toseriesLimitDateRangeand byte-identical across two windowed renders; no CTE without a limit.References
Screenshot