
[Rollup] Loosen validations when only raw data is queried #35744

Closed
polyfractal opened this issue Nov 20, 2018 · 2 comments
Labels
>enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@polyfractal
Contributor

Ask coming from kibana-land: elastic/kibana#24059

If a user hits the RollupSearch endpoint, we enforce a variety of constraints based on the job that matches the query. The most notable restriction is the interval. Once a chart is rendered, a user may wish to zoom in on a region. If this region is purely "raw" data, the interval validation isn't technically required any more, and UIs may wish to display finer-granularity buckets in this region.
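For illustration, the interval constraint bites on requests like the one below (index name and job configuration are made up; assume the matching rollup job was configured with a 1h interval, so the finer request is rejected by validation):

```
GET /sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "timeline": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "10m"
      }
    }
  }
}
```

Even if the zoomed-in time range only covers live data, the 10m request is rejected today because validation is driven by the job config, not by where the data actually lives.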

This is tricky to support in Rollup today. We don't know the extent of the data bounds until the search is executed. Only after the results come back do we know where the live and rolled data exist (and potentially overlap). So bypassing the validation through the RollupSearch endpoint would be relatively complicated. But telling the user to switch to the regular search endpoint is not possible either, since the user doesn't know where the bounds are.

If we want to support this kind of behavior, there are a few routes we could take:

Pre-search request to find bounds

A simple approach is to internally fire off a pre-search to find the bounds of the live (or rolled) data, so that the rollup search endpoint knows where the boundaries exist. This sounds expensive for a behavior that I expect will be the minority case.

Technically this could be applied client-side too: just tell clients/UIs/Kibana to only use the RollupSearch endpoint if they want both. Not super user-friendly, though.

Obtain bounds from running task

We could get the data bounds from the currently running task. This also requires a pre-flight request, and has the disadvantage of not working if the task is gone; there's no guarantee a running task will match up with the index being searched.

Enrich responses with metadata

We could enrich the aggregation response with metadata indicating which buckets were generated from "raw" data. This would be trivial to implement, since we already know this information when merging shard responses.

This would give the client/UI enough information to know which region was entirely "raw" data, so if they wanted to zoom into this region exclusively they could switch to the regular search endpoint.

Or maybe allow hitting the Rollup endpoint with some kind of parameter that tells it to ignore validation? Not sure. In any case, this feels like the most workable and flexible solution, albeit the least "magic" one.
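As a purely hypothetical sketch of the metadata-enrichment idea (the `raw` flag and its placement are not an existing API; names are invented for illustration), the response could mark each bucket's provenance:

```
{
  "aggregations": {
    "timeline": {
      "buckets": [
        { "key_as_string": "2018-11-20T00:00:00Z", "doc_count": 42, "raw": false },
        { "key_as_string": "2018-11-20T01:00:00Z", "doc_count": 17, "raw": true }
      ]
    }
  }
}
```

A client could then detect a contiguous run of `raw: true` buckets and re-issue a finer-grained query for that range against the regular search endpoint.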

@polyfractal polyfractal added >enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data labels Nov 20, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@wchaparro
Member

With the 8.7 release of Elasticsearch, we have made a new downsampling capability associated with the new time series datastreams functionality generally available (GA). This capability was in tech preview in ILM since 8.5. Downsampling provides a method to reduce the footprint of your time series data by storing it at reduced granularity. The downsampling process rolls up documents within a fixed time interval into a single summary document. Each summary document includes statistical representations of the original data: the min, max, sum, value_count, and average for each metric. Data stream time series dimensions are stored unchanged.
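For reference, a downsampling pass over a time series index can be triggered directly with the `_downsample` API (index names here are illustrative):

```
POST /my-tsds-index/_downsample/my-tsds-index-1h
{
  "fixed_interval": "1h"
}
```

This rolls all documents in each 1h window into a single summary document in the target index, per the description above.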

Downsampling is superior to rollup because:

  • Downsampled indices are searched through the _search API
  • It is possible to query multiple downsampled indices together with raw data indices
  • The pre-aggregation is based on the metrics and time series definitions in the index mapping, so very little configuration is required (i.e. it is much easier to add new time series)
  • Downsampling is managed as an action in ILM
  • It is possible to downsample a downsampled index, and reduce granularity as the index ages
  • The performance of the pre-aggregation process is superior in downsampling, as it builds on the time_series index mode infrastructure

Because of the introduction of this new capability, we are deprecating the rollups functionality, which never left Tech Preview/Experimental status, in favor of downsampling, and thus we are closing this issue. We encourage you to migrate your solution to downsampling and take advantage of the new TSDB functionality.

@wchaparro closed this as not planned Jun 23, 2023