
[Rollup] Loosen validations when only raw data is queried #35744

Closed
polyfractal opened this issue Nov 20, 2018 · 2 comments
Labels
>enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@polyfractal
Contributor

Ask coming from kibana-land: elastic/kibana#24059

If a user hits the RollupSearch endpoint, we enforce a variety of constraints based on the job that matches the query. The most notable restriction is the interval. Once a chart is rendered, a user may wish to zoom in on a region. If this region is purely "raw" data, the interval validation isn't technically required any more, and UIs may wish to display finer-granularity buckets in this region.
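For illustration, the interval constraint bites on requests like the one below (index name and job configuration are made up; assume the matching rollup job was configured with a 1h interval, so the finer request is rejected by validation):

```
GET /sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "timeline": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "10m"
      }
    }
  }
}
```

Even if the zoomed-in time range only covers live data, the 10m request is rejected today because validation is driven by the job config, not by where the data actually lives.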

This is tricky to support in Rollup today. We don't know the extent of the data bounds until the search is executed. Only after the results come back do we know where the live and rolled data exist (and potentially overlap). So bypassing the validation through the RollupSearch endpoint would be relatively complicated. But telling the user to switch to the regular search endpoint is not possible either, since the user doesn't know where the bounds are.

If we want to support this kind of behavior, there are a few routes we could take:

Pre-search request to find bounds

A simple approach is to internally fire off a pre-search to find the bounds of the live (or rolled) data, so that the rollup search endpoint knows where the boundaries exist. This sounds expensive for a behavior that I expect will be the minority case.

Technically this could be applied client-side too: just tell clients/UIs/Kibana to only use the RollupSearch endpoint if they want both. Not super user-friendly, though.

Obtain bounds from running task

We could get the data bounds from the currently running task. This also requires a pre-flight request, and has the disadvantage of not working if the task is gone; there's no guarantee a running task will match up with the index being searched.

Enrich responses with metadata

We could enrich the aggregation response with metadata indicating which buckets were generated from "raw" data. This would be trivial to implement, since we already know this information when merging shard responses.

This would give the client/UI enough information to know which region was entirely "raw" data, so if they wanted to zoom into this region exclusively they could switch to the regular search endpoint.

Or maybe allow hitting the Rollup endpoint with some kind of parameter that tells it to ignore validation? Not sure. In any case, this feels like the most workable and flexible solution, albeit the least "magic" one.
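As a purely hypothetical sketch of the metadata-enrichment idea (the `raw` flag and its placement are not an existing API; names are invented for illustration), the response could mark each bucket's provenance:

```
{
  "aggregations": {
    "timeline": {
      "buckets": [
        { "key_as_string": "2018-11-20T00:00:00Z", "doc_count": 42, "raw": false },
        { "key_as_string": "2018-11-20T01:00:00Z", "doc_count": 17, "raw": true }
      ]
    }
  }
}
```

A client could then detect a contiguous run of `raw: true` buckets and re-issue a finer-grained query for that range against the regular search endpoint.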

@polyfractal polyfractal added >enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data labels Nov 20, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@wchaparro
Member

With the 8.7 release of Elasticsearch, we have made a new downsampling capability associated with the new time series datastreams functionality generally available (GA). This capability was in tech preview in ILM since 8.5. Downsampling provides a method to reduce the footprint of your time series data by storing it at reduced granularity. The downsampling process rolls up documents within a fixed time interval into a single summary document. Each summary document includes statistical representations of the original data: the min, max, sum, value_count, and average for each metric. Data stream time series dimensions are stored unchanged.
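For reference, a downsampling pass over a time series index can be triggered directly with the `_downsample` API (index names here are illustrative):

```
POST /my-tsds-index/_downsample/my-tsds-index-1h
{
  "fixed_interval": "1h"
}
```

This rolls all documents in each 1h window into a single summary document in the target index, per the description above.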

Downsampling is superior to rollup because:

  • Downsampled indices are searched through the _search API
  • It is possible to query multiple downsampled indices together with raw data indices
  • The pre-aggregation is based on the metrics and time series definitions in the index mapping, so very little configuration is required (i.e. it is much easier to add new time series)
  • Downsampling is managed as an action in ILM
  • It is possible to downsample a downsampled index, and reduce granularity as the index ages
  • The performance of the pre-aggregation process is superior in downsampling, as it builds on the time_series index mode infrastructure

Because of the introduction of this new capability, we are deprecating the rollups functionality, which never left Tech Preview/Experimental status, in favor of downsampling, and thus we are closing this issue. We encourage you to migrate your solution to downsampling and take advantage of the new TSDB functionality.

@wchaparro closed this as not planned Jun 23, 2023