Support window in more time-series aggregations #138456
Pinging @elastic/es-storage-engine (Team:StorageEngine)
 * 02:00 -> [12:00, 13:59) // bucket=2m
 * 00:04 -> [04:00, 05:59), [06:00, 07:59), [08:00, 09:59), [10:00, 11:59), [12:00, 13:59) // window=10m
 */
private void expandWindowBuckets() {
@kkrik-es This is the main change
 * For example, if the window is 10 minutes and the bucket size is 2 minutes:
 * If we see a bucket at 12:00, we need to ensure buckets exist for the previous 10 minutes.
 * 02:00 -> [12:00, 13:59) // bucket=2m
 * 00:04 -> [04:00, 05:59), [06:00, 07:59), [08:00, 09:59), [10:00, 11:59), [12:00, 13:59) // window=10m
I'm somewhat confused by these examples, although the text above is clear. Shouldn't we have an example where, at 12:00, we need to include data from 11:50 onward?
I will update the comment; the PR description may provide more clarity.
Right, that example was nice.
.../compute/src/main/java/org/elasticsearch/compute/operator/TimeSeriesAggregationOperator.java
long endTimestamp = tsBlockHash.getLongKeyFromGroup(groupId);
long bucket = timeBucket.nextRoundingValue(endTimestamp - timeResolution.convert(largestWindowMillis()));
bucket = Math.max(bucket, tsBlockHash.getMinLongKey());
while (bucket < endTimestamp) {
Nit: consider adding a comment here explaining that we're filling in the missing buckets because the window extends outside each bucket.
kkrik-es left a comment:
Nice, makes sense.
Thanks Kostas!
This change adds window support to more time-series aggregations,
including `min_over_time`, `max_over_time`, `first_over_time`,
`count_over_time`, and `sum_over_time`. Wiring the window into these
functions is straightforward. The main update in this PR is how the
window is expanded before sliding over the partial results.
For example, given these data points:
```
|_tsid| cluster| host | timestamp | metric |
| t1 | prod | h1 | 2025-04-15T01:12:00Z | 100 |
| t2 | prod | h2 | 2025-04-15T01:14:00Z | 200 |
```
With `bucket=5m` and no window:
```
TS ...
| WHERE TRANGE('2025-04-15T01:10:00Z', '2025-04-15T01:15:00Z')
| STATS sum(sum_over_time(metric)) BY cluster, TBUCKET(5m)
```
Yields:
```
cluster | bucket | SUM |
prod | 2025-04-15T01:10:00Z | 300 |
```
With `window=5m` and `bucket=1m`:
```
TS ...
| WHERE TRANGE('2025-04-15T01:10:00Z', '2025-04-15T01:15:00Z')
| STATS sum(sum_over_time(metric, 5m)) BY cluster, TBUCKET(1m)
```
Without expanding the window, only the buckets that contain data are produced:
```
cluster | bucket | SUM |
prod | 2025-04-15T01:12:00Z | 100 |
prod | 2025-04-15T01:14:00Z | 200 |
```
Ideally, all buckets from `2025-04-15T01:10:00Z` to
`2025-04-15T01:14:00Z` should be generated:
```
cluster | bucket | SUM |
prod | 2025-04-15T01:10:00Z | 300 |
prod | 2025-04-15T01:11:00Z | 300 |
prod | 2025-04-15T01:12:00Z | 300 |
prod | 2025-04-15T01:13:00Z | 200 |
prod | 2025-04-15T01:14:00Z | 200 |
```
With this change, buckets are expanded as if sliding over the raw input
before combining for the final results.
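
The expansion can be illustrated with a small standalone sketch. This is not the code in `TimeSeriesAggregationOperator`; the class and method names below are hypothetical, and it rests on three assumptions: the units are minutes (which is what the timestamps and the expected table imply), a bucket starting at `b` aggregates the points in `[b, b + window)` (inferred from the expected table), and buckets are clamped to the lower bound of the queried time range. The lower bound of each point's bucket range mirrors the `nextRoundingValue(endTimestamp - window)` step in the diff.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the Elasticsearch code base.
public class WindowExpansionSketch {

    /**
     * Sums point values per bucket start, where the bucket starting at b
     * covers [b, b + window). Every bucket whose window contains a point is
     * generated, clamped to rangeStart (an assumption of this sketch).
     */
    static TreeMap<Instant, Long> windowedSums(
        List<Map.Entry<Instant, Long>> points,
        Instant rangeStart,
        Duration bucket,
        Duration window
    ) {
        long bucketMillis = bucket.toMillis();
        TreeMap<Instant, Long> sums = new TreeMap<>();
        for (Map.Entry<Instant, Long> point : points) {
            long ts = point.getKey().toEpochMilli();
            // Last bucket that sees the point: the bucket containing it.
            long last = Math.floorDiv(ts, bucketMillis) * bucketMillis;
            // First bucket that sees the point: the smallest bucket start
            // strictly greater than ts - window.
            long first = Math.floorDiv(ts - window.toMillis(), bucketMillis) * bucketMillis + bucketMillis;
            first = Math.max(first, rangeStart.toEpochMilli());
            // Fill in every bucket in between, including ones with no raw data.
            for (long b = first; b <= last; b += bucketMillis) {
                sums.merge(Instant.ofEpochMilli(b), point.getValue(), Long::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<Instant, Long>> points = List.of(
            Map.entry(Instant.parse("2025-04-15T01:12:00Z"), 100L),
            Map.entry(Instant.parse("2025-04-15T01:14:00Z"), 200L)
        );
        TreeMap<Instant, Long> sums = windowedSums(
            points,
            Instant.parse("2025-04-15T01:10:00Z"),
            Duration.ofMinutes(1),
            Duration.ofMinutes(5)
        );
        sums.forEach((bucketStart, sum) -> System.out.println("prod | " + bucketStart + " | " + sum));
    }
}
```

For the two sample points, this produces one row per minute from `2025-04-15T01:10:00Z` to `2025-04-15T01:14:00Z`, with the same sums as the expected table. Grouping by `cluster` is left out because the sample data has a single cluster.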