Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(over window): window function RANGE frame support - backend part #14416

Merged
merged 9 commits into from
Feb 4, 2024

Conversation

stdrc
Copy link
Contributor

@stdrc stdrc commented Jan 8, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR implements the real RANGE frame support in backend executors. Resolves #11109.

The basic idea to support RANGE frame in find_affected_ranges is to:

  1. Calculate the logical order column values of the first and last current row/key for a given set of delta;
  2. Try to find the cache keys that are closest to the logical order values calculated in step 1;
  3. Compare with the first/last current key of ROWS frames, choose the minimum/maximum one as the final first/last current key;
  4. Calculate the logical order column values of the first frame start and last frame end, similar to step 1;
  5. Try to find the closest cache keys, similar to step 2;
  6. Compare with the first/last frame start/end of ROWS frames, choose the minimum/maximum one, similar to step 3.

The basic idea to support RANGE frame in WindowBuffer is to:

  1. Calculate the logical order column values of the frame start and frame end for a given current key;
  2. Try to find the closest ones in the buffer.

118x out of 17xx lines of addition are tests, reviewers plz don't panic.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

  • Support RANGE frames in window function calls.

@stdrc stdrc force-pushed the rc/refactor-over-window-rows-frame branch from 513a2a6 to cb73122 Compare January 9, 2024 09:37
@stdrc stdrc changed the base branch from rc/refactor-over-window-rows-frame to rc/expose-order-type-direction January 9, 2024 09:46
@stdrc stdrc force-pushed the rc/expose-order-type-direction branch from 7c7f9f9 to 61df00f Compare January 9, 2024 09:59
@stdrc stdrc force-pushed the rc/expose-order-type-direction branch from 61df00f to 2e3ec9a Compare January 9, 2024 10:32
@stdrc stdrc changed the base branch from rc/expose-order-type-direction to rc/easier-display-for-frame-bounds January 10, 2024 03:42
@stdrc stdrc force-pushed the rc/range-frame-backend branch 2 times, most recently from 49ea439 to 5c88fec Compare January 10, 2024 04:45
Copy link
Contributor Author

stdrc commented Jan 10, 2024

@stdrc stdrc force-pushed the rc/range-frame-backend branch 2 times, most recently from 84b2e41 to a16d90d Compare January 10, 2024 05:28
@stdrc stdrc changed the base branch from rc/easier-display-for-frame-bounds to rc/extract-range-utils January 10, 2024 05:28
@stdrc stdrc changed the base branch from rc/extract-range-utils to rc/easier-display-for-frame-bounds January 10, 2024 05:48
@stdrc stdrc force-pushed the rc/easier-display-for-frame-bounds branch from c894a25 to 3fad06a Compare January 10, 2024 06:39
@stdrc stdrc force-pushed the rc/easier-display-for-frame-bounds branch from 3fad06a to 08e5e77 Compare January 10, 2024 07:26
@stdrc stdrc force-pushed the rc/easier-display-for-frame-bounds branch from 08e5e77 to 14d0113 Compare January 10, 2024 07:38
@stdrc stdrc force-pushed the rc/easier-display-for-frame-bounds branch from 14d0113 to cf16e2d Compare January 10, 2024 08:08
Preceding(offset) => Following(offset),
CurrentRow => CurrentRow,
Following(offset) => Preceding(offset),
UnboundedFollowing => UnboundedPreceding,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that FrameBound<usize> and FrameBound<RangeFrameOffset> are so different that they don't really need to be variants of same type. How about just saparating them as 2 types, such as RowsFrameBound and RangeFrameBound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO although they have many different methods and are used very differently, in their core they are the same concept, so I decided to introduce a generic type. Also I didn't want to repeat the 5 enum variants for two times (will be 3 times when we support GROUPS frame later).

Maybe we can type RowsFrameBound = FrameBound<usize>; to make it easier to read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing type aliases sounds good

}

#[test]
fn test_calc_logical_for_timestamp_desc_nulls_first() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp format seemed to be supported actually, but why it's disabled in the planner test?

Copy link
Contributor Author

@stdrc stdrc Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timestamp and timestamptz are supported. While for timestamptz, we only support offset to be interval without month and day fields, because that requires timezone info to do calculation and is not currently supported by our Add(Timestamptz, Interval) expression.

@@ -143,6 +147,73 @@ pub(super) fn find_frame_end_for_rows_frame<'cache>(
find_boundary_for_rows_frame::<false /* RIGHT */>(frame_bounds, part_with_delta, curr_key)
}

/// Given the first and last key in delta, calculate the order values of the first
/// and the last frames logically affected by some `RANGE` frames.
pub(super) fn calc_logical_curr_for_range_frames(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managed to understand the function, but didn't quite get the point of "logical"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "logical (current key)" I mean the current key will be affected if it exists, but in actual we may not find the logical current key in the cache + delta, hence consider the closest one as the "actual current key".

Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Base automatically changed from rc/range-frame-frontend to main February 4, 2024 09:33
Signed-off-by: Richard Chien <stdrc@outlook.com>

add comment

Signed-off-by: Richard Chien <stdrc@outlook.com>

order type little things

Signed-off-by: Richard Chien <stdrc@outlook.com>

`fnd_affected_ranges` seems to work now

Signed-off-by: Richard Chien <stdrc@outlook.com>

typo

Signed-off-by: Richard Chien <stdrc@outlook.com>

typo

Signed-off-by: Richard Chien <stdrc@outlook.com>

typo

Signed-off-by: Richard Chien <stdrc@outlook.com>

simplify frame start/end calculation

Signed-off-by: Richard Chien <stdrc@outlook.com>

minor

Signed-off-by: Richard Chien <stdrc@outlook.com>

check range frame bounds

Signed-off-by: Richard Chien <stdrc@outlook.com>

add comment

Signed-off-by: Richard Chien <stdrc@outlook.com>

display RangeFrameBounds

Signed-off-by: Richard Chien <stdrc@outlook.com>

make `WindowBuffer` only handle `ROWS` frame

Signed-off-by: Richard Chien <stdrc@outlook.com>

reorder methods

Signed-off-by: Richard Chien <stdrc@outlook.com>

unify

Signed-off-by: Richard Chien <stdrc@outlook.com>

allow end-2-end range frame queries

Signed-off-by: Richard Chien <stdrc@outlook.com>

minor

Signed-off-by: Richard Chien <stdrc@outlook.com>

use real order key data type and order type

Signed-off-by: Richard Chien <stdrc@outlook.com>

store order data type and order type in `RangeFrameBounds`

Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
@stdrc stdrc enabled auto-merge February 4, 2024 14:40
@stdrc stdrc added this pull request to the merge queue Feb 4, 2024
Merged via the queue into main with commit ed643b8 Feb 4, 2024
27 of 28 checks passed
@stdrc stdrc deleted the rc/range-frame-backend branch February 4, 2024 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support over window function with RANGE frame
4 participants