feat(join): leverage band conditions in hash join #8749
Codecov Report
@@ Coverage Diff @@
## main #8749 +/- ##
==========================================
+ Coverage 70.79% 70.80% +0.01%
==========================================
Files 1171 1171
Lines 193943 194337 +394
==========================================
+ Hits 137298 137608 +310
- Misses 56645 56729 +84
- Could we add a watermark in JoinSide and do the state cleaning logic inside JoinSide?
- Consider the case where an entry is inserted on one side but never matched: we cannot clean it afterwards. We need to filter out such useless data with the watermark at the time it is inserted.
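To make the suggestion above concrete, here is a minimal sketch of a join side that owns its watermark, rejects late rows on insert, and drops expired rows when the watermark advances. `JoinSideSketch`, its fields, and the `i64` event time are illustrative assumptions, not the actual `JoinSide` in this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for one side of a streaming hash join.
/// Rows are keyed by an i64 event time; the real `JoinSide` is more involved.
#[derive(Debug, Default)]
pub struct JoinSideSketch {
    /// Rows ordered by event time so an expired prefix is cheap to drop.
    rows: BTreeMap<i64, Vec<String>>,
    /// Lowest event time that may still match; `None` until a watermark arrives.
    watermark: Option<i64>,
}

impl JoinSideSketch {
    /// Insert a row unless it already falls below the watermark.
    /// Returns `true` if the row was kept.
    pub fn insert(&mut self, ts: i64, row: String) -> bool {
        if matches!(self.watermark, Some(wm) if ts < wm) {
            // Filtered at insert time: it could never be cleaned by a match later.
            return false;
        }
        self.rows.entry(ts).or_default().push(row);
        true
    }

    /// Advance the watermark and drop rows strictly below it.
    /// Returns the number of rows removed.
    pub fn update_watermark(&mut self, wm: i64) -> usize {
        // Watermarks never move backwards.
        let wm = self.watermark.map_or(wm, |old| old.max(wm));
        self.watermark = Some(wm);
        // `split_off` returns the rows with key >= wm; the rest have expired.
        let kept = self.rows.split_off(&wm);
        let expired = std::mem::replace(&mut self.rows, kept);
        expired.values().map(Vec::len).sum()
    }

    /// Number of rows currently held in state.
    pub fn len(&self) -> usize {
        self.rows.values().map(Vec::len).sum()
    }
}
```

Keeping the watermark inside the side means both the insert path and the cleaning path consult the same value, which is the point of the first bullet.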
@@ -291,6 +291,30 @@ macro_rules! impl_has_variant {

impl_has_variant! {InputRef, Literal, FunctionCall, AggCall, Subquery, TableFunction, WindowFunction}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub(crate) struct InequalityInputPair {
BandJoin indicates BETWEEN semantics, while here a one-sided inequality is enough.
@@ -77,6 +77,25 @@
| └─StreamTableScan { table: t1, columns: [t1.ts, t1.v1, t1.v2, t1._row_id], pk: [t1._row_id], dist: UpstreamHashShard(t1._row_id) }
└─StreamExchange { dist: HashShard(t2.ts) }
└─StreamTableScan { table: t2, columns: [t2.ts, t2._row_id], pk: [t2._row_id], dist: UpstreamHashShard(t2._row_id) }
- name: band hash join
  sql: |
    create table t1 (ts timestamp with time zone, v1 int, v2 int, watermark for ts as ts - INTERVAL '1' SECOND) append only;
A suggestion: it might also be helpful to add a test that evaluates the query rather than only inspecting the plan. Inspired by the PSQL tests:
CREATE TABLE t1 (
c1 int,
c2 int,
c3 int,
c4 timestamptz,
c5 timestamp,
c6 varchar,
c7 varchar,
c8 int,
CONSTRAINT t1_pkey PRIMARY KEY (c1)
);
CREATE TABLE t2 (
c1 int,
c2 text,
CONSTRAINT t2_pkey PRIMARY KEY (c1)
);
CREATE TABLE t4 (
c1 int,
c2 int,
c3 text,
CONSTRAINT t4_pkey PRIMARY KEY (c1)
);
INSERT INTO t4
SELECT id,
id + 1,
'AAA'
FROM generate_series(1, 100, 1) as t(id);
DELETE FROM t4 WHERE c1 % 3 != 0;
INSERT INTO t1
SELECT id,
id % 10,
id,
'1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
'1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
id % 10,
id % 10,
1
FROM generate_series(1, 1000, 1) as t(id);
INSERT INTO t2
SELECT id,
'AAA'
FROM generate_series(1, 100, 1) t(id);
SELECT t1.c1, t2.c1 FROM (SELECT c1 FROM t4 WHERE c1 between 50 and 60) t1 FULL JOIN t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
The example is extremely helpful. Thank you very much.
… do not generate internal watermark
LGTM! Thank you for your effort.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Leverage band conditions in hash join.
Emit watermarks based on band conditions found among the non-equi conditions of the join condition in the streaming hash join.
Clean state based on band conditions found among the non-equi conditions of the join condition in the streaming hash join executor.
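As a rough illustration of the state-cleaning item, the sketch below assumes a one-sided band condition of the form `left_ts + offset >= right_ts` and shows how a watermark on the right input bounds which left rows can still match. `BandConditionSketch`, `left_clean_bound`, and `clean_left_state` are hypothetical names, and timestamps are plain `i64` values; the executor in this PR may derive and apply the bound differently.

```rust
/// Hypothetical one-sided band condition: `left_ts + offset >= right_ts`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BandConditionSketch {
    pub offset: i64,
}

impl BandConditionSketch {
    /// Evaluate the inequality for one pair of rows.
    pub fn matches(&self, left_ts: i64, right_ts: i64) -> bool {
        left_ts.saturating_add(self.offset) >= right_ts
    }

    /// A right-side watermark `right_wm` promises that every future right row
    /// has `right_ts >= right_wm`. A left row with
    /// `left_ts < right_wm - offset` therefore satisfies
    /// `left_ts + offset < right_wm <= right_ts` and can never match again.
    pub fn left_clean_bound(&self, right_wm: i64) -> i64 {
        right_wm.saturating_sub(self.offset)
    }
}

/// Drop left-side timestamps that can no longer match any future right row.
/// Returns the number of rows removed.
pub fn clean_left_state(
    left: &mut Vec<i64>,
    cond: BandConditionSketch,
    right_wm: i64,
) -> usize {
    let bound = cond.left_clean_bound(right_wm);
    let before = left.len();
    left.retain(|&ts| ts >= bound);
    before - left.len()
}
```

With `offset = 5` and a right watermark of 20, the bound is 15: left rows at 10 and 14 are dropped, while a row at exactly 15 is kept because it can still match a right row at 20.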
Checklist For Contributors
./risedev check (or alias, ./risedev c)
Checklist For Reviewers