-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(executor): introduce DataChunkBuilder and split chunks in Order … #612
feat(executor): introduce DataChunkBuilder and split chunks in Order … #612
Conversation
…and Limit Signed-off-by: Qi Wang <wangqiim@163.com>
src/binder/expression/agg_call.rs
Outdated
"avg" => ( | ||
AggKind::Avg, | ||
Some(DataType::new(DataTypeKind::Double, false)), | ||
), | ||
"avg" => { | ||
let agg_kind = AggKind::Avg; | ||
let mut default_data_type = Some(DataType::new(DataTypeKind::Double, false)); | ||
if let Some(ref data_type) = args[0].return_type() { | ||
if let DataTypeKind::Decimal(_, _) = data_type.kind { | ||
default_data_type = args[0].return_type(); | ||
} | ||
} | ||
(agg_kind, default_data_type) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain a bit why to change this (in this PR)? It seems not relevant. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example in q1.sql.
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '71' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
For avg(l_quantity) as avg_qty
、avg(l_extendedprice) as avg_price
、avg(l_discount) as avg_disc
,the evaluate result is DataTypeKind::Decimal
, But The output_type of excuter will be DataTypeKind::Double
. So it will panic when DataChunkBuilder.push_row(row.values) in order.rs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I got it. I think the problem arose because when we rewrite avg
into sum / count
(a few lines below, starting from the match kind
), the return_type
mismatches left_expr
's type. Would it be better to fix the problem there?
src/executor/limit.rs
Outdated
if (start..end) == (0..cardinality) { | ||
yield batch; | ||
} else { | ||
yield batch.slice(start..end); | ||
for row in batch.rows().skip(start).take(end - start) { | ||
if let Some(chunk) = builder.push_row(row.values()) { | ||
yield chunk | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating over batch here seems to introduce addtional overheads. Maybe we just don't need to split chunks in limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some executors need to split chunks because they iterate over the inputs row by row and then generate different distribution of data, e.g., sort, join. The inputs are splitted, but the output is a single big chunk.
For limit, it only forwards input chunks, so no addtional effort to split chunks is needed.
Signed-off-by: Qi Wang <wangqiim@163.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Others LGTM
"avg" => ( | ||
AggKind::Avg, | ||
Some(DataType::new(DataTypeKind::Double, false)), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit unsure and may need some time to look into the details of how the type system work. 🤔 (Maybe some other reviewers can help confirm it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avg will be rewritten, so I think it's okay to fill any type here.
"avg" => ( | ||
AggKind::Avg, | ||
Some(DataType::new(DataTypeKind::Double, false)), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avg will be rewritten, so I think it's okay to fill any type here.
…and Limit
issue #611
when aggregated coloum datatype is Decimal.and Limit.Signed-off-by: Qi Wang wangqiim@163.com