feat(executor): introduce DataChunkBuilder and split chunks in Order … #612

Merged
merged 3 commits into from
Apr 9, 2022
wangqiim
Contributor

@wangqiim wangqiim commented Apr 7, 2022

…and Limit
issue #611

  1. Fix panic bug and add test cases.
select *
from t
limit 0 offset 3
  2. Don't modify average_agg return type when the aggregated column's data type is Decimal.
  3. Introduce DataChunkBuilder and split chunks in Order and Limit.
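
The description does not say what caused the panic for `limit 0 offset 3`. As a rough illustration only, the sketch below shows one way to compute, for each input batch, the range of rows that falls inside the offset/limit window so that the range is always well-formed, including when the limit is zero. `LimitWindow` and `next_range` are hypothetical names for this sketch and are not taken from the RisingLight code.

```rust
use std::ops::Range;

/// Hypothetical helper (not RisingLight's API): tracks how many input rows
/// have been seen and computes, per batch, the rows inside the
/// `offset .. offset + limit` window.
pub struct LimitWindow {
    offset: usize,
    limit: usize,
    processed: usize,
}

impl LimitWindow {
    pub fn new(offset: usize, limit: usize) -> Self {
        Self { offset, limit, processed: 0 }
    }

    /// Returns the batch-local range of rows to keep from a batch with
    /// `cardinality` rows. The range may be empty but never has start > end.
    pub fn next_range(&mut self, cardinality: usize) -> Range<usize> {
        let window_end = self.offset.saturating_add(self.limit);
        let batch_start = self.processed;
        let batch_end = batch_start + cardinality;
        self.processed = batch_end;
        // Clamp both window bounds to this batch, then shift to local indices.
        let start = self.offset.clamp(batch_start, batch_end) - batch_start;
        let end = window_end.clamp(batch_start, batch_end) - batch_start;
        start..end
    }
}
```

With offset 3 and limit 0, a 5-row batch yields the empty range `3..3`, so nothing is emitted and no inverted range is ever built.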

Signed-off-by: Qi Wang <wangqiim@163.com>

…and Limit

Signed-off-by: Qi Wang <wangqiim@163.com>
Comment on lines 92 to 101
-        "avg" => (
-            AggKind::Avg,
-            Some(DataType::new(DataTypeKind::Double, false)),
-        ),
+        "avg" => {
+            let agg_kind = AggKind::Avg;
+            let mut default_data_type = Some(DataType::new(DataTypeKind::Double, false));
+            if let Some(ref data_type) = args[0].return_type() {
+                if let DataTypeKind::Decimal(_, _) = data_type.kind {
+                    default_data_type = args[0].return_type();
+                }
+            }
+            (agg_kind, default_data_type)
+        }
Member

Could you explain why this is changed in this PR? It seems unrelated. 🤔

Contributor Author

For example, in q1.sql:

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '71' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;

For avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, and avg(l_discount) as avg_disc, the evaluated result is DataTypeKind::Decimal, but the executor's output_type is DataTypeKind::Double. So DataChunkBuilder.push_row(row.values) in order.rs panics.
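
To make the mismatch concrete, here is a minimal sketch of a column builder whose variant is fixed by the declared output type. It is an illustration under assumptions, not the RisingLight implementation: `Value` and `ColumnBuilder` are hypothetical, Decimal is stood in for by a scaled `i128`, and the sketch returns an error where the real code is reported to panic.

```rust
/// Hypothetical value type for this sketch; Decimal is modelled as a scaled integer.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Double(f64),
    Decimal(i128),
}

/// Hypothetical column builder whose variant is chosen from the declared output type.
#[derive(Debug)]
pub enum ColumnBuilder {
    Double(Vec<f64>),
    Decimal(Vec<i128>),
}

impl ColumnBuilder {
    /// Appends a value, failing when the runtime value does not match the
    /// type the builder was created for.
    pub fn push(&mut self, value: &Value) -> Result<(), String> {
        match (self, value) {
            (ColumnBuilder::Double(col), Value::Double(v)) => {
                col.push(*v);
                Ok(())
            }
            (ColumnBuilder::Decimal(col), Value::Decimal(v)) => {
                col.push(*v);
                Ok(())
            }
            // Declared type and evaluated type disagree.
            _ => Err("type mismatch between column builder and value".to_string()),
        }
    }
}
```

A builder created as `ColumnBuilder::Double` accepts `Value::Double` but rejects `Value::Decimal`, which mirrors a declared Double output column receiving Decimal results.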

Member

@xxchan xxchan Apr 8, 2022

Ok, I got it. I think the problem arose because when we rewrite avg into sum / count (a few lines below, starting from the match kind), the return_type mismatches left_expr's type. Would it be better to fix the problem there?
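
A minimal sketch of that suggestion, under stated assumptions: when avg is rewritten into sum / count, the division takes its return type from the left (sum) expression instead of a fixed Double. The `TypeKind` and `Expr` types and the `rewrite_avg` function are hypothetical and much simpler than the real binder; the sketch also assumes sum keeps its argument's type.

```rust
/// Hypothetical, simplified type kinds for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeKind {
    Int,
    Double,
    Decimal,
}

/// Hypothetical, simplified expression tree.
#[derive(Debug, PartialEq, Eq)]
pub enum Expr {
    Sum { arg_type: TypeKind },
    Count,
    Div { left: Box<Expr>, right: Box<Expr>, return_type: TypeKind },
}

impl Expr {
    pub fn return_type(&self) -> TypeKind {
        match self {
            // Assumption: sum keeps the type of its argument.
            Expr::Sum { arg_type } => *arg_type,
            Expr::Count => TypeKind::Int,
            Expr::Div { return_type, .. } => *return_type,
        }
    }
}

/// Rewrites avg(x) into sum(x) / count(x), deriving the division's return
/// type from the left operand so the two cannot disagree.
pub fn rewrite_avg(arg_type: TypeKind) -> Expr {
    let left = Expr::Sum { arg_type };
    let right = Expr::Count;
    let return_type = left.return_type();
    Expr::Div { left: Box::new(left), right: Box::new(right), return_type }
}
```

Under these assumptions a Decimal column gives a Decimal result and a Double column gives a Double result. The sketch does not address integer inputs, which a real binder would have to handle separately.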

Comment on lines 36 to 38
             if (start..end) == (0..cardinality) {
                 yield batch;
             } else {
-                yield batch.slice(start..end);
+                for row in batch.rows().skip(start).take(end - start) {
+                    if let Some(chunk) = builder.push_row(row.values()) {
+                        yield chunk
+                    }
Member

Iterating over batch here seems to introduce additional overhead. Maybe we just don't need to split chunks in limit?

Member

I think some executors need to split chunks because they iterate over the inputs row by row and then generate a different distribution of data, e.g., sort, join. The inputs are split, but the output is a single big chunk.

For limit, it only forwards input chunks, so no additional effort to split chunks is needed.
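
The row-by-row case can be sketched as a small builder that buffers rows and hands back a chunk each time a fixed capacity is reached. This is only an illustration of the idea: `ChunkBuilder`, `take`, and `split_rows` are hypothetical names, rows are plain `Vec<i64>`, and the real DataChunkBuilder works on typed columns rather than row vectors.

```rust
/// Hypothetical row and chunk types for this sketch.
pub type Row = Vec<i64>;
pub type Chunk = Vec<Row>;

/// Buffers rows and emits a chunk whenever `capacity` rows have accumulated.
pub struct ChunkBuilder {
    capacity: usize,
    rows: Chunk,
}

impl ChunkBuilder {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, rows: Vec::with_capacity(capacity) }
    }

    /// Appends one row; returns a full chunk once the buffer reaches capacity.
    pub fn push_row(&mut self, row: Row) -> Option<Chunk> {
        self.rows.push(row);
        if self.rows.len() == self.capacity {
            let fresh = Vec::with_capacity(self.capacity);
            Some(std::mem::replace(&mut self.rows, fresh))
        } else {
            None
        }
    }

    /// Returns any remaining rows as a final, possibly short, chunk.
    pub fn take(&mut self) -> Option<Chunk> {
        if self.rows.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.rows))
        }
    }
}

/// Splits already-sorted rows into bounded chunks, as a sort executor might
/// do instead of emitting one big chunk.
pub fn split_rows(rows: Vec<Row>, capacity: usize) -> Vec<Chunk> {
    let mut builder = ChunkBuilder::new(capacity);
    let mut out = Vec::new();
    for row in rows {
        if let Some(chunk) = builder.push_row(row) {
            out.push(chunk);
        }
    }
    out.extend(builder.take());
    out
}
```

Five rows with a capacity of 2 come out as chunks of 2, 2, and 1 rows. An executor that only forwards its input, such as limit, can slice the incoming chunk directly and skip this per-row copying.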

Signed-off-by: Qi Wang <wangqiim@163.com>
Member

@xxchan xxchan left a comment

Others LGTM

Comment on lines -92 to -95
"avg" => (
AggKind::Avg,
Some(DataType::new(DataTypeKind::Double, false)),
),
Member

@xxchan xxchan Apr 8, 2022

I'm a little bit unsure and may need some time to look into the details of how the type system works. 🤔 (Maybe some other reviewers can help confirm it)

Member

Avg will be rewritten, so I think it's okay to fill any type here.

@xxchan xxchan requested a review from pleiadesian April 8, 2022 22:30
@skyzh skyzh enabled auto-merge (squash) April 9, 2022 04:51
@skyzh skyzh merged commit a8e0d4c into risinglightdb:main Apr 9, 2022
3 participants