Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(batch): support hash join non-equi condition in java frontend #1713

Merged
merged 9 commits into from
Apr 11, 2022

Conversation

yuhao-su
Copy link
Contributor

@yuhao-su yuhao-su commented Apr 8, 2022

What's changed and what's your intention?

  • support hash join non-equi condition in java frontend
  • fix bugs about visibility and cardinality

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

#1621

@yuhao-su yuhao-su marked this pull request as draft April 8, 2022 10:18
@CLAassistant
Copy link

CLAassistant commented Apr 8, 2022

CLA assistant check
All committers have signed the CLA.

@yuhao-su yuhao-su changed the title WIP feat(batch): support hash join non-equi condition in java frontend feat(batch): support hash join non-equi condition in java frontend Apr 10, 2022
@yuhao-su yuhao-su marked this pull request as ready for review April 10, 2022 18:46
@codecov
Copy link

codecov bot commented Apr 10, 2022

Codecov Report

Merging #1713 (191c814) into main (73eb449) will increase coverage by 0.08%.
The diff coverage is 85.45%.

@@            Coverage Diff             @@
##             main    #1713      +/-   ##
==========================================
+ Coverage   71.29%   71.37%   +0.08%     
==========================================
  Files         599      599              
  Lines       77672    77722      +50     
==========================================
+ Hits        55377    55476      +99     
+ Misses      22295    22246      -49     
Flag Coverage Δ
rust 71.37% <85.45%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/batch/src/executor/projection.rs 72.00% <ø> (ø)
src/expr/src/expr/expr_input_ref.rs 68.57% <63.63%> (-3.43%) ⬇️
src/batch/src/executor/join/hash_join.rs 85.57% <81.81%> (+1.57%) ⬆️
src/batch/src/executor/hash_agg.rs 92.22% <100.00%> (+0.02%) ⬆️
src/batch/src/executor/join/hash_join_state.rs 88.78% <100.00%> (+6.70%) ⬆️
...rontend/src/optimizer/plan_node/batch_hash_join.rs 96.42% <100.00%> (+0.04%) ⬆️
src/common/src/types/ordered_float.rs 23.50% <0.00%> (-0.20%) ⬇️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@@ -161,19 +161,19 @@ impl<K: HashKey + Send + Sync> Executor for HashJoinExecutor<K> {
match take(&mut self.state) {
HashJoinState::FirstProbe(probe_table) => {
let ret = self.probe(true, probe_table).await?;
if let Some(data_chunk) = ret {
if let Some(data_chunk) = ret && data_chunk.cardinality() > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we assume that child executor should never return 0 cardinality data chunk, we should return an error rather skip it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible for an executor to generate a 0 cardinality chunk, but it just shouldn't output it to the downstream executor. Why consider it as an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but why we may produce an empty data chunk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible situation is inner join with non-equi conditions. In our implementation, we first generate a chunk with equi condition and then filter out result by non-equi condition. If we are unlucky, all result rows are filtered out, the we get and 0 cardinality chunk. This may happen in other executors as well. We should investigate that later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to implement never return 0 len data chunk. But we can do this later after refactored with stream api.

src/batch/src/executor/join/hash_join.rs Outdated Show resolved Hide resolved
ensure!(probe_data_chunk.cardinality() > 0);
// TODO(yuhao): We should make sure the output chunk of upstream executor
// has cardinality > 0.
// ensure!(probe_data_chunk.cardinality() > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we can't ensure this here until we check all executor implementations to make sure output chunk has cardinality > 0

@yuhao-su yuhao-su enabled auto-merge (squash) April 11, 2022 03:23
@yuhao-su yuhao-su merged commit f364656 into main Apr 11, 2022
@yuhao-su yuhao-su deleted the support_non-equi_condition_in_java_front_end branch April 11, 2022 03:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants