feat(streaming): apply StateTable to hash join #3085

Merged
yuhao-su merged 27 commits into main from hash_join_with_state_table on Jun 16, 2022

Conversation

yuhao-su
Contributor

@yuhao-su yuhao-su commented Jun 8, 2022

What's changed and what's your intention?

  • apply StateTable to hash join
  • solve unflushed degree issue

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

#2794
#2795

@yuhao-su yuhao-su changed the title from "(WIP)feat(streaming): apply StateTable to hash join" to "feat(streaming): apply StateTable to hash join" Jun 14, 2022
@codecov

codecov bot commented Jun 15, 2022

Codecov Report

Merging #3085 (68f114d) into main (68f114d) will not change coverage.
The diff coverage is n/a.

❗ Current head 68f114d differs from pull request most recent head 23a222c. Consider uploading reports for the commit 23a222c to get more accurate results

@@           Coverage Diff           @@
##             main    #3085   +/-   ##
=======================================
  Coverage   73.22%   73.22%           
=======================================
  Files         747      747           
  Lines      101907   101907           
=======================================
  Hits        74619    74619           
  Misses      27288    27288           
Flag Coverage Δ
rust 73.22% <0.00%> (ø)

@yuhao-su yuhao-su requested a review from BugenZhao June 15, 2022 09:41
Contributor

@BowenXiao1999 BowenXiao1999 left a comment

It seems this PR consists of multiple changes:

  • Update on mem_table.
  • Integrate StateTable
    etc

@yuhao-su
Contributor Author

It seems this PR consists of multiple changes:

  • Update on mem_table.
  • Integrate StateTable
    etc

We need to fix #2794, or the tests fail. To do that, mem_table must support update.

Contributor

@BowenXiao1999 BowenXiao1999 left a comment

Overall LGTM. The hash agg is more natural to integrate with StateTable so the logic looks clean.

But indeed this PR can be divided (?).

Member

@BugenZhao BugenZhao left a comment

Generally LGTM.

/// Get an owned `Row` by the given `indices` from current row.
///
/// Use `datum_refs_by_indices` if possible instead to avoid allocating owned datums.
pub fn by_indices(&self, indices: &[usize]) -> Row {
Member

Off topic: We should really review the current implementation of the serialize functions. In most cases there's no need to allocate a Row.
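
To make the allocation point concrete, here is a minimal sketch of the difference between an owned projection and a borrowed one. The `Datum` alias, the simplified `Row`, and `serialize_refs` are hypothetical stand-ins rather than the real types in this codebase; only the method names `by_indices` and `datum_refs_by_indices` come from the quoted doc comment, and their signatures here are assumed.

```rust
// Hypothetical simplified datum type; the real one is richer.
type Datum = Option<String>;

// Hypothetical simplified row: an ordered list of datums.
#[derive(Debug, Clone, PartialEq)]
pub struct Row(pub Vec<Datum>);

impl Row {
    /// Owned projection: clones every selected datum into a new `Row`.
    pub fn by_indices(&self, indices: &[usize]) -> Row {
        Row(indices.iter().map(|&i| self.0[i].clone()).collect())
    }

    /// Borrowed projection: yields references, so no datum is cloned.
    pub fn datum_refs_by_indices<'a>(
        &'a self,
        indices: &'a [usize],
    ) -> impl Iterator<Item = Option<&'a str>> + 'a {
        indices.iter().map(move |&i| self.0[i].as_deref())
    }
}

/// Hypothetical serializer that consumes borrowed datums directly.
/// Encoding (an assumption for this sketch): 0 for NULL, otherwise 1,
/// a big-endian u32 length, then the UTF-8 bytes.
pub fn serialize_refs<'a>(datums: impl Iterator<Item = Option<&'a str>>) -> Vec<u8> {
    let mut buf = Vec::new();
    for datum in datums {
        match datum {
            Some(s) => {
                buf.push(1);
                buf.extend_from_slice(&(s.len() as u32).to_be_bytes());
                buf.extend_from_slice(s.as_bytes());
            }
            None => buf.push(0),
        }
    }
    buf
}
```

With this shape, a caller can write `serialize_refs(row.datum_refs_by_indices(&indices))` and the only allocation is the output buffer.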

/// When evicted, `cached` does not hold any entries.
/// If a `JoinEntryState` exists for a join key, then all records under this
/// join key will be present in the cache.
pub struct JoinEntryState {
Member

Seems this struct is definitely a BTreeMap now. 🤣 How about implementing Deref for it?
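
A minimal sketch of what the `Deref` suggestion could look like, assuming `JoinEntryState` is a thin wrapper around a single `BTreeMap` field. The key and value types below are hypothetical placeholders, and the field name `cached` is taken from the doc comment above; the real struct may differ.

```rust
use std::collections::BTreeMap;
use std::ops::{Deref, DerefMut};

// Hypothetical stand-ins for the real primary-key and state-value types.
type PkType = Vec<u8>;
type StateValue = String;

/// Simplified stand-in for `JoinEntryState`: a wrapper around one `BTreeMap`.
#[derive(Default)]
pub struct JoinEntryState {
    cached: BTreeMap<PkType, StateValue>,
}

impl Deref for JoinEntryState {
    type Target = BTreeMap<PkType, StateValue>;

    // Read-only map methods (`get`, `len`, `iter`, ...) become available
    // directly on `JoinEntryState`.
    fn deref(&self) -> &Self::Target {
        &self.cached
    }
}

impl DerefMut for JoinEntryState {
    // Mutating map methods (`insert`, `remove`, ...) become available too.
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.cached
    }
}
```

The trade-off is that `Deref` exposes the whole `BTreeMap` API, so this only fits if the wrapper has no invariants that callers could break by mutating the map directly.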

src/stream/src/executor/managed_state/join/mod.rs (outdated, resolved)
src/stream/src/executor/managed_state/join/mod.rs (outdated, resolved)
Comment on lines +298 to +300
let old_row = join_row.clone().into_row();
join_row.inc_degree();
let new_row = join_row.clone().into_row();
Member

This could be too costly. 😢 Let's optimize it later.
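
One possible direction for that later optimization, sketched under heavy assumptions: build the old flattened row from a borrow, then derive the new row by patching only the trailing degree, so the `JoinRow` itself is never cloned. The `JoinRow` below is a hypothetical simplification (integer columns, degree stored as the last column), and `row_with_degree` and `inc_degree_rows` are invented helpers, not part of this PR. Two output rows are still allocated, so the actual savings would depend on the real row and state table types.

```rust
// Hypothetical simplified join row: column values plus a match degree.
#[derive(Clone, Debug, PartialEq)]
pub struct JoinRow {
    pub row: Vec<i64>,
    pub degree: u64,
}

impl JoinRow {
    pub fn inc_degree(&mut self) {
        self.degree += 1;
    }

    /// Mirrors the quoted `into_row`: consumes `self` and appends the degree.
    pub fn into_row(self) -> Vec<i64> {
        let mut out = self.row;
        out.push(self.degree as i64);
        out
    }

    /// Hypothetical helper: builds the flattened row from a borrow,
    /// with an exactly sized allocation.
    pub fn row_with_degree(&self, degree: u64) -> Vec<i64> {
        let mut out = Vec::with_capacity(self.row.len() + 1);
        out.extend_from_slice(&self.row);
        out.push(degree as i64);
        out
    }
}

/// Returns `(old_row, new_row)` around a degree increment without cloning
/// the `JoinRow`: the new row is the old row with its last column patched.
pub fn inc_degree_rows(join_row: &mut JoinRow) -> (Vec<i64>, Vec<i64>) {
    let old_row = join_row.row_with_degree(join_row.degree);
    join_row.inc_degree();
    let mut new_row = old_row.clone();
    *new_row.last_mut().expect("row always ends with the degree") = join_row.degree as i64;
    (old_row, new_row)
}
```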

@yuhao-su
Contributor Author

Overall LGTM. The hash agg is more natural to integrate with StateTable so the logic looks clean.

But indeed this PR can be divided (?).

Yes! But I might keep changing the API anyway

@yuhao-su yuhao-su enabled auto-merge (squash) June 16, 2022 05:34
@yuhao-su yuhao-su merged commit ffe54bb into main Jun 16, 2022
@yuhao-su yuhao-su deleted the hash_join_with_state_table branch June 16, 2022 05:42