Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(streaming): use xxhash64 for hash key in cache #9163

Merged
merged 6 commits into from
Apr 14, 2023
Merged

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Apr 13, 2023

Signed-off-by: Bugen Zhao i@bugenzhao.comI hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Resolve the long-standing issue of #5579. Also, add a generic for HashCode to avoid misusing between different hashers.

We may perform some benchmarks to see whether...

  • xxHash is fast enough,
  • and there are frequent key conflicts before (cc @kwannoel)

This PR does not introduce breaking changes, as this only affects the hash code for hash maps and the hasher of hash-shuffle or consistent hash is still Crc32.

Close #5579.

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.
Click here for Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@kwannoel
Copy link
Contributor

Running q17 bench to see performance, give me a while

@BugenZhao
Copy link
Member Author

Running q17 bench to see performance, give me a while

Thanks for your being so responsive! 😍

@st1page
Copy link
Contributor

st1page commented Apr 13, 2023

Running q17 bench to see performance, give me a while

Thanks for your being so responsive! heart_eyes

+1 😍

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@kwannoel
Copy link
Contributor

kwannoel commented Apr 13, 2023

Slight improvement from q17 micro bench (may be just noise 😆 )

cargo bench
Q17/benchmark_hash_agg  time:   [7.1066 s 7.1511 s 7.2039 s]
                        change: [-3.4729% -2.6952% -1.8194%] (p = 0.00 < 0.05)
                        Performance has improved.

Will try out q17 nexmark bench later, benchmark instance still compiling the code 😅

@kwannoel
Copy link
Contributor

and there are frequent key conflicts before (cc @kwannoel)

Not too sure if frequent key conflicts OR it's just many duplicate keys. I'm guessing it may be the latter, and if so there's not much we can do about it by changing the hash type.

Need to verify kube-bench workload to know for sure. Let me open an issue to track this.

@codecov
Copy link

codecov bot commented Apr 13, 2023

Codecov Report

Merging #9163 (e5ee601) into main (9bf890a) will decrease coverage by 0.02%.
The diff coverage is 70.00%.

@@            Coverage Diff             @@
##             main    #9163      +/-   ##
==========================================
- Coverage   70.86%   70.85%   -0.02%     
==========================================
  Files        1200     1200              
  Lines      199974   199986      +12     
==========================================
- Hits       141711   141696      -15     
- Misses      58263    58290      +27     
Flag Coverage Δ
rust 70.85% <70.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/batch/src/task/hash_shuffle_channel.rs 0.00% <0.00%> (ø)
src/common/src/hash/key.rs 78.43% <64.00%> (-0.68%) ⬇️
src/common/src/row/mod.rs 67.59% <66.66%> (ø)
src/common/src/util/hash_util.rs 85.71% <75.00%> (-4.29%) ⬇️
src/batch/src/executor/hash_agg.rs 90.98% <100.00%> (ø)
src/common/src/array/data_chunk.rs 90.73% <100.00%> (ø)
src/common/src/hash/consistent_hash/vnode.rs 91.35% <100.00%> (ø)
src/common/src/row/owned_row.rs 87.68% <100.00%> (ø)

... and 9 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@kwannoel
Copy link
Contributor

Looks good, seems like 20% improvement

Screenshot 2023-04-13 at 5 51 43 PM

@kwannoel
Copy link
Contributor

Let me run it for longer and see how it goes

@kwannoel
Copy link
Contributor

Okay LGTM it is done processing.

@kwannoel
Copy link
Contributor

I will run a longer bench tmr with 1B records

@lmatz
Copy link
Contributor

lmatz commented Apr 13, 2023

Hash Joins should also improve this massive I suppose

@kwannoel
Copy link
Contributor

1B:
Screenshot 2023-04-14 at 12 14 42 PM

Copy link
Contributor

@lmatz lmatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impressive improvement

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao BugenZhao added this pull request to the merge queue Apr 14, 2023
Merged via the queue into main with commit eb1e433 Apr 14, 2023
5 checks passed
@BugenZhao BugenZhao deleted the bz/different-hashes branch April 14, 2023 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should use different hash for exchange and operators
6 participants