Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize MPP Broadcast Join #7084

Closed
3 tasks done
solotzg opened this issue Mar 16, 2023 · 0 comments
Closed
3 tasks done

Optimize MPP Broadcast Join #7084

solotzg opened this issue Mar 16, 2023 · 0 comments
Assignees
Labels
type/enhancement Issue or PR for enhancement type/performance

Comments

@solotzg
Copy link
Contributor

solotzg commented Mar 16, 2023

MPP Broadcast Join Enhancement

Implementation

ref pingcap/tidb#40494

TiDB

  • support data compression in Broadcast / Passthrough exchange operator
  • add new session var: tidb_prefer_broadcast_join_by_exchange_data_size (release-7.1):
    • OFF: same behavior like <= v7.0
    • ON: compare data exchange size of join and choose the smallest one
      • tidb_broadcast_join_threshold_size and tidb_broadcast_join_threshold_count will be ignored if planner success to determine whether need broadcast.
  • refine the process about checking broadcast join
    • Broadcast exchange size:
      • Build: (mppStoreCnt - 1) * sizeof(BuildTable)
      • Probe: 0
      • exchange size: Build * (mppStoreCnt^?)
      • choose the child plan with the maximum approximate value as Probe
    • Shuffle exchange size:
      • Build: sizeof(BuildTable) * (mppStoreCnt - 1) / mppStoreCnt
      • Probe: sizeof(ProbeTable) * (mppStoreCnt - 1) / mppStoreCnt
      • exchange size: (Build + Probe) * (mppStoreCnt^?)
    • The size of Build hash table when using shuffle join is about sizeof(broadcast join hash table) / (mppStoreCnt). It will cost more time to construct Build hash table and search Probe while using broadcast join. Set a scale factor (mppStoreCnt^?) when estimating broadcast join in isJoinFitMPPBCJ and isJoinChildFitMPPBCJ (based on TPCH benchmark).
    • If the approximate exchange size of broadcast is smaller than hash, then we need to choose broadcast way.

TiFlash

Support data compression in Broadcast / Passthrough exchange operator

  • support data compression in broadcast/passthrough exchange
  • update parts of utils modules
  • For StorageDisaggregated, use latest mpp version for task meta.
    • TODO: enable data compression if necessary

image

Progress

Benchmark

ENV

Original Threshold x 10: only update threshold about broadcast join

  • set tidb_broadcast_join_threshold_count=102400
  • set tidb_broadcast_join_threshold_size=1048576000
  • set tidb_prefer_broadcast_join_by_exchange_data_size=off

Optimize BCJ

  • set tidb_prefer_broadcast_join_by_exchange_data_size=on

Time(s) Original Optimize BCJ Original Threshold x 10 Performance(QPS) Improvement: (Original) / (Optimize BCJ) - 1.0 Performance(QPS) Improvement: (Original) / ( Original Threshold x 10 ) - 1.0
Q1 2.79 2.79 2.92 0.00% -4.45%
Q2 1.11 0.97 1.11 14.43% 0.00%
Q3 3.05 2.52 3.05 21.03% 0.00%
Q4 2.38 2.45 2.38 -2.86% 0.00%
Q5 6.48 4.33 6.14 49.65% 5.54%
Q6 0.7 0.64 0.7 9.38% 0.00%
Q7 2.99 2.72 2.32 9.93% 28.88%
Q8 4.87 2.18 5.07 123.39% -3.94%
Q9 17.01 13.79 12.72 23.35% 33.73%
Q10 3.39 3.59 3.39 -5.57% 0.00%
Q11 0.84 0.5 0.44 68.00% 90.91%
Q12 1.51 1.31 1.31 15.27% 15.27%
Q13 3.12 3.19 3.19 -2.19% -2.19%
Q14 0.70 0.77 0.84 -9.09% -16.67%
Q15 1.51 1.58 1.58 -4.43% -4.43%
Q16 0.84 0.84 0.84 0.00% 0.00%
Q17 6.41 6.48 6.48 -1.08% -1.08%
Q18 6.95 5.87 6.74 18.40% 3.12%
Q19 1.71 1.85 1.78 -7.57% -3.93%
Q20 1.38 1.44 1.51 -4.17% -8.61%
Q21 8.36 7.21 7.08 15.95% 18.08%
Q22 0.7 0.64 0.64 9.38% 9.38%
SUM 78.8 67.66 72.23 16.46% 9.10%
  Original Optimize BCJ Original Threshold x 10 NET/IO Reduction:(Original-(Optimize BCJ))/Original NET/IO Reduction:(Original-( Original Threshold x 10))/Original
Total Exchange Size By NET (GB) 75.894 22.823 52.376 69.93% 30.99%

After pingcap/tidb#42915

Time Cost(s) Original Optimize BCJ Performance(QPS) Improvement
Q1 2.79 2.85 -2.11%
Q2 1.11 0.97 14.43%
Q3 3.05 3.05 0.00%
Q4 2.38 2.38 0.00%
Q5 6.48 4.66 39.06%
Q6 0.7 0.7 0.00%
Q7 2.99 2.18 37.16%
Q8 4.87 2.11 130.81%
Q9 17.01 10.57 60.93%
Q10 3.39 3.32 2.11%
Q11 0.84 0.44 90.91%
Q12 1.51 1.58 -4.43%
Q13 3.12 3.19 -2.19%
Q14 0.77 0.77 0.00%
Q15 1.51 1.58 -4.43%
Q16 0.84 0.84 0.00%
Q17 6.41 6.27 2.23%
Q18 6.95 6.74 3.12%
Q19 1.78 1.71 4.09%
Q20 1.38 1.38 0.00%
Q21 8.36 7.28 14.84%
Q22 0.7 0.64 9.38%
SUM 78.94 65.21 21.06%
  Original Optimize BCJ NET/IO Reduction:(Original-(Optimize BCJ))/Original
Total Exchange Size By NET (GB) 75.894 30.216 60.19%

Benchmark: One MPP Store

ENV

Time(s) Original Optimize BCJ Performance(QPS) Improvement: (Original) / (Optimize BCJ) - 1.0
Q1 8.15 8.15 0.00%
Q2 2.52 2.25 12.00%
Q3 5.94 4.4 35.00%
Q4 4.53 4.6 -1.52%
Q5 14.13 10.7 32.06%
Q6 1.98 1.91 3.66%
Q7 6.61 5.34 23.78%
Q8 9.56 5.47 74.77%
Q9 38.82 29.83 30.14%
Q10 8.36 7.15 16.92%
Q11 1.71 1.04 64.42%
Q12 4.19 3.46 21.10%
Q13 8.62 8.42 2.38%
Q14 2.05 2.11 -2.84%
Q15 4.13 4.13 0.00%
Q16 2.18 2.05 6.34%
Q17 13.72 13.79 -0.51%
Q18 17.28 15.07 14.66%
Q19 5.00 5.00 0.00%
Q20 3.05 3.05 0.00%
Q21 16.81 13.79 21.90%
Q22 1.31 1.24 5.65%
SUM 180.65 152.95 18.11%
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement Issue or PR for enhancement type/performance
Projects
None yet
Development

No branches or pull requests

1 participant