@JkSelf Thanks for the detailed description. IMO, conceptually, the difference we are discussing is not in the engine architecture but in how the HashTable is built and injected. That is orthogonal to #15754, which just provides a cache to store and reference HashTables during joins, regardless of their origin; #15754 is agnostic to this aspect. It is just a mechanism for storing HashTables and referencing them throughout the join execution.
I wonder whether, for Spark/Gluten, it is just a matter of populating the cache, i.e. changing `reusedHashTableInfo_(std::move(reusedHashTableInfo))` to `auto* cache = HashTableCache::instance(); cache->put(cacheKey(), table, joinHasNullKeys_);`. Then all your HashBuild operators would get a cache hit (instead of the Presto approach, where the first HashBuild has a cache miss), and everything else should work out of the box?
Discussed this with @JkSelf offline. I will extend the hash table cache API to fit the Gluten use case by passing the pre-built hash table from the Gluten runtime; the internal HashBuild workflow with the hash table cache should remain the same (pretty much as @shrinidhijoshi suggested). cc @shrinidhijoshi @mbasmanova
Thank you @xiaoxmeng @JkSelf.
Background and Problem
Gluten currently faces severe memory issues when implementing Broadcast Hash Join (see Gluten Issue #7548). Since Gluten's implementation follows the Shuffle Hash Join approach, each task in every executor independently builds its own hash table, which leads to:
Our Proposed Solution
To address these issues, we proposed two optimization strategies in our design document:
Solution 1: Executor-level Hash Table Reuse
Solution 2: Driver-side Pre-build (Consistent with Spark Architecture)
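To make the trade-off concrete, here is a back-of-the-envelope model. It is an illustration, not measured data. Assume $E$ executors and $T$ concurrent tasks per executor, all probing the same broadcast build side. Let $S$ be the size of one built hash table, and assume a broadcast table is materialized at most once per executor. Counting peak hash table memory per executor ($M_{\text{exec}}$) and the number of hash table builds per join ($B$):

```latex
\begin{aligned}
\text{Per-task build (today):}\quad & M_{\text{exec}} \approx T \cdot S, & B &= E \cdot T \\
\text{Solution 1 (executor-level reuse):}\quad & M_{\text{exec}} \approx S, & B &= E \\
\text{Solution 2 (driver-side pre-build):}\quad & M_{\text{exec}} \approx S, & B &= 1
\end{aligned}
```

Under these assumptions, both solutions remove the factor of $T$ from executor memory. Solution 2 additionally moves the build off the executors, as vanilla Spark's broadcast hash join does.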
To support these two solutions at the Velox layer, we submitted PR #13041, which adds the capability for the HashBuild operator to accept pre-built hash tables.
Recent Developments and Architectural Considerations
During the review process of PR #13041, the community introduced PR #15754, which implements Presto-based Broadcast Hash Table Caching. While we appreciate this contribution and recognize its value for Presto workloads, we've identified that this approach addresses a different use case and cannot fully satisfy Gluten's requirements due to fundamental architectural differences between Spark and Presto.
Spark Architecture:
Why Both Approaches Are Needed
PR #15754's implementation is designed for Presto's architecture, where hash table construction happens within the HashJoin operator. Spark's framework differs fundamentally on this point. Adapting Gluten to PR #15754's approach would require:
We believe both approaches have merit and serve different architectural needs. We respectfully request that the Velox community consider supporting both solutions:
@mbasmanova @pedroerp @xiaoxmeng @shrinidhijoshi @FelixYBW @zhouyuan @jinchengchenghh @rui-mo @zhli1142015 @zhztheplayer @marin-ma
Looking forward to your insights. Thanks.