Improve hoisted literals code generation #2
Comments
Hi, can I tackle this one?
Yes, good one for you.
@psockali You can use
Hi @asuhan, I will do so.
Hey @asuhan, I think I need to be much clearer when addressing things in mapd (e.g. literal in LLVM IR, literal in REX, literal in Analyzer::Exp, etc.). I finally managed to look at this in more detail and it turns out that Currently I would propose to scan for I would then generate the memory load for each literal in I would also make this, in effect, a configurable behaviour and would probably introduce a new compiler option flag. I hope to start on it tomorrow, if there are no objections to the above approach.
Hi @psockali, your approach sounds reasonable. Add a constant scanning visitor, generate corresponding literal loads in the first block of the query func and feed these values to the always_inlined You mentioned
Hey @shtilman, as usual: theory does not reflect reality. It turns out that during LLVM IR generation This invalidates the proposed approach, because scanning the source I have to rethink .. please bear with me.
Hey @shtilman, @asuhan mentioned two approaches:
I think there is an alternative way, but it would require cloning the row_function (in order to correct the function type with the hoisted variable). Either way, let me first try the hybrid approach; it will probably be complicated enough.
Comments and discussion are in PR #145.
…ay (#2) Correctly merge separate arrow arrays offets to one chunk offset array
* Register local / global hint in Calcite * Support g_ prefix for global query hint name * Translate global hint in analyzer * Add tests * Apply comments #1: global hint registration * Apply comments #2: global hint flag identification * Apply comments #3: global hint translation * Apply comments #4: remove unnecessary virtual keyword * Fixup a bug on allow_gpu_hashtable build hint for overlaps join * Fixup a bug related to a query having multiple identical subqueries * Add global hint tests related to overlaps join hashtable * Apply comments #5: misc cleanup
…tions * Pass work_unit to deliver the body node to extract hash table cache key * Separate partition sorting and actual computing * Pass cache_key to window context object * Disallow hash table cache access for invalid join type * Support hash table cache for window context * Support sorted partitoon cache for window context * Address comment #1: advocate std::memcpy for copying sorted partition * Address comment #2: move common logic used in both perfect/baseline join hash table to HashJoin.h * Address comment #3: improve payload_copy logic Co-authored-by: yoonminnam <yoon-min.nam@heavy.ai>
* Add necessary method to propagate update query info in the projection node * Enhance related logic to properly handle update query with window function expression * Add tests * Enable window func update query on non-fragmented table in CtasUpdateTest * Misc cleanup * Disallow the update query having window func in dist mode * Address comment #1: change test queries to get deterministic query results * Address comment #2: revert const qualifier Co-authored-by: yoonminnam <yoon-min.nam@heavy.ai>
* Support ms / us / ns time units for INTERVAL keyword * Add few constants related to small time units and a new ExtractField to represent invalid status * Trasnlate window frame bound expression for range mode with date / time / timestamp types * Fixup null value generator for date / time / timestamp types for codegen * Implement fixed length date value encoder * Implement computation logic of search range of aggregate tree for timeinterval type ordering col * Cast timestamp col var iff it is encoded type * Support codegen * Fix issue #1 : compute size of aggregate tree properly * Fix issue #2 : finding null range from sorted column * Fix issue #3 : handling edge case while computing window aggregation over the frame * Add tests * Disable range mode with date type ordering column in dist mode * Address comments v1
… considering `max_gpu_slab_size` (#7133) * Cleaup perfect join hash table init * Cleanup baseline join hashtable init * Add rowid size getter * Introduce hash table entry info class in perfect join hash table * Introduce hash table entry info class in baseline join hash table * Introduce hash table entry info class in overlaps and range join hash tables * Fixup init log related to hash table layout * Fixup hash table init on an empty table * Control the maximum hash table by # hash entries for CPU and its size for GPU * Address comment #1: revert partial_err checking logic in baseline join hash table builder * Address comment #2 Signed-off-by: jack <jack@omnisci.com>
…ion (#7185) * Add runtime function to create null StringView * Support projection of none encoded string column after left join * Address comment #1: create null StringView obj * Address comment #2: avoid compilation error * Address comments #3: improve codegen logic Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
* Renaming OverlapsJoinHashTable class to BoundingBoxIntersectJoinHashTable * Rename global flag g_enable_overlaps_hashjoin to g_enable_bbox_intersect_hashjoin * Rename overlaps hash join tuner * Rename kOVERLAPS SQL_OP * Fixup serialization type toString * Rename convertOverlaps * Rename cache types * Rename is_overlaps_oper() function * Rename query rewriter logic * Change server configurations v2 * Change namings in query rewriter * Completely remove overlaps keyword used in codebase * Address comments #1 * Address comment #2 Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
* Fixup the function's logic * Address comments * Address comments #2 Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
* Determine the size of projection buffer based on filtered per-device cardinality when applicable * Address comments * Add logging about mem alloc * Add loggig about cardinality estimation * Address comments #1 * Address comments #2 * Disable dist test; cannot access executor instance from QR * Add test case for sharded table (in both single and dist) Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
* Rename device bitmap ptr variable and its getter * Rename count distinct bitmap host mem ptr getter * Rename count_distinct device mem ptr variable * Improve buf allocation logic for count distinct * Cleanup code & improve logic for mode and tdigest * Cleanup logic v2 * Address comments * Address comments #2 Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
* Add buffer holders for GPU execution * Rename structures used to codegen * Introduce WindowFunctionCtx namespace * Add preparation for GPU execution in window ctx * Cleanup & improve WindowFunctionContext::compute() * Improve a logic to build aggregate tree w/ supporting reusing * Improve segment tree constructor * Rebase * Address comments #1 * Address comments #2: refactor bool param functions * Address comments #3 * Address comments #4: tbb * Address comments #5 * Address comments #6 * Fixup test failures Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
Currently, we generate memory loads (from the literal buffer) when the literal node is visited and rely on the loop-invariant code motion (LICM) pass to hoist these loads outside of the query loop. This is a heavy-handed use of LICM, and the pass might not always hoist the loads. We should collect the constant expressions with a visitor and generate the loads in the entry block.
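A minimal sketch of the idea, assuming a toy expression tree rather than the project's real classes: a visitor collects each distinct literal once, the loads are emitted up front as an "entry block", and the per-row expression only refers to the already-loaded values. All names here (`Expr`, `LiteralCollector`, `emitEntryBlock`, `emitRowExpr`, `literal_buffer`) are hypothetical, and the emitted strings are pseudo-IR for illustration, not actual LLVM IR.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified expression node (not the project's Analyzer classes).
struct Expr {
  enum class Kind { Literal, Column, BinOp };
  Kind kind;
  int64_t value = 0;  // used by Literal
  std::string name;   // column name, or operator symbol for BinOp
  std::vector<std::shared_ptr<Expr>> children;
};

std::shared_ptr<Expr> lit(int64_t v) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::Literal;
  e->value = v;
  return e;
}

std::shared_ptr<Expr> col(const std::string& n) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::Column;
  e->name = n;
  return e;
}

std::shared_ptr<Expr> binop(const std::string& op,
                            std::shared_ptr<Expr> lhs,
                            std::shared_ptr<Expr> rhs) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::BinOp;
  e->name = op;
  e->children = {lhs, rhs};
  return e;
}

// Visitor: walks the tree once and assigns every distinct literal a slot.
class LiteralCollector {
 public:
  void visit(const Expr& e) {
    if (e.kind == Expr::Kind::Literal &&
        slots_.emplace(e.value, order_.size()).second) {
      order_.push_back(e.value);  // first time this literal is seen
    }
    for (const auto& child : e.children) {
      visit(*child);
    }
  }
  const std::vector<int64_t>& literals() const { return order_; }
  std::size_t slotOf(int64_t v) const { return slots_.at(v); }

 private:
  std::map<int64_t, std::size_t> slots_;
  std::vector<int64_t> order_;
};

// Emit one load per collected literal, before any per-row code is generated.
std::vector<std::string> emitEntryBlock(const LiteralCollector& c) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < c.literals().size(); ++i) {
    out.push_back("%lit" + std::to_string(i) + " = load literal_buffer[" +
                  std::to_string(i) + "]");
  }
  return out;
}

// Per-row code refers to the hoisted values instead of issuing new loads.
std::string emitRowExpr(const Expr& e, const LiteralCollector& c) {
  switch (e.kind) {
    case Expr::Kind::Literal:
      return "%lit" + std::to_string(c.slotOf(e.value));
    case Expr::Kind::Column:
      return e.name;
    case Expr::Kind::BinOp:
      return "(" + emitRowExpr(*e.children[0], c) + " " + e.name + " " +
             emitRowExpr(*e.children[1], c) + ")";
  }
  return "";
}
```

With this shape, the loads no longer depend on an optimization pass to move them out of the loop: they are placed outside it by construction. In a real code generator the collected values would presumably be passed into the row function as extra arguments, which is where the function-type concern raised in the discussion comes from.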