Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve hoisted literals code generation #2

Open
asuhan opened this issue May 8, 2017 · 10 comments
Open

Improve hoisted literals code generation #2

asuhan opened this issue May 8, 2017 · 10 comments

Comments

@asuhan
Copy link
Contributor

asuhan commented May 8, 2017

Currently, we generate memory loads (from the literal buffer) when the literal node is visited and rely on loop invariant code motion pass to hoist these loads outside of the query loop. This is a heavy-handed use of LICM and might sometimes not be optimized. We should collect the constant expression with a visitor and generate the loads in the entry block.

@pressenna
Copy link
Contributor

Hi, can I tackle this one?

@billmaimone
Copy link
Contributor

Yes, good one for you.

@asuhan
Copy link
Contributor Author

asuhan commented Dec 18, 2017

@psockali You can use ScalarExprVisitor to build a constant-collecting visitor. There's a pitfall though: we don't know all constants upfront when sub-queries are involved. Have a look at RelAlgTranslator::translateInOper. We can use a hybrid approach: collect and generate all the other constants upfront and leave the current handling in place (and keep LICM). Alternatively, we can reorganize the compiler to do a full pass first without emitting any IR, which we know we'll eventually have to do for other reasons as well. Going for the first approach won't hurt our ability to achieve the second and the code should be fully reusable once we have the full solution in place, so I recommend going for it as an incremental, nearly-complete solution.

@pressenna
Copy link
Contributor

Hi @asuhan , I will do so.
Am I correct in assuming that only runtime constants (within the scope of a block) shall be hoisted, but literals can remain as they are?

@pressenna
Copy link
Contributor

Hey @asuhan

I think I need to be much more clearer when addressing things in mapd. (e.g. literal in LLVM IR, literal in REX, literal in Analyzer::Exp, etc...).

Finally managed to look at this in more detail and it turns out that Analyzer::Constant literals are hoisted in Executor::codegenHoistedConstants into a memory buffer. And each unoptimized access to the literal requires a memory read from that buffer. As you have pointed out RelAlgTranslator::translateInOper is creating Analyzer::Constant dynamically, so scanning early on, right after translating from REX, is not very promising.

Currently I would propose to scan for Analyzer::Constant via an ScalarExprVisitor before the actual LLVM IR generation (somewhere in Executor::compileWorkUnit) and also extend the Constant class to hold a few more attributes required for this extra step.

I would then generate the memory load for each literal in @query_group_by_template and pass them as arguments to the always inlined @row_func , which also needs to be extended to take these new parameters.

I would also make this, in effect, a configurable behaviour and would probably introduce a new compiler option flag.

I hope to start tomorrow on it, if there are no objections to the above approach.

@shtilman
Copy link
Contributor

Hi @psockali, your approach sounds reasonable. Add a constant scanning visitor, generate corresponding literal loads in the first block of the query func and feed these values to the always_inlined @row_func. Not sure if any Constant extensions are necessary but let's see if your motivation can be justified. No need to make this behavior configurable - we're just putting some literal loads where they should have been generated from the get go, instead of relying on LICM to move them there.

You mentioned @query_group_by_template, note that it's just one of the templates that the query func can be built from, the other one is @query_template.

@pressenna
Copy link
Contributor

Hey @shtilman, as usual: theory does not reflect reality.

It turns out that during LLVM IR generation Analyzer::Constant nodes are created on the fly and sometimes the original Constant node is translated.
See Executor::codegenRowId, Executor::codegenCmpDecimalConst and Executor::codegenOuterJoinNullPlaceholder.

This invalidates the proposed approach, because scanning the source Analyzer::Exp tree will not be reliable nor sufficient.

I have to rethink .. please bear with me.

@pressenna
Copy link
Contributor

Hey @shtilman,

@asuhan mentioned two approaches:

  • Hybrid:
    Currently this is the approach I am going for.
  • Full Pass without IR generation:
    I am not confident enough at this stage to make such a central change to the compiler as I am not sure what state(s) the compiler currently maintains and how I would reset it correctly.

I think there is an alternative way, but it would require to clone the row_function (in order to correct the function type with the hoisted variable).

Either way, let me first try the hybrid approach, it probably will be complicated enough.

@pressenna
Copy link
Contributor

Hi @shtilman / @asuhan
I committed an initial version of the hybrid solution. the PR is #145 .
Please let me know if this approach is OK for now as I fear that it does not capture all the cases.

@shtilman
Copy link
Contributor

shtilman commented Jan 22, 2018

Comments and discussion are in #145 PR.

alexbaden pushed a commit that referenced this issue Jan 2, 2020
…ay (#2)

Correctly merge separate arrow arrays offets to one chunk offset array
andrewseidl pushed a commit that referenced this issue Nov 9, 2021
* Register local / global hint in Calcite

* Support g_ prefix for global query hint name

* Translate global hint in analyzer

* Add tests

* Apply comments #1: global hint registration

* Apply comments #2: global hint flag identification

* Apply comments #3: global hint translation

* Apply comments #4: remove unnecessary virtual keyword

* Fixup a bug on allow_gpu_hashtable build hint for overlaps join

* Fixup a bug related to a query having multiple identical subqueries

* Add global hint tests related to overlaps join hashtable

* Apply comments #5: misc cleanup
andrewseidl pushed a commit that referenced this issue Apr 16, 2022
…tions

* Pass work_unit to deliver the body node to extract hash table cache key

* Separate partition sorting and actual computing

* Pass cache_key to window context object

* Disallow hash table cache access for invalid join type

* Support hash table cache for window context

* Support sorted partitoon cache for window context

* Address comment #1: advocate std::memcpy for copying sorted partition

* Address comment #2: move common logic used in both perfect/baseline join hash table to HashJoin.h

* Address comment #3: improve payload_copy logic

Co-authored-by: yoonminnam <yoon-min.nam@heavy.ai>
andrewseidl pushed a commit that referenced this issue Jul 8, 2022
* Add necessary method to propagate update query info in the projection node

* Enhance related logic to properly handle update query with window function expression

* Add tests

* Enable window func update query on non-fragmented table in CtasUpdateTest

* Misc cleanup

* Disallow the update query having window func in dist mode

* Address comment #1: change test queries to get deterministic query results

* Address comment #2: revert const qualifier

Co-authored-by: yoonminnam <yoon-min.nam@heavy.ai>
andrewseidl pushed a commit that referenced this issue Jul 11, 2022
* Support ms / us / ns time units for INTERVAL keyword

* Add few constants related to small time units and a new ExtractField to represent invalid status

* Trasnlate window frame bound expression for range mode with date / time / timestamp types

* Fixup null value generator for date / time / timestamp types for codegen

* Implement fixed length date value encoder

* Implement computation logic of search range of aggregate tree for timeinterval type ordering col

* Cast timestamp col var iff it is encoded type

* Support codegen

* Fix issue #1 : compute size of aggregate tree properly

* Fix issue #2 : finding null range from sorted column

* Fix issue #3 : handling edge case while computing window aggregation over the frame

* Add tests

* Disable range mode with date type ordering column in dist mode

* Address comments v1
jack-mapd pushed a commit that referenced this issue Jan 19, 2023
… considering `max_gpu_slab_size` (#7133)

* Cleaup perfect join hash table init

* Cleanup baseline join hashtable init

* Add rowid size getter

* Introduce hash table entry info class in perfect join hash table

* Introduce hash table entry info class in baseline join hash table

* Introduce hash table entry info class in overlaps and range join hash tables

* Fixup init log related to hash table layout

* Fixup hash table init on an empty table

* Control the maximum hash table by # hash entries for CPU and its size for GPU

* Address comment #1: revert partial_err checking logic in baseline join hash table builder

* Address comment #2

Signed-off-by: jack <jack@omnisci.com>
jack-mapd pushed a commit that referenced this issue Jan 23, 2023
… considering `max_gpu_slab_size` (#7133)

* Cleaup perfect join hash table init

* Cleanup baseline join hashtable init

* Add rowid size getter

* Introduce hash table entry info class in perfect join hash table

* Introduce hash table entry info class in baseline join hash table

* Introduce hash table entry info class in overlaps and range join hash tables

* Fixup init log related to hash table layout

* Fixup hash table init on an empty table

* Control the maximum hash table by # hash entries for CPU and its size for GPU

* Address comment #1: revert partial_err checking logic in baseline join hash table builder

* Address comment #2

Signed-off-by: jack <jack@omnisci.com>
jack-mapd pushed a commit that referenced this issue Jan 23, 2023
… considering `max_gpu_slab_size` (#7133)

* Cleaup perfect join hash table init

* Cleanup baseline join hashtable init

* Add rowid size getter

* Introduce hash table entry info class in perfect join hash table

* Introduce hash table entry info class in baseline join hash table

* Introduce hash table entry info class in overlaps and range join hash tables

* Fixup init log related to hash table layout

* Fixup hash table init on an empty table

* Control the maximum hash table by # hash entries for CPU and its size for GPU

* Address comment #1: revert partial_err checking logic in baseline join hash table builder

* Address comment #2

Signed-off-by: jack <jack@omnisci.com>
misiugodfrey pushed a commit that referenced this issue Jun 27, 2023
…ion (#7185)

* Add runtime function to create null StringView

* Support projection of none encoded string column after left join

* Address comment #1: create null StringView obj

* Address comment #2: avoid compilation error

* Address comments #3: improve codegen logic

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Jul 12, 2023
…ion (#7185)

* Add runtime function to create null StringView

* Support projection of none encoded string column after left join

* Address comment #1: create null StringView obj

* Address comment #2: avoid compilation error

* Address comments #3: improve codegen logic

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Aug 28, 2023
* Renaming OverlapsJoinHashTable class to BoundingBoxIntersectJoinHashTable

* Rename global flag g_enable_overlaps_hashjoin to g_enable_bbox_intersect_hashjoin

* Rename overlaps hash join tuner

* Rename kOVERLAPS SQL_OP

* Fixup serialization type toString

* Rename convertOverlaps

* Rename cache types

* Rename is_overlaps_oper() function

* Rename query rewriter logic

* Change server configurations v2

* Change namings in query rewriter

* Completely remove overlaps keyword used in codebase

* Address comments #1

* Address comment #2

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Sep 1, 2023
* Renaming OverlapsJoinHashTable class to BoundingBoxIntersectJoinHashTable

* Rename global flag g_enable_overlaps_hashjoin to g_enable_bbox_intersect_hashjoin

* Rename overlaps hash join tuner

* Rename kOVERLAPS SQL_OP

* Fixup serialization type toString

* Rename convertOverlaps

* Rename cache types

* Rename is_overlaps_oper() function

* Rename query rewriter logic

* Change server configurations v2

* Change namings in query rewriter

* Completely remove overlaps keyword used in codebase

* Address comments #1

* Address comment #2

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Sep 26, 2023
* Fixup the function's logic

* Address comments

* Address comments #2

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Sep 26, 2023
* Determine the size of projection buffer based on filtered per-device cardinality when applicable

* Address comments

* Add logging about mem alloc

* Add loggig about cardinality estimation

* Address comments #1

* Address comments #2

* Disable dist test; cannot access executor instance from QR

* Add test case for sharded table (in both single and dist)

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Sep 26, 2023
* Rename device bitmap ptr variable and its getter

* Rename count distinct bitmap host mem ptr getter

* Rename count_distinct device mem ptr variable

* Improve buf allocation logic for count distinct

* Cleanup code & improve logic for mode and tdigest

* Cleanup logic v2

* Address comments

* Address comments #2

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Sep 28, 2023
* Fixup the function's logic

* Address comments

* Address comments #2

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Sep 28, 2023
* Determine the size of projection buffer based on filtered per-device cardinality when applicable

* Address comments

* Add logging about mem alloc

* Add loggig about cardinality estimation

* Address comments #1

* Address comments #2

* Disable dist test; cannot access executor instance from QR

* Add test case for sharded table (in both single and dist)

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Sep 28, 2023
* Rename device bitmap ptr variable and its getter

* Rename count distinct bitmap host mem ptr getter

* Rename count_distinct device mem ptr variable

* Improve buf allocation logic for count distinct

* Cleanup code & improve logic for mode and tdigest

* Cleanup logic v2

* Address comments

* Address comments #2

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Oct 12, 2023
* Fixup the function's logic

* Address comments

* Address comments #2

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Oct 12, 2023
* Determine the size of projection buffer based on filtered per-device cardinality when applicable

* Address comments

* Add logging about mem alloc

* Add loggig about cardinality estimation

* Address comments #1

* Address comments #2

* Disable dist test; cannot access executor instance from QR

* Add test case for sharded table (in both single and dist)

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
misiugodfrey pushed a commit that referenced this issue Aug 27, 2024
* Add buffer holders for GPU execution

* Rename structures used to codegen

* Introduce WindowFunctionCtx namespace

* Add preparation for GPU execution in window ctx

* Cleanup & improve WindowFunctionContext::compute()

* Improve a logic to build aggregate tree w/ supporting reusing

* Improve segment tree constructor

* Rebase

* Address comments #1

* Address comments #2: refactor bool param functions

* Address comments #3

* Address comments #4: tbb

* Address comments #5

* Address comments #6

* Fixup test failures

Signed-off-by: Misiu Godfrey <misiu.godfrey@kraken.mapd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants