issues/2: adding hybrid solution for hoisting constants #145

pressenna · 2017-12-28T10:57:17Z

This is the initial hybrid solution.
It is far from perfect, but it hoisted constants correctly (in most cases, except for CASE) into the query functions .entry block.

valid constanstants which are then hoisted into the query functions .entry block.

shtilman

Search for hoisted variables, multiple flavors of codegenHoistedConstants, Scalar visitor that codegens as it goes.. Seems a bit overengineered. Could you clean it up a bit and maybe describe the steps you're going through?

The way I thought about this hoisting was quite straightforward:

visit and collect Analyzer::Constants
when building the .entry block - go through that list and codegen these constants - that will fill the .literals payload and generate appropriate loads in the entry block
maintain a map that for each Constant would establish its relationship to codegen-ed llvm value (load) and also corresponding row_func arg
then in row_func whenever you need to codegen a Constant you first look in that map: if it's in, you pick the row_func arg it's tied to.

And all this should of course be guarded by hoist_literals flag.

shtilman · 2018-01-09T13:09:17Z

QueryEngine/QueryTemplateGenerator.cpp

@@ -300,12 +302,26 @@ llvm::Function* query_template(llvm::Module* mod,
  Value* error_code = &*(++query_arg_it);
  error_code->setName("error_code");

+  // WARNING: this does not work with the LICM optimizer... the generated IR seems correct, but it SIGSEGVs during peephole optimisation
+  // BasicBlock* bb_hoisted = BasicBlock::Create(mod->getContext(), ".hoisted_variables", query_func_ptr, 0);


Again, a separate block is not needed here - not to mention that we wouldn't want to create it if there's nothing to hoist, and it's a good idea to let the block called .entry to remain the entry block.

shtilman · 2018-01-09T13:09:36Z

QueryEngine/QueryTemplateGenerator.cpp

  auto bb_entry = BasicBlock::Create(mod->getContext(), ".entry", query_func_ptr, 0);
+  BasicBlock* bb_hoisted = bb_entry;


Not needed.

shtilman · 2018-01-09T13:09:47Z

QueryEngine/QueryTemplateGenerator.cpp

  auto bb_preheader = BasicBlock::Create(mod->getContext(), ".loop.preheader", query_func_ptr, 0);
  auto bb_forbody = BasicBlock::Create(mod->getContext(), ".for.body", query_func_ptr, 0);
  auto bb_crit_edge = BasicBlock::Create(mod->getContext(), "._crit_edge", query_func_ptr, 0);
  auto bb_exit = BasicBlock::Create(mod->getContext(), ".exit", query_func_ptr, 0);

+  if (NULL != bb_hoisted) {


Check not needed, call generator on bb_entry

shtilman · 2018-01-09T13:17:22Z

QueryEngine/NativeCodegen.cpp

  const bool is_group_by{!query_mem_desc.group_col_widths.empty()};
  auto query_func =
      is_group_by ? query_group_by_template(cgen_state_->module_,
                                            is_nested_,
                                            co.hoist_literals_,
                                            query_mem_desc,
                                            co.device_type_,
-                                            ra_exe_unit.scan_limit)
+                                            ra_exe_unit.scan_limit, visitor_to_use)


Please clang-format your changes

shtilman · 2018-01-09T13:53:58Z

QueryEngine/ConstantIR.cpp

+llvm::Value* findHoistedValueInBlock(llvm::Function* function, std::string name) {
+  // WARNING: this does not work with the LICM optimizer... the generated IR seems correct, but it SIGSEGVs during peephole optimisation
+//  llvm::BasicBlock* hoistedVariablesBlock = find_BasicBlock_by_name(function, ".hoisted_variables");
+  llvm::BasicBlock* hoistedVariablesBlock = find_BasicBlock_by_name(function, ".entry");


As we discussed we wanted to generate literal loads in the entry block, no need in extra blocks.
Function's entry block should remain the first in BB list, you shouldn't search for it, just calling front() should be fine. Not sure why you need to search for hoisted values by name..

shtilman · 2018-01-10T07:31:17Z

QueryEngine/NativeCodegen.cpp

@@ -657,24 +658,27 @@ std::vector<llvm::Value*> generate_column_heads_load(const int num_columns,
                                                     llvm::Function* query_func,
                                                     llvm::LLVMContext& context) {
  auto max_col_local_id = num_columns - 1;
-  auto& fetch_bb = query_func->front();
+  llvm::BasicBlock * entryBlock = find_BasicBlock_by_name(query_func, ".entry");
+  llvm::BasicBlock& fetch_bb(*entryBlock);


Again, no need to look for .entry, it's the first block, front() should work just fine.

shtilman · 2018-01-10T07:32:41Z

QueryEngine/NativeCodegen.cpp

+  {
+    // WARNING: this does not work with the LICM optimizer... the generated IR seems correct, but it SIGSEGVs during peephole optimisation
+//    llvm::BasicBlock* hoistedVarsBlock = find_BasicBlock_by_name(query_func, ".hoisted_variables");
+    llvm::BasicBlock* hoistedVarsBlock = find_BasicBlock_by_name(query_func, ".entry");


query_func->front()

shtilman · 2018-01-10T07:38:40Z

QueryEngine/QueryTemplateGenerator.cpp

+
+    if (bb_hoisted != bb_entry) {
+      BranchInst::Create(bb_entry, bb_hoisted);
+    }


Branch not needed.

shtilman · 2018-01-10T07:40:06Z

QueryEngine/QueryTemplateGenerator.cpp

@@ -583,12 +600,26 @@ llvm::Function* query_group_by_template(llvm::Module* mod,
  Value* error_code = &*(++query_arg_it);
  error_code->setName("error_code");

+  // WARNING: this does not work with the LICM optimizer... the generated IR seems correct, but it SIGSEGVs during peephole optimisation
+  // BasicBlock* bb_hoisted = BasicBlock::Create(mod->getContext(), ".hoisted_variables", query_func_ptr, 0);


Same for this template, .entry is entry

shtilman · 2018-01-10T08:00:44Z

QueryEngine/ConstantIR.cpp

 std::vector<llvm::Value*> Executor::codegenHoistedConstants(const std::vector<const Analyzer::Constant*>& constants,
                                                            const EncodingType enc_type,
                                                            const int dict_id) {
  CHECK(!constants.empty());
  const auto& type_info = constants.front()->get_type_info();
-  auto lit_buff_lv = get_arg_by_name(cgen_state_->row_func_, "literals");
-  int16_t lit_off{-1};
+  int64_t lit_off{-1};


Could you describe why you decided to revamp this function? I see different versions.. What is the workflow?

codegenHoistedConstantsInPlace
Creates the literal load from the literals buffer in place, effectively the original codegenHoistedConstants enhanced to also set the names for IR nodes accordingly

codegenHoistedConstants
This is the original entry point, it has been changed to first check if the Constant was already hoisted (by checking presence in the function argument list). If not, it calls codegenHoistedConstantsInPlace to generate the load.

codegenHoistedConstantsInEntryBlock (just renamed it)
New entry point for the generation of the loads in the .entry block. It only creates the load, if the Constand had not been hoisted before.

pressenna · 2018-01-10T16:45:40Z

Hey @shtilman ,

Sorry for the poor documentation on the change... will try to avoid this in the future.

The basic idea was as follows:

Scan the RA Tree for Constants via the visitor
Generate loads for found and valid Constants into the .entry block
This is being done directly while scanning the RA Tree, because sometimes the parent RA node is required for correct type information for code generation. Doing this only on the Constants caused quite a few Tests to fail (especially where varchars were involved).
Use the .entry block's IR as an indirect form of mapping
The gen context already maintains a Constant -> offset mapping
and finally when generating row_func it looks up the the right argument by name and, if found, uses it, otherwise generates a new load from the literals buffer.

This only works because duplicate Constants are eliminated during code generation, so the offset into the literals buffer is identical for Constants equal by value, and thus the names can be used for looking them up.

The rational for using the .entry block directly as mapping was my concern with the IN operator.
It has a hard limit of 5 million entries, so was trying to avoid maintaining too many mappings for the same thing (as the de-duplication already does this in some form: Constant -> offset ).

The hoist_literals flag is still being considered. If it is not set, then no constants will be hoisted.
This is achieved by passing the relevant methods a NULL instead of an instance of ConstantHoistingVisitor.

Please advice how to proceed.

shtilman · 2018-01-13T00:46:08Z

Ok, let me dig a bit deeper into your approach.

pressenna · 2018-01-13T10:04:36Z

I had a revisit as well and I think your are right to introduce a mapping between Constans (or their offset) and the function arguments / .basic_block literals load instructions.

It turns out that I am currently doing a linear search (by name) for the function arguments and in the .entry block for hoisted literal loads. I was hoping to do a binary search, but sadly llvm::SymbolTableList is a linked list, which sort of diminishes the benefits of binary search.

Please bear with me until I make the amendments.

asuhan · 2018-01-13T21:07:40Z

Why would you even care about a binary search? We're talking maybe 100 constants at most.

pressenna · 2018-01-13T22:11:28Z

Maybe I am missing something, but looking at the RelAlgTranslator::translateInOper I had the impression that it can create quite a few Constants instances. There is a special path for integers and dict encoded strings, but for decimals and floats it would take the hit and create as many Constants as there are results in the result set. Let me see if I can create a test case for this scenario.

asuhan · 2018-01-14T00:00:37Z

Sure, but there is a lot of linear-time work to be done anyway during code generation, not sure what the impact would be. To be honest, I'm not entirely convinced we'd survive generating a kernel with enough constants for binary search to make a difference anyway. Also, in terms of use cases, the right hand side of an IN expression should be something which makes sense to enumerate (user names or ids, for example), which isn't really the case for floating point and decimals. So I'm not particularly bothered by the lack of binary search and would rather take simpler code.

pressenna · 2018-01-14T11:43:02Z

Yes, makes sense. In that case I am going to leave it as it is for now, so as not to create too much deltas for a review.

shtilman

Hi @psockali,

Sorry for the pause.

I'm not sure about this fused visitor/codegen that tries to cover everything, and these repeated searches by prefix. Feels a bit fragile and heavy. How can we simplify it?

What if we abandon the visitor idea altogether and just let codegen go through all of the incoming Constants as it did before - getting the old offsets or adding new offsets as before, but then instead of generating corresponding loads it would generate named offset-based value references, accumulating a list of (offset, type) tuples along the way. Then this list would be used to generate all needed literal loads in query_func's entry block and adjust the declaration of row_func, and the Call as well.

pressenna · 2018-01-22T16:00:24Z

Hey @shtilman,

this whole 'Spagat' only made sense with the assumption that there might be up to 5 million Constants (by using the IN operator). Now that assumption has been rendered void. It all feels like doing an initial pass to capture things and a final pass to generate the target IR seems most reasonable.
How do you feel about trashing this PR and instead refocusing on two/multi pass based IR generation, with constants hoisting as the initial benefactor of the multi-pass paradigm?

asuhan · 2018-01-22T17:59:44Z

In which situation we care about could the IN operator create over 5 million constants? For integer and strings we generate a bitmap instead. Can we throw an exception if the IN operator generates 5+ million constants and still work under the same assumption?

pressenna · 2018-01-22T18:29:34Z

@asuhan, I am not sure I am following you correctly.

The check is already there (https://github.com/mapd/mapd-core/blob/647a0efd7d931aa27105652796aa7416f19b1dc4/QueryEngine/RelAlgTranslator.cpp#L369), but I am not sure if g_enable_watchdog is true.

And I thought that decimal/floats are virtually not a use case.

Are you saying that a single pass with a retrospective mutation of the query_function's declaration (including a search and replace of the query_function's IR) outweighs the complexity it introduces?

I mean, if there are at most about 100 hoisted literals, I doubt it would make a big difference.

Either way, please advise if I should attempt the single pass approach...

This reverts commit 027e47e.

This reverts commit 4781182.

This reverts commit 3380b27.

…pfront for valid constanstants which are then hoisted into the query functions .entry block." This reverts commit 390cf4a.

shtilman · 2018-01-22T21:41:16Z

@psockali it seems to me that we need to tie constant hoisting to the actual [staged] codegen, which would solve the problem of on-the-fly constants and also subquery constants, since hoisting would be staged too. All relevant constants would be injected into each query's compilation, after all subqueries have been compiled and executed.

My proposal is to continue filling up literals_ buffer but defer literal load generation. Or, perhaps, redirect generation of literal loads to the query func's entry block, i.e. whenever a constant is codegened, query_func's entry block would get the load from literal buffer (if it hasn't yet been loaded) and row_func would get an appropriately named arg value. You know the offset and the type of each literal so you can then connect the dots in the IR before compiling it.

If number of constants gets too big you could throw an exception or you could stop hoisting.

asuhan · 2018-01-23T04:18:21Z

@psockali g_enable_watchdog is true by default, we only disable it for unit tests and various experiments. I'd rather reject the 5M+ case altogether (even with watchdog disabled) than make heroic efforts for a kernel which might even fail to compile or do so very slowly -- 5M+ instructions is a lot.

pressenna · 2018-01-23T09:04:52Z

But @asuhan, that is what I am saying. I misread the code. It actually does not create 5M load instructions, that is an upper limit for the elements in the bit-set. Either way, I see if I can get the modification of the function declaration / signature working.

asuhan · 2018-01-23T09:08:02Z

@psockali In cases where a bitmap cannot be used (floats for example) it would. There would be 5M+ Analyzer::Constant expressions generating the same number of constants to be loaded from the literal buffer when the IN expression is generated.

shtilman · 2018-01-24T19:29:59Z

@psockali modifying row_func signature may get messy. You may want to consider switching to global variables with internal linkage. For each new offset you could create a literal_<offset> global visible to the entire module (that is, if it hasn't been created yet - call getNamedGlobal(name) to check). If the module doesn't have it yet - 1. create it 2. then define it with a load from literals_ in query_func()'s entry block and 3. use it wherever you are in row_func(). If module already has it - skip steps 1. and 2.
Look at the resulting IR. Global optimizer should be able to handle these globals after row_func() is inlined, they should all be localized.
If that works well - no need to open Pandora's box messing with function signatures.

pressenna · 2018-01-28T04:59:22Z

@asuhan: Would it be OK if I disable the float/decimal use-case for the IN operator?
@shtilman: I was hoping to use the function copy helper to change the signature, but I shall try global variables first.

asuhan · 2018-01-28T05:24:23Z

@psockali No it wouldn't, but it'd be ok to reject it past a certain threshold (5M is as good of a value as any).

pressenna · 2018-01-28T06:49:07Z

Just noticed a cast from size_t to int16_t when loading values from the literals buffer...

https://github.com/mapd/mapd-core/blob/245f9ffe9e036f88171555c71e63d10827efd9b9/QueryEngine/ConstantIR.cpp#L110

I guess it really does not hold a lot of elements.

pressenna · 2018-01-28T07:40:54Z

@shtilman, i am not sure how to correctly initialize the GlobalVariable, shall I create a correct pointer type and use store/load? This then feels like copying the buffer into a .data segment and then loading the values from that .data segment for each use. Not sure if the optimizer will be able to detect this correctly.

pressenna · 2018-01-28T18:17:55Z

The cast actually limits the buffer to 32K, with the CHECK enforcing this, so there is then already a reasonable upper limit.

shtilman · 2018-01-29T12:14:27Z

@psockali yes, I think a simple store into a global var should work. You first load the value from the literal buffer at specific offset, then store into corresponding global which is named after that offset - this is in the entry block. Then you can use that literal_<offset> global whenever you need literals_[offset].

Make sure you create each global var in the query module, with internal linkage. After row_func is inlined, the access to each global you had created will be limited to a single non-recursive function. Then the GlobalOptimizer pass (which follows the inliner in the pipeline) should take all these internal globals and localize them, i.e. convert them into local allocas.

pressenna · 2018-01-29T18:15:40Z

@shtilman, I did as you said, but it complains at link time.

LLVM ERROR: Program used external function 'global_literal_0' which could not be resolved!

pressenna · 2018-01-29T18:22:09Z

I have also pushed the changes

shtilman · 2018-01-29T20:51:53Z

Have you looked at the IR that you end up generating?
Could you run explain on that failing query and post the result here?

pressenna · 2018-01-30T05:57:27Z

Yes, I did. The IR looks OK. The global variable is still present after the optimisation causing the linker to fails, because I am not passing in sufficient linkage information.

The query I used was:

select cast(-1.1 as int) as x from t2;

The generated bitcode is attached: bitcode.zip

This reverts commit 245a6e0.

literals

pressenna · 2018-01-30T22:35:55Z

@shtilman I have committed now a version that alters the row_funcs signature (or more precise, creates a new row_func_hoisted fuction with the correct signature). This passes all the tests and the optimized IR looks good as well, as far as I can tell.

shtilman · 2018-01-31T16:39:05Z

All right, was hoping globals would simplify the hoisting, provided globalopt worked well. Not sure why globalopt failed to localize them, there are a few straightforward conditions each global should meet, perhaps globalvar declarations needed a bit of massaging.. Don't have time to investigate, we'll do signature change.

shtilman · 2018-01-31T11:45:01Z

QueryEngine/NativeCodegen.cpp

@@ -1047,6 +1047,10 @@ Executor::CompilationResult Executor::compileWorkUnit(const std::vector<InputTab
  bind_pos_placeholders("group_buff_idx", false, query_func, cgen_state_->module_);
  bind_pos_placeholders("pos_step", false, query_func, cgen_state_->module_);

+  cgen_state_->query_func_ = query_func;
+  cgen_state_->query_func_entry_ir_builder_.SetInsertPoint(&query_func->front(),


You can use &query_func->getEntryBlock() instead

The create_row_func() method already adds some instructions, thus I am using this mean to ensure that literals resolution ends up as the first instructions.

Sure. What I meant was replacing query_func->front() with query_func->getEntryBlock().
You could point that irbuilder to the very beginning of the entry block like this:

cgen_state_->query_func_entry_ir_builder_.SetInsertPoint(&query_func->getEntryBlock().front());

changed it accordingly

shtilman · 2018-01-31T11:46:59Z

QueryEngine/ConstantIR.cpp

@@ -100,8 +100,10 @@ std::vector<llvm::Value*> Executor::codegenHoistedConstants(const std::vector<co
                                                            const EncodingType enc_type,
                                                            const int dict_id) {
  CHECK(!constants.empty());
+
+  int64_t initial_watermark = cgen_state_->literal_bytes_high_watermark(0);


Watermarks seem redundant.

shtilman · 2018-01-31T16:41:50Z

QueryEngine/ConstantIR.cpp

+  int64_t allocated_watermark = cgen_state_->literal_bytes_high_watermark(0);
+  std::string literal_name = "literal_" + std::to_string(lit_off);
+
+  if (allocated_watermark == initial_watermark) {


Again, watermarks don't seem to be needed. Could just check if defined_literals_ map contains something for this particular lit_off.

This was just some form of precaution.
If something was already present in the literals buffer, I expect it to be present in the defined_literals map. I change it to only check against the map.

shtilman · 2018-01-31T17:02:58Z

QueryEngine/NativeCodegen.cpp

+  if (co.hoist_literals_ && !cgen_state_->defined_literals_.empty()) {
+    // we have some literals...
+
+    // create a new row_function with the literals as argument


Need a better comment, something like

// row_func_ is using literals whose defs have been hoisted up to the query_func_, // extend row_func_ signature to include extra args to pass these literal values.

yes, will do

shtilman · 2018-01-31T17:09:43Z

QueryEngine/NativeCodegen.cpp

+
+    auto ft = llvm::FunctionType::get(get_int_type(32, cgen_state_->context_), row_process_arg_types, false);
+    auto row_func_hoisted = llvm::Function::Create(
+        ft, llvm::Function::ExternalLinkage, "row_func_hoisted", cgen_state_->row_func_->getParent());


Maybe give it a clearer name, e.g. "row_func_hoisted_literals" ?

yes, makes sense

issues/2: adding hybrid solution in which the RA is scanned upfront for

390cf4a

valid constanstants which are then hoisted into the query functions .entry block.

pressenna mentioned this pull request Dec 28, 2017

Improve hoisted literals code generation #2

Open

shtilman reviewed Jan 10, 2018

View reviewed changes

psockali added 3 commits January 10, 2018 13:44

- renaming var_hoisted_id to literal_id

3380b27

further refactoring

4781182

better naming of methods

027e47e

shtilman reviewed Jan 22, 2018

View reviewed changes

Pressenna Sockalingasamy added 4 commits January 22, 2018 21:35

Revert "better naming of methods"

adfec8c

This reverts commit 027e47e.

Revert "further refactoring "

4f93c94

This reverts commit 4781182.

Revert "- renaming var_hoisted_id to literal_id"

8664f1a

This reverts commit 3380b27.

Revert "issues/2: adding hybrid solution in which the RA is scanned u…

914adb5

…pfront for valid constanstants which are then hoisted into the query functions .entry block." This reverts commit 390cf4a.

issues/2: using global variables as intermediaries

245a6e0

Pressenna Sockalingasamy added 2 commits January 30, 2018 19:05

Revert "issues/2: using global variables as intermediaries"

2a8eae8

This reverts commit 245a6e0.

issues/2: altering function signature in order to pass in the hoisted

41097b5

literals

shtilman suggested changes Jan 31, 2018

View reviewed changes

Pressenna Sockalingasamy added 2 commits January 31, 2018 20:26

adjustments after code review

bd62c6c

using simpler method

d920459

pressenna closed this Apr 1, 2018

		auto bb_entry = BasicBlock::Create(mod->getContext(), ".entry", query_func_ptr, 0);
		BasicBlock* bb_hoisted = bb_entry;

issues/2: adding hybrid solution for hoisting constants #145

issues/2: adding hybrid solution for hoisting constants #145

Conversation

pressenna commented Dec 28, 2017

shtilman left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pressenna commented Jan 10, 2018

shtilman commented Jan 13, 2018

pressenna commented Jan 13, 2018

asuhan commented Jan 13, 2018

pressenna commented Jan 13, 2018

asuhan commented Jan 14, 2018

pressenna commented Jan 14, 2018

shtilman left a comment • edited Loading

Choose a reason for hiding this comment

pressenna commented Jan 22, 2018

asuhan commented Jan 22, 2018

pressenna commented Jan 22, 2018

shtilman commented Jan 22, 2018

asuhan commented Jan 23, 2018

pressenna commented Jan 23, 2018

asuhan commented Jan 23, 2018

shtilman commented Jan 24, 2018 • edited Loading

pressenna commented Jan 28, 2018

asuhan commented Jan 28, 2018

pressenna commented Jan 28, 2018

pressenna commented Jan 28, 2018

pressenna commented Jan 28, 2018

shtilman commented Jan 29, 2018

pressenna commented Jan 29, 2018

pressenna commented Jan 29, 2018

shtilman commented Jan 29, 2018

pressenna commented Jan 30, 2018

pressenna commented Jan 30, 2018

shtilman commented Jan 31, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shtilman left a comment •

edited

Loading

shtilman left a comment •

edited

Loading

shtilman commented Jan 24, 2018 •

edited

Loading