issues/2: adding hybrid solution for hoisting constants #145
valid constants which are then hoisted into the query function's .entry block.
Search for hoisted variables, multiple flavors of codegenHoistedConstants, Scalar visitor that codegens as it goes.. Seems a bit overengineered. Could you clean it up a bit and maybe describe the steps you're going through?
The way I thought about this hoisting was quite straightforward:
- visit and collect Analyzer::Constants
- when building the .entry block - go through that list and codegen these constants - that will fill the .literals payload and generate appropriate loads in the entry block
- maintain a map that for each Constant would establish its relationship to codegen-ed llvm value (load) and also corresponding row_func arg
- then in row_func whenever you need to codegen a Constant you first look in that map: if it's in, you pick the row_func arg it's tied to.
And all this should of course be guarded by hoist_literals flag.
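The steps above can be sketched without LLVM, purely as bookkeeping. This is a minimal illustration, not the PR's code: `Constant`, `HoistedLiteral` and `LiteralHoister` are hypothetical stand-ins, a "load" is represented only by its name, and constants are keyed by their literal-buffer offset on the assumption that equal constants share an offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for Analyzer::Constant: only its literal-buffer offset matters here.
struct Constant {
  int64_t lit_off;
};

// What the entry block produced for one constant: the name of the load
// and the index of the row_func argument that carries the loaded value.
struct HoistedLiteral {
  std::string load_name;
  std::size_t row_func_arg;
};

class LiteralHoister {
 public:
  explicit LiteralHoister(bool hoist_literals) : hoist_literals_(hoist_literals) {}

  // Steps 1-3: walk the collected constants once, record one load per distinct
  // offset, and remember which row_func argument receives that load.
  void buildEntryBlock(const std::vector<Constant>& collected, std::size_t first_free_arg) {
    if (!hoist_literals_) {
      return;  // guarded by the hoist_literals flag
    }
    for (const auto& c : collected) {
      if (hoisted_.count(c.lit_off) != 0) {
        continue;  // duplicate constant, already hoisted
      }
      const std::size_t arg = first_free_arg + entry_loads_.size();
      const std::string name = "literal_" + std::to_string(c.lit_off);
      entry_loads_.push_back(name);
      hoisted_.emplace(c.lit_off, HoistedLiteral{name, arg});
    }
    assert(entry_loads_.size() == hoisted_.size());
  }

  // Step 4: inside row_func, look the constant up first.
  // nullptr means it was not hoisted and has to be generated in place.
  const HoistedLiteral* lookup(const Constant& c) const {
    const auto it = hoisted_.find(c.lit_off);
    return it == hoisted_.end() ? nullptr : &it->second;
  }

  const std::vector<std::string>& entryLoads() const { return entry_loads_; }

 private:
  bool hoist_literals_;
  std::vector<std::string> entry_loads_;
  std::unordered_map<int64_t, HoistedLiteral> hoisted_;
};
```

With the flag off, `lookup` always returns `nullptr`, so the row_func codegen falls back to whatever it did before hoisting.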
@@ -300,12 +302,26 @@ llvm::Function* query_template(llvm::Module* mod, | |||
Value* error_code = &*(++query_arg_it); | |||
error_code->setName("error_code"); | |||
|
|||
// WARNING: this does not work with the LICM optimizer... the generated IR seems correct, but it SIGSEGVs during peephole optimisation | |||
// BasicBlock* bb_hoisted = BasicBlock::Create(mod->getContext(), ".hoisted_variables", query_func_ptr, 0); |
Again, a separate block is not needed here - not to mention that we wouldn't want to create it if there's nothing to hoist, and it's a good idea to let the block called .entry remain the entry block.
auto bb_entry = BasicBlock::Create(mod->getContext(), ".entry", query_func_ptr, 0); | ||
BasicBlock* bb_hoisted = bb_entry; |
Not needed.
auto bb_preheader = BasicBlock::Create(mod->getContext(), ".loop.preheader", query_func_ptr, 0); | ||
auto bb_forbody = BasicBlock::Create(mod->getContext(), ".for.body", query_func_ptr, 0); | ||
auto bb_crit_edge = BasicBlock::Create(mod->getContext(), "._crit_edge", query_func_ptr, 0); | ||
auto bb_exit = BasicBlock::Create(mod->getContext(), ".exit", query_func_ptr, 0); | ||
|
||
if (NULL != bb_hoisted) { |
Check not needed, call generator on bb_entry
QueryEngine/NativeCodegen.cpp
Outdated
const bool is_group_by{!query_mem_desc.group_col_widths.empty()}; | ||
auto query_func = | ||
is_group_by ? query_group_by_template(cgen_state_->module_, | ||
is_nested_, | ||
co.hoist_literals_, | ||
query_mem_desc, | ||
co.device_type_, | ||
ra_exe_unit.scan_limit) | ||
ra_exe_unit.scan_limit, visitor_to_use) |
Please clang-format your changes
QueryEngine/ConstantIR.cpp
Outdated
llvm::Value* findHoistedValueInBlock(llvm::Function* function, std::string name) { | ||
// WARNING: this does not work with the LICM optimizer... the generated IR seems correct, but it SIGSEGVs during peephole optimisation | ||
// llvm::BasicBlock* hoistedVariablesBlock = find_BasicBlock_by_name(function, ".hoisted_variables"); | ||
llvm::BasicBlock* hoistedVariablesBlock = find_BasicBlock_by_name(function, ".entry"); |
As we discussed, we wanted to generate literal loads in the entry block; there is no need for extra blocks.
The function's entry block should remain the first in the BB list, so you shouldn't search for it; just calling front() should be fine. Not sure why you need to search for hoisted values by name.
QueryEngine/NativeCodegen.cpp
Outdated
@@ -657,24 +658,27 @@ std::vector<llvm::Value*> generate_column_heads_load(const int num_columns, | |||
llvm::Function* query_func, | |||
llvm::LLVMContext& context) { | |||
auto max_col_local_id = num_columns - 1; | |||
auto& fetch_bb = query_func->front(); | |||
llvm::BasicBlock * entryBlock = find_BasicBlock_by_name(query_func, ".entry"); | |||
llvm::BasicBlock& fetch_bb(*entryBlock); |
Again, no need to look for .entry, it's the first block; front() should work just fine.
QueryEngine/NativeCodegen.cpp
Outdated
{ | ||
// WARNING: this does not work with the LICM optimizer... the generated IR seems correct, but it SIGSEGVs during peephole optimisation | ||
// llvm::BasicBlock* hoistedVarsBlock = find_BasicBlock_by_name(query_func, ".hoisted_variables"); | ||
llvm::BasicBlock* hoistedVarsBlock = find_BasicBlock_by_name(query_func, ".entry"); |
query_func->front()
|
||
if (bb_hoisted != bb_entry) { | ||
BranchInst::Create(bb_entry, bb_hoisted); | ||
} |
Branch not needed.
@@ -583,12 +600,26 @@ llvm::Function* query_group_by_template(llvm::Module* mod, | |||
Value* error_code = &*(++query_arg_it); | |||
error_code->setName("error_code"); | |||
|
|||
// WARNING: this does not work with the LICM optimizer... the generated IR seems correct, but it SIGSEGVs during peephole optimisation | |||
// BasicBlock* bb_hoisted = BasicBlock::Create(mod->getContext(), ".hoisted_variables", query_func_ptr, 0); |
Same for this template, .entry is entry
QueryEngine/ConstantIR.cpp
Outdated
std::vector<llvm::Value*> Executor::codegenHoistedConstants(const std::vector<const Analyzer::Constant*>& constants, | ||
const EncodingType enc_type, | ||
const int dict_id) { | ||
CHECK(!constants.empty()); | ||
const auto& type_info = constants.front()->get_type_info(); | ||
auto lit_buff_lv = get_arg_by_name(cgen_state_->row_func_, "literals"); | ||
int16_t lit_off{-1}; | ||
int64_t lit_off{-1}; |
Could you describe why you decided to revamp this function? I see different versions.. What is the workflow?
Hey @shtilman, sorry for the poor documentation on the change... will try to avoid this in the future. The basic idea was as follows:
- codegenHoistedConstantsInPlace
Creates the literal load from the literals buffer in place; effectively the original codegenHoistedConstants, enhanced to also set the names for IR nodes accordingly.
- codegenHoistedConstants
This is the original entry point. It has been changed to first check whether the Constant was already hoisted (by checking for its presence in the function argument list). If not, it calls codegenHoistedConstantsInPlace to generate the load.
- codegenHoistedConstantsInEntryBlock (just renamed it)
New entry point for generating the loads in the .entry block. It only creates the load if the Constant has not been hoisted before.
This only works because duplicate Constants are eliminated during code generation, so the offset into the literals buffer is identical for Constants equal by value, and thus the names can be used for looking them up. The rationale for using the .entry block directly as the mapping was my concern with the IN operator. The hoist_literals flag is still taken into account: if it is not set, no constants will be hoisted. Please advise how to proceed.
Ok, let me dig a bit deeper into your approach. |
I revisited this as well, and I think you are right to introduce a mapping between Constants (or their offsets) and the function arguments / .basic_block literals load instructions. It turns out that I am currently doing a linear search (by name) over the function arguments and in the .entry block for hoisted literal loads. I was hoping to do a binary search, but sadly llvm::SymbolTableList is a linked list, which rather diminishes the benefits of binary search. Please bear with me until I make the amendments.
Why would you even care about a binary search? We're talking maybe 100 constants at most. |
Maybe I am missing something, but looking at the |
Sure, but there is a lot of linear-time work to be done anyway during code generation, not sure what the impact would be. To be honest, I'm not entirely convinced we'd survive generating a kernel with enough constants for binary search to make a difference anyway. Also, in terms of use cases, the right hand side of an |
Yes, makes sense. In that case I am going to leave it as it is for now, so as not to create too many deltas for a review.
Hi @psockali,
Sorry for the pause.
I'm not sure about this fused visitor/codegen that tries to cover everything, and these repeated searches by prefix. Feels a bit fragile and heavy. How can we simplify it?
What if we abandon the visitor idea altogether and just let codegen go through all of the incoming Constants as it did before - getting the old offsets or adding new offsets as before, but then instead of generating corresponding loads it would generate named offset-based value references, accumulating a list of (offset, type) tuples along the way. Then this list would be used to generate all needed literal loads in query_func's entry block and adjust the declaration of row_func, and the Call as well.
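A rough sketch of that single-pass idea, again as plain bookkeeping rather than LLVM calls. Everything here is illustrative: `LitType`, `LiteralRef`, `referenceLiteral` and `extendedRowFuncArgs` are made-up names, and the `literal_<offset>` naming is borrowed from the diff. The lookup is a deliberate linear scan, on the assumption discussed in this thread that the number of hoisted literals stays small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type tag for a literal; the real code would carry SQL type info.
enum class LitType { Int32, Int64, Double };

// One (offset, type) tuple per distinct literal referenced by row_func.
using LiteralRef = std::pair<int64_t, LitType>;

std::string literalName(int64_t offset) {
  return "literal_" + std::to_string(offset);
}

// Called where a load used to be emitted: records the tuple once and returns
// the name of the offset-based value reference to use in row_func instead.
std::string referenceLiteral(std::vector<LiteralRef>& refs, int64_t offset, LitType type) {
  assert(offset >= 0);
  const LiteralRef ref{offset, type};
  if (std::find(refs.begin(), refs.end(), ref) == refs.end()) {
    refs.push_back(ref);
  }
  return literalName(offset);
}

// After the row_func body is generated: the adjusted declaration is the
// original argument list plus one extra argument per accumulated literal.
// The same list drives the loads in query_func's entry block and the Call.
std::vector<std::string> extendedRowFuncArgs(std::vector<std::string> args,
                                             const std::vector<LiteralRef>& refs) {
  for (const auto& ref : refs) {
    args.push_back(literalName(ref.first));
  }
  return args;
}
```

Because the tuples are appended in first-use order, the entry-block loads, the extra row_func parameters and the call-site operands can all be produced from one traversal of the same list.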
Hey @shtilman, this whole 'Spagat' only made sense with the assumption that there might be up to 5 million Constants (by using the |
In which situation we care about could the |
@asuhan, I am not sure I am following you correctly. The check is already there (https://github.com/mapd/mapd-core/blob/647a0efd7d931aa27105652796aa7416f19b1dc4/QueryEngine/RelAlgTranslator.cpp#L369), but I am not sure if g_enable_watchdog is true. And I thought that decimal/floats are virtually not a use case. Are you saying that a single pass with a retrospective mutation of the query_function's declaration (including a search and replace of the query_function's IR) outweighs the complexity it introduces? I mean, if there are at most about 100 hoisted literals, I doubt it would make a big difference. Either way, please advise if I should attempt the single pass approach... |
@psockali it seems to me that we need to tie constant hoisting to the actual [staged] codegen, which would solve the problem of on-the-fly constants and also subquery constants, since hoisting would be staged too. All relevant constants would be injected into each query's compilation, after all subqueries have been compiled and executed. My proposal is to continue filling up If number of constants gets too big you could throw an exception or you could stop hoisting. |
@psockali |
But @asuhan, that is what I am saying. I misread the code. It actually does not create 5M load instructions; that is an upper limit for the elements in the bit-set. Either way, I'll see if I can get the modification of the function declaration / signature working. |
@psockali In cases where a bitmap cannot be used (floats for example) it would. There would be 5M+ |
@psockali modifying row_func signature may get messy. You may want to consider switching to global variables with internal linkage. For each new offset you could create a |
@psockali No it wouldn't, but it'd be ok to reject it past a certain threshold (5M is as good of a value as any). |
Just noticed a cast from I guess it really does not hold a lot of elements. |
@shtilman, I am not sure how to correctly initialize the |
The cast actually limits the buffer to 32K, with the CHECK enforcing this, so there is then already a reasonable upper limit. |
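To illustrate the limit being discussed: narrowing an offset to `int16_t` caps it at 32767, so a guard in front of the cast bounds the literal buffer. The helper below is a hypothetical sketch, not the project's code; it throws instead of using the project's `CHECK` macro, which is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: narrow a literal-buffer offset to int16_t,
// rejecting anything that would not survive the cast.
int16_t checkedLiteralOffset(std::size_t offset) {
  const auto max_off = static_cast<std::size_t>(std::numeric_limits<int16_t>::max());
  if (offset > max_off) {
    throw std::out_of_range("literal buffer offset does not fit in int16_t");
  }
  const auto narrowed = static_cast<int16_t>(offset);
  assert(narrowed >= 0);  // guaranteed by the range check above
  return narrowed;
}
```

Whether to throw, abort, or simply stop hoisting past the threshold is a policy choice; the sketch only shows where the bound comes from.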
@psockali yes, I think a simple store into a global var should work. You first load the value from the literal buffer at specific offset, then store into corresponding global which is named after that offset - this is in the entry block. Then you can use that Make sure you create each global var in the query module, with internal linkage. After |
@shtilman, I did as you said, but it complains at link time.
|
I have also pushed the changes |
Have you looked at the IR that you end up generating? |
Yes, I did. The IR looks OK. The global variable is still present after the optimisation, causing the linker to fail because I am not passing in sufficient linkage information. The query I used was:
The generated bitcode is attached: bitcode.zip |
@shtilman I have now committed a version that alters the row_func's signature (or, more precisely, creates a new row_func_hoisted function with the correct signature). This passes all the tests and the optimized IR looks good as well, as far as I can tell. |
All right, was hoping globals would simplify the hoisting, provided globalopt worked well. Not sure why globalopt failed to localize them, there are a few straightforward conditions each global should meet, perhaps globalvar declarations needed a bit of massaging.. Don't have time to investigate, we'll do signature change. |
QueryEngine/NativeCodegen.cpp
Outdated
@@ -1047,6 +1047,10 @@ Executor::CompilationResult Executor::compileWorkUnit(const std::vector<InputTab | |||
bind_pos_placeholders("group_buff_idx", false, query_func, cgen_state_->module_); | |||
bind_pos_placeholders("pos_step", false, query_func, cgen_state_->module_); | |||
|
|||
cgen_state_->query_func_ = query_func; | |||
cgen_state_->query_func_entry_ir_builder_.SetInsertPoint(&query_func->front(), |
You can use &query_func->getEntryBlock() instead.
The create_row_func() method already adds some instructions, so I am using this as a means to ensure that the literals resolution ends up as the first instructions.
Sure. What I meant was replacing query_func->front() with query_func->getEntryBlock().
You could point that irbuilder to the very beginning of the entry block like this:
cgen_state_->query_func_entry_ir_builder_.SetInsertPoint(&query_func->getEntryBlock().front());
changed it accordingly
QueryEngine/ConstantIR.cpp
Outdated
@@ -100,8 +100,10 @@ std::vector<llvm::Value*> Executor::codegenHoistedConstants(const std::vector<co | |||
const EncodingType enc_type, | |||
const int dict_id) { | |||
CHECK(!constants.empty()); | |||
|
|||
int64_t initial_watermark = cgen_state_->literal_bytes_high_watermark(0); |
Watermarks seem redundant.
QueryEngine/ConstantIR.cpp
Outdated
int64_t allocated_watermark = cgen_state_->literal_bytes_high_watermark(0); | ||
std::string literal_name = "literal_" + std::to_string(lit_off); | ||
|
||
if (allocated_watermark == initial_watermark) { |
Again, watermarks don't seem to be needed. Could just check if the defined_literals_ map contains something for this particular lit_off.
This was just some form of precaution.
If something was already present in the literals buffer, I expect it to be present in the defined_literals map. I'll change it to only check against the map.
QueryEngine/NativeCodegen.cpp
Outdated
if (co.hoist_literals_ && !cgen_state_->defined_literals_.empty()) { | ||
// we have some literals... | ||
|
||
// create a new row_function with the literals as argument |
Need a better comment, something like
// row_func_ is using literals whose defs have been hoisted up to the query_func_,
// extend row_func_ signature to include extra args to pass these literal values.
yes, will do
QueryEngine/NativeCodegen.cpp
Outdated
|
||
auto ft = llvm::FunctionType::get(get_int_type(32, cgen_state_->context_), row_process_arg_types, false); | ||
auto row_func_hoisted = llvm::Function::Create( | ||
ft, llvm::Function::ExternalLinkage, "row_func_hoisted", cgen_state_->row_func_->getParent()); |
Maybe give it a clearer name, e.g. "row_func_hoisted_literals"?
yes, makes sense
This is the initial hybrid solution.
It is far from perfect, but it hoists constants correctly (in most cases, except for CASE) into the query function's .entry block.