[CIR][Transforms] Move MergeCleanupsPass logic into Op::fold method #690
Conversation
This patch adds CIRGen for downcasting a pointer to the complete object through `dynamic_cast<void *>`. Together with llvm#426, the full functionality of `dynamic_cast` should be supported in CIRGen after this PR merges.
This PR adds support for constant arrays with trailing zeros. The original `CodeGen` does the following: once a constant array contains trailing zeros, a struct with two members is generated: the initialized elements and a `zeroinitializer` for the remaining part. Depending on some conditions, a `memset` or `memcpy` is emitted; in the latter case a global const array is created. We could go this way, but it requires us to implement [features](https://github.com/llvm/clangir/blob/main/clang/lib/CIR/CodeGen/CIRGenDecl.cpp#L182) that are not implemented yet. Another option is to add one more parameter to the `constArrayAttr` and use it during the lowering. So far I chose this way, but if you have any doubts, we can discuss here. So we just emit the constant array as usual and, once there are trailing zeros, lower this array (i.e. an attribute) as a value. I added a couple of tests and will add more once we agree on the approach. For now I marked the PR as a draft.
…nters (llvm#400) This also adds a missing check for whether the pointer returned from `memchr` is null, and changes the result to `last` in that case.
Emit the false-branch of the consteval if statement, if any.
Originally, the location associated with a function is checked to be an `mlir::FileLineColLoc` before the function is lowered to an LLVMIR FuncOp. However, runtime function declarations do not have such locations. This patch further allows `mlir::UnknownLoc` to be associated with a function.
This PR adds support for const structs with bitfields. For now only global structs are supported; support for local ones can be added more or less easily, though there is one ugly thing that needs to be done. So, what is this all about? First of all, as usual, I'm sorry for the big PR, but it's hard to break it down into pieces. The good news is that, at the same time, it's a copy-paste from the original codegen, so no surprises here. Basically, the hardest place to read is `ConstantAggregateBuilder::addBits`, copied with a minimum of changes. The main problem (and frankly speaking I have no idea why it's done this way in the original codegen) is that the data layout is different for such structures; I mean literally another type is used. For instance, the code:
```
struct T {
  int X : 15;
  int Y : 6;
  unsigned Z : 9;
  int W;
};

struct T GV = { 1, 5, 256, -1 };
```
is represented in LLVM IR (with no CIR enabled) as:
```
%struct.T = type { i32, i32 }
%struct.Inner = type { i8, i32 }

@gv = dso_local global { i8, i8, i8, i8, i32 } ...
```
i.e. the global var `GV` looks like a struct of single bytes (up to the last field, which is not a bitfield). My guess is that we want the same behavior in CIR, so we do. The main problem is that we have to treat the same data differently, and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there, and I really wonder where it came from. Anyway, I don't really like this and don't see any good workaround here. Maybe we could add a kind of map to store the correspondence between types and do the bitcast more wisely. The same is true for const structs with bitfields defined locally.
The minimal bug repro:
```
#include <stdbool.h>
#include <stdint.h>

void bar() {
  bool x = true;
  uint8_t y = (uint8_t)x;
}
```
It fails at the verification stage:
```
loc("repro.c":5:24): error: integer width of the output type is smaller or equal to the integer width of the input type
fatal error: error in backend: The pass manager failed to lower CIR to LLVMIR dialect!
```
The problem is that in some cases lowering from CIR emits an invalid zext operation. This PR fixes the issue by emitting llvm.bitcast instead of llvm.zext in such cases.
The change is taken from original codegen.
The issue is that the CIR codegen assumes a function pointer is always the result of a cir.load op. That isn't true: the function pointer may be the result of other operations (e.g. cir.call).
This PR adds support for global compound literals. The implementation is almost the same as in the original codegen. However, the original codegen can reuse the value of an emitted compound-literal global variable when the init expression of a new variable and that variable are the same. It would be easy to implement this feature, but I can't find any test case where it would apply, so I decided to ignore this optimization opportunity to avoid mistakes.
Here is the next step in VLA support. Basically, these changes handle different expressions, like `int (*a[5])[n]` or `sizeof(a[n])`. I took tests from the original `codegen`; they don't check anything, just verify that we don't fail. There is still an issue with proper cleanup: there are cases when `stack_save` doesn't dominate a corresponding `stack_restore`. For example:
```
void test(unsigned x) {
  while (1) {
    char a[x];
    if (x > 5)
      break;
    ++x;
  }
}
```
It looks like `break` here doesn't lead to a `stack_restore`. But I would say this is less related to VLA, though I probably need to fix it as well.
This PR fixes an issue connected with folding a simple boolean expression pattern (e.g. `0 && RHS = 0`). The problem is that the scalar expression emitter always creates a `cir.bool` attribute as the result of the expression, but in some cases the result should be a `cir.int` attribute.
… start/end functionality
The change is taken from the original llvm codegen.
Originally those are only used for debug info generation, so get a bit more specific on what's missing here.
When replacing no-proto functions with their real definition, codegen assumes that only a `cir.call` operation may use the replaced function. That behaviour leads to a compilation error, because a `cir.get_global` op can also use the function to take a pointer to it. This PR handles the `cir.get_global` case and fixes the issue.
This PR adds a missing case to `lowerCirAttrAsValue`.
…r.try The final destination here is to support cir.try_calls that are not within a `try {}` statement in C++. This only affects untested paths that will assert a bit later than before; testcase coming soon.
The second part of the job started in llvm#412, now about local structures. As mentioned previously, the layout of structures with bit-fields initialized with constants sometimes differs from the one originally created in `CIRRecordLayoutBuilder`, which causes `storeOp` verification to fail because a different structure type was used for the allocation. This PR fixes it. An example:
```
typedef struct {
  int a : 4;
  int b : 5;
  int c;
} D;

void bar() {
  D d = {1, 2, 3};
}
```
Well, I can't say I'm proud of these changes; it seems like a type-safety violation, but it looks like the best we can do here. The original codegen doesn't have this problem at all, since there is just a `memcpy` there. I provide the LLVM IR just for reference:
```
%struct.D = type { i16, i32 }

@__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4

; Function Attrs: noinline nounwind optnone uwtable
define dso_local void @bar() #0 {
entry:
  %d = alloca %struct.D, align 4
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false)
  ret void
}
```
In llvm#426 we confirmed that CIR needs a `cir.unreachable` operation to mark unreachable program points [(discussion)](llvm#426 (comment)). This PR adds it.
This PR adds support for evaluating constants in member exprs. The change is taken from original codegen.
- Add it to functions but not yet on calls.
- Add more skeleton for tagging function attributes.
- Testcases

One more incremental step towards cir.try_call outside cir.try scopes.
The next step in inline-assembly support: we add instruction operands! Nothing interesting, just some copy-pasta from the `codegen` with some sort of simplifications for now. Well, I'm not sure `functional-type` is the best way to print operands though it's used in mlir's `InlineAsmOp`. But anyways, maybe you have a better idea. There are two or three steps ahead, so we are not that far from being able to run something!
Without this patch, CIR CodeGen continues to generate code in the same block after `cir.break` and `cir.continue`, which causes a verification error because `cir.break` and `cir.continue` must appear at the end of their blocks. This patch creates a new dangling block after generating `cir.break` and `cir.continue` to fix the issue. This fixes llvm#323.
* New `CastKind::addrspace_cast` for `cir.cast`
* `TargetCIRGenInfo::performAddrSpaceCast` helper for non-constant values only
* CIRGen for address space casting of pointers and references
* Lowering to LLVM
…f.condition, scf.while (llvm#636) This PR introduces CIRConditionLowering and CIRWhileLowering for lowering to scf.
This PR introduces CIRIfOpLowering for lowering cir.if to scf.if.
Assumptions about values having a defining op can be misleading when block arguments are involved.
When loweringPrepare handles cg.var_arg for AArch64, we create multiple basic blocks, but we didn't get the ordering of the blocks in the block list of the parent region right. That is, we didn't make sure the last entry of the block list is the natural last block (exit) of the region. This PR fixes that. Without the fix, FlattenCFGPass fails verification, because CIRScopeOpFlattening in that pass only expects to see a cir.yield op in the last block of the region's block list.
This patch implements the lowering of function calls that receive and return void. In practice, nothing has to be done (at least for the x86 ABI), so this case is used as a primer for the target lowering library since it helps populate the base logic for handling calling convention lowering of function calls.
This patch implements the lowering of function definitions with no arguments and returns. In practice, nothing has to be done (at least for the x86 ABI), so this case is used as a primer for the target lowering library since it helps populate the base logic for handling calling convention lowering of function definitions.
Continue the work of llvm#652 . Test the branch of null pointer expressions with side effects.
…lvm#684) This PR is to fix the issue llvm#658 . Now we can get the correct result using the following command. ``` echo "void test(int *){}" | ./build/Debug/bin/clang -cc1 -triple x86_64-unknown-linux-gnu -fclangir -fno-clangir-direct-lowering -emit-mlir -o - ``` result: ``` module attributes {cir.lang = #cir.lang<c>, cir.sob = #cir.signed_overflow_behavior<undefined>, cir.triple = "x86_64-unknown-linux-gnu", dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<!llvm.ptr, dense<64> : vector<4xi64>>, #dlti.dl_entry<i1, dense<8> : vector<2xi64>>, #dlti.dl_entry<f16, dense<16> : vector<2xi64>>, #dlti.dl_entry<i32, dense<32> : vector<2xi64>>, #dlti.dl_entry<i8, dense<8> : vector<2xi64>>, #dlti.dl_entry<i16, dense<16> : vector<2xi64>>, #dlti.dl_entry<!llvm.ptr<272>, dense<64> : vector<4xi64>>, #dlti.dl_entry<!llvm.ptr<271>, dense<32> : vector<4xi64>>, #dlti.dl_entry<!llvm.ptr<270>, dense<32> : vector<4xi64>>, #dlti.dl_entry<f128, dense<128> : vector<2xi64>>, #dlti.dl_entry<f64, dense<64> : vector<2xi64>>, #dlti.dl_entry<f80, dense<128> : vector<2xi64>>, #dlti.dl_entry<i128, dense<128> : vector<2xi64>>, #dlti.dl_entry<i64, dense<64> : vector<2xi64>>, #dlti.dl_entry<"dlti.stack_alignment", 128 : i64>, #dlti.dl_entry<"dlti.endianness", "little">>, llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"} { func.func @test(%arg0: memref<i32> loc(fused[#loc3, #loc4])) { %alloca = memref.alloca() {alignment = 8 : i64} : memref<memref<i32>> loc(#loc7) memref.store %arg0, %alloca[] : memref<memref<i32>> loc(#loc5) return loc(#loc2) } loc(#loc6) } loc(#loc) ``` And the test/CIR/Lowering/ThroughMLIR/dot.cir now passes the test, so I have removed the XFAIL flag.
Some function attributes are also callsite attributes, for instance nothrow, which means they are going to show up in both. We don't support that just yet, hence this PR. CIR has an `ExtraFuncAttr` attribute that we currently use as part of `FuncOp` (see CIROps.td). This attribute also needs to be added to `CallOp` and `TryCallOp`. Right now, `AddAttributesFromFunctionProtoType` in `CIRGenCall.cpp` fills in `FuncAttrs` but doesn't use it for anything. We should use the `FuncAttrs` result to construct an `ExtraFuncAttr` and add it to the aforementioned call operations.
I also noticed that #482 has not been updated for a long time. If needed, I can implement the logic in Op::fold.
Nice catch! Why does this process happen twice? Under which circumstances does the double folding get triggered? Is flattening also calling into fold operations? Your solution seems fine, but I'm afraid we need to understand the problem to prevent this "double folding" from popping up whenever we introduce new passes.
This shouldn't need to happen. I can't tell from a quick look; sounds like you need the debugger to the rescue.
Right, if you lower out of structured control flow you lose a bunch of the MLIR nice stuff.
I'd work on testcases that make sense for you or your goal, instead of just trying to reach some type of lowering completeness out of CIR. I'd expect code full of gotos not to be the most interesting stuff someone would want to use MLIR for.
On another note, this PR is implementing both scope and switch. If only one of these operations is causing the said problems, I'd suggest you first land the part that matters; the "harder parts" you mention in the description can be left to another PR.
The double folding occurs during applyPartialConversion (ToLLVM or ToMLIR). In applyPartialConversion, there is an attempt to legalize an Op. clangir/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp Lines 3565 to 3566 in 47df94c
clangir/mlir/lib/Transforms/Utils/DialectConversion.cpp Lines 3609 to 3615 in 55e1496
In convertOperations: clangir/mlir/lib/Transforms/Utils/DialectConversion.cpp Lines 2624 to 2626 in 55e1496
In convert: clangir/mlir/lib/Transforms/Utils/DialectConversion.cpp Lines 2570 to 2573 in 55e1496
During the legalization, an op that isn't legal yet is legalized with legalizeWithFold. clangir/mlir/lib/Transforms/Utils/DialectConversion.cpp Lines 2030 to 2040 in 47df94c
In legalizeWithFold, operations with empty results undergo legalization again. clangir/mlir/lib/Transforms/Utils/DialectConversion.cpp Lines 2075 to 2078 in 47df94c
After returning to legalize, since the op is still not legal, it undergoes legalizeWithFold again. This seems to be a loop. Therefore, if the result of the fold is empty, we need to complete the folding before the ToLLVM and ToMLIR passes, that is, before applyPartialConversion; otherwise the above situation will occur.
It only happens when running:
```
./build/Debug/bin/cir-opt -cir-to-mlir --debug ./clang/test/CIR/Lowering/ThroughMLIR/scope.cir
```
And from my tests, the error occurs when the
However, when I remove the
```
./build/Debug/bin/cir-opt -cir-to-mlir --debug ./clang/test/CIR/Lowering/ThroughMLIR/scope.cir
```
the problem disappears:
Through the debugger, I pinpointed the issue to occur during

Now, I have moved the logic of ScopeOp::fold back to MergeCleanups and added comments, hoping that someone will be able to fix it :)
Thanks for the full explanation, very nice! I wish I knew more about SCF to perhaps provide some insight on the root cause here =(
That works for me. One more promising alternative, though, is to enable

Also, you need to update the PR because we just did a rebase against upstream!
Should we close this PR?
This PR is part of #593.
Move RemoveEmptyScope and RemoveEmptySwitch logic into Op::fold method and modify tests.
Here are my questions during the development process:
Since the NumResults for BrOp (and several other operations) is 0, when performing `legalizeWithFold`, it always enters the following block: mlir/lib/Transforms/Utils/DialectConversion.cpp

If this process occurs during `-cir-flat-to-llvm` or `-cir-to-mlir`, it will again enter: mlir/lib/Transforms/Utils/DialectConversion.cpp

This results in double folding, causing errors. My current solution is to perform a `MergeCleanupsPass` before entering `-cir-flat-to-llvm` and `-cir-to-mlir` to avoid the described situation.

After implementing `cir.Scope::fold`, issues arise in `--debug` mode for unknown reasons. The problem occurs at the following statements: mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
In the ToMLIR path, should we perform the FlattenCFGPass? If we perform createFlattenCFGPass, the lowering of SCF will fail. However, if we do not perform it, it seems difficult to handle cir.goto.