[CIR][Transforms] Move MergeCleanupsPass logic into Op::fold method #690

Closed
wants to merge 1,620 commits

Conversation

Kritoooo
Contributor

@Kritoooo Kritoooo commented Jun 17, 2024

This PR is part of #593 .
It moves the RemoveEmptyScope and RemoveEmptySwitch logic into `Op::fold` methods and modifies the tests.
Here are the questions that came up during development:

  1. Since `NumResults` for `BrOp` (and several other operations) is 0, `legalizeWithFold` always enters the following block in `mlir/lib/Transforms/Utils/DialectConversion.cpp`:

     ```cpp
     // An empty list of replacement values indicates that the fold was in-place.
     // As the operation changed, a new legalization needs to be attempted.
     if (replacementValues.empty())
       return legalize(op, rewriter);
     ```

     If this process occurs during `-cir-flat-to-llvm` or `-cir-to-mlir`, it will again enter this block in `mlir/lib/Transforms/Utils/DialectConversion.cpp`:

     ```cpp
     // If the operation isn't legal, try to fold it in-place.
     // TODO: Should we always try to do this, even if the op is
     // already legal?
     if (succeeded(legalizeWithFold(op, rewriter))) {
       LLVM_DEBUG({
         logSuccess(logger, "operation was folded");
         logger.startLine() << logLineComment;
       });
       return success();
     }
     ```

    This results in double folding, causing errors. My current solution is to perform a MergeCleanupsPass before entering -cir-flat-to-llvm and -cir-to-mlir to avoid the described situation.

  2. After implementing `cir.ScopeOp::fold`, issues arise in `--debug` mode for unknown reasons. The problem occurs at the following statements in `mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp`:

     ```cpp
     #ifndef NDEBUG
       Operation *dumpRootOp = getDumpRootOp(op);
     #endif // NDEBUG
       if (foldResults.empty()) {
         // Op was modified in-place.
         notifyOperationModified(op);
         changed = true;
         LLVM_DEBUG(logSuccessfulFolding(dumpRootOp));
     ```
  3. In the ToMLIR path, should we perform FlattenCFGPass? If we run createFlattenCFGPass, the lowering to SCF will fail. However, if we do not run it, it seems difficult to handle `cir.goto`.

Lancern and others added 30 commits February 3, 2024 06:52
This patch adds CIRGen for downcasting a pointer to the complete object
through `dynamic_cast<void *>`.

Together with llvm#426 , the full functionality of `dynamic_cast` should be
supported in CIRGen after this PR merges.
This PR adds support for constant arrays with trailing zeros.

The original `CodeGen` does the following: once a constant array contains
trailing zeros, a struct with two members is generated: the initialized
elements and a `zeroinitializer` for the remaining part. Depending on
some conditions, a `memset` or `memcpy` is emitted. In the latter case a
global const array is created.
Well, we may go this way, but it requires us to implement
[features](https://github.com/llvm/clangir/blob/main/clang/lib/CIR/CodeGen/CIRGenDecl.cpp#L182)
that are not implemented yet.

Another option is to add one more parameter to the `constArrayAttr` and
utilize it during lowering. So far I chose this way, but if you have
any doubts, we can discuss them here. So we just emit the constant array as
usual, and once there are trailing zeros, lower this array (i.e. an
attribute) as a value.

I added a couple of tests and will add more, once we agree on the
approach. So far I marked the PR as a draft one.
…nters (llvm#400)

This also adds a missing check whether the pointer returned from
`memchr` is null and changes the result to `last` in that case.
Emit the false-branch of the consteval if statement, if any.
Originally, the location associated with a function is checked to be an
`mlir::FileLineColLoc` before the function is lowered to an LLVMIR
FuncOp. However, runtime function declarations do not have such
locations. This patch further allows `mlir::UnknownLoc` to be associated
with a function.
This PR adds support for const structs with bitfields.
For now, only global structs are supported; support for local ones can
be added more or less easily - there is one ugly thing that needs to be
done, though.

So... what is this all about?
First of all - as usual, I'm sorry for the big PR, but it's hard to
break it down into pieces. The good news is that at the same time it's a
copy-paste from the original codegen, so no surprises here. Basically, the
hardest place to read is `ConstantAggregateBuilder::addBits`, copied
with a minimum of changes.

The main problem - and frankly speaking I have no idea why it's done
this way in the original codegen - is that the data layout is different
for such structures; literally another type is used. For
instance, the code:
```
struct T {
  int X : 15;
  int Y : 6;
  unsigned Z : 9;
  int W;
};

struct T GV = { 1, 5, 256, -1};
```
is represented in LLVM IR (with no CIR enabled) as:

```
%struct.T = type { i32, i32 }
%struct.Inner = type { i8, i32 }

@gv = dso_local global { i8, i8, i8, i8, i32 } ...
```
i.e. the global var `GV` looks like a struct of single bytes (up to
the last field, which is not a bitfield).
And my guess is that we want the same behavior in CIR. So we do.

The main problem is that we have to treat the same data differently -
and this is why one additional `bitcast` is needed when we create a
global var. Actually, there was a comment there - and I really wonder
where it came from. But anyways, I don't really like this and don't see
any good workaround here. Well, maybe we may add a kind of map in order
to store the correspondence between types and do a bitcast more wisely.
The same is true for the const structs with bitfields defined locally.
The minimal bug repro:
```
#include <stdbool.h>
#include <stdint.h>
void bar() {
  bool x = true;
  uint8_t y = (uint8_t)x;
}
```
It fails at the verification stage:
```
loc("repro.c":5:24): error: integer width of the output type is smaller or equal to the integer width of the input type
fatal error: error in backend: The pass manager failed to lower CIR to LLVMIR dialect!
```
The problem is that in some cases lowering from CIR emits an invalid
zext operation. This PR fixes the issue by emitting llvm.bitcast instead
of llvm.zext in such cases.
The change is taken from original codegen.
The issue is that the CIR codegen assumes that a function pointer is
always the result of a cir.load op.
But that isn't true: the function pointer may be the result of other
operations (e.g. cir.call).
This PR adds support for global compound literals.
The implementation is almost the same as in the original codegen, except
that the original codegen can reuse the value of an emitted compound-literal
global variable when the init expression of a new variable and this
variable are the same.
It would be easy to implement this feature, but I can't find any test case
where it would be applied, so I decided to ignore this optimization
opportunity to avoid mistakes.
Here is the next step in VLA support.
Basically, these changes handle different expressions, like `int
(*a[5])[n]` or `sizeof(a[n])`.

I took tests from the original `codegen` - they don't check anything,
just verify we don't fail.

There is still an issue with proper cleanup - there are cases when
`stack_save` doesn't dominate the corresponding `stack_restore`, for
instance:

```
void test(unsigned x) {
  while (1) {
    char a[x];
    if (x > 5)
      break;
    ++x;
  }
}
```
It looks like `break` here doesn't lead to a `stack_restore`. But I would say
this is less related to VLA, though I probably need to fix this as well.
This PR fixes an issue connected with folding a simple boolean
expression pattern (e.g. `0 && RHS = 0`).
The problem is that the scalar expression emitter always creates a
`cir.bool` attribute as the result of the expression, but in some cases the
result should be a `cir.int` attr.
The change is taken from the original llvm codegen.
Originally those are only used for debug info generation, so get a bit more
specific on what's missing here.
When replacing no-proto functions with their real definitions, codegen
assumes that only a `cir.call` operation may use the replaced function.
This behaviour leads to a compilation error, because a
`cir.get_global` op can also use the function to get a pointer to it.
This PR handles the case with the `cir.get_global` operation and fixes
the issue.
This PR adds a missing case to `lowerCirAttrAsValue`.
…r.try

The final destination here is to support cir.try_calls that are not within a
`try {}` statement in C++. This only affects untested paths that will
assert a bit later than before; testcase coming soon.
The second part of the job started in llvm#412 , now about local structures.

As mentioned previously, the layout for structures with bit fields
initialized with constants sometimes differs from the one originally created
in `CIRRecordLayoutBuilder`, and this causes a `storeOp` verification
failure because a different structure type was used for the allocation.
This PR fixes it.
An example:
```
typedef struct {
  int a : 4;
  int b : 5;
  int c;
} D;

void bar () {
  D d = {1,2,3};
}
```

Well, I can't say I'm proud of these changes - it seems like a type
safety violation, but looks like it's the best we can do here.

The original codegen doesn't have this problem at all, there is just a
`memcpy` there, I provide LLVM IR just for reference:

```
%struct.D = type { i16, i32 }

@__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4

; Function Attrs: noinline nounwind optnone uwtable
define dso_local void @bar() #0 {
entry:
  %d = alloca %struct.D, align 4
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false)
  ret void
}
```
In llvm#426 we confirmed that CIR needs a `cir.unreachable` operation to
mark unreachable program points
[(discussion)](llvm#426 (comment)).
This PR adds it.
This PR adds support for evaluating constants in member exprs.
The change is taken from original codegen.
- Add it to functions but not yet on calls.
- Add more skeleton for tagging function attributes.
- Testcases

One more incremental step towards cir.try_call outside cir.try scopes.
The next step in inline-assembly support: we add instruction operands!
Nothing interesting, just some copy-pasta from the `codegen` with some
sort of simplifications for now.

Well, I'm not sure `functional-type` is the best way to print operands
though it's used in mlir's `InlineAsmOp`. But anyways, maybe you have a
better idea.

There are two or three steps ahead, so we are not that far from being
able to run something!
piggynl and others added 17 commits June 6, 2024 15:54
Without this patch, CIR CodeGen continues to generate in the same block
after `cir.break` and `cir.continue`, which causes verification
errors because `cir.break` and `cir.continue` must appear at the end of
their blocks.

This patch creates a new dangling block after generating `cir.break` and
`cir.continue` to fix the issue.

This will fix llvm#323.
* New `CastKind::addrspace_cast` for `cir.cast`
* `TargetCIRGenInfo::performAddrSpaceCast` helper for non-constant
values only
* CIRGen for address space casting of pointers and references
* Lowering to LLVM
)

This PR adds support for `__sync_bool_compare_and_swap` and
`__sync_val_compare_and_swap`.
…f.condition, scf.while (llvm#636)

This PR introduces CIRConditionLowering and CIRWhileLowering for
lowering to scf.
This PR introduces CIRIfOpLowering for lowering cir.if to scf.if.
Assumptions about values having a defining op can be misleading when block
arguments are involved.
When lowering-prepare handles cir.var_arg for AArch64, we create multiple
basic blocks, but didn't really get the ordering of the blocks in the block
list of the parent region right. That is, we didn't make sure the last entry
of the block list is the naturally last block (exit) of the region. This PR
fixes this problem.

If we don't fix this problem, FlattenCFGPass will fail verification,
because CIRScopeOpFlattening in that pass only expects to see a
cir.yield op in the last block of the region's block list.
…d store (llvm#674)

Continue the work of llvm#613 .

The original CodeGen treats vec3 as vec4 to get aligned memory access. This
PR enables these paths.
This patch implements the lowering of function calls that receive and
return void. In practice, nothing has to be done (at least for the x86
ABI), so this case is used as a primer for the target lowering library
since it helps populate the base logic for handling calling convention
lowering of function calls.
This patch implements the lowering of function definitions with no
arguments and returns. In practice, nothing has to be done (at least for
the x86 ABI), so this case is used as a primer for the target lowering
library since it helps populate the base logic for handling calling
convention lowering of function definitions.
Continue the work of llvm#652 . Test the branch of null pointer expressions
with side effects.
…method (llvm#663)

This PR is part of llvm#593 .
It moves the RemoveRedundantBranches logic into `BrOp::fold` and modifies
the tests.
…lvm#684)

This PR is to fix the issue llvm#658 .
Now we can get the correct result using the following command.
```
echo "void test(int *){}" |  ./build/Debug/bin/clang -cc1 -triple x86_64-unknown-linux-gnu -fclangir -fno-clangir-direct-lowering -emit-mlir -o -
```
result:
```
module attributes {cir.lang = #cir.lang<c>, cir.sob = #cir.signed_overflow_behavior<undefined>, cir.triple = "x86_64-unknown-linux-gnu", dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<!llvm.ptr, dense<64> : vector<4xi64>>, #dlti.dl_entry<i1, dense<8> : vector<2xi64>>, #dlti.dl_entry<f16, dense<16> : vector<2xi64>>, #dlti.dl_entry<i32, dense<32> : vector<2xi64>>, #dlti.dl_entry<i8, dense<8> : vector<2xi64>>, #dlti.dl_entry<i16, dense<16> : vector<2xi64>>, #dlti.dl_entry<!llvm.ptr<272>, dense<64> : vector<4xi64>>, #dlti.dl_entry<!llvm.ptr<271>, dense<32> : vector<4xi64>>, #dlti.dl_entry<!llvm.ptr<270>, dense<32> : vector<4xi64>>, #dlti.dl_entry<f128, dense<128> : vector<2xi64>>, #dlti.dl_entry<f64, dense<64> : vector<2xi64>>, #dlti.dl_entry<f80, dense<128> : vector<2xi64>>, #dlti.dl_entry<i128, dense<128> : vector<2xi64>>, #dlti.dl_entry<i64, dense<64> : vector<2xi64>>, #dlti.dl_entry<"dlti.stack_alignment", 128 : i64>, #dlti.dl_entry<"dlti.endianness", "little">>, llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"} {
  func.func @test(%arg0: memref<i32> loc(fused[#loc3, #loc4])) {
    %alloca = memref.alloca() {alignment = 8 : i64} : memref<memref<i32>> loc(#loc7)
    memref.store %arg0, %alloca[] : memref<memref<i32>> loc(#loc5)
    return loc(#loc2)
  } loc(#loc6)
} loc(#loc)
```
And the test/CIR/Lowering/ThroughMLIR/dot.cir now passes the test, so I
have removed the XFAIL flag.
Some function attributes are also callsite attributes, for instance
nothrow. This means they are going to show up in both. We don't support
that just yet, hence this PR.

CIR has an attribute `ExtraFuncAttr` that we currently use as part of
`FuncOp`; see CIROps.td. This attribute also needs to be added to
`CallOp` and `TryCallOp`.

Right now, `AddAttributesFromFunctionProtoType` in `CIRGenCall.cpp` fills
in `FuncAttrs` but doesn't use it for anything. We should use the
`FuncAttrs` result to construct an `ExtraFuncAttr` and add it
to the aforementioned call operations.
@Kritoooo
Contributor Author

I also noticed that #482 has not been updated for a long time. If needed, I can implement the logic in Op::fold.

@bcardosolopes
Member

This results in double folding, causing errors. My current solution is to perform a MergeCleanupsPass before entering -cir-flat-to-llvm and -cir-to-mlir to avoid the described situation.

Nice catch! Why does this process happen twice? Under what circumstances does the double folding get triggered? Is flattening also calling into fold operations? Your solution seems fine, but I'm afraid we need to understand the problem to prevent this "double folding" from popping up whenever we introduce new passes.

  2. After implementing cir.Scope::fold, issues arise in --debug mode for unknown reasons. The problem occurs at the following statements:

This shouldn't happen; I can't tell from a quick look - sounds like you need the debugger to the rescue.

  3. In the ToMLIR path, should we perform the FlattenCFGPass? If we perform createFlattenCFGPass, the lowering of SCF will fail.

Right, if you lower out of structured control flow you lose a bunch of the MLIR nice stuff.

However, if we do not perform it, it seems difficult to handle cir.goto.

I'd work on testcases that make sense for you or your goal, instead of just trying to reach some type of lowering completeness out of CIR. I'd expect code full of goto's not to be the most interesting stuff someone would want to use MLIR for.

@bcardosolopes
Member

I also noticed that #482 has not been updated for a long time. If needed, I can implement the logic in Op::fold.

Feel free to work on #482; it hasn't had any updates, so it's time to make progress regardless.

@bcardosolopes
Member

On another note, this PR implements both scope and switch. If only one of these operations is causing the said problems, I'd suggest you first land the part that matters; the "harder parts" you mention in the description can be left to another PR.

@Kritoooo
Contributor Author

Nice catch! Why this process happens twice? Or why/what-circunstances double folding is getting trigger? Is flattening also calling into fold operations?

The double folding occurs during applyPartialConversion (ToLLVM or ToMLIR). In applyPartialConversion, there is an attempt to legalize an Op.

```cpp
if (failed(applyPartialConversion(ops, target, std::move(patterns))))
  signalPassFailure();
```

```cpp
LogicalResult mlir::applyPartialConversion(
    ArrayRef<Operation *> ops, const ConversionTarget &target,
    const FrozenRewritePatternSet &patterns, ConversionConfig config) {
  OperationConverter opConverter(target, patterns, config,
                                 OpConversionMode::Partial);
  return opConverter.convertOperations(ops);
}
```

In `convertOperations`:

```cpp
for (auto *op : toConvert)
  if (failed(convert(rewriter, op)))
    return rewriterImpl.undoRewrites(), failure();
```

In `convert`:

```cpp
LogicalResult OperationConverter::convert(ConversionPatternRewriter &rewriter,
                                          Operation *op) {
  // Legalize the given operation.
  if (failed(opLegalizer.legalize(op, rewriter))) {
```

During legalization, an op that isn't legal goes through `legalizeWithFold`:

```cpp
// If the operation isn't legal, try to fold it in-place.
// TODO: Should we always try to do this, even if the op is
// already legal?
if (succeeded(legalizeWithFold(op, rewriter))) {
  LLVM_DEBUG({
    logSuccess(logger, "operation was folded");
    logger.startLine() << logLineComment;
  });
  return success();
}
```

In `legalizeWithFold`, an operation whose fold produced no replacement values undergoes legalization again:

```cpp
// An empty list of replacement values indicates that the fold was in-place.
// As the operation changed, a new legalization needs to be attempted.
if (replacementValues.empty())
  return legalize(op, rewriter);
```

After returning to legalize, since the op is still not legal, it goes through legalizeWithFold again. This seems to be a loop.
Therefore, if the result of the fold is empty, we need to complete the folding before the ToLLVM and ToMLIR passes, that is, before applyPartialConversion; otherwise, the situation above will occur.

This shouldn't happen; I can't tell from a quick look - sounds like you need the debugger to the rescue.

It only happens when running:

./build/Debug/bin/cir-opt -cir-to-mlir --debug ./clang/test/CIR/Lowering/ThroughMLIR/scope.cir

And from my tests, the error occurs when the scf dialect is loaded:

Load new dialect in Context scf
ImplicitTypeIDRegistry::lookupOrInsert(mlir::DestinationStyleOpInterface)
ImplicitTypeIDRegistry::lookupOrInsert(mlir::ParallelCombiningOpInterface)
ImplicitTypeIDRegistry::lookupOrInsert(mlir::detail::OpToOpPassAdaptor)
//===-------------------------------------------===//
Processing operation : 'cir.scope'(0x574b7d06fe20) {
ImplicitTypeIDRegistry::lookupOrInsert(mlir::OpTrait::HasRecursiveMemoryEffects<Empty>)
} -> success : operation was folded
//===-------------------------------------------===//
** Modified: 'cir.scope'(0x574b7d06fe20)
// *** IR Dump After Successful Folding ***
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:

However, when I remove the scf dialect and run:

./build/Debug/bin/cir-opt -cir-to-mlir --debug ./clang/test/CIR/Lowering/ThroughMLIR/scope.cir

the problem disappears:

ImplicitTypeIDRegistry::lookupOrInsert(mlir::bufferization::AllocationOpInterface)
ImplicitTypeIDRegistry::lookupOrInsert(mlir::RuntimeVerifiableOpInterface)
ImplicitTypeIDRegistry::lookupOrInsert(mlir::DestructurableTypeInterface)
ImplicitTypeIDRegistry::lookupOrInsert(mlir::detail::OpToOpPassAdaptor)
//===-------------------------------------------===//
Processing operation : 'cir.scope'(0x57874bee12a0) {
ImplicitTypeIDRegistry::lookupOrInsert(mlir::OpTrait::HasRecursiveMemoryEffects<Empty>)
} -> success : operation was folded
//===-------------------------------------------===//
** Modified: 'cir.scope'(0x57874bee12a0)
// *** IR Dump After Successful Folding ***
cir.scope {
}

Through the debugger, I pinpointed the issue to Op->dump(), which seems to be related to the AsmPrinter. However, I am unsure what loading the scf dialect changes to cause this issue. From my tests, running cir-opt --debug with ToLLVM, or with ToMLIR without the scf dialect, does not produce any issues. Of course, running without --debug also causes no problems.

Now, I have moved the logic of ScopeOp::fold back to MergeCleanups and added comments, hoping that someone will be able to fix it :)

@bcardosolopes
Member

Thanks for the full explanation, very nice! I wish I knew more about SCF to perhaps provide some insight on the root cause here =(

Now, I have moved the logic of ScopeOp::fold back to MergeCleanups and added comments, hoping that someone will be able to fix it :)

That works for me. One more promising alternative, though, is to enable MergeCleanups in `ConvertCIRToMLIRPass::runOnOperation()` as a prerequisite pass, which should get rid of the scope before you hit SCF and friends. wdyt?

Also, you need to update the PR because we just did a rebase against upstream!

@bcardosolopes
Member

Should we close this PR?

@Kritoooo Kritoooo closed this Jul 22, 2024