[CIR][CIRGen] Improve switch support for unrecheable code #528

Closed
wants to merge 1,536 commits into from

Conversation

wenpen
Contributor

@wenpen wenpen commented Apr 1, 2024

Support non-block cases and statements that don't belong to any case region. Fixes #520 and #521.

bcardosolopes and others added 30 commits November 3, 2023 15:23
…m#357)

This PR fixes lowering of the following code:
```
void foo(int x, int y) {
    switch (x) {
        case 0:
            if (y)
                break;
            break;
    }
}
```
i.e. when a sub-statement contains a `break` as well. Previously, we
used this trick for `loop`: process nested `break`/`continue` statements
during `LoopOp` lowering if they don't belong to another `LoopOp` or
`SwitchOp`. This is why there is some refactoring here as well, but the
idea is still the same: we need to process nested operations and emit
branches to the proper blocks.

This is quite a frequent bug in `llvm-test-suite`.
This is how both libc++ and libstdc++ implement the iterator in std::array; stick
to those use cases for now. We could add other variations in the future if there
are others around.
- Check whether container is part of std, add a fixed list of
available containers (for now only std::array)
- Add a getRawDecl method to ASTRecordDeclInterface
- Testcases
This was a bit half-baked; give it some love.
Inspired by similar work in libc++, pointed to me by Louis Dionne
and Nikolas Klauser.

This is initial, very conservative and not generalized yet: works
for `char`s within a specific version of `std::find`.
…ents.

Before this fix, conversion of a flat offset to GlobalView indices could
crash or compute an invalid result.
`ScopeOp` may end with `ReturnOp` instead of `YieldOp`, which is not
currently expected. This PR fixes this.
The reduced example is:
```
int foo() {
    {
        return 0;
    }
}
```
This is quite a frequent bug in `llvm-test-suite`.
One more step towards variable length array support.
This PR adds one more helper for the `alloca` instruction and reuses the
existing ones.

The reason is the following: right now there are two possible ways to
insert an alloca: either into the function entry block or into the given
block after all the existing alloca instructions. But for VLA support we
need to insert an alloca anywhere, right after an array's size becomes
known. Thus, we add one more parameter with a default value: the
insertion point.

Also, we don't want to copy-paste the code, so we reuse the existing
helpers, though it may be a little confusing to read.
This PR adds `cir.ternary` lowering. There are two approaches to lower
`cir.ternary` imo:
1. Use `scf.if` op.
2. Use `cf.cond_br` op.

I chose `scf.if` because `scf.if` + canonicalization produces
`arith.select` whereas `cf.cond_br` requires scf lifting. In many ways
`scf.if` is more high-level and closer to `cir.ternary`.

A separate `cir.yield` lowering is required since we cannot directly
replace `cir.yield` in the ternary op lowering -- the yield operands may
still be illegal and doing so produces `builtin.unrealized_cast` ops. I
couldn't figure out a way to solve this issue without adding a separate
lowering pattern. Please let me know if you know a way to solve this
issue.
This PR fixes the following case:
```
typedef struct { } A;

A create() { A a; return a; }

void foo() {
    A a;
    a = create();
}
```
i.e. when a struct is assigned the result of a function call.
…vmgh-352) (llvm#363)

The error manifested in code like
```
int a[16];
int *const p = a;

void foo() {
  p[0];
}
```
It's one of the most frequent errors in the current llvm-test-suite.

I've added the test to globals.cir which is currently XFAILed, I think
@gitoleg will fix it soon.

Co-authored-by: Bruno Cardoso Lopes <bcardosolopes@users.noreply.github.com>
This PR addresses llvm#248 .

Currently, string literals are always lowered to a `cir.const_array`
attribute even if the string literal only contains null bytes. This
patch makes CodeGen emit `cir.zero` for these string literals.
Currently, codegen of an lvalue comma expression crashes:

```cpp
int &foo1();
int &foo2();

void c1() {
    int &x = (foo1(), foo2());  // CRASH
}
```

This simple patch fixes this issue.
This PR addresses llvm#90. It introduces a new type constraint `CIR_AnyType`
which allows CIR types and MLIR floating-point types. Present `AnyType`
constraints are replaced with the new `CIR_AnyType` constraint.
Arrays can be first declared without a known bound, and then defined
with a known bound. For example:

```cpp
extern int data[];

int test() { return data[1]; }

int data[3] {1, 2, 3};
```

Currently `clangir` crashes when generating CIR for this case. This is
because the type of the `data` definition differs from the type of its
declaration. This patch adds support for such a case.
Breaks the pass into smaller more manageable rewrites.
…IdiomRecognizer. (llvm#389)

Some tests started failing under `-DLLVM_USE_SANITIZER=Address` due to
trivial use-after-free errors.
Like SCF's `scf.condition`, the `cir.condition` operation simplifies codegen of
loop conditions by removing the need for a conditional branch. It takes a
single boolean operand which, if true, executes the body region;
otherwise it exits the loop. This also simplifies lowering and the dialect
itself.

A new constraint is now enforced on `cir.loop` ops: the condition region
must terminate with a `cir.condition` operation.

A few tests were removed as they became redundant, and others were
simplified.

The merge-cleanups pass no longer simplifies compile-time constant
conditions, as the condition region is no longer allowed to
terminate with a `cir.yield`. To circumvent this, a proper folder
should be implemented to fold constant conditions, but this was left as
future work.

Co-authored-by: Bruno Cardoso Lopes <bcardosolopes@users.noreply.github.com>
Once the LexicalScope goes out of scope, its cleanup process will also
check if a return was set to be yielded, and, if so, generate the yield
with the respective value.

ghstack-source-id: 9305d2ba5631840937721755358a774dc9e08b90
Pull Request resolved: llvm#312
Instead of returning a boolean indicating whether the statement was
handled, returns the ReturnExpr of the statement if there is one. It
also adds some extra bookkeeping to ensure that the result is returned
when needed. This allows for better support of GCC's `ExprStmt`
extension.

The logical result was not used: it was handled but it would never fail.
Any errors within builders should likely be handled with asserts and
unreachables since they imply a programmer's error in the code.

ghstack-source-id: 2319cf3f12e56374a52aaafa4304e74de3ee6453
Pull Request resolved: llvm#313
Adds support for GCC statement expressions return values as well as
StmtExpr LValue emissions.

To simplify the lowering process, the scope return value is not used.
Instead, a temporary allocation is created on the parent scope where the
return value is stored. For classes, a second scope is created around
this temporary allocation to ensure any destructors are called.

This does not implement the full semantics of statement expressions.

ghstack-source-id: 64e03fc3df45975590ddbcab44959c2b49601101
Pull Request resolved: llvm#314
@wenpen wenpen force-pushed the switch_support_single_case branch from 5223c3c to 9ae5d1f Compare May 10, 2024 11:06
Comment on lines 470 to 473
// TODO: Rewrite the logic to handle ReturnStmt inside SwitchStmt, then
// clean up the code below.
if (currLexScope->IsInsideCaseNoneStmt)
return mlir::success();
Contributor Author

I found many code samples that fail due to an incorrect terminator in a block, e.g.

  switch(a) {
  case 0:
    break; 
    int x = 1;
  }
  switch(a) {
  case 0:
    return 0;
    return 1;
    int x = 1;
  }
for (;;) {
  break;
  int x = 1;
}

This looks like another large piece of work, so I just skip ReturnStmt here.
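
For reference, the three patterns above are all valid C++; the statements that follow the terminator simply never execute. The sketch below is not part of the PR: it only wraps the snippets in functions (the names `afterBreak`, `afterReturn` and `afterLoopBreak` are made up for illustration) so the expected behaviour can be checked with any C++ compiler.

```cpp
#include <cassert>

// Hypothetical helper names; each function mirrors one pattern from the
// comment above, where statements follow a terminator in the same block.

// Statements after `break` inside a case are never executed.
int afterBreak(int a) {
  int result = 0;
  switch (a) {
  case 0:
    result = 10;
    break;
    int x = 1;  // unreachable: follows the break
    result = x; // unreachable
  }
  return result;
}

// A second `return` after the first one is never executed.
int afterReturn(int a) {
  switch (a) {
  case 0:
    return 0;
    return 1; // unreachable: follows the first return
  }
  return 2;
}

// Statements after `break` in a loop body are never executed.
int afterLoopBreak() {
  int iterations = 0;
  for (;;) {
    ++iterations;
    break;
    iterations = 100; // unreachable: follows the break
  }
  return iterations;
}
```

In each function the code after the terminator has no observable effect, which is the property the generated CIR needs to preserve while still producing blocks with valid terminators.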

Member

Right, can you file a new issue and list these?

Member

I'm opposed to `return mlir::success();` because it will just silently skip something we don't know how to handle. I'd rather these things fail or crash, so that it's clear that they aren't implemented. What happens when you remove this return?

@wenpen wenpen requested a review from bcardosolopes May 11, 2024 06:14
@wenpen wenpen marked this pull request as ready for review May 11, 2024 06:59
@@ -328,6 +328,14 @@ mlir::LogicalResult CIRGenFunction::buildLabelStmt(const clang::LabelStmt &S) {
// IsEHa: not implemented.
assert(!(getContext().getLangOpts().EHAsynch && S.isSideEntry()));

// TODO: After support case stmt crossing scopes, we should build LabelStmt
Member

Any TODO in CIRGen should be TODO(cir)

@@ -2027,6 +2031,8 @@ class CIRGenFunction : public CIRGenTypeCache {
// Scope entry block tracking
mlir::Block *getEntryBlock() { return EntryBlock; }

bool IsInsideCaseNoneStmt = false;
Member

I don't think we need this, reasons below.

// and clean LexicalScope::IsInsideCaseNoneStmt.
for (auto *lexScope = currLexScope; lexScope;
lexScope = lexScope->getParentScope()) {
assert(!lexScope->IsInsideCaseNoneStmt &&
Member

What happens if you remove this code? Also, why doesn't it work to just walk the scope up until you find a switch?

Contributor Author

Firstly, we won't need this assert anymore if we can keep the case-none stmt somehow, as you suggested.

> What happens if you remove this code?

Removing this code won't cause incorrect behavior currently (as we don't support goto in that case yet), but I think it may produce strange error messages in the future.

switch (int x) {
foo:
  x = 1;
  break;
case 2:
  goto foo;
}

We need to avoid erasing the CaseNoneStmt containing label foo.
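
To make the concern concrete, here is a minimal sketch (not from the PR) of the same shape as a complete function. The name `reachableViaGoto` is hypothetical, and the condition is written as a plain `switch (x)`. The statements before the first `case` label look dead, but the `goto` makes them reachable, so dropping them would change behaviour.

```cpp
#include <cassert>

// Hypothetical function name. The statements between `switch (x) {` and the
// first case label cannot be reached by the switch dispatch itself, but the
// label `foo` lets `case 2` jump back into them.
int reachableViaGoto(int x) {
  int r = 0;
  switch (x) {
  foo:
    r = 1; // runs only when reached through `goto foo`
    break;
  case 2:
    goto foo;
  }
  return r;
}
```

With this definition, `reachableViaGoto(2)` yields 1 while any other argument yields 0.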

> why doesn't it work to just walk the scope up until you find a switch?

Refer to the code below: we need to guarantee that the removed Stmt doesn't contain any LabelStmt, whether or not the LabelStmt is inside another nested switch.

switch(x) {
  switch(x) {
  case 1:
foo:
    break;
  }
  break;
case 1:
  goto foo;
}

@@ -704,6 +717,22 @@ CIRGenFunction::buildSwitchCase(const SwitchCase &S, mlir::Type condType,
llvm_unreachable("expect case or default stmt");
}

mlir::LogicalResult CIRGenFunction::buildCaseNoneStmt(const Stmt *S) {
// Create orphan region to skip over the case none stmts.
Member

Because you are creating an orphan region, this means that anything emitted inside a buildCaseNoneStmt will never execute, right? The problem with an orphan region is that it won't get attached to anything, so it really adds no value (not even for unreachable code analysis). If so, it's better to just split the current basic block A into two: B and C. A should jump to C and you emit the code in B.
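
A rough sketch of the A/B/C split, using a toy CFG model rather than the MLIR API: `ToyBlock`, `ToyRegion`, `splitForUnreachable` and `isReachable` are all made-up names for illustration, and real CIRGen code would go through the builder and `mlir::Block` instead. The point it tries to show is that B stays attached to the region (so later analyses can still see it) while being unreachable from the entry block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a basic block; NOT the MLIR Block API.
struct ToyBlock {
  std::vector<std::string> ops; // statements emitted into this block
  int successor = -1;           // index of the block branched to, -1 if none
};

// Toy stand-in for a region: an ordered list of blocks, entry at index 0.
struct ToyRegion {
  std::vector<ToyBlock> blocks;
};

// Appends B (for the unreachable code) and C (the continuation) to the
// region, makes block `a` branch to C, and returns B's index so the caller
// can keep emitting the dead statements there. B's own terminator is left
// to the caller.
std::size_t splitForUnreachable(ToyRegion &region, std::size_t a) {
  assert(a < region.blocks.size() && "block index out of range");
  region.blocks.emplace_back(); // B
  std::size_t b = region.blocks.size() - 1;
  region.blocks.emplace_back(); // C
  std::size_t c = region.blocks.size() - 1;
  region.blocks[a].successor = static_cast<int>(c);
  return b;
}

// Returns true if `target` can be reached from the entry block by
// following successor edges.
bool isReachable(const ToyRegion &region, std::size_t target) {
  if (region.blocks.empty())
    return false;
  std::vector<bool> seen(region.blocks.size(), false);
  int cur = 0;
  while (cur >= 0) {
    std::size_t idx = static_cast<std::size_t>(cur);
    if (seen[idx])
      break;
    if (idx == target)
      return true;
    seen[idx] = true;
    cur = region.blocks[idx].successor;
  }
  return false;
}
```

Starting from a region with a single entry block, calling `splitForUnreachable(region, 0)` gives back the index of B; statements pushed into B remain part of the region even though `isReachable` reports it as unreachable.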

Contributor Author

I didn't find a good place to hold the block of CaseNoneStmt.

For example

void f(int x) {
  switch(x) {
    break;
  }
}

There is no region inside the SwitchOp, so we have to put the break block outside the SwitchOp, which causes a verification failure: 'cir.break' op must be within a loop or switch.

Did I misunderstand something? Looking forward to your suggestions~

Member

I understand your point, but if you go for the current approach you might as well skip this codegen entirely, because what you are emitting won't ever be attached to anything. I think it's safer to mimic the original codegen here, what is Clang currently doing for OG codegen?

Member

Perhaps we should create a SwitchOp with at least one default region and delete that at the end if it ends up unused?

@wenpen wenpen force-pushed the switch_support_single_case branch from f726860 to 7a61b3c Compare May 17, 2024 05:48
Contributor Author

@wenpen wenpen left a comment

I also feel the solution for #521 is not very natural, so I'd be happy to modify it if you have some ideas. Or I could revert the change and only solve #520 in this PR, if you think the definition of SwitchOp should be changed first. Thanks!

Comment on lines +470 to +474
// TODO(cir): Rewrite the logic to handle ReturnStmt inside SwitchStmt, then
// clean up the code below.
if (currLexScope->IsInsideCaseNoneStmt)
return mlir::success();

Contributor Author

> I'm opposed to `return mlir::success();` because it will just silently skip something we don't know how to handle. I'd rather these things fail or crash, so that it's clear that they aren't implemented. What happens when you remove this return?

`buildReturnStmt()` assumes there is exactly one return block per region and one region per lexical scope. The only exception is the switch scope, which has multiple regions. The relevant code is:

    mlir::Block *getOrCreateRetBlock(CIRGenFunction &CGF, mlir::Location loc) {
      unsigned int regionIdx = 0;
      if (isSwitch())
        regionIdx = SwitchRegions.size() - 1;
      if (regionIdx >= RetBlocks.size())
        return createRetBlock(CGF, loc);
      return &*RetBlocks.back();
    }

So if we remove the return here, the following code will crash: `regionIdx` will be -1, and we'll call `RetBlocks.back()` on an empty `RetBlocks`.

int f(int x) {
  switch(x) {
    return 0;
  }
  return 1;
}

By the way, I believe the current implementation of `getOrCreateRetBlock()` for switch is incorrect and should also be fixed after changing the definition of SwitchOp.
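
A side note on the "-1" above: in the quoted snippet `regionIdx` is declared `unsigned int`, so `SwitchRegions.size() - 1` on an empty container wraps around to the largest unsigned value instead of going negative. The sketch below reproduces only that arithmetic with a plain `std::vector`; both function names are hypothetical and nothing here is CIRGen code. The second function shows one possible guard, as an assumption about how a fix might look rather than what the project did.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Mirrors the index computation from the quoted snippet using a plain
// vector. With an empty vector, size() - 1 wraps around because the
// arithmetic is unsigned.
unsigned lastRegionIndexUnchecked(const std::vector<int> &switchRegions) {
  return static_cast<unsigned>(switchRegions.size() - 1);
}

// Hypothetical guarded variant: reports failure instead of wrapping when
// there is no region yet.
bool lastRegionIndexChecked(const std::vector<int> &switchRegions,
                            unsigned &out) {
  if (switchRegions.empty())
    return false;
  out = static_cast<unsigned>(switchRegions.size() - 1);
  return true;
}
```

For an empty vector the unchecked version returns `std::numeric_limits<unsigned>::max()`, which is why any later indexing or `back()` call that trusts the value is suspect.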

Member

> By the way, I believe the current implementation of `getOrCreateRetBlock()` for switch is incorrect and should also be fixed after changing the definition of SwitchOp.

Right, we should fix the logic, not take shortcuts like returning `mlir::success()`. Can you elaborate on what you mean by changing the definition of SwitchOp?

Contributor Author

I posted my thought in #528 to discuss it, thanks~

@bcardosolopes
Member

I'm going to resume reviewing this, sorry for the delay!

@bcardosolopes bcardosolopes changed the title [CIR][CIRGen] Enhance switch [CIR][CIRGen] Improve switch support for unrecheable code Jun 6, 2024
@bcardosolopes
Member

I landed #611 which has some comments related to this PR (cc: @piggynl)

@lanza lanza self-requested a review as a code owner June 21, 2024 19:50
bruteforceboy pushed a commit to bruteforceboy/clangir that referenced this pull request Oct 2, 2024
Make logic cleaner and more extensible.

Separate collecting `SwitchStmt` information and building op logic into
different functions.
Add more UT to cover nested switch, which also worked before this pr.

This pr is split from llvm#528.
Hugobros3 pushed a commit to shady-gang/clangir that referenced this pull request Oct 2, 2024
keryell pushed a commit to keryell/clangir that referenced this pull request Oct 19, 2024
@smeenai
Collaborator

smeenai commented Oct 29, 2024

This was superseded by #1006; thank you for laying the groundwork for it!

@smeenai smeenai closed this Oct 29, 2024
lanza pushed a commit that referenced this pull request Nov 5, 2024
Make logic cleaner and more extensible.

Separate collecting `SwitchStmt` information and building op logic into
different functions.
Add more UT to cover nested switch, which also worked before this pr.

This pr is split from #528.
Development

Successfully merging this pull request may close these issues.

Assertion failure on switch statement with non-block substatement