Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failure on switch statement with case label in nested block #522

Open
dkolsen-pgi opened this issue Mar 24, 2024 · 11 comments
Open

Comments

@dkolsen-pgi
Copy link
Collaborator

ClangIR hits an assertion failure when a switch statement contains a case label that is within a nested block statement.

int g(int x) {
  switch (x) {
    case 1:
      return -1;
    {
      case 2:
        return -20;
    }
  }
  return x;
}
clang++: .../clang/lib/CIR/CodeGen/CIRGenStmt.cpp:300: 
mlir::LogicalResult cir::CIRGenFunction::buildSimpleStmt(const clang::Stmt*, bool): 
Assertion `0 && "Should not get here, currently handled directly from SwitchStmt"' failed.

While code like this is unlikely to appear in production and will usually be found in test suites that try to break the compiler, it is legal code in both C and C++ and should not trigger an internal compiler error.

@bcardosolopes
Copy link
Member

While code like this is unlikely to appear in production

Famous "last words" haha, I just hit this in prod! Some "temporary" workaround is using this.

Now that @gitoleg work has landed we probably have the machinery to solve this. Any chance (or any plans) you might work on this soon @gitoleg? (cc. @wenpen)

@gitoleg
Copy link
Collaborator

gitoleg commented May 10, 2024

I will take a look in the next few days, sure

@gitoleg
Copy link
Collaborator

gitoleg commented May 14, 2024

Now that @gitoleg work has landed we probably have the machinery to solve this.

This is all about codegen, so I would say the previous work has nothing to do with it.

Speaking about the current issue, it's still not obvious for me how to handle it properly, though I think I can suggest an approach.
First of all, I'm not sure we do the proper thing when we attach random statements to the previous case (or I miss something important here):

   for (auto *c : compoundStmt->body()) {
      if (auto *switchCase = dyn_cast<SwitchCase>(c)) {
        res = buildSwitchCase(*switchCase, condType, caseAttrs);
      } else if (lastCaseBlock) {  // !!! This clause
        // This means it's a random stmt following up a case, just
        // emit it as part of previous known case.
        mlir::OpBuilder::InsertionGuard guardCase(builder);
        builder.setInsertionPointToEnd(lastCaseBlock);
        res = buildStmt(c, /*useCurrentScope=*/!isa<CompoundStmt>(c));
      } else {
        llvm_unreachable("statement doesn't belong to any case region, NYI");
      }

      lastCaseBlock = builder.getBlock();

      if (res.failed())
        break;
    }

The original codegen even doesn't emit any random code as far I can see, and I think we need to do the same. We could extend the current issue example (will refer it as example 1) to make it a little more interesting (example 2):

int g(int x) {
  int y = 1;
  switch (x) {
    case 1:
      return -1;
    {
      y = 42;
      case 2:
        return y;
    }
  }
  return x;
}

The y = 42 statement is not taken in account and g(2) returns exactly 1. So I would say, we need to ignore this statement as well or at least place it in some unreachable scope.

Now, what I would do. I would create a small class for the switch stmt processing with several fields like case attributes, maybe switch variable type - and maintain a pointer to it in the CIRGenFunction.h. Once we enter buildSwitchStmt in the CIR codegen, we would save the current pointer in the temp var, create a new instance of this class, store the pointer into the member field. Once we leave the buildSwitchStmt we restore the pointer. Almost the same is done in the original codegen.

So we will able to handle nested switch statements and we can fix the example 1 .
The second example is less trivial ( and one need to take it into account when will fix the issue, even if it will be me :) ) . But I believe it's doable as well.

@bcardosolopes what do you think? Is it a good/workable solution? Or I miss some details here?

@bcardosolopes
Copy link
Member

This is all about codegen, so I would say the previous work has nothing to do with it.

Right, I wasn't implying that, sorry for the misunderstanding. My initial thinking is that we could probably map these weird inner cases with synthetic gotos/labels, since they violate some scope control-flow.

Speaking about the current issue, it's still not obvious for me how to handle it properly, though I think I can suggest an approach.

Thanks for looking into this.

First of all, I'm not sure we do the proper thing when we attach random statements to the previous case (or I miss something important here):
...
The original codegen even doesn't emit any random code as far I can see, and I think we need to do the same. We could extend the current issue example (will refer it as example 1) to make it a little more interesting (example 2):
..
The y = 42 statement is not taken in account and g(2) returns exactly 1. So I would say, we need to ignore this statement as well or at least place it in some unreachable scope.

If the original codegen is doing that (I haven't checked), I agree we should do the same - It'd probably be nice to maintain the statement for the sake of providing unrecheable diagnostics later on, e.g. even though we ignore the emission, we could emit an empy scope with source loc on a basic block that is unrecheable. But just initially supporting lowering works for me. I also don't see any current warnings for these types of unrecheable statements in clang.

Now, what I would do. I would create a small class for the switch stmt processing with several fields like case attributes, maybe switch variable type - and maintain a pointer to it in the CIRGenFunction.h. Once we enter buildSwitchStmt in the CIR codegen, we would save the current pointer in the temp var, create a new instance of this class, store the pointer into the member field. Once we leave the buildSwitchStmt we restore the pointer. Almost the same is done in the original codegen.

Yep, we currently track some of it already as part of LexicalScopes, we could increment it with more needed info.

So we will able to handle nested switch statements and we can fix the example 1. The second example is less trivial ( and one need to take it into account when will fix the issue, even if it will be me :) ) . But I believe it's doable as well.
@bcardosolopes what do you think? Is it a good/workable solution? Or I miss some details here?

Sounds great to me, thanks for sharing a solution. Do you have any interest to work on this?

@gitoleg
Copy link
Collaborator

gitoleg commented May 16, 2024

@bcardosolopes

Sounds great to me, thanks for sharing a solution. Do you have any interest to work on this?

I do) But I can't promise I'll do it soon. So if someone wants to do it earlier, just let me know, e.g. right here. Otherwise I'll return to this issue later and keep you informed about the progress/problems etc.

@bcardosolopes
Copy link
Member

@wenpen do you want to work on this one?

@bcardosolopes
Copy link
Member

@piggynl this could be a good issue to tackle as well.

@wenpen
Copy link
Contributor

wenpen commented Jun 9, 2024

I'm still working on #528, and I feel these problems should be resolved in a unified solution.
So I need to do some more test, will update my proposal soon when I figure out it.

@wenpen
Copy link
Contributor

wenpen commented Jun 12, 2024

I have a proposal to fix this issue by modifying the definition of SwitchOp

What's the limitation of current SwitchOp?

Current SwitchOp have a list of caseAttr, one caseAttr is corresponding to one region.
But case could be used very flexibly just like goto. consider the following code

switch (x) {
  while (y) {
    if (z) {
      case 1:
      ...
    } else {
      case 2:
      ...
    }
  }
}

The case is contained by WhileOp, SwitchOp should have only one block containing WhileOp, and we can't split it into two regions.

How does the original clang codegen handle switch?

CodeGenFunction class have a pointer member llvm::SwitchInst *SwitchInsn to maintain the current switch information. SwitchInsn will record the case value and destination block, which is similar to SwitchFlatOp. When emit SwitchStmt, it will update the pointer SwitchInsn, and recursively process the body statement by calling EmitStmt(). When emit CaseStmt, it will call SwitchInsn->addCase(...) to store the case value and destination block.

Proposal for the new definition of SwitchOp

As the behavior of switch/case is similar to goto/label, we could refer to the design of goto.

Specifically, add a new op CaseOp to store the case value, then SwitchOp have exactly one region. And we won't handle CaseStmt in buildSwitchStmt, instead we would do the work in buildSimpleStmt.
The above example code would generate cir

cir.scope {
  cir.switch (%1) {
    cir.scope {
      cir.while {
        cir.condition(%2)
      } do {
        cir.scope {
          cir.if %3 {
            cir.case 1
            ...
          } else {
            cir.case 2
            ...
          }
        }
      }
    }
  }
}

In FlattenCFG, we could replace SwitchOp with SwitchFlatOp simply. Just walk in the SwitchOp, collect the case value and (destination) blocks when we find out CaseOp, then we could remove SwitchOp & CaseOp and build SwitchFlatOp.

Besides, it could resolve some other switch related issues, including unreachable code(#520 #521) and incorrect implementation of getOrCreateRetBlock (talked about in #528).

@bcardosolopes
Copy link
Member

bcardosolopes commented Jun 12, 2024

@wenpen thanks for taking the time to propose a holistic solution, very nice writeup.

I think it overall makes sense, the part that bothers me is having a design that prioritize the corner case of the language in detriment of the common use cases, but I think this is hinting at the right direction.

Given your proposal, few questions:

  • How would a "normal" switch use case be represented by this, can you paste few examples?
  • Have you thought about some mixed approach? Examples:
    • As the CaseAttr becomes a CaseOp, perhaps cir.case could optionally have regions to maintain the old behavior, so that cases with underlying scopes could be directly represented. Because new cir.switch will only have one region, this means a "normal" switch use case would be just a sequential number of cir.cases.
    • What's your plan for handling fallthroughs? For breaks, we'll likely need a region within a cir.case to represent it, or do you have alternative ideas?
    • If you look at cir.try_call, it might or might not be within a direct cir.try, doing similar for cir.case/cir.switch makes sense too.

Thanks!

@wenpen
Copy link
Contributor

wenpen commented Jun 14, 2024

How would a "normal" switch use case be represented by this, can you paste few examples?

Here is an example code to discuss the mixed (common and corner) case.

switch(x) {
  statement0;
case 1:
case 2:
  statement1;
  {
    case 3
    statement2;
  }
  break;
case 4:
  statement3;
case 5:
  statement4;
  break;
default:
  statement5;
}
cir.scope {
  cir.switch (%1) {
    statement0

    cir.case (anyof, [1, 2])
    statement1
    cir.scope {
      cir.case (equal 3)
      statement2
      cir.yield
    }
    cir.break
    
    cir.case (equal 4)
    statement3

    cir.case (equal 5)
    statement4
    cir.break

    cir.case (default)
    statement5
    cir.yield
  }
}

As the CaseAttr becomes a CaseOp, perhaps cir.case could optionally have regions to maintain the old behavior, so that cases with underlying scopes could be directly represented. Because new cir.switch will only have one region, this means a "normal" switch use case would be just a sequential number of cir.cases.

Good point! I feel it's a better structure to make cir.case have a region for "normal" use case. So, I'd like to keep the current logic to scan CaseStmt in switch compound body. And the other (who nested in other op) CaseStmt won't have any region.
The cir would be as below, and it will be easy to change cir.case to have exactly one region once we need.

cir.scope {
  cir.switch (%1) {
    statement0
    cir.case (anyof, [1, 2]) {
      statement1
      cir.scope {
        cir.case (equal 3)
        statement2
        cir.yield
      }
      cir.break
    }
    cir.case (4) {
      statement3
      cir.yield
    }
    cir.case (5) {
      statement4
      cir.break
    }
    cir.case (default) {
      statement5
      cir.yield
    }
    cir.yield
  }
}

What's your plan for handling fallthroughs? For breaks, we'll likely need a region within a cir.case to represent it, or do you have alternative ideas?

My thought is same with the current implement: use cir.break and cir.yield for exit and fallthroughs, then replace them with cir.br in FlattenCFG.cpp. The above code show the usage.

If you look at cir.try_call, it might or might not be within a direct cir.try, doing similar for cir.case/cir.switch makes sense too.

Yes, it looks similar. But when I tried some code to see the cir, it crashed. Have created issue 688 for it. Will have another try later.

Thanks for your suggestions~ @bcardosolopes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants