New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve switch statement decompilation #1258

Merged
merged 11 commits into from Oct 7, 2018

Conversation

Projects
None yet
3 participants
@Chicken-Bones
Contributor

Chicken-Bones commented Aug 14, 2018

This PR contains a number of commits aimed at improving code quality surrounding [potential] switch statements. It primarily addresses two issues currently producing unnecessary gotos.

SwitchDetection.UseCSharpSwitch is over-aggressive, resulting in the replacement of condition trees with much poorer looking switch statements. A number of heuristics are added to better discern which representation to use.

LoopDetection.DetectSwitchBody, FindExitPoint and associated loop partitioning code is unaware of the interaction between switch statements and continue resulting in poor exit point selection when cases can leave the switch body via a branch to a continue block

The test cases added with each commit are the best place to look for a concise before/after. Unfortunately the number of test cases required to ensure correctness would be massive, so we have to resort to inspecting the round trip tests.

The only switch related gotos remaining in the round trip tests are in NRefactory.MonoCSharp/Tokenizer|Convert|MemberCore and Newtonsoft.Json.Utilities/ConvertUtils, all containing "goto case" statements. I have a solution based on condition tree flow analysis, but it is comparatively massive for addressing such a small target, so I've left it for future review.

I'd be happy to discuss wider design decision via PM or some form of instant messaging if this is a lot to review.

Edit: I am unsatisfied with the quality of the LoopDetection changes, but decided to leave invasing refactoring to the maintainers.

@christophwille

This comment has been minimized.

Show comment
Hide comment
@christophwille

christophwille Aug 14, 2018

Member

Please note that @siegfriedpammer is on vacation so you might not get technical feedback for some time. Just wanted to let you know so you don't feel "ignored".

Member

christophwille commented Aug 14, 2018

Please note that @siegfriedpammer is on vacation so you might not get technical feedback for some time. Just wanted to let you know so you don't feel "ignored".

@@ -230,15 +231,36 @@ bool MatchSwitchVar(ILInstruction inst)
return inst.MatchLdLoc(out switchVar);
}
bool MatchSwitchVar(ILInstruction inst, out long sub)
{
if (inst.MatchBinaryNumericInstruction(BinaryNumericOperator.Sub, out var left, out var right) && right.MatchLdcI(out sub))

This comment has been minimized.

@dgrunwald

dgrunwald Sep 1, 2018

Member

I'm assuming Roslyn uses a plain sub here? You should also check the overflow flag, as-is you're also matching sub.ovf/sub.ovf.un.

@dgrunwald

dgrunwald Sep 1, 2018

Member

I'm assuming Roslyn uses a plain sub here? You should also check the overflow flag, as-is you're also matching sub.ovf/sub.ovf.un.

// if (comp(V OP val))
trueValues = MakeSetWhereComparisonIsTrue(comp.Kind, val, comp.Sign);
if (sub != 0)
trueValues = new LongSet(trueValues.Intervals.Select(i => ShiftInterval(i, sub)));

This comment has been minimized.

@dgrunwald

dgrunwald Sep 1, 2018

Member

The ShiftInterval logic here doesn't seem quite right to me. Is there any reason you did not use trueValues.AddOffset(sub)?
AddOffset should be a better match for the wrap-around semantics of sub. (see also: SwitchAnalysis.AnalyzeSwitch())

@dgrunwald

dgrunwald Sep 1, 2018

Member

The ShiftInterval logic here doesn't seem quite right to me. Is there any reason you did not use trueValues.AddOffset(sub)?
AddOffset should be a better match for the wrap-around semantics of sub. (see also: SwitchAnalysis.AnalyzeSwitch())

This comment has been minimized.

@Chicken-Bones

Chicken-Bones Sep 2, 2018

Contributor

Roslyn always uses unsigned casts with the sub, but I did just miss AddOffset

@Chicken-Bones

Chicken-Bones Sep 2, 2018

Contributor

Roslyn always uses unsigned casts with the sub, but I did just miss AddOffset

/// <summary>
/// Returns the children in a loop dominator tree, with an optional exit point
/// Avoids returning continue statements when analysing switches (because increment blocks can be dominated)

This comment has been minimized.

@dgrunwald

dgrunwald Sep 1, 2018

Member

This function isn't returning statements, but control flow nodes. I guess Avoids returning continue statements should be Avoids returning the target block of continue statements?
After all, continue; is fine within a switch, we just don't want to move the increment block itself.

@dgrunwald

dgrunwald Sep 1, 2018

Member

This function isn't returning statements, but control flow nodes. I guess Avoids returning continue statements should be Avoids returning the target block of continue statements?
After all, continue; is fine within a switch, we just don't want to move the increment block itself.

This comment has been minimized.

@Chicken-Bones

Chicken-Bones Sep 2, 2018

Contributor

Should be changed to Avoids returning continue statement target nodes (increment blocks)

@Chicken-Bones

Chicken-Bones Sep 2, 2018

Contributor

Should be changed to Avoids returning continue statement target nodes (increment blocks)

@Chicken-Bones

This comment has been minimized.

Show comment
Hide comment
@Chicken-Bones

Chicken-Bones Sep 3, 2018

Contributor

Awaiting further feedback before adding a fixup commit. Any idea when siegfriedpammer will be back?

Contributor

Chicken-Bones commented Sep 3, 2018

Awaiting further feedback before adding a fixup commit. Any idea when siegfriedpammer will be back?

@dgrunwald

This comment has been minimized.

Show comment
Hide comment
@dgrunwald

dgrunwald Oct 3, 2018

Member

I'm not quite sure whether switch body detection is even in the right place (talking about the pass order).
I put it together with loop detection mostly because I wanted to reuse the logic for finding the body blocks.
Maybe it really ought to be somewhere in the big interleaved transform step instead.
So I thought about this for a while:

Fundamental differences between the C# control flow constructs:

  • if etc. has single entry-point, and a single "natural" exit point (allowing additional exits via goto/break/continue/return),
    but don't provide targets for break/continue.
  • loops have single entry-point, single-exit point (allowing additional exits only via goto/return).
    That loop exit point also serves as target for break.
    The target for continue is more complicated.
  • switch has single entry-point, single-exit point (allowing additional exits via goto/continue/return).
    The switch exit point also serves as target for break.

This means:

  • finding a nice exit point for "if" requires knowing about break/continue, so that those can be accepted while goto is avoided.
    This means ConditionDetection should run after loop and switch bodies are detected.
  • finding a nice exit point for "switch" requires knowing about continue, so that those can be accepted while goto is avoided.
  • finding a nice exit point for loops doesn't require anything knowledge about anything but return statements, so it's safe to do loops first.

Unfortunately, the "continue" business is tricky and depends on the loop type.
In particular, for "for" loops, only some types of C# statements are allowed as increment statement.
So we need to run all our statement transforms before we can tell whether something is a valid "increment block".

Basically, we gain the information about what is a valid expression/statement (for if conditions and increment-blocks) in a bottom-up manner.
But we also need to know about break/continue targets during this bottom-up pass constructing if statement (basically passing the "current break/continue target" in a top-down manner).

When there's only if and loops in play, there's an easy approach: first find the loops,
then use the bottom-up-construction for expressions/statements and structured control flow with if.
Though even here, we'd need to take care that candidates for increment blocks are visited before any potential continue-statement is -- otherwise it's possible that an increment block isn't detected as such (e.g. because it contains not-yet-lifted nullable operations) by the time ConditionDetection runs, thus does not consider a branch to the increment block as "continue" and ends up picking the wrong exit point. We don't do this yet, but it's at least plausible that we could, with some additional care in the order we visit the control flow blocks.

But switch adds significant complexity: the "break" part of switch should work as in loops (thus needing the currently implemented approach), while the "continue" part of switch needs a bottom-up approach similar to how "if" is currently handled. I don't see any way to satisfy both parts at once. Since "continue" is somewhat rare within switch, that means we should keep treating switch similar to loops.

So I think your changes here are the right way to go. It feels like a hack to use HighLevelLoopTransform.GetIncrementBlock() at this point where it isn't reliable yet because the increment block isn't transformed; but I think the reasoning above explains that it is a pretty much unavoidable hack.
Fortunately most for loops have a pretty simple i++ increment block, so it's pretty difficult to actually hit this issue with our pass ordering. You'd need a loop with non-trivial increment block containing a switch containing a continue; statement. And unless I'm mistaken, any false positives or false negatives in IsIncrementBlock() will just result in redundant gotos.

Member

dgrunwald commented Oct 3, 2018

I'm not quite sure whether switch body detection is even in the right place (talking about the pass order).
I put it together with loop detection mostly because I wanted to reuse the logic for finding the body blocks.
Maybe it really ought to be somewhere in the big interleaved transform step instead.
So I thought about this for a while:

Fundamental differences between the C# control flow constructs:

  • if etc. has single entry-point, and a single "natural" exit point (allowing additional exits via goto/break/continue/return),
    but don't provide targets for break/continue.
  • loops have single entry-point, single-exit point (allowing additional exits only via goto/return).
    That loop exit point also serves as target for break.
    The target for continue is more complicated.
  • switch has single entry-point, single-exit point (allowing additional exits via goto/continue/return).
    The switch exit point also serves as target for break.

This means:

  • finding a nice exit point for "if" requires knowing about break/continue, so that those can be accepted while goto is avoided.
    This means ConditionDetection should run after loop and switch bodies are detected.
  • finding a nice exit point for "switch" requires knowing about continue, so that those can be accepted while goto is avoided.
  • finding a nice exit point for loops doesn't require anything knowledge about anything but return statements, so it's safe to do loops first.

Unfortunately, the "continue" business is tricky and depends on the loop type.
In particular, for "for" loops, only some types of C# statements are allowed as increment statement.
So we need to run all our statement transforms before we can tell whether something is a valid "increment block".

Basically, we gain the information about what is a valid expression/statement (for if conditions and increment-blocks) in a bottom-up manner.
But we also need to know about break/continue targets during this bottom-up pass constructing if statement (basically passing the "current break/continue target" in a top-down manner).

When there's only if and loops in play, there's an easy approach: first find the loops,
then use the bottom-up-construction for expressions/statements and structured control flow with if.
Though even here, we'd need to take care that candidates for increment blocks are visited before any potential continue-statement is -- otherwise it's possible that an increment block isn't detected as such (e.g. because it contains not-yet-lifted nullable operations) by the time ConditionDetection runs, thus does not consider a branch to the increment block as "continue" and ends up picking the wrong exit point. We don't do this yet, but it's at least plausible that we could, with some additional care in the order we visit the control flow blocks.

But switch adds significant complexity: the "break" part of switch should work as in loops (thus needing the currently implemented approach), while the "continue" part of switch needs a bottom-up approach similar to how "if" is currently handled. I don't see any way to satisfy both parts at once. Since "continue" is somewhat rare within switch, that means we should keep treating switch similar to loops.

So I think your changes here are the right way to go. It feels like a hack to use HighLevelLoopTransform.GetIncrementBlock() at this point where it isn't reliable yet because the increment block isn't transformed; but I think the reasoning above explains that it is a pretty much unavoidable hack.
Fortunately most for loops have a pretty simple i++ increment block, so it's pretty difficult to actually hit this issue with our pass ordering. You'd need a loop with non-trivial increment block containing a switch containing a continue; statement. And unless I'm mistaken, any false positives or false negatives in IsIncrementBlock() will just result in redundant gotos.

private readonly IDictionary<ControlFlowNode, int> continueDepth = new Dictionary<ControlFlowNode, int>();
public LoopContext(ControlFlowGraph cfg, ControlFlowNode contextNode)
{

This comment has been minimized.

@dgrunwald

dgrunwald Oct 3, 2018

Member

Can you comment some more on what this constructor is supposed to be doing?
Why is it analyzing the control flow reachable from the switch head, but stopping at loop heads?

Is it trying to follow the control flow up from the switch through the loop back-edge (a continue; statement) to the loop head, thus discovering the outer loop containing the switch?

If so, doesn't that fail for the following code?

while (true) {
  switch (i) {
     case 0:
        while (i < 10) i++;
        continue;
     default:
        return;
  }
}

The outer loop head is not reachable from the switch without going through an inner loop head, but your traversal stops at the inner loop head.

@dgrunwald

dgrunwald Oct 3, 2018

Member

Can you comment some more on what this constructor is supposed to be doing?
Why is it analyzing the control flow reachable from the switch head, but stopping at loop heads?

Is it trying to follow the control flow up from the switch through the loop back-edge (a continue; statement) to the loop head, thus discovering the outer loop containing the switch?

If so, doesn't that fail for the following code?

while (true) {
  switch (i) {
     case 0:
        while (i < 10) i++;
        continue;
     default:
        return;
  }
}

The outer loop head is not reachable from the switch without going through an inner loop head, but your traversal stops at the inner loop head.

This comment has been minimized.

@dgrunwald

dgrunwald Oct 3, 2018

Member

Oh, n.Dominates(contextNode) is not the typical loop-head check. My example is no problem, since there's no loop head detected for the inner loop.
Still not quite sure which loop heads this constructor is supposed to find. All loop heads reachable from contextNode that also dominate the contextNode? It doesn't quite do that, though at least it should usually find the head of the loop that a continue; would target.

A different approach here would be to just iterate through the nodes dominating contextNode, and check if any of them is a loop head:

for (ControlFlowNode n = contextNode.ImmediateDominator; n != null; n = n.ImmediateDominator) {
   if (n.Predecessors.Any(p => n.Dominates(p))) {
      loopHeads.Add(n);
   }
}

This should give you the heads of all loops containing contextNode, already ordered from innermost to outermost.

@dgrunwald

dgrunwald Oct 3, 2018

Member

Oh, n.Dominates(contextNode) is not the typical loop-head check. My example is no problem, since there's no loop head detected for the inner loop.
Still not quite sure which loop heads this constructor is supposed to find. All loop heads reachable from contextNode that also dominate the contextNode? It doesn't quite do that, though at least it should usually find the head of the loop that a continue; would target.

A different approach here would be to just iterate through the nodes dominating contextNode, and check if any of them is a loop head:

for (ControlFlowNode n = contextNode.ImmediateDominator; n != null; n = n.ImmediateDominator) {
   if (n.Predecessors.Any(p => n.Dominates(p))) {
      loopHeads.Add(n);
   }
}

This should give you the heads of all loops containing contextNode, already ordered from innermost to outermost.

This comment has been minimized.

@Chicken-Bones

Chicken-Bones Oct 3, 2018

Contributor

"heads of all loops containing contextNode" is the goal yes, so that continues (and their depth) can be identified.
I believe the example you've given will find loops which dominate the switch but don't contain it. My approach also ensures there's a back edge to the loop head from the switch's successor tree.

@Chicken-Bones

Chicken-Bones Oct 3, 2018

Contributor

"heads of all loops containing contextNode" is the goal yes, so that continues (and their depth) can be identified.
I believe the example you've given will find loops which dominate the switch but don't contain it. My approach also ensures there's a back edge to the loop head from the switch's successor tree.

This comment has been minimized.

@dgrunwald

dgrunwald Oct 4, 2018

Member

The problem is that we're running at a point where loop bodies haven't been detected yet. "loops containing contextNode" is not quite defined yet.
Code dominated by a loop head could end up in the loop, even if there's no back-edge:

outer_loop_head:
while (...) {
    while (...) {
       if (...) {
         switch (...) { ... goto outer_loop_head; }
         return;
       }
   }
}

But such a block dominated by the loop without back-edge could also end up outside the loop -- ultimately it's up to loop detection's decision.
My code snipper approximates "heads of all loops containing contextNode" by computing "heads of all loops dominating contextNode" (thus giving an upper bound). Your code seems to give a lower bound instead, so really the question is: which type of error is better here, false positive or false negative?

Maybe we could change the pass ordering to detect loops before we start dealing with switches? I don't see a reason why SwitchDetection needs to run before LoopDetection; and loops->switches->ifs seems like it would cause less issues.

@dgrunwald

dgrunwald Oct 4, 2018

Member

The problem is that we're running at a point where loop bodies haven't been detected yet. "loops containing contextNode" is not quite defined yet.
Code dominated by a loop head could end up in the loop, even if there's no back-edge:

outer_loop_head:
while (...) {
    while (...) {
       if (...) {
         switch (...) { ... goto outer_loop_head; }
         return;
       }
   }
}

But such a block dominated by the loop without back-edge could also end up outside the loop -- ultimately it's up to loop detection's decision.
My code snipper approximates "heads of all loops containing contextNode" by computing "heads of all loops dominating contextNode" (thus giving an upper bound). Your code seems to give a lower bound instead, so really the question is: which type of error is better here, false positive or false negative?

Maybe we could change the pass ordering to detect loops before we start dealing with switches? I don't see a reason why SwitchDetection needs to run before LoopDetection; and loops->switches->ifs seems like it would cause less issues.

This comment has been minimized.

@Chicken-Bones

Chicken-Bones Oct 4, 2018

Contributor

The key is that if there's no back edge, then there's no continue statement, so you'll never encounter the false negative anyway.

@Chicken-Bones

Chicken-Bones Oct 4, 2018

Contributor

The key is that if there's no back edge, then there's no continue statement, so you'll never encounter the false negative anyway.

// match do-while condition
if (pred.Successors.Count <= 2) {
if (HighLevelLoopTransform.MatchDoWhileConditionBlock((Block)pred.UserData, out var t1, out var t2) &&

This comment has been minimized.

@dgrunwald

dgrunwald Oct 3, 2018

Member

This is quite unreliable here; any non-trivial loop condition might take up multiple blocks at this point in time.
The "do-while condition block" doesn't exist until much later in the transformation pipeline. Not sure what could be done about that...

@dgrunwald

dgrunwald Oct 3, 2018

Member

This is quite unreliable here; any non-trivial loop condition might take up multiple blocks at this point in time.
The "do-while condition block" doesn't exist until much later in the transformation pipeline. Not sure what could be done about that...

@Chicken-Bones

This comment has been minimized.

Show comment
Hide comment
@Chicken-Bones

Chicken-Bones Oct 3, 2018

Contributor

Good analysis. I wasn't able to achieve a provably correct or satisfactorily general approach here like I was with ConditionDetection, but I'm convinced such isn't possible with the current architecture, as you've mentioned. Particularly the fact that switch statements are often optimised to condition trees meant I had to build prediction heuristics. This implementation still provides great improvement in the vast majority of cases.

Proper detection would rely on two things.

  1. outer loops being constructed first (currently inner loops and switches must be constructed first due to some ambiguity in loop contents but there may be another way around this)
  2. high level loop transforms (and any pre-requisite transforms) being performed on the outer loop before anything that depends on continue statements (condition/switch detection)
Contributor

Chicken-Bones commented Oct 3, 2018

Good analysis. I wasn't able to achieve a provably correct or satisfactorily general approach here like I was with ConditionDetection, but I'm convinced such isn't possible with the current architecture, as you've mentioned. Particularly the fact that switch statements are often optimised to condition trees meant I had to build prediction heuristics. This implementation still provides great improvement in the vast majority of cases.

Proper detection would rely on two things.

  1. outer loops being constructed first (currently inner loops and switches must be constructed first due to some ambiguity in loop contents but there may be another way around this)
  2. high level loop transforms (and any pre-requisite transforms) being performed on the outer loop before anything that depends on continue statements (condition/switch detection)
@dgrunwald

This comment has been minimized.

Show comment
Hide comment
@dgrunwald

dgrunwald Oct 4, 2018

Member

As a next step, I think we should try changing up the pass order to be:

  • Loop(Body)Detection
  • Switch(Head)Detection
  • the other switch-head-detecting transforms
  • SwitchBodyDetection

The guesswork about the "continue target block" seems unavoidable, but at least we get to avoid the guesswork about which loops contain the switch.
Maybe we could also run switch-body-detection within the pass that detects the switch head -- so instead of the SwitchUsesGoto() heuristic, we could ask "does switch-body-detection find a single unambiguous exit point?", and abort constructing the switch head if it doesn't.

Member

dgrunwald commented Oct 4, 2018

As a next step, I think we should try changing up the pass order to be:

  • Loop(Body)Detection
  • Switch(Head)Detection
  • the other switch-head-detecting transforms
  • SwitchBodyDetection

The guesswork about the "continue target block" seems unavoidable, but at least we get to avoid the guesswork about which loops contain the switch.
Maybe we could also run switch-body-detection within the pass that detects the switch head -- so instead of the SwitchUsesGoto() heuristic, we could ask "does switch-body-detection find a single unambiguous exit point?", and abort constructing the switch head if it doesn't.

@Chicken-Bones

This comment has been minimized.

Show comment
Hide comment
@Chicken-Bones

Chicken-Bones Oct 4, 2018

Contributor

In general, I support said changing of pass order. The only issue I ran into when I attempted it was related to loops (and switch containers) being constructed inner first #915 Part of the switch body would end up outside the loop, and have to be poached from the parent container into the child. I didn't go any further with this because control flow graphs are restricted to container context, and I got quite good results with the current approach.

Contributor

Chicken-Bones commented Oct 4, 2018

In general, I support said changing of pass order. The only issue I ran into when I attempted it was related to loops (and switch containers) being constructed inner first #915 Part of the switch body would end up outside the loop, and have to be poached from the parent container into the child. I didn't go any further with this because control flow graphs are restricted to container context, and I got quite good results with the current approach.

@dgrunwald

This comment has been minimized.

Show comment
Hide comment
@dgrunwald

dgrunwald Oct 7, 2018

Member

Well, let's keep future improvements for the future.

I'm merging the branch as-is. It's already a huge improvement, no need to wait for the perfect solution.

Member

dgrunwald commented Oct 7, 2018

Well, let's keep future improvements for the future.

I'm merging the branch as-is. It's already a huge improvement, no need to wait for the perfect solution.

@dgrunwald dgrunwald merged commit 481e05e into icsharpcode:master Oct 7, 2018

1 check passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment