New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate "match-like" switch statements #65

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
2 participants
@fanzier
Copy link
Contributor

fanzier commented Sep 14, 2016

This PR implements support for the most common kind of switch statements, namely those that correspond to a match expression in Rust.

Before merging, please consider these things that I'm not sure about:

  • I made the pattern types polymorphic in the expression type. This made the implementation easier since we can store patterns of CExprs and Rust.Exprs. This isn't done anywhere else in the code tho. Thoughts?
  • translatePattern would be simply a traverse if the pattern type implemented Traversable. This would require the DeriveTraversable, another GHC extension.
  • Most importantly, the C standard says we should apply integer promotions (if I understood it correctly). This is not implemented. But are conversions even possible in Rust patterns?

The integer promotions are performed on the controlling expression. The constant expression in each case label is converted to the promoted type of the controlling expression. If a converted value matches that of the promoted controlling expression, control jumps to the statement following the matched case label. Otherwise, if there is a default label, control jumps to the labeled statement. If no converted case constant expression matches and there is no default label, no part of the switch body is executed.

@jameysharp

This comment has been minimized.

Copy link
Owner

jameysharp commented Sep 26, 2016

I'm feeling torn about this patch.

I think you've done an excellent job identifying the right constraints for a purely syntax-directed translation of switch statements, and the subset we can handle using your approach is broad enough to be pretty useful. I've seen lots of cases in musl-libc and elsewhere that should be fixed by your patch. So, nice work!

On the other hand, this approach requires a relatively complex implementation, since it has to check all the constraints. If we address issue #30, to handle goto statements, then the resulting control-flow graph analysis subsumes all of this, and the switch-specific code becomes almost trivial. Of course, the CFG analysis is complicated too, but it will handle more cases.

I'm pretty sure I should merge your pull request, with the understanding that eventually we'll replace most of it with the CFG analysis for #30, because I'm not sure when that'll happen and meanwhile your patch is usable right now. 😄

Given that conclusion, let's talk about details!

There's a semantic mismatch between C's case labels and Rust's match arms, that I hadn't noticed before trying to understand your question about integer promotion.

  • A case label is an "integer constant expression" (C99 6.8.4.2 paragraph 3), as defined in C99 6.6.
  • Rust pattern syntax, on the other hand, consists of "some combination of literals, destructured arrays or enum constructors, structs and tuples, variable binding specifications, wildcards (..), and placeholders (_)."

Patterns are not general expressions; you can't pattern-match on 3+2 like you can in C. This is why the integer-promotion thing is confusing: type-casts are another kind of expression that aren't allowed in match patterns.

I'm thinking that, for the moment, the most general option is to not generate match statements, but generate if/else ladders instead. That's annoying because you may have to introduce a temporary variable to avoid evaluating the controlling expression multiple times, but so it goes. So a statement like this:

switch(f(x)) {
case (short) 1:
case 5...10:
    return g(x);
default:
    return h(x);
case 3 + 9:
    return i(x);
}

Might translate to something like this:

{
    let tmp_ : i32 = f(x) as i32;
    if tmp_ == (1i32 as (i16) as (i32)) || (tmp_ >= 5i32 && tmp_ <= 10i32) {
        return g(x);
    } else if tmp_ == 3i32 + 9i32 {
        return i(x);
    } else {
        return h(x);
    }
}

If you adopt that approach, you can revert the changes to AST.hs, which don't reflect Rust's actual syntax very well—as noted above.

How does all that sound to you?

@jameysharp jameysharp referenced this pull request Oct 28, 2016

Closed

implement goto in C #30

@jameysharp

This comment has been minimized.

Copy link
Owner

jameysharp commented Oct 28, 2016

I've pushed a first cut at generating control-flow graphs to a new cfg branch. Commit 5fbe21c:

  • demonstrates constructing a CFG from the language-c AST,
  • demonstrates pattern-matching and rewriting that CFG for simplifications and other transformations,
  • and provides corrode-cfg, a command-line tool to inspect the CFG produced for a given C source file.

My implementation is not up to my preferred standard (it has zero documentation, for one thing) but I wanted to get something out for folks to play with.

I'd love it if you'd write a pull request that extends walkStatement in src/Language/Rust/Corrode/CFG/C.hs to handle the CSwitch, CCase, and CDefault AST constructors, and optionally CCases as well. This should be easier than what you already did, because:

  1. You don't have to check whether it's "match-like", since we'll handle that later when we convert the CFG to Rust control flow. I think you'll be able to translate any switch statement to a CFG without too much trouble, including nested cases (as in Duff's device) and fall-through cases.
  2. You don't have to group adjacent case labels together, because a CFG simplification pass that I already wrote will do that for you automatically.

Given the way I'm currently structuring the CFG, you'll have to construct an if-else ladder that tests each case sequentially, and jumps to the default label as the final else-branch. (That means you'll create two basic blocks for every case label: one to contain the code following the label, and another to test whether that's the right block to jump to.) Conveniently, constructing CExprs of the form discriminant == case will cause Corrode to do the usual arithmetic conversions, including integer promotion, so you won't need to worry about that aspect of the C standard.

I recommend using a Writer monad to write out case expressions and the labels they should branch to as you encounter them, then use listen/censor when translating the switch statement to collect all the cases. Since there's already a Reader monad in CSourceCFGT, and handling goto statements will require a State monad, you should just replace the Reader monad with a combined RWS monad.

Once we have switch statements converted to CFG form (and once the CFG branch is more complete), you could work on recognizing match-like control flow in the CFG so we don't have to generate ugly if-else ladders most of the time, but for now just the above would be great.

@fanzier

This comment has been minimized.

Copy link
Contributor Author

fanzier commented Oct 28, 2016

Cool, the cfg branch sounds exciting! :) I'd love to work on that but I don't have a lot of time at the moment. But I'll try!

[Also, I'm sorry I didn't reply to your comment before. I had a lot of things to take care of and forgot about this PR...]

@jameysharp

This comment has been minimized.

Copy link
Owner

jameysharp commented Nov 5, 2016

In case somebody gets to implementing switch before I do: In commit a834b82 on the cfg branch, I've integrated the control-flow graph logic into the main C.md, instead of the separate CFG/C.hs module I had it in before. You should be able to add support for case/default labels and CSwitch statements to interpretStatement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment