-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structuring switch statements #13
Comments
Some quick replies:
My attitude towards that was: "well, I'm ok with having just if/if else chains" ;-).
I believe I read up on that somewhere, and it makes good sense. But in all fairness, I don't remember ever seeing such a code - for balanced binary tree. Well, maybe because it's indeed spaghetti to human eye, and I didn't recognize an optimize search tree in cases I've seen.
The reason for that would be that you're doing something wrong ;-). And the usual way to resolve it would be: 1) make a sample file with by minifying number of branches to process in it; 2) remove other unimportant code from it; 3) NEW! (of the last month) contribute as a testcase; 4) patiently work towards resolving issues with it. Re: 3), I've added f15358b which might address some concerns with contributing real-world code as testcases. It's quite adhoc, and btw, I recommend to generate PseudoC files with addresses prefixed to each instruction anyway (or it's pretty hard to review stages of processing).
Yes, what should be structured first, should be structured first. SABl currently does it in a typical monte-carlo way, which is neither optimal nor ideal. But that "allows" one to tighten up matching patterns (like a case you witnessed, where While matcher matched completely unrelated node and created broken graph, I bet you have a similar case). Refactoring that to a proper post-order matching is in my todo list, which requires first growing test corpus. |
Sure, that's how it should be done. No idea if you can find such nodes all at once, and then convert all at once, or would collapse one by one into Switch node. Majority of current structuring passes are such incremental (I believe I can put it that way). I'd also suggest to start with handling just "proper structural switch", without C hacks like missing break's. |
Btw, I played once with handling jumptables, yeah (was quite adhoc). |
I suppose that the output diff should contain the desired |
Looks simple enough to be grokkable by a human. Please submit as a PR.
No-no, the idea is that testsuite always passes and enables fear-free refactoring. So, initial submission should contain whatever master produces, and it should be updated as master updates (hopefully with improvements to output, not regressions). |
Btw, @maximumspatium, didn't you want to improve/elaborate .dot CFG output? ;-) (just trying to shove more work items from my queue to yours ;-) ). |
@maximumspatium : Continuing from #19 (comment), did you see: https://github.com/pfalcon/ScratchABlock/blob/master/decomp.py#L176 ? That should look familiar, and can be summarized as: SABl already contains passes to structure sequences of if statements as a switch like statement ("well, I'm ok with having just if/if else chains"). Exception they work with "linear" sequences, not trees. New pass(es) you write would thus subsume that processing (error in #19 (comment) hints about that). Of course, in my list, first my passes should be "recovered" to work as expected (they did once upon a time). The testcase for that is https://github.com/pfalcon/ScratchABlock/tree/master/tests/decomp/030-fun_bfffd0eb |
@maximumspatium, When dealing with if's, pay attention to abnormal selection patterns. (Google |
Generally, there's a gap with true structuredness and C-structuredness, which is more expressive. It may seem that it applies to loops, but it turns out it applies even to For example, following nice-looking C if is only nice-looking thanks to 2 returns:
That however contains abnormal selection. Converted to fully structured shape, it's more verbose and less nicely looking. So, the point is that SABl so far follows "fully structured" way, which means boring splitting nodes to resolve abnormal selection. But to get nice concise C-structuredness, that will later need to be reverted ;-I. Or maybe not, maybe we can detect ladder-style abnormal selection (when one branch is dominated by header, and another - not) and apply multiple-return transformation. It's hard to decide what to do when and even harder to actually do and prove that it's right. So, apparently stick to fully structured way for now. At least that makes it MISRA-compliant decompiler, LOL. |
Hello Paul,
I recently picked up another real-world code snippet that makes use of the
switch
control operator.You'll find its Pseudo-C code here.
Recovering switch statements is another big task, so I decided to start somewhere and to conduct a couple of experiments.
Compiler usually employ at least the following two strategies in order to convert (aka lower) high-level
switch
blocks into assembly:The above mentioned snippet utilizes the former strategy. Although SABl does currently emit the correct C-output (what's the good thing you can be proud of!), humans will have hard time understanding that
if-else
spaghetti.Because I wasn't able to find any suitable algorithm in the (de)compilation literature I tried to code up a recognition of such structures. Here is what I got so far.
Although I'm now getting all switch-related CFG nodes recognized and classified properly, I have no success trying to collapse all these nodes into a generic switch control class. I replace the header node with Switch(BBlock), remove all related nodes from CFG and then connect the header node with the switch successor node. This unfortunately seems to screw up the graph.
It looks like I'd need to implement another collapsing strategy because switch cases can contain other nested control structures to be structured first.
I'm thinking about removing all search nodes (i.e. the nodes required for binary search) and construct a n-way node branching to the
case
nodes. That would be probably the result of converting a jump table into CFG.Do you have any suggestions on how to proceed?
The text was updated successfully, but these errors were encountered: