-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dealing with abnormal selection pattern #26
Comments
Copied from #13 (comment): Currently, SABl doesn't test for abnormal selection thoroughly, which leads to incorrect decompilation, e.g. https://github.com/pfalcon/ScratchABlock/blob/master/tests/decomp/_xtos_set_interrupt_handler_arg/_xtos_set_interrupt_handler_arg.lst.exp.clean Generally, there's a gap with true structuredness and C-structuredness, which is more expressive. It may seem that it applies to loops, but it turns out it applies even to For example, following nice-looking C if is only nice-looking thanks to 2 returns:
That however contains abnormal selection. Converted to fully structured shape, it's more verbose and less nicely looking. So, the point is that SABl so far follows "fully structured" way, which means boring splitting nodes to resolve abnormal selection. But to get nice concise C-structuredness, that will later need to be reverted ;-I. Or maybe not, maybe we can detect ladder-style abnormal selection (when one branch is dominated by header, and another - not) and apply multiple-return transformation. It's hard to decide what to do when and even harder to actually do and prove that it's right. So, apparently stick to fully structured way for now. At least that makes it MISRA-compliant decompiler, LOL. |
Below a couple of unsorted thoughts on the topic: From what I've read so far, the current approach to CFG structuring is searching for predefined node configurations and to transform the corresponding CFG regions to high-level structures. If no match has been found, node splitting is performed. Selecting nodes to be split seems to be not trivial. Several algorithms have been proposed in the past. Notably, two papers propose some different approaches to CFG structuring:
Anyway, CFGs of real-world programs often don't satisfy the restriction "single entry, single exit". My recent decompilation target (pure C) has shown that anything more complex than a single loop or a couple of ifs requires some kind of handling of abnormal selection paths. The question is what method should be implemented. I noticed that the common case of abnormal selection path is caused by return statements inside a control structure. We may start by investigating such cases first. I'll try to hack a program for searching for abnormal selections and to sample their CFGs... |
The above mentioned definition seems to be valid for a subset of abnormal selection paths unless I'm missing something. What's about return statements inside of if/else:
The return statement in the code above will be transformed either to an immediate return node (i.e. node without successors) or to a jump to another return node. Both of them introduce abnormal selection paths. |
So, as you know, I didn't look into this for quite a while. So, it's a chance to check if years of previous reading left any mark, or I'm as tabula-rasa as before, and be shamed for talking nonsense. So, my answer is no, the definition above is necessary and sufficient definition of abnormal selection (even more specifically, there're 2 definitions, and they're equivalent).
My response is that nothing's abnormal here, the graph is well-structured.
So, we (well, algorithms) work on normalized graphs, where there's a single entry/single exit indeed. This is semantically equivalent to "immediate return node (i.e. node without successors)". But when it's formulated on single-exit graph, it becomes clear that nothing abnormal is there. |
So, this definitions leads 2 to ways to detect abnormal selection nodes:
Does algo 2 work as expected? Not quite, and we found a bug in the original "equivalent" definition too. Consider a do-while node. So, there's header node, and we normally arrive to it via normal flow, unlabeled edge. Then later we have a condition node which loops back to header. As it's condition, it has labeled edges. So, need to adjust 2nd part of definition: "In other words, it's an if/if-else, which has forward-edges into its "false"/"true" branches, and something else jumps directly into "true" or "false" branch either. Back-edges are loops and are ok." Formulating initial definition in terms of dominance helped us catch that bug - indeed, dominance is quite related to DFS numbering, and thus criterion for being a back-edge. So, comparing these 2 algos with the current code of match_abnormal_sel, shows how naive the latter is ;-). And as it warns, it's context-dependent, while algos above should be context-free, and thus move the entire structuring system to be "finite Church-Rosser". Actually, we can just apply abnormal selection detection in the very beginning and fix up the graph once and for all. Action items: implement both algos, compare the results, be ready to eat my hat ;-). |
Anyway, for now we need to solve this specific issue, so in 5709440 I pushed solution I drafted yet then in January. |
I'll give it a try... |
Algo 1 (with dominators computation) is sketched in daf0f4f
How can I distinguish "normal" edges from "anything else" condition edges in an unprocessed CFG? It seems that only IFs "true" edges have the "cond" property attached while "false" edges don't have "cond". |
For your _xtos_set_interrupt_handler_arg example, a better structuring strategy seems to emit multiply return statements:
That's what I'd do as human being. The two conditional assignments to $a2 have been converted to the ternary operator; the identical return statements in the both if/else branches have been moved out and collapsed by means of common subexpression elimination. This produces a much better code. The funny part is that code now closely resembles its original 😃 What transformation in SABl funnels all return nodes to a single one? Can that be reverted later to produce a nicer output? |
Really? That sucks, because I thought they have "cond", just set to None. Looking at parser.py, what you says seems to be true. Well, could add a pass to make it like that, then of course would need to make sure it doesn't break anything. (It shouldn't, because "cond" should be queried as |
Sure, that's quite obvious. (Except should be configurable. Or specifically, should prefer multiple return's to goto's, but should allow to prefer multiple returns or single exit otherwise.)
So, I did my part - different transformations are now properly (almost!) categorized to allow to look up them easily. Now your part to eyeball them to know which exist, and know answers to questions like above.
Yep, it needs to be. Just need to figure out when (also mostly obvious) and how (match them afterwards). But before that, needs to refactor the stuff, which I'm doing. And fix various goofs, which are many, e.g. 8ad0352. Whole DFS numbering is hosed (and not only in my code, in the literature many obvious-to-mention things are skipped which gives an impression of "well-kept secret" to them). (Of course another explanation is that it's so trivial that everyone understands it reading "between the lines", except for stupid me, lol.) |
Oops! I'll take a look in my books tonight. I'm eager to see how it was implemented elsewhere. Some books provide detailed algos as pseudo code... |
Algos is not everything. Overall concept description (i.e. how different concepts mix and match (and don't mix)) is what important. Ok, if you're eager to look in books, here's quiz for you: what's common (and uncommon) between DFS number and preorder and postorder traversals? |
What do you mean with "DFS number"? Iterating over the graph nodes and assigning an increasing unique number to each node when it's first visited? |
Well, I would mostly mean the same what others mean (google up "dfs number"?), modulo misunderstandings and goofs on my side. But the latter largely depends on how others define it, and not on its own, but in the constellation of related concepts. Again, the question is "How DFS number and preorder and postorder traversals of a graph relate, if in any way at all?". I don't see that question be clearly answered in majority of texts. Take as example 1997 Muchnick - in his presentation those are 3 notions, without clear definition of their exact relation.
|
Anyway, quiz time is over, fix time is on. The whole story is here: 14b4344 |
G@0gle search doesn't reveal anything useful or even related to graphs, hence the question. Moreover, I don't remember seeing the term "dfs number" anywhere else except some tutorials on DFS where each graph node has a number assigned to it, mainly for a better visualization and referencing. If "DFS number is the order in which the vertices are explored" (a definition found somewhere in the web), then the answer to the question "How DFS number and preorder and postorder traversals of a graph relate, if in any way at all?" is simple: DFS number is just another name for pre-order. Regarding the preorder / postorder traversals of a digraph, Muchnick's book gives two definitions: preorder traversal of an acyclic digraph is a traversal in which each node is processed before its descendants while postorder traversal of the same graph is in which each node is processed after its descendants. While this theoretical definition looks fine and dandy, I've learned elsewhere that there are at least two major problems with DFS traversals over digraphs:
|
Well, once you know the right answer (and the right answer went in the commit linked), it's always easy to lay any theory behind it ;-). The matter is again, to be able to navigate the ocean of concepts and be able to to tell how some of them relate and how some differ. And it's easy to mix up things. And so far I didn't meet someone who can unmix up them in "real time" (i.e. max in days' worth of time after being made, not months' or years').
A note from someone who's sharpening his eye: they never use terms like "children" for arbitrary graph. There, we have "predecessors" and "successors". "Children"/"descendants" are reserved for tree-like structures (e.g. spanning tree of a graph).
Good note. That's of course why SABl so far had only postorder numbers. That's also why I initially wanted to rush to add them on detecting the goof discussed, but decided against (again, so far), to avoid sweeping changes on the tests. It's of course used too (e.g. edge classification), and will be added on real demand. Sweeping changes was always a major concern to me. I now try to downprioritize it - there're so many things to change, many bugs to fix, and even their number of goofs to facepalm thru. And the testsuite is at something of it now, so refactoring is easy. (And so, more sweeping changes to come.) |
Reference: https://academic.oup.com/comjnl/article-pdf/20/1/45/975771/200045.pdf
Abnormal selection pattern is an
if
/if-else
pattern, where branch(es) are not dominated by the pattern header. In other words, it's an if/if-else, where something else jumps directly into "true" or "false" branch, besides the "if" condition checking code itself.In the real code, this pattern occurs quite often, because it's pretty low-hanging optimization for code like:
It's only natural to transform it to:
The text was updated successfully, but these errors were encountered: