Loop decompilation not supported? #11
Ok, I see ;-) :
Yup, the "out-of-SSA" code is not written yet :)
I implemented a simple SSA back-transform with copy assignments in bc990c0. It does not do variable coalescing yet, but I will attempt to implement that at some point.
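For readers unfamiliar with the technique, here is a minimal sketch of what a copy-based SSA back-transform usually looks like. It uses a hypothetical Block class, not the project's own IR, and the actual code in bc990c0 may differ. Each phi d = phi(p1: a, p2: b) is replaced by a copy d = a at the end of p1 and d = b at the end of p2.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Basic block in a hypothetical toy IR (not the project's own classes)."""
    name: str
    # Each phi is (dest, {predecessor_name: source_var}).
    phis: list = field(default_factory=list)
    # Ordinary instructions as plain strings, e.g. "i_2 = i_1 + 1".
    instrs: list = field(default_factory=list)
    # Branch at the end of the block, kept separate so copies go before it.
    terminator: str = ""


def destruct_ssa(blocks):
    """Naive out-of-SSA: replace each phi with copies in its predecessors.

    A copy is appended after the predecessor's instructions, i.e. just before
    its terminator. No coalescing is done, and critical edges as well as the
    "lost copy" and "swap" problems are not handled: the copy runs on every
    path out of the predecessor, not only on the edge into the phi's block.
    """
    by_name = {b.name: b for b in blocks}
    for block in blocks:
        for dest, sources in block.phis:
            for pred_name, src in sources.items():
                by_name[pred_name].instrs.append(f"{dest} = {src}")
        block.phis = []
    return blocks


# Usage: a counting loop where B1 branches back to itself.
entry = Block("B0", instrs=["i_0 = 0"], terminator="goto B1")
loop = Block("B1", phis=[("i_1", {"B0": "i_0", "B1": "i_2"})],
             instrs=["i_2 = i_1 + 1"],
             terminator="if (i_2 < 10) goto B1 else goto B2")
done = Block("B2", instrs=["return i_2"])
destruct_ssa([entry, loop, done])
print(entry.instrs)  # ['i_0 = 0', 'i_1 = i_0']
print(loop.instrs)   # ['i_2 = i_1 + 1', 'i_1 = i_2']
```

In this example the back edge B1 -> B1 is critical, so the copy i_1 = i_2 also executes on the exit path to B2. That is harmless only because i_1 is dead in B2. Coalescing would later merge i_0, i_1 and i_2 into a single variable where their live ranges do not interfere.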
Nice, looking forward to finding bugs in it ;-). Btw, the title of this ticket was "loop decompilation"; I'm not sure whether it's helpful to remind you that this part is still not implemented. Btw, I experimented with loop recognition: trivial cases are, well, trivial. Slightly less trivial cases are quite feasible. (And real real-world cases are, as expected, hard.)
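For the trivial cases mentioned above, the textbook approach is to find back edges, meaning edges n -> h where h dominates n. You then collect each natural loop's body by walking predecessors backwards from n until you reach h. Below is a minimal sketch of that approach. It assumes the CFG is a plain dict of successor lists, that every node appears as a key, and that every node is reachable from the entry. It is not tied to either project's IR.

```python
def dominators(cfg, entry):
    """Iterative dominator sets; cfg maps node -> list of successors."""
    nodes = list(cfg)
    preds = {n: [] for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = (set.intersection(*incoming) if incoming else set()) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def natural_loops(cfg, entry):
    """Map each back edge (latch, header) to the set of nodes in its loop."""
    dom, preds = dominators(cfg, entry)
    loops = {}
    for latch, succs in cfg.items():
        for header in succs:
            if header not in dom[latch]:
                continue  # not a back edge
            body = {header}
            stack = [] if latch == header else [latch]
            body.update(stack)
            while stack:
                node = stack.pop()
                for p in preds[node]:
                    if p not in body:  # header is already in body, so the walk stops there
                        body.add(p)
                        stack.append(p)
            loops[(latch, header)] = body
    return loops


# A while-shaped CFG: cond tests, body jumps back to cond.
cfg = {"entry": ["cond"], "cond": ["body", "exit"], "body": ["cond"], "exit": []}
print(natural_loops(cfg, "entry"))  # one loop: back edge body -> cond
```

This recovers loop membership only. Real-world cases with multiple entries (irreducible regions) have no back edge in this sense and need separate handling.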
Ah, right. Arranging loops into higher-level while() and for() constructs in the decompiled output is not implemented yet. It's one of the next things on my todo list. I guess I'll reopen this.
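Once a natural loop is known, a rough first choice between while() and do-while() can be made from where the loop exits. The sketch below is a hypothetical heuristic. It assumes a dict-of-successors CFG and a precomputed loop body. Real code also has to deal with exits from the middle of the body (break), and turning a while() into a for() needs additional data-flow information that is out of scope here.

```python
def classify_loop(cfg, header, latch, body):
    """Heuristically pick a high-level form for one natural loop.

    cfg maps node -> list of successors; body is the set of nodes in the loop.
    The latch is checked first so a single-block self-loop with an exit
    comes out as do-while (its test is at the bottom).
    """
    def exits(node):
        return any(s not in body for s in cfg[node])

    if exits(latch):
        return "do-while"  # test at the bottom: the body runs at least once
    if exits(header):
        return "while"     # test at the top, latch jumps back unconditionally
    return "endless"       # no exit at header or latch: while (1), maybe with breaks


while_cfg = {"entry": ["cond"], "cond": ["body", "exit"], "body": ["cond"], "exit": []}
print(classify_loop(while_cfg, "cond", "body", {"cond", "body"}))  # while
```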
I rewrote the entire control-flow "reconstruction" code in the last week. The old way of combining blocks was just not working; I think this new approach will evolve a lot better.
I'd be interested in your more complex use cases. There are very few complex cases in the tests currently; I would love examples where my code just fails or the decompilation is incorrect.
Thanks for the updates!
So, that all bumps into #9: how do I pass them to you, and in what form should they be? Actually, I had to scratch my head over how to even run the example in the description of this ticket, because your codebase doesn't have a utility that can be fed an IR file. In the end I dug out my older test with inline IR, then scratched my head over which PYTHONPATH to run it with (I made a runner shell script for this). And in the end what I get is:
I.e. it doesn't even use a modern Python raise ;-).
What about assembly?
Ah, that example is hitting one of the cases for which I didn't have a test yet. Thanks for reminding me about it; I will work on it. I was asking because I wanted to know whether you actually have a set of complex-loop use cases that you test other decompilers with.
Anyway, to answer your question, you can hopefully find a lot of inspiration in decomp/decomp#172. As you can see, I was so excited by Van Emmerik's claims that the "structuring problem is largely solved" (quoting Cifuentes) that I went to the trouble of digging out her code and a 16-bit compiler, and tested it. Then I tested Van Emmerik's own tool. What I saw were terrible things ;-).
That ticket also quotes https://github.com/pfalcon/ScratchABlock , which is a project I started as a "response" to your comments on #9. The human-readable/writable, externalized IR is the core of the project (note that for structuring, only control-transfer instructions matter, so the rest of the syntax is just treated as literal text so far ;-) ). The central user interface is a utility that reads that format, applies a transformation, and renders the result as a basic block list and a .dot file. Tests are written not with self.assertsWhichWerePutInPythonAsAJavaDiversion(), but by running that utility and diffing the results. In other words, everything is designed to invite a human being to come by and hack on it. I actually hacked that up in 2-3 days after our discussion in #9, and I'm now leisurely refactoring, cleaning it up and pushing it to GitHub.
Finally, the ticket above also contains an example of a non-trivial but still human-graspable CFG. My code can structure it a bit, and I already understand what the other parts are and how to structure them, though I'm not exactly sure the result would be close to the original code. The very title of that ticket says that the shape of a (sub)CFG alone is not enough to properly recover structure. At the very least, analyzing and collating graphs based on branch condition codes is also required, possibly in a non-trivial way. A good example of that is the while loop. Everyone represents it on the assumption that it starts with an unconditional jump into the loop header. But anyone who learned assembler first and a high-level language later knows that a while loop is not that. A while is an if followed by a do-while, i.e. while (c) { body } is compiled as if (c) { do { body } while (c); }.
To match the above, one needs to establish that the conditions on different edges are equivalent. No paper I have read so far (and I've read quite a lot by now) discussed this trivial example. So, the structuring problem is "largely solved".
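Here is a minimal sketch of that condition matching on a hypothetical structured AST, where conditions are (op, lhs, rhs) tuples. It relies on three functions:

- edge_cond derives the condition that holds along one edge of a conditional branch.
- canonical normalizes a > b to b < a.
- fold_inverted_loop rewrites if (c) { do { B } while (c'); } as while (c) { B }, but only when c and c' are equivalent.

All names, the tuple encoding, and the register names a2/a3 are illustrative assumptions. Negating a comparison this way is only valid for integer operands, and conditions are assumed to have no side effects.

```python
from dataclasses import dataclass

# Comparison operators and their logical negations (valid for integers only).
NEGATE = {"<": ">=", ">=": "<", ">": "<=", "<=": ">", "==": "!=", "!=": "=="}
# Mirrored operators, used to rewrite a > b as b < a.
MIRROR = {">": "<", ">=": "<="}


def edge_cond(branch, taken):
    """Condition that holds when control leaves a branch along one edge.

    branch is (op, lhs, rhs); taken=False selects the fall-through edge.
    """
    op, lhs, rhs = branch
    return branch if taken else (NEGATE[op], lhs, rhs)


def canonical(cond):
    """Normalize a condition so equivalent spellings compare equal."""
    op, lhs, rhs = cond
    if op in MIRROR:
        return (MIRROR[op], rhs, lhs)
    if op in ("==", "!=") and repr(rhs) < repr(lhs):
        return (op, rhs, lhs)
    return cond


@dataclass(frozen=True)
class If:
    cond: tuple  # condition under which `then` runs (no else branch here)
    then: tuple  # statements


@dataclass(frozen=True)
class DoWhile:
    body: tuple
    cond: tuple  # condition under which the loop repeats


@dataclass(frozen=True)
class While:
    cond: tuple
    body: tuple


def fold_inverted_loop(stmt):
    """Rewrite if (c) { do { B } while (c'); } as while (c) { B } when c == c'."""
    if (isinstance(stmt, If) and len(stmt.then) == 1
            and isinstance(stmt.then[0], DoWhile)
            and canonical(stmt.cond) == canonical(stmt.then[0].cond)):
        return While(stmt.cond, stmt.then[0].body)
    return stmt


# Guard: "if (a2 >= a3) goto exit", falling through into the loop.
guard = edge_cond((">=", "a2", "a3"), taken=False)  # ("<", "a2", "a3")
# Latch: "if (a3 > a2) goto loop", taken edge goes back to the loop head.
latch = edge_cond((">", "a3", "a2"), taken=True)    # (">", "a3", "a2")
tree = If(guard, (DoWhile(("a2 = a2 + 1",), latch),))
print(fold_inverted_loop(tree))
# While(cond=('<', 'a2', 'a3'), body=('a2 = a2 + 1',))
```

The guard and latch test the same thing, but they are spelled differently and sit on edges of opposite polarity. A purely shape-based matcher would therefore miss the fold.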
I have them in PseudoC assembler (which is what any decompiler should use, IMHO), but the originals are in Xtensa assembler, if you care.
I ran:
through dec.step_until(step_decompiled) and I'm not really getting any decompiled code; the output is SSA basic blocks, the same as for dec.step_until(step_decompiled). Does that mean that loops are not supported yet? Note that SSA for non-looping constructs is a trivial matter (as is converting out of SSA). The real complications start with loops. And I wonder how sound your out-of-SSA algorithm is.
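One concrete reason loops complicate out-of-SSA is that the phis at a loop header must take effect simultaneously. Take x_1 = phi(x_0, y_1) and y_1 = phi(y_0, x_1). The back-edge copies x_1 = y_1 and y_1 = x_1 form a parallel copy. Emitting them one after another (often called the "swap problem") gives both variables the same value. Below is a minimal sketch of turning such a parallel copy into a correct sequence of moves, breaking cycles with a temporary. The function name and the new_temp callback are hypothetical.

```python
def sequentialize(copies, new_temp):
    """Turn a parallel copy {dest: src} into an equivalent list of (dest, src) moves.

    new_temp() must return a fresh variable name each time it is called.
    """
    pending = {d: s for d, s in copies.items() if d != s}
    moves = []
    while pending:
        # A destination that no pending copy still reads can be overwritten now.
        ready = [d for d in pending if d not in pending.values()]
        if ready:
            dest = ready[0]
            moves.append((dest, pending.pop(dest)))
        else:
            # Only cycles are left: save one source in a temporary to break one.
            dest, src = next(iter(pending.items()))
            tmp = new_temp()
            moves.append((tmp, src))
            pending[dest] = tmp
    return moves


temps = iter(["t0", "t1", "t2"])
moves = sequentialize({"x_1": "y_1", "y_1": "x_1"}, lambda: next(temps))
for dest, src in moves:
    print(f"{dest} = {src}")
# t0 = y_1
# y_1 = x_1
# x_1 = t0
```

Copies that are not part of a cycle are emitted without any temporary. A temporary is introduced only when every remaining destination is still read by another pending copy.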