Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loop decompilation not supported? #11

Closed
pfalcon opened this issue May 18, 2015 · 10 comments
Closed

Loop decompilation not supported? #11

pfalcon opened this issue May 18, 2015 · 10 comments

Comments

@pfalcon
Copy link

pfalcon commented May 18, 2015

I ran :

          i = 0;
    100:  if (i >= 100) goto 400;
          i = i + 1;
          goto 100;
    400:  return i;

thru dec.step_until(step_decompiled), and not really getting any decompiled code, output is SSA basic blocks, the same as for dec.step_until(step_decompiled).

Does that mean that loops are not supported yet? Note that SSA of non-looping constructs is trivial matter (like converting out of SSA). The real complications start with loops. And I wonder how sound is your out-of-SSA algorithm.

@pfalcon
Copy link
Author

pfalcon commented May 18, 2015

Ok, I see ;-) :

  def remove_ssa_form(self):
    """ transform the flow out of ssa form. """

    if self.has_theta_expressions():
      return #raise RuntimeError('not yet implemented')

@EiNSTeiN-
Copy link
Owner

Yup, the "out of ssa" code is not written yet :)

@EiNSTeiN-
Copy link
Owner

I implemented a simple ssa back transform with copy assignments in bc990c0. It does not do variable coalescing yet but I will attempt to implement that at some point.

@pfalcon
Copy link
Author

pfalcon commented May 25, 2015

Nice, looking forward to find bugs in it ;-).

Btw, the title of thsi ticket was "loop decompilation", I'm not sure if it's helpful reminded or not that it's still not implemented yet. Btw, I experimented with loop recognition and trivial case are, well, trivial. Slightly less trivial cases are very well possible. (And real real-world cases are expectedly hard.)

@EiNSTeiN-
Copy link
Owner

Ah, right. Arranging loops into higher level while() and for() in the decompiled output is not implemented yet. Its one of the next things in my todo list. I guess I'll reopen this.

@EiNSTeiN- EiNSTeiN- reopened this May 25, 2015
@EiNSTeiN-
Copy link
Owner

I rewrote the entire control flow "reconstruction" code in the last week. The old way of combining blocks was just not working, this new way will evolve a lot better I think.

Btw, I experimented with loop recognition and trivial case are, well, trivial. Slightly less trivial cases are very well possible. (And real real-world cases are expectedly hard.)

I'd be interested in your more complex use cases. There are very few complex cases in the tests currently, I would love examples where my code just fails, or the decompilation is incorrect.

@pfalcon
Copy link
Author

pfalcon commented Jun 16, 2015

Thanks for the updates!

I'd be interested in your more complex use cases.

So, that all bumps into #9. How to pass them to you, in what form they should be? Actually, I had to scratch my head how to run even the example in the description of this ticket - you codebase doesn't have a utility which can be fed with an IR file. In the end I dug out my older test with inline IR, then scratched my had what PYTHONPATH to run it with (made a runner shell script this type).

And in the end what I get is:

$ ./run.sh ir_test.py 
more than one group
Traceback (most recent call last):
  File "ir_test.py", line 21, in <module>
    dec.step_until(step_decompiled)
  File "/mnt/hdd/projects-3rdparty/RevEng/Decompilers/PYTHON/decompiler/src/decompiler.py", line 413, in step_until
    for step in self.step_generator:
  File "/mnt/hdd/projects-3rdparty/RevEng/Decompilers/PYTHON/decompiler/src/decompiler.py", line 509, in steps
    self.ssa_tagger.remove_ssa_form()
  File "/mnt/hdd/projects-3rdparty/RevEng/Decompilers/PYTHON/decompiler/src/ssa.py", line 372, in remove_ssa_form
    t.transform()
  File "/mnt/hdd/projects-3rdparty/RevEng/Decompilers/PYTHON/decompiler/src/ssa.py", line 565, in transform
    self.rename_groups(phi, groups)
  File "/mnt/hdd/projects-3rdparty/RevEng/Decompilers/PYTHON/decompiler/src/ssa.py", line 557, in rename_groups
    raise 'not implemented'
TypeError: exceptions must be old-style classes or derived from BaseException, not str

I.e. it doesn't even use modern Python raise ;-).

@EiNSTeiN-
Copy link
Owner

How to pass them to you, in what form they should be?

What about assembly?

And in the end what I get is:

Ah, that example is hitting one of the cases for which I didn't have a test yet. Thanks for reminding me about it, I will work on it.

I was asking the question because I want to know if you actually have a set of use-cases of complex loops that you test other decompilers with?

@pfalcon
Copy link
Author

pfalcon commented Jun 16, 2015

Anyway, answering your question, you can hopefully find a lot of inspiration in decomp/decomp#172 . As you can see, I was so excited by Van Emmerik's claims that "structuring problem is largely solved" and quoting Cifuentes, that I didn't get lazy to dig out her code, 16-bit compiler, and tested it. Then I tested Van Emmerik's own stuff. What I saw is terrible things ;-).

That ticket also quotes https://github.com/pfalcon/ScratchABlock , which is thing I started as a "response" to your comments on #9. The human-readable/writable, externalized IR is the core of the project (note that for structuring, only control transfer instructions are required, so the rest of syntax is just literal so far ;-) ), a utility to read that format, apply a transformation, and render result as a basic block list and a .dot file is the central user interface, and tests are written not with self.assertsWhichWerePutInPythonAsAJavaDiversion(), but by running aforementioned utility and then diffing results - i.e., everything is highly inviting for human being to come by and hack. I actually hacked that up in 2-3 days after our discussion in #9, now leisurely refactor/clean up and spool to github.

Finally, the ticket above also contains an example of non-trivial, but still human-graspable CFG. My code can structure it a bit, and I already have understanding what the other parts are, and how to structure them, though I'm not exactly sure the result would be close to the original code. The whole title of that ticket says that just shape of (sub)CFG is not enough to properly recover structure, at the very least, also analyzing and collating graphs based on branch condition codes is required, possible in non-trivial way. Good example of that is while loop. All folks represent it based on assumption that it starts with unconditional jump into loop header. But everyone who learned assembler first, high-level language then, knows that while loop is not that. While is if followed by do while, i.e.

if (c) {
  do {
    ...
  } while (c);
}

|
v

while (c) {
  ...
}

To match the above, one needs to match equivalence of conditions on different edges.

No paper I read so far (and I've read quite a lot by now) discussed the above trivial example. So, structuring problem is largely solved.

@pfalcon
Copy link
Author

pfalcon commented Jun 16, 2015

What about assembly?

I have them in PseudoC assembler (which is what any decompiler should use IMHO). But original is in Xtensa assembler if you care.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants