Port fcd to use Remill #51

Open
pgoodman opened this issue Dec 9, 2017 · 6 comments

Comments

@pgoodman

pgoodman commented Dec 9, 2017

This is kind of a long shot, but I think it is possibly worth it. According to your blog, fcd did not use McSema at the time because it was stuck on LLVM 3.5. Since then, McSema has been re-implemented as version 2, which works with every LLVM release from 3.5 through 5.0.

More importantly, though, the actual instruction semantics have been factored out into an independent instruction lifting library called Remill. Remill supports x86 and x86-64 (with the MMX, x87, SSE, and AVX instruction sets), as well as AArch64. It is heavily tested, fairly modular, and will be continually supported by Trail of Bits.

If you're interested in this possibility then please let me know!

@fay59
Owner

fay59 commented Dec 11, 2017

Hi Peter! I'm sort of interested, but I can't really work on this right now. Would you like to try it? At first glance, you probably only need to rip out the contents of translation_context.cpp and put Remill in there.

(With that said, there's some chance that part of the pattern matching fcd does to detect flag computations will stop working.)

@mewmew

mewmew commented Dec 13, 2017

In addition, it would be really cool to port capstone2llvmir to fcd as well. That way, multiple binary-to-LLVM-IR backends could be used and compared. Some may support certain architectures better than others.

From the capstone2llvmir README:

At the moment, the library can translate the following instruction sets:

  • ARM (32-bit + Thumb extension) -- core instruction set.
  • Mips (32/64-bit) -- core instruction set.
  • PowerPC (32/64-bit) -- core instruction set.
  • x86 (16/32/64-bit) -- core instruction set.

@pgoodman
Author

You can track the progress of this work here: https://github.com/trailofbits/fcd

@cryslith

@pgoodman Out of curiosity, why was that repo archived?

@pgoodman
Author

@cryslith We were mostly successful in moving away from Capstone and to using Remill. Ultimately, we discovered that the control-flow restructuring algorithms in fcd worked in specific situations but broke when applied to a wide variety of "weird" code. To solve these problems, we needed to bring a solver into the mix. Further, fcd implements its own AST data structures, which were incomplete or insufficient for several tasks. We realized that a far more useful tool would generate Clang ASTs. Thus, we started the Rellic project, with the narrower focus of being the best system for converting LLVM IR back into C (via Clang ASTs). Beyond this, we can plug in other tools, e.g. Anvill, to get from machine code to "nice" LLVM bitcode (given a specification that is similar in spirit to a function prototype).

@mewmew

mewmew commented Nov 12, 2019

Anvill, to get machine code to "nice" LLVM bitcode (given a specification that is similar in spirit to a function prototype).

Very happy to see Anvill taking shape. Its aim is essentially what I've always envisioned wanting from a binary lifter to LLVM IR.

Rellic is also cool, with its pattern-independent control flow recovery algorithms.


4 participants