Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch function traversal from "breadth first" to "depth first" #14

Open
pfalcon opened this issue Nov 19, 2016 · 2 comments
Open

Switch function traversal from "breadth first" to "depth first" #14

pfalcon opened this issue Nov 19, 2016 · 2 comments

Comments

@pfalcon
Copy link
Owner

pfalcon commented Nov 19, 2016

pfalcon/ida-xtensa2#1 represents an issue with the current code traversal process of SAB: it tries to follow all code paths of the current function, before considering calls the current function makes. Unfortunately, some calls are effectively jumps and never return. Trying to decode bytes after such a call may lead to conflicts in byte types, which may affect decoding of other functions following such instructions. So, instead, should try to emulate how a real CPU does - do a depth-first traversal of as many functions as possible first, and consider return addresses from calls only after that. This should allow to detect natural stop-gaps for unconstrained and incorrect code propagation. E.g. if following a call instruction there's a literal data belonging to another function, then it doesn't make sense to try to interpret this data as an instruction.

This approach will however have problems with function boundary detection (such detection was the reason to adopt breadth-first scheme). So, it will need to be reworked.

@mewmew
Copy link

mewmew commented Apr 4, 2017

Related to function boundary detection, some inspiration may be sought from rev.ng

Talk at https://youtu.be/5CbuU4KwBCE

From slides at http://llvm.org/devmtg/2016-11/Slides/DiFederico-rev.ng.pdf

The function detection process

  1. Identify function calls and return instructions
  2. Create a set of candidate function entry points (CFEP):
    1. called basic blocks
    2. unused code pointers in global data (e.g., not jump tables)
    3. code pointers embedded in the code
  3. Compute the basic blocks reachable from each CFEP
  4. Keep a CFEP only if:
    1. it’s a called basic block, or
    2. it’s reached by a skipping jump instruction

Skipping jump instruction in this case refers to a jump instruction (think tail call), which jumps over the basic blocks of another function.

@pfalcon
Copy link
Owner Author

pfalcon commented Apr 5, 2017

@mewmew , Thanks, I noticed rev.ng, though was surprised (not really) that instead of being "suite of tools for static binary analysis", it's currently just a binary translator. Didn't see those slides, looked thru. Well... I saw a similar paper recently (https://syssec.mistakenot.net/papers/eurosp-2017.pdf), that one caught my attention with a title "Compiler-Agnostic Function Detection in Binaries". "Compiler-agnostic, really? How else could it be?"

So well, both that paper and rev.ng's try to deal with detection of all functions in a binary. Well, good (luck). That's something completely different than what ScratchABit does - it's purposed is to be a dumb tool, and detect only what guaranteedly can be detected. Beyond that, everything is offloaded to a human (which can do further analysis manually, or using heuristic plugins).

The topic of this ticket is how to do that automatic detection in a better way. Depth-first was implemented btw, I don't close this ticket because I'd like to review if there're some TODOs still left.

On related note, I recently did switch basic blocks in ScratchABlocks to end on a call (pfalcon/ScratchABlock@1446a83), following the idea that in machine code, a call is much closer to a jump, than to a mathematical definition of function. That's of course how even Cifuentes suggested to do it, I just still find that other people like the idea that call == function too, e.g. http://x64dbg.com/blog/2016/07/27/Control-flow-graph.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants