
Adds frida script for gathering code coverage #17

Merged
merged 11 commits into gaasedelen:develop from yrp604:frida on Oct 24, 2017

Conversation


@yrp604 yrp604 commented Oct 20, 2017

Implements a frida script that gathers code coverage information and
saves it using drcov format.

Probably should be considered experimental, as it's the first thing I've
written with frida and I'm sure I've screwed something up =p

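For context, a drcov log is a small text header (version, flavor, and a module table) followed by a packed array of basic-block records. Below is a hedged Python sketch of that layout, not the PR's actual writer; the module-table columns vary between drcov versions, while the 8-byte basic-block record is the part consumers like Lighthouse key on:

```python
import struct

def make_drcov(modules, blocks):
    """Build a drcov-v2-style coverage log as bytes (illustrative sketch).

    modules: list of (base, end, path) tuples
    blocks:  list of (module_relative_start, size, module_id) tuples
    """
    lines = [
        "DRCOV VERSION: 2",
        "DRCOV FLAVOR: frida",
        "Module Table: version 2, count %d" % len(modules),
        "Columns: id, base, end, entry, checksum, timestamp, path",
    ]
    for mod_id, (base, end, path) in enumerate(modules):
        lines.append("%2d, %#018x, %#018x, 0x0, 0x0, 0x0, %s"
                     % (mod_id, base, end, path))
    lines.append("BB Table: %d bbs" % len(blocks))
    header = ("\n".join(lines) + "\n").encode("utf-8")
    # Each basic-block record: u32 module-relative start, u16 size, u16 module id.
    bb_table = b"".join(struct.pack("<IHH", start, size, mod_id)
                        for start, size, mod_id in blocks)
    return header + bb_table
```

The binary block table (rather than text) is what keeps drcov logs compact even for long traces.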

yrp604 commented Oct 20, 2017

Of course, right after submitting this I figured out some other stuff to add/change. Will try to get it updated tonight.

This allows frida to filter inside the target by thread id. This is
probably only useful if you have other introspection into the process to
see which thread you're interested in. However, on larger targets, if
you can use this it improves results significantly.
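On the host side, thread filtering amounts to dropping coverage events whose thread id isn't in a whitelist before they're uniqued. A minimal sketch; the event shape here is a hypothetical illustration, not the script's real message format:

```python
def filter_by_thread(events, wanted_tids=None):
    """Keep only coverage events from threads of interest.

    events:      iterable of dicts like {"tid": 1234, "blocks": [...]}
    wanted_tids: iterable of thread ids to keep; None/empty keeps all
    """
    if not wanted_tids:
        return list(events)
    wanted = set(wanted_tids)
    return [ev for ev in events if ev["tid"] in wanted]
```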

yrp604 commented Oct 21, 2017

Going to create a new PR with additional features, maybe better perf.

@yrp604 yrp604 closed this Oct 21, 2017
@yrp604 yrp604 reopened this Oct 21, 2017

yrp604 commented Oct 21, 2017

I'm bad at git.

I started making some perf fixes before realizing some stuff about
frida.

So, the good news: these perf fixes do in fact speed up the client by
~10% or so.
The bad news: this tracer is still two orders of magnitude slower than
native execution. From memory, DR is approximately 33% slower than
native, and pin is ~50%. So this is quite a bit worse. Intuitively, this
makes sense, as all coverage data has to be IPC'd out to the `recv()`ing
process. There are ways this could be improved:
* Cache the bbs in frida address space and only IPC them out on client
detach. However, if the client `exit()`s or crashes, all coverage data
would be lost.
* Reduce allocations and copies inside the frida script. This is
complicated by JavaScript's lack of native u64s. I don't really know
javascript well enough to effectively optimize this.
* The one that's probably a good idea: right now the javascript creates
the drcov objects and sends them to the python, which is just
responsible for uniquing them. If IPC is the big bottleneck, uniquing
them on the javascript side would reduce that and in theory improve
performance. Some initial tests of this were not successful, so I'm
going to have to play around with this more.
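The producer-side uniquing from that last bullet can be sketched like this (in Python for illustration; in the actual script it would live in the JavaScript, before the events are sent):

```python
def unique_blocks(raw_blocks):
    """Dedupe basic-block records before they cross the IPC boundary.

    raw_blocks: iterable of (start, size, mod_id) tuples, possibly with
    many repeats. Returns the unique records in first-seen order, which
    is all the drcov output needs.
    """
    seen = set()
    out = []
    for bb in raw_blocks:
        if bb not in seen:
            seen.add(bb)
            out.append(bb)
    return out
```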

For what it's worth, frida offers two modes for tracing at the
granularity we care about: `compile` and `block`. `compile` will fire an
event the first time a bb is seen, whereas `block` will fire an event
every time a bb is seen. Switching to `compile` improves perf by an
order of magnitude; however, any blocks that were hit before you attach
and that you re-execute will not show up in your trace. It's pretty easy to
toggle between them, and if people think making this an option has
value, I'm not opposed. My thoughts were that in either case it's still
really slow in comparison, so we might as well be slow, accurate, and
easy to use.

So yeah, perf improvements, but they don't really matter \o/
We'll use the standard hackertext indicators, because this is a hacker
tool.
gaasedelen (Owner) commented

Frida has been picking up traction lately, so this is an awesome contribution.

I'll test and review this soon :-)

@gaasedelen gaasedelen added this to the v0.7.0 milestone Oct 21, 2017
@gaasedelen gaasedelen self-assigned this Oct 21, 2017

oleavr commented Oct 21, 2017

This looks like an awesome use-case for Stalker, great job! 👍

Just a few notes on the Frida-specific bits:

> It is roughly two orders of magnitude slower than native execution

After some optimizations a few months back, the code generated by Stalker should only be a little bit slower. I wrote a benchmark that measured it on LZMA compression, and the ratio is currently somewhere between 1.x and 2.x slowdown. This is, however, after it has "warmed up": all the basic blocks are cached, the back-patching optimizations have been applied, and the inline caches are warm. It is careful not to re-use blocks in case of self-modifying code, though, so you pay performance overhead every time it context-switches into the runtime to look up the target of a branch, followed by checking if the original code changed since it was compiled. Stalker achieves high performance by back-patching branches and updating inline caches once they're considered stable. You can configure the Stalker.trustThreshold property: 0 if you know that the code isn't self-modifying, -1 to be paranoid (never back-patch or use inline caches), or N to assume code is stable after it has been executed N times.

> It drops coverage, especially near exit()

The recommended way to deal with this is to hook exit() and friends, and send a "flush" message to the other side and wait for an acknowledgment before returning from the hook. This is implemented in frida-trace here – feel free to lift it.
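The flush-and-ack ordering described above can be walked through in a short sketch. Real code would hook exit() with Frida's Interceptor and use the send()/recv() message channel; here, plain queues stand in for that channel (names and message shapes are illustrative assumptions):

```python
from queue import Queue

def run_flush_handshake(pending):
    """Single-threaded walk-through of the flush-and-ack ordering that
    keeps buffered coverage from being lost at process teardown."""
    to_host, to_target = Queue(), Queue()
    saved = []

    # Target side (inside the exit() hook): request a flush.
    to_host.put({"type": "flush", "blocks": list(pending)})

    # Host side: persist the flushed blocks, then acknowledge.
    msg = to_host.get()
    saved.extend(msg["blocks"])        # in reality: written out as drcov
    to_target.put({"type": "flush-ack"})

    # Target side: only after the ack does the hook let exit() proceed.
    ack = to_target.get()
    assert ack["type"] == "flush-ack"
    return saved
```

The key point is that the hooked exit() must block until the host confirms the data is persisted; returning early reintroduces the dropped-coverage window.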

> It cannot easily detect new threads being created, thus cannot instrument them

This will require additional hooks, per OS. Each thread will be recompiling the code, though, so the warm-up cost isn't shared between all of them. (Though this won't matter if the threads happen to execute wildly different code – so this really depends on the application.)
The reason Stalker doesn't share code across the threads is so each may have different instrumentation applied through the StalkerTransformer API. (Allowing you to add/remove/replace instructions on a per-thread basis.)

> Self modifying code will confuse it

Stalker has been designed with this in mind, but it depends on how you configure Stalker.trustThreshold.

> function make_maps() {

Did you run into issues with the ModuleMap API? It supports providing a filter function if you only care about certain modules and want faster lookups.
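To illustrate what ModuleMap buys over hand-rolled bookkeeping, here is a toy Python version of the address-to-module lookup. This is illustrative only: the real ModuleMap lives on the JavaScript side and also accepts a filter callback, and the names here are assumptions:

```python
import bisect

class ModuleRangeMap:
    """Toy stand-in for the lookup Frida's ModuleMap provides: map an
    absolute address to (module index, module-relative offset)."""

    def __init__(self, modules):
        # modules: list of (base, end, path); assumed non-overlapping
        self.modules = sorted(modules)
        self.bases = [m[0] for m in self.modules]

    def find(self, addr):
        # Binary-search for the last module whose base is <= addr,
        # then confirm the address falls inside that module's range.
        i = bisect.bisect_right(self.bases, addr) - 1
        if i >= 0:
            base, end, _path = self.modules[i]
            if base <= addr < end:
                return i, addr - base
        return None                    # address not in any known module
```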

> // It would be really nice to use 'compile' here instead of 'block',

You can safely use 'compile', Stalker only compiles code that it's about to execute, and it only compiles one basic block at a time. The compilation happens lazily as it's resolving each branch for the first time – the branch at the end of each basic block is replaced by a context-switch into Stalker's runtime. And after a while this may be backpatched to go directly to the generated code, depending on how you configure Stalker.trustThreshold.


yrp604 commented Oct 21, 2017

Awesome, thanks @oleavr, that is insanely helpful.

I've implemented the non-OS specific changes, and it results in a roughly 60x speedup on my little toy benchmark. My guess is the remaining latency that's being introduced is due to me streaming the events as opposed to caching and flushing. One of the potential use cases for this might be tracing applications that crash, in which case streaming events should be a bit better, albeit at the cost of speed and with the same coverage dropping near termination as exit().

For the OS-specific things, I'm not really able to test them on all the platforms Frida supports, so I'm inclined to leave it OS agnostic and keep it as a general limitation, as it's not too onerous and the script is still quite useful without them.

Thanks again, that's exactly the type of feedback I was hoping for.

Incorporate @oleavr's feedback.

Three changes:
* Instead of instrumenting on 'block' events, we now instrument on
'compile' events. This dramatically improves performance. Since we're
only doing block coverage anyway, and since all the blocks get uniqued,
we lose nothing from this change.
* Set the Stalker trust threshold to 0. This means we're completely
punting on self-modifying code in favor of speed, but IDA/Lighthouse
can't visualize self-modifying code anyway, so again, we lose nothing.
* Use frida's ModuleMap API instead of making a worse version ourselves.
The first time reading the docs I misunderstood this API, but @oleavr
helpfully cleared it up!
Just a bit cleaner; missed a reference when I was editing it earlier.
And delete a pointless variable.
Add a few more comments, change a few names so when I have to edit it in
three years I hate myself a little less.
So one of the big benefits of frida is non-local tracing. We should
allow the user to do that.
gaasedelen (Owner) commented

Wow, well we couldn't have gotten better eyes on it than that! Thanks for taking a look @oleavr :-)

@gaasedelen gaasedelen merged commit 1aea26d into gaasedelen:develop Oct 24, 2017
@yrp604 yrp604 deleted the frida branch January 4, 2018 05:44