Non-drcov format support #41
Hey so, we've talked about this a bit but I wanted to document why lighthouse supporting other formats would be nice while it was still fresh.
So, drcov is useful in that there are easy, cross-platform tools to generate it, however it has some pretty significant shortcomings which I'm running into. Specifically, drcov is made up of a header which gives the module map and then a series of tuples (module id, bb offset, bb size). The main issue here is the bb size field. If you're generating a trace with something that is aware of bb sizes (e.g. a DBI), this is all cool, however if you're dumping a trace from something that is not bb aware (e.g. an emulator, or collecting code coverage via sampling) you just have a list of PC values.
Assuming you have a module map and a list of PC values there are a few things you could do:
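To make the bb size problem concrete, here is a minimal sketch of a single drcov basic block record. It assumes the 8-byte binary layout (32-bit start offset, 16-bit size, 16-bit module id) that I believe recent DynamoRIO drcov versions use; `BB_ENTRY`, `pack_bb` and `unpack_bb` are hypothetical names for illustration, not part of lighthouse or DynamoRIO.

```python
import struct

# Assumed drcov bb record layout: uint32 start offset, uint16 size,
# uint16 module id, little-endian. This is an assumption, not taken
# from this thread.
BB_ENTRY = struct.Struct("<IHH")


def pack_bb(mod_id: int, offset: int, size: int) -> bytes:
    """Pack one (module id, bb offset, bb size) tuple into a record."""
    return BB_ENTRY.pack(offset, size, mod_id)


def unpack_bb(record: bytes) -> tuple:
    """Unpack a record back into (module id, bb offset, bb size)."""
    offset, size, mod_id = BB_ENTRY.unpack(record)
    return mod_id, offset, size
```

Every record needs a size, and a plain list of PC values gives you no natural value to put in that field.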
Basically both these require pre-processing the coverage in IDA before loading, which is doable but is a pain in the ass.
So, I'm pretty agnostic with regards to what the actual format is, but the feature request is the ability to load any coverage data format which can be generated from the module mappings and a list of PC values.
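As a rough sketch of what "generated from the module mappings and a list of PC values" could mean in practice, the snippet below resolves absolute PC values to (module name, offset) pairs. The `Module` type and `resolve_pcs` function are hypothetical names, and it assumes module ranges do not overlap.

```python
from bisect import bisect_right
from typing import Iterable, List, NamedTuple, Tuple


class Module(NamedTuple):
    """One entry of a module map (hypothetical structure)."""
    name: str
    base: int
    size: int


def resolve_pcs(modules: Iterable[Module], pcs: Iterable[int]) -> List[Tuple[str, int]]:
    """Map absolute PC values to (module name, offset from module base).

    PCs that fall outside every known module are dropped.
    """
    ordered = sorted(modules, key=lambda m: m.base)
    bases = [m.base for m in ordered]
    resolved = []
    for pc in pcs:
        # Find the last module whose base is <= pc.
        idx = bisect_right(bases, pc) - 1
        if idx < 0:
            continue
        mod = ordered[idx]
        if pc < mod.base + mod.size:
            resolved.append((mod.name, pc - mod.base))
    return resolved
```

Nothing here needs basic block sizes, which is the point: the module map and the PC list are enough to produce module-relative addresses.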
You are not the first to ask for the ability to load instruction traces, and I think that it is a reasonable request. It is trivial to implement for the 'perfect' trace, but gets a bit murky for the general case.
I haven't sat down to enumerate the considerations we have to be mindful of for adding this feature. This is a good opportunity to do so, and I welcome any of your thoughts.
0 - usability
Like loading from drcov files, it is important that loading a trace will just work without spamming users with dialogs and asking for their 'help' to identify relevant modules or to enter a base address (e.g. ASLR) for the module of interest. This is a shitty user experience which can become both tedious and annoying, especially when it's avoidable.
I also don't want menu options to choose between 10 different trace formats to load from. Novice users might not know what trace format they collected, or what to try to load it as.
1 - existing traces
There is no 'standardized' trace format, making the contents ambiguous.
2 - custom traces
We could force lighthouse to only load custom traces with additional metadata (sort of like drcov) which can address all of the ambiguity stated above.
The problem with this approach is that we immediately lose support for existing tracing solutions such as the in-box PIN tool by Intel, a DynamoRIO equivalent, or virtually any other tracing technology out there.
3 - expectations vs reality
Right now, lighthouse does not paint 'coverage' or compute statistics for code that falls outside of defined functions. This is what I call 'unmapped coverage' which is entirely invisible to users at the present moment, and a skeleton in the lighthouse closet.
This is relevant to this issue, because instruction traces are more frequently used to capture abnormal execution (e.g. malware). In these cases, it will be more common for traces to 'collect coverage' on code or instructions that are not within defined functions (e.g. obfuscated code). Even if you could capture an instruction trace, lighthouse might not be showing you everything that is getting executed.
The biggest reason for this shortcoming is simply because lighthouse metadata aggregation works at a function level. We do not iterate over individual instructions outside of defined functions. Integrating this change will probably require some degree of re-architecting the metadata collection and storage process which I have not investigated.
These are just some immediate thoughts, I may add more later.
So after thinking about this and looking at my existing toolchain, I think the best option is a textfile with
E.g. we have
If we have
mod+off I think is the right way to go because it's pretty much the simplest format that's still usable, has the fewest assumptions, etc.
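For illustration, a minimal sketch of writing and reading such a file. It assumes one entry per line of the form `module+offset` with the offset in hex; that exact layout, and the names `format_mod_off` and `parse_mod_off`, are assumptions made for the example rather than what lighthouse necessarily implements.

```python
from typing import Tuple


def format_mod_off(name: str, offset: int) -> str:
    """Render one entry as 'module+hexoffset' (assumed line format)."""
    return f"{name}+{offset:x}"


def parse_mod_off(line: str) -> Tuple[str, int]:
    """Parse 'module+hexoffset' back into (module name, offset).

    Splits on the last '+' so module names containing '+' still work.
    """
    name, sep, offset = line.strip().rpartition("+")
    if not sep or not name:
        raise ValueError(f"not a mod+off entry: {line!r}")
    return name, int(offset, 16)
```

Because each line only names a module and an offset, the loader needs no base addresses and no bb sizes from the trace.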
There is now a rough implementation of mod+off available on the develop branch (v0.8.4-DEV and newer). Simply try loading a mod+off file through the normal workflow.
Let me know if you experience any immediate issues or oddities, I'll be cleaning up the code soon.