Feature proposal: graph-based code coverage representation #103

Zherya · 2021-06-09T12:12:02Z

Hi there,

We are going to implement graph-based code coverage representation as a feature of LightHouse.

Problem that we solved

Now LightHouse doesn't have any features to find most important uncovered functions in the code. We need a convenient way to analyze code coverage as a system of functions.

Solution

We propose to use call graph-based representation of all reachable code from specified by the user function. After analysis of common graph representation formats we selected GraphML format because it is powerful enough to represent all that we want. It will look like (used Gephi as a visualizer):

On the graph: a vertex is a function, a vertex color represents percent of coverage (green - covered, red - not, and so on). Size of a vertex represents size of a function, cyclomatic complexity of a function can be represented, for example, by the amount of vertices of polygon. Also, to each function we can attach additional information, like amount of memory references, using custom attributes. It can be useful for automatic processing of a result graph.

First we are going to implement IDA Pro support only because we don't use Binary Ninja at all.

Could it be useful for community? We need your opinion/feedback.

In case of any questions, feel free to ask.

Dmitry Zheregelya,
Advanced Software Technology Laboratory,
Huawei

gaasedelen · 2021-06-10T00:33:15Z

I think that this is a good idea, visualizations can be very powerful for scenarios like this.

Some thoughts / considerations:

We propose to use call graph-based representation of all reachable code from specified by the user function.

In my experience, call graph visualizations can get overwhelming... fast.

You might want to consider an implementation similar to IDA's proximity view that allows you to explode out nodes (functions) to further explore the graph / callees. In this case, you could even represent the aggregated 'complexity' of a callee's underlying callgraph before expanding it out to see the individual parts.

Experiment with a few approaches and see what makes sense :-)

On the graph: a vertex is a function, a vertex color represents percent of coverage (green - covered, red - not, and so on). Size of a vertex represents size of a function, cyclomatic complexity of a function can be represented, for example, by the amount of vertices of polygon.

I think these are great ideas.

Also, to each function we can attach additional information, like amount of memory references, using custom attributes.

Just something to note for your implementation/changes -- Lighthouse does not collect memory references or call edges (i.e. foo() --calls--> bar()). You should look at how it collects the basic-block edges and possibly copy or re-work some of that code to capture the edges you need to build a call graph.

First we are going to implement IDA Pro support only because we don't use Binary Ninja at all.

Sure, don't worry about it. Based on what you have suggested, it seems like it would be pretty easy for me to port over any necessary changes to the binary ninja side.

Could it be useful for community? We need your opinion/feedback.

It's definitely something that Lighthouse would benefit from.

I know a few users have written a few 'lighthouse scripts' to locate the most complex 'frontier nodes' / sub-graphs for their coverage sets, and this sounds like a great solution to help users discover these types of cases (and more) visually.

If there are any other specific questions about the codebase or integration, I'll do my best to answer them as you proceed. Otherwise, it sounds like most of the work will simply be on the visualization / interaction side. I don't think this should require many 'core' changes to Lighthouse.

Zherya · 2021-06-10T08:03:00Z

@gaasedelen , thanks for your positive feedback!

And, unfortunately, I suppose, I haven't made myself clear, because we are not going to implement graph visualization by ourselves.

Although graph visualization right in disassembler might be easy to use and requires no additional actions from user, we believe it doesn't offer as much flexibility as we want.

What we are going to do is to implement a functionality that will export required graph in GraphML format (just as a simple text file). That exported file then can be visualized (and analyzed automatically, if you need) in any software that you want: for example, as we already mentioned, Gephi, or, maybe, Wolfram Mathematica or Maple.

We believe that it is more powerful in case of visualization than doing it right in disassembler. And it saves our time as we don't need to spend much time on visualization side :)

gaasedelen added the feature request label Jun 10, 2021

Zherya mentioned this issue Dec 9, 2021

Graph-based code coverage representation #111

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature proposal: graph-based code coverage representation #103

Feature proposal: graph-based code coverage representation #103

Zherya commented Jun 9, 2021

gaasedelen commented Jun 10, 2021 •

edited

Zherya commented Jun 10, 2021

Feature proposal: graph-based code coverage representation #103

Feature proposal: graph-based code coverage representation #103

Comments

Zherya commented Jun 9, 2021

gaasedelen commented Jun 10, 2021 • edited

Zherya commented Jun 10, 2021

gaasedelen commented Jun 10, 2021 •

edited