Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature proposal: graph-based code coverage representation #103

Open
Zherya opened this issue Jun 9, 2021 · 2 comments
Open

Feature proposal: graph-based code coverage representation #103

Zherya opened this issue Jun 9, 2021 · 2 comments

Comments

@Zherya
Copy link

Zherya commented Jun 9, 2021

Hi there,

We are going to implement graph-based code coverage representation as a feature of LightHouse.

Problem that we solved

Now LightHouse doesn't have any features to find most important uncovered functions in the code. We need a convenient way to analyze code coverage as a system of functions.

Solution

We propose to use call graph-based representation of all reachable code from specified by the user function. After analysis of common graph representation formats we selected GraphML format because it is powerful enough to represent all that we want. It will look like (used Gephi as a visualizer):

GraphML_example

On the graph: a vertex is a function, a vertex color represents percent of coverage (green - covered, red - not, and so on). Size of a vertex represents size of a function, cyclomatic complexity of a function can be represented, for example, by the amount of vertices of polygon. Also, to each function we can attach additional information, like amount of memory references, using custom attributes. It can be useful for automatic processing of a result graph.

First we are going to implement IDA Pro support only because we don't use Binary Ninja at all.

Could it be useful for community? We need your opinion/feedback.

In case of any questions, feel free to ask.

Dmitry Zheregelya,
Advanced Software Technology Laboratory,
Huawei

@gaasedelen
Copy link
Owner

gaasedelen commented Jun 10, 2021

I think that this is a good idea, visualizations can be very powerful for scenarios like this.

Some thoughts / considerations:

We propose to use call graph-based representation of all reachable code from specified by the user function.

In my experience, call graph visualizations can get overwhelming... fast.

You might want to consider an implementation similar to IDA's proximity view that allows you to explode out nodes (functions) to further explore the graph / callees. In this case, you could even represent the aggregated 'complexity' of a callee's underlying callgraph before expanding it out to see the individual parts.

Experiment with a few approaches and see what makes sense :-)

On the graph: a vertex is a function, a vertex color represents percent of coverage (green - covered, red - not, and so on). Size of a vertex represents size of a function, cyclomatic complexity of a function can be represented, for example, by the amount of vertices of polygon.

I think these are great ideas.

Also, to each function we can attach additional information, like amount of memory references, using custom attributes.

Just something to note for your implementation/changes -- Lighthouse does not collect memory references or call edges (i.e. foo() --calls--> bar()). You should look at how it collects the basic-block edges and possibly copy or re-work some of that code to capture the edges you need to build a call graph.

First we are going to implement IDA Pro support only because we don't use Binary Ninja at all.

Sure, don't worry about it. Based on what you have suggested, it seems like it would be pretty easy for me to port over any necessary changes to the binary ninja side.

Could it be useful for community? We need your opinion/feedback.

It's definitely something that Lighthouse would benefit from.

I know a few users have written a few 'lighthouse scripts' to locate the most complex 'frontier nodes' / sub-graphs for their coverage sets, and this sounds like a great solution to help users discover these types of cases (and more) visually.

If there are any other specific questions about the codebase or integration, I'll do my best to answer them as you proceed. Otherwise, it sounds like most of the work will simply be on the visualization / interaction side. I don't think this should require many 'core' changes to Lighthouse.

@Zherya
Copy link
Author

Zherya commented Jun 10, 2021

@gaasedelen , thanks for your positive feedback!

And, unfortunately, I suppose, I haven't made myself clear, because we are not going to implement graph visualization by ourselves.

Although graph visualization right in disassembler might be easy to use and requires no additional actions from user, we believe it doesn't offer as much flexibility as we want.

What we are going to do is to implement a functionality that will export required graph in GraphML format (just as a simple text file). That exported file then can be visualized (and analyzed automatically, if you need) in any software that you want: for example, as we already mentioned, Gephi, or, maybe, Wolfram Mathematica or Maple.

We believe that it is more powerful in case of visualization than doing it right in disassembler. And it saves our time as we don't need to spend much time on visualization side :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants