Support graphs and graph traversals #46

DerSaidin · 2017-09-27T15:48:34Z

There are many graph structures in static analysis which are useful to preserve in results. For example:

A value flow graph
- Show multiple values contributing to a value
A call graph
- Thanks to the graph structure, we can expand callsites to see more information about their side effects

Several of the existing properties of result could be abstracted and generalized in this manner:

codeFlows property
stacks property
relatedLocations property
Note that all of these have properties in common: location, message, These would be the vertices in the graph. (in the case of stacks, the stackFrame objects are the vertices and the stacks object is providing some of the graph edges/structure).

This would also allow the format to support other information which generally fits into a graph.
Having codeflows and stacks properties show the desire for this generalization/extensibility. What other similar properties will be wanted in the future that are not currently specified?

Each vertex would need some tag to identify what it means (i.e. this vertex is a stackFrame, this vertex is a value flow at an addition) and how vertices are expected to fit together (a stackFrame cannot flow into an addition, these should not appear in the same graph).
Tools doing their own graphs (not specified in SARIF) could still have a graph of vertices with a location and a message and their own meaning.

michaelcfanning · 2017-10-11T14:38:38Z

This is an interesting suggestion that's worth considering. The SATE format has a notion of linking related results that is another example of this kind of relationship. Their solution was simply to provide a universal identifier, allowing results to store lists of UIDs to other items. You're proposing something more expressive, obviously.

Besides the option of annotating a node/entity in the graph ('stack frame'), it's interesting to consider whether we annotate the edges instead, with most nodes simply being a code location. so a 'calls into' edge could point to a common location referenced by other entities.

All of this resurrects an historical question with SARIF, which is whether locations themselves should be stored as a distinct set of things (like we do with the 'files' object) with results and other things referencing items in the set. That would be a disruptive change to the current spec but would also be a support for what you've proposed here.

DerSaidin · 2017-10-11T17:37:52Z

Their solution was simply to provide a universal identifier, allowing results to store lists of UIDs to other items. You're proposing something more expressive, obviously.

More expressive in that it is a graph structure, not a flat list.

Less expressive in that each vertex in the graph can only be one strucure: a location and a message (plus whatever other appropriate field), not a UID referencing anything else in the SARIF file.

The graph structure I have in mind is replacing/generalizing structures within the result object, not allowing edges/references beyond the result object - I think that would be an orthogonal feature.

All of this resurrects an historical question with SARIF, which is whether locations themselves should be stored as a distinct set of things

I would note that (as I understand it) this question is orthogonal to using a graph strucure.

Each vertex in the graph structure could either provide the location directly without refering to a distinct set, or refer to locations in a distinct set.
Each location in the current structure could also provide the location directly or be changed to refer to locations in a distinct set.

AndrewPardoe · 2017-11-05T15:31:13Z

Different code analysis tools look at vertices and edges. If we went down this path would it make sense to be able to express either?

michaelcfanning · 2017-11-08T16:15:26Z

Andrew (@DerSaidin), do you have time to comment on this thread in general? How would a graph improve our ability to express a stack? Imagine a graph that represents a control flow graph, how would we represent a flow through it? Can you provide clarifying examples that show the promise of what you're proposing?

DerSaidin · 2017-11-08T18:56:45Z

Stack Example

void b() {
    zzz1(); //Dependency1
}
void a() {
    b();
    zzz2(); //Dependency2
    c();
    zzz4(); //Dependency4
    e();
}
void c() {
    d();
}
void d() {
    zzz3(); //Dependency3
}
void e() {
    zzz5(); //Dependency5 //Bug location
}

Without a graph

With the current SARIF proposal...

This array shall include every function call in the stack for which the tool has information, and the entries that are present shall occur in chronological order with the most recent (innermost) call first and the least recent (outermost) call last.

... We can have a list of frames going linearly up the stack, starting at Bug ...

frames: [
    { // stackFrame for e()
        locations = [ Dependency5 ]
    },
    { // stackFrame for a()
        locations = [ Dependency4, Dependency2 ]
    },
]

This might be displayed something like...

|==function a()==  (Frame_a)
|  Dependency2
|  Dependency4  

   -> call

    |==function e()==  (Frame_e)
    |  Dependency5

As a graph

With a graph, we can capture the location of all the dependencies accurately (The word "Frame" may be a little misused here, but it is convenient)...

Frame_a -> Frame_b -> Dependency1
        -> Dependency2
        -> Frame_c -> Frame_d -> Dependency3
        -> Dependency4
        -> Frame_e -> Dependency5

Note that the nodes in this graph have different types/tags. Some are tagged as frames, some are tagged as leaf dependencies.

And we can take advantage of more structured to display it with context...

|==function a()==  (Frame_a)
|

   -> call

    |==function b()==  (Frame_b)
    |  Dependency1  

   <- return

|==function a()==  (Frame_a continued)
|  Dependency2  

   -> call

    |==function c()==  (Frame_c)

       -> call

        |==function d==  (Frame_d)
        |  Dependency3  

       <- return

   <- return

|==function a()==  (Frame_a continued)
|  Dependency4  

   -> call

    |==function e()==  (Frame_e)
    |  Dependency5

For example, Dependency3 might be a write to a variable passed in from a().

The graph could also have an edge e.g. from Dependency3 to Dependency5.

This is a key difference between static and dynamic tools - a dynamic callstack is strictly linear, but a static analysis context is not (and not really a "stack").

Graphs of other Things

5.17.10. codeFlows property
5.17.11. stacks property

The codeFlows and stacks are specific cases of graphs.

What if I want to show a callgraph?

A graph structure is generalized, and applicable to other things.

lcartey · 2017-12-07T17:33:56Z

The idea of having graphs associated with the result object is definitely an interesting one. However, I'm not sure that it replaces or generalizes the existing elements. I think the current elements (codeFlows and, to a certain extent, stackFrames) represent paths through some notional graph, rather than representing a graph directly.

As an example, consider a call graph, with the vertices representing functions and edges representing calls. A result could represent a path through that call graph as a codeFlow, as an explanation for that result. That path could even visit the same function vertex in the graph multiple times. However, it is not currently possible to represent the call graph itself in the SARIF file.

@DerSaidin In your example you have what is essentially a call graph, but the structured display you provide is actually a path through that graph. I think that structured display can therefore be encoded as a codeFlow using the current SARIF spec, but the graph itself cannot currently be represented.

I think the question here is whether a graph representation makes sense in addition to the existing path representation. One area I think it could be useful is to help SARIF viewers provide a more sophisticated path display mechanism, in particular when:

The path occurs in a nested graph. It is quite natural to represent program structure as a nested graph reflecting the AST, and using this structure can improve the display of results, for example by collapsing a node to hide the edges within that node.
Multiple paths occur in the same graph. If provided with a unifying graph, a viewer could display the graph with multiple paths overlaid, rather than listing each path individally.

There may also be cases where a graph is useful independently of a path for a result, but I don't have a concrete example.

lcartey · 2017-12-07T17:39:45Z

I have opened a separate issue (#71) related to allowing edge properties on codeFlows.

DerSaidin · 2018-01-10T14:13:07Z

I like the distinction between paths through some a graph, rather than the graph itself. This is true, it is usually paths that are displayed.

If there was graphs in addition to paths, then I think the paths should refer to nodes of a graph. At this point the paths are essentially part of the graph.

I think graph vs path is like vector graphics vs bitmap. The vector graphics has more structured information and details, you can render various bitmaps views from it.

Having vector graphics is good because the structure can be useful - you can turn off some layers, or change colors more easily.
Having bitmaps is good because it is simple to display; you just send pixels strait to the screen, without the complexity of rendering something first.

As always, it is a trade off. Complexity vs flexibility.

michaelcfanning · 2018-01-25T23:05:25Z

As per most recent TC discussion, we are excluding this feature from this SARIF release. Larry will add some additional detail from notes taken during that session.

ghost · 2018-01-25T23:25:50Z

The upshot of our discussion in the last TC meeting was that

The full graph information would be huge.
We didn't want to be in the business of specifying a generic graph structure for the various types of graphs we might want to represent.

ghost · 2018-04-18T22:14:19Z

Closed by 20f0042.

michaelcfanning changed the title ~~Does the result object support graph information?~~ Consider: should the result object support graph information? Nov 3, 2017

michaelcfanning closed this as completed Jan 25, 2018

michaelcfanning reopened this Feb 26, 2018

ghost added enhancement impact-non-breaking-change CSD.1 Will be fixed in CSD.1. labels Mar 5, 2018

ghost assigned michaelcfanning Mar 5, 2018

michaelcfanning changed the title ~~Consider: should the result object support graph information?~~ Provide support for graphs and graph traversals Mar 18, 2018

ghost pushed a commit that referenced this issue Apr 2, 2018

Change draft for #46 (graphs)

63c7d31

ghost pushed a commit that referenced this issue Apr 2, 2018

Push the right change draft for #46 this time.

6245ce8

ghost pushed a commit that referenced this issue Apr 5, 2018

Revised change draft for #46 (graphs): nested traversals

60edafc

ghost pushed a commit that referenced this issue Apr 13, 2018

Update change draft for #46: identity nested traversals by id.

c8f93cb

ghost changed the title ~~Provide support for graphs and graph traversals~~ Support graphs and graph traversals Apr 18, 2018

ghost pushed a commit that referenced this issue Apr 18, 2018

Merge changes for #46 (graphs and graph traversals).

20f0042

ghost closed this as completed Apr 18, 2018

ghost added the resolved-fixed label Apr 18, 2018

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support graphs and graph traversals #46

Support graphs and graph traversals #46

DerSaidin commented Sep 27, 2017

michaelcfanning commented Oct 11, 2017

DerSaidin commented Oct 11, 2017

AndrewPardoe commented Nov 5, 2017

michaelcfanning commented Nov 8, 2017

DerSaidin commented Nov 8, 2017 •

edited

Loading

lcartey commented Dec 7, 2017

lcartey commented Dec 7, 2017

DerSaidin commented Jan 10, 2018

michaelcfanning commented Jan 25, 2018 •

edited

Loading

ghost commented Jan 25, 2018

ghost commented Apr 18, 2018

Support graphs and graph traversals #46

Support graphs and graph traversals #46

Comments

DerSaidin commented Sep 27, 2017

michaelcfanning commented Oct 11, 2017

DerSaidin commented Oct 11, 2017

AndrewPardoe commented Nov 5, 2017

michaelcfanning commented Nov 8, 2017

DerSaidin commented Nov 8, 2017 • edited Loading

Stack Example

Without a graph

As a graph

Graphs of other Things

lcartey commented Dec 7, 2017

lcartey commented Dec 7, 2017

DerSaidin commented Jan 10, 2018

michaelcfanning commented Jan 25, 2018 • edited Loading

ghost commented Jan 25, 2018

ghost commented Apr 18, 2018

DerSaidin commented Nov 8, 2017 •

edited

Loading

michaelcfanning commented Jan 25, 2018 •

edited

Loading