Consider adding codeflow.state to capture initial execution state for things like static variables #168

michaelcfanning · 2018-05-09T16:47:20Z

No description provided.

michaelcfanning · 2018-05-31T20:43:08Z

@lgolding, thinking this over, this data is very specific for sure to dynamic analysis tools and not static.
I think we can close this in any case. You can think of this information as 'state' that is provided to a code flow, and so a dynamic analysis results author could populate that property with this class of information.

do you agree?

ghost · 2018-05-31T20:57:30Z

@michaelcfanning I agree that it is dynamic-specific, and for that reason I would agree to relabel this from CSD.2 to future.

At that point, it's moot, but -- I'm not sure that this is "state", if by that, you mean the property codeFlowLocation.state. That property is intended to provide a debugger-like Watch Window experience. You could certainly do this:

{                    # A codeFlowLocation object.
  "state": {
    "Accept": "text/plain",
    ....
  }
}

... but it seems like a bit of an abuse.

If we're not "really" supporting this, IMO the proper place is the result's property bag -- and then the tool can make use of hierarchical property bag property names:

{                    # A result object.
  ...
  "properties": {
    "Header/Accept": "text/plain",
    ...
  }
}

michaelcfanning · 2018-05-31T21:02:44Z

Disagree. Our position is that if we have properties that reasonably can accommodate output from a dynamic analysis tool, then we don't need to require separation for some reason.

What I'm arguing for here is that the forms data, web request details, etc. are state that are associated with a result code flow. This is conceptually no different than a static analysis tool that populates state with some assumptions that are driving the analysis.

What I have in mind in general is populating an unparsed proper, e.g., 'User Agent'., the value of which is the relevant unparsed string. But producers are free to populate as they see fit.

ghost · 2018-05-31T21:45:56Z

What I have in mind in general is populating an unparsed proper[ty?], e.g., 'User Agent'., the value of which is the relevant unparsed string. But producers are free to populate as they see fit.

Then I'm not sure what you're suggesting. You clearly disagree with using result.properties. But I can't tell if you want to use codeFlowLocation.state or something else. You talk about "state ... associated with a result code flow", but as the spec stands, there is no state associated with a codeFlow (or with a threadFlow). The state is associated with a codeFlowLocation, and that seems inappropriate to me, because these request parameters are not location-specific; they'll never change from one location to the next.

michaelcfanning · 2018-06-19T23:13:44Z

How do users specify the 'inaugural' state then? This is a useful question to raise. Consider a code flow that consists of three distinct threads. How should a preliminary value of a shared static variable be specified?

Perhaps we should add a state property somewhere else. Probably at the root of the code flow. For anything thread specific, this can be specified at the first thread flow location?

ghost · 2018-06-20T16:16:15Z

I agree with you that the root of the code flow is the right place for this information. I still don't agree that the thing you're describing (forms data and HTTP headers) is "state," because it never changes. If we're going to have to add a new property anyway, I'd prefer not to give it that name. It's more like "parameters". But I'm not going to lie down in front of the "state" train.

Another option is to use "properties":

{                 # A codeFlow object
  "properties": {
    "webScanner/headers/UserAgent": "Mozilla/5.0 (iPhone; ...",
    "webScanner/forms/Address/Street": "555-55th Street"
  }
}

michaelcfanning · 2018-06-20T16:17:58Z

huh? state that does not change is state. an analysis that starts with an assumption around a variable that is never modified during analysis is a very common scenario.

ghost · 2018-06-20T16:24:09Z

Ok, I yield. We add codeFlow.state, and include non-normative spec language (a "NOTE") explaining how you can use this for headers and forms data.

michaelcfanning · 2018-06-20T16:42:56Z

:)
Actually, I'm not advocating for any spec text on using this property for headers and forms data. That's a specific use for dynamic analysis. Instead, I've been thinking along the lines of our usual strategy in this case: consider how a dynamic analysis tool would sensibly utilize SARIF constructs. It seems to me that the 'state' property could sensibly hold this data. As you point out, another sensible approach would be to break apart the individual properties and put them in the properties bag.

As part of this discussion, I believe we've uncovered a useful gap in the format from the perspective of a static analysis tool. It is not clear where a code flow would specify initial state for static variables, other data in a multi-thread scenario. So my suggestion is to add codeFlow.state and restrict spec language to the static analysis scenario.

It's an interesting idea to consider some non-normative language around how to utilize SARIF for dynamic analysis scenarios where there's sufficient overlap (and to describe perhaps where that gaps are too extreme). But this seems like a low priority compared to other open items.

ghost · 2018-06-20T16:50:14Z

Having thought more about this, I suggest codeFlow.initialState for parallelism with graphTraversal.initialState.

ghost · 2018-06-20T17:01:54Z

But even so "state" is problematic for the "headers/forms data" scenario. Here's why:

As the user navigates a code flow, the debugger-like "watch window" is going to display the state belonging to the current location. For example, suppose you are on location 1, whose state property is

"state": {
  "x": "42",
  "y": "54"
}

The watch window displays

x    42
y    54

Now you navigate to location 2, whose state property is

"state": {
  "x": "33"
}

What should the viewer display? Should it display

x    33
y    54

... assuming that x has changed but y has not? Or should it display

x    33

... assuming that y has gone out of scope?

I claim that it should just display x. If the SARIF producer wants you to display the unchanged value of y, it should include it in the state property.

What does that imply for an initialState like this?

"initialState": {
  "x": "0",
  "UserAgent": "Mozilla/5.0..."
}

To me it implies that UserAgent will disappear from the watch window as soon as you navigate to the first location.

That's why I think that persistent information like this, which makes sense throughout the code flow, (and which you might want to view throughout your navigation through the code flow), needs a separate slot in the format.

"properties" works for that.

michaelcfanning · 2018-06-20T17:12:35Z

properties does not work for this, because properties could contain some arbitrary data that's not intended to be viewable and does not allow for general viewer consumption.

do me a favor. stop talking about the headers/forms example. you'll note I retitled this issue. let's revisit your example: how do we project that a viewer can most usefully report out on things like static variables that may not be associated with the current selected code location.

This is a good question to answer. The answer is not to use the properties bag. Any thoughts?

ghost · 2018-06-20T17:22:20Z

"Static" variables can change from location to location, just as local variables can:

"locations": [
  {
    "x": "42,
    "s_count": "5"
  },
  {
    "x": "42",
    "s_count": "6"
  }
]

The producer should add them to the state property of any location where it makes sense. There's no need to treat them specially by mentioning them in a codeFlow-level property.

But if you mean "constant" variables, rather than "static" ones -- there I think is the answer. We could define a new property codeFlow.constants. The viewer would display them in a window separate from the state. This would also cover the Scenario Who Must Not Be Named.

michaelcfanning · 2018-06-20T17:29:57Z

I am focused on this specific question: how does a viewer differentiate between local vs. static variables. In general, when debugging, all viewable static variables do no appear in watch windows. This data can be quite copious. Users can explicitly add a static to a watch window, but this is an explicit question.

Now consider your step-through scenario. Does we expect all state to be viewable in a common window? Is it useful to distinguish between variables of varying scope? etc. I believe we need to do some thinking here.

You have raised a new concern around constants. Maybe that's valuable, I'm not sure. Some debuggers also have a place for 'automatics', a place to show implied variables that may not be explicit in source code (such as an exception instance that pops up out of nowhere during execution).

I appreciate all the good thinking/speculation here. We would do be well-advised to collect some traces from various tools to drive progress. Probably should create a bucket issue around all the new topics that have turned up as this conversation has turned into a more systematic review of this area.

ghost · 2018-06-20T18:26:00Z

One way that a SARIF viewer differs from a debugger is that the SARIF producer gets to decide which variables the user will find interesting at each location in the code flow. So it might be less important to segregate variables by type than it is in a debugger. For example, the producer could say:

const x                 42
return value            null
s_count                 5

michaelcfanning · 2018-06-20T20:50:32Z

This is a good point, you are noting that one position we could have is that a step-through of a code flow is an extremely directed experience, in that the SARIF data is explicitly designed to show only the relevant information. That's a good value. On the other hand, this implies a certain rigidity, we've limited the ability of viewers to provide alternate/competing views.

I actually thought that our spec said that the state shouldn't recapitulate data except for what has changed. This is an optimization that prevents the need to keep repeating state values for data that hasn't changed. Does our spec actually say that? If not, we need yet another issue on code flows.

Which continues to be my most urgent concern. With all due respect to our highly intelligent discussion here, I won't really feel like we're making progress until we're driving SARIF v2 examples through multiple tools through multiple viewers. Have you and Chris taken a look at SDV yet in our current implementation of various things?

ghost · 2018-06-20T21:07:25Z

The spec does not say that state should only mention values that have changed. If we want to make it say that, we have to figure out how to tell the viewer "You know that variable y we were just showing? It's not important any more (maybe it went out of scope; maybe we just don't care about it any more), so stop showing it."

With regard to SDV: The sarif-v2 branch of the SARIF SDK includes all the code flow changes, except for (1) the rename of codeFlowLocation to threadFlowLocation, and (2) the addition of threadFlowLocation.timestamp.

That means that the SDV-to-SARIF converter is already up to date with those changes, and the tests demonstrate that the converter produces valid SARIF in the new format. What more SDV-related validation did you have in mind?

michaelcfanning · 2018-06-20T21:39:18Z

I would like someone to please assess all existing SARIF v2 samples, whether produced directly or by conversion, in whatever SARIF v2 code flow consuming mechanisms exist today. These include a web control, a VS Code extension and a VS add-in, minimally. I am not attempting to provide a complete test methodology for evaluating SDV. I am hoping that you, or Chris, or another resource, can take this on.

Send me mail offline if this isn't clear and we need to work against a more clearly articulated plan. This discussion is getting off topic.

michaelcfanning · 2019-03-15T16:08:09Z

E-BALLOT PROPOSAL:

In the codeFlow object:
- Add property immutableState of type object with string-valued properties, to hold information like HTTP headers and forms data. That was the initial motivation for this issue.
- Add property initialState of type object with string-valued properties (parallel to the existing property graphTraversal.initialState).
In the graphTraversal object:
- Add property immutableState of type object with string-valued properties (parallel to the new property codeFlow.immutableState).
In the spec, clarify that threadFlowLocation.state and edgeTraversal.finalState must mention every relevant state variable, even those unchanged since the previous step, because otherwise, if a state variable was present at step n but missing at step n + 1, you couldn't tell if it still existed but its value was unchanged, or if it had gone out of scope.

ghost · 2019-03-16T21:21:35Z

E-BALLOT PROPOSAL

Enable the representation of "immutable state": state that remains constant throughout the traversal of a graph or a code flow. This is useful for capturing, for example, HTTP request headers or forms data.

Also enable the representation of "initial state": mutable state captured before the start of a code flow or graph traversal. The graphTraversal object already defined an initialState property for this purpose.

SCHEMA CHANGES

In the codeFlow object:
- Add property initialState of type object with string-valued properties.
- Add property immutableState of type object with string-valued properties.
In the graphTraversal object:
- Add property immutableState of type object with string-valued properties.

kupsch · 2019-03-18T19:25:56Z

This could be a separate ticket, but it is a change to the description for the state value, so it relates to this ticket.

For many of the tools that I have seen that include flows or traversals, when explaining a flow or graph traversal state, they specify the value of an expression as a literal value (expressible in this current design), or as a constraint on the value (currently not not expressible). Such as

{
  "i":  "i < 5",
  "j":  "j >= 10 && j < 100",
  "p":  "p != NULL ",
  "r":  "r == NULL",
  "s":  "strlen(s) > 100",
  "t":  "buffer t is not NULL terminated"
}

This state constraint information that many tools report, could incorporate this without changing the schema by describing that a magic string value such as "{expr}" is used to represent a constraint expression. If "{expr}" exists in the value then it is a constraint (the value may substitute the actual expression for {expr}. If "{expr}" does not exist in the value then this is an equality constraint equivalent to "{expr} == VALUE" where VALUE is the value of the JSON property.

The above constraints would then become a state object of

{
  "i":  "{expr} < 5",
  "j":  "{expr} >= 10 && {expr} < 100",
  "p":  "{expr} != NULL ",
  "r":  "NULL",
  "s":  "strlen({expr}) > 100",
  "t":  "buffer {expr} is not NULL terminated"
}

For viewer just displaying the values above would probably be deciphered with the correct meaning by most developers, and advanced viewers could do more to indicate a general constraint instead of just equality.

This draft contains all the changes through "e-ballot #2," which opened on Friday March 15 and will close on Friday March 22. It contains changes for ballot issues #168, #291, #309, #320, #321, #326, #335, and #341, as well as for previously approved issue #340. It does _not_ contain changes for any issues from "e-ballot #3," which will open on Friday March 22 and close on Friday March 29.

ghost · 2019-03-29T15:40:20Z

Approved in e-ballot #2.

ghost · 2019-04-06T21:09:08Z

@kupsch I will open an issue for your "arbitrary expression" suggestion, and I'll take care of it. Since all the state-ish properties are just string-to-string dictionaries, there's no downside to this, it's non-breaking, it's just words in the spec, and I have the time.

@michaelcfanning FYI

michaelcfanning added CSD.1 Will be fixed in CSD.1. 2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. and removed CSD.1 Will be fixed in CSD.1. labels May 30, 2018

michaelcfanning changed the title ~~Consider adding placeholder for things like forms data and request headers for dynamic analysis tools~~ Consider adding codeflow.state to capture initial execution state for things like static variables Jun 20, 2018

michaelcfanning added the p2 Priority 2 issue to close label Jan 24, 2019

michaelcfanning added the tc-34 label Mar 15, 2019

ghost pushed a commit that referenced this issue Mar 15, 2019

Change draft + merge for #168: immutable threadFlow state

3c7273c

ghost self-assigned this Mar 15, 2019

ghost added enhancement impact-non-breaking-change code-flows labels Mar 15, 2019

ghost added change-draft-available merged Changes merged into provisional draft. labels Mar 15, 2019

michaelcfanning removed the tc-34 label Mar 19, 2019

ghost added resolved-fixed design-approved The TC approved the design and I can write the change draft labels Mar 29, 2019

ghost closed this as completed Mar 29, 2019

ghost added e-ballot-2 and removed change-draft-available labels Mar 29, 2019

This was referenced Apr 6, 2019

Constraints in state objects #359

Closed

Define request and response objects #362

Closed

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider adding codeflow.state to capture initial execution state for things like static variables #168

Consider adding codeflow.state to capture initial execution state for things like static variables #168

michaelcfanning commented May 9, 2018

michaelcfanning commented May 31, 2018

ghost commented May 31, 2018

michaelcfanning commented May 31, 2018

ghost commented May 31, 2018 •

edited by ghost

michaelcfanning commented Jun 19, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018 •

edited by ghost

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

michaelcfanning commented Mar 15, 2019 •

edited by ghost

ghost commented Mar 16, 2019

kupsch commented Mar 18, 2019 •

edited

ghost commented Mar 29, 2019

ghost commented Apr 6, 2019

Consider adding codeflow.state to capture initial execution state for things like static variables #168

Consider adding codeflow.state to capture initial execution state for things like static variables #168

Comments

michaelcfanning commented May 9, 2018

michaelcfanning commented May 31, 2018

ghost commented May 31, 2018

michaelcfanning commented May 31, 2018

ghost commented May 31, 2018 • edited by ghost

michaelcfanning commented Jun 19, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018 • edited by ghost

michaelcfanning commented Jun 20, 2018

ghost commented Jun 20, 2018

michaelcfanning commented Jun 20, 2018

michaelcfanning commented Mar 15, 2019 • edited by ghost

E-BALLOT PROPOSAL:

ghost commented Mar 16, 2019

E-BALLOT PROPOSAL

SCHEMA CHANGES

kupsch commented Mar 18, 2019 • edited

ghost commented Mar 29, 2019

ghost commented Apr 6, 2019

ghost commented May 31, 2018 •

edited by ghost

ghost commented Jun 20, 2018 •

edited by ghost

michaelcfanning commented Mar 15, 2019 •

edited by ghost

kupsch commented Mar 18, 2019 •

edited