implement more robust incident handling #50

ivg · 2020-02-13T12:59:01Z

This is more a BAP issue, of course, but let's discuss it here, on your side. I've noticed, that you guys are parsing incidents and incident location observations. I understand your motivation, as incidents are defined locally in a plugin and there is no programmatical interface to this observation. Of course, incidents should be published and an interface to them should be provided. So in the future, if you need any functionality in BAP, please do not hesitate to create an issue with a request, or just submit a PR, if you think it will be faster. It is much easier for us to be reactive (i.e., provide the necessary functionality on demand), rather than being proactive (guess what is needed by the community right now and what can be implemented later). The former is engineering the latter is the art :)

Now, back to the incidents, I'm currently working on improving the incidents facility, mostly on deduplication (as we have too many incidents that are somehow instances of the same problem). I don't want to break any existing interfaces (even the textual representation, so no worries here) and I'm introducing a few new observations, namely, incidence-new-class, incidence-new-instance, incidence-new-representative. In short, all incidence will be now grouped by their locality class, so that closely related instances could be processed in one go. The number of incidence reports will remain the same, but now they will be grouped. Also, each class will have a representative, which is the incidence that subsumes all other incidences. The representative will be updated every time a better candidate is found. Below is the excerpt from the current design documentation, if you have any suggestions or requirements, please feel free to share, while I'm still on it. I will provide a library interface to incidents (and in the future, I will probably provide a plugin that will process incidents). Once this interface is provided, for robustness, I would recommend you switching on the library interface, instead of parsing. Agreed?

 Each incident is represented by its name and a tuple of
 locations. Each location is represented as a trace. It is not
 requred, in general, that all traces that denote locations of an
 incident should belong to the same path of execution (i.e.,
 traces could easily be disjoint). The last (the most recent)
 point of the trace is called the _endpoint_ or just the location,
 when it is obvious from the context that we mean only the
 endpoint of the trace.

 Incidents of the same name are partitioned into locality classes,
 so each incident belongs to one and only one class. We say that
 an incident [i] subsumes an incident [j] iff the endpoint of each
 location in [j] is is in the trace of a corresponding location in
 [i].

 Two incidents belong to the same class, if one the incidents
 subsumes another. Of the set of incidents that belong to the same
 class, an incident that subsumes all other incidents is selected
 as a representative of the class. Note, that there could be more
 than one incidents, as they may have equal locations, which
 differ only in machine identifiers, which are not considered in
 the substring relation.


 When an incident is reported it either belongs to an existing
 class or starts its own class.

 When an incident is an instance of an existing class <cls>, then
 we make an observation:
 [(incident-new-instance <name> <cls> <p1> ... <pM>)]

 When an incident is not a member of any class a new class is
 created and an observation is made:

 [(incident-new-class <name> <cls> <p1> ... <pM>)].

 When a newly discovered instance of a class is selected as a new
 representative, in addition to the new-instance observation  we
 also make an observation

 [(incident-new-representative <name> <cls> <p1> ... <pM>)]

Also, subscribing @gitoleg on this discussion :)

The text was updated successfully, but these errors were encountered:

Enkelmann · 2020-02-13T14:18:34Z

I have to admit that this is not a pressing matter for us as we are moving away from symbolic execution (in favor of more dataflow analysis) right now anyway. But we will gladly switch to a library interface if you provide one. :-) Until then we should address the incident duplication by smarter parsing on the cwe_checker side.

PS: I will write some Issues for BAP with feature requests the next time i find time for it.

ivg · 2020-02-13T14:43:48Z

Speaking of the dataflow.... right now, like a second ago, I've just implemented liveness analysis for subroutines (because our Sub.free_vars are still... let's say it straight broken), so I'm thinking now, should I publish it in the interface or not. My decision was ... meh, let's postpone it. But if you guys need liveness right now, I can publish it right now. Your call)

Enkelmann · 2020-02-13T14:55:51Z

We don't need liveness analysis right now. But it would still be a nice thing to have and could be useful for us in the future.

Enkelmann · 2020-03-16T14:16:51Z

We have rewritten the incident parsing for the emulation based checks (PR #52 ), so that all incidents pointing to the same target are summarized as one CWE hit (containing all paths found by BAP in the description).

For the time being, this solves the issue on the cwe_checker side. We may revisit the parsing when the current improvements on BAP Primus hit the stable branch.

Enkelmann closed this as completed Mar 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implement more robust incident handling #50

implement more robust incident handling #50

ivg commented Feb 13, 2020

Enkelmann commented Feb 13, 2020

ivg commented Feb 13, 2020

Enkelmann commented Feb 13, 2020

Enkelmann commented Mar 16, 2020

implement more robust incident handling #50

implement more robust incident handling #50

Comments

ivg commented Feb 13, 2020

Enkelmann commented Feb 13, 2020

ivg commented Feb 13, 2020

Enkelmann commented Feb 13, 2020

Enkelmann commented Mar 16, 2020