Define driving principles for SARIF effort #1

Closed
michaelcfanning opened this issue Sep 20, 2017 · 7 comments

@michaelcfanning
Contributor

michaelcfanning commented Sep 20, 2017

We should articulate and maintain a set of driving principles for the SARIF format. These principles should define a vision for the format in general and be useful for resolving difficult design decisions. Below is a starter list that we should refine, add to or subtract from.

  1. SARIF is primarily designed to advance the industry by providing the best direct production format possible. Aggregating results from other formats is another important scenario but secondary to direct production.

  2. SARIF defines a range of data that shall be expressed in order to best support static analysis tooling. The specification describes a JSON implementation of this standard. It should be possible to define other implementations (such as XML).

  3. SARIF is designed for static analysis tools, and any concept that generally applies to this scenario shall be considered for the format. SARIF can clearly be used for many dynamic analysis scenarios, and we should consider augmenting the format for this class of tooling, but not in cases where what is proposed is applicable to the dynamic analysis domain only.

  4. SARIF is domain-agnostic; that is, it does not contain objects or properties that are specific to a single domain, such as security or compliance. However, SARIF might define specific values for properties that are specific to a single domain. For example, the proposed result.taxonomies property might define a dictionary entry whose key identifies a standard classification for memory safety issues only.

  5. The SARIF design is focused on expressing results as produced by a tool at a specific point in time and currently excludes detailed thinking related to results management (associated result work item, false positive evaluation, etc.). These concepts may be addressed by defining or proposing 'profiles' that broaden SARIF's design surface area, contingent on progress with core work.

@ghost

ghost commented Sep 20, 2017

As discussed in the TC meeting of 2017-09-20, I propose an additional principle:

  1. SARIF is domain-agnostic; that is, it does not contain objects or properties that are specific to a single domain, such as security or compliance. However, SARIF might define specific values for properties that are specific to a single domain. For example, the proposed result.taxonomies property might define a dictionary entry whose key is "CWE", even though that value is specific to the security domain.

@michaelcfanning
Contributor Author

I've updated the principles with your suggestion but removed the statement that CWEs are specific to the security domain, as they actually address a range of quality issues. I myself set up this false comparison in conversation; my apologies.

I updated your example to refer to a categorization scheme (for memory safety problems) raised in the TC discussion by Henny.

@michaelcfanning
Contributor Author

I've added a new suggestion:

5. The SARIF design is focused on expressing results as produced by a tool at a specific point in time and currently excludes detailed thinking related to results management (associated result work item, false positive evaluation, etc.). These concepts may be addressed by defining or proposing 'profiles' that broaden SARIF's design surface area, contingent on progress with core work.

@DerSaidin

The word "best" is used in 1 and 2, but the criteria for what is better is not defined.

Is a simple and easy design better?
Is a complex but comprehensive design better?
If a design is very good at aggregating results, but more complex to implement, is it better?

@michaelcfanning
Contributor Author

Good point, and I was just looking at this comment in my notes. 'Best' as defined here could refer to maximizing development possibilities for consumers. The goal is to give viewers, data analytics engines, etc., the ability to provide rich features against data files.

We project an ecosystem that, as it builds, creates the following virtuous cycle: 1) producers are motivated to build rich SARIF producers, because this relatively small effort enables a broad range of useful viewers, log processors and other features that target the format; 2) consumers are motivated to build features due to the leveraged development costs: one feature consuming SARIF allows that functionality to operate against all SARIF producers.

The model above suggests that our design should focus on whether it is useful, in the main, for a consumer to have access to a piece of proposed data (rather than whether tools, in the main, emit that data). This focus also requires us to consider whether a piece of information (such as a result rank) can be normalized in a way that allows features to aggregate, process and/or display output across all producers.
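To make the normalization question concrete, here is a minimal Python sketch of rescaling each producer's native severity levels onto one shared rank scale so a consumer can sort results across producers. Everything in it is a hypothetical assumption for illustration: the tool names (`toolA`, `toolB`), their severity scales, the 0.0 to 100.0 target range, and the helpers `normalize_rank` and `aggregate`. None of these come from SARIF or from any real tool.

```python
from dataclasses import dataclass

# Hypothetical native severity scales for two imaginary producers.
TOOL_SCALES = {
    "toolA": {"low": 1, "medium": 2, "high": 3},       # ordinal 1..3
    "toolB": {"info": 0, "warning": 5, "error": 10},   # ordinal 0..10
}


@dataclass
class NormalizedResult:
    tool: str
    rule_id: str
    rank: float  # shared 0.0..100.0 scale (an assumption of this sketch)


def normalize_rank(tool: str, native_level: str) -> float:
    """Linearly rescale a tool's native level onto 0.0..100.0."""
    scale = TOOL_SCALES[tool]
    lo, hi = min(scale.values()), max(scale.values())
    return 100.0 * (scale[native_level] - lo) / (hi - lo)


def aggregate(results):
    """results: iterable of (tool, rule_id, native_level) tuples.

    Returns NormalizedResult objects sorted by descending rank; ties keep
    their input order because Python's sort is stable.
    """
    normalized = [
        NormalizedResult(tool, rule_id, normalize_rank(tool, level))
        for tool, rule_id, level in results
    ]
    # With one shared scale, a single consumer feature can order output from every producer.
    return sorted(normalized, key=lambda r: r.rank, reverse=True)


if __name__ == "__main__":
    for r in aggregate([("toolA", "A001", "medium"),
                        ("toolB", "B042", "error"),
                        ("toolB", "B007", "warning")]):
        print(r.tool, r.rule_id, r.rank)
```

Linear rescaling is only the simplest possible choice: it assumes each tool's levels are evenly spaced and comparable across tools, which real producers may not guarantee. Whether such a mapping can be defined at all is precisely what needs to be asked of each proposed property.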

@michaelcfanning
Contributor Author

  1. The primary purpose of SARIF is to enable low cost development of rich functionality (viewers, work item filers, etc.) that operates against a broad range of SARIF producers. A key design principle for all SARIF properties, therefore, is that any proposed data should be clearly useful in a consumption scenario.

  2. As an important but secondary concern, SARIF is designed to allow the output of existing tools to be normalized to a common format. In order to support the ability for consumers to process, display, etc., this information in an appropriate and consistent way, it must be possible to normalize any proposed SARIF data to a common form.

  3. The SARIF format specification should clearly describe the semantic meaning and intended purpose for all properties, to assist producers in populating this data with values that drive effective consumption.

@michaelcfanning
Contributor Author

Principles reviewed/approved by TC and checked in at https://github.com/oasis-tcs/sarif-spec/blob/master/Documents/GuidingPrinciples.md

2 participants