Review comments from @kupsch #366

ghost · 2019-04-10T19:08:26Z

sarif-v2.0-csd02-provisional-kupsch-comments.txt

ghost · 2019-04-10T19:21:26Z

@kupsch, If you have questions about any of the dispositions let's discuss them offline to keep the size of this issue manageable. If necessary we can open new issues for individual concerns.

p.16 1.2 artifact definition: a database table should not be included; it is neither byte nor URI addressable

I didn't remove this. The raw table isn't addressable, but db's can be queryable by URL, and I want SARIF to accommodate a database analyzer that says "You're storing clear-text passwords in column Password in the User table".
p.22 1.3 UNICODE10: Unicode 12.0.0 is the current version

FIXED.
p.26 3.3.2 text property: The document should be readable if the reference is removed, and having the name present aids the reader: "that [RFC8259]" -> "that JSON [RFC8259]"

FIXED.
p.30 3.4.6 Guidance on the use of artifactLocation objects "... the root of the source code enlistment." -> "... the path to the root of the directory tree containing the analyzed source code."

FIXED. But I kept the word "absolute", "the absolute path to the root of the directory tree...". It's true that SRCROOT, for example, might not be absolute (it might in turn appeal to PROJECTROOT), but ultimately you get an absolute path.
p.33 3.7.3 Array properties with unique values: "in [JSCHEMA01]" -> "in JSON Schema [JSCHEMA01]"

FIXED.
p.33 3.8.2 Tags: Add a note that for results and rules the preferred mechanism is to use taxonomies.

FIXED (good idea!) I don't think you would always use taxonomies for results or rules. It's a heavier-weight mechanism than tags, and I don't think a team should have to define a taxonomy just to tag a result as "ship-blocking". But I did explain that taxonomies are available for more formal classifications.

Also, I didn't put this in a (non-normative) note; I made it a normative SHOULD NOT to use tags to label results according to a formal taxonomy. That seems to me the right level of guidance: "Use taxonomies for this unless you have a good reason not to.":

Tags SHOULD NOT be used to label a result or a rule as belonging to a category in a formal classification system such as the Common Weakness Enumeration [CWE™] (for example, by adding a tag "CWE/622"). Instead, taxonomies (§3.18.2) SHOULD be used for this purpose.
p.35 3.9 Date/time properties: typo: "thos" -> "this"

FIXED (previously)
p.35 3.9 Date/time properties: Add note that producers SHOULD include resolution of at least whole seconds for timestamps that are not product release dates.

DONE.
p.35 3.10.1 General: 'The ":" delimiter is omitted' -> 'The ":" delimiter is omitted in the authority"

MOOT, because of the next change in this list:
p.36 3.10.2 URIs that use the file scheme:
Add: Remove empty path segments ("//" -> "/")
Add: If the path is a directory include a trailing "/" to match URI semantics [RFC3986]
Add: When normalizing file scheme paths, consumers shall remove empty path segments ("//" -> "/" and trailing "/" -> "").

FIXED, but in the opposite way from what you suggested. The bullet list was an intentionally abbreviated set of examples of "things RFC3986 tells us to do when normalizing." To eliminate the impression that the list was comprehensive, I removed it and reduce the note to this:

NOTE 2: Features of this normalized form include, for example, using upper-case hexadecimal digits in percent-encoded characters and expressing the scheme component in lower-case. For the full specification of the normalized URI form, see RFC3986.
p.38 3.11.3 Plain text messages: I think the SHOULD be a single paragraph and discussion of of line break sequences should be removed. Having this restrictions makes it very difficult to write a plain text version of the markdown version. There are many tools that output multiple paragraph plain text messages and plain text consumers can display them just fine. Generally, a post-process should not modify line break sequences. Viewers should start a line break when a line break sequence is encountered is possible. If a consumer cannot display multiple lines then it MAY eliminate line breaks.

DONE. We removed the requirement for a single paragraph. We changed the advice against line breaks (normative SHOULD NOT) to explicit permission to use them (normative MAY).

We stated that a viewer SHOULD respect line breaks. We did not bother to say that a viewer that can't display line breaks MAY eliminate them: "SHOULD" means "do this unless you have a good reason not to," and if you've decided you have a good reason not to respect line breaks, it implies that you're going to eliminate them.
p.44 3.13.2 version: why is the version 2.1.0 and not 2.0.0?

Because early on we made the mistake of publishing a "2.0.0" schema without a SemVer pre-release suffix. Now we have customers at MS who support the old 2.0.0 version and need a way to distinguish it from the new, "real" one when it's published.
p.45 3.13.5 example: The example is incorrect: externalPropertyFileReferences should be an object not an array

FIXED.
p.50 3.14.7 language: "by [RFC5646]" -> "by Tags for Identifying Languages [RFC5646]"

FIXED.
p.51 3.14.14 originalUriBaseIds property: It would be useful for client UX if there was a description for each uriBaseId to aid users in determining the meaning of each uriBaseId. 3.4.4 mentions that producers and consumers need to agree on the meaning which is not possible for generic viewer. Adding an optional "description" multiFormatMessageString object to artifactLocation object would allow the producer to communicate this information to users.

DONE. We made the description property a message rather than a multiformatMessageString so it can be localized by adding an entry to globalMessageStrings.
p.66 3.18.5 toolComponent.guid property: Add: "The guid is a stable identifier for the tool and as such all version of the tool SHALL have the same value."

DONE. I wrote that guid "... provides a unique, stable identifier for the component. guid SHALL NOT vary between versions of a given component."
p.66 3.18.8 semanticVersion property: "and semantics specified by [SEMVER]." -> "and semantics of Semantic Versioning [SEMVER]." Also: "to SemVer" -> "to Semantic Versioning"

FIXED.
p.72 3.18.26 artifactIndices property: why not array of artifactLocation. This seems to be the only place where artifacts must be in the artifacts array.

DONE.
p.94 3.25.5 ruleId property: I think that restriction on converters synthesizing ruleIds should be eliminated.

As ruleId is the unique id of the rule, if converters are not allowed to synthesize them, then there needs to be another mechanism to identify the same rule. This restriction on converters should be removed. Converters SHOULD synthesize them if possible. Consumers should be aware that results from different converters are not comparable. The advice to give is that consumers should consistently use the same converter or understand that there may be differences. These differences may extend beyond just the ruleId. Using different converters for the same tool should be avoided.

DONE. I also reworked Appendix D, "Production of SARIF by converters", to incorporate this guidance; see below.
p.158 3.48.3 reportingDescriptorRelationship.kinds property: How about MAY contain and make the default be [ "relevant" ]? This is a reasonable default if a tool developer doesn't want to think about relationships.

DONE.
p.175 Appendix D (Informative) Production of SARIF by converters: A converter does not introduce any more or less complexity if it synthesizes properties, than a tool would if it makes mistakes in its output. The advice for converters is to not change the semantics of the output without changing the semantic version of the converter.

Since the ruleId is important for viewer and result management system, it SHOULD be synthesized if possible.

If you have different converter for the same tool's native result output, then they can produce incompatible value. I find it unlikely that there will be multiple converter implementations for a given tool.

Consumers of SARIF should be aware that conversion results from different converters are not comparable. Consumers can compare results only if the convert is the same and the sematic version of the converter and tool are compatible. Consumers should use the same converter if they wish to compare results in different SARIF files.

DONE. I reworked Appendix D as you suggested, and made it Normative: it has lots of normative statements in it; they were just lower-case. I fixed that.
p.177 Appendix F (Normative) Producing deterministic SARIF log files: The algorithms used by many tools are inherently non-deterministic as the tool's algorithms may do random sampling or random traversals of the graphs that represent the code. Generally, they produce mostly the same result set, but there may be small differences between runs. This fact should also included in this appendix.

FIXED. This seemed important enough that I pulled it into a separate sub-section, F.5, "Inherently non-deterministic tools", as follows:

The algorithms used by some tools are inherently non-deterministic because, for example, they perform random sampling or random traversals of the graphs that represent the code. Generally, these tools produce mostly the same result set, but there might be small differences between runs.
p.183 Appendix H (Informative) Diagnosing results in generated files: I don't see a need for different originalUriBaseIds any more; just have multiple entries that resolve to the same absolute URI, but have a different values for for properties such as artifactIndex, hashes and lastModifiedTimeUtc; and the role will include "modified".

DONE (except for the "modified" part; that means "changed in source control since the baseline run." We can discuss additional roles for generated files.)
p.187 Appendix J.2 (Informative) Examples: "list:add" -> "list::add"

FIXED.
p.190 Appendix J.4 (Informative) Examples: "/home/buildAgent/src" -> "file:///home/buildAgent/src"

FIXED.
p.192-193 Appendix J.4 (Informative) Examples: multiple times: '"mimeType": "text/x-c"' -> '"sourceLanguage": "c"'

FIXED. (The samples were actually C++)
p.193-197 Appendix J.4 (Informative) Examples: "list:add" -> "list::add"

FIXED.

ghost · 2019-04-11T00:09:31Z

SCHEMA CHANGES

Here are the schema changes resulting from Jim's feedback. Only one of them is breaking.

We are considering one additional change, but for now this is what we know:

In the artifactLocation object:
- Add a property description of type message, optional.
  
  @kupsch: This addresses your request for the UI to be able to display a description for an originalUriBaseIds entry.
In the toolComponent object:
- Remove the artifactIndices property. BREAKING
- Add a property locations of type artifactLocation[], optional, minItems: 0, default: []
  
  @kupsch: This addresses your point that toolComponent.artifactIndices was the only thing in the spec that would require theRun.artifacts to exist.
In the artifact object
- Add a property description of type message, optional.
In the reportingDescriptorRelationship object:
- For the kinds property:
  - Make it optional, not required.
  - Provide a default of [ "relevant" ]
    
    @kupsch: This addresses your point that tool authors might want to declare relationships without specifying their kind.

ghost self-assigned this Apr 10, 2019

ghost pushed a commit that referenced this issue Apr 13, 2019

Change draft + merge for issues #366, #367, #368, #369, and #370.

5c92483

ghost added 2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. enhancement impact-breaks-consumers impact-breaks-producers merged Changes merged into provisional draft. resolved-fixed labels Apr 13, 2019

ghost closed this as completed Apr 13, 2019

This was referenced Apr 15, 2019

@kupsch feedback responses #376

Closed

Editorial issues from Yekaterina #383

Closed

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review comments from @kupsch #366

Review comments from @kupsch #366

ghost commented Apr 10, 2019

ghost commented Apr 10, 2019 •

edited by ghost

ghost commented Apr 11, 2019 •

edited by ghost

Review comments from @kupsch #366

Review comments from @kupsch #366

Comments

ghost commented Apr 10, 2019

ghost commented Apr 10, 2019 • edited by ghost

ghost commented Apr 11, 2019 • edited by ghost

SCHEMA CHANGES

ghost commented Apr 10, 2019 •

edited by ghost

ghost commented Apr 11, 2019 •

edited by ghost