Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tool.driver required inconsistent with externalization #406

Closed
ghost opened this issue May 10, 2019 · 10 comments
Closed

tool.driver required inconsistent with externalization #406

ghost opened this issue May 10, 2019 · 10 comments
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. bug impact-non-breaking-change merged Changes merged into provisional draft. resolved-fixed

Comments

@ghost
Copy link

ghost commented May 10, 2019

PROBLEM

This problem was pointed out by an internal MS customer:

  • run.tool is required.
  • toolComponent.driver is required (run.tool is a toolComponent object).
  • run.tool.driver is externalizable.
  • The spec says that if something is externalized, you can't repeat it in the root file.

So if you externalize run.tool.driver, you can’t supply it in the root file. That causes two problems: (1) The file is invalid because run.tool.driver is required, and (2) Someone reading the root file can’t easily tell which tool produced it.

Note that #2 is an inconvenience, but #1 is a genuine inconsistency in the spec: you’re allowed to externalize this thing, but the root file is invalid if you do.

PROPOSAL

Allow run.tool.driver to be present on the root file, even if it is externalized, and say that any properties you supply on run.tool.driver must match the corresponding properties in the externalized tool.driver.

This allows you to supply a minimal set of identifying properties in the root file:

{
  "tool": {
    "driver": {
      "name": "CodeScanner"
      "semanticVersion": "3.0.2"
    }
  },
  …
}

… and provide the full object externally.

@ghost ghost added bug 2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. labels May 10, 2019
@ghost ghost self-assigned this May 10, 2019
@kupsch
Copy link

kupsch commented May 10, 2019

When we were discussing the tool design, this is the problem that my rejected design that had toolData for the tool localization was trying to prevent.

The design was essentially to put all the toolComponents in a toolData property that is an array of toolComponents, and have the tool driver and extensions just be similar to a toolComponentReferences (see below). This cleanly separates "what tool (driver and extension) used" from "a tool's static metadata". It also naturally fits the model for external array based properties.

The toolComponentReference-like object should have at minimum the guid, version info, and locations (since locations depends on the tool use and is not static set of data):

  • guid
  • index (if present in toolData)
  • semanticVersion
  • version
  • locations

The following addition properties should also optionally be allowed as they are useful to determine the tool in a human understandable way and how to find information about the tool:

  • name
  • fullName
  • product
  • productSuite
  • dottedQuadFileVersion
  • downloadUri
  • informationUri

The toolComponent's locations property could be removed, leaving the toolComponent with purely static data that can be generated at tool creation time.

@ghost
Copy link
Author

ghost commented May 15, 2019

@kupsch @michaelcfanning

Because we can't take any breaking changes at this point, I'm going to stick with my proposal above: Repeat any properties you want between the inlined and external version of run.tool.driver -- but any properties that are present in the inline version must match the external version.

@kupsch
Copy link

kupsch commented May 16, 2019

@lgolding @michaelcfanning

To make the ToolComponent a static file that can created once for each version of a tool, then there needs to be a mechanism to point to an external file of static data that is identical for each version of the tool (which we have), and also a mechanism to state the "locations" which are determined when the tool is run.

If you stick with the your proposal, it should state that common properties in an internal toolComponent and the external file represent a toolComponent SHALL be identical, but both the internal and external representation MAY have properties that are not found in the other. Generally the external file will have many properties not found in the internal toolComponent, and the internal toolComponent will at most have a locations array that is not found in the external toolComponent.

I think that it also needs to state that if you have both an internal toolComponent and external toolComponent then the guid must be present in both to know which internal toolComponent corresponds to which external toolComponent (not an issue with driver since there is only one, but it is an issue with extensions).

The lookup process then needs to be performed as if a new object was created that is the union of the set of properties and their values from both the internal toolComponent and the toolComponent in the external property file, and all lookups occur using this union object.

My proposal above avoids is simpler, but to move things forward, if you change the working to discuss combining unique properties in both the internal and external toolComponents, and add the guid requirement, then this is workable.

@michaelcfanning
Copy link
Contributor

To me, this scenario presents the same challenges as handling property bags in baseline comparison scenarios. It is not clear in this case whether a new property bag replaces the previous, or if the baseline version has precedence, or if some sort of merging should occur. At least in this scenario, it's entirely clear whether a property was explicitly persisted on each side. In the SARIF C# object model, we can't differentiate between explicitly provided default values (and our schema doesn't generally provide for making this explicit by providing null values in JSON everywhere).

Because of all this, I am hesitant to over-specify for this topic. In my opinion, given our design, anyone working with our externalized/cached data/property bags, will need to define relevant constraints/approaches for utilization. Since we can't identify a satisfying general approach, my proposal is that we leave it under-specified.

@ghost
Copy link
Author

ghost commented May 16, 2019

@michaelcfanning @kupsch

The principle here is: The external file is the source of truth. The spec says (or at least, it intends to say -- it seems I wasn't clear) that if you want to repeat some of the external properties in the root file, purely for the sake of reader convenience, then you are welcome to do that. That's all.

@dmk42
Copy link
Contributor

dmk42 commented May 16, 2019

@lgolding - That doesn't gore my ox, so I'm not objecting, but I just note that it might better serve speed of processing to go the opposite direction and say that the main file is the source of truth.

@ghost
Copy link
Author

ghost commented May 16, 2019

@dmk42 An analysis tool has to read the external file anyway to get the "big" data -- for example, the rules definitions.

A viewer is welcome to just display what's in the root file, secure in the knowledge that if it doesn't match the external file then it's dealing with invalid SARIF, in which case all bets are off.

A validator would need to open both files anyway, to verify that anything present in the root file matches the external file.

So I can't think of a scenario where the statement "the external file is the source of truth" slows down a consumer.

@michaelcfanning
Copy link
Contributor

Your position doesn't address Jim's concern, which is to have the flexibility to add properties in the root tool object that aren't present in the external file.

@kupsch
Copy link

kupsch commented May 16, 2019

If the external file is the source of the truth, then there is a large inefficiency issue here. If a tool vendor fully populates all of its notification and taxonomy data the toolComponent object will be large. If the external file is the source of the truth, and the tool emits the tool's locations data, then there must be a unique external file for each run, this means that there is a new guid. This is more effort for the tool as it can not just create a URI to a static file that is installed with the tool, or just producing a copy of this file, it must generate the toolComponent external file which is mostly boilerplate except the locations array. It is also inefficient for the consumers and result management system as they need to keep copies of these unique files or reconstruct them.

A ticket should be opened to fix this in a future version of SARIF.

@michaelcfanning
Copy link
Contributor

I tend to agree with you and that's why I would prefer to leave this under-specified. Larry's position is very clear and his process can be followed (which we all agree is a terrific value to look for in the design). If that clear process which can be followed limits utilization in undesirable ways, then we haven't bought ourselves anything.

It's better in my opinion to leave more flexibility in the spec. My sense is that this scenario will mostly be driven by larger systems (such as the SWAMP) that exercise a fair amount of control over the inputs to the system. If the SWAMP finds it desirable to populate some root level log properties but to only inject others into external files, that should be possible.

@ghost ghost added the merged Changes merged into provisional draft. label May 20, 2019
@ghost ghost closed this as completed May 20, 2019
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. bug impact-non-breaking-change merged Changes merged into provisional draft. resolved-fixed
Projects
None yet
Development

No branches or pull requests

3 participants