Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rule.deprecatedIds #293

Closed
michaelcfanning opened this issue Nov 29, 2018 · 3 comments
Closed

Add rule.deprecatedIds #293

michaelcfanning opened this issue Nov 29, 2018 · 3 comments
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. design-approved The TC approved the design and I can write the change draft domain-result-management enhancement impact-non-breaking-change resolved-fixed triage-approved

Comments

@michaelcfanning
Copy link
Contributor

michaelcfanning commented Nov 29, 2018

This request comes from our internal SARIF-driven multi-tool pipeline effort. A tool provider shipped a new version which included a significant refactoring of its rule set. Many rules were deemed too general (covering too many conceptual topics). This led to issues such as over-suppression and non-useful telemetry.

In one case, a single rule was broken into 14 new checks. The tools developer went to flight their analysis against a stable test set (to ensure no unexpected new results appeared or was dropped) and realized that the SARIF SDK result matching is completely broken, as the rule id is essential to the logical identity of a result.

To resolve the situation, we'd need to provide the result matcher a mapping table of old -> new rule ids. This data currently would need to be provided on the command-line or programmatically, which means the tool would need to have a facility for exporting this information or it would need to be curated somehow.

Since this is tool-specific knowledge, a preferable solution would be to add something like an array named rule.deprecatedRuleIds that contains legacy ids that map to current. This property does need to be an array, as tool developers don't always get it right the first time.

This array would be 'not required', minItems 0, and isUnique == true. During result matching, the matcher would consult this information in order to match two results that are identical except that the baselineId has changed (and is now referenced by a new rule with a new rule id, and the old rule id referenced in the deprecated ruleIds set).

I'm open to a new name. btw - this is a familiar problem for tools developers and really should have occurred to me already. E.g., Microsoft's FxCop tool has an internal rule name remapping feature that ensures existing SuppressMessage attributes continue to function. In the old model, a tool would wrap this entire universe (understanding of old rules, production of new results, mapping of results to in-source suppressions). In a SARIF environment, these things are decoupled and so it is helpful and appropriate for SARIF to help transfer this remapping information.

finally, for completeness, I'd guess that you may note that in some scenarios, remapping a more general id (e.g., rendered as an in source suppression) to a set of more specific terms doesn't literally resolve the over-suppression issue. That can be true for existing hard-coded suppressions. In a SARIF world, however, tools developers can refer to this information in a single result matching operation: the comparison of a baseline file produced by one version of a tool against a specific data set, to a new SARIF file produced by the new version of a tool, against the same data set. After this matching has occurred, all moving forward matches operate against the new ids only. we use the same approach to deal with changes in fingerprint generation. @lgolding

@ghost
Copy link

ghost commented Nov 30, 2018

@michaelcfanning I'm going to work through an example to make sure I understand the proposal.

I run a tool with an over-general rule CA1000, and I get this:

{
  "results": [
    {
      "ruleId": "CA1000",
      "ruleIndex": 0,
      "baselineState": "new",
      ...
    },
    {
      "ruleId": "CA1000",
      "ruleIndex": 0,
      "baselineState": "new",
      ...
    }
  ],
  "resources": {
    "rules": [
      {
        "id": "CA1000",
        ...
      }
    ]
  }
}

I decide the results are false positives, so I add in-source suppressions for both:

[SuppressMessage("CA1000", ...)]
...
[SuppressMessage("CA1000", ...)]

Now a new version of the tool breaks the rule into CA1001 and CA1002. I re-run the tool and get this:

{
  "results": [
    {
      "ruleId": "CA1001",
      "ruleIndex": 0,
      "baselineState": "existing",
      "suppressionStates": [
          "suppressedInSource"
      ],
      ...
    },
    {
      "ruleId": "CA1002",
      "ruleIndex": 1,
      "baselineState": "existing",
      "suppressionStates": [
          "suppressedInSource"
      ],
      ...
    }
  ],
  "resources": {
    "rules": [
      {
        "id": "CA1001",
        "deprecatedIds": [
            "CA1000"
        ],
        ...
      },
      {
        "id": "CA1002",
        "deprecatedIds": [
            "CA1000"
        ],
        ...
      }
    ]
  }
}

The result mapper understood that the new results[0] with "ruleId": "CA1001" matches the old results[0] with "ruleId": "CA1000", so it correctly marked the result with "baselineState": "existing".

Also, the tool detected "CA1001" and "CA1002" in the same locations it previously detected "CA1000". Again, it understood that it's the same result, so it correctly marked the results with "suppressedInSource".

Is that it? It doesn't even look like the engineer needs to edit the suppressions to mention the new rule ids, although that's probably a good idea.

@michaelcfanning
Copy link
Contributor Author

Yes, that's it. Yes, if the log file always contains these ids, in-source suppressions never need be changed (this is true today with the FxCop mechanism). Systems that depend exclusively on side-car suppression mechanisms can temporarily use this data as a 'bridge' mechanism to update the tool. Finally, log file data could be used to help in rewriting scenarios (of both sidecar files and also in-source suppressions).

@ghost
Copy link

ghost commented Dec 7, 2018

FYI since the property is in a rule object, I think deprecatedIds suffices as the property name.

@ghost ghost changed the title Add rule.deprecatedRuleIds Add rule.deprecatedIds Dec 7, 2018
ghost pushed a commit that referenced this issue Dec 8, 2018
@ghost ghost self-assigned this Dec 8, 2018
@ghost ghost added change-draft-available design-approved The TC approved the design and I can write the change draft labels Dec 8, 2018
ghost pushed a commit that referenced this issue Dec 9, 2018
ghost pushed a commit that referenced this issue Dec 13, 2018
@ghost ghost closed this as completed Dec 13, 2018
ghost pushed a commit that referenced this issue Jan 3, 2019
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. design-approved The TC approved the design and I can write the change draft domain-result-management enhancement impact-non-breaking-change resolved-fixed triage-approved
Projects
None yet
Development

No branches or pull requests

1 participant