Make result.ruleId a hierarchical string to accommodate "sub-rules" #146

michaelcfanning · 2018-04-17T00:25:40Z

Consider this case, a check such as 'Do Not Use Broken Cryptography Algorithms'. This check flags both Md5 and Sha1 usage. A method contains references to both algorithms. The user would like to suppress the MD5 use (because it is simply used for fast hashing purposes in a non-cryptographic context) without suppressing the SHA1 use (which, in fact, needs to be changed).

For this case, a baselining system needs to qualify the rule id that is used to create the suppression, for example:

FxCop.DoNotUseBrokenCryptopgraphyAlgorithms.Md5

Note that we previously used the 'toolFingerprintContribution' for this data. This concept really is a qualification of the rule id, however. Consider how this compares to an actual fingerprint contribution (such as when a tool grabs an encapsulating region of text surrounding an issue). The latter is, in fact, a partial fingerprint. The rule qualifier is a subtly different idea.

michaelcfanning · 2018-04-27T22:38:25Z

I will propose a format for this soon.

michaelcfanning · 2018-11-06T22:37:26Z

Simplest idea here would be to allow for the id to be a hierarchical string. We should perform some due diligence to make sure this doesn't break current static analysis providers.

michaelcfanning · 2018-11-27T17:01:58Z

@lgolding, can we author a change draft for this triage-approved item, at your convenience?

ghost · 2018-11-27T17:41:23Z

I marked it to-be-written and I'll do it today.

ghost · 2018-11-27T18:44:10Z

@michaelcfanning

This is a good idea. There are some complications to consider:

In your example, FxCop.DoNotUseBrokenCryptopgraphyAlgorithms.Md5 is actually a rule name, not a rule id; rule.id is required to be opaque. Should the spec require that either both or neither of the rule name and the rule id be hierarchical, and with the same number of components? So that, in your example, we might have rule.name.text be DoNotUseBrokenCryptopgraphyAlgorithms/Md5, and rule.id be CA5351/1?
Is it valid for result.ruleId to refer to only a left subset of the rule id? For example, if you have a set of rules parent/child1, parent/child2, can result.ruleId ever be just "parent"? It wouldn't make sense for your example, but it might for other examples.
Does there have to be a separate rule object for each sub-rule?

{
  "results": [
    {
      "ruleId": "CA5351/1",
      "ruleIndex": 0
      "message": {
        "messageId": "default"
      }
    },
    {
      "ruleId": "CA5351/2",
      "ruleIndex": 1,
      "message": {
        "messageId": "default"
      }
    }
  ],
  "resources": {
    "rules": [
      {
        "ruleId":  "CA5351/1",
        "messageStrings": {
          "default": "Don't use MD5. It's busted."
        }
      },
      {
        "ruleId":  "CA5351/2",
        "messageStrings": {
          "default": "Don't use SHA1. It's busted."
        }
      }
    ]
  }
}

Or is it ok to do this:

{
  "results": [
    {
      "ruleId": "CA5351/1",
      "ruleIndex": 0,
      "message": {
        "messageId": "default",
        "arguments": [
          "MD5"
        ]
      }
    {
      "ruleId": "CA5351/2",
      "ruleIndex": 0,
      "message": {
        "messageId": "default",
        "arguments": [
          "SHA1"
        ]
      }
    }
  ],
  "resources": {
    "rules": [
      {
        "ruleId":  "CA5351",
        "messageStrings": {
          "default": "Don't use {0}. It's busted."
        }
      }
    ]
  }
}

The latter has the disadvantage that the producer needs to make sure that the message arguments are consistent with the specified sub-rule.

ghost · 2018-11-27T18:50:09Z

Following up on point #3 above, some rule metadata might be common to all the sub-rules. Is it ok to do the following, which would require the rule object to have a parentIndex property?

{
  "results": [
    {
      "ruleId": "CA5351/1",
      "ruleIndex": 1
      "message": {
        "messageId": "default"
      }
    },
    {
      "ruleId": "CA5351/2",
      "ruleIndex": 2,
      "message": {
        "messageId": "default"
      }
    }
  ],
  "resources": {
    "rules": [
      {
        "ruleId": "CA5351",
        "shortDescription": {
          "text": "Don't use known busted crypto."
        },
        "fullDescription": {
          "text": Some crypto algorithms have been broken: SHA1, MD5, ... "
        }
      },
      {
        "ruleId":  "CA5351/1",
        "parentIndex": 0,
        "messageStrings": {
          "default": "Don't use MD5. It's busted."
        }
      },
      {
        "ruleId":  "CA5351/2",
        "parentIndex": 0,
        "messageStrings": {
          "default": "Don't use SHA1. It's busted."
        }
      }
    ]
  }
}

ghost · 2018-11-27T18:53:16Z

My opinions are:

Yes, require either or both of rule name and rule id to be hierarchical, with the same number of components.
No. Always require the fully qualified rule id.
Yes, require a separate rule object for every sub-rule. No, don't allow common metadata factoring.

The advantage of my answers to #2 and #3 is that it's the simplest to implement, and if we change our mind later, it's a non-breaking change.

michaelcfanning · 2018-11-27T19:07:19Z

FxCop.DoNotUseBrokenCryptopgraphyAlgorithms.Md5 is what FxCop refers to as a readable id. That is, it is intended to function as an opaque rule id does but also encode some readable information on the function of the rule. You're right, of course, rule.name is the SARIF place where this data would be persisted.

Let me clarify the request, which is for the result only to provide the ability to further qualify a result so that a more precise suppression can be applied. The request isn't intended to allow further granularity to be provided when populating the rules table.

IOW, you have proposed a new feature request. :) If you think this is valuable, create a tracking issue.

Here's the precise scenario this issue addresses:

There is a single rule 'DoNotUseBrokenCrypto', with corresponding rule metadata.
An analysis target, e.g., a method, might provoke two occurrences of this issue, by utilizing both SHA1 and MD5 in the same function. Both these algorithms are forbidden by the DoNotUseBrokenCrypto rule. Let us use CS1001, arbitrarily, as the ruleId
Each result could qualify its ruleId by providing a token that groups each result by the algorithm in question, by populating result.ruleId with CS1001/MD5 and CS1001/SHA1.
When performing rule metadata look-up, the consumer will consult the prefix only to retrieve the rule id.
An in-source suppression mechanism would allow users to selective suppress one sub-group or the other (or both sets) depending on how it is constructed. IOW, a user could specifically suppress (permit) all uses of MD5 (because they don't have a cryptographic purpose) while not suppressing the SHA1 usage.

See the SuppressMessageAttribute.MessageId property for an explanation of the scenario. https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.codeanalysis.suppressmessageattribute.messageid?view=netframework-4.7.2

This doc makes it clearer than I have that this sub-id is a qualifier on the message itself. Imagine a second scenario 'DoNotUseDangerousApi', which flags call sites to both strcpy and memcpy. In this case, you could use the API name as a way to create more suppressions precision, DoNotUserDangerousApi/memcpy and DoNotUseDangerousApi/strlen.

As always, a suppression can also be viewed as a configuration. i.e., you could imagine configuring a rule not to fire for DoNotUseDangerousApi/memcpy.

michaelcfanning · 2018-11-27T19:08:06Z

btw - if all the above isn't very clear or if our lack of clarity has opened new design fronts you'd like to pursue, we can table this issue for tomorrow's TC.

ghost · 2018-11-27T19:44:38Z

I get it. So in Visual Studio, if you right-clicked a result whose ruleId was CA1001/MD5, it would produce -- or at least, you are hoping it will produce -- a suppression like this:

[SuppressMessage("Microsoft.Security", "CA1001:DoNotUseBrokenCrypto", MessageId = "MD5")]

I'm curious how the Error List would know to populate the Category property ("Microsoft.Security"), the readable portion of the CheckId property ("DoNotUseBrokenCrypto"), and the MessageId property ("MD5"). Do the first two of those come from a ruleset file? Where does the last one come from? Can the SARIF extension intercept the "Suppress" gesture and populate all of this from the rule metadata? And if so, does that mean that our rule object needs a category property?

ghost · 2019-01-28T19:39:56Z

@michaelcfanning I'm going to start writing this, but please do take a look at my most recent comment, which asks how the"selective in-source suppression mechanism" that you describe in a previous comment would work in VS.

ghost · 2019-02-21T01:32:54Z

Changes merged to provisional draft.

michaelcfanning added the 2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. label May 30, 2018

michaelcfanning added design-approved The TC approved the design and I can write the change draft triage-approved labels Nov 6, 2018

ghost added the to-be-written label Nov 27, 2018

ghost self-assigned this Nov 27, 2018

ghost changed the title ~~define a ruleSubId or define a rule id format that provides one~~ Make ruleId a hierarchical string to accommodate "sub-rules" Nov 27, 2018

ghost changed the title ~~Make ruleId a hierarchical string to accommodate "sub-rules"~~ Make result.ruleId a hierarchical string to accommodate "sub-rules" Dec 10, 2018

ghost pushed a commit that referenced this issue Jan 28, 2019

Change draft for #146: Hierarchical rule id

869d618

ghost added change-draft-available and removed to-be-written labels Jan 28, 2019

ghost pushed a commit that referenced this issue Feb 11, 2019

Add date for #146 to Editor's Report.

61b9be2

ghost added design-improvement impact-breaks-consumers impact-non-breaking-change resolved-fixed and removed change-draft-available impact-non-breaking-change labels Feb 18, 2019

ghost pushed a commit that referenced this issue Feb 21, 2019

Merge #146: Hierarchical rule ids.

3a5b40f

ghost closed this as completed Feb 21, 2019

ghost mentioned this issue Mar 15, 2019

Consider formalizing sub-rules in rules metadata #329

Closed

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make result.ruleId a hierarchical string to accommodate "sub-rules" #146

Make result.ruleId a hierarchical string to accommodate "sub-rules" #146

michaelcfanning commented Apr 17, 2018

michaelcfanning commented Apr 27, 2018

michaelcfanning commented Nov 6, 2018

michaelcfanning commented Nov 27, 2018

ghost commented Nov 27, 2018

ghost commented Nov 27, 2018

ghost commented Nov 27, 2018

ghost commented Nov 27, 2018

michaelcfanning commented Nov 27, 2018 •

edited by ghost

Loading

michaelcfanning commented Nov 27, 2018

ghost commented Nov 27, 2018 •

edited by ghost

Loading

ghost commented Jan 28, 2019

ghost commented Feb 21, 2019

Make result.ruleId a hierarchical string to accommodate "sub-rules" #146

Make result.ruleId a hierarchical string to accommodate "sub-rules" #146

Comments

michaelcfanning commented Apr 17, 2018

michaelcfanning commented Apr 27, 2018

michaelcfanning commented Nov 6, 2018

michaelcfanning commented Nov 27, 2018

ghost commented Nov 27, 2018

ghost commented Nov 27, 2018

ghost commented Nov 27, 2018

ghost commented Nov 27, 2018

michaelcfanning commented Nov 27, 2018 • edited by ghost Loading

michaelcfanning commented Nov 27, 2018

ghost commented Nov 27, 2018 • edited by ghost Loading

ghost commented Jan 28, 2019

ghost commented Feb 21, 2019

michaelcfanning commented Nov 27, 2018 •

edited by ghost

Loading

ghost commented Nov 27, 2018 •

edited by ghost

Loading