Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make result.ruleId a hierarchical string to accommodate "sub-rules" #146

Closed
michaelcfanning opened this issue Apr 17, 2018 · 12 comments
Closed
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. design-approved The TC approved the design and I can write the change draft design-improvement impact-breaks-consumers resolved-fixed triage-approved

Comments

@michaelcfanning
Copy link
Contributor

Consider this case, a check such as 'Do Not Use Broken Cryptography Algorithms'. This check flags both Md5 and Sha1 usage. A method contains references to both algorithms. The user would like to suppress the MD5 use (because it is simply used for fast hashing purposes in a non-cryptographic context) without suppressing the SHA1 use (which, in fact, needs to be changed).

For this case, a baselining system needs to qualify the rule id that is used to create the suppression, for example:

FxCop.DoNotUseBrokenCryptopgraphyAlgorithms.Md5

Note that we previously used the 'toolFingerprintContribution' for this data. This concept really is a qualification of the rule id, however. Consider how this compares to an actual fingerprint contribution (such as when a tool grabs an encapsulating region of text surrounding an issue). The latter is, in fact, a partial fingerprint. The rule qualifier is a subtly different idea.

@michaelcfanning
Copy link
Contributor Author

I will propose a format for this soon.

@michaelcfanning michaelcfanning added the 2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. label May 30, 2018
@michaelcfanning
Copy link
Contributor Author

Simplest idea here would be to allow for the id to be a hierarchical string. We should perform some due diligence to make sure this doesn't break current static analysis providers.

@michaelcfanning michaelcfanning added design-approved The TC approved the design and I can write the change draft triage-approved labels Nov 6, 2018
@michaelcfanning
Copy link
Contributor Author

@lgolding, can we author a change draft for this triage-approved item, at your convenience?

@ghost ghost added the to-be-written label Nov 27, 2018
@ghost ghost self-assigned this Nov 27, 2018
@ghost
Copy link

ghost commented Nov 27, 2018

I marked it to-be-written and I'll do it today.

@ghost ghost changed the title define a ruleSubId or define a rule id format that provides one Make ruleId a hierarchical string to accommodate "sub-rules" Nov 27, 2018
@ghost
Copy link

ghost commented Nov 27, 2018

@michaelcfanning

This is a good idea. There are some complications to consider:

  1. In your example, FxCop.DoNotUseBrokenCryptopgraphyAlgorithms.Md5 is actually a rule name, not a rule id; rule.id is required to be opaque. Should the spec require that either both or neither of the rule name and the rule id be hierarchical, and with the same number of components? So that, in your example, we might have rule.name.text be DoNotUseBrokenCryptopgraphyAlgorithms/Md5, and rule.id be CA5351/1?

  2. Is it valid for result.ruleId to refer to only a left subset of the rule id? For example, if you have a set of rules parent/child1, parent/child2, can result.ruleId ever be just "parent"? It wouldn't make sense for your example, but it might for other examples.

  3. Does there have to be a separate rule object for each sub-rule?

{
  "results": [
    {
      "ruleId": "CA5351/1",
      "ruleIndex": 0
      "message": {
        "messageId": "default"
      }
    },
    {
      "ruleId": "CA5351/2",
      "ruleIndex": 1,
      "message": {
        "messageId": "default"
      }
    }
  ],
  "resources": {
    "rules": [
      {
        "ruleId":  "CA5351/1",
        "messageStrings": {
          "default": "Don't use MD5. It's busted."
        }
      },
      {
        "ruleId":  "CA5351/2",
        "messageStrings": {
          "default": "Don't use SHA1. It's busted."
        }
      }
    ]
  }
}

Or is it ok to do this:

{
  "results": [
    {
      "ruleId": "CA5351/1",
      "ruleIndex": 0,
      "message": {
        "messageId": "default",
        "arguments": [
          "MD5"
        ]
      }
    {
      "ruleId": "CA5351/2",
      "ruleIndex": 0,
      "message": {
        "messageId": "default",
        "arguments": [
          "SHA1"
        ]
      }
    }
  ],
  "resources": {
    "rules": [
      {
        "ruleId":  "CA5351",
        "messageStrings": {
          "default": "Don't use {0}. It's busted."
        }
      }
    ]
  }
}

The latter has the disadvantage that the producer needs to make sure that the message arguments are consistent with the specified sub-rule.

@ghost
Copy link

ghost commented Nov 27, 2018

Following up on point #3 above, some rule metadata might be common to all the sub-rules. Is it ok to do the following, which would require the rule object to have a parentIndex property?

{
  "results": [
    {
      "ruleId": "CA5351/1",
      "ruleIndex": 1
      "message": {
        "messageId": "default"
      }
    },
    {
      "ruleId": "CA5351/2",
      "ruleIndex": 2,
      "message": {
        "messageId": "default"
      }
    }
  ],
  "resources": {
    "rules": [
      {
        "ruleId": "CA5351",
        "shortDescription": {
          "text": "Don't use known busted crypto."
        },
        "fullDescription": {
          "text": Some crypto algorithms have been broken: SHA1, MD5, ... "
        }
      },
      {
        "ruleId":  "CA5351/1",
        "parentIndex": 0,
        "messageStrings": {
          "default": "Don't use MD5. It's busted."
        }
      },
      {
        "ruleId":  "CA5351/2",
        "parentIndex": 0,
        "messageStrings": {
          "default": "Don't use SHA1. It's busted."
        }
      }
    ]
  }
}

@ghost
Copy link

ghost commented Nov 27, 2018

My opinions are:

  1. Yes, require either or both of rule name and rule id to be hierarchical, with the same number of components.

  2. No. Always require the fully qualified rule id.

  3. Yes, require a separate rule object for every sub-rule. No, don't allow common metadata factoring.

The advantage of my answers to #2 and #3 is that it's the simplest to implement, and if we change our mind later, it's a non-breaking change.

@michaelcfanning
Copy link
Contributor Author

michaelcfanning commented Nov 27, 2018

FxCop.DoNotUseBrokenCryptopgraphyAlgorithms.Md5 is what FxCop refers to as a readable id. That is, it is intended to function as an opaque rule id does but also encode some readable information on the function of the rule. You're right, of course, rule.name is the SARIF place where this data would be persisted.

Let me clarify the request, which is for the result only to provide the ability to further qualify a result so that a more precise suppression can be applied. The request isn't intended to allow further granularity to be provided when populating the rules table.

IOW, you have proposed a new feature request. :) If you think this is valuable, create a tracking issue.

Here's the precise scenario this issue addresses:

  1. There is a single rule 'DoNotUseBrokenCrypto', with corresponding rule metadata.
  2. An analysis target, e.g., a method, might provoke two occurrences of this issue, by utilizing both SHA1 and MD5 in the same function. Both these algorithms are forbidden by the DoNotUseBrokenCrypto rule. Let us use CS1001, arbitrarily, as the ruleId
  3. Each result could qualify its ruleId by providing a token that groups each result by the algorithm in question, by populating result.ruleId with CS1001/MD5 and CS1001/SHA1.
  4. When performing rule metadata look-up, the consumer will consult the prefix only to retrieve the rule id.
  5. An in-source suppression mechanism would allow users to selective suppress one sub-group or the other (or both sets) depending on how it is constructed. IOW, a user could specifically suppress (permit) all uses of MD5 (because they don't have a cryptographic purpose) while not suppressing the SHA1 usage.

See the SuppressMessageAttribute.MessageId property for an explanation of the scenario. https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.codeanalysis.suppressmessageattribute.messageid?view=netframework-4.7.2

This doc makes it clearer than I have that this sub-id is a qualifier on the message itself. Imagine a second scenario 'DoNotUseDangerousApi', which flags call sites to both strcpy and memcpy. In this case, you could use the API name as a way to create more suppressions precision, DoNotUserDangerousApi/memcpy and DoNotUseDangerousApi/strlen.

As always, a suppression can also be viewed as a configuration. i.e., you could imagine configuring a rule not to fire for DoNotUseDangerousApi/memcpy.

@michaelcfanning
Copy link
Contributor Author

btw - if all the above isn't very clear or if our lack of clarity has opened new design fronts you'd like to pursue, we can table this issue for tomorrow's TC.

@ghost
Copy link

ghost commented Nov 27, 2018

I get it. So in Visual Studio, if you right-clicked a result whose ruleId was CA1001/MD5, it would produce -- or at least, you are hoping it will produce -- a suppression like this:

[SuppressMessage("Microsoft.Security", "CA1001:DoNotUseBrokenCrypto", MessageId = "MD5")]

I'm curious how the Error List would know to populate the Category property ("Microsoft.Security"), the readable portion of the CheckId property ("DoNotUseBrokenCrypto"), and the MessageId property ("MD5"). Do the first two of those come from a ruleset file? Where does the last one come from? Can the SARIF extension intercept the "Suppress" gesture and populate all of this from the rule metadata? And if so, does that mean that our rule object needs a category property?

@ghost ghost changed the title Make ruleId a hierarchical string to accommodate "sub-rules" Make result.ruleId a hierarchical string to accommodate "sub-rules" Dec 10, 2018
@ghost
Copy link

ghost commented Jan 28, 2019

@michaelcfanning I'm going to start writing this, but please do take a look at my most recent comment, which asks how the"selective in-source suppression mechanism" that you describe in a previous comment would work in VS.

@ghost
Copy link

ghost commented Feb 21, 2019

Changes merged to provisional draft.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. design-approved The TC approved the design and I can write the change draft design-improvement impact-breaks-consumers resolved-fixed triage-approved
Projects
None yet
Development

No branches or pull requests

1 participant