Provide optional result.rank value of 0.0 to 100.0 #280
Your popularity with this audience is proof against reraising this issue. :) I am personally open to revisiting this issue in any case. There are a handful of limits that we drew in our earlier design/discussion that were helpful to focus efforts on what was most actionable. In some cases, these decisions were skewed by problems we perceived in unifying the outputs of existing tools. This is one of those topics. At this point, however, having (presumably) landed a substantive body of relevant data that covers a broad range of open source, commercial, and private industry tooling, it's worth discussing the best design for direct production of useful concepts, understanding that our decisions will work for some but not all.

I like your proposal. I'd suggest that we call this property 'ranking'. The term 'priority' may be useful. The only rule is that a value for one result which is higher than the value for another means that the former result should be prioritized over the latter. There isn't much difficulty in this sorting mechanism for the output of a single tool (absent radical changes in prioritization between tool versions). In order to handle interleaved sorting for multiple tools, we could recommend that tools distribute their ranking values in some common way. Simply dividing the range into rough quartiles (not very important, possibly important, important, and very important) could be useful. Alternately, we could be very clear that this relative comparison is only relevant to results produced by the same tool. That approach would provide more utility, I think, allowing direct use of things like ML model computations, where the bucketing suggested above might be an arbitrary or impossible-to-realize process.

This topic relates to other open questions around how to categorize results. We really should consider all of these at once as we close on them. I will propose to Larry that we apply 'results-categorization' as a label so we can track them all.
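The interleaved-sorting and quartile-bucketing ideas above can be sketched in a few lines. This is a hypothetical illustration, not real SARIF output or a defined API: the `rank` property name and the 0.0-100.0 range follow the proposal in this thread, and the result dicts and bucket labels are invented for the example.

```python
# Hypothetical sketch: merge results from several tools by a common
# 0.0-100.0 "rank" value, then bucket into the rough quartiles
# suggested above. All names here are illustrative, not SARIF-defined.

QUARTILES = ["not very important", "possibly important",
             "important", "very important"]

def quartile(rank):
    """Map a 0.0-100.0 rank onto the rough quartile buckets."""
    return QUARTILES[min(int(rank // 25), 3)]

def interleave(runs):
    """Merge results from several tools' runs, highest rank first."""
    merged = [r for run in runs for r in run["results"]]
    return sorted(merged, key=lambda r: r["rank"], reverse=True)

runs = [
    {"tool": "toolA", "results": [{"ruleId": "A1", "rank": 87.5},
                                  {"ruleId": "A2", "rank": 12.0}]},
    {"tool": "toolB", "results": [{"ruleId": "B1", "rank": 55.0}]},
]

for result in interleave(runs):
    print(result["ruleId"], quartile(result["rank"]))
# → A1 very important / B1 important / A2 not very important
```

Note that this only works across tools if they distribute rank values comparably; under the alternative reading (ranks comparable only within one tool), the `interleave` step would be invalid.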
@lgolding, your thoughts? I'd like to propose that we develop a change draft for this request. I think we probably should have a float here, as rounding to an integer from 0 - 100 might not provide sufficient precision, particularly for model-generated rankings. So let's start with either a float from 0.0F to 1.0F or from 0.0F to 100.0F. Note that this request clearly falls outside the ambiguity around the degree to which we get inside the results management domain: producing a prioritization/ranking value is something that we expect tool producers to do.
I am open to this idea but first I'd like to clear up a point of confusion that keeps creeping into this discussion. @michaelcfanning wrote:
"Priority" is not the same as "level" or "severity". "Severity" means "how bad is the bug". If a problem is too severe, something bad happens. Your code won't compile, or your device driver won't get the Windows logo, or you won't be able to sell your product in the EU. An analysis tool can assign a value for "severity". "Priority" means "how important is it to fix the bug". Sure, the code won't compile, but we're about to delete the code. Sure, the device driver won't get the Windows logo, but we're not submitting for logo certification until next year. Sure, we can't sell in the EU, but this product won't ship to the EU. A tool cannot assign a value for "priority" -- but a human being, or a tool that is aware of the engineering team's process and context, might. I made this point repeatedly in #20. If you think I'm wrong, please correct me; otherwise, let's leave "priority" out of this discussion. Please note that #20 was about whether we should add
Ok, but this issue (#280) is about severity and how to represent it. So here are my thoughts:
My apologies for not making this clearer. I reject the notion of documenting severity in favor of approving a ranking or prioritization. The problem with severity in a static analysis context is that the presumed risk of an issue (and a ranking for it) cannot easily be separated from the certainty that we've identified an issue. So, a presumed persistent XSS issue would be quite severe indeed, but the reality may be that our detection technique is very rough and likely to produce false positives. A simple prioritization or ranking property avoids making strict assertions about defects while still allowing tools to apply their own internal ranking mechanisms. Several tools have these. This kind of ranking can also be applied post facto by producing ML models that integrate other things (like reliability data) into the computation.

I think this is clear, but I don't find much in your previous reply that's relevant to my specific suggestion. :) My position is that our existing result levels reflect process designations (these errors must be fixed, these errors should be fixed, you may be interested to know about these other errors that you will never fix). There is a coarse relationship to severity, but in practice many issues that could, in fact, be severe if a true positive are dropped down for other reasons. We do have an open issue around result level where it will be useful to recall all your feedback. I've updated the title of this issue to reflect that we're discussing ranking/prioritization.
I would love to have a numeric priority/ranking property, as I made clear in #20. But you are saying you don't want SARIF to represent "severity" at all. As you wrote:
(my emphasis). Rather, you apparently see our existing "severity-centric" level property as serving that role. I have multiple concerns with that:
Sorry again for the confusion. My statement was about this specific issue. In this specific issue, I am discussing prioritization, not severity. I am glad to discuss severity, but would like to keep it distinct from the discussion around ranking/priority. And I don't see level as a surrogate for ranking. #215 is the open issue around result.level, which provides an open door to revisiting this property. Let's please focus this issue on the proposal for prioritizing/ranking issues using a percentile value from 0.0 to 1.0. If you agree on this, great, let's produce a change for that. Glad to discuss severity otherwise in another issue.
I just talked with @michaelcfanning. We just had a misunderstanding. When Michael wrote:
... he did not mean to suggest that we shouldn't represent "severity" in SARIF. He meant that he didn't want to discuss severity in this issue! He wants to limit the discussion in this issue to priority/ranking. phew

So, bottom line: I support the addition of the proposed property. I prefer "priority" to "ranking" because "ranking" doesn't state the criterion according to which the result is being ranked. I prefer 100.0 to 1.0 so that most of the time an integer persisted value will suffice.
I think we all agree (you, Paul and I) that a priority property with a value from 0.0 to 100.0 makes sense. How about producing a change draft? It is interesting to wonder whether it would be helpful to allow users to specify named pairs here (to allow decorating results with multiple rankings). A tool might emit its own, in which case it could simply use its tool name as the priority label (or perhaps the absence of a priority 'name' would indicate it was emitted directly by the tool). Could be a dictionary. Could be an ordered array of unique members, as we wouldn't expect this set to be numerically intimidating.
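The named-pairs idea above might look something like the following. This is purely a sketch of the design being floated, not anything in the SARIF schema: the `rankings` property name, the empty-string convention for the tool's own ranking, and the `mlTriageModel` label are all hypothetical.

```python
# Hypothetical sketch of "multiple named rankings" on a result.
# The convention assumed here: an empty-string key means the ranking
# was emitted directly by the tool; other keys name a post-facto ranker.

result = {
    "ruleId": "CA2100",
    "rankings": {
        "": 72.5,               # emitted directly by the tool
        "mlTriageModel": 31.0,  # applied post facto, e.g. by an ML model
    },
}

def tool_rank(result):
    """Return the tool's own ranking, or None if only post-facto rankers ran."""
    return result["rankings"].get("")

print(tool_rank(result))  # → 72.5
```

A dictionary keyed by ranker name keeps lookups simple; the ordered-array alternative mentioned above would instead need a small search over members.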
Let's do the simple thing (a
This is where 'ranking' more accurately reflects tooling. 0 should be the lowest value. You are correct that pri0 is the highest MS priority, but the only other possible values are 1, 2, 3 and sometimes 4. I prefer "ranking" as a term because that's what tools are generally doing. The priority of an issue can change for lots of reasons outside of tooling control. This is all semantics, however. I care less about the term than I do that 0.0 should be the lowest ranking/priority and 100.0 should be the highest. Like a test score. Agree this property needs to be a non-integer value. Most of our ranking techniques here have greater precision than afforded by an integer range of 0 - 100.
How about we split the difference. My objection to "ranking" was that it didn't say what was being ranked. How about
I like the idea of using the term "ranking" (or just plain "rank"). I look forward to seeing the change draft.
It is, as usual, a property of the result. Would result.rank feel clearer to you? Thinking it through, 'ranking' feels like the right term to me only because I follow sports in a few specific contexts. :) That also happens to be the term used in the prominent internal MS checkers that provide this data. I think 'result.rank' is perfectly clear, if you agree.
I yield. I still think
- Define ruleConfiguration.defaultRank.
- Note that rank values from different tools might not be commensurable.
I hope this isn't going to make me unpopular, but I really want to revisit this issue.
I've re-read the contents of issue #20 and conclude that the standard would be better if we allowed tools to specify a numeric value for the severity of a rule and a result.
In issue #20 it was argued that the "level" property is intended to represent severity. Fair enough, but that property conflates severity with another concept that I consider different: roughly speaking, the response of the checker.
The problem I'm facing is that most of the tools I work with (not just CodeSonar) specify positive results that indicate a potential problem with the code, and they mostly specify a severity that is more precise than can be expressed with level. In the current spec, if the checker yields a warning, then level can be "note", "warning" or "error". None of the other values for level appear to be useful for this category of tools. That only gives me three values, and this isn't nearly enough resolution. Given that we have already conceded that level can indicate severity, why not allow it to be specified with more precision?
I don't buy the argument that it is equally good to put this information in a property bag. As a Sarif importer, that mandates special code for each Sarif generator, because it is unlikely that all tools will agree on the name of the property and the domain of its values.
Consequently I propose that we have a "severity" property that is an integer from zero to 100 where higher values indicate higher severity. I don't feel strongly about the domain. A float from 0.0 to 1.0 would be just as good.