New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Error format support, and JSON output option #11396
base: master
Are you sure you want to change the base?
Conversation
mypy/error_formatter.py
Outdated
return json.dumps({ | ||
'file': file, | ||
'line': line, | ||
'column': column, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line
, column
seems short-sighed.
Maybe startLine
, startColumn
, which would leave room to later add endLine
, endColumn
. This information is useful for IDEs to know how what span to highlight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point.
But mypy currently only does line and column, and it might be very long before spans are added in. It could be argued that the change to startLine
and startColumn
can be made when the feature exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the field names later will break tools though. And startLine
, startColumn
would already be accurate right now, because mypy currently points out the start of the span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave this convention for the separate --output=sarif
format.
I'd like to make the case again for the existing JSON-based standard: Static Analysis Results Interchange Format (SARIF) While that does not preclude support for a custom JSON format as well, perhaps if mypy were to support SARIF, there would be no need for a custom format? Pros of SARIF:
Cons:
[1] The plugin ratings are 3.4 and 2.9 out of 5 😖 https://plugins.jetbrains.com/search?search=mypy Example minimal mypy SARIF output{
"version": "2.1.0",
"$schema": "https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0.json",
"runs": [
{
"tool": {
"driver": {
"name": "mypy",
"version": "0.910"
}
},
"results": [
{
"ruleId": "assignment",
"level": "error",
"message": {
"text": "Incompatible types in assignment (expression has type \"str\", variable has type \"int\")"
},
"locations": [
{
"physicalLocation": {
"artifactLocation": {
"uri": "mytest.py"
},
"region": {
"startLine": 2,
"startColumn": 4
}
}
}
]
},
{
"ruleId": "name-defined",
"level": "error",
"message": {
"text": "Name \"z\" is not defined"
},
"locations": [
{
"physicalLocation": {
"artifactLocation": {
"uri": "mytest.py"
},
"region": {
"startLine": 3,
"startColumn": 4
}
}
}
]
}
]
}
]
} |
+1 for SARIF format, let's use a common format instead of a mypy custom one ! :) |
I'm willing to turn it into SARIF (basing it on the json snippet provided by @intgr), if that's what I need to get this into mypy 😄 @ethanhs @TH3CHARLie what do you think? |
After sitting on it a little, I feel that there's room for more than one machine-readable format. The simpler json-line-based format is probably better for simple tools that only care about mypy. I'm willing to implement the SARIF part myself. But I shouldn't be the one to decide that, I'm just an occasional lurker here. No mypy developers have chipped in yet. |
I'm just a tourist here, but i'm currently activating SARIF output for all linters of MegaLinter, and having native SARIF output is a great benefit for linters ^^ ( + the Github CodeQL that natively understands SARIF format ^^ ) Some other python linters already have SARIF output, like bandit , maybe there is some code to reuse to manage the format ? |
@nvuillam this PR introduces an ErrorFormatter class. By the time this PR is finalized, even if it doesn't use SARIF you will be able to define your own ErrorFormatter class in a plugin probably, and tell it to output the SARIF format, it'll be really easy. Pylint has the same setup. |
@tusharsadhwani thank but... I don't have the bandwidth to implement SARIF output on all linters that do not manage it yet 😅 |
I think one property that machine-readable formats should have is: if one error causes multiple lines of output, then that should appear as one result item rather than multiple. So for example with
should maybe be output as {
"file": "_local/multi_error.py",
"line": 5,
"column": 8,
"severity": "error",
"context": "In function \"foo\":",
"message": "No overload variant of \"get\" of \"Mapping\" matches argument type \"str\"",
"hint": "Possible overload variants:\n def get(self, key: Any) -> Any\n def [_T] get(self, key: Any, default: Union[Any, _T]) -> Union[Any, _T]",
"code": "call-overload"
} But again, maybe that shouldn't be a blocker for this PR, getting machine-readable output can be useful even when findings aren't accurately grouped. The negative line/column numbers right now seem awkward though: {"file": "_local/multi_error.py", "line": -1, "column": -1, "severity": "note", "message": "In function \"foo\":", "code": null} |
That's definitely a bug. The mypy code that generates this output was ridiculously coupled. I'll take a look at if this can be fixed. |
It has been many months as the project has hoped for a best solution over an existing solution. Can we merge this and accept future improvement using |
@intgr I did it properly this time. Hints should be fixed now. |
|
@sehyun-hwang sure thing. If you could provide a reproducible example I can look into it right away. |
@tusharsadhwani Thank you! I tried linting jobs in a container, and was unable to reproduce it. The problem occurs only when my IDE invokes dmypy, so I'm suspecting this has to do with concurrent execution. Do you see a chance where |
I found that the first file linted output a json format, but from the second one it goes back to the string format. Can you try to reproduce this?
|
@sehyun-hwang This seems like a dmypy bug. It seems that dmypy ignores cli flags or config file flags in certain cases. Here's a file structure: $ tree .
.
├── a.py
├── b.py
└── mypy.ini And [mypy]
strict = True
def foo(): pass This way |
@intgr Could you take a look at the comment by @tusharsadhwani ? |
I'm afraid I can't help much with this. Just to clarify, I think my name shows up in the "reviewers" list only because I left a code comment about this PR earlier. I'm not a maintainer or reviewer in an official capacity and I don't have much knowledge of mypy's internals. I'm just a user interested in consuming structured output from mypy (and potentially adding SARIF support later on). That's why I shared my opinions in this comment thread. |
According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉 |
According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉 |
@sobolevn PR is ready for review! I've added tests, and taken care of code review comments by Ivan. |
Diff from mypy_primer, showing the effect of this PR on open source code: pyinstrument (https://github.com/joerick/pyinstrument) got 1.58x faster (14.1s -> 8.9s)
|
mypy/errors.py
Outdated
|
||
error_info = self.error_info_map[path] | ||
if formatter is not None: | ||
error_info = [info for info in error_info if not info.hidden] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not duplicate this and instead pass error_tuples
into format_messages
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly okay with this. The one thing I'm a little concerned about is the additional layer of translation we're doing to this MypyError
class.
Why can't we just emit the fields of the ErrorTuple basically as is? This keeps the mapping from normal mypy output to JSON mypy output trivial. Users interested in this can munge however best, e.g. I can imagine folks being interested in the output of reveal_type
which is of severity "note" that this PR currently drops. Keeping additional translation here minimal reduces maintenance risk.
@hauntsaninja my concern here was whether the If we plan on, in the future, let people pass their own I've myself seen Even if we don't end up doing custom formatters, I think the decoupling is worth it for easier definition of new output formats. |
Hmm, adding fields shouldn't be a break. I'm okay with doing a minor refactoring, e.g. at the least we should make it a NamedTuple. Overall, I think easier to maintain ErrorTuple than an additional translation layer. And if we do think we'd break ErrorTuple and we think that breaking it isn't acceptable, we could always add the decoupling from internals then. |
|
@hauntsaninja I checked again, and now I remember why I did the You can have 3 Due to this, there's no direct way to turn So I don't think we can get rid of the translation from tuples to a different object. |
Diff from mypy_primer, showing the effect of this PR on open source code: dacite (https://github.com/konradhalas/dacite) got 3.06x faster (8.3s -> 2.7s)
|
Is there anything still blocking this PR from being merged? |
@AngellusMortis #15273 is all |
@hauntsaninja, is anything still blocking this PR still or it is good to go? |
New year, new changes, new merge? |
Description
Resolves #10816
The changes this PR makes are relatively small.
It currently:
--output
option to mypy CLIErrorFormatter
abstract base class, which can be subclassed to create new output formatsMypyError
class that represents the external format of a mypy error.--output
being'json'
, in which case theJSONFormatter
is used to produce the reported output.Demo:
A few notes regarding the changes:
ErrorTuple
s created during error reporting, instead of using the more generalErrorInfo
class, because a lot of machinery already exists in mypy for sorting and removing duplicate error reports, which producesErrorTuple
s at the end. The error sorting and duplicate removal logic could perhaps be separated out from the rest of the code, to be able to useErrorInfo
objects more freely.ErrorFormatter
doesn't really need to be an abstract class, but I think it would be better this way. If there's a different method that would be preferred, I'd be happy to know.--output
CLI option is, most probably, not added in the correct place. Any help in how to do it properly would be appreciated, the mypy option parsing code seems very complex.ErrorFormatter
class inside a mypy plugin, and adding aname
field to the formatters. The mypy runtime can then check through the__subclasses__
of the formatter and determine if such a formatter is present.The "checking for the
name
field" part of this code might be appropriate to add within this PR itself, instead of hard-codingJSONFormatter
. Does that sound like a good idea?