Support more output formats #10816

Timmmm · 2021-07-13T13:41:35Z

Feature

Mypy should support JSON output to stdout. Pyright has this already via --outputjson so it might be a decent idea to copy their schema as much as possible.

Pitch

Programmatic access to the errors is needed for:

CI integration
IDE integration

Parsing the current output is a bad idea because:

It's error prone and difficult to get right - especially for things like multi-line errors, filenames containing spaces or colons and so on.
It's easy to accidentally break it. The current output does not like it is intended to be machine readable and is therefore likely to be accidentally changed in subtle ways, breaking parsers.
It's way more effort to actually implement a Mypy error parser than a JSON parser.
It can only include a small amount of information. For example column numbers and error codes are not shown.

For example consider the hacky regex used here for Mypy vs the simple JSON parsing used here for Pyright.

The text was updated successfully, but these errors were encountered:

intgr · 2021-10-15T12:18:27Z

Turns out there is a standard for this sort of thing: Static Analysis Results Interchange Format (SARIF)

It seems preferable to adapt SARIF instead of another tool-specific format.

Timmmm · 2021-10-15T16:30:58Z

That looks like one of those extremely long specifications that supports everything but is so complicated nobody can be bothered to support it. Look at Pyright's specification - it's like 3 paragraphs.

But it does have a VSCode extension that looks decent so it might be worth the effort.

tusharsadhwani · 2021-10-20T18:30:55Z

All of the current mypy integrations parse the cli output already, which only has a very small amount of information:

message type (error/hint)
file name
line/column number
error message
(optional) error code

But even just having this in a JSON format would already make mypy much more compatible with IDEs and other tools. If that sounds like a good idea, I'd be willing to make that work this weekend.

tusharsadhwani · 2021-10-20T18:36:27Z

I like Pylint's approach for this:

We can keep the default JSON output pretty simple, but add a custom Reporter interface class, which will have access to all metadata that mypy can provide in the future, and the end-user can extend it, and figure out exactly the data they need, the shape in which they want it formatted, etc.

Ref:

nvuillam · 2021-12-08T17:41:41Z

SARIF format would help to improve integration of mypy within MegaLinter, it would be great :)

Hnasar · 2022-04-01T17:23:27Z

CodeClimate, which is used in GitLab, also needs JSON output: https://docs.gitlab.com/ee/user/project/merge_requests/code_quality.html#implementing-a-custom-tool

Here's an old codeclimate plugin to parse and output JSON, for example: https://github.com/mercos/codeclimate-pylint

tsuyoshicho · 2022-04-29T23:32:56Z

FYI:
Relation File Format information.
reviewdog are diagnostics parsing tool, and that support json file.
https://github.com/reviewdog/reviewdog/tree/master/proto/rdf#rdjson

action-pyright using converted rdjson from original json.

97littleleaf11 · 2022-05-01T04:24:06Z

It's a quite useful feature and can benefit the community! however mypy doesn't have full-time maintainers currently and we can't guarantee a clear plan for this. Anyone is welcome to contribute if interested!

damienallen · 2022-05-27T15:30:58Z

@97littleleaf11 How would one go about such a task?

Is a plugin the preferred approach? Any chance you know anything that could serve as some reference on handling of reports?

pranavrajpal · 2022-05-29T02:52:15Z

#11396 already seems like a reasonable implementation of this. I think at this point it just needs a maintainer to review it.

ssbarnea · 2022-05-29T07:07:02Z

A maintainer of another linter (ansible-lint), I faced two json formats being added, one was the codeclimate one and, more recently, SARIF. Sarif is more complex but it was designed for exchanging information between linters, so I would recommend its use instead of a custom one.

Timmmm · 2022-05-29T09:12:19Z

SARIF was mentioned before but have you seen the spec? I would say just go with the current implementation which is small and simple, and if someone wants to spend a week implementing --format=sarif later that's probably fine.

tusharsadhwani · 2022-05-29T09:13:20Z

Adding both into this PR is incredibly easy actually :)
At least a sarif output with the same information as the current JSON impl.

tusharsadhwani · 2022-05-29T09:15:09Z

@sobolevn @hauntsaninja if any decisions can be taken on this they'll probably have to be by the core team.

Timmmm · 2022-05-29T09:26:08Z

Can you give an example of what it would look like? I.e. instead of this

{"file": "mytest.py", "line": 2, "column": 4, "severity": "error", "message": "Incompatible types in assignment (expression has type \"str\", variable has type \"int\")", "code": "assignment"}
{"file": "mytest.py", "line": 3, "column": 4, "severity": "error", "message": "Name \"z\" is not defined", "code": "name-defined"}

What would the SARIF version be?

tusharsadhwani · 2022-05-29T09:38:32Z

@Timmmm taken from @intgr's reply on my PR:

Click to reveal

{
  "version": "2.1.0",
  "$schema": "https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0.json",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "mypy",
          "version": "0.910"
        }
      },
      "results": [
        {
          "ruleId": "assignment",
          "level": "error",
          "message": {
            "text": "Incompatible types in assignment (expression has type \"str\", variable has type \"int\")"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "mytest.py"
                },
                "region": {
                  "startLine": 2,
                  "startColumn": 4
                }
              }
            }
          ]
        },
        {
          "ruleId": "name-defined",
          "level": "error",
          "message": {
            "text": "Name \"z\" is not defined"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "mytest.py"
                },
                "region": {
                  "startLine": 3,
                  "startColumn": 4
                }
              }
            }
          ]
        }
      ]
    }
  ]
}

Timmmm · 2022-05-29T09:56:45Z

Well it's not totally awful but I know which I would rather parse! I feel like having both options would be good, so the people that don't need SARIF don't have to deal with all of that extra complexity.

Anyway sorry I should have read all the PR comments first. Looks like that is already the consensus there. 👍

nvuillam · 2022-05-29T10:48:33Z

Sarif is not for humans but for machine ^^
For example it would allow to easily display mypy errors directly on the PR files on github UI thanks to a simple github action :)

https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/uploading-a-sarif-file-to-github#uploading-a-code-scanning-analysis-with-github-actions

Timmmm · 2022-05-29T12:38:23Z

Yes we know. But for systems that don't already support SARIF it's nice to have a format that is simple to parse.

ssbarnea · 2022-05-29T14:37:59Z

I had to read some of Sarif spec and it was not a pleasure, reminded me of XML and design committees. Still, after a while I realised that you do not have to support all that crap and get something relatively simple.

We do not need both from start.

ErezAmihud · 2023-02-06T23:03:00Z

I took a stub at this issue, the main issue with adding the capability is the base that mypy provides. \

In one hand you have AbstractReporter which is how the html reports are created. This is good for basic coverage, but it does not include the errors data, and will require running the mypy validator on all file again. \
On the other hand there is the junit-xml option that looks like it creates a good xml. The problem is that it only looks like it. It actually does not contain a lot of data - and the data is does contains comes from parsing the strings of the errors messages (as seen here) which is an unreliable method.

Anyway, there are people who created json reports based on formatting the string output, such as:

Therefor If we want a more reliable method, which will allow us to add more report schemas easily, this issue will be composed of 2 parts:

Allowing the reporters to get more information on errors and the validation in general - in the best implementation stdout would be a reporter, so that everything that the program is able to print to the user other reporters will be able to display. This will be done by editing the main build function to receive a reporter instead of stdout and stderr.
Creating a custom reporter for the desired data type. This would be less of a problem once we have the infrastructure mentioned above.

@ssbarnea I would appreciate your input if from your experience with serif and linters, is output string parsing a reliable enough method for comforting to a schema like sherif?

Edit:
Note that there are other type checkers that do have output in sarif format. For example I saw pyre.

ssbarnea · 2023-02-07T11:33:31Z

@ErezAmihud I think it would be better to avoid parsing the output in order to produce SARIF, better to do it directly from internal data captured by mypy. Still, i do not know mypy internals... so based on that it might not be so easy to do it.

Parsing stdout is not very reliable and could easily break.

tusharsadhwani · 2023-02-07T13:18:24Z

@ssbarnea my PR #11396 does this properly, it can help you with what you want to accomplish. It hasn't gotten any traction yet though.

### Description Resolves #10816 The changes this PR makes are relatively small. It currently: - Adds an `--output` option to mypy CLI - Adds a `ErrorFormatter` abstract base class, which can be subclassed to create new output formats - Adds a `MypyError` class that represents the external format of a mypy error. - Adds a check for `--output` being `'json'`, in which case the `JSONFormatter` is used to produce the reported output. #### Demo: ```console $ mypy mytest.py mytest.py:2: error: Incompatible types in assignment (expression has type "str", variable has type "int") mytest.py:3: error: Name "z" is not defined Found 2 errors in 1 file (checked 1 source file) $ mypy mytest.py --output=json {"file": "mytest.py", "line": 2, "column": 4, "severity": "error", "message": "Incompatible types in assignment (expression has type \"str\", variable has type \"int\")", "code": "assignment"} {"file": "mytest.py", "line": 3, "column": 4, "severity": "error", "message": "Name \"z\" is not defined", "code": "name-defined"} ``` --- A few notes regarding the changes: - I chose to re-use the intermediate `ErrorTuple`s created during error reporting, instead of using the more general `ErrorInfo` class, because a lot of machinery already exists in mypy for sorting and removing duplicate error reports, which produces `ErrorTuple`s at the end. The error sorting and duplicate removal logic could perhaps be separated out from the rest of the code, to be able to use `ErrorInfo` objects more freely. - `ErrorFormatter` doesn't really need to be an abstract class, but I think it would be better this way. If there's a different method that would be preferred, I'd be happy to know. - The `--output` CLI option is, most probably, not added in the correct place. Any help in how to do it properly would be appreciated, the mypy option parsing code seems very complex. - The ability to add custom output formats can be simply added by subclassing the `ErrorFormatter` class inside a mypy plugin, and adding a `name` field to the formatters. The mypy runtime can then check through the `__subclasses__` of the formatter and determine if such a formatter is present. The "checking for the `name` field" part of this code might be appropriate to add within this PR itself, instead of hard-coding `JSONFormatter`. Does that sound like a good idea? --------- Co-authored-by: Tushar Sadhwani <86737547+tushar-deepsource@users.noreply.github.com> Co-authored-by: Tushar Sadhwani <tushar@deepsource.io>

Timmmm added the feature label Jul 13, 2021

Timmmm mentioned this issue Jul 13, 2021

Add option to format the output messages #3547

Closed

erictraut mentioned this issue Aug 10, 2021

Exact error codes for mypy errors #10843

Open

Funk66 mentioned this issue Aug 29, 2021

Add support for mypy jose-elias-alvarez/null-ls.nvim#139

Closed

1 task

tusharsadhwani mentioned this issue Oct 27, 2021

Add Error format support, and JSON output option #11396

Merged

tsuyoshicho mentioned this issue Apr 7, 2022

Multiple reviews generated for multiline mypy error tsuyoshicho/action-mypy#38

Open

abelcheung added a commit to abelcheung/types-lxml that referenced this issue Apr 7, 2022

Disable any mypy review until improvement in python/mypy#10816

8a7dfce

97littleleaf11 added the topic-usability label May 1, 2022

AlexWaygood mentioned this issue Jun 24, 2022

dump diagnostic info to json file #13017

Closed

erictraut mentioned this issue Oct 12, 2022

add an output format option (e.g. csv and json) to the command line #13874

Closed

ErezAmihud mentioned this issue Feb 7, 2023

Produce reports containing file/line information of bad formatting psf/black#1516

Open

abelcheung mentioned this issue Mar 13, 2023

Feature proposal: Allow specifying out as list typeddjango/pytest-mypy-plugins#96

Open

ethanhs mentioned this issue Apr 1, 2023

support CI-friendly formatted output #14962

Closed

ethanhs changed the title ~~JSON output format~~ Support more output formats Apr 1, 2023

Josverl mentioned this issue Nov 6, 2023

Add snippet tests using mypy Josverl/micropython-stubber#445

Closed

JelleZijlstra closed this as completed in #11396 May 10, 2024

tsuyoshicho mentioned this issue May 11, 2024

feat: JSON Output support tsuyoshicho/action-mypy#142

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support more output formats #10816

Support more output formats #10816

Timmmm commented Jul 13, 2021

intgr commented Oct 15, 2021

Timmmm commented Oct 15, 2021

tusharsadhwani commented Oct 20, 2021

tusharsadhwani commented Oct 20, 2021 •

edited

nvuillam commented Dec 8, 2021

Hnasar commented Apr 1, 2022

tsuyoshicho commented Apr 29, 2022 •

edited

97littleleaf11 commented May 1, 2022

damienallen commented May 27, 2022

pranavrajpal commented May 29, 2022

ssbarnea commented May 29, 2022

Timmmm commented May 29, 2022

tusharsadhwani commented May 29, 2022 •

edited

tusharsadhwani commented May 29, 2022

Timmmm commented May 29, 2022

tusharsadhwani commented May 29, 2022 •

edited

Timmmm commented May 29, 2022

nvuillam commented May 29, 2022 •

edited

Timmmm commented May 29, 2022

ssbarnea commented May 29, 2022

ErezAmihud commented Feb 6, 2023 •

edited

ssbarnea commented Feb 7, 2023

tusharsadhwani commented Feb 7, 2023

Support more output formats #10816

Support more output formats #10816

Comments

Timmmm commented Jul 13, 2021

intgr commented Oct 15, 2021

Timmmm commented Oct 15, 2021

tusharsadhwani commented Oct 20, 2021

tusharsadhwani commented Oct 20, 2021 • edited

nvuillam commented Dec 8, 2021

Hnasar commented Apr 1, 2022

tsuyoshicho commented Apr 29, 2022 • edited

97littleleaf11 commented May 1, 2022

damienallen commented May 27, 2022

pranavrajpal commented May 29, 2022

ssbarnea commented May 29, 2022

Timmmm commented May 29, 2022

tusharsadhwani commented May 29, 2022 • edited

tusharsadhwani commented May 29, 2022

Timmmm commented May 29, 2022

tusharsadhwani commented May 29, 2022 • edited

Timmmm commented May 29, 2022

nvuillam commented May 29, 2022 • edited

Timmmm commented May 29, 2022

ssbarnea commented May 29, 2022

ErezAmihud commented Feb 6, 2023 • edited

ssbarnea commented Feb 7, 2023

tusharsadhwani commented Feb 7, 2023

tusharsadhwani commented Oct 20, 2021 •

edited

tsuyoshicho commented Apr 29, 2022 •

edited

tusharsadhwani commented May 29, 2022 •

edited

tusharsadhwani commented May 29, 2022 •

edited

nvuillam commented May 29, 2022 •

edited

ErezAmihud commented Feb 6, 2023 •

edited