Stream dmypy output instead of dumping everything at the end #16252

svalentin · 2023-10-11T22:27:07Z

This does 2 things:

1. It changes the IPC code to work with multiple messages.
2. It changes the dmypy client/server communication so that it streams stdout/stderr instead of dumping everything at the end.

For 1, we have to provide a way to separate out different messages. I chose to frame messages as bytes separated by a whitespace character. That means we have to encode each message in a scheme that escapes whitespace. urllib.parse quote/unquote seems reasonable. It encodes more than needed, but the application is not limited by IPC I/O, so it should be fine. With this convention in place, all we have to do is read from the socket stream until we hit a whitespace character.
The framing logic can be easily changed.
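To illustrate the framing idea, here is a minimal sketch, not the code from this PR. The helper names `frame_message` and `pop_frame` are hypothetical, and it assumes a single space as the frame delimiter:

```python
from typing import Optional
from urllib.parse import quote, unquote


def frame_message(message: str) -> bytes:
    """Percent-encode the message and terminate the frame with a space.

    quote() escapes spaces, tabs and newlines, so the payload itself
    never contains the delimiter. (Hypothetical helper.)
    """
    return quote(message).encode("ascii") + b" "


def pop_frame(buffer: bytearray) -> Optional[str]:
    """Remove and return the first complete frame, or None if incomplete."""
    idx = buffer.find(b" ")
    if idx == -1:
        return None  # no delimiter yet: keep reading from the socket
    raw = bytes(buffer[:idx])
    del buffer[: idx + 1]  # drop the frame and its delimiter
    return unquote(raw.decode("ascii"))
```

The reader side only has to accumulate bytes and call `pop_frame` until it returns a message.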

For 2, since we communicate using JSON messages, it's easy to add a "finished" key that tells us it's the final response from dmypy. Anything else is just stdout/stderr output.
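A rough sketch of what the client loop could look like under this convention. The function name `consume_responses` and the `"stdout"`/`"stderr"` key names for intermediate messages are assumptions for illustration; only the `"finished"` key is described above:

```python
import json
from typing import Any, Dict, Iterable, TextIO


def consume_responses(
    raw_messages: Iterable[str], out: TextIO, err: TextIO
) -> Dict[str, Any]:
    """Forward streamed output until the final response arrives.

    Assumes each message is a JSON object; intermediate messages are
    assumed to carry "stdout" and/or "stderr" keys (hypothetical names).
    """
    for raw in raw_messages:
        message = json.loads(raw)
        if "finished" in message:
            return message  # final response from the server
        out.write(message.get("stdout", ""))
        err.write(message.get("stderr", ""))
    raise RuntimeError("connection closed before a final response arrived")
```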

Note: dmypy server also returns out/err, which is the output of the actual mypy type checking. Right now this change does not stream that output. We can stream that in a follow-up change. We just have to decide on how to differentiate the 4 text streams (stdout/stderr/out/err) that will now be interleaved.

Played around with it on Linux quite a bit. Will also test it some more on Windows.

The WriteToConn class could use more love. I just put in the bare minimum to test the rest.

mypy/dmypy_server.py

JukkaL

Thanks, overall looks good! This will make debugging daemon issues easier. Left a few comments (not a full review).

JukkaL · 2023-10-12T17:09:07Z

mypy/ipc.py

@@ -13,6 +13,7 @@
 import tempfile
 from types import TracebackType
 from typing import Callable, Final
+from urllib.parse import quote, unquote


As a very minor optimization, what about using codecs.encode(<bytes_data>, 'hex') and codecs.decode(<encoded_data>, 'hex') instead? We wouldn't need to depend on urllib.parse, which is a Python package so importing it may take some time, and it would probably be a bit faster as well.

A similar minor optimization could be to use bytes.hex() and bytes.fromhex().
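For comparison, a small sketch of the two hex variants mentioned above. The wrapper names are hypothetical; the underlying calls are standard library. Both produce only the characters 0-9 and a-f, so the payload cannot contain whitespace:

```python
import codecs


def hex_encode_codecs(data: bytes) -> bytes:
    # codecs.encode with the "hex" codec maps bytes to bytes
    return codecs.encode(data, "hex")


def hex_decode_codecs(encoded: bytes) -> bytes:
    return codecs.decode(encoded, "hex")


def hex_encode_method(data: bytes) -> str:
    # bytes.hex() returns a str rather than bytes
    return data.hex()


def hex_decode_method(encoded: str) -> bytes:
    return bytes.fromhex(encoded)
```

The main practical difference is the return type: the codecs variant stays in bytes, while `bytes.hex()` yields a `str` that would need encoding before being written to a socket.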

Ended up going with codecs.encode(<bytes_data>, 'base64'). I think it's a better encoding for this. It has a wider alphabet, but without space. Can easily change it to hex if you think it's better.

Base64 is a scheme for converting binary data to printable ASCII characters, namely the upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols, with the "=" symbol as a special suffix code.
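A quick sketch of the base64 round trip, with hypothetical helper names. One detail worth keeping in mind: the `base64` codec used through `codecs.encode` inserts newlines into its output (at line breaks and at the end), so the payload is free of spaces but not of all whitespace. This sketch therefore assumes a space, specifically, as the frame delimiter:

```python
import codecs


def encode_payload(message: str) -> bytes:
    """UTF-8 encode the text, then base64 it (hypothetical helper)."""
    return codecs.encode(message.encode("utf8"), "base64")


def decode_payload(payload: bytes) -> str:
    """Inverse of encode_payload; embedded newlines are ignored on decode."""
    return codecs.decode(payload, "base64").decode("utf8")
```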

JukkaL · 2023-10-12T17:11:26Z

mypy/test/testipc.py

+        p.start()
+        connection_name = queue.get()
+        with IPCClient(connection_name, timeout=1) as client:
+            client.write("test1")


Test writing whitespace and non-ascii characters? Maybe test all unicode code points within some range.

Done! Also added a test for multiple consecutive messages to test when a buffer has multiple frames inside it.

Though I didn't test all unicode code points, only some. I would trust that codecs.encode with base64 does the right thing, so we don't have to worry about it.
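The multiple-frames case can be sketched as follows. This is an illustration rather than the actual test or IPC code; `write_frame`, `drain_frames` and `FRAME_END` are hypothetical names, and a space delimiter with base64 payloads is assumed:

```python
import codecs
from typing import List

FRAME_END = b" "  # assumed delimiter; base64 output never contains a space


def write_frame(message: str) -> bytes:
    return codecs.encode(message.encode("utf8"), "base64") + FRAME_END


def drain_frames(buffer: bytearray) -> List[str]:
    """Pop every complete frame; leave any trailing partial frame in place."""
    messages = []
    while True:
        idx = buffer.find(FRAME_END)
        if idx == -1:
            break  # remaining bytes (if any) are an incomplete frame
        raw = bytes(buffer[:idx])
        del buffer[: idx + len(FRAME_END)]
        messages.append(codecs.decode(raw, "base64").decode("utf8"))
    return messages
```

A single read can return several frames plus the start of another, so the reader has to loop over the buffer instead of assuming one frame per read.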

github-actions · 2023-10-13T11:23:28Z

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

svalentin requested a review from JukkaL October 11, 2023 22:27

ilevkivskyi reviewed Oct 11, 2023

JukkaL reviewed Oct 12, 2023

Better tests + fix bug with multiple frames in buffer + base64 encoding

73ecb8a

svalentin force-pushed the stream-output branch from 9081b1e to 73ecb8a Compare October 13, 2023 10:52

svalentin and others added 2 commits October 13, 2023 10:54

Merge branch 'master' into stream-output

dfb9500

[pre-commit.ci] auto fixes from pre-commit.com hooks

4851cdd

for more information, see https://pre-commit.ci

JukkaL approved these changes Oct 16, 2023

svalentin merged commit 2bcec24 into python:master Oct 16, 2023
18 checks passed

svalentin deleted the stream-output branch October 16, 2023 17:37

sshishov mentioned this pull request Dec 19, 2023

MyPy daemon version 1.7.0+ crashes if colorama is installed and reporting (cubertura) is used. #16678

Closed
