Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try compiling parts of mypy with cython and see if it's faster #3408

Closed
JukkaL opened this issue May 22, 2017 · 12 comments
Closed

Try compiling parts of mypy with cython and see if it's faster #3408

JukkaL opened this issue May 22, 2017 · 12 comments

Comments

@JukkaL
Copy link
Collaborator

JukkaL commented May 22, 2017

Maybe we could us Cython to run mypy faster. We probably don't want to use any Cython type annotations, but Cython can speed up straight Python code as well by removing interpretation overhead.

Note that using Cython may make it significantly harder to generate mypy wheels, and cause other potential issues, so we'd likely only want to use Cython if there is a significant performance improvement, and even then it might be unclear if we really want to use Cython.

Here is a potential way to run an experiment (note that I don't have much experience with Cython):

  • Run a profiler to determine mypy modules that mypy spends the most time in when type checking.
  • Use Cython to compile one of the hot spot modules, without any changes.
  • Make sure tests pass.
  • Measure whether self-checking mypy is any faster.
  • Pick another module and compile it, and so on.
    ...
  • Post your results here.
@carljm
Copy link
Member

carljm commented May 22, 2017

I will work on this.

@refi64
Copy link
Contributor

refi64 commented May 23, 2017

I wonder how Nuitka would do...

@carljm
Copy link
Member

carljm commented May 23, 2017

Results from the first quick and dirty test with Cython:

$ python bench.py 
.....................
mypy_cython: Mean +- std dev: 11.9 sec +- 0.7 sec
.....................
mypy: Mean +- std dev: 12.5 sec +- 1.2 sec

I profiled a mypy self-check and found that mypy/subtypes.py and mypy/types.py were the modules within mypy containing the top CPU-using functions:

         24486713 function calls (23243872 primitive calls) in 20.433 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
5495972/5495604    1.094    0.000    1.095    0.000 {built-in method builtins.isinstance}
120587/77871    0.694    0.000    1.937    0.000 subtypes.py:132(visit_instance)
     2285    0.587    0.000    0.587    0.000 {built-in method _ast3._parse}
152660/81662    0.530    0.000    3.321    0.000 subtypes.py:37(is_subtype)
   382195    0.434    0.000    1.019    0.000 typing.py:801(_geqv)
   325022    0.414    0.000    1.519    0.000 typing.py:1122(__new__)
    82996    0.342    0.000    0.537    0.000 types.py:564(__init__)
   238049    0.342    0.000    0.460    0.000 types.py:402(__init__)
   766221    0.325    0.000    0.450    0.000 typing.py:793(_gorg)
108859/26709    0.285    0.000    8.770    0.000 checkexpr.py:2042(accept)
    74918    0.282    0.000    0.764    0.000 types.py:606(copy_modified)
12279/9844    0.272    0.000    0.696    0.000 fastparse.py:819(visit_Call)
   289230    0.246    0.000    0.345    0.000 types.py:1638(is_named_instance)

I compiled those two modules using Cython (with the allow_keyword_args=True and language_level=3 options), and then wrote a benchmark script using the perf module:

import sys
import perf

runner = perf.Runner()

runner.bench_command(
    'mypy_cython',
    [
        '/home/carljm/.venvs/mypy-dev/bin/mypy',
        '/home/carljm/projects/python/mypy-pristine/mypy',
    ]
)

runner.bench_command(
    'mypy',
    [
        '/home/carljm/.venvs/mypy-dev-pristine/bin/mypy',
        '/home/carljm/projects/python/mypy-pristine/mypy',
    ]
)

The first benchmark runs the partially-cythonized mypy over an unmodified mypy code tree, and the second runs the unmodified mypy over itself.

The partially-Cythonized mypy passes runtests.py, except for FAILURE check package mypy nonstrict optional and FAILURE check legacy entry script, which fail because I renamed subtypes.py and types.py to subtypes.pyx and types.pyx, so mypy can't find them on self-check. This is a solvable problem.

I'm sure we can squeeze out more by compiling more modules, or quite a bit more if we were willing to introduce cython-specific syntax.

@gvanrossum
Copy link
Member

I would much prefer mypy to stay pure Python. Maybe we can use some of the profile data to optimize some of mypy's Python code?

@gvanrossum
Copy link
Member

(Also, I once ran mypy with Nuitka and there wasn't much improvement there either.)

@ilevkivskyi
Copy link
Member

Maybe we can use some of the profile data to optimize some of mypy's Python code?

I already do this. The point is that the protocols PR #3132 touches the hottest function SubtypeVisitor.visit_instance, so that I implement a subtype cache and it looks like it gives 5-10% speed-up.

@carljm
Copy link
Member

carljm commented May 23, 2017

If there isn't interest in the Cython direction, I won't experiment further. I think that the gains here will be only incremental and probably not worth the release hassles unless we were willing to add Cython-specific annotations.

It would be great if mypy and its dependencies (cough typed-ast) were all pure Python, so that we could explore speed improvements from running under pypy. But it looks like typed-ast will make that very difficult.

@JukkaL
Copy link
Collaborator Author

JukkaL commented May 23, 2017

It would be great if mypy and its dependencies (cough typed-ast) were all pure Python, so that we could explore speed improvements from running under pypy. But it looks like typed-ast will make that very difficult.

I experimented with pypy before switching to typed-ast and it was actually slower than CPython when type checking mypy at least. When I self-checked mypy in a loop, I think that the first iteration was about half as fast as CPython, and then maybe iteration 5 or so was up to around 2 times faster. This was a while back, though, and I haven't tried with a recent PyPy release. It would easy enough to check out an old mypy commit without typed-ast and see how it behaves with a recent pypy, but it's unlikely that we are actually going to move away from typed-ast.

@ethanhs
Copy link
Collaborator

ethanhs commented May 27, 2017

I just checked mypy v471, which I believe was the last version supporting the old parser, and compared it to CPython. I also tried Nuitka. Here is what I got:

$ python3 -V
Python 3.5.2
$ pypy3 -V
Python 3.5.3 (2875f328eae2, Apr 02 2017, 17:58:49)
[PyPy 5.7.1-beta0 with GCC 6.2.0 20160901]
$ /usr/bin/time pypy3 -m mypy mypy
20.17 user
0.34 system
0:20.57 elapsed
99% CPU
$ /usr/bin/time python3 -m mypy mypy
12.04 user
0.29 system
0:12.37 elapsed
99% CPU
$ /usr/bin/time python3 -m mypy --fast-parse mypy
10.87 user
0.46 system
0:11.37 elapsed
99% CPU
$ /usr/bin/time ./mypy.exe mypy
12.81 user
1.15 system
0:13.96 elapsed
100% CPU
$ /usr/bin/time ./mypy.exe --fast-parse mypy
11.37 user
1.14 system
0:12.52 elapsed
99% CPU

I can provide any further timing upon request.

Based on this, I can surmise that PyPy is not a good path to go down, as it performs much worse than even the non-fast-parse implementation. I presume that one could extend their ast to support type comments and essentially implement fast-parse on top of that, however, I do not see this as being a good path to go down.

Similarly, Nuitka provided the same or worse performance, and I see no way to improve this. Also as a side note, I find it very strange it created an exe on a Linux system.

So I think it is safe to say that neither of these are possible alternatives. In general in trying PyPy, Cython, and Nuitka on different projects in the past, PyPy can lead to impressive speedups for some but not all workloads, Cython without special typing can lead to a 5-10% speedup, and Nuitka never seems to add much of any advantage.
EDIT:
Out of curiosity, I ran the PyPy tests so that it ran multiple times:

$ cat test.py
from mypy.api import run
import time
for i in range(5):
    start = time.time()
    run('mypy')
    print(time.time() - start)
$ pypy3 test.py
21.934080839157104
13.73313283920288
9.043972730636597
10.095704317092896
9.367397785186768

This is actually a nice improvement, faster even than the fastparse of 0.471. However, I don't think this means we should consider PyPy. My thinking is two-fold. First, most things using mypy are almost certainly running on CPython, as PyPy does not support PEP 484 or 526. Secondly, mypy is most often called in a one-off fashion. It does not make much sense for it to be called multiple times like in my test. I think the cost of the startup is too great to compensate for any speed gained by it.

However this is a fair bit of my opinion! Interpret the numbers how you will.

@ethanhs
Copy link
Collaborator

ethanhs commented Dec 3, 2017

Closing as I believe we decided we aren't going to do this.

@ethanhs ethanhs closed this as completed Dec 3, 2017
@scoder
Copy link

scoder commented Aug 19, 2018

I renamed subtypes.py and types.py to subtypes.pyx and types.pyx

It seems to be a hard to defeat myth that this is necessary. It isn't. It's more like renaming a .c file to .cpp – don't do it, unless you have a good reason to do so. In your case, you decidedly want to keep the .py extension, because you want the syntax to remain Python, and the code to be runnable in Python unchanged (and non-duplicated).

I do not know mypy's implementation, but the run profile above suggests that the main bottleneck is Python's isinstance() function. It is already implemented in C, and therefore not something that Cython can magically speed up for you. My (underinformed) advice would therefore be to look for algorithmic improvements. Why use isinstance() at all? It's based on an MRO traversal and therefore requires time that's O(N) in the number of base classes. Try to find a way to do the same thing in O(1). CPython did that in Py2.7+ by mapping the builtin types to a set of type flags, and also caches the result of type checks in the ABC implementation (both successes and failures). Use hashing/caching where you can. I'd be very surprised if all of the 5.5M subtype checks were actually unique.

Then, the method names suggest that the next bottleneck could be a visitor implementation. In Cython, we achieved a visible speedup of the compiler by converting the main visitor base class(es) into extension types. See
https://github.com/cython/cython/blob/master/Cython/Compiler/Visitor.py
and the type declarations in
https://github.com/cython/cython/blob/master/Cython/Compiler/Visitor.pxd

(Again, I'm mostly guessing here, but it seems unlikely that the designs of Cython and mypy are entirely different.)

@ethanhs
Copy link
Collaborator

ethanhs commented Aug 20, 2018

It seems to be a hard to defeat myth that this is necessary. It isn't.

Yes, since writing that, I've learned that it isn't needed :)

Also, thank you for taking a look at the profiling data, your recommendations seem useful. There definitely are things we can do to optimize regardless of whether we choose to use Cython or not, so I appreciate your comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants