Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing speed of the starting and initialisation phase #4814

Closed
Hubro opened this issue Aug 8, 2021 · 29 comments
Closed

Optimizing speed of the starting and initialisation phase #4814

Hubro opened this issue Aug 8, 2021 · 29 comments

Comments

@Hubro
Copy link

Hubro commented Aug 8, 2021

Current problem

Pylint is frustratingly slow, taking several seconds to lint even tiny, simple Python files. Pylint is glacially slow compared to any linting tool in any other language I have worked with. This slowness is not too big a problem for CI/CD pipelines, since in this context a few seconds doesn't matter too much and linting of multiple files can be parallelized.

However, for background linting in editors while editing (for example with ALE) this leads to a terrible experience of warnings popping up seconds after I've moved past a line, or warnings staying up seconds after I've resolved them.

Since Python is not a slow language, I assume this slowness has to do with one or more linter settings. However, since there are hundreds of settings, I have no idea where to begin with improving the performance.

My goal would be to get a pylint run on a single file to take somewhere in the ballpark of 100 ms.

Desired solution

I would like a section in the front page README, or a separate guide linked to from the README file, with suggestions for how to improve the performance of pylint, for example disabling certain options that are particularly taxing.

Linting 80 lines of Python should not take 2.5 seconds, at the very most it should take maybe 100 ms.

Additional context

My pylint config: https://gist.github.com/Hubro/7adba88c42e4df0706067bfe9b2cea53#file-pylintrc-ini

The file I'm currently linting: https://gist.github.com/Hubro/7adba88c42e4df0706067bfe9b2cea53#file-generic_http_mock_server-py

➜ \pylint --version
pylint 2.8.3
astroid 2.5.6
Python 3.8.6 (default, Jun 27 2021, 03:28:09)
[GCC 11.1.0]
➜ time \pylint --rcfile=pylintrc.ini generic_http_mock_server.py
************* Module generic_http_mock_server
generic_http_mock_server.py:28:0: C0115: Missing class docstring (missing-class-docstring)
\pylint --rcfile=pylintrc.ini   2.30s user 0.15s system 101% cpu 2.416 total
@Hubro Hubro added Enhancement ✨ Improvement to a component Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Aug 8, 2021
@DudeNr33
Copy link
Collaborator

DudeNr33 commented Aug 8, 2021

I ran pylint with different checkers enabled and timed the execution (once with normal output and once while using -f pylint.testutils.reporter_for_tests.FunctionalTestReporter to suppress the output):

checkerName   ;user time (with output) ;user time (no output)
async         ;2.37s                   ;2.45s
basic         ;6.02s                   ;6.01s
classes       ;4.08s                   ;4.15s
design        ;2.64s                   ;2.73s
exceptions    ;2.67s                   ;2.71s
format        ;2.65s                   ;2.70s
imports       ;4.90s                   ;4.82s
logging       ;5.39s                   ;5.49s
miscellaneous ;2.53s                   ;2.62s
newstyle      ;2.51s                   ;2.57s
python3       ;4.23s                   ;4.35s
metrics       ;2.44s                   ;2.44s
similarities  ;7.74s                   ;7.82s
spelling      ;2.39s                   ;2.44s
stdlib        ;5.46s                   ;5.57s
string        ;5.57s                   ;5.61s
typecheck     ;7.07s                   ;7.10s
variables     ;5.37s                   ;5.44s
refactoring   ;5.89s                   ;5.99s
all           ;16.58s                  ;16.32s
E             ;??                      ;9.01s
C             ;??                      ;6.13s
W             ;??                      ;9.29s
R             ;??                      ;12.41s

(Command: time pylint --disable=all --enable=<checker> -f pylint.testutils.reporter_for_tests.FunctionalTestReporter pylint).
Maybe this can serve as a starting point.

However, I think the main problem might be the startup overhead.
Linting the complete codebase of Pylint itself with 144 Python files took only 16s on my machine, while linting your example took around 1.6s. Linting a single random file of Pylint's codebase also took 1.84s.
So I don't really know how much can be gained by enabling/disabling checkers alone.

Update:
Running pylint on a file with just a = 1 takes around 0.42s for me.
That's roughly the same time as running with --disable=all and therefore the minimum overhead of just starting and initialising pylint even before any linting is done.

@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented Aug 8, 2021

Ultimately pylint's characteristic is that it's slow and very thorough because it infer values contrary to some faster linter. Your file might be small but if the dependencies it uses are huge, pylint will analyses all of them and takes time to infer the value of things (On your file the lint is 1,87s if flask is installed, 0,51 without flask). Improving performance is a goal, but some checks cannot be done without calculation. For example a checker that is known to be slow on big code base is the duplicate-code checker (its performance were enhanced in 2.10). There's not much we can do to make it faster as comparing line from all files against all other files will always take time and not scale very well.

Regarding startup time, I would not be surprised if the way this is done right now with a giant god class (PyLinter), strange call backs, and deprecated library (optparse) could be hugely improved upon. A performance analysis of the startup time of this part of the code would be greatly beneficial in order to prioritize what needs to be done if we refactor to improve performance.

@Pierre-Sassoulas Pierre-Sassoulas added Documentation 📗 performance and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Aug 8, 2021
@Hubro
Copy link
Author

Hubro commented Aug 9, 2021

That's roughly the same time as running with --disable=all and therefore the minimum overhead of just starting and initialising pylint even before any linting is done.

Aw, that's discouraging... So I guess Pylint needs some work on the performance front. Are there any efforts to do that, that you're aware of?

Also, do any real alternatives to Pylint exist? (I.e. a generic linter with configurable checkers)

@Pierre-Sassoulas
Copy link
Member

Aw, that's discouraging... So I guess Pylint needs some work on the performance front. Are there any efforts to do that, that you're aware of?

Issues are opened for it, and we're aware of the problem. Everyone likes contributing new check or bug fix, and improving performance is hard work (especially if we need to stop inferring everything and taking type clue into account which would be a major redesign). So yes, there is an effort to do that but maintainers time is limited and being fast is not pylint's main value proposition.

Also, do any real alternatives to Pylint exist? (I.e. a generic linter with configurable checkers)

For duplicate code there is Simian. For linting, vanilla flake8 is fast because it does less checks than pylint and does not infer values. It includes pyflake, pycodestyle, mccabe and some of its own checks. I would encourage you to run pylint in continuous integration and pre-commit, and to run pycodestyle/pyflakes in IDE (or simply Pycharm's check if you use pycharm).

@DudeNr33
Copy link
Collaborator

DudeNr33 commented Aug 9, 2021

Regarding startup time, I would not be surprised if the way this is done right now with a giant god class (PyLinter), strange call backs, and deprecated library (optparse) could be hugely improved upon. A performance analysis of the startup time of this part of the code would be greatly beneficial in order to prioritize what needs to be done if we refactor to improve performance.

I ran cProfile and fed the result to gprof2dot:
startup_profile

Haven't looked at it in detail, but options processing does take a substantial amount of time (considering I only gave disable=all via the command line).

@DudeNr33
Copy link
Collaborator

I noticed that another large amount of time was attributed to parallel.py.
That made me wonder - why is Pylint spending almost 50% of its startup time in this module, even though I did not tell it to run in parallel mode?
A little hacking and putting timing code in each module later I could track it down:
It's the import from astroid import nodes which takes a huge amount of time!

Simply timing the import of pylint.lint results in:

python -c "import pylint.lint"  0.20s user 0.05s system 85% cpu 0.287 total

This is to 75% due to the import of astroid. Timing just this single import statement gives 0.15s!

python -c "from astroid import nodes"  0.15s user 0.04s system 77% cpu 0.245 total

I'll try to do the same profiling for astroid, maybe we can find a few things there to reduce the overhead of just importing the nodes.

@Pierre-Sassoulas
Copy link
Member

We're refactoring the nodes module right now un astroid. This is for readability reason, because a 4500 lines file freeze the ide for seconds while being analyzed (with a fast linter) and because there are hidden circular import. If there is something to do regarding this import (beside bursting it, we're doing that anyway) it's the perfect time :D

@DudeNr33
Copy link
Collaborator

I ran the same script over the current astroid master.
It places time.time() calls at the very beginning of the script, one after the last "normal" import statement (things like try: import xy; except: ... are not recognized by the script) and one at the very end.
The column imports is the time delta between the first two measuring points, code is the delta between the second and the last.

Example:

start = time.time()
import sys
from foo import bar

import_end = time.time()

# the following should probably be included when measuring the import time, but I used a very simple script...
try:
    import spam
except ImportError:
    spam = None

class Dummy:
    pass

end = time.time()
print(';'.join([__file__, f'{(import_end - start):.10f}', f'{(end - import_end):.10f}']))

The result is the following:

file                                                                                             ;imports      ;code
/Users/andreas/programming/git-originals/astroid/astroid/__pkginfo__.py                          ;0.0000009537 ;0.0000000000
/Users/andreas/programming/git-originals/astroid/astroid/context.py                              ;0.0002040863 ;0.0008509159
/Users/andreas/programming/git-originals/astroid/astroid/util.py                                 ;0.0006728172 ;0.0000352859
/Users/andreas/programming/git-originals/astroid/astroid/exceptions.py                           ;0.0000038147 ;0.0001928806
/Users/andreas/programming/git-originals/astroid/astroid/decorators.py                           ;0.0066850185 ;0.0000209808
/Users/andreas/programming/git-originals/astroid/astroid/const.py                                ;0.0000011921 ;0.0000646114
/Users/andreas/programming/git-originals/astroid/astroid/interpreter/__init__.py                 ;0.0000000000 ;0.0000000000
/Users/andreas/programming/git-originals/astroid/astroid/interpreter/_import/__init__.py         ;0.0000000000 ;0.0000000000
/Users/andreas/programming/git-originals/astroid/astroid/interpreter/_import/util.py             ;0.0000000000 ;0.0500040054
/Users/andreas/programming/git-originals/astroid/astroid/interpreter/_import/spec.py             ;0.0501790047 ;0.0002009869
/Users/andreas/programming/git-originals/astroid/astroid/modutils.py                             ;0.0005559921 ;0.0000340939
/Users/andreas/programming/git-originals/astroid/astroid/transforms.py                           ;0.0000038147 ;0.0000102520
/Users/andreas/programming/git-originals/astroid/astroid/manager.py                              ;0.0514240265 ;0.0000298023
/Users/andreas/programming/git-originals/astroid/astroid/interpreter/objectmodel.py              ;0.0515880585 ;0.0001418591
/Users/andreas/programming/git-originals/astroid/astroid/bases.py                                ;0.0081331730 ;0.0522449017
/Users/andreas/programming/git-originals/astroid/astroid/mixins.py                               ;0.0000038147 ;0.0000460148
/Users/andreas/programming/git-originals/astroid/astroid/nodes/const.py                          ;0.0000000000 ;0.0000040531
/Users/andreas/programming/git-originals/astroid/astroid/as_string.py                            ;0.0000009537 ;0.0000159740
/Users/andreas/programming/git-originals/astroid/astroid/nodes/node_ng.py                        ;0.0002577305 ;0.0000770092
/Users/andreas/programming/git-originals/astroid/astroid/nodes/node_classes.py                   ;0.0629832745 ;0.0011730194
/Users/andreas/programming/git-originals/astroid/astroid/interpreter/dunder_lookup.py            ;0.0000028610 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/nodes/scoped_nodes.py                   ;0.0000960827 ;0.0002627373
/Users/andreas/programming/git-originals/astroid/astroid/nodes/__init__.py                       ;0.0652899742 ;0.0000011921
/Users/andreas/programming/git-originals/astroid/astroid/node_classes.py                         ;0.0654280186 ;0.0000030994
/Users/andreas/programming/git-originals/astroid/astroid/scoped_nodes.py                         ;0.0000030994 ;0.0000011921
/Users/andreas/programming/git-originals/astroid/astroid/raw_building.py                         ;0.0000059605 ;0.0481710434
/Users/andreas/programming/git-originals/astroid/astroid/helpers.py                              ;0.0483200550 ;0.0000030994
/Users/andreas/programming/git-originals/astroid/astroid/arguments.py                            ;0.0000109673 ;0.0000131130
/Users/andreas/programming/git-originals/astroid/astroid/protocols.py                            ;0.0001487732 ;0.0000519753
/Users/andreas/programming/git-originals/astroid/astroid/inference.py                            ;0.0488860607 ;0.0000698566
/Users/andreas/programming/git-originals/astroid/astroid/astroid_manager.py                      ;0.0000000000 ;0.0000019073
/Users/andreas/programming/git-originals/astroid/astroid/brain/__init__.py                       ;0.0000000000 ;0.0000000000
/Users/andreas/programming/git-originals/astroid/astroid/brain/helpers.py                        ;0.0000021458 ;0.0000000000
/Users/andreas/programming/git-originals/astroid/astroid/_ast.py                                 ;0.0000038147 ;0.0002527237
/Users/andreas/programming/git-originals/astroid/astroid/rebuilder.py                            ;0.0003468990 ;0.0004632473
/Users/andreas/programming/git-originals/astroid/astroid/builder.py                              ;0.0011017323 ;0.0000209808
/Users/andreas/programming/git-originals/astroid/astroid/inference_tip.py                        ;0.0000019073 ;0.0000059605
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_random.py                   ;0.0000069141 ;0.0000050068
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_collections.py              ;0.0000030994 ;0.0000040531
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_numpy_utils.py              ;0.0000030994 ;0.0000011921
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_numpy_core_function_base.py ;0.0000686646 ;0.0000052452
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_six.py                      ;0.0000059605 ;0.0000050068
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_crypt.py                    ;0.0000030994 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_numpy_core_numeric.py       ;0.0000030994 ;0.0000028610
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_threading.py                ;0.0000009537 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_re.py                       ;0.0000050068 ;0.0000030994
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_hypothesis.py               ;0.0000021458 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_type.py                     ;0.0000040531 ;0.0000019073
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_namedtuple_enum.py          ;0.0000071526 ;0.0000069141
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_dataclasses.py              ;0.0000019073 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_subprocess.py               ;0.0000030994 ;0.0000007153
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_ssl.py                      ;0.0000038147 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_numpy_core_multiarray.py    ;0.0000030994 ;0.0000829697
/Users/andreas/programming/git-originals/astroid/astroid/brain/__init__.py                       ;0.0000000000 ;0.0000000000
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_mechanize.py                ;0.0000021458 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_numpy_random_mtrand.py      ;0.0000021458 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_curses.py                   ;0.0000016689 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_fstrings.py                 ;0.0000019073 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_io.py                       ;0.0000021458 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_numpy_core_numerictypes.py  ;0.0000021458 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_qt.py                       ;0.0000040531 ;0.0000030994
/Users/andreas/programming/git-originals/astroid/astroid/objects.py                              ;0.0000047684 ;0.0001993179
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_functools.py                ;0.0003061295 ;0.0000138283
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_http.py                     ;0.0000021458 ;0.0000019073
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_dateutil.py                 ;0.0000019073 ;0.0000011921
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_gi.py                       ;0.0000057220 ;0.0000030994
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_sqlalchemy.py               ;0.0000021458 ;0.0000007153
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_responses.py                ;0.0000021458 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_typing.py                   ;0.0000078678 ;0.0000400543
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_argparse.py                 ;0.0000040531 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_numpy_core_fromnumeric.py   ;0.0000021458 ;0.0000007153
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_scipy_signal.py             ;0.0000011921 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_attrs.py                    ;0.0000021458 ;0.0000019073
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_uuid.py                     ;0.0000021458 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_boto3.py                    ;0.0000028610 ;0.0000009537
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_builtin_inference.py        ;0.0000059605 ;0.0017840862
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_pytest.py                   ;0.0000021458 ;0.0000021458
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_numpy_core_umath.py         ;0.0000021458 ;0.0000007153
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_pkg_resources.py            ;0.0000038147 ;0.0000011921
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_nose.py                     ;0.0000040531 ;0.0000658035
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_numpy_ndarray.py            ;0.0000021458 ;0.0000021458
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_multiprocessing.py          ;0.0000030994 ;0.0000019073
/Users/andreas/programming/git-originals/astroid/astroid/brain/brain_hashlib.py                  ;0.0000009537 ;0.0000030994
/Users/andreas/programming/git-originals/astroid/astroid/__init__.py                             ;0.1232609749 ;0.0056791306

The most taxing individual modules (with the highest code time) are:

file imports code
/Users/andreas/programming/git-originals/astroid/astroid/bases.py 0.0081331730 0.0522449017
/Users/andreas/programming/git-originals/astroid/astroid/interpreter/_import/util.py 0.0000000000 0.0500040054
/Users/andreas/programming/git-originals/astroid/astroid/raw_building.py 0.0000059605 0.0481710434
/Users/andreas/programming/git-originals/astroid/astroid/init.py 0.1232609749 0.0056791306

I suggest to run the script again after you finished the refactoring and see if it is already better. 😁

If someone wants to use the script himself, here is the snippet:

from pathlib import Path


all_py_files = Path("astroid").rglob("*.py")

for py_file in all_py_files:
    try:
        print(f"Patching {py_file}...")
        lines = py_file.read_text().splitlines()
        last_import_line = 0
        for lineno, line in enumerate(reversed(lines)):
            if line.startswith("from") or line.startswith("import"):
                last_import_line = len(lines) - lineno
                while lines[last_import_line] != "":
                    last_import_line += 1
                break
        print(f"Last import is on line {last_import_line}...")
        lines[0:0] = ["import time", "start = time.time()"]
        lines[last_import_line+2:last_import_line+2] = ["import_end = time.time()"]
        lines.extend([
            "end = time.time()",
            "print(';'.join([__file__, f'{(import_end - start):.10f}', f'{(end - import_end):.10f}']))"
        ])
        py_file.write_text("\n".join(lines))
    except IndexError:
        print(f"Could not patch file {py_file}!")

@Pierre-Sassoulas Pierre-Sassoulas changed the title Guide for optimizing performance Optimizing speed of the starting and initialisation phase Aug 11, 2021
@Pierre-Sassoulas
Copy link
Member

Restricted the issue title to something actionable so it does not become an epic ticket we can never close.

@matusvalo
Copy link
Collaborator

One idea how to tackle this problem is to introduce pylintd server. Using deamon background server is pretty standard - at least mypy [1] and black [2] have it. Moving to background daemon we can save time in two places:

  1. python startup - this is for free since daemon will run permanently
  2. pylint initialisation - this will require some refactoring of pylint/lint/run.py

Implementing a simple protocol will also simplify editor integration when editor can use this protocol instead of calling binary directly.

[1] https://mypy.readthedocs.io/en/stable/mypy_daemon.html
[2] https://black.readthedocs.io/en/stable/usage_and_configuration/black_as_a_server.html

ps. I personally like blackd implementation which is simple, straightforward and also protocol is lightweight (simple http based protocol)

@matusvalo
Copy link
Collaborator

OK I have finally managed to gather proper information. I have created a very simple pylintd service and here are results:

$ cat test.py                           # empty python file
pass

$ time pylint test.py                   # running pylint command
************* Module test
test.py:1:0: C0114: Missing module docstring (missing-module-docstring)

--------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 10.00/10, -10.00)


real	0m0.744s
user	0m0.572s
sys	0m0.102s

$ time curl http://localhost:8000       # asking pylintd
************* Module test
test.py:1:0: C0114: Missing module docstring (missing-module-docstring)

------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)


real	0m0.082s
user	0m0.006s
sys	0m0.011s

@matusvalo
Copy link
Collaborator

matusvalo commented Nov 25, 2021

I have also tried the example directly from issue report (generic_http_mock_server.py) and it seems that still the most expensive part is linting itself:

$ time curl http://localhost:8000
************* Module generic_http_mock_server
generic_http_mock_server.py:13:0: E0401: Unable to import 'flask' (import-error)
generic_http_mock_server.py:14:0: E0401: Unable to import 'flask.typing' (import-error)
generic_http_mock_server.py:28:0: C0115: Missing class docstring (missing-class-docstring)

real	0m1.732s
user	0m0.006s
sys	0m0.009s

$ time pylint generic_http_mock_server.py
************* Module generic_http_mock_server
generic_http_mock_server.py:13:0: E0401: Unable to import 'flask' (import-error)
generic_http_mock_server.py:14:0: E0401: Unable to import 'flask.typing' (import-error)
generic_http_mock_server.py:28:0: C0115: Missing class docstring (missing-class-docstring)

real	0m2.325s
user	0m4.342s
sys	0m0.523s

@cdce8p
Copy link
Member

cdce8p commented Nov 27, 2021

I have also tried the example directly from issue report (generic_http_mock_server.py) and it seems that still the most expensive part is linting itself ...

I'm really not sure a deamon server does make for pylint. At least not currently. It might if the linting itself would be fast but as you pointed out and we all already know, that is the most expensive part. Startup doesn't make much of a difference there.

Some other things to consider, @Pierre-Sassoulas already mentioned pylint-dev/astroid#792. I would bet there are also other issues. Adding something like this now, does only mean the we need to support one more thing. Maintenance is hard enough as it is already. There is also the issue how it will interact with caching if it gets added at some point which would complicate any implementation even further.

I do agree that performance is an issue, but IMO this isn't the right step at the moment.

@Hubro
Copy link
Author

Hubro commented Nov 27, 2021

I have also tried the example directly from issue report (generic_http_mock_server.py) and it seems that still the most expensive part is linting itself ...

I'm really not sure a deamon server does make for pylint. At least not currently. It might if the linting itself would be fast but as you pointed out and we all already know, that is the most expensive part. Startup doesn't make much of a difference there.

Some other things to consider, @Pierre-Sassoulas already mentioned pylint-dev/astroid#792. I would bet there are also other issues. Adding something like this now, does only mean the we need to support one more thing. Maintenance is hard enough as it is already. There is also the issue how it will interact with caching if it gets added at some point which would complicate any implementation even further.

I do agree that performance is an issue, but IMO this isn't the right step at the moment.

As @Pierre-Sassoulas said earlier, Pylint apparently analyzes all dependencies, which would explain why linting even a tiny Python file can be frustratingly slow.

If a Pylint server were to keep this analysis in memory and only analyze changes on demand, it wouldn't surprise me if that could improve the performance dramatically.

@cdce8p
Copy link
Member

cdce8p commented Nov 27, 2021

...

As @Pierre-Sassoulas said earlier, Pylint apparently analyzes all dependencies, which would explain why linting even a tiny Python file can be frustratingly slow.

That happens as part of the inference and not the startup itself. Say for example you inherit from some unknown base class in your "small" file, pylint might try to check if that base is an abstract class to warn you if you haven't implemented all methods. For that it needs to infer the base class, which might require analyzing dependencies.

Tbh I'm not sure it would be feasible to preinfer all symbols from dependencies. Much just isn't needed. So far we also don't have a use for it.

If a Pylint server were to keep this analysis in memory and only analyze changes on demand, it wouldn't surprise me if that could improve the performance dramatically.

I believe what you're looking for is most likely some form of caching of inference results. There have been a few discussions around that, in particular pylint-dev/astroid#1145. The main challenge I see there is to know when to invalidate which cache entries. That isn't trivial as an inference result might depend on multiple other files.

Crucially though, caching should not require a dedicated server. The cache results could be stored in temp files and be available for the next pylint run. (Similar to mypy.)

@Hubro
Copy link
Author

Hubro commented Nov 27, 2021

I believe what you're looking for is most likely some form of caching of inference results. There have been a few discussions around that, in particular pylint-dev/astroid#1145. The main challenge I see there is to know when to invalidate which cache entries. That isn't trivial as an inference result might depend on multiple other files.

That sounds like a very logical next step for Pylint. Even a "dumb" implementation of an inference cache would be an amazing start:

  • Any Python standard library or third party library, always cache inference results indefinitely
  • Any other code, process from scratch every time

It sounds to me like this would provide a massive Pylint speed boost for the vast majority of Python projects, and the cache would almost never be an issue. The cache can be invalidate by deleting the cache directory. At least as an opt-in feature until smarter cache invalidation can be figured out, this would be freaking amazing.

@cdce8p
Copy link
Member

cdce8p commented Nov 27, 2021

That sounds like a very logical next step for Pylint. Even a "dumb" implementation of an inference cache would be an amazing start:

  • Any Python standard library or third party library, always cache inference results indefinitely
  • Any other code, process from scratch every time

Not so fast! How do you handle dependency upgrades between runs? I'm not sure we infer much from the standard library but even there what about patch version updates?

It sounds to me like this would provide a massive Pylint speed boost for the vast majority of Python projects, and the cache would almost never be an issue. The cache can be invalidate by deleting the cache directory.

Although I agree a working caching solution would be amazing, I'm not sure "the cache would _almost_ never be an issue" is accurate. I my opinion nothing is worse then getting incorrect results because the cache wasn't invalidated correctly. You start to wonder why your CI pipeline fails while locally everything works and until you realize it's the cache, you have already lost a few more hairs and half an hour (at best) or more.

At least as an opt-in feature until smarter cache invalidation can be figured out, this would be freaking amazing.

Although I can understand why you (and others) would like that, it's not something I would recommend personally. If people start using it, they generally expect it to work and will be easily frustrated if it doesn't. Even if we mention that it's only alpha software and thus not complete.

@Hubro
Copy link
Author

Hubro commented Nov 28, 2021

Isn't that what flag names like --dangerously-enable-inference-cache or --enable-highly-experimental-inference-cache are for? 😁 You would have to be a pretty brave person to enable that flag for Pylint as part of a CI/CD pipeline. In any case, saving a few seconds for every CI test run isn't super helpful, but saving a few seconds of waiting every time I edit a file is worth living a little dangerously for.

But yes, I understand your point, even though I hope they do it anyway 😁

@DanielNoord
Copy link
Collaborator

I ran cProfile and fed the result to gprof2dot:

Is it me or is most of the time spent in the bottom left quadrant: handling of messages, messages states, symbols and ids? I'm not sure if I'm reading the graph correctly but we're spending 10% of the run in get_active_msgids. That seems like quite a bit..

So if anybody wants to work with this I think that's where they should be looking.

@Pierre-Sassoulas Perhaps we should update the title to reflect this as well?

@Pierre-Sassoulas
Copy link
Member

I'm going to create another issue for optimizing the message store. I'm not sure it's going to be enough to fix the startup time issue entirely but it seems like something we should pursue for sure.

@DanielNoord
Copy link
Collaborator

DanielNoord commented Dec 23, 2021

I ran cProfile and fed the result to gprof2dot

@DudeNr33 Do you happen to have the command you used to get this profile?
I used:
python -m cProfile -o output.pstats -m pylint --disable=all test.py and got:
foo

It has a lot of noice so it's not particularly useful for finding any improvable areas.

@DudeNr33
Copy link
Collaborator

@DanielNoord unfortunately I can't recall exactly. I think I ran it over the whole pylint codebase, and did not apply any additional options or filtering (my original graph also contains stdlib calls just like yours).

It could help to filter the results to only contain calls from pylint and optionally astroid, which should be possible with the pstats module. I haven't tried it yet, this was the first and until now last time I worked with profiling tools.
Working with data in tabular format might be reasonable as well, as it allows sorting and filtering.

What is remarkable in your graph is the big difference in the time spent in run:7x:__init__.
Is your graph a result from the latest master, or from a version where you already started optimisation work?

@DanielNoord
Copy link
Collaborator

DanielNoord commented Dec 24, 2021

Hmm, I'll look into the pstats module as well.

Although I think with my latest PR the issue found in your graph has been mostly fixed? When I run cProfile now the most time is spent in astroid and importing modules.

@DudeNr33
Copy link
Collaborator

Yes, that's what I thought too when I saw your graph.

I played around a bit, and while exporting to CSV is not really straightforward with pstats, using it on the command line is also quite good. For example sorting by cumulative time and only outputting calls within pylint itself:

import pstats

p = pstats.Stats("output.pstats")
p.sort_stats("cumtime")
p.print_stats("pylint/pylint", 10)

This gives:

Fri Dec 24 11:34:23 2021    output.pstats

         18251477 function calls (17126994 primitive calls) in 4.765 seconds

   Ordered by: cumulative time
   List reduced from 3309 to 507 due to restriction <'pylint/pylint'>
   List reduced from 507 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.787    4.787 /Users/andreas/programming/forks/pylint/pylint/__main__.py:6(<module>)
        1    0.000    0.000    4.787    4.787 /Users/andreas/programming/forks/pylint/pylint/__init__.py:20(run_pylint)
        1    0.000    0.000    4.521    4.521 /Users/andreas/programming/forks/pylint/pylint/lint/run.py:76(__init__)
        1    0.000    0.000    4.366    4.366 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1010(check)
        1    0.000    0.000    4.365    4.365 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1067(_check_files)
      136    0.001    0.000    4.355    0.032 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1089(_check_file)
      136    0.000    0.000    2.949    0.022 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1208(get_ast)
      136    0.010    0.000    1.403    0.010 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1238(check_astroid_module)
      136    0.000    0.000    1.392    0.010 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1255(_check_astroid_module)
      136    0.010    0.000    0.792    0.006 /Users/andreas/programming/forks/pylint/pylint/utils/utils.py:163(tokenize_module)

(Taken from the current main branch, with output.pstats produced by running python -m cProfile -o output.pstats -m pylint --disable=all pylint)

@DanielNoord
Copy link
Collaborator

With #5596 I think it makes sense to revisit this issue and see if there is anything actionable left.

@DudeNr33 what do you think we should look at before closing this?

@Pierre-Sassoulas
Copy link
Member

After #5605 and #5606 I don't see any low hanging fruits anymore. I tested with pylint --disable=all setup.py. I can't manage to make gprof2dot works right now (error: failed to load pylint.prof, maybe they are generated by different python version?) So no screenshot but here's my analysis anyway:

  • isinstance is taking 1,5% of the runtime alone, maybe a better typing would slightly helps (we often add isinstance to sooth mypy in astroid).

  • More importantely :986(_find_and_load) cumulative time is 92% of the run. It seems like importing is the bottleneck now. A half second only to import astroid seems like a lot.

Here's the result of python3 -X importtime -c 'import astroid' for reference:

import time: self [us] | cumulative | imported package
import time:       182 |        182 |   _io
import time:        59 |         59 |   marshal
import time:       213 |        213 |   posix
import time:       463 |        916 | _frozen_importlib_external
import time:       143 |        143 |   time
import time:       210 |        353 | zipimport
import time:        68 |         68 |     _codecs
import time:      8157 |       8225 |   codecs
import time:      4532 |       4532 |   encodings.aliases
import time:      2092 |      14848 | encodings
import time:       526 |        526 | encodings.utf_8
import time:       183 |        183 | _signal
import time:       910 |        910 | encodings.latin_1
import time:        34 |         34 |     _abc
import time:       527 |        560 |   abc
import time:       539 |       1099 | io
import time:        43 |         43 |       _stat
import time:       522 |        564 |     stat
import time:      1066 |       1066 |     _collections_abc
import time:       459 |        459 |       genericpath
import time:       481 |        939 |     posixpath
import time:       781 |       3349 |   os
import time:       451 |        451 |   _sitebuiltins
import time:        57 |         57 |     _locale
import time:       411 |        467 |   _bootlocale
import time:      1220 |       1220 |     apport_python_hook
import time:       396 |       1615 |   sitecustomize
import time:      4927 |      10808 | site
import time:       989 |        989 |     types
import time:       833 |        833 |     warnings
import time:      1833 |       3654 |   importlib
import time:       954 |        954 |         enum
import time:        58 |         58 |           _sre
import time:       560 |        560 |             sre_constants
import time:      1033 |       1593 |           sre_parse
import time:       788 |       2437 |         sre_compile
import time:        58 |         58 |               _operator
import time:      1129 |       1187 |             operator
import time:       424 |        424 |             keyword
import time:        34 |         34 |               _heapq
import time:       499 |        533 |             heapq
import time:        85 |         85 |             itertools
import time:       496 |        496 |             reprlib
import time:        47 |         47 |             _collections
import time:      1360 |       4129 |           collections
import time:        37 |         37 |           _functools
import time:      3949 |       8114 |         functools
import time:       451 |        451 |         copyreg
import time:       751 |      12704 |       re
import time:       547 |      13251 |     fnmatch
import time:        70 |         70 |       nt
import time:        49 |         49 |       nt
import time:        46 |         46 |       nt
import time:        46 |         46 |       nt
import time:       608 |        817 |     ntpath
import time:        47 |         47 |     errno
import time:       734 |        734 |       urllib
import time:      1519 |       2253 |     urllib.parse
import time:      3700 |      20066 |   pathlib
import time:      1277 |       1277 |   astroid.__pkginfo__
import time:       423 |        423 |         collections.abc
import time:       704 |        704 |         contextlib
import time:      1683 |       2809 |       typing
import time:       127 |        127 |               _opcode
import time:       619 |        745 |             opcode
import time:       744 |       1488 |           dis
import time:       442 |        442 |           importlib.machinery
import time:       480 |        480 |               token
import time:      1048 |       1527 |             tokenize
import time:       389 |       1915 |           linecache
import time:      1941 |       5784 |         inspect
import time:       518 |        518 |               _weakrefset
import time:       873 |       1390 |             weakref
import time:      1384 |       1384 |             wrapt._wrappers
import time:      6921 |       9695 |           wrapt.wrappers
import time:      1350 |       1350 |             threading
import time:      1461 |       2811 |           wrapt.decorators
import time:      1089 |       1089 |           wrapt.importer
import time:      2134 |      15727 |         wrapt
import time:        77 |         77 |             copy_reg
import time:       449 |        449 |               lazy_object_proxy.utils_py3
import time:       550 |        999 |             lazy_object_proxy.utils
import time:      1466 |       1466 |             lazy_object_proxy.cext
import time:       572 |        572 |             lazy_object_proxy._version
import time:      4391 |       7503 |           lazy_object_proxy
import time:       687 |       8190 |         astroid.util
import time:       635 |        635 |           pprint
import time:      2008 |       2643 |         astroid.context
import time:      1714 |       1714 |         astroid.exceptions
import time:      7641 |       7641 |         typing_extensions
import time:       616 |      42312 |       astroid.decorators
import time:       594 |        594 |       astroid.mixins
import time:      1371 |       1371 |         astroid.const
import time:      1881 |       1881 |           astroid.interpreter._import
import time:      1275 |       1275 |             distutils
import time:       385 |        385 |                 __future__
import time:        68 |         68 |                   binascii
import time:       631 |        631 |                     importlib.abc
import time:       503 |       1133 |                   importlib.util
import time:        66 |         66 |                     zlib
import time:       367 |        367 |                       _compression
import time:       251 |        251 |                       _bz2
import time:       485 |       1103 |                     bz2
import time:       217 |        217 |                       _lzma
import time:       431 |        647 |                     lzma
import time:        36 |         36 |                     pwd
import time:        26 |         26 |                     grp
import time:      1033 |       2908 |                   shutil
import time:        66 |         66 |                     _struct
import time:       337 |        402 |                   struct
import time:      1019 |       5528 |                 zipfile
import time:       599 |        599 |                 pkgutil
import time:      1913 |       1913 |                 platform
import time:        53 |         53 |                     math
import time:       120 |        120 |                     _datetime
import time:      1114 |       1286 |                   datetime
import time:      1129 |       1129 |                       xml
import time:      1078 |       2206 |                     xml.parsers
import time:       107 |        107 |                     pyexpat
import time:       431 |       2744 |                   xml.parsers.expat
import time:      1234 |       5263 |                 plistlib
import time:       578 |        578 |                   email
import time:       674 |        674 |                     email.errors
import time:        36 |         36 |                             _string
import time:       843 |        878 |                           string
import time:       465 |       1343 |                         email.quoprimime
import time:       524 |        524 |                           base64
import time:       402 |        925 |                         email.base64mime
import time:       451 |        451 |                             quopri
import time:       418 |        869 |                           email.encoders
import time:       874 |       1743 |                         email.charset
import time:       950 |       4960 |                       email.header
import time:        34 |         34 |                             _bisect
import time:       424 |        457 |                           bisect
import time:        27 |         27 |                           _sha512
import time:        22 |         22 |                           _random
import time:       616 |       1122 |                         random
import time:       136 |        136 |                           _socket
import time:       617 |        617 |                             select
import time:       885 |       1501 |                           selectors
import time:      1593 |       3229 |                         socket
import time:      1011 |       1011 |                             locale
import time:       919 |       1929 |                           calendar
import time:       557 |       2485 |                         email._parseaddr
import time:       813 |       7647 |                       email.utils
import time:       561 |      13167 |                     email._policybase
import time:       945 |      14785 |                   email.feedparser
import time:      8210 |      23572 |                 email.parser
import time:       721 |        721 |                 tempfile
import time:      1194 |       1194 |                 textwrap
import time:      6297 |       6297 |                 pkg_resources.extern
import time:      1130 |       1130 |                     pkg_resources._vendor
import time:      1168 |       2297 |                   pkg_resources._vendor.six
import time:       149 |       2446 |                 pkg_resources.extern.six
import time:       560 |        560 |                     pkg_resources._vendor.six
import time:        70 |        629 |                   pkg_resources._vendor.six.moves
import time:        56 |        685 |                 pkg_resources.extern.six.moves
import time:        20 |         20 |                   pkg_resources._vendor.six.moves
import time:        50 |         69 |                 pkg_resources._vendor.six.moves.urllib
import time:       461 |        461 |                 pkg_resources.py31compat
import time:       910 |        910 |                   pkg_resources._vendor.appdirs
import time:        81 |        990 |                 pkg_resources.extern.appdirs
import time:       436 |        436 |                     pkg_resources._vendor.packaging.__about__
import time:      3589 |       4025 |                   pkg_resources._vendor.packaging
import time:        72 |       4097 |                 pkg_resources.extern.packaging
import time:       378 |        378 |                   pkg_resources.extern.packaging._structures
import time:      2111 |       2489 |                 pkg_resources.extern.packaging.version
import time:       379 |        379 |                   pkg_resources.extern.packaging._compat
import time:      6610 |       6989 |                 pkg_resources.extern.packaging.specifiers
import time:        74 |         74 |                             org
import time:        17 |         90 |                           org.python
import time:        13 |        103 |                         org.python.core
import time:       539 |        641 |                       copy
import time:      1431 |       1431 |                       traceback
import time:     27210 |      29281 |                     pkg_resources._vendor.pyparsing
import time:       107 |      29388 |                   pkg_resources.extern.pyparsing
import time:        42 |         42 |                   pkg_resources.extern.six.moves.urllib
import time:      1463 |       1463 |                   pkg_resources.extern.packaging.markers
import time:      8461 |      39353 |                 pkg_resources.extern.packaging.requirements
import time:      4606 |       4606 |                 sysconfig
import time:     52302 |     159948 |               pkg_resources
import time:       338 |     160286 |             astroid.interpreter._import.util
import time:      1009 |     162569 |           astroid.interpreter._import.spec
import time:       614 |        614 |             distutils.errors
import time:       426 |        426 |                 distutils.dep_util
import time:       355 |        355 |                   distutils.debug
import time:       504 |        504 |                   distutils.log
import time:       548 |       1406 |                 distutils.spawn
import time:       559 |       2391 |               distutils.util
import time:       889 |       3280 |             distutils.sysconfig
import time:       854 |       4746 |           astroid.modutils
import time:       463 |        463 |           astroid.transforms
import time:      3030 |     172688 |         astroid.manager
import time:      4830 |     178888 |       astroid.bases
import time:      8974 |       8974 |       astroid.nodes.const
import time:       737 |        737 |         astroid.nodes.as_string
import time:      1306 |       2042 |       astroid.nodes.node_ng
import time:      5136 |     240752 |     astroid.nodes.node_classes
import time:       467 |        467 |       astroid.interpreter.dunder_lookup
import time:      1416 |       1883 |     astroid.nodes.scoped_nodes
import time:      4791 |     247424 |   astroid.nodes
import time:        57 |         57 |       _ast
import time:       764 |        820 |     ast
import time:    177912 |     177912 |       astroid.raw_building
import time:       656 |     178567 |     astroid.helpers
import time:       569 |        569 |       astroid.arguments
import time:       972 |       1541 |     astroid.protocols
import time:      1415 |     182342 |   astroid.inference
import time:      1575 |       1575 |   astroid.astroid_manager
import time:      1293 |       1293 |     astroid.brain
import time:       712 |       2005 |   astroid.brain.helpers
import time:        76 |         76 |           typed_ast
import time:        53 |        128 |         typed_ast.ast3
import time:       655 |        783 |       astroid._ast
import time:      1616 |       2399 |     astroid.rebuilder
import time:       561 |       2959 |   astroid.builder
import time:       355 |        355 |   astroid.inference_tip
import time:       701 |        701 |   astroid.objects
import time:       305 |        305 |   astroid.brain.brain_numpy_utils
import time:     46840 |     509498 | astroid

Maybe we can create an issue directly in astroid to make importing faster ?

@DudeNr33
Copy link
Collaborator

Sorry I did not respond yet, I haven't found the time to look at it properly. But I trust your analysis, if there are no more low hanging fruits we should not keep this issue open "just in case", but rather create a new one if we find spots were concrete improvements can be done.

I also noticed that the astroid import is a large part of the startup time. I did not know about python -X importtime, great to learn something new. 👍 Why did I even bother writing a custom script for that. 😀 Creating a new issue in astroid is probably the best way to proceed.

@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.13.0 milestone Dec 29, 2021
@Pierre-Sassoulas
Copy link
Member

I reread prior comments, where you said the startup time was:

75% due to the import of astroid.

The import time from astroid now takes 92% of the total, I think it's safe to say, if we find area for improvement in pylint, it's probably going to be impactful only once we take care of astroid import time. I'm going to close this issue, thank you all who participated !

@Pierre-Sassoulas
Copy link
Member

I opened pylint-dev/astroid#1320 for the follow-up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

6 participants