Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No pylint error messages when multiple pylint jobs used #320

Closed
seanh opened this issue Feb 27, 2019 · 30 comments
Closed

No pylint error messages when multiple pylint jobs used #320

seanh opened this issue Feb 27, 2019 · 30 comments

Comments

@seanh
Copy link

seanh commented Feb 27, 2019

I can add jobs: 8 to the pylint section of my .prospector.yaml and it runs much faster but no pylint errors appear in prospector's output

@seanh
Copy link
Author

seanh commented Feb 27, 2019

Adding the --direct-tool-stdout option it looks like pylint or prospector is printing a lot of internal error with sending report for module ['*.py'] 'indent-strict-spaces' errors when more than one pylint job is used

@carlio
Copy link
Contributor

carlio commented Feb 27, 2019

You can also try the --die-on-tool-error flag (or -X for short) in case there are exceptions not being correctly handled however that case should result in messages stating that.

@seanh
Copy link
Author

seanh commented Feb 27, 2019

Tried --die-on-tool-error but it doesn't seem to crash

@carlio
Copy link
Contributor

carlio commented Feb 27, 2019

Ok, I'll try reproducing it after putting out the fires caused by the pylint 2.3 release :-)

@seanh
Copy link
Author

seanh commented Feb 27, 2019

Thanks!

@chocoelho
Copy link
Contributor

IIRC, we still don't support jobs usage. There's still a way to capture the results from pylint multithreaded run, I started working on that a while ago but my time got limited on that, thus leading me to put that aside for a while.

@seanh
Copy link
Author

seanh commented Feb 27, 2019

@chocoelho How hard would it be to add jobs support? Any pointers? I think it'd reduce the time to run prospector on my codebase locally to about half of what it is now. I'm sure my developers would appreciate it.

Prospector 1.1.6.2 takes just over a minute to run on my codebase in a Python 3.6 venv. Running pylint (2.1.1, astroid 2.0.4) from this same venv on its own takes over 40s. So most of prospector's time must be spent waiting for pylint. With -j 8 pylint goes from 40s to 12s.

@janneronkko
Copy link

I would also like to see jobs support for Pylint under Prospector.

I took a look at the code and the reason for Pylint to fail is the following:

  • Pylint starts worker processes (via multiprocessing) which create new PyLinter objects
  • The PyLinter object in worker process does not have Prospectors prospector.tools.pylint.indent_checker.IndentChecker registered

Now when the worker process starts, pylint.config.OptionsManagerMixIn.load_configuration_from_config gets called with configuration dict containing, among others, indent-strict-tabs defined by IndentChecker. As PyLint does not know anything about Prospector's IndentChecker, the _all_options member of OptionsManagerMixin does not contain indent-strict-tabs and indexing results in KeyError

With a quick look, Prospector seems to do all kind of initialization for for PyLinter in prospector.tools.pylint.init.PylintTool which would need to be done inside PyLint's worker process. This means that getting PyLint's jobs argument to work with Prospector looks quite hard.

@chocoelho Did you have a design how to implement jobs support?

To me it looks like it would be easier to implement jobs support for PyLint inside Prospector on Prospector's side.

@chocoelho
Copy link
Contributor

@jannero as this was a while ago, so I just had the time to only check how Pylint implemented it's jobs settings and play a bit with the current checker implementation on prospector side, I didn't have the time to go further so this kinda got lost in my workspace when I switched machines. I'd really appreciate to review and test a PR for that if you're up to 😃

The configuration inside PylintTool is basically setting up the pylint tool for the run and handling the output after pylint process runs. One place to start is here: https://github.com/PyCQA/prospector/blob/master/prospector/tools/pylint/__init__.py#L253 (meaning, when pylint jobs is on, https://github.com/PyCQA/prospector/blob/master/prospector/tools/pylint/__init__.py#L257 returns empty as this is being called before the actual jobs are finished).

@janneronkko
Copy link

Ok, I had the time to do a proof of concept and it seems to work.

I used one of my projects as test and with single core implementation (the original) it took around 110 seconds to run PyLint through Prospector. With the changes in janneronkko@6236ad9 it took about 38 seconds

The change is not finished and therefore I did not create PR for this; I wanted to ask if you would be willing to pull this kind of implementation? @chocoelho

PyLint also does parallel checks file by file so in that sense this implementation is pretty similar.

Some things that should be done before the change is ready

  • test the change with supported Python version (I've only tested on Python 3.7)
  • test the change on Windows (I've only tested on Archlinux)
  • remove _self.args as that is not used at all (see note below)

Note about self._args
I think the code between https://github.com/jannero/prospector/blob/parallel-lint/prospector/tools/pylint/__init__.py#L122 and https://github.com/jannero/prospector/blob/parallel-lint/prospector/tools/pylint/__init__.py#L147 can be removed

@janneronkko
Copy link

I can also check if PyLint could be fixed (easily) so you would not need to use multiprocessing on Prospector's side.

That would be a better solution but most likely it requires bigger changes than the one already made.

@chocoelho
Copy link
Contributor

@jannero that's awesome! I'm slowly coming back again as my time allows, but I'd more than happy to test it out. I'll probably do it over the weekend and feel free to open a draft PR for the time being, while the feature is developed.

@janneronkko
Copy link

I created PR for refactoring: pylint-dev/pylint#3015

If you like, I can continue working on top of that commit until linting parallel works also under Prosector or you could just merge that refactoring now and I'll submit more changes as separate PR.

@janneronkko
Copy link

I closed the original PR and created new one containing the refactoring in the previous PR (3015) and parallel linting refactoring that allow Prospector to run PyLint parallel.

Should there be a change in Prospector to run PyLint parallel by default?

@chocoelho
Copy link
Contributor

Better to keep it separate. Makes it easier to review. As to the default, keep the current behavior as default, parallels as optional.

@carlio feel like this should be part of 1.2.0?

@chocoelho
Copy link
Contributor

Ah, noticed now that this involves @PyCQA/pylint-dev as well, once they dealt with it we can see it going into prospector :) hopefully with few changes.

@janneronkko
Copy link

If the change gets merged into PyLint, there is no required changes to Prospector. It is enough to set jobs option in profile YAML:

pylint:
  options:
    jobs: 8

So I just asked if I (or someone else) should create PR for changing the default behaviour of PyLint in Prospector to use parallel linting by default.

@janneronkko
Copy link

Also, in my opinion, there should maybe be an special value auto stating that automatic amount of parallel linters were used. PyLint implements this with a magic value of 0 but that does not work for Prospector out of the box as the magic value is handled in PyLint's Run class that is not used by Prospector.

@janneronkko
Copy link

Now the PyLint PR 3016 (pylint-dev/pylint#3016) contains changes that also works with spawn method, i.e. parallel linting should work also on Windows.

I have tested the change with two of my own projects and the results look to be fine though I have not (yet) checked messages one by one.

@chocoelho
Copy link
Contributor

So I just asked if I (or someone else) should create PR for changing the default behaviour of PyLint in Prospector to use parallel linting by default.

Please, do it. I want to check the changes, and maybe we can change the way it is now no our side to make things easier.

Also, in my opinion, there should maybe be an special value auto stating that automatic amount of parallel linters were used. PyLint implements this with a magic value of 0 but that does not work for Prospector out of the box as the magic value is handled in PyLint's Run class that is not used by Prospector.

That'd be put into prospector configuration file as an option?

@janneronkko
Copy link

I will do a PR for these once the changes to PyLint are merged.

The auto value would go to Prospector profile file under pylint/options like you suspected:

pylint:
  options:
    jobs: auto

Or you could just specify the exact number:

pylint:
  options:
    jobs: 8

Prospector would check if the value is auto and change that to a number (most likely the number of cores on current machine).

@chocoelho chocoelho added this to the 1.2.0 milestone Jul 26, 2019
@janneronkko
Copy link

The PyLint PR (pylint-dev/pylint#3016) was merged and the fix is going to be included in PyLint 2.5.0

I already have the configuration changes to support jobs=auto but while testing PyLint from master I got a lot of duplicate messages (I updated the used astroid version to one required by PyLint). Is this expected? How are you testing new PyLint versions?

@janneronkko
Copy link

There are the required changes to support auto option for PyLint in https://github.com/jannero/prospector/commits/parallel-pylint

Add support for auto option: janneronkko@c815091
Make auto option default: janneronkko@4f24900

@chocoelho
Copy link
Contributor

Thanks @jannero, I was trying to keep an eye in that PR from time to time. As I'm back from vacations now, I should be taking a look at it in the next few days. The duplicated messages is not an expected behavior, I'll be taking care of that.

@chocoelho
Copy link
Contributor

@jannero I created a branch here and cherry-picked your commits, thanks for that. Meanwhile, I'm adding more tests for the pylint tool in order to improve reliability on how prospector integrates it.

@rik
Copy link
Contributor

rik commented May 25, 2020

I looked at the commits on pylint-jobs.

Pylint supports passing 0 for autodetection. Maybe we should pass 0 to rely on Pylint's logic. It is a bit more evolved than what the branch does. It would also mean that any improvement in pylint's detection would benefit prospector. And we don't need a new auto value in the config files, making it simpler to document prospector.

@rik
Copy link
Contributor

rik commented May 25, 2020

Hmm, turns out the way prospector uses pylint does not allow 0.

Traceback (most recent call last):
  File "/Users/rik24d/.local/share/virtualenvs/i-make-reviews-sygjxbSg/bin/prospector", line 8, in <module>
    sys.exit(main())
  File "/Users/rik24d/.local/share/virtualenvs/i-make-reviews-sygjxbSg/lib/python3.7/site-packages/prospector/run.py", line 173, in main
    prospector.execute()
  File "/Users/rik24d/.local/share/virtualenvs/i-make-reviews-sygjxbSg/lib/python3.7/site-packages/prospector/run.py", line 66, in execute
    messages += tool.run(found_files)
  File "/Users/rik24d/.local/share/virtualenvs/i-make-reviews-sygjxbSg/lib/python3.7/site-packages/prospector/tools/pylint/__init__.py", line 267, in run
    self._linter.check(self._args)
  File "/Users/rik24d/.local/share/virtualenvs/i-make-reviews-sygjxbSg/lib/python3.7/site-packages/pylint/lint/pylinter.py", line 878, in check
    files_or_modules,
  File "/Users/rik24d/.local/share/virtualenvs/i-make-reviews-sygjxbSg/lib/python3.7/site-packages/pylint/lint/check_parallel.py", line 91, in check_parallel
    with multiprocessing.Pool(jobs, initializer=initializer, initargs=[linter]) as pool:
  File "/Users/rik24d/.pyenv/versions/3.7.7/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/Users/rik24d/.pyenv/versions/3.7.7/lib/python3.7/multiprocessing/pool.py", line 169, in __init__
    raise ValueError("Number of processes must be at least 1")

@rik
Copy link
Contributor

rik commented May 25, 2020

In terms of API, I think we should use 0 instead of auto to be consistent with pylint.

@chocoelho
Copy link
Contributor

Closing this issue as this is now fixed in the next release: 1.3.0

@simonschmidt
Copy link

I see similar behavior with 1.3.1, but only some errors get lost.

I can reproduce on this repository on current master 9a8c0e365f1d9efb19797a8f45edb6f4ea6b8818

prospector --tool pylint > output_normal.log

# Manually edit .prospector.yml, add options section to pylint with `jobs: 8`
# ...

prospector --tool pylint > output_jobs.log

# For cleaner diff, grab mentioned files, the lines all start with prospector/
grep prospector/ output_normal.log | sort > files_normal.log
grep prospector/ output_jobs.log | sort > files_jobs.log

diff files_normal.log files_jobs.log
21d20
< prospector/tools/pylint/indent_checker.py

--
I've been able to reproduce in other repositories as well by removing a few pylint: disable=.. statements and comparing the output similarly to above.

I have not found any pattern as to what errors get dropped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants