Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor file checking to allow parallel linting in Prospector #3016

Merged
merged 1 commit into from
Oct 18, 2019

Conversation

janneronkko
Copy link

@janneronkko janneronkko commented Jul 20, 2019

Description

The PyLinter.check and related methods are refactored to remove copy-paste code and to allow simpler parallel linting implementation.

PyLinter is made pickleable so that the configured linter can be passed to worker processes (even when not using fork method) so custom PyLinter objects can be used when linting parallel. For example, Prospector, will benefit from this. See landscapeio/prospector#320

Type of Changes

Type
🔨 Refactoring

@coveralls
Copy link

coveralls commented Jul 20, 2019

Coverage Status

Coverage increased (+0.2%) to 89.836% when pulling 21dc875 on jannero:parallel into 662c9e4 on PyCQA:master.

@janneronkko
Copy link
Author

I got some encoding errors when trying to run the tests in a Windows VM but the same error can be reproduced on Linux by setting multiprocessing start type to spawn:

c = multiprocessing.get_context('spawn')
with c.Pool(...)...

So PyLinter should be made picklable and the easiest way would be to split the linting implementation out from PyLinter class into separate class that would contain only the required functionality for linting, i.e. all configuration etc. would be left into PyLinter.

@janneronkko janneronkko changed the title Refactor parallel linting Refactor file checking Jul 23, 2019
@janneronkko
Copy link
Author

I updated this PR to contain only the refactoring changes that cleanup copy-paste code and other things to allow easier refactoring of parallel linting.

Parallel linting refactoring needs a bit more thinking so I decided to get the changes required for the actual refactoring merged back so there would be no conflicts due to changes in PyLint master

@janneronkko
Copy link
Author

I tried to split PyLinter into two classes where one would take care of configurations etc. and the second would contain linting logic but that proved to be a very big change.

Instead I was able to introduce couple quite small changes that made PyLinter objects pickleable. I will update this PR as soon as I get the changes polished a bit.

@janneronkko janneronkko changed the title Refactor file checking Refactor file checking to allow parallel linting in Prospector Jul 25, 2019
@janneronkko
Copy link
Author

If it makes reviewing easier, I can split this PR into multiple PRs. Also note that each commit in this PR is a atomic change and all commits can be reviewed one by one to be able to see why different changes were made.

@janneronkko
Copy link
Author

The commits in this PR were rebased on top of commi 41c9522 containing conflicting changes, i.e. the conflicts are now resolved.

@janneronkko
Copy link
Author

Rebased the changes once more as there was conflicts in the changes merged to master

@janneronkko
Copy link
Author

@PCManticore I have been resolving conflicts from code pushed to master couple of times and would like to know if this PR is something you are even considering pulling?

Is there something I could do to make reviewing easier or help in some other way to get this PR merged?

Copy link
Contributor

@AWhetter AWhetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to see this area of the code getting some love! Thanks for the change.

My only request is please could you add some more complete docstrings and/or type hints on the new methods that you've added.

pylint/lint.py Outdated
linter.reporter = None

# The linter is inherited by all the pool's workers, i.e. the linter
# is identical to the linter object here. This is requirde so that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial: Typo here for "required".

@janneronkko
Copy link
Author

Fixed the typo and added docstrings.

Anything else you would like to get changed and/or updated?

@janneronkko
Copy link
Author

Is there anyyhing I can do to make this PR move forward?

@AWhetter
Copy link
Contributor

@PCManticore I've resolved the conflicts and squashed all the commits. Did you want to review before we merge?

chocoelho referenced this pull request in landscapeio/prospector Oct 15, 2019
Copy link
Contributor

@PCManticore PCManticore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff @jannero ! This looks good to go, can you solve the conflicts and move the changelog to 2.5, same for the documentation?

@@ -15,7 +15,17 @@ Release date: TBA
What's New in Pylint 2.4.3?
===========================

Release date: TBA
* Allow parallel linting when run under Prospector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to the 2.5 section instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pylint/lint.py Outdated
# The linter is inherited by all the pool's workers, i.e. the linter
# is identical to the linter object here. This is requred so that
# a custom PyLinter object (inherited from PyLinter) can be used.
# See https://github.com/PyCQA/prospector/issues/320
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a link to prospector's issue tracker adds any value here, can we remove and expand the comment if needed more context instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pylint/lint.py Outdated
def _check_astroid_module(self, ast_node, walker, rawcheckers, tokencheckers):
"""Check given AST node with given walker and checkers

ast_mode: AST node of the module to check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we switch to restructuredText instead? It's most used structured documentation format used in the codebase.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

This change prepares the code for enabling Prospector to take advantage
of running PyLint parallel.

Iterating files is moved into generator (_iterate_file_descrs) so that
parallel checking can use the same implementation (_check_file) just
by providing different kind of generator that reads the files from parent
process.

The refactoring removes code duplication that existed in PyLinter._do_check
method; checking module content from stdin had identical implementation to
checking content from a source file.

Made PyLinter.expand_files a private method.

The previous implementation of parallel linting created new PyLinter
objects in the worker (child) process causing failure when running under
Prospector because Prospector uses a custom PyLinter class
(a class inherited from PyLinter)
and PyLint naturally just creates PyLinter object. This caused linting to
fail because there is options for Prospector's IndentChecker which was not
created in the worker process.

The new implementation passes the original PyLinter object into workers
when the workers are created. See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

Note that as Windows uses spawn method by default, PyLinter object (and
its) members need to be pickleable from now on with the exception being
PyLinter.reporter which is not passed to child processes.

The performance has remained about the same based on quick tests done with
Django project containing about 30 000 lines of code; with the old
implementation linting took 26-28 seconds with 8 jobs on quad core i7 and
24-27 seconds with the new implementation.
@PCManticore
Copy link
Contributor

Thank you @jannero !

@s0undt3ch
Copy link
Contributor

Does this solve running in parallel with load_plugins?
Couldn't find an issue pointing to the parallel and load_plugins limitation. Is there one?

@jfly
Copy link
Contributor

jfly commented Dec 7, 2021

@s0undt3ch I don't know if there any issues, but I've documented my understanding of the situation here: #4874 (comment). In short: on macOS with Python 3.8+ and the changes introduced by this PR, astroid transformation plugins won't work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants