Spawning child-process when using multiprocessing is very slow #6967

Closed
Pierre-Sassoulas opened this issue Jun 17, 2022 · 1 comment
Labels
Duplicate 🐫 Duplicate of an already existing issue High effort 🏋 Difficult solution or problem to solve multiprocessing

Comments

@Pierre-Sassoulas
Member

Pierre-Sassoulas commented Jun 17, 2022

Bug description

See: #6965 (comment)

I've noticed that spawning the child processes is extremely slow - about 1-2 processes per second. So, 60-way parallelism generally means that there is about a 30 s delay while things spin up, then almost no time spent doing work. This slow startup time was happening despite my CPUs being about 70% idle. This slow startup time presumably explains the bug I saw about how ineffectual multiprocessing was for pylint. You would need a huge batch of files to justify doing significant multi-processing.

I don't know whether the slow startup is a bug in multiprocessing or in pylint. I just measured some of our presubmits and 60-way parallelism more than doubles the time that they take, from ~30 to ~70 s.

Possible solution in #6965 (comment)

We would need a complete rewrite of the parallel code. We currently spin up a new PyLinter class for every job. That is taking way too long (probably). But I'm not sure what the best approach is to create a PyLinterLite...

I personally think that a refactor of PyLinter will be required, and we'd have to classify checkers to know whether they can benefit from multiprocessing. duplicate-code or cyclic-import won't, for example, as they need information on the imports of a file. Some checks are file-based, like unused-private-member (the scope is a single class) or while-used (it just has to check whether a while node exists), and can benefit from multiprocessing if done at the right time.
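To make the classification idea concrete, here is a minimal sketch of how checks could be split into those that can run per file and those that need the whole project. This is not pylint's API: `Scope`, `CHECK_SCOPE` and `partition_checks` are hypothetical names, and the table only contains the four checks mentioned above.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Scope(Enum):
    FILE = "file"        # the check only needs a single module
    PROJECT = "project"  # the check needs information across modules


# Hypothetical table: only the four checks named in this comment are listed.
CHECK_SCOPE: Dict[str, Scope] = {
    "unused-private-member": Scope.FILE,
    "while-used": Scope.FILE,
    "duplicate-code": Scope.PROJECT,
    "cyclic-import": Scope.PROJECT,
}


def partition_checks(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split check names into (parallelizable, serial) lists."""
    parallel: List[str] = []
    serial: List[str] = []
    for name in names:
        # Unknown checks are treated conservatively as project-wide.
        if CHECK_SCOPE.get(name, Scope.PROJECT) is Scope.FILE:
            parallel.append(name)
        else:
            serial.append(name)
    return parallel, serial
```

Defaulting unknown checks to the serial group is a deliberate assumption: a check wrongly run per file could miss cross-file information, while a check wrongly run serially only costs time.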

Configuration

We should use a full configuration for this, with a lot to parse: we're probably parsing the configuration in each fork, and a large configuration would make that cost apparent.

Command used

``lint.Run(['--jobs', '42'] + argv)``
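To compare run times across job counts, something like the following could be used. It is only a sketch: `time_job_counts` and `run_pylint` are hypothetical helper names, and the only pylint call assumed is `lint.Run(args, exit=False)`, which keeps the interpreter alive between runs.

```python
import time
from typing import Callable, Dict, Iterable, List


def time_job_counts(
    run: Callable[[List[str]], object],
    argv: List[str],
    job_counts: Iterable[int],
) -> Dict[int, float]:
    """Return wall-clock seconds taken by run() for each --jobs value."""
    timings: Dict[int, float] = {}
    for jobs in job_counts:
        start = time.perf_counter()
        run(["--jobs", str(jobs)] + argv)
        timings[jobs] = time.perf_counter() - start
    return timings


def run_pylint(args: List[str]) -> None:
    """Run pylint in-process without exiting the interpreter."""
    # Imported lazily so time_job_counts can be used without pylint installed.
    from pylint import lint

    lint.Run(args, exit=False)
```

For example, `time_job_counts(run_pylint, argv, [1, 2, 8, 42])` would return a mapping from job count to elapsed seconds for the same `argv` as in the command above.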

Expected behavior

Run time decreases with more cores (when there are more files to lint than cores available).

Pylint version

2.14.2
@Pierre-Sassoulas Pierre-Sassoulas added High effort 🏋 Difficult solution or problem to solve multiprocessing labels Jun 17, 2022
@Pierre-Sassoulas Pierre-Sassoulas changed the title Multiprocessing is not very efficient Spawning child-process when using multiprocessing is very slow Jun 17, 2022
@DanielNoord
Collaborator

Duplicate of #2525 😄

@DanielNoord DanielNoord closed this as not planned Won't fix, can't repro, duplicate, stale Jun 17, 2022
@jacobtylerwalls jacobtylerwalls added the Duplicate 🐫 Duplicate of an already existing issue label Jun 25, 2022