[Fix #117] Add --parallel option #4272

jonas054 · 2017-04-12T18:48:56Z

Another stab at an old problem.

This change tries to solve the tricky parallel execution problem by spawning off a number of processes/threads to do file inspection, sharing the work between them, without collecting any output. When all processes are finished, the original process runs the full inspection again, taking advantage of result caching.
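The warm-up idea described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: `inspect_source`, `ResultCache`, `warm_cache` and `run_with_warm_cache` are hypothetical names, threads stand in for the processes/threads that the real change spawns, and the cache layout is an assumption.

```ruby
require 'digest'
require 'json'
require 'tmpdir'

# Hypothetical stand-in for inspecting one file; not RuboCop's API.
def inspect_source(source)
  source.each_line.with_index(1)
        .select { |line, _| line.chomp.length > 20 }
        .map { |_, number| "line #{number}: too long" }
end

# Hypothetical file-backed result cache, keyed by a digest of the source.
class ResultCache
  def initialize(dir)
    @dir = dir
  end

  # Returns the cached result, or computes it with the block and stores it.
  def fetch(source)
    path = File.join(@dir, Digest::SHA1.hexdigest(source))
    return JSON.parse(File.read(path)) if File.exist?(path)

    result = yield
    tmp = "#{path}.#{Thread.current.object_id}"
    File.write(tmp, JSON.generate(result))
    File.rename(tmp, path) # publish the finished file in one step
    result
  end
end

# Step 1: workers share the files and only fill the cache; output is dropped.
def warm_cache(sources, cache, workers: 4)
  queue = Queue.new
  sources.each { |source| queue << source }
  queue.close # pop returns nil once the queue is drained
  Array.new(workers) do
    Thread.new do
      while (source = queue.pop)
        cache.fetch(source) { inspect_source(source) }
      end
    end
  end.each(&:join)
end

# Step 2: the original process runs the full inspection, now served by the cache.
def run_with_warm_cache(sources, cache)
  warm_cache(sources, cache)
  misses = 0
  report = sources.map do |source|
    cache.fetch(source) do
      misses += 1
      inspect_source(source)
    end
  end
  [report, misses]
end

Dir.mktmpdir do |dir|
  sources = ["short\n", "#{'x' * 30}\n"]
  report, misses = run_with_warm_cache(sources, ResultCache.new(dir))
  puts "cache misses in final pass: #{misses}"
  puts report.inspect
end
```

The final pass does no inspection work of its own in this sketch, which is why collecting output from the workers is unnecessary.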

There aren't many specs added for parallel execution. I've run on RuboCop's own source with --force-default-config to check that I get the same offenses with and without -P.

With MRI there seems to be a speed gain of around 3 times when running on an 8 core machine. With JRuby and Rubinius, it's about 2 times.

@bbatsov said in #3794

I think we also have some cops that depend on the order other cops are executed

With the solution proposed here, this would not matter. Files are inspected in parallel, but cops are being run in sequence on a given file.

and a few cops that work on all processed files that were added afterwards - I'll have to double check this.

I have looked but have not found any evidence for tricky cops like that, and I have no memory of them being added. Let's hope I'm right about that, because otherwise I don't think there's any easy way to parallelize the execution.
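To illustrate the ordering point above, here is a small sketch in which files are spread across worker threads while the cops for any one file still run in a fixed sequence. The cops, `inspect_file` and `inspect_all` are hypothetical and only model the idea; they are not RuboCop's classes.

```ruby
# Hypothetical cops: each takes the source and the offenses found so far.
COPS = [
  ->(source, _seen) { source.include?("\t") ? ['tab used'] : [] },
  # This cop depends on running after the first one.
  ->(_source, seen) { seen.empty? ? [] : ["#{seen.size} earlier offense(s)"] }
].freeze

# Cops run one after another on a single file, always in the same order.
def inspect_file(source)
  COPS.each_with_object([]) do |cop, offenses|
    offenses.concat(cop.call(source, offenses))
  end
end

# Files are handed out to worker threads; results are merged under a lock.
def inspect_all(sources, workers: 3)
  results = {}
  lock = Mutex.new
  queue = Queue.new
  sources.each_key { |name| queue << name }
  queue.close
  Array.new(workers) do
    Thread.new do
      while (name = queue.pop)
        offenses = inspect_file(sources[name])
        lock.synchronize { results[name] = offenses }
      end
    end
  end.each(&:join)
  results
end

sources = { 'a.rb' => "x\ty", 'b.rb' => 'xy', 'c.rb' => "\t" }
parallel = inspect_all(sources)
sequential = sources.transform_values { |source| inspect_file(source) }
puts parallel == sequential
```

Because the order-dependent cop only ever sees offenses from its own file, the per-file result does not change with the number of workers. A cop that needed data from all processed files would not fit this model.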

rrosenblum · 2017-04-12T19:39:29Z

spec/rubocop/cli/cli_options_spec.rb

+        it 'prints a warning' do
+          cli.run ['-P']
+          expect($stderr.string)
+            .to include('Process.fork is not supported by this Ruby')


Won't this happen using JRuby as well?

No, on JRuby the parallel gem uses threads instead of processes.
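As a rough sketch of the distinction discussed here: Ruby reports whether `fork` is available through `Process.respond_to?(:fork)`. The helper names `worker_model` and `warn_if_unsupported` are hypothetical, and how the warning is actually triggered in this PR is not shown in the excerpt above, so treat the logic as an assumption.

```ruby
require 'stringio'

# Hypothetical helper: decide how workers would run on the current Ruby.
# The keyword defaults query the running interpreter; tests can override them.
def worker_model(can_fork: Process.respond_to?(:fork), engine: RUBY_ENGINE)
  if engine == 'jruby'
    :threads # per the discussion, the parallel gem uses threads on JRuby
  elsif can_fork
    :processes
  else
    :unsupported
  end
end

# Hypothetical helper: print the warning quoted in the spec when neither
# processes nor threads are an option. Returns true if a warning was printed.
def warn_if_unsupported(model, io: $stderr)
  return false unless model == :unsupported

  io.puts 'Process.fork is not supported by this Ruby'
  true
end

out = StringIO.new
warn_if_unsupported(worker_model(can_fork: false, engine: 'ruby'), io: out)
puts out.string
```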

rrosenblum · 2017-04-12T19:44:12Z

This change tries to solve the tricky parallel execution problem by spawning off a number of processes/threads to do file inspection, sharing the work between them, without collecting any output. When all processes are finished, the original process runs the full inspection again, taking advantage of result caching.

Can we apply a similar concept to running the cops in parallel, and then merge the results as each one finishes, to avoid having to rely on the cache and run the checks twice?

bbatsov · 2017-04-12T20:16:41Z

👍 Great work, @jonas054! I love the simplicity of your solution to the problem!

jonas054 · 2017-04-12T20:20:55Z

Can we apply a similar concept to running the cops in parallel, and then merge the results as each one finishes, to avoid having to rely on the cache and run the checks twice?

Perhaps, but I think that coming up with a solution that does everything we wish for is exceedingly difficult. Otherwise we wouldn't be closing in on the four-year anniversary of the issue saying that we'd like to run RuboCop in parallel, without any successful resolution. 😄

What I'm trying to do here is to set the bar a bit lower, in order to get something that works.

Glad to see @bbatsov agreeing.

cheerfulstoic · 2017-07-24T19:37:31Z

Sorry if this was mentioned somewhere else: Why is --parallel mutually exclusive with --auto-correct?

jonas054 · 2017-07-25T11:30:41Z

The error message says -P/--parallel uses caching to speed up execution, while --auto-gen-config needs a non-cached run, so they cannot be combined.

A more detailed explanation is that it might be possible to get parallel configuration generation to work, but it would require some extra logic, and I didn't think it worth the effort when I made the implementation. I wanted something simple. Anyone who's interested is most welcome to give the matter some thought and let the rest of us know what you came up with. 😄

cheerfulstoic · 2017-07-25T12:16:08Z

Awesome, thanks for the contribution! We were super blown away when we found it. I'm just glad to know that parallel and auto-correct aren't fundamentally impossible together.

rrosenblum reviewed Apr 12, 2017

bbatsov merged commit 1551c24 into rubocop:master Apr 12, 2017

jonas054 deleted the 117_add_parallel_option branch April 12, 2017 20:21

jonas054 mentioned this pull request Apr 17, 2017

More illegal combinations with --parallel #4284

Merged

searls mentioned this pull request Nov 17, 2018

Is autocorrect _actually_ incompatible with parallel? #6495

Closed

texpert mentioned this pull request Oct 25, 2020

Add possibility to run Rubocop in parallel mode funbox/face_control#16

Closed
