Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix #117] Add --parallel option #4272

Merged
merged 1 commit into from
Apr 12, 2017

Conversation

jonas054
Copy link
Collaborator

@jonas054 jonas054 commented Apr 12, 2017

Another stab at an old problem.

This change tries to solve the tricky parallel execution problem by spawning off a number of processes/threads to do file inspection, sharing the work between them, without collecting any output. When all processes are finished, the original process runs the full inspection again, taking advantage of result caching.

There's not a lot of specs added for parallel execution. I've run on RuboCop's own source with --force-default-config to see that I get the same offenses with and without -P.

With MRI there seems to be a speed gain of around 3 times when running on an 8 core machine. With JRuby and Rubinius, it's about 2 times.

@bbatsov said in #3794

I think we also have some cops that depend on the order other cops are executed

With the solution proposed here, this would not matter. Files are inspected in parallel, but cops are being run in sequence on a given file.

and a few cops that work on all processed files that were added afterwards - I'll have to double check this.

I have looked but have not found any evidence for tricky cops like that, and I have no memory of them being added. Let's hope I'm right about that, because otherwise I don't think there's any easy way to parallelize the execution.

This change tries to solve the tricky parallel execution problem
by spawning off a number of processes/threads to do file inspection,
sharing the work between them, without collecting any output.
When all processes are finished, the original process runs the
full inspection again, taking advantage of result caching.
it 'prints a warning' do
cli.run ['-P']
expect($stderr.string)
.to include('Process.fork is not supported by this Ruby')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this happen using JRuby as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No on JRuby the parallel gem uses threads instead of processes.

@rrosenblum
Copy link
Contributor

This change tries to solve the tricky parallel execution problem by spawning off a number of processes/threads to do file inspection, sharing the work between them, without collecting any output. When all processes are finished, the original process runs the full inspection again, taking advantage of result caching.

Can we apply a similar concept to running the cops in parallel and then merge the result when each one finishes to avoid having to run rely on the cache and running checks twice?

@bbatsov bbatsov merged commit 1551c24 into rubocop:master Apr 12, 2017
@bbatsov
Copy link
Collaborator

bbatsov commented Apr 12, 2017

👍 Great work, @jonas054! I love the simplicity or your solution of the problem!

@jonas054
Copy link
Collaborator Author

Can we apply a similar concept to running the cops in parallel and then merge the result when each one finishes to avoid having to run rely on the cache and running checks twice?

Perhaps, but I think that coming up with a solution that does everything we wish for is exceedingly difficult. Otherwise we wouldn't be closing in on the four year anniversary of the issue saying that we'd like to run RuboCop in parallel -- without any successful resolution. 😄

What I'm trying to do here is to set the bar a bit lower, in order to get something that works.

Glad to see @bbatsov agreeing.

@jonas054 jonas054 deleted the 117_add_parallel_option branch April 12, 2017 20:21
@cheerfulstoic
Copy link

Sorry if this was mentioned somewhere else: Why is --parallel mutually exclusive to --auto-correct?

@jonas054
Copy link
Collaborator Author

The error message says -P/--parallel uses caching to speed up execution, while --auto-gen-config needs a non-cached run, so they cannot be combined.

A more detailed explanation is that it might be possible to get parallel configuration generation to work, but it would require some extra logic, and I didn't think it worth the effort when I made the implementation. I wanted something simple. Anyone who's interested is most welcome to give the matter some thought and let the rest of us know what you came up with. 😄

@cheerfulstoic
Copy link

Awesome, thanks for the contribution! We were super blown away when we found it. I'm just glad to know that parallel and auto-correct aren't fundamentally impossible together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants