Ruby: parallel processing made simple and fast

Run any code in parallel Processes (to use all CPUs) or Threads (to speed up blocking operations).
Best suited for map-reduce style work or, for example, parallel downloads/uploads.

Install

gem install parallel

Usage

# 2 CPUs -> work in 2 processes (a,b + c)
results = Parallel.map(['a','b','c']) do |one_letter|
  expensive_calculation(one_letter)
end

# 3 Processes -> finished after 1 run
results = Parallel.map(['a','b','c'], :in_processes=>3){|one_letter| ... }

# 3 Threads -> finished after 1 run
results = Parallel.map(['a','b','c'], :in_threads=>3){|one_letter| ... }

The same can be done with each

Parallel.each(['a','b','c']){|one_letter| ... }

or each_with_index or map_with_index

Processes/Threads are workers: each one grabs the next piece of work as soon as it finishes its current one.
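
The "workers grab the next piece of work" idea can be illustrated with a small sketch that uses only Ruby's standard library. This is not the gem's implementation; worker_pool_map is a hypothetical helper written for this example, and it assumes Ruby 2.3+ for Queue#close.

```ruby
# Hypothetical helper (not part of the parallel gem): map over items with a
# fixed number of worker threads that each pull the next index from a queue.
def worker_pool_map(items, worker_count, &block)
  queue = Queue.new
  items.each_index { |i| queue << i }
  queue.close # once drained, a closed queue makes pop return nil

  results = Array.new(items.size)
  workers = Array.new(worker_count) do
    Thread.new do
      # Each worker takes the next index as soon as it is free.
      while (index = queue.pop)
        results[index] = block.call(items[index])
      end
    end
  end
  workers.each(&:join)
  results
end

squares = worker_pool_map([1, 2, 3, 4, 5], 2) { |n| n * n }
```

Because the results are stored by index, the output keeps the order of the input even though the workers may finish in any order.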

Processes

  • Speedup through multiple CPUs
  • Speedup for blocking operations
  • Protects global data (each process works on its own copy)
  • Extra memory used (very low on REE through copy_on_write_friendly)
  • Child processes are killed when your main process is killed through Ctrl+c or kill -2

Threads

  • Speedup for blocking operations
  • Global data can be modified
  • No extra memory used
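
The difference in how Processes and Threads treat global data can be seen in a short standard-library sketch. It does not use the gem, the $counter variable is made up for illustration, and it assumes a platform where fork is available (so not Windows or JRuby).

```ruby
# Threads share memory with the main program, so their changes stay visible.
$counter = 0
Thread.new { $counter += 1 }.join
after_thread = $counter

# A forked process works on its own copy of the data. The child reports its
# value back through a pipe; the parent's $counter is left untouched.
reader, writer = IO.pipe
pid = fork do
  reader.close
  $counter += 100
  writer.write($counter.to_s)
  writer.close
end
writer.close
child_value = reader.read.to_i
reader.close
Process.wait(pid)
after_fork = $counter
```

Here the thread's increment is kept by the main program, while the child's change exists only inside the child process.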

ActiveRecord

Try any of these to get ActiveRecord (AR) working in parallel:

# reproducibly fixes things (spec/cases/map_with_ar.rb)
Parallel.each(User.all, :in_processes => 8) do |user|
  user.update_attribute(:some_attribute, some_value)
end
User.connection.reconnect!

# maybe helps: explicitly use connection pool
Parallel.each(User.all, :in_threads => 8) do |user|
  ActiveRecord::Base.connection_pool.with_connection do
    user.update_attribute(:some_attribute, some_value)
  end
end

# maybe helps: reconnect once inside every fork
Parallel.each(User.all, :in_processes => 8) do |user|
  @reconnected ||= User.connection.reconnect! || true
  user.update_attribute(:some_attribute, some_value)
end

Break

Parallel.map(User.all) do |user|
  raise Parallel::Break # -> stop all execution
end

Progress / ETA

Use the :finish or :start hook to get progress information. :start receives item and index; :finish receives item, index, and result.

gem install ruby-progressbar

require 'ruby-progressbar'
progress = ProgressBar.create(:title => "The Progress", :total => 100)
Parallel.map(1..100, :finish => lambda { |item, i, result| progress.increment }) { sleep 1 }

Tips

  • [Benchmark/Test] Disable threading/forking with :in_threads => 0 or :in_processes => 0. This is useful for testing performance or debugging parallel issues.

TODO

  • JRuby / Windows support: is it possible?

Authors

Contributors

Michael Grosser
michael@grosser.it
License: MIT