MiniMapReduce is a library that provides a simple DSL for data-processing operations in Ruby. It is not meant to be a fast engine or to compete with Hadoop & friends. Rather, it's meant to be an expressive way to process data in your favorite programming language. Use it to explore and prototype.
Add this line to your application's Gemfile:
gem 'mini-map-reduce'
And then execute:
$ bundle
Or install it yourself as:
$ gem install mini-map-reduce
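MiniMapReduce is invoked by creating a pipeline with MiniMapReduce.process and, within that pipeline, declaring seeds, maps, and reductions. The examples below also use translate to post-process each reduced result and dump to output it.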
An example is worth a thousand words, so here goes:
require 'mini-map-reduce'

comments = [
  {author: 'kenneth', score: 20},
  {author: 'joe', score: 2},
  {author: 'kenneth', score: 19},
  {author: 'bob', score: 33}
]

MiniMapReduce.process do
  # feed one comment at a time; pop returns nil once the array is empty
  seed { comments.pop }

  # group the scores by author
  map {|comment| emit(comment[:author], :score => comment[:score], :count => 1) }

  # sum scores and counts per author
  reduce do |author, entries|
    result = {score: 0, count: 0}
    entries.each do |e|
      result[:score] += e[:score]
      result[:count] += e[:count]
    end
    result
  end

  translate do |id, result|
    # to_f avoids integer division truncating the average (39 / 2 => 19)
    result[:average] = result[:score].to_f / result[:count]
    result
  end

  dump {|id,r| puts "#{id} averages #{r[:average]}" }
end
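To make the stages above concrete, here is a plain-Ruby sketch of what such a pipeline conceptually computes on the same comments data. It does not use the gem and is not MiniMapReduce's implementation: `run_pipeline` is a hypothetical helper written only for this illustration, and the map step returns a `[key, value]` pair instead of calling `emit`.

```ruby
# Plain-Ruby sketch of a seed -> map -> reduce -> translate -> dump flow.
# run_pipeline is a hypothetical helper, not part of MiniMapReduce.
def run_pipeline(seed:, map:, reduce:, translate:)
  groups = Hash.new { |hash, key| hash[key] = [] }

  # seed: keep pulling records until the block returns nil
  while (record = seed.call)
    # map: each record yields one [key, value] pair, collected per key
    key, value = map.call(record)
    groups[key] << value
  end

  # reduce + translate: collapse each group, then post-process the result
  groups.each_with_object({}) do |(key, values), results|
    results[key] = translate.call(key, reduce.call(key, values))
  end
end

comments = [
  { author: 'kenneth', score: 20 },
  { author: 'joe',     score: 2 },
  { author: 'kenneth', score: 19 },
  { author: 'bob',     score: 33 }
]

averages = run_pipeline(
  seed:      -> { comments.pop },
  map:       ->(c) { [c[:author], { score: c[:score], count: 1 }] },
  reduce:    ->(_author, values) {
    values.reduce { |a, b| { score: a[:score] + b[:score], count: a[:count] + b[:count] } }
  },
  translate: ->(_author, r) { r.merge(average: r[:score].to_f / r[:count]) }
)

# dump: print one line per key
averages.each { |author, r| puts "#{author} averages #{r[:average]}" }
```

With float division, kenneth's average comes out as 19.5; integer division would truncate it to 19. Note that the seed empties the `comments` array as it goes, since `pop` removes each element it returns.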
A more complex example:
require 'json'
require 'mini-map-reduce'

# let's say cars.json contains records which look like this:
# {"name":"Audi R8","country":"DE","score":20,"votes":3}
data = File.open('cars.json', 'r')
out = File.open('out.json', 'w')

# let's say we want to find out which country makes the best cars
MiniMapReduce.process do
  # read every line from the data file, and use it as seed data
  # (the parentheses matter: without them, line is still nil when parsed)
  seed { (line = data.gets) ? JSON.parse(line) : nil }

  # map data to emit scores by country (essentially a group by)
  map do |car|
    emit(car['country'],
         :score => car['score'],
         :votes => car['votes'],
         :average_sum => car['score'].to_f / car['votes'],
         :count => 1)
  end

  # reduce per country
  # cars is an Array of the objects emitted above
  # this block is executed for each value of country
  reduce do |country, cars|
    cars.reduce do |a, b|
      {
        :score => a[:score] + b[:score],
        :votes => a[:votes] + b[:votes],
        :average_sum => a[:average_sum] + b[:average_sum],
        :count => a[:count] + b[:count]
      }
    end
  end

  # calculate the final averages and cleanup the output
  translate do |id, result|
    result[:average_raw] = result[:score].to_f / result[:votes]
    result[:average_equalized] = result.delete(:average_sum) / result[:count]
    result[:id] = id
    result
  end

  # dump the results to disk
  dump {|id,r| out.puts(r.to_json) }
end

data.close; out.close
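The two averages in the translate step answer different questions: average_raw divides a country's total score by its total votes, so heavily voted cars dominate, while average_equalized is the mean of the per-car averages, so every car counts equally. The plain-Ruby sketch below shows the difference on a few made-up records (the car names and numbers are invented for illustration, and the gem is not used).

```ruby
# Hypothetical sample records in the same shape as cars.json
cars = [
  { 'name' => 'Car A', 'country' => 'DE', 'score' => 20, 'votes' => 4 },
  { 'name' => 'Car B', 'country' => 'DE', 'score' => 9,  'votes' => 1 },
  { 'name' => 'Car C', 'country' => 'IT', 'score' => 12, 'votes' => 3 }
]

summary = cars.group_by { |car| car['country'] }.map do |country, group|
  total_score = group.sum { |car| car['score'] }
  total_votes = group.sum { |car| car['votes'] }
  # one average per car, using float division to avoid truncation
  per_car = group.map { |car| car['score'].to_f / car['votes'] }

  [country, {
    average_raw:       total_score.to_f / total_votes, # votes-weighted
    average_equalized: per_car.sum / per_car.size      # each car weighs the same
  }]
end.to_h

summary.each do |country, r|
  puts "#{country}: raw=#{r[:average_raw]} equalized=#{r[:average_equalized]}"
end
```

For DE the raw average is 29 / 5 = 5.8, while the equalized average is (5.0 + 9.0) / 2 = 7.0, because the single-vote car counts as much as the four-vote one.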
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request