
bashreduce: mapreduce in a bash script

bashreduce lets you apply your favorite unix tools in a mapreduce fashion across multiple machines/cores. There’s no installation, administration, or distributed filesystem. You’ll need:

  • br somewhere handy in your path
  • standard GNU tools on each machine: sort, awk, grep
  • netcat on each machine


Optionally, edit /etc/br.hosts and list the machines you want to use as workers. You can also specify them at runtime:

br -m "host1 host2 host3"

To take advantage of multiple cores, repeat the host name as many times as you wish.
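As a rough sketch of the repetition described above, the host list can be built by repeating each name once per core. The `repeat_hosts` helper and the core count of 2 are hypothetical and not part of br; only the `-m` flag comes from this README.

```shell
# Hypothetical helper (not part of br): print each host name
# repeated once per core, separated by single spaces.
repeat_hosts() {
  cores=$1
  shift
  out=""
  for host in "$@"; do
    i=0
    while [ "$i" -lt "$cores" ]; do
      # Add a separating space only when the list is non-empty.
      out="$out${out:+ }$host"
      i=$((i + 1))
    done
  done
  printf '%s\n' "$out"
}

# Assumes two cores on each of two machines.
hosts=$(repeat_hosts 2 host1 host2)
echo "$hosts"

# The result would then be passed to br like this:
# br -m "$hosts" < input > output
```

With these assumed values the list contains each host twice, so br would start two workers per machine.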



sort

br < input > output

word count

br -r "uniq -c" < input > output

count words that begin with ‘b’

br -r "grep ^b | uniq -c" < input > output
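To make the two reducer examples concrete, here is a single-machine stand-in for what they compute. It assumes the input holds one word per line and that each reducer receives its share of the input already sorted, which is why `uniq -c` alone is enough as the reducer. The function names and sample data are illustrative only.

```shell
# Sample input: one word per line (an assumption about the input format).
words='banana
apple
bob
banana
cherry
bob
banana'

# Local stand-in for: br -r "uniq -c" < input
# uniq -c only counts adjacent duplicates, so the input is sorted first.
word_count() { sort | uniq -c; }

# Local stand-in for: br -r "grep ^b | uniq -c" < input
b_word_count() { sort | grep '^b' | uniq -c; }

printf '%s\n' "$words" | word_count
printf '%s\n' "$words" | b_word_count
```

Each output line is a count followed by the word. With br the same reducer runs on every worker over its own portion of the input.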


These numbers are completely spurious; all they show is how useful br is to me :-)

I have four compute machines and I’m usually relegated to using one core on one machine to sort. How about when I use br?

command                                        time          transfer rate
sort -k1,1 4gb_file > 4gb_file_sorted          82m54.746s    843 kBps
br -i 4gb_file -o 4gb_file_sorted using brp    6m5.968s      11.19 MBps
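The figures above can be cross-checked with a little arithmetic. This sketch assumes the file is exactly 4 GiB and that kB and MB mean 1024-based units; neither is stated here, but the reported rates are consistent with that reading. The `to_seconds` helper is hypothetical.

```shell
# Hypothetical helper: convert a time(1)-style string such as
# "82m54.746s" into seconds.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

sort_s=$(to_seconds 82m54.746s)
br_s=$(to_seconds 6m5.968s)

# Speedup of br over a single sort process.
awk -v a="$sort_s" -v b="$br_s" 'BEGIN { printf "speedup: %.1fx\n", a / b }'

# Transfer rates, assuming a 4 GiB file (4 * 1024 * 1024 kB).
awk -v s="$sort_s" 'BEGIN { printf "sort: %.0f kB/s\n", 4 * 1024 * 1024 / s }'
awk -v s="$br_s" 'BEGIN { printf "br: %.2f MB/s\n", 4 * 1024 / s }'
```

Under those assumptions the computed rates match the 843 kBps and 11.19 MBps reported, and the wall-clock speedup comes out at roughly 13.6x.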

When I have more time I’ll compare this apples to apples. There’s also still a ton of room for improvement.

I’m also interested in seeing how bashreduce compares to Hadoop. Of course it won’t win… but how close does it come?
