Skip to content
mapreduce in bash
C Shell Makefile
Find file
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


bashreduce : mapreduce in a bash script

bashreduce lets you apply your favorite unix tools in a mapreduce fashion across multiple machines/cores. There’s no installation, administration, or distributed filesystem. You’ll need:

  • br somewhere handy in your path
  • gnu core utils on each machine: sort, awk, grep
  • netcat on each machine


If you wish, you may edit /etc/br.hosts and enter the machines you wish to use as workers. You can also specify this at runtime:

br -m "host1 host2 host3"

To take advantage of multiple cores, repeat the host name as many times as you wish.



br < input > output

word count

br -r "uniq -c" < input > output

count words that begin with ‘b’

br -r "grep ^b | uniq -c" < input > output


Completely spurious numbers, all this shows you is how useful br is to me :-)

I have four compute machines and I’m usually relegated to using one core on one machine to sort. How about when I use br?

command time mb/s
sort -k1,1 4gb_file > 4gb_file_sorted 82m54.746s 843 kBps
br -i 4gb_file -o 4gb_file_sorted 9m49s 6.95 MBps

When I have more time I’ll compare this apples to apples. There’s also still a ton of room for improvement.

I’m also interested in seeing how bashreduce compares to hadoop. Of course it won’t win… but how close does it come?

Something went wrong with that request. Please try again.