Lightweight MapReduce in Python
This branch is 20 commits behind michaelfairley:master.
mincemeat.py: MapReduce on Python

Introduction

mincemeat.py is a Python implementation of the MapReduce distributed computing framework. mincemeat.py is:

  • Lightweight - All of the code is contained in a single Python file (currently weighing in at <13kB) that depends only on the Python Standard Library. Any computer with Python and mincemeat.py can be a part of your cluster.
  • Fault tolerant - Workers (clients) can join and leave the cluster at any time without affecting the entire process.
  • Secure - mincemeat.py authenticates both ends of every connection, ensuring that only authorized code is executed.
  • Open source - mincemeat.py is distributed under the MIT License, and consequently is free for all use, including commercial, personal, and academic, and can be modified and redistributed without restriction.
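
To make the "authenticates both ends" point concrete, here is a minimal sketch of mutual challenge-response authentication with a shared password. It is an illustration of the general technique only, not a description of the protocol mincemeat.py actually uses; the function names (make_challenge, respond, verify, mutual_auth) and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import hmac
import os


def make_challenge():
    # A fresh random nonce per connection prevents replaying old responses.
    return os.urandom(20)


def respond(password, challenge):
    # Prove knowledge of the password without sending the password itself.
    key = password.encode("utf-8")
    return hmac.new(key, challenge, hashlib.sha256).hexdigest()


def verify(password, challenge, response):
    expected = respond(password, challenge)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, response)


def mutual_auth(server_password, worker_password):
    # Hypothetical in-process simulation: each side challenges the other,
    # and the connection is accepted only if both checks pass.
    server_nonce = make_challenge()
    worker_nonce = make_challenge()
    worker_ok = verify(server_password, server_nonce,
                       respond(worker_password, server_nonce))
    server_ok = verify(worker_password, worker_nonce,
                       respond(server_password, worker_nonce))
    return worker_ok and server_ok
```

In this sketch, mutual_auth("changeme", "changeme") succeeds, while a worker started with a different password is rejected before any code would be exchanged.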


Download

  • Just mincemeat.py (v 0.1.2)
  • The full 0.1.2 release (includes documentation and examples)
  • Clone this git repository: git clone


Example

Let's look at the canonical MapReduce example, word counting:

#!/usr/bin/env python
import mincemeat

data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall",
        "All the King's horses and all the King's men",
        "Couldn't put Humpty together again",
        ]

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = 0
    for v in vs:
        result += v
    return result

s = mincemeat.Server()

# The data source can be any dictionary-like object
s.datasource = dict(enumerate(data))
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print(results)

Execute this script on the server (assuming it was saved as example.py):

python example.py

Run mincemeat.py as a worker on a client:

python mincemeat.py -p changeme [server address]

And the server will print out:

{'a': 2, 'on': 1, 'great': 1, 'Humpty': 3, 'again': 1, 'wall': 1, 'Dumpty': 2, 'men': 1, 'had': 1, 'all': 1, 'together': 1, "King's": 2, 'horses': 1, 'All': 1, "Couldn't": 1, 'fall': 1, 'and': 1, 'the': 2, 'put': 1, 'sat': 1} 
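
To see how mapfn and reducefn combine to produce this dictionary, the same job can be simulated in a single process with no server or workers. This is only a sketch: run_local is a hypothetical helper written for this example and is not part of mincemeat.py, and reducefn is shortened to use sum, which gives the same result as the loop in the example.

```python
from collections import defaultdict

data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall",
        "All the King's horses and all the King's men",
        "Couldn't put Humpty together again",
        ]


def mapfn(k, v):
    # Emit (word, 1) for every word in the line.
    for w in v.split():
        yield w, 1


def reducefn(k, vs):
    # Add up all the counts emitted for one word.
    return sum(vs)


def run_local(datasource, mapfn, reducefn):
    # Map phase: call mapfn on every (key, value) pair and group the
    # emitted values by their output key.
    grouped = defaultdict(list)
    for key in datasource:
        for out_key, out_value in mapfn(key, datasource[key]):
            grouped[out_key].append(out_value)
    # Reduce phase: collapse each group of values into one result.
    return {k: reducefn(k, vs) for k, vs in grouped.items()}


results = run_local(dict(enumerate(data)), mapfn, reducefn)
```

Here results holds the same word counts as the server output above, although the order of the keys may differ.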

This example is deliberately simple, but changing the datasource to a collection of large files and running the client on multiple machines works just as well. In fact, mincemeat.py has been used to produce word frequency lists for many gigabytes of text using a slightly modified version of this code.
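
As a sketch of what "a collection of large files" could look like as a data source, the class below maps file names to file contents and reads each file only when it is requested. FileData is a hypothetical name invented for this example, and the sketch assumes the server needs nothing more from a dictionary-like object than iteration over keys and lookup by key.

```python
import os


class FileData(object):
    """Hypothetical dictionary-like data source: file name -> file contents."""

    def __init__(self, directory):
        self.directory = directory
        # Only regular files are used as keys; subdirectories are skipped.
        self.names = sorted(
            name for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        )

    def __iter__(self):
        return iter(self.names)

    def __len__(self):
        return len(self.names)

    def keys(self):
        return list(self.names)

    def __getitem__(self, name):
        if name not in self.names:
            raise KeyError(name)
        # The file is read lazily, so only the requested file's contents
        # are loaded at a time.
        with open(os.path.join(self.directory, name)) as f:
            return f.read()
```

With such a class, the example's assignment would become something like s.datasource = FileData("texts"), where "texts" is a placeholder directory name; mapfn then receives a file name as k and the file's text as v.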