
Fix up readme

1 parent 92f5f57 commit 700e8393fdc23ae76768d008589b66a2654cfed2 @michaelfairley committed Feb 26, 2012
Showing with 74 additions and 11 deletions.
  1. +0 −11 README
  2. +74 −0 README.md
@@ -1,11 +0,0 @@
-mincemeat.py: MapReduce on Python
-
-mincemeat.py is licensed under the MIT license. See LICENSE for more details
-
-
-Run the example with:
-python example.py
-
-and a worker with:
-python mincemeat.py -p changeme [server address]
-
@@ -0,0 +1,74 @@
+mincemeat.py: MapReduce on Python
+=================================
+
+Introduction
+------------
+mincemeat.py is a Python implementation of the [MapReduce](http://en.wikipedia.org/wiki/Mapreduce) distributed computing framework.
+
+mincemeat.py is:
+- Lightweight - All of the code is contained in a single Python file (currently weighing in at <13kB) that depends only on the Python Standard Library. Any computer with Python and mincemeat.py can be a part of your cluster.
+- Fault tolerant - Workers (clients) can join and leave the cluster at any time without affecting the entire process.
+- Secure - mincemeat.py authenticates both ends of every connection, ensuring that only authorized code is executed.
+- Open source - mincemeat.py is distributed under the [MIT License](http://en.wikipedia.org/wiki/Mit_license), and consequently is free for all use, including commercial, personal, and academic, and can be modified and redistributed without restriction.
+
+
+Download
+--------
+
+- Just [mincemeat.py](https://raw.github.com/michaelfairley/mincemeatpy/master/mincemeat.py) (v 0.1.2)
+- The full 0.1.2 release (includes documentation and examples)
+- Clone this git repository: `https://github.com/michaelfairley/mincemeatpy.git`
+
+Example
+-------
+
+Let's look at the canonical MapReduce example, word counting:
+
+example.py:
+```python
+#!/usr/bin/env python
+import mincemeat
+
+data = ["Humpty Dumpty sat on a wall",
+        "Humpty Dumpty had a great fall",
+        "All the King's horses and all the King's men",
+        "Couldn't put Humpty together again",
+        ]
+
+def mapfn(k, v):
+    for w in v.split():
+        yield w, 1
+
+def reducefn(k, vs):
+    result = 0
+    for v in vs:
+        result += v
+    return result
+
+s = mincemeat.Server()
+
+# The data source can be any dictionary-like object
+s.datasource = dict(enumerate(data))
+s.mapfn = mapfn
+s.reducefn = reducefn
+
+results = s.run_server(password="changeme")
+print results
+```
+
+Execute this script on the server:
+```bash
+python example.py
+```
+
+Run mincemeat.py as a worker on a client:
+```bash
+python mincemeat.py -p changeme [server address]
+```
+And the server will print out:
+```python
+{'a': 2, 'on': 1, 'great': 1, 'Humpty': 3, 'again': 1, 'wall': 1, 'Dumpty': 2, 'men': 1, 'had': 1, 'all': 1, 'together': 1, "King's": 2, 'horses': 1, 'All': 1, "Couldn't": 1, 'fall': 1, 'and': 1, 'the': 2, 'put': 1, 'sat': 1}
+```
+
+This example is overly simplistic, but changing the datasource to a collection of large files and running the client on multiple machines works just as well. In fact, mincemeat.py has been used to produce word frequency lists for >3GB of text using a slightly modified version of this code.
+
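
Note (not part of this commit): the README's closing paragraph mentions swapping the datasource for a collection of large files. A minimal sketch of what that could look like follows, assuming the server only needs to iterate over keys and look up values by key, as the "any dictionary-like object" comment suggests. `FileDataSource`, `run_locally`, and `demo` are hypothetical names invented for illustration. `run_locally` is a single-process stand-in for the server/worker round trip, so the sketch can be checked without a cluster.

```python
import os
import tempfile
from collections import defaultdict


class FileDataSource(object):
    """Hypothetical dictionary-like datasource (not part of mincemeat.py).

    Keys are file paths; a file's contents are read only when its key is
    looked up, so large inputs are not all held in memory at once.
    """

    def __init__(self, paths):
        self.paths = list(paths)

    def __iter__(self):
        return iter(self.paths)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, path):
        with open(path) as f:
            return f.read()


def mapfn(k, v):
    # Same mapper as the README example: emit (word, 1) for every word.
    for w in v.split():
        yield w, 1


def reducefn(k, vs):
    # Same reducer as the README example, written with sum().
    return sum(vs)


def run_locally(datasource, mapfn, reducefn):
    """Single-process stand-in for map, group-by-key, and reduce."""
    grouped = defaultdict(list)
    for key in datasource:
        for out_key, out_value in mapfn(key, datasource[key]):
            grouped[out_key].append(out_value)
    return dict((k, reducefn(k, vs)) for k, vs in grouped.items())


def demo():
    # Write two lines of the rhyme to temporary files and count words.
    tmpdir = tempfile.mkdtemp()
    lines = ["Humpty Dumpty sat on a wall",
             "Humpty Dumpty had a great fall"]
    paths = []
    for i, text in enumerate(lines):
        path = os.path.join(tmpdir, "part%d.txt" % i)
        with open(path, "w") as f:
            f.write(text)
        paths.append(path)
    return run_locally(FileDataSource(paths), mapfn, reducefn)
```

With a real cluster, the assumption is that `s.datasource = FileDataSource(paths)` would replace `dict(enumerate(data))` in example.py; that substitution has not been verified against mincemeat.py here.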
