-
Notifications
You must be signed in to change notification settings - Fork 84
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from gatesvp/master
Map-reduce example for "pivot"
- Loading branch information
Showing
1 changed file
with
74 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
--- | ||
title: Pivot Data with Map reduce | ||
created_at: 2011-05-05 18:0:024.036546 -04:00 | ||
recipe: true | ||
author: Gaetan Voyer-Perrault | ||
description: How to use map-reduce to pivot table data. | ||
filter: | ||
- erb | ||
- markdown | ||
--- | ||
|
||
### Problem | ||
|
||
You have a collection of Actors with an array of the Movies they've done. | ||
|
||
You want to generate a collection of Movies with an array of Actors in each. | ||
|
||
Some sample data | ||
|
||
<% code 'javascript' do %> | ||
db.actors.insert( { actor: "Richard Gere", movies: ['Pretty Woman', 'Runaway Bride', 'Chicago'] }); | ||
db.actors.insert( { actor: "Julia Roberts", movies: ['Pretty Woman', 'Runaway Bride', 'Erin Brockovich'] }); | ||
<% end %> | ||
|
||
### Solution | ||
|
||
We need to loop through each movie in the Actor document and emit each Movie individually. | ||
|
||
The catch here is in the reduce phase. We cannot emit an array from the reduce phase, so we must build an Actors array inside of the "value" document that is returned. | ||
|
||
#### The code | ||
|
||
<% code 'javascript' do %> | ||
map = function() { | ||
for(var i in this.movies){ | ||
key = { movie: this.movies[i] }; | ||
value = { actors: [ this.actor ] }; | ||
emit(key, value); | ||
} | ||
} | ||
|
||
reduce = function(key, values) { | ||
actor_list = { actors: [] }; | ||
for(var i in values) { | ||
actor_list.actors = values[i].actors.concat(actor_list.actors); | ||
} | ||
return actor_list; | ||
} | ||
<% end %> | ||
|
||
Notice how actor_list is actually a javascript object that contains an array. Also notice that map emits the same structure. | ||
|
||
Run the following to execute the command and print the result: | ||
|
||
<% code 'javascript' do %> | ||
printjson(db.foo.mapReduce(map, reduce, "pivot")); | ||
db.pivot.find().forEach(printjson); | ||
<% end %> | ||
|
||
##### Reduce Step | ||
|
||
The reduce function is trivial, as it simply performs a count: | ||
|
||
<% code 'javascript' do %> | ||
reduce = "function(key, values) { | ||
var count = 0; | ||
|
||
values.forEach(function(v) { | ||
count += v['count']; | ||
}); | ||
|
||
return {count: count}; | ||
}" | ||
<% end %> |