Silly article on MapReduce vs Map Reduce
rathboma committed Jun 29, 2016
1 parent 5901526 commit bdd7f27
Showing 4 changed files with 48 additions and 5 deletions.
43 changes: 43 additions & 0 deletions _posts/2016-06-21-mapreduce-or-map-reduce.md
@@ -0,0 +1,43 @@
---
title: Is it 'MapReduce' or 'Map Reduce'?
date: 2016-06-30
subject: hadoop
description: Confused about whether Map Reduce is one word or two? Let me settle this once and for all.
layout: post
published: true
image:
  url: /img/question.jpg
  author:
    name: Beatnik Photos
    url: https://www.flickr.com/photos/dharmabum1964/3108162671

---

MapReduce is a data processing methodology made popular by [Hadoop](http://hadoop.apache.org). It describes a way for multiple computational units to work together on a large-scale dataset while each unit acts independently of the others.
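To make that description a little more concrete, here is a minimal, in-memory sketch of the classic word-count example in plain Python. It is not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical, and everything runs in a single process, whereas a real cluster would run each map call and each reduce call on separate workers.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input split.

    Each call only looks at its own split, so in a real cluster
    different splits could be mapped on different machines.
    """
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (a framework like Hadoop does this step for you)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key; each key is reduced independently."""
    return key, sum(values)


def map_reduce(splits: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce sequentially over a list of input splits."""
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["MapReduce or Map Reduce", "it is MapReduce"]))
```

The important property is that neither `map_phase` nor `reduce_phase` needs to know about any other call, which is what lets the work be spread across many machines.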

Should you call this technology 'MapReduce' or 'Map Reduce'? It's a trivial question, but a common one. Personally, I'm very inconsistent about how I write it: [sometimes I write 'MapReduce'](http://blog.matthewrathbone.com/2016/01/05/experts-and-mapreduce.html), and [sometimes I write 'Map Reduce'](http://blog.matthewrathbone.com/2013/05/31/hadoop-resources-books.html).

The short version is that the correct spelling is 'MapReduce': one word, with a capital R. You shouldn't write 'Map Reduce' or 'Map/Reduce'.

## The case for MapReduce vs Map Reduce

Google's seminal 2004 paper is titled [*MapReduce: Simplified Data Processing on Large Clusters*](http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf). The authors consistently use *MapReduce* to describe the concept, and nowhere in the paper do they split it into two words.

Google search traffic backs this up, showing a clear lead for *MapReduce*.

![Google Trends](/img/mapreduce.png)
screenshot from [Google Trends](https://www.google.com/trends/explore#q=MapReduce%2C%20Map%20Reduce&cmpt=q&tz=Etc%2FGMT%2B5)

The [Apache Hadoop website](http://hadoop.apache.org) and big Hadoop vendors such as Cloudera and Hortonworks also refer to it as *MapReduce*.

## It's not all that clear

However, outside the Hadoop ecosystem the naming is less clear. MongoDB has its own MapReduce implementation, but [it is referred to as 'Map-reduce'](https://docs.mongodb.com/manual/core/map-reduce/) (they don't even capitalize the R! \*Gasps\*).

Even back in Hadoop-land, not all content has settled on *MapReduce*. There are [several](http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/) [examples](https://www.hackerrank.com/domains/distributed-systems/mapreduce-basics) of pages where the authors seem unsure, or even [mix several different spellings](https://www.linkedin.com/pulse/map-reduce-tutorial-gives-brief-overview-application-agrawal).

## Wrap-up

It doesn't *really* matter, of course, but now you know: one *MapReduce* to rule them all.

While you're here, check out [my guide to MapReduce frameworks](http://blog.matthewrathbone.com/2013/01/05/a-quick-guide-to-hadoop-map-reduce-frameworks.html).
@@ -1,6 +1,6 @@
---
layout: post
-title: "Hadoop Map-Reduce Framework Tutorials with Examples"
+title: "Hadoop MapReduce Framework Tutorials with Examples"
subject: hadoop
description: "A constantly expanding list of 12+ hadoop frameworks, with code examples and documentation links"
tags:
@@ -17,7 +17,7 @@ published: true

**Updated October 2015:** Full sample code is available for many frameworks; see the list [at the bottom of the article](#updates).

-There are a lot of frameworks for writing map-reduce pipelines for Hadoop, but
+There are a lot of frameworks for writing MapReduce pipelines for Hadoop, but
it can be pretty hard to navigate everything to get a good sense of what
framework you should be using. I felt very overwhelmed when I started working
with Hadoop, and this has only gotten worse for newcomers as the number of
@@ -31,7 +31,7 @@ Generally speaking, the goal of each framework is to make building pipelines
easier than when using the basic map and reduce interface provided by hadoop-
core. This usually means the frameworks do not require you to write these
functions at all, but something more high-level that the framework can
-'compile' into a pipeline of map-reduce jobs. This is particularly true for
+'compile' into a pipeline of MapReduce jobs. This is particularly true for
the higher level frameworks (such as hive), which don't really require any
knowledge of programming to operate.

@@ -106,7 +106,7 @@ please tweet me if I have missed any: [@rathboma](http://twitter.com/rathboma)
## Framework Walkthroughs ## {#walkthrough}

I will create a separate article for each framework ( [current articles listed here](#updates) ) in which I will build a
-small map-reduce pipeline to do the following:
+small MapReduce pipeline to do the following:

Given two (fake) datasets:

@@ -127,7 +127,7 @@ example.
## My Commonly used Frameworks

* Hive -- Hive is amazing because anyone can query the data with a little knowledge of SQL. Hook it up to a visual query designer and you don't even need that.
-* Pig -- the perfect framework for prototyping and quick-investigation. It's a simple scripting language with a bunch of powerful map-reduce specific features.
+* Pig -- the perfect framework for prototyping and quick-investigation. It's a simple scripting language with a bunch of powerful MapReduce specific features.
* Scoobi -- I use this a lot to build pipelines in Scala because it's very functional, and in many ways you just treat the data like a regular list, which is great.
* Raw Map/Reduce -- Sometimes I like to program directly to the API, especially when doing something mission critical. I also find the individual map and reduce functions easier to test.

Binary file added img/mapreduce.png
Binary file added img/question.jpg
