diff --git a/_posts/2016-06-21-mapreduce-or-map-reduce.md b/_posts/2016-06-21-mapreduce-or-map-reduce.md new file mode 100644 index 0000000..6718026 --- /dev/null +++ b/_posts/2016-06-21-mapreduce-or-map-reduce.md @@ -0,0 +1,43 @@ +--- +title: Is it 'MapReduce' or 'Map Reduce'? +date: 2016-06-30 +subject: hadoop +description: Confused about whether Map Reduce is one word or two? Let me settle this once and for all. +layout: post +published: true +image: + url: /img/question.jpg + author: + name: Beatnik Photos + url: https://www.flickr.com/photos/dharmabum1964/3108162671 + +--- + +MapReduce is a data processing methodology made popular by [Hadoop](http://hadoop.apache.org). It describes a way that multiple computational units can work together to process a large-scale dataset whilst acting independently of one another. + +Should you call this technology 'MapReduce' or 'Map Reduce'? It's a question that is trivial, but common. Personally, I'm very unreliable with how I describe the technology: [sometimes I write 'MapReduce'](http://blog.matthewrathbone.com/2016/01/05/experts-and-mapreduce.html), and [sometimes I write 'Map Reduce'](http://blog.matthewrathbone.com/2013/05/31/hadoop-resources-books.html). + +The short version is that the correct spelling is 'MapReduce'. That is, all one word with the R capitalized. You shouldn't write 'Map Reduce' or 'Map/Reduce'. + +## The case for MapReduce vs Map Reduce + +Google's seminal paper from 2004 is titled [*MapReduce: Simplified Data Processing on Large Clusters*](http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf). They're very consistent about using *MapReduce* to describe the concept and nowhere in the paper do they split this into two words. + +This is backed up by Google search traffic which shows *MapReduce* has a clear lead.
+ +![Google Trends](/img/mapreduce.png) +screenshot from [Google Trends](https://www.google.com/trends/explore#q=MapReduce%2C%20Map%20Reduce&cmpt=q&tz=Etc%2FGMT%2B5) + +The [Apache Hadoop website](http://hadoop.apache.org) and big Hadoop vendors like Cloudera and Hortonworks also refer to it as *MapReduce*. + +## It's not all that clear + +However, outside of the Hadoop ecosystem, naming is less clear. MongoDB has its own MapReduce implementation, but [it is referred to as 'Map-reduce'](https://docs.mongodb.com/manual/core/map-reduce/) (they don't even capitalize the R! \*Gasps\*). + +Even back in Hadoop-land not all content has settled on *MapReduce*. There are [several](http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/) [examples](https://www.hackerrank.com/domains/distributed-systems/mapreduce-basics) of places where folks are confused, or even [use several different spellings](https://www.linkedin.com/pulse/map-reduce-tutorial-gives-brief-overview-application-agrawal). + +## Wrap-up + +It doesn't *really* matter of course, but now you know -- one *MapReduce* to rule them all.
+ +While you're here, check out [my guide to MapReduce frameworks](http://blog.matthewrathbone.com/2013/01/05/a-quick-guide-to-hadoop-map-reduce-frameworks.html) \ No newline at end of file diff --git a/_posts/tumblr/2013-01-05-a-quick-guide-to-hadoop-map-reduce-frameworks.md b/_posts/tumblr/2013-01-05-a-quick-guide-to-hadoop-map-reduce-frameworks.md index 1cbf53a..c774f8d 100644 --- a/_posts/tumblr/2013-01-05-a-quick-guide-to-hadoop-map-reduce-frameworks.md +++ b/_posts/tumblr/2013-01-05-a-quick-guide-to-hadoop-map-reduce-frameworks.md @@ -1,6 +1,6 @@ --- layout: post -title: "Hadoop Map-Reduce Framework Tutorials with Examples" +title: "Hadoop MapReduce Framework Tutorials with Examples" subject: hadoop description: "A constantly expanding list of 12+ hadoop frameworks, with code examples and documentation links" tags: @@ -17,7 +17,7 @@ published: true **Updated October 2015** Full sample code is available for many frameworks, see the list [at the bottom of the article](#updates) -There are a lot of frameworks for writing map-reduce pipelines for Hadoop, but +There are a lot of frameworks for writing MapReduce pipelines for Hadoop, but it can be pretty hard to navigate everything to get a good sense of what framework you should be using. I felt very overwhelmed when I started working with Hadoop, and this has only gotten worse for newcomers as the number of @@ -31,7 +31,7 @@ Generally speaking, the goal of each framework is to make building pipelines easier than when using the basic map and reduce interface provided by hadoop- core. This usually means the frameworks do not require you to write these functions at all, but something more high-level that the framework can -'compile' into a pipeline of map-reduce jobs. This is particularly true for +'compile' into a pipeline of MapReduce jobs. This is particularly true for the higher level frameworks (such as hive), which don't really require any knowledge of programming to operate. 
@@ -106,7 +106,7 @@ please tweet me if I have missed any: [@rathboma](http://twitter.com/rathboma) ## Framework Walkthroughs ## {#walkthrough} I will create a separate article for each framework ( [current articles listed here](#updates) ) in which I will build a -small map-reduce pipeline to do the following: +small MapReduce pipeline to do the following: Given two (fake) datasets: @@ -127,7 +127,7 @@ example. ## My Commonly used Frameworks * Hive -- Hive is amazing because anyone can query the data with a little knowledge of SQL. Hook it up to a visual query designer and you don't even need that. -* Pig -- the perfect framework for prototyping and quick-investigation. It's a simple scripting language with a bunch of powerful map-reduce specific features. +* Pig -- the perfect framework for prototyping and quick-investigation. It's a simple scripting language with a bunch of powerful MapReduce specific features. * Scoobi -- I use this a lot to build pipelines in Scala because it's very functional, and in many way's you just treat the data like a regular list, which is great. * Raw Map/Reduce -- Sometimes I like to program directly to the API, especially when doing something mission critical. I also find the individual map and reduce functions easier to test. diff --git a/img/mapreduce.png b/img/mapreduce.png new file mode 100644 index 0000000..ab4b9b3 Binary files /dev/null and b/img/mapreduce.png differ diff --git a/img/question.jpg b/img/question.jpg new file mode 100644 index 0000000..0f5fea6 Binary files /dev/null and b/img/question.jpg differ