# MapReduce in Practice

First, install `mrjob`:
`conda install -c conda-forge mrjob`

Next, read the mrjob quickstart guide: https://pythonhosted.org/mrjob/guides/quickstart.html

Let's say we have the following code in `review_word_count.py`.

In [2]:
from mrjob.job import MRJob
from mrjob.step import MRStep
from mrjob.protocol import JSONValueProtocol

In [3]:
import re

WORD_RE = re.compile(r"[\w']+")

In [None]:
class ReviewWordCount(MRJob):
    INPUT_PROTOCOL = JSONValueProtocol

    def extract_words(self, _, review):
        """Extract words using a regular expression.  Normalize the text to
        ignore capitalization."""
        for word in WORD_RE.findall(review['text']):
            yield (word.lower(), 1)

    def count_words(self, word, counts):
        """Summarize all the counts by taking the sum."""
        yield (word, sum(counts))

    def steps(self):
        return [MRStep(mapper=self.extract_words,
                       reducer=self.count_words)]

if __name__ == '__main__':
    ReviewWordCount.run()
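
To see what the mapper and reducer above are doing, it can help to simulate the map, shuffle, and reduce phases in plain Python without mrjob. The following is only a sketch: `run_local` and the two sample reviews are made up for illustration, and it assumes that each input line is a JSON object with a `text` field, as the job's use of `review['text']` suggests.

```python
import json
import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")  # same pattern as in review_word_count.py


def extract_words(review):
    """Mapper: emit (word, 1) for every word in one review."""
    for word in WORD_RE.findall(review["text"]):
        yield word.lower(), 1


def count_words(word, counts):
    """Reducer: sum all the counts collected for one word."""
    yield word, sum(counts)


def run_local(lines):
    """Hypothetical helper: simulate map -> shuffle -> reduce in memory."""
    grouped = defaultdict(list)
    for line in lines:
        review = json.loads(line)  # decode one JSON value per input line
        for word, count in extract_words(review):
            grouped[word].append(count)  # shuffle: group values by key
    results = {}
    for word in sorted(grouped):  # visit keys in sorted order
        for key, total in count_words(word, grouped[word]):
            results[key] = total
    return results


# Two made-up reviews, one JSON object per line.
sample_lines = [
    '{"text": "Great coffee, great staff"}',
    '{"text": "The coffee was cold"}',
]
word_counts = run_local(sample_lines)
print(word_counts)
```

In a real MapReduce run, the grouping step is performed by the framework between the mapper and the reducer, and the work is spread across many machines rather than held in a single dictionary.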


In [7]:
! python review_word_count.py data/review.json

No configs found; falling back on auto-configuration
Creating temp directory /var/folders/gz/dyzj7rns5bqd0tv_lb5kq1m80000gn/T/review_word_count.markcrovella.20161201.024225.483349
Running step 1 of 1...
Streaming final output from /var/folders/gz/dyzj7rns5bqd0tv_lb5kq1m80000gn/T/review_word_count.markcrovella.20161201.024225.483349/output...
"'"	49
"''"	3
"''tourist"	1
"'05"	1
"'07"	1
"'50s"	1
"'70"	1
"'70s"	1
"'71"	1
"'79"	1
"'7s"	1
"'8'"	1
"'90s"	1
"'95"	1
"'96"	1
"'99"	1
"'9am"	1
"'a"	2
"'almost"	1
"'am"	1
"'and"	1
"'angry"	2
"'app'"	1
"'are"	1
"'art'"	1
"'ask'"	1
"'authentic'"	3
"'back"	1
"'bad"	1
"'barista'"	1
"'batman'"	1
"'beer"	1
"'behavior'"	1
"'bent"	1
"'best'"	1
"'blackbird'"	1
"'blah'"	1
"'block"	1
"'blue"	1
"'bout"	1
"'browsing'"	1
"'bucks"	1
"'buffaloed'"	1
"'buffet'"	1
"'build"	1
"'burbs"	1
"'burgh"	56
"'burgh's"	1
"'burghers"	1
"'bustling"	1
"'butter"	1
"'buy"	1
"'cambodian'"	1
"'cause"	5
"'chat'"	1
"'cheese"	1
"'cheeseburger"	1
"'cheesesteak'"	1
"'chico'"	1
"'chorizo'"

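The first lines of the output are all tokens that begin with an apostrophe. This follows from the pattern `[\w']+`, which treats `'` as part of a word, combined with the fact that the output is listed in key order and `'` sorts before any letter or digit. The double quotes around each word come from the output encoding: by default mrjob writes each key and value as JSON, separated by a tab. A small sketch with a made-up sentence shows the tokenization:

```python
import re

WORD_RE = re.compile(r"[\w']+")  # same pattern as in review_word_count.py

# Made-up sentence, for illustration only.
sample = "The 'burgh is 'authentic' and it's great"

# Same normalization as the mapper: find tokens, then lowercase them.
tokens = [word.lower() for word in WORD_RE.findall(sample)]
print(tokens)

# Sorting puts the apostrophe-prefixed tokens first, as in the job output.
print(sorted(tokens))
```

If quoted words such as `'authentic'` should be counted together with `authentic`, one option would be to strip leading and trailing apostrophes in the mapper, for example with `word.strip("'")`. That is a design choice and not something the original job does.
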
Next, we will use Amazon Web Services (AWS), a cloud service that lets you "rent" virtual machines. You can rent a single machine, or even a whole cluster, with Hadoop already installed.

To get started, apply for an AWS account; the AWS Educate application steps are listed at the end of this section.

Then, read this (carefully!): https://pythonhosted.org/mrjob/guides/emr-quickstart.html#amazon-setup

Then, to run your job on Amazon machines, read this: https://pythonhosted.org/mrjob/guides/runners.html#running-on-emr.
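
In mrjob, switching from a local run to EMR mostly comes down to selecting the `emr` runner with `-r emr`, either on the command line or when driving the job from Python. The following is a sketch, not a tested recipe: `run_on_emr` is a hypothetical helper, and it assumes mrjob 0.6 or later (for `cat_output` and `parse_output`), AWS credentials already configured as described in the guide above, and that it lives in a separate file next to `review_word_count.py`.

```python
def run_on_emr(input_path):
    """Hypothetical helper: run ReviewWordCount on EMR, return {word: count}.

    Assumes mrjob >= 0.6 and AWS credentials that mrjob can find.
    Calling this starts a real cluster and therefore costs money.
    """
    # Imported inside the function so that defining it does not require
    # mrjob or the job file to be importable.
    from review_word_count import ReviewWordCount

    # "-r emr" selects the EMR runner instead of the default local one.
    job = ReviewWordCount(args=["-r", "emr", input_path])
    with job.make_runner() as runner:
        runner.run()
        # cat_output() streams the raw output; parse_output() decodes it
        # into (key, value) pairs using the job's output protocol.
        return dict(job.parse_output(runner.cat_output()))
```

Since a run on EMR is billed, it is worth checking that the job works locally on a small input before calling something like `run_on_emr("data/review.json")`.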

Running MapReduce on Amazon Web Services is done through "Elastic MapReduce" (EMR). The basic idea is that you allocate a cluster, upload your code to the master node, and run the job there.

It is extremely important that you apply for an "AWS Educate" account right away to get $100 in free credit, and that you verify the credit has been properly applied to your account.

* Go to https://www.awseducate.com/Application
* Choose **Student** and click **Next**
* Fill out the application

We recommend selecting an AWS Educate Starter Account. If, instead, you want to use an AWS Account ID, it is important that you set up a billing alarm on AWS to notify you when your credits are running low.
