# Counting Words Serially

For this exercise, write a program that serially counts the number of occurrences of each word in the book Alice in Wonderland. The text of Alice in Wonderland will be fed into your program line-by-line.

Your program needs to take each line and do the following:

1. Tokenize the line into string tokens by whitespace. For example, `Hello, World!` should be split into `Hello,` and `World!`. (This part has been done for you.)
2. Remove all punctuation. For example, `Hello,` and `World!` should be converted into `Hello` and `World`.
3. Make all letters lowercase. Example: `Hello` and `World` should be converted to `hello` and `world`
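
The three steps above can be sketched on a single line of text as follows. The helper name `normalize_line` is hypothetical and not part of the exercise, and the sketch filters characters with a generator expression rather than `str.translate`, so it should behave the same on Python 2 and Python 3. Unlike the provided tokenizer, it also drops tokens that consist only of punctuation.

```python
import string

def normalize_line(line):
    """Split a line on whitespace, strip punctuation, and lowercase each token."""
    words = []
    for token in line.split():
        # Drop every ASCII punctuation character, then lowercase what is left.
        cleaned = "".join(ch for ch in token if ch not in string.punctuation).lower()
        if cleaned:  # skip tokens that contained punctuation only
            words.append(cleaned)
    return words

print(normalize_line("Hello, World!"))  # ['hello', 'world']
```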

Store the number of times that each word appears in Alice in Wonderland in the `word_counts` dictionary, and then *print* (don't return) that dictionary.

In this exercise, print statements will be considered your final output. Because of this, printing a debug statement will cause the grader to break. Instead, you can use the logging module which we've configured for you.

For example:
```
logging.info("My debugging message")
```

The logging module gives you more control over debugging and other messages than printing them does. Messages logged via the logger we configured will be saved to a file. If you click "Test Run", you will see the contents of that file once your program has finished running.
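
As a rough illustration of why logging keeps the printed output clean, the sketch below sends messages to a file and reads the file back afterwards. The logger name `word_count_demo` and the temporary log path are assumptions made for this example; in the exercise the path comes from `util.logfile` and the configuration is done with `logging.basicConfig`.

```python
import logging
import os
import tempfile

# Hypothetical log path for this demo; the exercise uses `util.logfile` instead.
log_path = os.path.join(tempfile.mkdtemp(), "debug.log")

logger = logging.getLogger("word_count_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep demo messages away from the root logger

handler = logging.FileHandler(log_path, mode="w")
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

logger.info("My debugging message")  # goes to the file, not to stdout
logger.debug("Too detailed")         # below the INFO level, so it is dropped

# Detach and close the handler so the file is flushed before reading it.
logger.removeHandler(handler)
handler.close()

with open(log_path) as f:
    log_contents = f.read()
print(log_contents)
```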

In [9]:
import logging
import sys
import string

from util import logfile

logging.basicConfig(filename=logfile, format='%(message)s', level=logging.INFO, filemode='w')

def word_count():
    word_counts = {}
    for line in sys.stdin:
        data = line.strip().split(" ")
        for word in data:
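            # Strip punctuation and lowercase the token (Python 2 str.translate signature)
            key = word.translate(string.maketrans("", ""), string.punctuation).lower()
            # Count the cleaned key, not the raw token
            if key in word_counts:
                word_counts[key] += 1
            else:
                word_counts[key] = 1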

    # The exercise asks for the dictionary to be printed, since printed output is what gets graded
    print word_counts

word_count()

# Mapper and Reducer with Aadhaar Data

Each line of input will be a comma-separated list of values, and the header row WILL be included. Tokenize each row on the commas and emit (i.e., print) a key-value pair containing the district (not the state) and the number of Aadhaar generated, separated by a tab. Skip the header row and any row that does not have the correct number of tokens.
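
A minimal sketch of that filtering logic on a few rows is shown below. The column positions (district at index 3, Aadhaar generated at index 8, 12 columns, header starting with `Registrar`) follow the mapper in this exercise; the helper name `map_line` is hypothetical and the rows are made-up placeholders, not real Aadhaar records.

```python
def map_line(line):
    """Return a 'district<TAB>count' string, or None if the row should be skipped."""
    data = line.strip().split(",")
    # Skip rows with the wrong number of columns, and skip the header row.
    if len(data) != 12 or data[0] == "Registrar":
        return None
    return "{0}\t{1}".format(data[3], data[8])

# Made-up rows for illustration only.
header = "Registrar,c1,c2,District,c4,c5,c6,c7,Aadhaar generated,c9,c10,c11"
good_row = "RegA,AgencyX,StateY,DistrictZ,s,0,M,30,5,0,0,1"
short_row = "RegA,AgencyX,StateY"

for row in (header, good_row, short_row):
    print(repr(map_line(row)))
```

Only `good_row` yields a key-value pair; the header and the short row are skipped.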

Since you are printing the output of your program, printing a debug statement will interfere with the operation of the grader. Instead, use the logging module, which we've configured to log to a file whose contents are shown when you click "Test Run".

For example: 
```
logging.info("My debugging message")
```

In [None]:
# Mapper

import sys
import string
import logging

from util import mapper_logfile
logging.basicConfig(filename=mapper_logfile, format='%(message)s',
                    level=logging.INFO, filemode='w')

def mapper():
    for line in sys.stdin:
        data = line.strip().split(",")
        if len(data) != 12 or data[0] == 'Registrar':
            continue
        print '{0}\t{1}'.format(data[3], data[8])

mapper()

In [10]:
# Reducer

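import sys

def reducer():
    aadhar_generated = 0  # running total for the current district
    old_key = None

    for line in sys.stdin:
        data = line.strip().split('\t')

        # Skip malformed lines that are not a key-value pair
        if len(data) != 2:
            continue

        this_key, count = data
        # A new key means the previous district is complete, so emit its total
        if old_key is not None and old_key != this_key:
            print '{0}\t{1}'.format(old_key, aadhar_generated)
            aadhar_generated = 0

        old_key = this_key
        aadhar_generated += float(count)

    # Emit the total for the last district once all input has been read
    if old_key is not None:
        print '{0}\t{1}'.format(old_key, aadhar_generated)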
        

reducer()
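
The reducer compares each key with the previous one, so it only produces one total per district when all lines for a district arrive consecutively. The sketch below illustrates this on made-up mapper output by sorting the lines before reducing them. The function `reduce_pairs` is a hypothetical variant that returns a list instead of printing, so that it can be run outside the grader.

```python
def reduce_pairs(lines):
    """Sum counts per key from 'key<TAB>count' lines that are grouped by key."""
    results = []
    total = 0.0
    old_key = None
    for line in lines:
        data = line.strip().split("\t")
        if len(data) != 2:
            continue  # skip malformed lines
        this_key, count = data
        # A change of key closes the running total for the previous key.
        if old_key is not None and old_key != this_key:
            results.append((old_key, total))
            total = 0.0
        old_key = this_key
        total += float(count)
    # Emit the last key after the loop has finished.
    if old_key is not None:
        results.append((old_key, total))
    return results

# Made-up mapper output, deliberately unsorted.
mapper_output = ["B\t2", "A\t1", "B\t3", "A\t4"]
print(reduce_pairs(sorted(mapper_output)))  # [('A', 5.0), ('B', 5.0)]
```

Passing the unsorted list instead would emit a separate partial total each time the key changes, which is why the sort step between mapper and reducer matters.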