### References

- Mining Massive Datasets
- https://lagunita.stanford.edu/asset-v1:ComputerScience+MMDS+SelfPaced+type@asset+block@MapReduce2_TheMapReduceComputationalModel.pdf

### Exercise 4

Suppose our input data to a map-reduce operation consists of integer values (the keys are not important). The map function takes an integer i and produces the list of pairs (p,i) such that p is a prime divisor of i. For example, map(12) = [(2,12), (3,12)].
The reduce function is addition. That is, reduce(p, [i1, i2, ..., ik]) is (p,i1+i2+...+ik).

Compute the output, if the input is the set of integers 15, 21, 24, 30, 49. Then, identify, in the list below, one of the pairs in the output.

 - (3,90)
 - (3,75)
 - (5,30)
 - (2,75)

In [1]:
from math import ceil, sqrt

# Get the distinct prime divisors of a number.
# This function comes from http://codereview.stackexchange.com/questions/19509/functional-prime-factor-generator
def factor(n):
    if n <= 1:
        return []
    
    prime = next((x for x in range(2, ceil(sqrt(n))+1) if n%x == 0), n)
#     print(n, prime)
    result = [prime] + factor(n//prime)
#     print(result)
    # The recursion returns repeated primes (e.g. 2, 2, 2, 3 for 24), so keep only the unique values
    result = list(set(result))
    return result

# List of integers to process
integers = [15, 21, 24, 30, 49]

# Store pairs
pairs = []

# Map function: emit a (p, n) pair for each prime divisor p of n (note: this name shadows Python's built-in map)
def map(n):
    factors = factor(n)
    map_keys = []
    
    for num in factors:
        map_keys.append((num, n))
        
    return map_keys

# Reduce function: sum the values for each key (expects the pairs to be sorted by key)
def reduce(pairs):
    result = dict()
    last_key = None
    
    for pair in pairs:
        
        key = pair[0]
        if key == last_key:
            result[key] += pair[1]
            
        else:
            result[key] = pair[1]
        last_key = key
        
    return result

# Create the pairs by calling the map function on every integer
for integer in integers:
    tuples = map(integer)
    for single_tuple in tuples:
        pairs.append(single_tuple)

# Sort pairs by key
pairs = sorted(pairs, key=lambda key: key[0])

# Reduce pairs to get the results
reduced_pairs = reduce(pairs)

# Print results
print("3,90: {0}".format(reduced_pairs[3] == 90))
print("3,75: {0}".format(reduced_pairs[3] == 75))
print("5,30: {0}".format(reduced_pairs[5] == 30))
print("2,75: {0}".format(reduced_pairs[2] == 75))
print("7,70: {0}".format(reduced_pairs[7] == 70))


3,90: True
3,75: False
5,30: False
2,75: False
7,70: True
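
As a cross-check, the same computation can be sketched without relying on the pairs being sorted, by grouping values under each key before summing. This is only a minimal sketch: the names `prime_divisors` and `map_reduce_sum` are hypothetical and not part of the original notebook, and plain trial division stands in for the `factor` function above.

```python
from collections import defaultdict

def prime_divisors(n):
    """Return the set of distinct prime divisors of n (trial division)."""
    divisors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            divisors.add(d)
            n //= d
        d += 1
    if n > 1:
        # Whatever remains after dividing out the small factors is prime
        divisors.add(n)
    return divisors

def map_reduce_sum(integers):
    """Group each integer under its prime divisors, then sum every group."""
    groups = defaultdict(list)
    for i in integers:
        # Map step: emit (p, i) for each prime divisor p of i
        for p in prime_divisors(i):
            groups[p].append(i)
    # Reduce step: addition over the values collected for each key
    return {p: sum(values) for p, values in sorted(groups.items())}

result = map_reduce_sum([15, 21, 24, 30, 49])
print(result)  # {2: 54, 3: 90, 5: 45, 7: 70}
```

The full output contains the pairs (2,54), (3,90), (5,45) and (7,70), so of the listed options only (3,90) appears in it.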
