# Development of Map Reduce APIs

* Develop myFilter
* Validate myFilter Function
* Develop myMap
* Validate myMap Function
* Develop myReduce
* Validate myReduce Function
* Develop myReduceByKey
* Validate myReduceByKey Function
* Exercises

## Develop myFilter

Develop a function named myFilter that takes a collection and a function as arguments. The function should do the following:
* Iterate through the elements.
* Apply the condition using the function passed as an argument. We can pass either a named function or a lambda function.
* Return a list of all the elements that satisfy the condition.

In [None]:
def myFilter(c, f):
    # Collect the elements of c for which f returns True
    c_f = []
    for e in c:
        if f(e):
            c_f.append(e)
    return c_f
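
Because myFilter accepts any callable, a named function and a lambda are interchangeable. The sketch below repeats the definition so that it runs on its own, and checks the result against Python's built-in filter, which returns a lazy iterator rather than a list. The predicate is_even and the list of numbers are hypothetical examples.

```python
def myFilter(c, f):
    # Keep only the elements for which f returns a truthy value
    c_f = []
    for e in c:
        if f(e):
            c_f.append(e)
    return c_f


def is_even(n):
    # Hypothetical named predicate used for illustration
    return n % 2 == 0


numbers = list(range(1, 11))

evens_named = myFilter(numbers, is_even)                # named function
evens_lambda = myFilter(numbers, lambda n: n % 2 == 0)  # lambda function

# The built-in filter returns an iterator, so wrap it in list() to compare
assert evens_named == evens_lambda == list(filter(is_even, numbers))
print(evens_named)  # [2, 4, 6, 8, 10]
```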

## Validate myFilter Function

Let us use the same examples that we used earlier in Processing Collections using loops.
* Read the orders data.

In [None]:
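orders_path = "/Users/itversity/Research/data/retail_db/orders/part-00000"
# Read the file and split it into one string per line; the with block closes the file
with open(orders_path) as orders_file:
    orders = orders_file.read().splitlines()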

In [None]:
orders[:10]

In [None]:
order = '1,2013-07-25 00:00:00.0,11599,CLOSED'
int(order.split(',')[2]) == 11599

* Get orders placed by customer id 12431

In [None]:
customer_orders = myFilter(orders, 
                           lambda order: int(order.split(',')[2]) == 12431
                          )

In [None]:
customer_orders

* Get orders placed by customer id 12431 in January 2014

In [None]:
customer_orders_for_month = myFilter(orders, 
                           lambda order: int(order.split(',')[2]) == 12431
                                     and order.split(',')[1].startswith('2014-01')
                          )
customer_orders_for_month

* Get orders placed by customer id 12431 with status PROCESSING or PENDING_PAYMENT in January 2014

In [None]:
customer_orders_for_month = myFilter(orders, 
                           lambda order: int(order.split(',')[2]) == 12431
                                     and order.split(',')[1].startswith('2014-01')
                                     and order.split(',')[3] in ('PENDING_PAYMENT', 'PROCESSING')
                          )

In [None]:
customer_orders_for_month
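
The lambdas above call order.split(',') once per condition. A named function can split the record once and reuse the fields for all three conditions. The sketch below assumes the layout order_id,order_date,order_customer_id,order_status seen in the sample record; is_target_order is a hypothetical helper, and the records in sample_orders (other than the first one, taken from above) are made up for illustration.

```python
def myFilter(c, f):
    # Equivalent to the loop-based myFilter above
    return [e for e in c if f(e)]


def is_target_order(order):
    # Split the record once and reuse the fields for every condition
    _, order_date, customer_id, status = order.split(',')
    return (int(customer_id) == 12431
            and order_date.startswith('2014-01')
            and status in ('PENDING_PAYMENT', 'PROCESSING'))


# Hypothetical sample records in the same layout as the orders file
sample_orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '101,2014-01-05 00:00:00.0,12431,PROCESSING',
    '102,2014-01-20 00:00:00.0,12431,CLOSED',
    '103,2014-02-02 00:00:00.0,12431,PENDING_PAYMENT',
]

print(myFilter(sample_orders, is_target_order))
# ['101,2014-01-05 00:00:00.0,12431,PROCESSING']
```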

## Develop myMap

Develop a function named myMap that takes a collection and a function as arguments. The function should do the following:
* Iterate through the elements.
* Apply the transformation logic using the function passed as an argument, and append each transformed element to a new list.
* Return the list of transformed elements.
* We will also validate the function using a simple list of integers.

In [None]:
def myMap(c, f):
    # Apply f to every element of c and collect the results in a new list
    c_t = []
    for e in c:
        c_t.append(f(e))
    return c_t

In [None]:
l = list(range(1, 10))
l

In [None]:
myMap(l, lambda e: e * e)

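The same function can be written more concisely using a list comprehension.

In [None]:
def myMap(c, f):
    # Equivalent to the loop above: build the list of f(e) in one expression
    return [f(e) for e in c]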

In [None]:
l = list(range(1, 10))
l

In [None]:
myMap(l, lambda e: e * e)
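
Both versions of myMap behave like Python's built-in map, except that map returns a lazy iterator. The sketch below checks the three forms against each other on the same list of integers.

```python
def myMap(c, f):
    # Apply f to every element and collect the results in a new list
    return [f(e) for e in c]


l = list(range(1, 10))
squares = myMap(l, lambda e: e * e)

# The built-in map is lazy, so list() is needed to materialize the values
assert squares == list(map(lambda e: e * e, l)) == [e * e for e in l]
print(squares)  # [1, 4, 9, 16, 25, 36, 49, 64, 81]
```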

## Validate myMap Function

Let us validate the function by using some realistic examples.

* Use orders to extract the order dates, then apply set to get only the unique dates.

In [None]:
orders_path = "/Users/itversity/Research/data/retail_db/orders/part-00000"
with open(orders_path) as orders_file:
    orders = orders_file.read().splitlines()

In [None]:
orders[:10]

In [None]:
order = '1,2013-07-25 00:00:00.0,11599,CLOSED'
order.split(',')[1]

In [None]:
order_dates = myMap(orders,
                    lambda order: order.split(',')[1]
                   )
order_dates[:10]

In [None]:
len(orders)

In [None]:
len(order_dates)

In [None]:
set(order_dates)

In [None]:
len(set(order_dates))
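
A set has no guaranteed order, so the output of set(order_dates) is hard to scan. Wrapping it in sorted returns a list of the unique dates in ascending order; this works because the dates are fixed-width strings starting with YYYY-MM-DD, which sort lexicographically in date order. The sample records below are hypothetical.

```python
def myMap(c, f):
    # Apply f to every element and collect the results in a new list
    return [f(e) for e in c]


# Hypothetical sample records in the same layout as the orders file
sample_orders = [
    '3,2013-07-26 00:00:00.0,12111,COMPLETE',
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
]

order_dates = myMap(sample_orders, lambda order: order.split(',')[1])
unique_dates = sorted(set(order_dates))  # deduplicate, then sort ascending
print(unique_dates)
# ['2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0']
```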

* Use orders to extract order_id and order_date from each element as a tuple. Make sure that order_id is of type int.

In [None]:
orders_path = "/Users/itversity/Research/data/retail_db/orders/part-00000"
with open(orders_path) as orders_file:
    orders = orders_file.read().splitlines()

In [None]:
orders[:10]

In [None]:
# Target output: a list of (order_id, order_date) tuples, with order_id as int
[(1, '2013-07-25 00:00:00.0'), (2, '2013-07-25 00:00:00.0')]

In [None]:
order_tuples = myMap(orders,
                     lambda order: (int(order.split(',')[0]), order.split(',')[1])
                    )

In [None]:
order_tuples[:10]
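
The lambda above splits each record twice. A named function can split once and return the tuple directly; to_id_and_date is a hypothetical helper, and the two sample records follow the layout of the first record shown earlier.

```python
def myMap(c, f):
    # Apply f to every element and collect the results in a new list
    return [f(e) for e in c]


def to_id_and_date(order):
    # Hypothetical helper: split once, return (order_id as int, order_date)
    fields = order.split(',')
    return int(fields[0]), fields[1]


# Sample records in the same layout as the orders file
sample_orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
]

order_tuples = myMap(sample_orders, to_id_and_date)
print(order_tuples)
# [(1, '2013-07-25 00:00:00.0'), (2, '2013-07-25 00:00:00.0')]
```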

## Develop myReduce

Develop a function named myReduce that takes a collection and a function as arguments. The function should do the following:
* Iterate through the elements.
* Perform the aggregation using the function passed as an argument. That function should take two arguments, the running result and the current element, and return their combination (for example, their sum).
* Return the aggregated result.

In [None]:
l = [1, 4, 6, 2, 5]

In [None]:
l[1:]

In [None]:
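def myReduce(c, f):
    # Seed the running result with the first element; c must be a non-empty list
    t = c[0]
    # Combine the running result with each remaining element
    for e in c[1:]:
        t = f(t, e)
    return t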

In [None]:
myReduce(l, lambda t, e: t + e)

In [None]:
myReduce(l, lambda t, e: t * e)

In [None]:
min(7, 5)

In [None]:
myReduce(l, lambda t, e: min(t, e))

In [None]:
myReduce(l, lambda t, e: max(t, e))
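
myReduce follows the same contract as functools.reduce from the standard library when no initial value is given: the first element seeds the running result. Both fail on an empty collection (myReduce with IndexError from c[0], functools.reduce with TypeError), so callers that may pass empty data should check first or, with functools.reduce, supply an initial value. The sketch below compares the two.

```python
from functools import reduce


def myReduce(c, f):
    # Seed the running result with the first element, then fold in the rest
    t = c[0]
    for e in c[1:]:
        t = f(t, e)
    return t


l = [1, 4, 6, 2, 5]

for f in (lambda t, e: t + e, lambda t, e: t * e, min, max):
    # functools.reduce(function, iterable) also seeds with the first element
    assert myReduce(l, f) == reduce(f, l)

print(myReduce(l, lambda t, e: t + e))  # 18
print(myReduce(l, lambda t, e: t * e))  # 240

# With an explicit initial value, functools.reduce handles empty input
print(reduce(lambda t, e: t + e, [], 0))  # 0
```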

## Validate myReduce Function

Let us validate the myReduce function.
* Compute the order revenue for a given order id using order_items.
* Use myFilter to get the order items for the given order id.
* Use myMap to extract order_item_subtotal and convert it to float.
* Compute order_revenue using the myReduce function.

In [None]:
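order_items_path = "/Users/itversity/Research/data/retail_db/order_items/part-00000"
# Read the file and split it into one string per line; the with block closes the file
with open(order_items_path) as order_items_file:
    order_items = order_items_file.read().splitlines()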

In [None]:
order_items[:10]

In [None]:
order_item = '2,2,1073,1,199.99,199.99'
int(order_item.split(',')[1]) == 2

In [None]:
order_items_filtered = myFilter(order_items,
                                lambda order_item: int(order_item.split(',')[1]) == 2
                               )

In [None]:
order_items_filtered

In [None]:
order_item = '2,2,1073,1,199.99,199.99'
float(order_item.split(',')[4])

In [None]:
order_item_subtotals = myMap(order_items_filtered,
                             lambda order_item: float(order_item.split(',')[4])
                            )

In [None]:
order_item_subtotals

In [None]:
sum(order_item_subtotals)

In [None]:
myReduce(order_item_subtotals, lambda t, e: t + e)

In [None]:
myReduce(order_item_subtotals, lambda t, e: min(t, e))
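
The filter, map and reduce steps above can be chained into one small pipeline. The sketch assumes the order_items layout order_item_id,order_item_order_id,order_item_product_id,order_item_quantity,order_item_subtotal,order_item_product_price, which matches the indexes used above (1 for the order id, 4 for the subtotal). order_revenue is a hypothetical helper, and the sample records are illustrative, with the subtotals for order id 2 mirroring the values listed later in this notebook. Rounding to two decimals hides floating point noise from adding values such as 199.99.

```python
def myFilter(c, f):
    # Equivalent to the loop-based myFilter above
    return [e for e in c if f(e)]


def myMap(c, f):
    # Equivalent to the loop-based myMap above
    return [f(e) for e in c]


def myReduce(c, f):
    t = c[0]
    for e in c[1:]:
        t = f(t, e)
    return t


def order_revenue(order_items, order_id):
    # 1. keep the items that belong to the given order
    items = myFilter(order_items,
                     lambda oi: int(oi.split(',')[1]) == order_id)
    # 2. extract order_item_subtotal as float
    subtotals = myMap(items, lambda oi: float(oi.split(',')[4]))
    # 3. add the subtotals; raises IndexError if the order has no items
    return round(myReduce(subtotals, lambda t, e: t + e), 2)


# Sample records in the order_items layout
sample_order_items = [
    '1,1,957,1,299.98,299.98',
    '2,2,1073,1,199.99,199.99',
    '3,2,502,5,250.0,50.0',
    '4,2,403,1,129.99,129.99',
]

print(order_revenue(sample_order_items, 2))  # 579.98
```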

## Develop myReduceByKey

Develop a function named myReduceByKey that takes a collection of tuples and a function as arguments. Each element in the collection should be a pair, that is, a tuple with exactly 2 elements: a key and a value. The function should do the following:
* Iterate through the collection of tuples.
* Group the data by the first element of each tuple (the key) and aggregate the second elements (the values) for each key using the function passed as an argument. As with myReduce, that function takes the running result and the current value and returns their combination.
* Return a list of tuples in which the first element is a unique key and the second element is the aggregated result for that key.

In [None]:
d = {}
d[2] = 199.99

In [None]:
d

In [None]:
if 2 in d: d[2] = d[2] + 250.0

In [None]:
if 4 in d: d[4] = d[4] + 100
else: d[4] = 100

In [None]:
d

In [None]:
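def myReduceByKey(c_p, f):
    # d maps each key to its running aggregate
    d = {}
    for e in c_p:
        if e[0] in d:
            # Key seen before: combine the running aggregate with the new value
            d[e[0]] = f(d[e[0]], e[1])
        else:
            # First occurrence of the key: its value starts the aggregate
            d[e[0]] = e[1]
    return list(d.items())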
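
The dictionary-based myReduceByKey makes a single pass over the data and returns keys in first-seen order, because Python dictionaries preserve insertion order (guaranteed since Python 3.7). For comparison, the sketch below builds the same aggregates with a different technique: sort the pairs by key, group them with itertools.groupby, and fold each group with functools.reduce. reduce_by_key_sorted is a hypothetical name, the sample pairs are made up, and its output is ordered by key rather than by first appearance.

```python
from functools import reduce
from itertools import groupby
from operator import add, itemgetter


def myReduceByKey(c_p, f):
    # Same logic as above: one running aggregate per key in a dict
    d = {}
    for k, v in c_p:
        d[k] = f(d[k], v) if k in d else v
    return list(d.items())


def reduce_by_key_sorted(c_p, f):
    # Hypothetical alternative: sort by key, group, then fold each group
    result = []
    for k, group in groupby(sorted(c_p, key=itemgetter(0)), key=itemgetter(0)):
        result.append((k, reduce(f, (v for _, v in group))))
    return result


pairs = [(2, 199.99), (4, 100.0), (2, 250.0), (4, 20.0), (1, 5.0)]

print(myReduceByKey(pairs, add))         # keys in first-seen order: 2, 4, 1
print(reduce_by_key_sorted(pairs, add))  # keys in sorted order: 1, 2, 4
```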

## Validate myReduceByKey Function

Let us perform a few tasks to validate the myReduceByKey function.
* Use the function to get the number of orders for each date.

In [None]:
orders_path = "/Users/itversity/Research/data/retail_db/orders/part-00000"
with open(orders_path) as orders_file:
    orders = orders_file.read().splitlines()

In [None]:
orders[:10]

In [None]:
orders_map = myMap(orders, 
                   lambda order: (order.split(',')[1], 1)
                  )
orders_map[:10]

In [None]:
order_count_by_date = myReduceByKey(orders_map, 
                                    lambda t, e: t + e
                                   )

In [None]:
order_count_by_date[:10]
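
Counting is a special case of reducing by key, so collections.Counter from the standard library offers an independent way to check order_count_by_date. The sketch below uses hypothetical sample records and compares the two results as dictionaries.

```python
from collections import Counter


def myMap(c, f):
    return [f(e) for e in c]


def myReduceByKey(c_p, f):
    d = {}
    for k, v in c_p:
        d[k] = f(d[k], v) if k in d else v
    return list(d.items())


# Hypothetical sample records in the same layout as the orders file
sample_orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-26 00:00:00.0,12111,COMPLETE',
]

# Pair each order date with 1, then add up the 1s per date
orders_map = myMap(sample_orders, lambda order: (order.split(',')[1], 1))
order_count_by_date = myReduceByKey(orders_map, lambda t, e: t + e)

# Counter tallies the dates directly; both should agree
expected = Counter(order.split(',')[1] for order in sample_orders)
assert dict(order_count_by_date) == dict(expected)
print(order_count_by_date)
# [('2013-07-25 00:00:00.0', 2), ('2013-07-26 00:00:00.0', 1)]
```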

* Use the function to get the revenue for each order id.

In [None]:
order_items_path = "/Users/itversity/Research/data/retail_db/order_items/part-00000"
with open(order_items_path) as order_items_file:
    order_items = order_items_file.read().splitlines()

In [None]:
order_items[:10]

In [None]:
order_items_map = myMap(order_items,
                        lambda order_item: (int(order_item.split(',')[1]),
                                            float(order_item.split(',')[4])
                                           )
                       )

In [None]:
order_items_map[:10]

In [None]:
revenue_per_order = myReduceByKey(order_items_map,
                                  lambda t, e: round(t + e, 2)
                                 )

In [None]:
revenue_per_order[:10]

In [None]:
myReduceByKey(order_items_map,
              lambda t, e: min(t, e)
             )[:10]

* Use the function to get the revenue as well as the number of items for each order id.

In [None]:
order_items_map = myMap(order_items,
                        lambda order_item: (int(order_item.split(',')[1]),
                                            (float(order_item.split(',')[4]), 1)
                                           )
                       )

In [None]:
order_items_map[:10]

In [None]:
# Conceptually, the values grouped under order id 2 before aggregation
[2, [(199.99, 1), (250.0, 1), (129.99, 1)]]

In [None]:
t1 = (199.99, 1)
t2 = (250.0, 1)
(t1[0] + t2[0], t1[1] + t2[1])

In [None]:
myReduceByKey(order_items_map,
              lambda t, e: (round(t[0] + e[0], 2), t[1] + e[1])
             )[:10]
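
To see what the lambda does for a single key, the sketch below folds the three (subtotal, count) pairs listed above for order id 2 with functools.reduce. add_pairs is a hypothetical named version of the lambda: it adds the pairs position by position and rounds the revenue to two decimals. The same function could be passed to myReduceByKey in place of the lambda.

```python
from functools import reduce


def add_pairs(t, e):
    # Add revenue and item count position by position; round revenue to cents
    return round(t[0] + e[0], 2), t[1] + e[1]


# Values grouped under order id 2, as listed above
values_for_order_2 = [(199.99, 1), (250.0, 1), (129.99, 1)]

revenue, item_count = reduce(add_pairs, values_for_order_2)
print(revenue, item_count)  # 579.98 3
```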

## Exercises

Here are the same exercises that you solved earlier. Try to solve them using the map reduce APIs developed above.
* We will provide you with a Python script that contains all of the above map reduce APIs. Use it as a module to solve the problems below.
* Create a file named mymapreduce.py containing the code below.
* Import the functions with from mymapreduce import *

In [None]:
def myFilter(c, f):
    c_f = []
    for e in c:
        if f(e):
            c_f.append(e)
    return c_f

def myMap(c, f):
    c_t = []
    for e in c:
        c_t.append(f(e))
    return c_t

def myReduce(c, f):
    t = c[0]
    for e in c[1:]:
        t = f(t, e)
    return t

def myReduceByKey(c_p, f):
    d = {}
    for e in c_p:
        if e[0] in d:
            d[e[0]] = f(d[e[0]], e[1])
        else:
            d[e[0]] = e[1]
    return list(d.items())

* Get the number of COMPLETE orders placed by each customer.
* Get the total number of PENDING or PENDING_PAYMENT orders for January 2014.
* Get the outstanding amount for each month, considering orders with status PAYMENT_REVIEW, PENDING, PENDING_PAYMENT and PROCESSING.