# Map + Reduce in Python

We can learn the basic principles of Spark by practicing many of them in plain Python. The ideas we're going to cover here are:

1. Laziness
2. The functions `map` and `reduce`

For some, this will be a short review. For others, it will be a challenge in and of itself. That's OK: it's best to understand these principles before moving on to Spark.

## Map

In [None]:
def squarer(a):
    return a**2

ages = [13,14,15]

# We learned to map with list comprehensions
[squarer(a) for a in ages]

In [None]:
# Now we introduce a new function: map
# what does it return?

map(squarer, ages)

In [None]:
# The map function returns an instance of the "map" class,
# which isn't very helpful to look at. What can we do with it?
# Well, it's an iterator, so we can loop over it:

for i in map(squarer, ages):
    print(i)
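
One detail worth seeing in action: the object returned by `map` can only be consumed once. The block below is a minimal sketch that redefines `squarer` so it runs on its own; calling `list()` is just one way to force the values out, and the names `first_pass` and `second_pass` are made up for illustration.

```python
def squarer(a):
    return a**2

squares = map(squarer, [13, 14, 15])

# list() pulls every value out of the map object
first_pass = list(squares)

# the map object is now exhausted, so a second pass finds nothing
second_pass = list(squares)

print(first_pass)
print(second_pass)
```

If you need the results more than once, store them in a list first, or build a fresh `map` each time.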

**`map` is lazy**

So why didn't `map` just return a list? The key is that `map` is lazy. Laziness is a fundamental concept in computer science, especially in functional programming, and Spark is built around it.

Laziness means that Python avoids doing the work until it has to. Here, the work of squaring the numbers isn't actually done until you ask for a result. Only then does Python go and do the work. Lazy!
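
One way to convince yourself of this is to use a function that records every call it receives. This is only a sketch: `noisy_squarer` and the `calls` list are hypothetical helpers invented for the demonstration, and the built-in `next` is used to ask for a single result.

```python
calls = []

def noisy_squarer(a):
    # record every call so we can see when the work happens
    calls.append(a)
    return a**2

lazy = map(noisy_squarer, [13, 14, 15])
calls_before = len(calls)   # nothing has been squared yet

first = next(lazy)          # ask for one result...
calls_after = len(calls)    # ...and exactly one call has happened

print(calls_before, first, calls_after)
```

Creating the `map` object costs almost nothing. The squaring happens one element at a time, and only when something asks for the next value.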

Let's see what laziness can accomplish.

In [None]:
# I'm going to create a function that returns a "generator".
# This is also a new concept, but for now, all you need
# to know is that a generator is a lazy iterable. It's like
# a list, but it doesn't compute or store a value until someone
# asks for it.
#
# Using a generator, we can create an infinite iterable of numbers.
# (Note: defining ages() below replaces the list named "ages" from earlier.)

def ages():
    prev = 0
    while True:
        prev += 1
        yield prev


ages_genny = ages()
ages_genny
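
To see that a generator really does produce values on demand, we can pull them out one at a time with the built-in `next`. This is a small sketch: the generator is the same as the one above, renamed `count_up` (a made-up name) so the block runs by itself.

```python
def count_up():
    prev = 0
    while True:
        prev += 1
        yield prev

gen = count_up()

# each call to next() runs the loop body just far enough
# to reach the next yield, then pauses there
a = next(gen)
b = next(gen)
c = next(gen)

print(a, b, c)
```

The `while True` loop never finishes, but that's fine: the function only advances when someone asks for another value.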

In [None]:
# Try to print the square of every age in ages_genny:
# (hint: don't forget how to interrupt your kernel!)

for i in ages_genny:
    print(i**2)

In [None]:
# So a loop can never finish squaring everything in our infinite iterable.
# But we can map over it, because map is lazy:

squared_ages = map(squarer, ages())

In [None]:
# Now we have an infinite iterable of squared numbers.
# How can we look at the first few numbers?
from itertools import islice

# This is kind of like "head" on the command line!
# Here we tell it to take just the first 10 items
# and then we print them out.
for i in islice(squared_ages, 0, 10):
    print(i)
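
Keep in mind that `islice` consumes the items it takes from the underlying iterator. The sketch below repeats the definitions from above so it runs standalone; `first_three` and `next_three` are names chosen only for this illustration.

```python
from itertools import islice

def ages():
    prev = 0
    while True:
        prev += 1
        yield prev

def squarer(a):
    return a**2

squared_ages = map(squarer, ages())

# islice(iterable, n) is shorthand for islice(iterable, 0, n)
first_three = list(islice(squared_ages, 3))

# the iterator remembers its position, so this picks up at 4**2
next_three = list(islice(squared_ages, 3))

print(first_three)
print(next_three)
```

So unlike `head` on a file, running the same slice twice on the same iterator gives you different items each time.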

## Reduce

In [None]:
# We previously introduced a common pattern to perform reductions 
# in Python. It had the following components:

def my_sum(nums):

    # create the accumulator; this was sometimes the first
    # item in the iterable and sometimes manually specified:
    acc = 0

    # loop over the iterable
    for n in nums:

        # perform some logic which updates the accumulator
        # from each element in the iterable
        acc += n

    # return the accumulator
    return acc

my_sum([1,2,3])

In [None]:
# Because there's a clear pattern, we can abstract the component
# parts into arguments and wrap it up into a function. That's
# what "reduce" does:

from functools import reduce


def summer(acc, nxt):
    # logic to update the accumulator
    # from each "nxt" element in the iterable
    return acc + nxt

# Here we specify 0 as the initial value of the accumulator.
# Note that this argument is optional: if we want the accumulator
# to start as the first element of the list, we just leave it off.

reduce(summer, [1,2,3], 0)
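
The initial value matters most when the input might be empty. Here is a short sketch of the difference, with `summer` redefined so the block is self-contained; the variable names are illustrative only.

```python
from functools import reduce

def summer(acc, nxt):
    return acc + nxt

# with an initial value, an empty iterable simply returns that value
empty_total = reduce(summer, [], 0)

# without one, reduce has nothing to start from and raises TypeError
try:
    reduce(summer, [])
    raised = False
except TypeError:
    raised = True

# for non-empty input, both forms agree
with_init = reduce(summer, [1, 2, 3], 0)
without_init = reduce(summer, [1, 2, 3])

print(empty_total, raised, with_init, without_init)
```

Passing an initial value is usually the safer habit when you can't guarantee the data is non-empty.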

In [None]:
# Note that we can, of course, combine map and reduce.
# This is a basic pattern we have already used while learning
# Python and pandas:

reduce(summer, map(squarer, [1,2,3]))
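
Because `map` is lazy, the same pattern also works on a slice of an infinite iterable. The following is a sketch that pulls together the pieces from this notebook (all redefined here so it runs on its own); `first_ten_squares` and `total` are names chosen only for the example.

```python
from functools import reduce
from itertools import islice

def ages():
    prev = 0
    while True:
        prev += 1
        yield prev

def squarer(a):
    return a**2

def summer(acc, nxt):
    return acc + nxt

# nothing is computed yet: this is a lazy pipeline over an infinite source
first_ten_squares = islice(map(squarer, ages()), 10)

# reduce pulls the ten values through the pipeline one at a time
total = reduce(summer, first_ten_squares, 0)

print(total)
```

No intermediate list of squares is ever built: each value is squared, added to the accumulator, and discarded.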

## Map + Reduce

Let's practice combining map + reduce in Python. We've done this a lot already, but now we'll practice with our new functions. 

In [None]:
import pandas as pd

# Everyone's favorite orders data:
orders = pd.read_csv('data/orders.csv').to_dict(orient='records')

In [None]:
# Exercise 1

# Get the total sales (the quantity_ordered times the price of each item)
# across the company's entire history of sales.

# Steps: 

# 1. How can you formulate this as a map + reduce? 

# 2. Work backwards. Start with the final reduce. Ask yourself: "if I had data
# in the form of ___, then this would be easy". Then figure out how to get the
# data in the form of ___. Sketch out your solution in comments.

# 3. Implement your solution
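
As a warm-up for the steps above, here is the "work backwards" idea applied to a different, made-up problem. It is only a sketch: the `trips` records and their `hours` and `speed` fields are invented for illustration and are not part of the orders data.

```python
from functools import reduce

# hypothetical records, in a list-of-dicts shape
trips = [
    {"hours": 2, "speed": 50},
    {"hours": 1, "speed": 30},
    {"hours": 3, "speed": 40},
]

# Working backwards: "if I had a plain iterable of distances,
# then adding them up would be easy."
def add(acc, nxt):
    return acc + nxt

# So the map step has to turn each record into one distance.
def distance(trip):
    return trip["hours"] * trip["speed"]

total_distance = reduce(add, map(distance, trips), 0)

print(total_distance)
```

The reduce was written first and the map second, even though the data flows through them in the opposite order.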

In [None]:
# Exercise 2

# Make a "map" and a "reduce" that together get the max number of items ordered (in a single line item)!

In [None]:
# Sometimes our data comes in JSON and it might be nested!

import json

with open('data/orders.json') as f:
    orders = [json.loads(l) for l in f]

orders
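
The loading code above calls `json.loads` on each line, which means the file holds one JSON object per line. The sketch below shows how nested objects come out as nested dicts. The records and their `customer`, `name`, and `city` fields are hypothetical and are not the real layout of `data/orders.json`; `io.StringIO` stands in for the open file.

```python
import io
import json

# hypothetical file content: one JSON object per line
fake_file = io.StringIO(
    '{"id": 1, "customer": {"name": "Ada", "city": "London"}}\n'
    '{"id": 2, "customer": {"name": "Grace", "city": "New York"}}\n'
)

records = [json.loads(line) for line in fake_file]

# nested objects become nested dicts, so we index one level at a time
def city_of(record):
    return record["customer"]["city"]

cities = list(map(city_of, records))

print(cities)
```

Before writing a map function for nested data, it helps to look at a single record and note the path of keys down to the value you need.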

In [None]:
# Exercise 3

# Find the total sales, via map + reduce,
# using the new, nested format.
