Chapter 5. Accumulation operations with Reduce
====
### Mastering Large Datasets with Python by JT Wolohan 



### Early chapter functions: Frequency and filter

In [None]:
from functools import reduce

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def keep_if_even(acc, nxt):
    if nxt % 2 == 0:
        return acc + [nxt]
    else:
        return acc


reduce(keep_if_even, xs, [])
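
As a quick sanity check, the `reduce`-based filter above should agree with Python's built-in `filter`. The sketch below repeats `keep_if_even` so that it runs on its own. The names `evens_reduce` and `evens_filter` are introduced here only for illustration.

```python
from functools import reduce


def keep_if_even(acc, nxt):
    # Append nxt to the accumulated list only when it is even.
    if nxt % 2 == 0:
        return acc + [nxt]
    return acc


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Filter expressed as a reduction: the accumulator starts as an empty list.
evens_reduce = reduce(keep_if_even, xs, [])

# The same result using the built-in filter.
evens_filter = list(filter(lambda x: x % 2 == 0, xs))

print(evens_reduce)                  # [2, 4, 6, 8]
print(evens_reduce == evens_filter)  # True
```

Because `acc + [nxt]` builds a new list on every even element, this version is mainly useful for showing how `reduce` can express a filter. The built-in is the more practical choice for plain filtering.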


In [None]:
from functools import reduce

xs = ["A", "B", "C", "A", "A", "C", "A"]
ys = [1, 3, 6, 1, 2, 9, 3, 12]


def make_counts(acc, nxt):
    acc[nxt] = acc.get(nxt, 0) + 1
    return acc


def my_frequencies(xs):
    return reduce(make_counts, xs, {})


print(my_frequencies(xs))
print(my_frequencies(ys))
print(my_frequencies("mississippi"))
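
For comparison, the standard library's `collections.Counter` produces the same counts as `my_frequencies`. The sketch below redefines the two functions so it runs independently. The names `word` and `counts` are illustrative only.

```python
from collections import Counter
from functools import reduce


def make_counts(acc, nxt):
    # Increment the count for nxt, starting from 0 if it is unseen.
    acc[nxt] = acc.get(nxt, 0) + 1
    return acc


def my_frequencies(xs):
    # A fresh dict is passed as the initializer on every call.
    return reduce(make_counts, xs, {})


word = "mississippi"
counts = my_frequencies(word)

print(counts)                         # {'m': 1, 'i': 4, 's': 4, 'p': 2}
print(counts == dict(Counter(word)))  # True
```

Note that `make_counts` mutates the accumulator in place. That is safe here because `my_frequencies` creates a new `{}` for each call.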

### Analyzing car trends with reduce

**SCENARIO: CHANGING CAR TRENDS** *Your customer is a used car dealer. They have data on cars that they’ve bought and sold in the last six months and are hoping you can help them find which types of used cars they make the most profit on. One salesman believes that it’s high fuel-efficiency cars (those that get more than 35 miles per gallon) that make the most money, while another believes that medium-mileage cars (between 60,000 and 100,000 miles) result in the highest average profit on resale. Given a CSV file with a variety of attributes about some used cars, write a script to find the average profit on cars of low (<18 mpg), medium (18-35 mpg), and high (>35 mpg) fuel efficiency as well as low (<60,000), medium (60,000-100,000), and high (>100,000) mileage, and settle the debate.*

In [None]:
from functools import reduce

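def low_med_hi(d, k, breaks):
    """Bin the numeric value d[k] as "low", "medium", or "high".

    breaks is a (lower, upper) pair: values below lower are "low",
    values from lower up to (but not including) upper are "medium",
    and everything else is "high".
    """
    value = float(d[k])
    if value < breaks[0]:
        return "low"
    elif value < breaks[1]:
        return "medium"
    else:
        return "high"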
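
To see how the binning behaves at the break points, here is a small sketch that calls `low_med_hi` on a few hypothetical records. The record values are made up for illustration and are not taken from the dealer's data.

```python
def low_med_hi(d, k, breaks):
    # Same logic as the cell above, repeated so this sketch runs on its own.
    if float(d[k]) < breaks[0]:
        return "low"
    elif float(d[k]) < breaks[1]:
        return "medium"
    else:
        return "high"


mpg_breaks = (18, 35)

# Hypothetical records; values are strings, as they might be in a parsed file.
labels = [low_med_hi({"mpg": v}, "mpg", mpg_breaks)
          for v in ["17.9", "18", "34.9", "35", "42"]]

print(labels)  # ['low', 'medium', 'medium', 'high', 'high']
```

A value exactly equal to the upper break (35 mpg here) lands in "high", because both comparisons use a strict `<`.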

In [None]:
def clean_entry(d):
    r = {'profit':None, 'mpg':None, 'odo':None}
    r['profit'] = float(d.get("price-sell", 0)) - float(d.get("price-buy", 0))
    r['mpg'] = low_med_hi(d, 'mpg', (18, 35))
    r['odo'] = low_med_hi(d, 'odo', (60000, 100000))
    return r

In [None]:
def acc_average(acc, profit):
    acc['total'] = acc.get('total', 0) + profit
    acc['count'] = acc.get('count', 0) + 1
    acc['average'] = acc['total']/acc['count']
    return acc
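
`acc_average` can itself be used as a reducer over a plain list of profits, which makes the running total, count, and average easy to inspect. The profit figures below are hypothetical.

```python
from functools import reduce


def acc_average(acc, profit):
    # Update the running total and count, then recompute the average.
    acc['total'] = acc.get('total', 0) + profit
    acc['count'] = acc.get('count', 0) + 1
    acc['average'] = acc['total'] / acc['count']
    return acc


profits = [100.0, 300.0, 200.0]  # made-up values for illustration
summary = reduce(acc_average, profits, {})

print(summary)  # {'total': 600.0, 'count': 3, 'average': 200.0}
```

Storing the total and count alongside the average is what lets the average be updated one record at a time, without keeping every profit in memory.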

In [None]:
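def sort_and_add(acc, nxt):
    # Fold one cleaned record into the running averages for its
    # fuel-efficiency bin and its mileage bin.
    p = nxt['profit']
    acc['mpg'][nxt['mpg']] = acc_average(acc['mpg'].get(nxt['mpg'], {}), p)
    acc['odo'][nxt['odo']] = acc_average(acc['odo'].get(nxt['odo'], {}), p)
    return acc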

In [None]:
import json
with open("../Ch05/cars.json") as f:
    xs = json.load(f)
results = reduce(sort_and_add, map(clean_entry, xs), {"mpg": {}, "odo": {}})
print(json.dumps(results, indent=4))
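
If `cars.json` is not at hand, the same pipeline can be tried on a few in-memory records. The sketch below is a condensed version of the cells above, under the assumption that each record carries the keys `price-buy`, `price-sell`, `mpg`, and `odo`. The four cars are invented for illustration, the mileage breaks follow the scenario text (60,000 and 100,000), and `best` is a hypothetical helper that picks the bin with the highest average profit.

```python
from functools import reduce


def low_med_hi(d, k, breaks):
    value = float(d[k])
    if value < breaks[0]:
        return "low"
    elif value < breaks[1]:
        return "medium"
    return "high"


def clean_entry(d):
    # Reduce a raw record to its profit and its two bin labels.
    return {
        'profit': float(d.get("price-sell", 0)) - float(d.get("price-buy", 0)),
        'mpg': low_med_hi(d, 'mpg', (18, 35)),
        'odo': low_med_hi(d, 'odo', (60000, 100000)),
    }


def acc_average(acc, profit):
    acc['total'] = acc.get('total', 0) + profit
    acc['count'] = acc.get('count', 0) + 1
    acc['average'] = acc['total'] / acc['count']
    return acc


def sort_and_add(acc, nxt):
    p = nxt['profit']
    acc['mpg'][nxt['mpg']] = acc_average(acc['mpg'].get(nxt['mpg'], {}), p)
    acc['odo'][nxt['odo']] = acc_average(acc['odo'].get(nxt['odo'], {}), p)
    return acc


def best(groups):
    # Hypothetical helper: name of the bin with the highest average profit.
    return max(groups, key=lambda k: groups[k]['average'])


# Invented sample data, not the dealer's records.
cars = [
    {"price-buy": "5000", "price-sell": "6500", "mpg": "40", "odo": "45000"},
    {"price-buy": "8000", "price-sell": "9000", "mpg": "25", "odo": "80000"},
    {"price-buy": "3000", "price-sell": "3500", "mpg": "15", "odo": "120000"},
    {"price-buy": "7000", "price-sell": "9500", "mpg": "38", "odo": "70000"},
]

results = reduce(sort_and_add, map(clean_entry, cars), {"mpg": {}, "odo": {}})

print(results["mpg"]["high"]["average"])    # 2000.0
print(results["odo"]["medium"]["average"])  # 1750.0
print(best(results["mpg"]), best(results["odo"]))  # high medium
```

With only four made-up cars the winners mean nothing, of course. The point is the shape of `results`: one dictionary per attribute, each mapping a bin label to its running total, count, and average.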

[Ready for more? Go to Chapter 6!](./Ch06_notebook.ipynb)