# Python Higher Order Functions

### Practice Problems 

Please note that you may only use higher order functions **without access to global variables**. Your expression should contain only **map()**, **filter()**, **sorted()**, **reduce()** and your custom functions.
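One way to honor the no-globals rule is to hand every value a custom function needs to it as an argument. The sketch below assumes `functools.partial` is an acceptable way to bind such a value; `exceeds`, `counts` and the station names are hypothetical and only serve as illustration.

```python
from functools import partial

def exceeds(limit, item):
    """Return True when the count in a (name, count) pair is above limit."""
    name, count = item
    return count > limit

# Hypothetical (station, trips) pairs, not taken from citibike.csv.
counts = [('A St', 1200), ('B St', 800), ('C St', 1500)]

# partial() fixes the first argument, so exceeds() never reads a global.
busy = list(filter(partial(exceeds, 1000), counts))
print(busy)
```

Wrapping `filter()` in `list()` keeps the result a list on Python 3 as well, where `filter()` returns an iterator.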

We will be using only the citibike data (i.e. *citibike.csv*) for this homework.

In [4]:
import csv

## Task 1

You are provided a list of service status updates scraped from an MTA information website. Each update may indicate <i>Good Service</i>, <i>Planned Work</i>, or <i>Delays</i> for one or more subway lines. Our first objective is to list all the lines that are running with <i>Delays</i>. To guide you through the process, the problem is split into smaller tasks.

In [1]:
# This is your input data: a list of subway line status updates.
# Each element is a string in the format '<lines> : <status>'.

status = [
    '1,2,3 : Good Service',
    '4,5,6 : Delays',
    '7 : Good Service',
    'A,C : Good Service',
    'E : Planned Work',
    'G : Delays',
    'B,D,F,M : Good Service',
    'J,Z : Delays',
    'L : Good Service',
    'N,Q,R : Planned Work',
    'S : Good Service',
]

#### 1.1 Please complete the lambda expression to filter only the status updates for the lines that run with <i>Delays</i>.

In [2]:
# Example: a quick way to define a function with lambda
example = lambda x: x+x
print example(10)

# Now use a lambda with filter() to keep only the delayed updates in status
delays = filter(lambda x: 'Delays' in x, status)

print type(status)
print delays
print type(delays)

20
<type 'list'>
['4,5,6 : Delays', 'G : Delays', 'J,Z : Delays']
<type 'list'>


#### 1.2 Please complete the lambda expression below to convert each status line into a list of subway lines, i.e. <b><i>'4,5,6 : Delays'</i></b> would become <b><i>['4','5','6']</i></b>

In [3]:
# .split(SEPARATOR, MAXSPLIT) takes two optional parameters:
# the separator string (e.g. ',') and the maximum number of splits.

DelayLinesList = map( lambda x: x.split(' :')[0].split(','), filter(lambda x: 'Delays' in x, status) )
# We map the lambda over the list produced by the previous filter.
# Splitting on ' :' gives [lines, status]; the [0] keeps only the lines part.

DelayLinesList

[['4', '5', '6'], ['G'], ['J', 'Z']]
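As a small illustration of the two `split()` parameters mentioned above, the sketch below splits one update on the full `' : '` separator with a maximum of one split, then splits the lines part on commas. The variable names are hypothetical.

```python
update = '4,5,6 : Delays'

# maxsplit=1 guarantees at most two parts, even if the state text contained ' : '.
lines_part, state = update.split(' : ', 1)

# Split the comma-separated line names into a list.
lines = lines_part.split(',')

print(lines)
print(state)
```

Splitting on `' : '` rather than `' :'` also removes the space in front of the state, which matters when the two parts are joined back together later.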

#### 1.3 Please complete the reduce command below to convert the list of subway line lists given in <i>DelayLinesList</i> into a single list of subway lines running with delays.

In [4]:
delayLines = reduce( lambda x, y : x+y, DelayLinesList ) 
# reduce() takes a function and a sequence as its first two parameters.
# The lambda takes two arguments: the result accumulated so far (x) and the next element (y).

delayLines
# After this, your delayLines should be
# ['4', '5', '6', 'G', 'J', 'Z']

['4', '5', '6', 'G', 'J', 'Z']
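The flattening step can also be sketched with an explicit empty-list initializer, which keeps the call safe when there are no delayed lines at all. This is a minimal sketch with hypothetical names; the `functools` import is needed on Python 3 and harmless on Python 2.

```python
from functools import reduce

nested = [['4', '5', '6'], ['G'], ['J', 'Z']]

# acc starts as [] and grows by one sub-list per step.
flat = reduce(lambda acc, lines: acc + lines, nested, [])

# With the initializer, an empty input simply yields an empty list.
nothing = reduce(lambda acc, lines: acc + lines, [], [])

print(flat)
print(nothing)
```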

#### 1.4 Please complete the reduce command below to count the number of lines in <b>delayLines</b>.

In [5]:
delayLinesCount = reduce(lambda x, y: x+1 ,delayLines, 0) 
# DON'T FORGET! reduce() can also take a THIRD parameter: the initial value that the reduction starts from.

delayLinesCount

# After this, your delayLinesCount should be
# 6

6
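To see why the initializer matters for counting, the sketch below runs the same reduction with and without it. Without an initializer, `reduce()` uses the first element as the starting value, so the lambda would try to add `1` to the string `'4'`. Names are hypothetical.

```python
from functools import reduce

delay_lines = ['4', '5', '6', 'G', 'J', 'Z']

# With 0 as the starting value, total is always an integer.
count = reduce(lambda total, line: total + 1, delay_lines, 0)

# Without it, the first step evaluates '4' + 1 and raises TypeError.
try:
    reduce(lambda total, line: total + 1, delay_lines)
    failed = False
except TypeError:
    failed = True

print(count)
print(failed)
```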

## Task 2

In this exercise, we would like to expand the combined service updates into separate updates for each subway line. For example, instead of having a single line <b>'1,2,3 : Good Service'</b> to indicate that lines 1, 2, and 3 are in good service, we would like to convert that into 3 separate updates: <b>'1 : Good Service'</b>, <b>'2 : Good Service'</b>, and <b>'3 : Good Service'</b>.

You are tasked to write a chain of map(), filter(), and/or reduce() to convert the <b>status</b> variable into a list like below:

['1 : Good Service',
 '2 : Good Service',
 '3 : Good Service',
 '4 : Delays',
 '5 : Delays',
 '6 : Delays',
 '7 : Good Service',
 'A : Good Service',
 'C : Good Service',
 'E : Planned Work',
 'G : Delays',
 'B : Good Service',
 'D : Good Service',
 'F : Good Service',
 'M : Good Service',
 'J : Delays',
 'Z : Delays',
 'L : Good Service',
 'N : Planned Work',
 'Q : Planned Work',
 'R : Planned Work',
 'S : Good Service']

In [123]:
def transformLine(row):
    subway, service = row.split(' : ')  # e.g. '1,2,3' and 'Good Service'
    subway = subway.split(',')
    return map(lambda x: x + ' : ' + service, subway)
    
#updates = map(transformLine, status)

updates = reduce(lambda x, y: x + y , map(transformLine, status))
    
print '\n'.join(updates)

#status
# ['1,2,3 : Good Service',
#  '4,5,6 : Delays',
#  '7 : Good Service',
#  'A,C : Good Service',
#  'E : Planned Work',
#  'G : Delays',
#  'B,D,F,M : Good Service',
#  'J,Z : Delays',
#  'L : Good Service',
#  'N,Q,R : Planned Work',
#  'S : Good Service']


1 : Good Service
2 : Good Service
3 : Good Service
4 : Delays
5 : Delays
6 : Delays
7 : Good Service
A : Good Service
C : Good Service
E : Planned Work
G : Delays
B : Good Service
D : Good Service
F : Good Service
M : Good Service
J : Delays
Z : Delays
L : Good Service
N : Planned Work
Q : Planned Work
R : Planned Work
S : Good Service


## Task 3

In this exercise, you are asked to perform a task similar to Task 1 of Homework 1: extract the birth year of the first 'Subscriber' ride of each day from *citibike.csv*. However, instead of iterating through the stream using generators, you are asked to complete the task using the higher order functions map(), filter() and/or reduce(). You are free to define additional functions to be used in your higher order functions; however, these functions may not use global variables unless they are passed in as arguments.

In [36]:
def reducer(finalList, row): # (__output__, __input__) 
    if row['usertype'] == 'Subscriber': # only consider Subscriber rides
        day = row['starttime'].split(' ')[0] # keep the date part of the start time, e.g. 2015-02-01
        
        if len(finalList) == 0 or finalList[-1][0] < day: # list is empty, or the last recorded day is earlier than this day
            finalList.append((day, row['birth_year'])) # append a (day, birth_year) tuple
    return finalList
    
with open('citibike.csv','r') as fi:
    reader = csv.DictReader(fi)
    first_birth_years = map(lambda x: x[1], reduce(reducer, reader, [])) # reduce( __function__ , __iterable__ , __initializer__ )
    
first_birth_years

['1978', '1992', '1982', '1969', '1971', '1989', '1963']
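The same reduction can be tried without the CSV file. The sketch below is a minimal illustration that assumes rows arrive in chronological order; the sample rows, including the exact `starttime` format, are hypothetical stand-ins for what `csv.DictReader` would yield, and `first_per_day` is a hypothetical variant that returns a new list instead of appending in place.

```python
from functools import reduce

# Hypothetical rows shaped like csv.DictReader output (values are made up).
rows = [
    {'usertype': 'Customer',   'starttime': '2015-02-01 00:00:00', 'birth_year': ''},
    {'usertype': 'Subscriber', 'starttime': '2015-02-01 00:05:00', 'birth_year': '1978'},
    {'usertype': 'Subscriber', 'starttime': '2015-02-01 09:00:00', 'birth_year': '1990'},
    {'usertype': 'Subscriber', 'starttime': '2015-02-02 00:02:00', 'birth_year': '1992'},
]

def first_per_day(acc, row):
    """Keep a (day, birth_year) pair for the first Subscriber ride of each day."""
    if row['usertype'] == 'Subscriber':
        day = row['starttime'].split(' ')[0]
        # A new day starts when acc is empty or its last day is earlier.
        if not acc or acc[-1][0] < day:
            return acc + [(day, row['birth_year'])]
    return acc

years = list(map(lambda pair: pair[1], reduce(first_per_day, rows, [])))
print(years)
```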


## Example for the use of sorted()

We would like to write an HOF expression to count the total number of trip activities involving each station. For example, if a rider starts a trip at station A and ends at station B, stations A and B each receive +1 count for the trip. The output must be tuples, each consisting of a station name and a total count. A portion of the expected output is included below.

* **NOTE:** a suggested solution is given below to demonstrate the use of **sorted()**

In [37]:
def mapper1(row):
    return (row['start_station_name'], row['end_station_name'])

def reducer1(counts, pair): # (__output__, __input__): counts starts as the initializer, pair is the next mapped row
    for p in pair:
        counts[p] = counts.get(p, 0)+1
    return counts

with open('citibike.csv', 'r') as fi:
    reader = csv.DictReader(fi)
    output1 = sorted(reduce(reducer1, map(mapper1, reader), {} ).items()) # {} is the initializer: counts starts out as an empty dict

output1[:10]

#The output that should be obtained
# [('1 Ave & E 15 St', 795),
#  ('1 Ave & E 44 St', 219),
#  ('10 Ave & W 28 St', 422),
#  ('11 Ave & W 27 St', 354),
#  ('11 Ave & W 41 St', 461),
#  ('11 Ave & W 59 St', 242),
#  ('12 Ave & W 40 St', 217),
#  ('2 Ave & E 31 St', 588),
#  ('2 Ave & E 58 St', 125),
#  ('3 Ave & Schermerhorn St', 34)]

[('1 Ave & E 15 St', 795),
 ('1 Ave & E 44 St', 219),
 ('10 Ave & W 28 St', 422),
 ('11 Ave & W 27 St', 354),
 ('11 Ave & W 41 St', 461),
 ('11 Ave & W 59 St', 242),
 ('12 Ave & W 40 St', 217),
 ('2 Ave & E 31 St', 588),
 ('2 Ave & E 58 St', 125),
 ('3 Ave & Schermerhorn St', 34)]
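By default `sorted()` orders the `(station, count)` tuples by station name, which is why the output above is alphabetical. As a hedged aside, the sketch below shows the `key` parameter ordering the same kind of pairs by count instead; the station names and counts are hypothetical.

```python
# Hypothetical counts, shaped like the dict built by reducer1.
counts = {'A St': 3, 'B St': 7, 'C St': 3}

# Default ordering: tuples compare element by element, so names decide.
by_name = sorted(counts.items())

# key= orders by descending count, with the name breaking ties.
by_count = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(by_name)
print(by_count)
```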

## Task 4

Next, we would like to do the same task as the example, but only keep the stations with more than 1000 trips involved. Please add your HOF expression below.

In [41]:
def mapper1(row):
    return (row['start_station_name'], row['end_station_name'])

def reducer1(counts, pair):
    for p in pair:
        counts[p] = counts.get(p, 0)+1
    return counts

with open('citibike.csv', 'r') as fi:
    reader = csv.DictReader(fi)
    output2 = filter( lambda x: x[1]>1000 ,sorted(reduce(reducer1, map(mapper1, reader), {} ).items())) 
    # {} is the initializer: counts starts out as an empty dict
    
output2[:10]

[('8 Ave & W 31 St', 1065),
 ('E 43 St & Vanderbilt Ave', 1003),
 ('Lafayette St & E 8 St', 1013),
 ('W 21 St & 6 Ave', 1057),
 ('W 41 St & 8 Ave', 1095)]

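#### Quick example of adding an initializer to the reduce function

list1 = [ ( 'key1', 0 ), ( 'key2', 1 ), ( 'key3', 2 ), ( 'key4', 3 ), ( 'key5', 4 )]

def reduceExample( counts, pair):
    print type(counts)  # the accumulator: the initializer {} on the first call
    print type(pair)    # the current element of list1
    return counts       # the returned value becomes counts on the next call
    
reduce(reduceExample, list1, {})  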


## Task 5

We would like to count the number of trips taken between pairs of stations. Trips taken from station A to station B or from station B to station A are both counted towards the station pair A and B. *Please note that the station pair should be identified by station names, as a tuple, and **in lexical order**, i.e. **(A,B)** instead of ~~(B,A)~~ in this case*. The output must be tuples, each consisting of the station pair identification and a count. A portion of the expected output is included below. Please provide your HOF expression.

In [5]:
def mapper2(row):
    return (row['start_station_name'], row['end_station_name'])

def reducer2(counts, pair):
    oppositePair = pair[1], pair[0]
    
    if pair[0]<oppositePair[0]:
        counts[pair] = counts.get(pair, 0)+1
    else:
        counts[oppositePair] = counts.get(oppositePair, 0)+1
    
    return counts 
    
with open('citibike.csv', 'r') as fi:
    reader = csv.DictReader(fi)
    output3 = sorted(reduce(reducer2 , map(mapper2, reader) , {}).items())
    
output3[:10]

#The output that should be obtained
#[(('1 Ave & E 15 St', '1 Ave & E 15 St'), 5),
#  (('1 Ave & E 15 St', '1 Ave & E 44 St'), 6),
#  (('1 Ave & E 15 St', '11 Ave & W 27 St'), 1),
#  (('1 Ave & E 15 St', '2 Ave & E 31 St'), 9),
#  (('1 Ave & E 15 St', '5 Ave & E 29 St'), 2),
#  (('1 Ave & E 15 St', '6 Ave & Broome St'), 3),
#  (('1 Ave & E 15 St', '6 Ave & Canal St'), 1),
#  (('1 Ave & E 15 St', '8 Ave & W 31 St'), 5),
#  (('1 Ave & E 15 St', '9 Ave & W 14 St'), 3),
#  (('1 Ave & E 15 St', '9 Ave & W 16 St'), 3)] 

[(('1 Ave & E 15 St', '1 Ave & E 15 St'), 5),
 (('1 Ave & E 15 St', '1 Ave & E 44 St'), 6),
 (('1 Ave & E 15 St', '11 Ave & W 27 St'), 1),
 (('1 Ave & E 15 St', '2 Ave & E 31 St'), 9),
 (('1 Ave & E 15 St', '5 Ave & E 29 St'), 2),
 (('1 Ave & E 15 St', '6 Ave & Broome St'), 3),
 (('1 Ave & E 15 St', '6 Ave & Canal St'), 1),
 (('1 Ave & E 15 St', '8 Ave & W 31 St'), 5),
 (('1 Ave & E 15 St', '9 Ave & W 14 St'), 3),
 (('1 Ave & E 15 St', '9 Ave & W 16 St'), 3)]
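The lexical-order rule can also be expressed by sorting each pair before using it as a dictionary key. The sketch below is one possible alternative to the comparison in `reducer2`, shown on hypothetical station names; `count_pair` and `trips` are illustrative names only.

```python
from functools import reduce

# Hypothetical (start, end) pairs, as mapper2 would produce them.
trips = [('B', 'A'), ('A', 'B'), ('A', 'A'), ('C', 'A')]

def count_pair(counts, trip):
    """Count a trip under its station pair, normalized to lexical order."""
    key = tuple(sorted(trip))  # ('B', 'A') and ('A', 'B') both become ('A', 'B')
    counts[key] = counts.get(key, 0) + 1
    return counts

pair_counts = sorted(reduce(count_pair, trips, {}).items())
print(pair_counts)
```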

## Task 6

Next, we would like to further process the output from Task 5 to determine the station popularity among all of the station pairs that have 35 or more trips. The popularity of a station is the number of times it appears in that list. In other words, we first filter the station pairs to only those that have 35 or more trips. Then, among these pairs, we count how many times each station appears and report back the counts. The output will be tuples, each consisting of the station name and a count. The expected output is included below. As illustrated, the *W 41 St & 8 Ave* station is the most "popular" with 4 appearances. Please provide your HOF expression below. You can use output3 from the previous task.

In [7]:
def reducer2(counts, pair):
    location1 = pair[0][0]
    location2 = pair[0][1]
    counts[location1] = counts.get(location1, 0)+1
    counts[location2] = counts.get(location2, 0)+1
    return counts

# Keep pairs with 35 or more trips (>=, not >), then turn the dict into a sorted list of tuples
output4 = sorted(reduce(reducer2 , filter(lambda x: x[1]>=35 , output3),{}).items())

output4

#The output that should be obtained
# [('10 Ave & W 28 St', 1),
#  ('11 Ave & W 27 St', 2),
#  ('11 Ave & W 41 St', 1),
#  ('8 Ave & W 31 St', 3),
#  ('8 Ave & W 33 St', 1),
#  ('9 Ave & W 22 St', 1),
#  ('Adelphi St & Myrtle Ave', 1),
#  ('DeKalb Ave & Hudson Ave', 1),
#  ('E 10 St & Avenue A', 1),
#  ('E 24 St & Park Ave S', 2),
#  ('E 27 St & 1 Ave', 1),
#  ('E 32 St & Park Ave', 1),
#  ('E 33 St & 2 Ave', 2),
#  ('E 43 St & Vanderbilt Ave', 2),
#  ('E 47 St & Park Ave', 1),
#  ('E 6 St & Avenue B', 1),
#  ('E 7 St & Avenue A', 1),
#  ('Lafayette St & E 8 St', 3),
#  ('Pershing Square North', 1),
#  ('Pershing Square South', 2),
#  ('Vesey Pl & River Terrace', 1),
#  ('W 17 St & 8 Ave', 1),
#  ('W 20 St & 11 Ave', 2),
#  ('W 21 St & 6 Ave', 1),
#  ('W 26 St & 8 Ave', 1),
#  ('W 31 St & 7 Ave', 2),
#  ('W 33 St & 7 Ave', 2),
#  ('W 41 St & 8 Ave', 4),
#  ('West Thames St', 1)]

[('10 Ave & W 28 St', 1),
 ('11 Ave & W 27 St', 2),
 ('11 Ave & W 41 St', 1),
 ('8 Ave & W 31 St', 3),
 ('8 Ave & W 33 St', 1),
 ('9 Ave & W 22 St', 1),
 ('Adelphi St & Myrtle Ave', 1),
 ('DeKalb Ave & Hudson Ave', 1),
 ('E 10 St & Avenue A', 1),
 ('E 24 St & Park Ave S', 2),
 ('E 27 St & 1 Ave', 1),
 ('E 32 St & Park Ave', 1),
 ('E 33 St & 2 Ave', 2),
 ('E 43 St & Vanderbilt Ave', 2),
 ('E 47 St & Park Ave', 1),
 ('E 6 St & Avenue B', 1),
 ('E 7 St & Avenue A', 1),
 ('Lafayette St & E 8 St', 3),
 ('Pershing Square North', 1),
 ('Pershing Square South', 2),
 ('Vesey Pl & River Terrace', 1),
 ('W 17 St & 8 Ave', 1),
 ('W 20 St & 11 Ave', 2),
 ('W 21 St & 6 Ave', 1),
 ('W 26 St & 8 Ave', 1),
 ('W 31 St & 7 Ave', 2),
 ('W 33 St & 7 Ave', 2),
 ('W 41 St & 8 Ave', 4),
 ('West Thames St', 1)]
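The filter-then-count chain for this task can be checked on a toy input. The sketch below uses hypothetical pair counts shaped like `output3` and keeps pairs with 35 or more trips, so a pair with exactly 35 trips is included; `count_stations` and `pair_counts` are illustrative names.

```python
from functools import reduce

# Hypothetical ((station, station), trips) items, shaped like output3.
pair_counts = [
    (('A', 'B'), 40),
    (('A', 'C'), 35),
    (('B', 'C'), 10),
    (('C', 'D'), 50),
]

def count_stations(counts, item):
    """Add one appearance for each station named in the pair."""
    for station in item[0]:
        counts[station] = counts.get(station, 0) + 1
    return counts

popular = sorted(
    reduce(count_stations,
           filter(lambda item: item[1] >= 35, pair_counts),
           {}).items())
print(popular)
```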

## Task 7

In this task, you are asked to find, for each gender of *'Subscriber'* users, the station where the most rides started. In other words, which station had the highest number of bike pickups for female riders, for male riders and for riders of unknown gender.

The output will be a list of tuples, each including a gender label (as indicated below) and another tuple consisting of a station name and the total number of trips started at that station for that gender. The expected output is included below. Please provide your HOF expression below.

The label mapping for the gender column in citibike.csv is: (Zero=**Unknown**; 1=**Male**; 2=**Female**)

In [43]:
def mapper3(row):
    if row['usertype']=='Subscriber':
        return (row['start_station_name'], row['gender'])
    
def reducer3(counts, pair):
    counts[pair] = counts.get(pair, 0)+1
    return counts

def reducer4(counts, pair):
    # pair looks like ((location, gender), values)
    gender = pair[0][1]
    location = pair[0][0]
    values = pair[1]
    newPair = (location, values)
    labels = {'0': 'Unknown', '1': 'Male', '2': 'Female'}
    label = labels.get(gender)

    # Keep the station with the highest trip count seen so far for this gender
    if label is not None and (counts[label] is None or values > counts[label][1]):
        counts[label] = newPair

    # Always return the accumulator, even for an unexpected gender code
    return counts


with open('citibike.csv', 'r') as fi:
    reader = csv.DictReader(fi)
    output5_1 = map(mapper3, reader) # (start station, gender) for every Subscriber row, None for other rows
    output5_2 = filter(lambda x: x is not None, output5_1) # drop the None entries
    output5_3 = reduce(reducer3, output5_2, {}) # builds a dictionary { (location, gender) : trip count }
    output5_4 = sorted(output5_3.items()) # sorted() turns the dict items into a sorted list
    output5_5 = reduce(reducer4, output5_4, {'Unknown':None, 'Female':None, 'Male':None})
    output5_6 = sorted(output5_5.items()) # final sorted() turns the result into a list ordered by label
    
output5_6

#The output that should be obtained
# [('Female', ('W 21 St & 6 Ave', 107)),
#  ('Male', ('8 Ave & W 31 St', 488)),
#  ('Unknown', ('Catherine St & Monroe St', 1))]

[('Female', ('W 21 St & 6 Ave', 107)),
 ('Male', ('8 Ave & W 31 St', 488)),
 ('Unknown', ('Catherine St & Monroe St', 1))]