# Lab 3

In this lab, we will practice Python's higher-order functions, in particular map(), filter(), and reduce().
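As a quick refresher, here is a minimal sketch of the three functions on a toy list. The list `numbers` and the variable names are illustrative only and are not part of the lab data.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]  # toy data, not part of the lab

# map() applies a function to every element.
squares = list(map(lambda n: n * n, numbers))

# filter() keeps only the elements for which the function returns True.
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce() folds the sequence into a single value, left to right.
total = reduce(lambda acc, n: acc + n, numbers, 0)

print(squares)  # [1, 4, 9, 16, 25]
print(evens)    # [2, 4]
print(total)    # 15
```

In Python 3, map() and filter() return lazy iterators, which is why the results are wrapped in list() before printing.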


## Task 1

You are provided a list of service status updates scraped from an MTA information website. Each update may indicate <i>Good Service</i>, <i>Planned Work</i>, or <i>Delays</i> for one or more subway lines. Our first objective is to list all the lines that are running with <i>Delays</i>. To guide you through the process, we split the problem into smaller tasks.

In [16]:
from functools import reduce
import csv

In [17]:
# This is your input data: a list of subway line status updates.
# Each entry is a string in the format '<lines> : <status>'.

status = [
    '1,2,3 : Good Service',
    '4,5,6 : Delays',
    '7 : Good Service',
    'A,C : Good Service',
    'E : Planned Work',
    'G : Delays',
    'B,D,F,M : Good Service',
    'J,Z : Delays',
    'L : Good Service',
    'N,Q,R : Planned Work',
    'S : Good Service',
]

### Sub-Task 1

Please complete the lambda expression to filter only the status updates for the lines that run with <i>Delays</i>.

In [18]:
delayUpdates = list(filter(lambda x: 'Delays' in x,status))
delayUpdates

['4,5,6 : Delays', 'G : Delays', 'J,Z : Delays']

### Sub-Task 2

Please complete the lambda expression below to convert each status line into a list of subway lines, i.e. <b><i>'4,5,6 : Delays'</i></b> would become <b><i>['4','5','6']</i></b>

In [19]:
delayLineList = list(map(lambda x: list(map(str.strip, x.split(':')[0].split(','))), delayUpdates))
delayLineList

[['4', '5', '6'], ['G'], ['J', 'Z']]

### Sub-Task 3

Please complete the reduce command below to combine the list of subway line lists given in <i>delayLineList</i> into a single list of subway lines running with delays.

In [20]:
delayLines = reduce(lambda x,y: x+y,delayLineList)
delayLines

['4', '5', '6', 'G', 'J', 'Z']

### Sub-Task 4

Please complete the reduce command below to count the number of lines in <b>delayLines</b>.

In [21]:
sum_delayLines = reduce(lambda x,y: x+1,delayLines,0)
sum_delayLines

6
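The four sub-tasks can also be chained into a single function. The sketch below is one possible arrangement, not the required solution. The helper name `delayed_lines` and the short `sample` list are hypothetical. It compares the status field exactly rather than searching the whole string, and strips the whitespace around the separator so line names come out clean.

```python
from functools import reduce

def delayed_lines(updates):
    """Return the subway lines whose update reports Delays (hypothetical helper)."""
    # Split each update into (lines, state), stripping the spaces around ':'.
    parsed = map(lambda u: tuple(p.strip() for p in u.split(':')), updates)
    # Compare the state field exactly instead of searching the whole string.
    delayed = filter(lambda pair: pair[1] == 'Delays', parsed)
    # Concatenate the per-update line lists into one flat list.
    return reduce(lambda acc, pair: acc + pair[0].split(','), delayed, [])

sample = ['4,5,6 : Delays', '7 : Good Service', 'J,Z : Delays']
print(delayed_lines(sample))  # ['4', '5', '6', 'J', 'Z']
```

Passing `[]` as the initial value to reduce() means the function returns an empty list, rather than raising a TypeError, when no line is delayed.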

## Task 2

In this exercise, we would like to expand the combined service updates into separate updates for each subway line. For example, instead of having a single line <b>'1,2,3 : Good Service'</b> to indicate that lines 1, 2, and 3 are in good service, we would like to convert that into 3 separate updates: <b>'1 : Good Service'</b>, <b>'2 : Good Service'</b>, and <b>'3 : Good Service'</b>.

You are tasked with writing a chain of map(), filter(), and/or reduce() calls that converts the <b>status</b> variable into such a list of per-line updates.

Please note that you may only use higher order functions without access to global variables. Your expression should contain only map(), filter() and/or reduce() and your custom function definitions.

In [22]:
status[:2]


['1,2,3 : Good Service', '4,5,6 : Delays']

In [23]:
def map1(x):
    # Split 'lines : state' and strip the spaces around the separator.
    lines, state = map(str.strip, x.split(':'))
    # Emit one 'line : state' update per subway line.
    return list(map(lambda line: line + ' : ' + state, lines.split(',')))
def reduce1(x,y):
    # Concatenate two lists of updates.
    return x+y
updates = reduce(reduce1,map(map1,status))
updates

['1 : Good Service',
 '2 : Good Service',
 '3 : Good Service',
 '4 : Delays',
 '5 : Delays',
 '6 : Delays',
 '7 : Good Service',
 'A : Good Service',
 'C : Good Service',
 'E : Planned Work',
 'G : Delays',
 'B : Good Service',
 'D : Good Service',
 'F : Good Service',
 'M : Good Service',
 'J : Delays',
 'Z : Delays',
 'L : Good Service',
 'N : Planned Work',
 'Q : Planned Work',
 'R : Planned Work',
 'S : Good Service']
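The map-then-concatenate pattern used for this task is often called a "flat map". As a sketch, it can be wrapped in a small reusable helper. The names `flat_map` and `expand` are hypothetical and chosen only for this illustration.

```python
from functools import reduce

def flat_map(fn, items):
    """Apply fn to each item and concatenate the resulting lists."""
    return reduce(lambda acc, part: acc + part, map(fn, items), [])

def expand(update):
    # 'A,C : Good Service' -> ['A : Good Service', 'C : Good Service']
    lines, state = (part.strip() for part in update.split(':'))
    return list(map(lambda line: line + ' : ' + state, lines.split(',')))

print(flat_map(expand, ['A,C : Good Service', 'E : Planned Work']))
# ['A : Good Service', 'C : Good Service', 'E : Planned Work']
```

Neither function reads a global variable: `state` is a local captured by the inner lambda, which stays within the rule stated for this task.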

## Task 3



We would like to write an HOF expression to count the total number of trip activities involving each station. For example, if a rider starts a trip at station A and ends at station B, stations A and B each receive +1 for the trip. The output must be tuples, each consisting of a station name and a total count. A portion of the expected output is included below.

In [24]:
def mapper2(x):
    return(x['start_station_name'],x['end_station_name'])
def reducer2(x,y):
    start,end = y
    x[start] = x.get(start,0)+1
    x[end] = x.get(end,0)+1
    return x
with open('citibike.csv','r') as fi:
    reader = csv.DictReader(fi)
    output = list(reduce(reducer2,map(mapper2,reader),{}).items())
output[:10]

[('8 Ave & W 31 St', 1065),
 ('W 54 St & 9 Ave', 134),
 ('E 17 St & Broadway', 943),
 ('1 Ave & E 15 St', 795),
 ('Grand Army Plaza & Central Park S', 291),
 ('Barrow St & Hudson St', 426),
 ('6 Ave & Broome St', 227),
 ('6 Ave & W 33 St', 517),
 ('Lawrence St & Willoughby St', 128),
 ('Atlantic Ave & Fort Greene Pl', 70)]
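To check the counting logic without the full citibike.csv, the same idea can be run on a tiny in-memory CSV. The rows in `SAMPLE` are made up for illustration, and `count_both_ends` is a hypothetical reducer that works on the raw rows directly.

```python
import csv
import io
from functools import reduce

# Made-up rows that mimic the two columns the task uses.
SAMPLE = """start_station_name,end_station_name
A,B
B,C
A,A
"""

def count_both_ends(counts, row):
    # Each trip adds one to its start station and one to its end station.
    for key in ('start_station_name', 'end_station_name'):
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts

reader = csv.DictReader(io.StringIO(SAMPLE))
station_counts = sorted(reduce(count_both_ends, reader, {}).items())
print(station_counts)  # [('A', 3), ('B', 2), ('C', 1)]
```

Note that a round trip such as A to A contributes 2 to station A under this counting rule, which matches the "+1 for each end" description above.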

## Task 4

Next, we would like to do the same task as Task 3, but only keep the stations with more than 1000 trips involved. Please add your HOF expression below.

In [25]:
output4 = list(filter(lambda x : x[1]>1000,output))
output4

[('8 Ave & W 31 St', 1065),
 ('W 41 St & 8 Ave', 1095),
 ('Lafayette St & E 8 St', 1013),
 ('W 21 St & 6 Ave', 1057),
 ('E 43 St & Vanderbilt Ave', 1003)]

## Task 5

We would like to count the number of trips taken between pairs of stations. Trips taken from station A to station B or from station B to station A both count towards the station pair A and B. *Please note that the station pair should be identified by station names, as a tuple, and **in lexical order**, i.e. **(A,B)** instead of ~~(B,A)~~ in this case*. The output must be tuples, each consisting of the station pair identification and a count. A portion of the expected output is included below. Please provide your HOF expression.

In [26]:
def mapper5(x):
    sorted_list= sorted((x['start_station_name'],x['end_station_name']))
    return tuple(sorted_list)
def reducer5(x,y):
    x[y] = x.get(y,0)+1
    return x
with open('citibike.csv') as fi:
    reader = csv.DictReader(fi)
    output5 = reduce(reducer5,map(mapper5,reader),{})
sorted(output5.items())[:10]

[(('1 Ave & E 15 St', '1 Ave & E 15 St'), 5),
 (('1 Ave & E 15 St', '1 Ave & E 44 St'), 6),
 (('1 Ave & E 15 St', '11 Ave & W 27 St'), 1),
 (('1 Ave & E 15 St', '2 Ave & E 31 St'), 9),
 (('1 Ave & E 15 St', '5 Ave & E 29 St'), 2),
 (('1 Ave & E 15 St', '6 Ave & Broome St'), 3),
 (('1 Ave & E 15 St', '6 Ave & Canal St'), 1),
 (('1 Ave & E 15 St', '8 Ave & W 31 St'), 5),
 (('1 Ave & E 15 St', '9 Ave & W 14 St'), 3),
 (('1 Ave & E 15 St', '9 Ave & W 16 St'), 3)]
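The key step is normalizing each trip to a sorted tuple, so both directions share one dictionary key. Here is a minimal sketch on made-up trips. The names `trips`, `to_pair`, and `count_pair` are hypothetical.

```python
from functools import reduce

# Made-up (start, end) trips; station names are placeholders.
trips = [('B', 'A'), ('A', 'B'), ('C', 'A'), ('B', 'B')]

def to_pair(trip):
    # Sorting makes ('B', 'A') and ('A', 'B') map to the same key.
    return tuple(sorted(trip))

def count_pair(counts, pair):
    counts[pair] = counts.get(pair, 0) + 1
    return counts

pair_counts = reduce(count_pair, map(to_pair, trips), {})
print(sorted(pair_counts.items()))
# [(('A', 'B'), 2), (('A', 'C'), 1), (('B', 'B'), 1)]
```

The result of sorted() is converted to a tuple because lists are not hashable and so cannot be used as dictionary keys.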

## Task 6

Next, we would like to further process the output from Task 5 to determine station popularity among all of the station pairs that have 35 or more trips. The popularity of a station is calculated by how many times it appears on the list. In other words, we would like to first filter the station pairs to only those that have 35 or more trips. Then, among these pairs, we count how many times each station appears and report back the counts. The output will be tuples, each consisting of the station name and a count. The expected output is included below. As illustrated, *W 41 St & 8 Ave* station is the most "popular" with 4 appearances. Please provide your HOF expression below. You can use the output of Task 5 (<b>output5</b> in the code below).

In [28]:
def reducer6(x,y):
    first,second =y[0]
    x[first]=x.get(first,0)+1
    x[second]=x.get(second,0)+1
    return x
# Keep station pairs with 35 or more trips (inclusive, as the task states).
output6 = list(filter(lambda x:x[1]>=35,output5.items()))
output6 = reduce(reducer6,output6,{})
sorted(output6.items(),key = lambda x:-x[1])

[('Lafayette St & E 8 St', 3),
 ('8 Ave & W 31 St', 3),
 ('W 41 St & 8 Ave', 3),
 ('E 33 St & 2 Ave', 2),
 ('W 31 St & 7 Ave', 2),
 ('11 Ave & W 27 St', 2),
 ('W 20 St & 11 Ave', 2),
 ('W 33 St & 7 Ave', 2),
 ('E 24 St & Park Ave S', 2),
 ('E 43 St & Vanderbilt Ave', 2),
 ('E 10 St & Avenue A', 1),
 ('E 6 St & Avenue B', 1),
 ('10 Ave & W 28 St', 1),
 ('E 32 St & Park Ave', 1),
 ('E 27 St & 1 Ave', 1),
 ('W 26 St & 8 Ave', 1),
 ('W 17 St & 8 Ave', 1),
 ('9 Ave & W 22 St', 1),
 ('W 21 St & 6 Ave', 1),
 ('11 Ave & W 41 St', 1),
 ('8 Ave & W 33 St', 1),
 ('E 7 St & Avenue A', 1),
 ('Pershing Square South', 1),
 ('Vesey Pl & River Terrace', 1),
 ('West Thames St', 1),
 ('Pershing Square North', 1),
 ('Adelphi St & Myrtle Ave', 1),
 ('DeKalb Ave & Hudson Ave', 1),
 ('E 47 St & Park Ave', 1)]
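The boundary of the filter matters here, since the task asks for pairs with 35 or more trips. The following sketch runs the filter-then-count idea on a made-up `pair_counts` dictionary. All names and numbers are hypothetical.

```python
from functools import reduce

# Made-up pair counts in the same shape as the Task 5 result.
pair_counts = {('A', 'B'): 40, ('A', 'C'): 35, ('B', 'C'): 34, ('D', 'D'): 50}

def count_stations(counts, item):
    (first, second), _ = item
    counts[first] = counts.get(first, 0) + 1
    counts[second] = counts.get(second, 0) + 1
    return counts

# '>=' keeps a pair with exactly 35 trips; '>' would drop it.
busy_pairs = filter(lambda item: item[1] >= 35, pair_counts.items())
popularity = reduce(count_stations, busy_pairs, {})
print(sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0])))
# [('A', 2), ('D', 2), ('B', 1), ('C', 1)]
```

A pair whose two stations are the same, such as ('D', 'D'), adds 2 to that station under this scheme. Sorting by `(-count, name)` makes the order of tied stations deterministic.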

## Task 7

In this task, you are asked to find, for each gender among *'Subscriber'* users, the station where the most trips started. In other words, which station had the highest number of bike pickups for female riders, for male riders, and for riders of unknown gender?

The output will be a list of tuples, each of which includes a gender label (as indicated below) and another tuple consisting of a station name and the total number of trips started at that station for that gender. The expected output is included below. Please provide your HOF expression below.

The label mapping for the gender column in citibike.csv is: (0=**Unknown**; 1=**Male**; 2=**Female**)

In [44]:
def mapper7(x):
    return (x['start_station_name'],x['gender'])
def reducer7(x,station_gender):
    x[station_gender]=x.get(station_gender,0)+1
    return x
def mapper7_1(x):
    station,gender = x[0]
    count = x[1]
    label = ['Unknown','Male','Female']
    return(label[int(gender)],(station,count))
def reducer7_1(x,y):
    new_gender , new_station_count = y
    # Keep the stored station only if its count is strictly higher;
    # otherwise (new gender, or new count >= old count) store the new one.
    if new_gender not in x or x[new_gender][1] <= new_station_count[1]:
        x[new_gender] = new_station_count
    return x

with open('citibike.csv') as fi:
    reader = csv.DictReader(fi)
    output7 = list(filter(lambda x:x['usertype']=='Subscriber',reader))
    output7 = reduce(reducer7,map(mapper7,output7),{})
    output7 = list(map(mapper7_1,output7.items()))
    output7 = reduce(reducer7_1,output7,{})
sorted(output7.items())

[('Female', ('W 21 St & 6 Ave', 107)),
 ('Male', ('8 Ave & W 31 St', 488)),
 ('Unknown', ('Stanton St & Mangin St', 1))]
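The "best station per gender" step is a reduce that keeps a running maximum per key. Here is a minimal sketch on made-up counts. The names `LABELS`, `counts`, and `keep_best` are hypothetical, and the input is assumed to already be in the shape `((station, gender_code), trip_count)`.

```python
from functools import reduce

LABELS = ['Unknown', 'Male', 'Female']  # index matches the gender code

# Made-up ((station, gender_code), trip_count) items.
counts = {('X', '1'): 5, ('Y', '1'): 9, ('X', '2'): 4, ('Z', '0'): 1}

def keep_best(best, item):
    (station, gender), n = item
    label = LABELS[int(gender)]
    # Replace the stored station only when the new count is strictly larger.
    if label not in best or n > best[label][1]:
        best[label] = (station, n)
    return best

print(sorted(reduce(keep_best, counts.items(), {}).items()))
# [('Female', ('X', 4)), ('Male', ('Y', 9)), ('Unknown', ('Z', 1))]
```

With the strict `>` comparison, ties keep the first station encountered. The lab's `reducer7_1` keeps the last one instead, so either choice is defensible when the task does not specify tie-breaking.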

## Task 8

In this exercise, you are asked to perform a task similar to the one in Lab 2: extracting the birth year of the first 'Subscriber' ride of each day from *citibike.csv*. However, instead of iterating through the stream using generators, you are asked to complete the task using the higher-order functions map(), filter(), and/or reduce(). You are free to define additional functions to be used in your higher-order functions. However, these functions may not use global variables unless they are passed in as arguments.

In [56]:
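def mapper8(x):
    # starttime holds a date and a time separated by a space; keep the date.
    date, _ = x['starttime'].split(' ')
    return date , x['birth_year']
def reducer8(x,y):
    date , birth_year = y
    # Record only the first birth year seen for each date.
    if date not in x:
        x[date] = birth_year
    return x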
with open('citibike.csv','r') as fi:
    reader = csv.DictReader(fi)
    output8 = filter(lambda x: x['usertype']=='Subscriber',reader)
    output8 = list(map(mapper8,output8))
    output8 = reduce(reducer8,output8,{})
list(output8.values())

['1978', '1992', '1982', '1969', '1971', '1989', '1963']
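The "first ride of the day" logic can be tried on a few in-memory rows. This is only a sketch: the rows are made up, the `'YYYY-MM-DD HH:MM:SS'` timestamp layout is an assumption about the file, and `first_per_day` is a hypothetical reducer. It also assumes the rows arrive in chronological order.

```python
from functools import reduce

# Made-up rows; the 'YYYY-MM-DD HH:MM:SS' timestamp layout is an assumption.
rows = [
    {'starttime': '2015-02-01 00:05:00', 'usertype': 'Customer', 'birth_year': ''},
    {'starttime': '2015-02-01 00:07:00', 'usertype': 'Subscriber', 'birth_year': '1978'},
    {'starttime': '2015-02-01 09:30:00', 'usertype': 'Subscriber', 'birth_year': '1990'},
    {'starttime': '2015-02-02 00:01:00', 'usertype': 'Subscriber', 'birth_year': '1992'},
]

def first_per_day(seen, row):
    day = row['starttime'].split(' ')[0]
    # setdefault stores the value only if the day is not already present.
    seen.setdefault(day, row['birth_year'])
    return seen

subscribers = filter(lambda r: r['usertype'] == 'Subscriber', rows)
print(reduce(first_per_day, subscribers, {}))
# {'2015-02-01': '1978', '2015-02-02': '1992'}
```

Because dictionaries preserve insertion order in Python 3.7 and later, the values come out in the order the days were first seen.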