# Map, reduce, filter

## Introduction



Mapping, reducing, and filtering are important concepts in functional programming. In this lesson we will build on the functional programming concepts from previous lessons and apply them using map(), filter(), and reduce().

## Mapping



The map() function applies a function to every element of an iterable (like a list or a set). It takes:

* a function;
* one or more iterables (e.g. a list), one for each argument the function expects.

and returns an iterator that yields the result of applying the function to every element.
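
map() can also be given several iterables at once, in which case the function receives one element from each of them per call. A minimal sketch of this behaviour (the add() helper and the sample lists are made up for illustration):

```python
# map() pairs up elements from each iterable and passes them to the
# function as separate positional arguments.
def add(x, y):
    return x + y

pairwise_sums = list(map(add, [1, 2, 3], [10, 20, 30]))
print(pairwise_sums)  # [11, 22, 33]

# With iterables of different lengths, map() stops at the shortest one.
truncated = list(map(add, [1, 2, 3], [10, 20]))
print(truncated)  # [11, 22]
```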

For example, let's create a function that divides a number by 2 and returns the result.

In [1]:
# write a function that takes x as an input and returns half
def half(x):
    return x/2

In [2]:
half(6)

3.0

In [3]:
half(62.2)

31.1

In [4]:
half([6.3, 5.2])

TypeError: unsupported operand type(s) for /: 'list' and 'int'

As the error shows, half() cannot operate on a list directly. However, we can use the map() function to apply it to each element of the list in turn.

In [6]:
map(half, [6.3, 5.2])

list(map(half, [6.3, 5.2]))

[3.15, 2.6]

The map() function creates a map object, which is an iterator. To view the results, we can convert it into a list.

In [11]:
# use map to apply half() to each number in the list; this creates a map object
lst = [10, 12, 34, 23]


var = map(half, lst)
print(list(var))
print(var)

#for item in lst:
#    print(item /2)

[5.0, 6.0, 17.0, 11.5]
<map object at 0x113313110>


Similarly, we can convert the map object into a set.

In [9]:
# convert the map object to a set
set(map(half,lst))

{5.0, 6.0, 11.5, 17.0}
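
One detail worth keeping in mind about map objects: they can only be consumed once. After the values have been read (for example by list()), the object is exhausted. A small sketch, reusing half() with an illustrative variable name:

```python
def half(x):
    return x / 2

halves = map(half, [10, 12, 34, 23])

first_pass = list(halves)   # consumes every value of the map object
second_pass = list(halves)  # nothing is left to yield

print(first_pass)   # [5.0, 6.0, 17.0, 11.5]
print(second_pass)  # []
```

If the results are needed more than once, store the list (or set) rather than the map object itself.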

## Filtering



Like the map() function, the filter() function takes a function and a sequence and returns an iterable. The goal of this function is to *remove* elements from our sequence. **Our function should return True for all the elements we want to keep and False for the ones we want to remove.**

For example, we can create a function that returns True if a number is even and False if it is odd. In fact, let's use a lambda expression for this task.

In [12]:
# create a lambda function that returns True for even numbers and False for odd ones
even = lambda x: x % 2 == 0

In [14]:
#even(5)
even(4)

True

Let's now use the filter() function in combination with a list.

In [16]:
# create a filter object and convert it to a list
lst = [10, 12, 34, 23]

list(filter(even, lst))
# even() returns True, True, True, False, so 23 is dropped -> [10, 12, 34]

[10, 12, 34]

In [17]:
# the same filter, with the lambda written inline
lst = [10, 12, 34, 23] 

list(filter(lambda x: x % 2 == 0, lst))

[10, 12, 34]
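
Because filter() returns an iterable, its result can be passed straight to map(). The following sketch combines the half() function and the even lambda from this lesson; the variable name halved_evens is only illustrative:

```python
def half(x):
    return x / 2

even = lambda x: x % 2 == 0

lst = [10, 12, 34, 23]

# filter() keeps the even numbers; map() then halves each of them.
halved_evens = list(map(half, filter(even, lst)))
print(halved_evens)  # [5.0, 6.0, 17.0]
```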

We can also use the filter operation to get rid of words such as 'the', 'and' and 'a'. Removing such 'stop words' is standard practice in fields such as Natural Language Processing.

In [24]:
text = "In the late summer of that year we lived in a and house in a village that looked across the river and the plain to the mountains."

In [25]:
# return True for the words we want to keep, i.e. words that are not stop words
not_stop_word = lambda x: x not in ['the', 'a', 'an', 'and']
list(filter(not_stop_word, text.split(" ")))

['In',
 'late',
 'summer',
 'of',
 'that',
 'year',
 'we',
 'lived',
 'in',
 'house',
 'in',
 'village',
 'that',
 'looked',
 'across',
 'river',
 'plain',
 'to',
 'mountains.']

In [27]:
# One liner version  
list(filter(lambda word: word not in ['the', 'a', 'an', 'and'], text.split(" ")))

['In',
 'late',
 'summer',
 'of',
 'that',
 'year',
 'we',
 'lived',
 'in',
 'house',
 'in',
 'village',
 'that',
 'looked',
 'across',
 'river',
 'plain',
 'to',
 'mountains.']
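
Note that the comparison above is case-sensitive and treats punctuation as part of the word, so 'The' or 'and,' would not be removed. One possible way to handle this, sketched under the assumption that we only want to normalise words for the comparison and keep them unchanged in the output (the sample sentence and the keep() helper are made up for illustration):

```python
import string

text = "The river and the plain led to the mountains."
stop_words = {"the", "a", "an", "and"}  # a set gives fast membership tests

def keep(word):
    # Normalise before comparing: strip surrounding punctuation and lowercase.
    return word.strip(string.punctuation).lower() not in stop_words

kept = list(filter(keep, text.split()))
print(kept)  # ['river', 'plain', 'led', 'to', 'mountains.']
```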

## Reducing



While the map() function applies a function to *each* element in the sequence, sometimes we might want to apply a function that will **aggregate all elements in the sequence** into a single value.


In a sense, the reduce() function is similar to built-in functions like sum() and max().
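
To make the comparison with sum() and max() concrete, here is a small sketch that reproduces both with reduce(). The list and the variable names are illustrative only:

```python
from functools import reduce

numbers = [10, 12, 34, 23]

# Adding pairwise from left to right gives the same result as sum().
total = reduce(lambda a, b: a + b, numbers)

# Keeping the larger of each pair gives the same result as max().
largest = reduce(lambda a, b: a if a > b else b, numbers)

print(total, sum(numbers))    # 79 79
print(largest, max(numbers))  # 34 34
```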


Let's write a lambda expression that will take two elements and sum them.

In [28]:
# create a lambda summation function
summation = lambda a,b: a + b

In [32]:
lst = [10, 12, 34, 23]

summation(lst)

TypeError: <lambda>() missing 1 required positional argument: 'b'

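This will not work because our anonymous function expects two separate arguments, and we passed it a single list. In such cases, we can make use of reduce(), which applies the function cumulatively: first to the first two elements, then to that result and the next element, and so on.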

In [33]:
# import reduce from functools
from functools import reduce

In [51]:
# reduce() returns a single value, not an iterable
lst = [10, 12, 34, 23, 34]

print(reduce(lambda a, b: a + b, lst))       # sum of all five elements
print(reduce(lambda a, b: a + b, set(lst)))  # the set drops the duplicate 34

set(lst)

113
79
{10, 12, 23, 34}
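
To see how reduce() arrives at its result, it can help to look at the intermediate values. The sketch below uses itertools.accumulate, which is not part of this lesson but yields the running results of the same pairwise operation. It also shows the optional third argument of reduce(), an initial value, which is needed when the sequence may be empty:

```python
from functools import reduce
from itertools import accumulate

lst = [10, 12, 34, 23, 34]

# accumulate() yields every intermediate result of the pairwise addition;
# the last value is what reduce() returns.
running = list(accumulate(lst, lambda a, b: a + b))
print(running)  # [10, 22, 56, 79, 113]

# Without an initial value, reduce() raises a TypeError on an empty sequence.
# Passing 0 as the third argument gives it a starting point.
empty_total = reduce(lambda a, b: a + b, [], 0)
print(empty_total)  # 0
```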

## Functional Programming in Pandas



In pandas, we can use the apply() method to apply a function along an axis of a dataframe (by default, to each column).



Here is an example with a generated dataframe.

In [54]:
# Import pandas and numpy and generate df 
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(4,3)), columns=['a', 'b', 'c'])
df

|   | a | b | c |
|---|---|---|---|
| 0 | 7 | 5 | 0 |
| 1 | 5 | 9 | 6 |
| 2 | 8 | 6 | 1 |
| 3 | 7 | 7 | 3 |


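We can take the half() function we defined earlier and use the **apply() method** to halve every value in the dataframe. Strictly speaking, apply() passes each column to half() as a whole; because division works element-wise on a column, every cell ends up halved.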

In [55]:
def half(x):
    return x/2

df.apply(half)

|   | a   | b   | c   |
|---|-----|-----|-----|
| 0 | 3.5 | 2.5 | 0.0 |
| 1 | 2.5 | 4.5 | 3.0 |
| 2 | 4.0 | 3.0 | 0.5 |
| 3 | 3.5 | 3.5 | 1.5 |
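
To check what apply() actually hands to our function, we can record the type of each argument it receives. This is a minimal sketch: the dataframe reuses the values from the random sample shown above, and half_and_record() is a hypothetical helper written only for this demonstration:

```python
import pandas as pd

# Same values as the randomly generated dataframe displayed above.
df = pd.DataFrame({"a": [7, 5, 8, 7], "b": [5, 9, 6, 7], "c": [0, 6, 1, 3]})

received = []

def half_and_record(column):
    # Remember what kind of object apply() passed in, then halve it.
    received.append(type(column).__name__)
    return column / 2

halved = df.apply(half_and_record)

print(set(received))  # {'Series'}: each call receives a whole column
print(halved)
```

Because a whole column is passed in and division is element-wise, the result matches `df / 2`.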


Furthermore, we can define an aggregate function that will return the range of a column (the difference between the maximum and the minimum values).

In [56]:
housing = pd.read_csv('https://raw.githubusercontent.com/loukjsmalbil/datasets_ws/master/housing_prices.csv')
housing.head()

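|   | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave |  | Reg | Lvl | AllPub | ... | 0 |  |  |  | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave |  | Reg | Lvl | AllPub | ... | 0 |  |  |  | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave |  | IR1 | Lvl | AllPub | ... | 0 |  |  |  | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave |  | IR1 | Lvl | AllPub | ... | 0 |  |  |  | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave |  | IR1 | Lvl | AllPub | ... | 0 |  |  |  | 0 | 12 | 2008 | WD | Normal | 250000 |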


In [58]:
# keep only the numeric columns (select_dtypes is the public way to do this)
housing_numeric = housing.select_dtypes(include='number')
housing_numeric.head()

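|   | Id | MSSubClass | LotFrontage | LotArea | OverallQual | OverallCond | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | ... | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | 65.0 | 8450 | 7 | 5 | 2003 | 2003 | 196.0 | 706 | ... | 0 | 61 | 0 | 0 | 0 | 0 | 0 | 2 | 2008 | 208500 |
| 1 | 2 | 20 | 80.0 | 9600 | 6 | 8 | 1976 | 1976 | 0.0 | 978 | ... | 298 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 2007 | 181500 |
| 2 | 3 | 60 | 68.0 | 11250 | 7 | 5 | 2001 | 2002 | 162.0 | 486 | ... | 0 | 42 | 0 | 0 | 0 | 0 | 0 | 9 | 2008 | 223500 |
| 3 | 4 | 70 | 60.0 | 9550 | 7 | 5 | 1915 | 1970 | 0.0 | 216 | ... | 0 | 35 | 272 | 0 | 0 | 0 | 0 | 2 | 2006 | 140000 |
| 4 | 5 | 60 | 84.0 | 14260 | 8 | 5 | 2000 | 2000 | 350.0 | 655 | ... | 192 | 84 | 0 | 0 | 0 | 0 | 0 | 12 | 2008 | 250000 |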


In [59]:
def range_func(x):
    # use the Series methods, which skip missing values (NaN)
    return x.max() - x.min()

When we apply the function to our dataframe, it computes the range for each column, since apply() works column by column by default (axis=0).

In [62]:
# apply range to df
housing_numeric.apply(range_func)

Id                 1459.0
MSSubClass          170.0
LotFrontage         292.0
LotArea          213945.0
OverallQual           9.0
OverallCond           8.0
YearBuilt           138.0
YearRemodAdd         60.0
MasVnrArea         1600.0
BsmtFinSF1         5644.0
BsmtFinSF2         1474.0
BsmtUnfSF          2336.0
TotalBsmtSF        6110.0
1stFlrSF           4358.0
2ndFlrSF           2065.0
LowQualFinSF        572.0
GrLivArea          5308.0
BsmtFullBath          3.0
BsmtHalfBath          2.0
FullBath              3.0
HalfBath              2.0
BedroomAbvGr          8.0
KitchenAbvGr          3.0
TotRmsAbvGrd         12.0
Fireplaces            3.0
GarageYrBlt         110.0
GarageCars            4.0
GarageArea         1418.0
WoodDeckSF          857.0
OpenPorchSF         547.0
EnclosedPorch       552.0
3SsnPorch           508.0
ScreenPorch         480.0
PoolArea            738.0
MiscVal           15500.0
MoSold               11.0
YrSold                4.0
SalePrice        720100.0
dtype: float64

In [64]:
# the same computation with a lambda function
housing_numeric.apply(lambda x: x.max() - x.min())

Id                 1459.0
MSSubClass          170.0
LotFrontage         292.0
LotArea          213945.0
OverallQual           9.0
OverallCond           8.0
YearBuilt           138.0
YearRemodAdd         60.0
MasVnrArea         1600.0
BsmtFinSF1         5644.0
BsmtFinSF2         1474.0
BsmtUnfSF          2336.0
TotalBsmtSF        6110.0
1stFlrSF           4358.0
2ndFlrSF           2065.0
LowQualFinSF        572.0
GrLivArea          5308.0
BsmtFullBath          3.0
BsmtHalfBath          2.0
FullBath              3.0
HalfBath              2.0
BedroomAbvGr          8.0
KitchenAbvGr          3.0
TotRmsAbvGrd         12.0
Fireplaces            3.0
GarageYrBlt         110.0
GarageCars            4.0
GarageArea         1418.0
WoodDeckSF          857.0
OpenPorchSF         547.0
EnclosedPorch       552.0
3SsnPorch           508.0
ScreenPorch         480.0
PoolArea            738.0
MiscVal           15500.0
MoSold               11.0
YrSold                4.0
SalePrice        720100.0
dtype: float64
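
The apply() method can also work row by row when given axis=1. The sketch below illustrates both directions on a small dataframe (the values are those of the random sample used earlier in this section) and compares the column-wise result with a direct computation using the dataframe's own max() and min() methods:

```python
import pandas as pd

df = pd.DataFrame({"a": [7, 5, 8, 7], "b": [5, 9, 6, 7], "c": [0, 6, 1, 3]})

def range_func(x):
    return x.max() - x.min()

col_ranges = df.apply(range_func)          # axis=0 (default): one value per column
row_ranges = df.apply(range_func, axis=1)  # axis=1: one value per row

# The column-wise range can also be computed without apply().
direct = df.max() - df.min()

print(col_ranges.tolist())  # [3, 4, 6]
print(row_ranges.tolist())  # [7, 4, 7, 4]
print(direct.tolist())      # [3, 4, 6]
```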

## Summary



In this lesson, we learnt about:

* The map() function, which allows us to apply a function to every element of a sequence.
* The filter() function, which allows us to use a function to remove elements from a sequence that do not meet the condition specified by the function.
* The reduce() function, which allows us to aggregate a sequence of elements into a single value.
* The apply() method for dataframes, which applies a function to each column (or row) of a dataframe.