# Map, Reduce, Filter


# Lesson Goals

In this lesson you will learn about:

- Mapping, reducing, and filtering in Python
- The apply() function in pandas and how it relates to the Python functions above

# Introduction

Mapping, reducing, and filtering are important concepts in functional programming. They appear in many distributed programming frameworks and exist in Python as well. In this lesson we will build on the functional programming concepts from previous lessons and put them into practice using mapping, reducing, and filtering.

# Functional Programming Recap

As we have learned in previous lessons, functional programming is a programming paradigm in which code is written to avoid mutable data and shared state. Operations are performed by passing data through functions and storing the result in a new variable.

## Immutability

An immutable object is an object that cannot be changed after it is created. By sticking with functional programming, we ensure that no two processes modify the same data. Instead, each function leaves its input untouched and stores its result in a new variable. This results in code that is cleaner, safer, and easier to read.
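
As a minimal sketch of this idea, the snippet below uses a tuple (an immutable built-in type) and a function that returns a new tuple instead of changing its input. The names `scores`, `add_bonus`, and `boosted` are illustrative and do not come from the lesson.

```python
# Illustrative names only; none of these appear elsewhere in the lesson.
scores = (10, 12, 34, 23)  # a tuple cannot be changed after creation

error_message = ""
try:
    scores[0] = 99  # attempt to modify the tuple in place
except TypeError as err:
    error_message = str(err)  # tuples do not support item assignment

def add_bonus(values, bonus):
    """Return a new tuple; the input tuple is left untouched."""
    return tuple(v + bonus for v in values)

boosted = add_bonus(scores, 5)

print(scores)   # (10, 12, 34, 23)
print(boosted)  # (15, 17, 39, 28)
```

The original data stays available under its own name, and the result lives in a new variable.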

## Transforming State

When writing functionally, we can think of a function as a mapping from input to output. Compare this with shared state: in a computer game with multiple characters moving on the screen, moving one character might unintentionally affect another. Because functional code does not share state, this is not a concern for us.

# Mapping

The goal of the map() function is to apply a function to every element of an iterable (such as a list or a set). map() takes a function and an iterable as arguments and returns an iterator that yields the result of applying the function to each element. For example, let's create a function that divides a number by 2 and returns the result.

In [1]:
def half(x):
    return x / 2

Now that we have our function, let's apply it to a list of numbers.



In [2]:
l = [10, 12, 34, 23]
map(half, l)

<map at 0x7fcb1457a5f8>

The map() function returns a map object, which is a lazy iterator: the values are computed only when we iterate over it. To get a list of results, one option is to pass the map object to list().

In [3]:
list(map(half, l))

[5.0, 6.0, 17.0, 11.5]
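
One consequence worth seeing in code is that a map object can only be consumed once. The sketch below repeats the lesson's `half` and `l` so that it runs on its own; `halves`, `first_pass`, and `second_pass` are illustrative names.

```python
def half(x):
    return x / 2

l = [10, 12, 34, 23]
halves = map(half, l)        # nothing is computed yet

first_pass = list(halves)    # iterating consumes the map object
second_pass = list(halves)   # the iterator is now exhausted

print(first_pass)   # [5.0, 6.0, 17.0, 11.5]
print(second_pass)  # []
```

If the results are needed more than once, it is usually simplest to store the list rather than the map object.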

Similarly, we can convert the iterator into a set.
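
A short sketch of the set conversion follows. It redefines `half` and `l` so it is self-contained; the list `with_duplicates` is a made-up input added only to show that a set drops repeated results.

```python
def half(x):
    return x / 2

l = [10, 12, 34, 23]
halved_set = set(map(half, l))

# Hypothetical input with a repeated value, to show de-duplication.
with_duplicates = [10, 12, 10, 34]
unique_halves = set(map(half, with_duplicates))

print(sorted(halved_set))     # [5.0, 6.0, 11.5, 17.0]
print(sorted(unique_halves))  # [5.0, 6.0, 17.0]
```

Sets are unordered, which is why the values are sorted before printing.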


# Filtering

Like the map() function, the filter() function takes a function and an iterable and returns an iterator. Its goal is to use the function we pass to it to decide which elements to keep; the original sequence is not modified. Our function should return True for the elements we want to keep and False for the ones we want to drop. For example, we can create a function that returns True if a number is odd and False if it is even. In fact, let's use a lambda expression for this task.



In [4]:
filter(lambda x: x % 2 == 1, l)

<filter at 0x7fcb1457a588>

Again, this returns an iterator, so we will convert it to a list.

In [5]:
list(filter(lambda x: x % 2 == 1, l))

[23]
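
For comparison, here is a sketch of the same filter written with a named function and with a list comprehension. The name `is_odd` is illustrative; `l` is repeated from the lesson so the snippet runs on its own.

```python
l = [10, 12, 34, 23]

def is_odd(x):
    """Return True for odd numbers, False for even ones."""
    return x % 2 == 1

odds = list(filter(is_odd, l))
odds_comprehension = [x for x in l if is_odd(x)]  # equivalent result

print(odds)                # [23]
print(odds_comprehension)  # [23]
print(l)                   # [10, 12, 34, 23], the input is unchanged
```

Both forms produce a new list and leave `l` as it was.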

# Reducing

While the map() function applies a function to each element of a sequence, sometimes we want a function that aggregates all the elements into a single value. Python has built-in examples of this, such as max() and sum(). The reduce() function generalizes this idea. In Python 3 it is not a built-in function and must be imported from the functools module of the standard library. reduce() starts from the beginning of the sequence and operates on two values at a time: the result accumulated so far and the next element. This is why the function passed to reduce() should always take two arguments and return one value.

For example, if we would like to create a summation function using reduce(), we will sum two elements at a time.

Let's write a lambda expression that will take two elements and sum them.

In [6]:
summation = lambda a, b: a + b

By passing this lambda expression to the reduce() function (along with a list), we will perform the following operations in this order:

((10 + 12) + 34) + 23

Here is the completed snippet of code to find the sum:

In [7]:
from functools import reduce

reduce(lambda a, b: a + b, l)

79
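
To make the intermediate steps visible, one option is `itertools.accumulate`, which applies the same two-argument function but yields every running result instead of only the last one. This is a sketch for illustration; `running_totals` and `total` are names introduced here.

```python
from functools import reduce
from itertools import accumulate

l = [10, 12, 34, 23]
summation = lambda a, b: a + b

# accumulate() yields the first element, then each intermediate result.
running_totals = list(accumulate(l, summation))
total = reduce(summation, l)

print(running_totals)  # [10, 22, 56, 79]
print(total)           # 79
```

The last running total matches the value returned by reduce().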

Here is another example of reduce(). This time we will use reduce() to find the maximum of a list. We will do this by comparing two elements at a time and returning the larger of the two.

In [8]:
reduce(lambda a, b: a if a > b else b, l)

34
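
reduce() also accepts an optional third argument, an initial value, which becomes the starting accumulator. The sketch below shows why that can matter for an empty list; the variable names are illustrative.

```python
from functools import reduce

add = lambda a, b: a + b
empty = []

failed_without_initial = False
try:
    reduce(add, empty)  # no elements and no starting value
except TypeError:
    failed_without_initial = True

empty_total = reduce(add, empty, 0)          # returns the initial value
total = reduce(add, [10, 12, 34, 23], 0)     # 0 + 10 + 12 + 34 + 23

print(failed_without_initial)  # True
print(empty_total)             # 0
print(total)                   # 79
```

Supplying an initial value is a reasonable default whenever the input might be empty.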


# Functional Programming in Pandas

In pandas, we can use the apply() method to apply a function to a DataFrame. apply() does not distinguish between functions that transform every row or column and functions that aggregate them; we can use it for both.

Here is an example of a DataFrame. It is filled with random values, so the numbers will differ each time the cell is run.

In [9]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 3), columns=['a', 'b', 'c'])
df

|   | a | b | c |
|---|---|---|---|
| 0 | -0.056673 | -0.586679 | 1.316608 |
| 1 | 0.317204 | -0.670107 | -0.402551 |
| 2 | 1.510662 | 0.801214 | 1.097762 |
| 3 | -1.542299 | 0.058841 | -0.053087 |


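We can use the half() function we defined earlier and apply it to the DataFrame. apply() passes each column to half() as a Series, and because division on a Series is elementwise, every cell is halved.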

In [10]:
df.apply(half)

|   | a | b | c |
|---|---|---|---|
| 0 | -0.028337 | -0.293340 | 0.658304 |
| 1 | 0.158602 | -0.335053 | -0.201276 |
| 2 | 0.755331 | 0.400607 | 0.548881 |
| 3 | -0.771149 | 0.029421 | -0.026543 |
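
To check what apply() actually hands to the function, the sketch below uses a small DataFrame with fixed values (not the random one above) and records the type of each argument. The names `received_types` and `halved` are illustrative.

```python
import pandas as pd

# Small fixed DataFrame, used here instead of random data.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# The function is called once per column and receives a Series.
received_types = df.apply(lambda col: type(col).__name__)

# Dividing a Series by 2 is elementwise, so every cell is halved.
halved = df.apply(lambda col: col / 2)

print(list(received_types))   # ['Series', 'Series']
print(halved["a"].tolist())   # [0.5, 1.0]
print(halved["b"].tolist())   # [1.5, 2.0]
```

So the per-cell result comes from the Series arithmetic, not from apply() calling the function on individual cells.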


Furthermore, we can define an aggregate function that will return the range of a column (the difference between the maximum and the minimum values).

In [11]:
def range_func(x):
    return max(x) - min(x)

When we apply the function to our dataframe, it will compute the range for each column by default.

In [12]:
df.apply(range_func)

a    3.052961
b    1.471320
c    1.719159
dtype: float64
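
The column-wise behavior corresponds to the default `axis=0`. Passing `axis=1` applies the function to each row instead. The sketch below uses a small DataFrame with fixed values so the results are easy to verify by hand; `col_ranges` and `row_ranges` are illustrative names.

```python
import pandas as pd

def range_func(x):
    return max(x) - min(x)

# Small fixed DataFrame, used here instead of random data.
df = pd.DataFrame({"a": [1.0, 5.0], "b": [4.0, 2.0], "c": [2.0, 9.0]})

col_ranges = df.apply(range_func)          # default axis=0: one value per column
row_ranges = df.apply(range_func, axis=1)  # axis=1: one value per row

print(col_ranges.tolist())  # [4.0, 2.0, 7.0]
print(row_ranges.tolist())  # [3.0, 7.0]
```

In both cases the function receives a Series and returns a single number, much like the function passed to reduce() collapses a sequence into one value.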