# Lambda Expressions, Map, Filter & Reduce

Now it's time to learn about the lambda expression, along with the functions it is most often paired with: the built-in functions map and filter, and reduce from the functools module. These will come in handy when you begin to develop your skills further!

## lambda expression

One of Python's most useful (and, for beginners, confusing) tools is the lambda expression. Lambda expressions allow us to create "anonymous" functions. This basically means we can quickly make ad-hoc functions without needing to properly define a function using def.

Function objects returned by running lambda expressions work exactly the same as those created and assigned by defs. There is one key difference that makes lambda useful in specialized roles:

**lambda's body is a single expression, not a block of statements.**

* The lambda's body is similar to what we would put in a def body's return statement. We simply type the result as an expression instead of explicitly returning it. Because it is limited to an expression, a lambda is less general than a def. We can only squeeze so much logic into a lambda body; this is by design, to limit program nesting. lambda is designed for coding simple functions, and def handles the larger tasks.

Let's slowly break down a lambda expression by deconstructing a function:
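To make the "single expression" rule concrete, here is a minimal sketch. The names `describe` and `describe_lambda` are made up for this illustration. The def version uses an if statement, while the lambda version has to express the same branch as a conditional expression:

```python
# A def can hold several statements before returning a value.
def describe(num):
    if num % 2 == 0:
        label = 'even'
    else:
        label = 'odd'
    return label

# A lambda body must be one expression, so the branch becomes
# a conditional expression instead of an if statement.
describe_lambda = lambda num: 'even' if num % 2 == 0 else 'odd'

print(describe(4), describe_lambda(4))   # even even
print(describe(7), describe_lambda(7))   # odd odd
```

Anything that needs assignments, loops, or several steps belongs in a def.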

In [12]:
def square(num):
    result = num**2
    return result

In [13]:
square(2)

4

We could simplify it:

In [14]:
def square(num):
    return num**2

In [15]:
square(2)

4

We could actually even write this all on one line.

In [16]:
def square(num): return num**2

In [17]:
square(2)

4

This is the form of function that a lambda expression is intended to replicate. A lambda expression can then be written as:

In [18]:
lambda num: num ** 2

<function __main__.<lambda>>

In [19]:
# You wouldn't usually assign a name to a lambda expression; this is just for demonstration!
square = lambda num: num ** 2

In [20]:
square(2)

4

Here are a few more examples. Keep in mind that the more complex a function is, the harder it is to translate into a lambda expression, meaning sometimes it's just easier (and often the only way) to define the function with the def keyword.

**Lambda expression for grabbing the first character of a string:**

In [31]:
lambda s: s[0]

<function __main__.<lambda>>

**Lambda expression for reversing a string:**

In [32]:
lambda s: s[::-1]

<function __main__.<lambda>>

You can even pass multiple arguments into a lambda expression. Again, keep in mind that not every function can be translated into a lambda expression.

In [34]:
lambda x,y : x + y

<function __main__.<lambda>>
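The cells above only create the function objects. As a quick sketch of what they do when called, the lambdas are bound to names below (`first_char`, `reverse` and `add` are names chosen just for this demonstration):

```python
# Bind each lambda to a name only so it can be called below.
first_char = lambda s: s[0]
reverse = lambda s: s[::-1]
add = lambda x, y: x + y

print(first_char('hello'))   # h
print(reverse('hello'))      # olleh
print(add(2, 3))             # 5

# A lambda can also be called in place, without ever being named.
print((lambda x, y: x + y)(10, 5))   # 15
```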

You will find yourself using lambda expressions often with certain third-party libraries; for example, the pandas library for data analysis works very well with lambda expressions.

Many function calls need a function passed in, such as map and filter. Often you only need to use the function you are passing in once, so instead of formally defining it, you can just use a lambda expression. Let's repeat some of the examples from above with a lambda expression.
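You don't need a third-party library to see this pattern. As a small sketch, the built-in functions sorted and max both accept a key function, and a lambda is a convenient way to supply one inline. The list `words` is sample data invented for this example:

```python
words = ['banana', 'kiwi', 'apple', 'fig']

# sorted() accepts a key function; a lambda supplies it inline.
by_length = sorted(words, key=lambda w: len(w))
print(by_length)   # ['fig', 'kiwi', 'apple', 'banana']

# max() takes the same kind of key function.
# Here it picks the word whose last letter comes latest in the alphabet.
latest_ending = max(words, key=lambda w: w[-1])
print(latest_ending)   # kiwi
```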

## map function

_map(func, iterables)_ --> map object

The **map** function allows you to "map" a function to an iterable object. That is to say, you can quickly apply the same function to every item in an iterable, such as a list. For example:

![](imgs/map.png)

In [21]:
map?

In [22]:
def square(num):
    return num**2

In [23]:
my_nums = [1,2,3,4,5]

In [25]:
map(square,my_nums)

<map at 0x7efd98215a20>

In [26]:
# To get the results, either iterate through map() 
# or just cast to a list
list(map(square,my_nums))

[1, 4, 9, 16, 25]

In [27]:
# anonymous, lambda function
lambda x: x**2

<function __main__.<lambda>>

In [28]:
# You can store a reference to the lambda in a variable
square_bis = lambda x: x**2

In [29]:
square_bis

<function __main__.<lambda>>

In [8]:
square_bis(12)

144

In [32]:
# map - executes function on all elements of collection (iterable)
# map(function, iterable)
list(map(lambda x: x**2, [1, 2, 3, 4, 5]))

[1, 4, 9, 16, 25]

The functions can also be more complex:

In [34]:
def splicer(mystring):
    if len(mystring) % 2 == 0:
        return 'even'
    else:
        return mystring[0]

In [35]:
mynames = ['John','Cindy','Sarah','Kelly','Mike']

In [36]:
list(map(splicer,mynames))

['even', 'C', 'S', 'K', 'even']
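The signature above says "iterables" in the plural. As a short sketch of what that means, map can walk several iterables in parallel, and the map object it returns is a one-shot iterator. The lists `a` and `b` are sample data for this illustration:

```python
a = [1, 2, 3]
b = [10, 20, 30, 40]

# With two iterables, the function receives one item from each per call.
# map stops at the end of the shortest iterable, so 40 is never used.
sums = map(lambda x, y: x + y, a, b)

print(next(sums))    # 11
print(list(sums))    # [22, 33]

# The iterator is now exhausted, so a second pass yields nothing.
print(list(sums))    # []
```

If you need the results more than once, convert the map object to a list first.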

## filter function

__filter__ filters a collection using a boolean function (a predicate).

filter(_function, iterable_)

Returns an iterator yielding those items of iterable for which function(item) is true. This means you need to filter by a function that returns either True or False. Pass that function into filter (along with your iterable) and you will get back only the items for which the function returns True.

In [37]:
def check_even(num):
    return num % 2 == 0 

In [38]:
nums = [0,1,2,3,4,5,6,7,8,9,10]

In [39]:
filter(check_even,nums)

<filter at 0x7efd9818f668>

In [40]:
list(filter(check_even,nums))

[0, 2, 4, 6, 8, 10]

In [18]:
filter(lambda x: x % 2 == 1, [1, 2, 3, 4, 5])

<filter at 0x7f3d2c866c50>

filter returns an iterator; use list to get a collection:

In [19]:
list(filter(lambda x: x % 2 == 1, [1, 2, 3, 4, 5]))

[1, 3, 5]
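For comparison, here is a small sketch showing that a list comprehension with an if clause selects the same items as filter, and what happens when None is passed in place of a function. The variable names are chosen for this illustration:

```python
nums = [0, 1, 2, 3, 4, 5]

# filter with a predicate and a comprehension with a condition agree.
odd_filter = list(filter(lambda x: x % 2 == 1, nums))
odd_comp = [x for x in nums if x % 2 == 1]
print(odd_filter == odd_comp)   # True

# Passing None instead of a function keeps the items that are truthy.
truthy = list(filter(None, [0, 1, '', 'a', [], [0]]))
print(truthy)   # [1, 'a', [0]]
```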

### Exercise

Using `map`, `filter` and `reduce`, get:

* The product of `[1, 2, 3, 4, 5]`.
* The length of each word in `["Python", "Spark", "Big", "Data", "ML", "scikit-learn"]`.
* (★) The total number of lowercase letters in the words that do not contain the letter `"i"`.

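## reduce function

reduce(_function, iterable[, initializer]_)

Folds a collection into a single value by applying a binary function cumulatively, from left to right. There is no default initializer: if it is omitted, the first item of the iterable is used as the starting value.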

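In [None]:
# A named binary function, equivalent to the lambda used in the next cell.
# (Calling it reduce would be shadowed by the import below.)
def add(x, y):
    return x + y

In [43]:
from functools import reduce
reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])

15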
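To see how the fold proceeds, the sketch below uses itertools.accumulate, which yields every intermediate result of the same left-to-right computation, and then shows the optional initializer. The names `nums` and `add` are chosen for this illustration:

```python
from functools import reduce
from itertools import accumulate

nums = [1, 2, 3, 4, 5]
add = lambda x, y: x + y

# accumulate yields each intermediate value of the fold:
# 1, 1+2, (1+2)+3, ...
print(list(accumulate(nums, add)))   # [1, 3, 6, 10, 15]

# reduce returns only the final value.
print(reduce(add, nums))             # 15

# An initializer is used as the starting value of the fold.
print(reduce(add, nums, 100))        # 115

# It also makes an empty iterable safe to reduce.
print(reduce(add, [], 0))            # 0

# Without an initializer, an empty iterable raises TypeError.
try:
    reduce(add, [])
except TypeError as exc:
    print('TypeError:', exc)
```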

## MapReduce in Hadoop

In Hadoop, MapReduce is implemented using key-value pairs. See the example below:
![](imgs/MapReduce_example.png)

In [44]:
import sys
import contextlib

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO        # Python 3

# Helper that temporarily captures everything written to standard output
@contextlib.contextmanager
def stdoutIO(stdout=None):
    old = sys.stdout
    if stdout is None:
        stdout = StringIO()
    sys.stdout = stdout
    try:
        yield stdout
    finally:
        # Restore the real stdout even if the body raises
        sys.stdout = old

# Input lines to process
lines = ['123199901', '567200806', '645200811', '989199933', '452199904', '224200822']

# Mapper: emits the year (characters 3-6) as the key and the trailing number as the value
def mapper(lines):
    for line in lines:
        key = int(line[3:7])
        value = int(line[7:])
        print("{0}<>{1}".format(key, value))

# Reducer: sums the values for each key (expects its input sorted by key)
def reducer(lines):
    lastKey = None
    reduce_sum = 0
    for line in lines: 
        key, value = line.split("<>")
        if lastKey is None:
            lastKey = key
        if key != lastKey:
            print("{0},{1}".format(lastKey, reduce_sum))
            reduce_sum = 0

        reduce_sum += int(value)
        lastKey = key
    print("{0},{1}".format(lastKey, reduce_sum))
    
# MapReduce 
# Input
print("Input: {}".format(lines))
# Map
with stdoutIO() as mapper_out:
    mapper(lines)
shuffled = mapper_out.getvalue().strip().split('\n')
print("Mapper out: {}".format(shuffled))
# Shuffle
shuffled.sort()
print("Shuffled mapper out: {}".format(shuffled))
# Reduce
with stdoutIO() as reducer_out:
    reducer(shuffled)
# Output
output = reducer_out.getvalue().strip().split('\n')
print("Output: {}".format(output))

Input: ['123199901', '567200806', '645200811', '989199933', '452199904', '224200822']
Mapper out: ['1999<>1', '2008<>6', '2008<>11', '1999<>33', '1999<>4', '2008<>22']
Shuffled mapper out: ['1999<>1', '1999<>33', '1999<>4', '2008<>11', '2008<>22', '2008<>6']
Output: ['1999,38', '2008,39']
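The example above passes text between the stages, much as separate mapper and reducer processes would. As a rough in-memory sketch of the same three phases, the version below uses map, sort and reduce directly on Python tuples. This is only an illustration of the idea, not how Hadoop itself runs a job; the names `pairs` and `totals` are made up here, and itertools.groupby stands in for the grouping that the shuffle phase provides:

```python
from functools import reduce
from itertools import groupby

lines = ['123199901', '567200806', '645200811',
         '989199933', '452199904', '224200822']

# Map: turn each line into a (year, number) key-value pair.
pairs = list(map(lambda line: (int(line[3:7]), int(line[7:])), lines))

# Shuffle: sort by key so that equal keys end up next to each other.
pairs.sort(key=lambda kv: kv[0])

# Reduce: fold the values of each key group into a single sum.
totals = {
    year: reduce(lambda acc, kv: acc + kv[1], group, 0)
    for year, group in groupby(pairs, key=lambda kv: kv[0])
}

print(totals)   # {1999: 38, 2008: 39}
```

The totals match the output of the text-based version: 38 for 1999 and 39 for 2008.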
