In [1]:
import re
import numpy as np


# Map, Filter & Reduce

Let's recap the differences between the two programming paradigms we've seen so far:

**Imperative Paradigm**
- The program is a series of **instructions** that modify a **state**:

```python
x = 0
for i in range(10):
    x = (x + i)*2
print(x)
```

- The *variable* `x` is the state of our program, which is modified through the `for` loop.

- One of the simplest forms of programming, typical of older programming languages (C, Fortran and COBOL, for example).

**Functional Programming**
- There is no state: the program defines functions which are applied over the input.

```python
def somar_2(x):
    return x + 2

def mult_4(x):
    return x * 4

saida = somar_2(mult_4(somar_2(entrada)))
```

- In the functional paradigm, functions are values: they can be assigned to variables, passed as arguments and returned by other functions.
- Originated in the late 1950s with LISP and is present today in many data-oriented languages such as R, Julia and (in part) Python.
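
To connect the two paradigms, the imperative loop from the first example can be rewritten without reassigning a variable. This is only a minimal sketch: it uses `functools.reduce` (covered later in this notebook), and `step` is a hypothetical helper name introduced for illustration.

```python
from functools import reduce

# Imperative version: the variable x is updated at every iteration.
x = 0
for i in range(10):
    x = (x + i) * 2

# Functional version: the update rule becomes a function of
# (accumulated value, next item); nothing is reassigned by our code.
def step(acc, i):  # hypothetical helper name
    return (acc + i) * 2

result = reduce(step, range(10), 0)

print(x, result)  # both versions produce the same number
```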

## Functions are variables

In [2]:
soma_1 = lambda x: x + 1


In [8]:
soma_1

<function __main__.<lambda>(x)>

In [3]:
soma_1(soma_1(soma_1(10)))


13

In [4]:
def divisibles(b):
  return lambda x: 1 if x % b == 0 else 0

In [10]:
[divisibles(x) for x in range(10)]

[<function __main__.divisibles.<locals>.<lambda>(x)>,
 <function __main__.divisibles.<locals>.<lambda>(x)>,
 <function __main__.divisibles.<locals>.<lambda>(x)>,
 <function __main__.divisibles.<locals>.<lambda>(x)>,
 <function __main__.divisibles.<locals>.<lambda>(x)>,
 <function __main__.divisibles.<locals>.<lambda>(x)>,
 <function __main__.divisibles.<locals>.<lambda>(x)>,
 <function __main__.divisibles.<locals>.<lambda>(x)>,
 <function __main__.divisibles.<locals>.<lambda>(x)>,
 <function __main__.divisibles.<locals>.<lambda>(x)>]
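
The list above holds functions, not results. As a small sketch of how such a list might be used, each stored function can be called on a value. The names `checkers` and `flags` are assumptions made for this example, and the range starts at 1 because a checker built with `b = 0` would raise `ZeroDivisionError` when called.

```python
def divisibles(b):
    # Returns a function that yields 1 when x is divisible by b, else 0
    return lambda x: 1 if x % b == 0 else 0

# One checker per divisor from 1 to 5 (0 is skipped on purpose)
checkers = [divisibles(b) for b in range(1, 6)]

# Call every stored function with the same argument
flags = [check(10) for check in checkers]
print(flags)  # [1, 1, 0, 0, 1]
```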

In [5]:
def soma_1_c(x):
    return x + 1


In [9]:
soma_1_c


<function __main__.soma_1_c(x)>

In [7]:
soma_1(soma_1_c(10))


12

We can use this to build new functions out of existing ones:

In [11]:
somar_n = lambda x, n: x + n


In [12]:
soma_1 = lambda x: somar_n(x, 1)


In [13]:
soma_1(10)


11
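
A function can also literally return another function. The sketch below assumes a hypothetical factory called `make_adder`; the inner function remembers the value of `n` it was created with (a closure).

```python
def make_adder(n):  # hypothetical name, not part of the original notebook
    def adder(x):
        # n is captured from the enclosing call to make_adder
        return x + n
    return adder

add_1 = make_adder(1)
add_5 = make_adder(5)

print(add_1(10), add_5(10))  # 11 15
```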

In [14]:
x = 1
x()

TypeError: 'int' object is not callable

## The `map` concept

One of the key concepts in functional programming is **mapping**: applying a function to the elements of a set, list or other iterable. 

In [15]:
lista_exemplo = [10, 12, 34, 23, 2, 6, 7]


In [16]:
def div_2(x):
    return x / 2


A simple call of `div_2(lista_exemplo)` will not work!

In [17]:
div_2(lista_exemplo)


TypeError: unsupported operand type(s) for /: 'list' and 'int'

`div_2` expects a number as its argument, but `lista_exemplo` is a list!

We could create an empty list and use a loop to iterate over `lista_exemplo`:

In [18]:
new_list = []

for item in lista_exemplo:
  func_application = div_2(item)
  new_list.append(func_application)

new_list


[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

Another way is to use a list comprehension, one of the tools in the functional programming toolbox:

In [19]:
[div_2(item) for item in lista_exemplo]


[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

A third way is using `map()`:

In [21]:
map(div_2, lista_exemplo)

<map at 0x7fcd7aa07f90>

In [22]:
for i in map(div_2, lista_exemplo):
    print(i)


5.0
6.0
17.0
11.5
1.0
3.0
3.5


The results from `map()` are **lazy**: they are not computed when you call `map()`, but only when you actually ask for them!
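
One way to see this laziness is to record every call made to the mapped function. This is a sketch only: `div_2_logged` and the `calls` list are hypothetical additions used for illustration.

```python
calls = []

def div_2_logged(x):  # hypothetical variant of div_2 that records its calls
    calls.append(x)
    return x / 2

lazy_result = map(div_2_logged, [10, 12, 34])
calls_after_map = list(calls)    # snapshot: nothing has been computed yet

first = next(lazy_result)        # ask for a single result
calls_after_next = list(calls)   # snapshot: only the first element was processed

print(calls_after_map, first, calls_after_next)
```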

In [28]:
map_le = list(map(div_2, lista_exemplo))


In [29]:
map_le[0]

5.0

An *interesting* behavior of **lazy** iterators is that they become **empty as you iterate over their elements**:

In [None]:
lista_exemplo = [10, 12, 34, 23, 2, 6, 7]

In [38]:
resultado_map = map(div_2, lista_exemplo)

In [39]:
for i in resultado_map:
  print(i)
  print(list(resultado_map))


5.0
[6.0, 17.0, 11.5, 1.0, 3.0, 3.5]


In [32]:
list(resultado_map)

[]

In [40]:
resultado_map = map(div_2, lista_exemplo)


In [41]:
list(resultado_map)


[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]
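
When the results are needed more than once, two simple options are to store them in a list or to create a new `map` object for each pass. A short sketch follows; the names `values`, `total` and `largest` are assumptions made for this example.

```python
def div_2(x):
    return x / 2

lista_exemplo = [10, 12, 34, 23, 2, 6, 7]

# Option 1: materialize once, then reuse the list as often as needed
values = list(map(div_2, lista_exemplo))
total = sum(values)
largest = max(values)

# Option 2: build a fresh lazy map for every pass (nothing is stored)
total_2 = sum(map(div_2, lista_exemplo))
largest_2 = max(map(div_2, lista_exemplo))

print(total, largest)
```

Option 1 trades memory for convenience; option 2 recomputes the values on every pass.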

### Lazy evaluation

Lazy evaluation is an important concept in Big Data: it saves memory and CPU by performing computations only **when they are needed**.
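
To make the saving concrete, here is a sketch that maps a function over a range far too large to store as a list, and then takes only the first three results with `itertools.islice`. The function `square` and the size `10**12` are arbitrary choices for illustration.

```python
from itertools import islice

def square(x):
    return x * x

# Creating the map is immediate: no squares are computed here
lazy_squares = map(square, range(10**12))

# Only the three requested values are ever calculated
first_three = list(islice(lazy_squares, 3))
print(first_three)  # [0, 1, 4]
```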

In [43]:
lista_telefones = [
    19999571559,
    "(21) 2412-0107",
    "(34) 99762-1166",
    "91-4002-8282",
    "(19) 3542-1820",
    "(19) 3561-9525",
    "(34) 3333-5802",
]
pattern = r"[0-9]{2}"


In [65]:
numero_telefone = str(lista_telefones[1])
print(numero_telefone)
ddd = re.findall(pattern, numero_telefone)
print(ddd)

(21) 2412-0107
['21', '24', '12', '01', '07']


In [61]:
lambda x: re.findall(pattern, str(x))

<function __main__.<lambda>(x)>

In [None]:
for telefone in lista_telefones:
  str_telefone = str(telefone)
  pair_list = re.findall(r'[0-9]{2}', str_telefone)
  ddd = pair_list[0]

In [67]:
func_ddd = lambda x: re.findall(pattern, str(x))[0]

In [71]:
lista_dds = list(map(lambda x: re.findall(pattern, str(x))[0], lista_telefones))
print(lista_dds)


['19', '21', '34', '91', '19', '19', '34']


In [72]:
for ddd in map(lambda x: "".join(re.findall(pattern, str(x)))[:2], lista_telefones):
    print(ddd)


19
21
34
91
19
19
34
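
Since only the first pair of digits is needed, a possible alternative is `re.search`, which stops at the first match instead of collecting all of them. This is a sketch rather than the approach used in this notebook; `ddd_search` is a hypothetical helper name.

```python
import re

lista_telefones = [
    19999571559,
    "(21) 2412-0107",
    "(34) 99762-1166",
    "91-4002-8282",
    "(19) 3542-1820",
    "(19) 3561-9525",
    "(34) 3333-5802",
]

def ddd_search(telefone):  # hypothetical helper
    # re.search returns the first match, or None when there is none
    match = re.search(r"[0-9]{2}", str(telefone))
    return match.group() if match else None

ddds = list(map(ddd_search, lista_telefones))
print(ddds)
```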


## Filtering with `filter()`

The second key piece of the functional paradigm is the `filter()` function: it lets us filter the elements of an iterable using a function that returns boolean values. Like `map()`, `filter()` evaluates an iterable lazily and returns only the elements for which the applied function returns `True`.
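
Before moving to the phone numbers, here is a minimal sketch of the idea with plain integers. The predicate `is_even` and the list `numbers` are assumptions made for this example.

```python
def is_even(n):
    # Predicate: returns True or False for each element
    return n % 2 == 0

numbers = [10, 12, 34, 23, 2, 6, 7]

evens = filter(is_even, numbers)  # lazy: nothing is tested yet
evens_list = list(evens)          # consuming it runs the predicate

print(evens_list)  # [10, 12, 34, 2, 6]
```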

Let's continue our example with a list of phone numbers and a function that extracts the DDD (area code):

In [73]:
lista_telefones = [
    19999571559,
    "(21) 2412-0107",
    "(34) 99762-1166",
    "91-4002-8282",
    "(19) 3542-1820",
    "(19) 3561-9525",
    "(34) 3333-5802",
]


def extrair_ddd(telefone):
    """
    Receives a phone number and returns its DDD (area code)

    telefone (str or int): Phone number whose first two numeric digits are the DDD
    """
    pattern = r"[0-9]{2}"
    return "".join(re.findall(pattern, str(telefone)))[:2]


In [74]:
list(map(extrair_ddd, lista_telefones))

['19', '21', '34', '91', '19', '19', '34']

In [75]:
lista_ddds = list(map(extrair_ddd, lista_telefones))
lista_telefone_ddd = list(zip(lista_ddds, lista_telefones))

In [80]:
lista_ddds

['19', '21', '34', '91', '19', '19', '34']

In [78]:
list(lista_telefone_ddd)

[('19', 19999571559),
 ('21', '(21) 2412-0107'),
 ('34', '(34) 99762-1166'),
 ('91', '91-4002-8282'),
 ('19', '(19) 3542-1820'),
 ('19', '(19) 3561-9525'),
 ('34', '(34) 3333-5802')]

In [79]:
lista_ddds = list(map(extrair_ddd, lista_telefones))
lista_telefone_ddd = zip(lista_ddds, lista_telefones)

lista_telefones_19 = []

for telefone_tuple in lista_telefone_ddd:
  if telefone_tuple[0] == '19':
    lista_telefones_19.append(telefone_tuple[1])

print(lista_telefones_19)

[19999571559, '(19) 3542-1820', '(19) 3561-9525']


In [83]:
1 == 'a'

False

In [89]:
filter(lambda x: extrair_ddd(x) == "19", lista_telefones)

<filter at 0x7fcd73245810>

In [90]:
map_19 = filter(lambda x: extrair_ddd(x) == "19", lista_telefones)
for i in map_19:
    print(i)


19999571559
(19) 3542-1820
(19) 3561-9525


In [91]:
lista_ddd_19 = list(
    filter(lambda x: extrair_ddd(x) == "19", lista_telefones)
)
print(lista_ddd_19)


[19999571559, '(19) 3542-1820', '(19) 3561-9525']


Both `map()` and `filter()` are similar to list comprehensions. The main difference is that they are evaluated *lazily*, while a list comprehension builds the whole list immediately!

In [93]:
[extrair_ddd(telefone) for telefone in lista_telefones]

['19', '21', '34', '91', '19', '19', '34']

In [92]:
[telefone for telefone in lista_telefones if extrair_ddd(telefone) == "19"]

[19999571559, '(19) 3542-1820', '(19) 3561-9525']

In [96]:
list(map(extrair_ddd , filter(lambda x: extrair_ddd(x) == '19', lista_telefones)))

['19', '19', '19']

In [97]:
[extrair_ddd(telefone) for telefone in lista_telefones if extrair_ddd(telefone) == "19"]

['19', '19', '19']
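
If the comprehension syntax is preferred but laziness is still wanted, a generator expression (round brackets instead of square ones) is one option. A sketch, reusing `extrair_ddd` and `lista_telefones` from above; the names `telefones_19`, `first_19` and `remaining` are hypothetical.

```python
import re

def extrair_ddd(telefone):
    pattern = r"[0-9]{2}"
    return "".join(re.findall(pattern, str(telefone)))[:2]

lista_telefones = [
    19999571559,
    "(21) 2412-0107",
    "(34) 99762-1166",
    "91-4002-8282",
    "(19) 3542-1820",
    "(19) 3561-9525",
    "(34) 3333-5802",
]

# Generator expression: same syntax as a comprehension, but lazy
telefones_19 = (t for t in lista_telefones if extrair_ddd(t) == "19")

first_19 = next(telefones_19)   # computes only until the first match
remaining = list(telefones_19)  # consumes the rest

print(first_19, remaining)
```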

## Aggregating iterables with `reduce()`

The function `reduce()` implements an `accumulator`. Let's see how this works with the simple function `sum_two_elements(a, b)`:

```python
def sum_two_elements(a,b):
    return a+b
```

Now, let's use `reduce()` to *reduce* our list by summing its elements:

```python
reduce( sum_two_elements, [1,4,6,8] )
```

```python
# Without an initial value, the first element seeds the accumulator
a = 1  # accumulator (first element of the list)
b = 4  # value
a + b  # 5, so the accumulator receives this cumulative sum

a = 5  # accumulator
b = 6  # value
a + b  # 11

a = 11  # accumulator
b = 8  # value
a + b  # 19, the final value returned by reduce()
```
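
The accumulation can also be written as a plain loop. The sketch below is a simplified stand-in, not the actual `functools` source: `my_reduce` and `_MISSING` are hypothetical names, and it follows the documented behavior that the first element seeds the accumulator when no initial value is given.

```python
from functools import reduce

_MISSING = object()  # sentinel meaning "no initial value was passed"

def my_reduce(func, iterable, initial=_MISSING):
    it = iter(iterable)
    if initial is _MISSING:
        try:
            acc = next(it)  # the first element seeds the accumulator
        except StopIteration:
            raise TypeError("my_reduce() of empty iterable with no initial value") from None
    else:
        acc = initial
    for value in it:
        acc = func(acc, value)  # fold each remaining value into the accumulator
    return acc

def sum_two_elements(a, b):
    return a + b

print(my_reduce(sum_two_elements, [1, 4, 6, 8]))  # 19
```

Passing an initial value, as in `my_reduce(sum_two_elements, [], 0)`, avoids the error on empty input.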

In [98]:
from functools import reduce

### Example 1: Numbers

In [99]:
def somar_ab(a, b):
    print(f"a={a}, b={b}")
    return a + b


In [100]:
lista_numeros = [1, 4, 6, 8]
reduce(somar_ab, lista_numeros)


a=1, b=4
a=5, b=6
a=11, b=8


19

In [102]:
def comp_ab(x, y):
    if x > y:
        return x
    else:
        return y


results = reduce(comp_ab, [2, 10, 25, 1, -10, 13, 40, 20])
print(results)

40


### Example 2: Strings

In [109]:
lista_letras = ["P", "e", "d", "r", "o"]


In [110]:
reduce(lambda x, y: x + y, lista_letras)


'Pedro'

Let's use reduce to select the longest string in a list:

In [105]:
lista_nomes = ["Amapá", "Roraima", "Pará", "Piauí", "Maranhão"]

In [106]:
reduce(lambda x, y: x if len(x) > len(y) else y, lista_nomes)


'Maranhão'
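
For comparison, the built-in `max()` with `key=len` gives the same answer for this list. The sketch below also shows one difference worth knowing: with ties, the `reduce` lambda above keeps the last of the longest strings, while `max()` keeps the first. The list `tied` is a made-up example.

```python
from functools import reduce

lista_nomes = ["Amapá", "Roraima", "Pará", "Piauí", "Maranhão"]

longest_reduce = reduce(lambda x, y: x if len(x) > len(y) else y, lista_nomes)
longest_max = max(lista_nomes, key=len)
print(longest_reduce, longest_max)  # Maranhão Maranhão

# Tie-breaking differs between the two approaches
tied = ["ab", "cd"]
tie_reduce = reduce(lambda x, y: x if len(x) > len(y) else y, tied)
tie_max = max(tied, key=len)
print(tie_reduce, tie_max)  # cd ab
```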

In [None]:
!pip install langdetect

In [112]:
import langdetect
from functools import reduce

In [170]:
words = ['good morning', '早上好', 'доброго', 'おはようございます', 'everyone', '大家', 'каждый', 'みんな']

In [173]:
reduce(lambda x,y: x + ' ' + y , filter(lambda x: langdetect.detect(x)=='en', words))

'everyone'

In [150]:
list_eval = []
for i in range(1000):
  list_eval.append(langdetect.detect('morning'))

In [151]:
list_eval[0:10]

['en', 'en', 'en', 'en', 'en', 'en', 'en', 'en', 'en', 'en']

In [None]:
map(lambda _: langdetect.detect('morning'), range(1000))

In [152]:
len(list(filter(lambda x: x == 'en', list_eval)))/len(list_eval)

1.0


### Example 3: Chaining Map, Filter & Reduce

In [174]:
list_tuples = [(12, 119), (-12, 43), (28, 39), (12, 21), (-14, 43)]

In [177]:
map_prod = map(lambda x: x[0] * x[1], list_tuples)
filt_neg = filter(lambda x: x > 0, map_prod)
list(filt_neg)

[1428, 1092, 252]

In [178]:
map_prod = map(lambda x: x[0] * x[1], list_tuples)
filt_neg = filter(lambda x: x > 0, map_prod)
smallest = reduce(lambda x, y: x if x < y else y, filt_neg)

print(smallest)

252
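
The three steps can also be nested into a single expression, since each function accepts the lazy result of the previous one. This is a sketch: the variable names are assumptions, and the last line shows a possible alternative using a generator expression with the built-in `min()`, whose `default` argument handles the case where no product is positive.

```python
from functools import reduce

list_tuples = [(12, 119), (-12, 43), (28, 39), (12, 21), (-14, 43)]

# map -> filter -> reduce in one expression (read from the inside out)
smallest_chain = reduce(
    min,
    filter(lambda p: p > 0, map(lambda t: t[0] * t[1], list_tuples)),
)

# Alternative: generator expression plus min(); reduce() without an initial
# value would raise TypeError on empty input, min(default=...) does not
smallest_gen = min((a * b for a, b in list_tuples if a * b > 0), default=None)

print(smallest_chain, smallest_gen)  # 252 252
```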

