# Intro to map/reduce

## Lambda function

Python lets you create anonymous functions, called lambdas.
A lambda function has no name and no return statement: its body is a single expression, and the value of that expression is returned automatically.

In [1]:
# Standard function
def power(base, exponent):
    return base**exponent

print(power(2, 3))

# Lambda construction
p = lambda b, e: b ** e

print(p(2, 3))

8
8
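You can check the "no name" claim through the __name__ attribute. Assigning a lambda to p creates a variable that refers to the function, but the function itself is still called '<lambda>'. This is a small illustrative sketch; only power and p come from the cell above.

```python
def power(base, exponent):
    return base ** exponent

p = lambda b, e: b ** e

# def stores the function's own name; a lambda only gets the placeholder '<lambda>'
print(power.__name__)  # power
print(p.__name__)      # <lambda>

# A lambda is an expression, so it can be called (or passed to another function)
# right where it is written, without assigning it to a variable first
print((lambda b, e: b ** e)(2, 3))  # 8
```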



## filter function

In [2]:
# We use a list comprehension to create a list of 30 numbers (0 to 29)
numbers = [x for x in range(30)]
print(numbers)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]


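The filter function does the same job as a for loop with an if statement inside, but it is a built-in function: the code is shorter, and in CPython the looping itself runs in C (often faster, although calling a Python function for every element still adds overhead).
In Python 3, filter always returns an iterator, not a list!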
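A minimal sketch of that equivalence: the same selection written as an explicit loop, as a list comprehension, and with filter. It also shows that filter hands back a lazy iterator whose elements are produced only when requested. The variable names are illustrative.

```python
numbers = list(range(30))

# 1. An explicit for loop with an if statement builds the list step by step
loop_result = []
for n in numbers:
    if n % 5 == 0:
        loop_result.append(n)

# 2. The equivalent list comprehension
comprehension_result = [n for n in numbers if n % 5 == 0]

# 3. filter returns a lazy iterator; elements are produced only on request
filtered = filter(lambda n: n % 5 == 0, numbers)
print(type(filtered))        # <class 'filter'>
first = next(filtered)       # pulls just the first matching element
filter_result = [first] + list(filtered)  # collect the remaining elements

print(loop_result == comprehension_result == filter_result)  # True
```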

In [5]:
# Simple function that returns True when the number is divisible by 5
def is_divided_by_5(number):
    return number % 5 == 0

numbers_div5_iterator = filter(is_divided_by_5, numbers)

# filter returns an iterator, so we use a for loop to get all the elements
for n in numbers_div5_iterator:
    print(n)

0
5
10
15
20
25


In [6]:
# The same code written with a lambda
numbers_div5_iterator = filter(lambda x: x % 5 == 0, numbers)

for n in numbers_div5_iterator:
    print(n)

0
5
10
15
20
25


In [7]:
numbers_div5_iterator = filter(lambda x: x % 5 == 0, numbers)

# The simplest way to create a list from an iterator
numbers_div5_list = list(numbers_div5_iterator)
print(numbers_div5_list)

# HINT: an iterator can be used only once.
#       Python's iterator protocol is very simple: it provides only __iter__()
#       and __next__() (.next() in Python 2), and no general way to reset an iterator.
#       The first list() call consumes every element, so the second one gets nothing.

numbers_div5_list = list(numbers_div5_iterator)
print(numbers_div5_list)

[0, 5, 10, 15, 20, 25]
[]
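The protocol described in the hint can be driven by hand. This sketch calls next() (which invokes __next__()) until the iterator is exhausted and StopIteration is raised; after that, the only way to iterate again is to build a fresh iterator.

```python
it = filter(lambda x: x % 5 == 0, range(12))

# iter() on an iterator returns the iterator itself
print(iter(it) is it)   # True

# next() calls __next__() under the hood
print(next(it))         # 0
print(it.__next__())    # 5
print(next(it))         # 10

# Once exhausted, __next__() raises StopIteration...
try:
    next(it)
except StopIteration:
    print('exhausted')

# ...so to iterate again, create a new iterator (or keep the data in a list)
print(list(filter(lambda x: x % 5 == 0, range(12))))  # [0, 5, 10]
```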


In [9]:
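# filter works with lists of other types too, e.g. strings
# Note: startswith() is case-sensitive and no name below starts with 'B', so the result is empty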

names = ['Anne', 'Amy', 'Cob', 'David', 'Carrie', 'Darbara']
names_start_with_b = list(filter(lambda s: s.startswith('B'), names))
print(names_start_with_b)

[]
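filter also accepts None in place of a function; it then keeps only the truthy elements (dropping '', None, 0, and so on). This is the form used later in the notebook to remove the empty string that split('.') leaves after the final period. A short sketch with made-up values:

```python
# With None instead of a function, filter keeps only truthy elements
values = ['Anne', '', 'Amy', None, 0, 'Cob']
kept = list(filter(None, values))
print(kept)       # ['Anne', 'Amy', 'Cob']

# Splitting on '.' leaves an empty string after the final period
parts = 'One. Two.'.split('.')
print(parts)      # ['One', ' Two', '']
non_empty = list(filter(None, parts))
print(non_empty)  # ['One', ' Two']
```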


## map function

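Blueprint:   map(function_to_apply, list_of_inputs)

map applies the function to every element of the input list and, like filter, returns an iterator. With several input lists, the function receives one element from each list at a time.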

In [10]:
list_1 = [x for x in range(1,6)]
list_2 = [x for x in range(6,11)]

print('List1: ', list_1)
print('List2: ', list_2)
list_result = list(map(lambda x, y: x + y, list_1, list_2))
print('List result: ', list_result)

List1:  [1, 2, 3, 4, 5]
List2:  [6, 7, 8, 9, 10]
List result:  [7, 9, 11, 13, 15]
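A few more map behaviours worth knowing, shown as a small sketch: map is lazy like filter, it stops at the shortest input when given several lists of different lengths, and built-in functions such as str can be passed directly without wrapping them in a lambda.

```python
list_1 = [1, 2, 3, 4, 5]

# map with a single iterable: apply the function to each element
squares = map(lambda x: x ** 2, list_1)
print(squares)                   # <map object at 0x...> (a lazy iterator)
squares_list = list(squares)
print(squares_list)              # [1, 4, 9, 16, 25]

# With several iterables, map stops at the shortest one
sums = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20]))
print(sums)                      # [11, 22]

# Built-in functions can be passed directly, no lambda needed
as_text = list(map(str, list_1))
print(as_text)                   # ['1', '2', '3', '4', '5']
```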


## reduce function

Blueprint:   reduce(function_to_apply, list_of_inputs)
             list_of_inputs = [el_1, el_2, el_3]

The reduce function (imported from functools in Python 3) applies a two-argument function cumulatively to the list elements, reducing the whole list to a single value.
1. First, the function is applied to the first two elements of the list: function(el_1, el_2)
2. Next, the function is applied to the previous result and the third element of the list: function(function(el_1, el_2), el_3), and so on until the list is exhausted.
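The two steps above can be written as a plain loop. my_reduce is a hypothetical, simplified stand-in for functools.reduce (no initializer argument), and add_and_trace is a hypothetical helper that prints each call so the order of the steps is visible.

```python
def my_reduce(function, items):
    """Hypothetical, simplified version of functools.reduce without an initializer."""
    iterator = iter(items)
    try:
        result = next(iterator)              # start from el_1
    except StopIteration:
        raise TypeError('my_reduce() of empty sequence') from None
    for element in iterator:
        result = function(result, element)   # function(previous_result, next_element)
    return result

def add_and_trace(x, y):
    print(f'add({x}, {y}) -> {x + y}')
    return x + y

print(my_reduce(add_and_trace, [1, 2, 3, 4]))
# add(1, 2) -> 3
# add(3, 3) -> 6
# add(6, 4) -> 10
# 10
```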

In [11]:
from functools import reduce

def add(x,y):
    return x + y

list_1 = [x for x in range(1,6)]

print('List1: ', list_1)
print('List1 reduced: ', reduce(add, list_1))

# The same example using lambda func
print('List1 reduced: ', reduce(lambda x,y: x+y, list_1))

List1:  [1, 2, 3, 4, 5]
List1 reduced:  15
List1 reduced:  15
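functools.reduce also takes an optional third argument, the initializer, which is used as the first "previous result". Without it, reducing an empty list raises TypeError; with it, the initializer itself is returned. A short sketch:

```python
from functools import reduce

# The initializer (100) is combined with el_1 first: ((((100 + 1) + 2) + 3) + 4) + 5
with_start = reduce(lambda x, y: x + y, [1, 2, 3, 4, 5], 100)
print(with_start)   # 115

# Without an initializer, reducing an empty list raises TypeError...
try:
    reduce(lambda x, y: x + y, [])
except TypeError as error:
    print('TypeError:', error)

# ...while with an initializer the initializer itself is returned
empty_total = reduce(lambda x, y: x + y, [], 0)
print(empty_total)  # 0
```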


## TODO

In [13]:
xx = ['Snappy', 'Kitty', 'Jessie', 'Chester']
#xx = [1,2,3]

# TODO Create a list with the number of characters of each word. Use the map & len functions
no_of_char = list(map(len, xx))
print(no_of_char)


[6, 5, 6, 7]


In [53]:
sentences = "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec odio. \
Quisque volutpat mattis eros. Nullam malesuada erat ut turpis. Suspendisse urna nibh, \
viverra non, semper suscipit, posuere a, pede. \
Donec nec justo eget felis facilisis fermentum. Aliquam porttitor mauris sit amet orci. \
Aenean dignissim pellentesque felis."

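# Split the text into sentences; filter(None, ...) drops the empty string left after the final '.'
sentence_list = list(filter(None, sentences.split('.')))
print(sentence_list)
# Strip surrounding whitespace and remove commas
sentence_list = list(map(lambda x: x.strip().replace(',', ''), sentence_list))
print(sentence_list)
# Count the words in each sentence
words_per_sentence = list(map(lambda x: len(x.split()), sentence_list))
print(words_per_sentence)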

# TODO Find the number of words in the sentence:

## Hint:
# 1. remove punctuation
# 2. split the resulting sentence
# 3. map "1" to each word of the sentence
# 4. reduce to find the number of words in the sentence

['Lorem ipsum dolor sit amet, consectetuer adipiscing elit', ' Donec odio', ' Quisque volutpat mattis eros', ' Nullam malesuada erat ut turpis', ' Suspendisse urna nibh, viverra non, semper suscipit, posuere a, pede', ' Donec nec justo eget felis facilisis fermentum', ' Aliquam porttitor mauris sit amet orci', ' Aenean dignissim pellentesque felis']
['Lorem ipsum dolor sit amet consectetuer adipiscing elit', 'Donec odio', 'Quisque volutpat mattis eros', 'Nullam malesuada erat ut turpis', 'Suspendisse urna nibh viverra non semper suscipit posuere a pede', 'Donec nec justo eget felis facilisis fermentum', 'Aliquam porttitor mauris sit amet orci', 'Aenean dignissim pellentesque felis']
[8, 2, 4, 5, 10, 7, 6, 4]
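The cell above counts words with len(); the four hint steps describe a map/reduce version instead. Here is one possible sketch of those steps. count_words is a hypothetical name, and removing punctuation with string.punctuation (ASCII punctuation only) is an assumption about what "remove punctuation" should mean.

```python
import string
from functools import reduce

def count_words(sentence):
    """Count words following the four hint steps (a sketch; the name is hypothetical)."""
    # 1. remove punctuation (string.punctuation holds the ASCII punctuation characters)
    cleaned = sentence.translate(str.maketrans('', '', string.punctuation))
    # 2. split the cleaned sentence into words
    words = cleaned.split()
    # 3. map "1" to each word
    ones = map(lambda word: 1, words)
    # 4. reduce the ones to a single number; the initializer 0 covers an empty sentence
    return reduce(lambda x, y: x + y, ones, 0)

sentence = 'Suspendisse urna nibh, viverra non, semper suscipit, posuere a, pede.'
print(count_words(sentence))  # 10
```

The result matches the fifth value printed by the len() approach above.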


In [64]:
# Log:  Date product no_of_items price

log_1 = """Apr-04 cola 1 5
Dec-15 cola 2 4
Feb-02 Sandwith 3 22
Mar-03 burger 8 11
Feb-22 Sandwith 3 22
Feb-23 burger 5 15
Mar-08 burger 2 14"""    ## Add more examples

print(log_1)

# TODO Find the best-selling item
# TODO Create sales summary  [(product, total_items, average_price), (product, total_items, average_price) ...] 
log_2 = log_1.split('\n')
print(log_2)

# Split a log line into words and drop the first one (the date)
def split_remove_first(line):
    return line.split()[1:]

# map (not filter) applies the function to every line; filter would only use the
# returned list as a True/False test and keep the original strings unchanged
log_3 = list(map(split_remove_first, log_2))
print(log_3)

Apr-04 cola 1 5
Dec-15 cola 2 4
Feb-02 Sandwith 3 22
Mar-03 burger 8 11
Feb-22 Sandwith 3 22
Feb-23 burger 5 15
Mar-08 burger 2 14
['Apr-04 cola 1 5', 'Dec-15 cola 2 4', 'Feb-02 Sandwith 3 22', 'Mar-03 burger 8 11', 'Feb-22 Sandwith 3 22', 'Feb-23 burger 5 15', 'Mar-08 burger 2 14']
[['cola', '1', '5'], ['cola', '2', '4'], ['Sandwith', '3', '22'], ['burger', '8', '11'], ['Sandwith', '3', '22'], ['burger', '5', '15'], ['burger', '2', '14']]
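One possible sketch for the two TODOs, built from map and reduce. It makes two assumptions the log does not settle: "best-selling" means the most items sold, and average_price is the plain mean of the price column over that product's log lines (not weighted by no_of_items). parse and accumulate are hypothetical helper names.

```python
from functools import reduce

log_1 = """Apr-04 cola 1 5
Dec-15 cola 2 4
Feb-02 Sandwith 3 22
Mar-03 burger 8 11
Feb-22 Sandwith 3 22
Feb-23 burger 5 15
Mar-08 burger 2 14"""

def parse(line):
    """Turn 'Date product no_of_items price' into (product, items, price)."""
    _date, product, items, price = line.split()
    return product, int(items), int(price)

def accumulate(totals, record):
    """Fold one record into {product: (total_items, price_sum, entries)}."""
    product, items, price = record
    total_items, price_sum, entries = totals.get(product, (0, 0, 0))
    totals[product] = (total_items + items, price_sum + price, entries + 1)
    return totals

records = list(map(parse, log_1.splitlines()))
totals = reduce(accumulate, records, {})

# Sales summary: [(product, total_items, average_price), ...]
summary = list(map(lambda kv: (kv[0], kv[1][0], kv[1][1] / kv[1][2]), totals.items()))
print(summary)

# Best-selling item = product with the most items sold (assumption, see above)
best_selling = max(summary, key=lambda row: row[1])[0]
print(best_selling)  # burger
```

Passing {} as the initializer gives reduce an empty dictionary to fill; accumulate mutates and returns it, which keeps the sketch short at the cost of not being purely functional.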


## Miniproject

1. Download a book, clean the text, and count the total number of words:
https://www.gutenberg.org/files/11/11-0.txt

2. Run your script on a text that includes all of the Top 100 books from https://www.gutenberg.org/browse/scores/top

3. What problems could arise during processing? Write a script that measures the execution (processing) time.
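A possible starting point for the miniproject, offered as a sketch rather than a reference solution. It assumes the book is UTF-8 plain text, treats a "word" as a run of ASCII letters or apostrophes (so digits and accented letters are ignored), and times the counting with time.perf_counter(). download_text, count_words and timed are hypothetical names; download_text needs network access and is not called here. Project Gutenberg files also carry a license header and footer that you will probably want to strip before counting.

```python
import re
import time
import urllib.request
from functools import reduce

BOOK_URL = 'https://www.gutenberg.org/files/11/11-0.txt'

def download_text(url):
    """Fetch a plain-text book; assumes UTF-8 encoding (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8')

def count_words(text):
    """Count words with map/reduce; a word is assumed to be a run of letters/apostrophes."""
    words = re.findall(r"[A-Za-z']+", text)
    return reduce(lambda total, one: total + one, map(lambda word: 1, words), 0)

def timed(function, *args):
    """Return (result, elapsed seconds) measured with time.perf_counter()."""
    start = time.perf_counter()
    result = function(*args)
    return result, time.perf_counter() - start

# Offline demo on a short sample; for the real task use timed(count_words, download_text(BOOK_URL))
sample = "The quick brown fox jumps over the lazy dog. It's fine!"
count, seconds = timed(count_words, sample)
print(count, f'{seconds:.6f} s')  # 11 followed by the elapsed time
```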
