<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#MAP" data-toc-modified-id="MAP-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>MAP</a></span></li><li><span><a href="#Filter" data-toc-modified-id="Filter-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Filter</a></span></li><li><span><a href="#Reduce" data-toc-modified-id="Reduce-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Reduce</a></span></li><li><span><a href="#Lambda-Functions" data-toc-modified-id="Lambda-Functions-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Lambda Functions</a></span><ul class="toc-item"><li><span><a href="#Create-a-function-that-creates-functions" data-toc-modified-id="Create-a-function-that-creates-functions-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Create a function that creates functions</a></span></li><li><span><a href="#Another-case-for-lambdas" data-toc-modified-id="Another-case-for-lambdas-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Another case for lambdas</a></span></li><li><span><a href="#One-more-example" data-toc-modified-id="One-more-example-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>One more example</a></span></li></ul></li><li><span><a href="#Map,-Apply,-Applymap-+-lambdas" data-toc-modified-id="Map,-Apply,-Applymap-+-lambdas-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Map, Apply, Applymap + lambdas</a></span><ul class="toc-item"><li><span><a href="#Select-Action-movies" data-toc-modified-id="Select-Action-movies-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Select Action movies</a></span></li><li><span><a href="#I-dont-want-to-see-movies-that-are-action-fantasy-movies.-All-I-want-to-see-is-a-list-of-movies-that-are-just-action" data-toc-modified-id="I-dont-want-to-see-movies-that-are-action-fantasy-movies.-All-I-want-to-see-is-a-list-of-movies-that-are-just-action-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>I dont want to see movies that are action fantasy movies. 
All I want to see is a list of movies that are just action</a></span></li><li><span><a href="#Ok,-the-list-is-pretty-short.-Add-some-action-thrillers-and-action-adventure-movies" data-toc-modified-id="Ok,-the-list-is-pretty-short.-Add-some-action-thrillers-and-action-adventure-movies-5.3"><span class="toc-item-num">5.3&nbsp;&nbsp;</span>Ok, the list is pretty short. Add some action thrillers and action adventure movies</a></span></li><li><span><a href="#Calculate-the-average-number-of-words-in-description-and-name" data-toc-modified-id="Calculate-the-average-number-of-words-in-description-and-name-5.4"><span class="toc-item-num">5.4&nbsp;&nbsp;</span>Calculate the average number of words in description and name</a></span></li></ul></li></ul></div>

# MAP

The `map()` function applies a function to every element of an iterable. It takes a function and an iterable as arguments and returns an iterator that yields the function's result for each element.

For example, let's create a function that checks evenness of a number.

In [10]:
def func(x):
    return x%2==0

In [11]:
l=[10,11,13,15,18]

In [12]:
[func(i) for i in l]

[True, False, False, False, True]

Instead of writing the loop ourselves, we can map the function onto the list.

In [5]:
map(func, l)

<map at 0x244abc07080>

`map()` returns a `map` object, which is a lazy iterator. Therefore, we can convert it to any collection.

In [13]:
list(map(func, l))

[True, False, False, False, True]
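The lazy behaviour of the `map` object can be seen in a small sketch. The names `is_even`, `numbers`, and `mapped` are illustrative and not part of the notebook; `is_even` simply mirrors `func` from the cells above. The point is that a `map` object computes values on demand and can be consumed only once.

```python
# Illustrative names; `is_even` mirrors `func` from the cells above.
def is_even(x):
    return x % 2 == 0

numbers = [10, 11, 13, 15, 18]
mapped = map(is_even, numbers)   # nothing is computed yet

first = next(mapped)             # computes only the first element
rest = list(mapped)              # consumes the remaining elements
again = list(mapped)             # the iterator is now exhausted

print(first)   # True
print(rest)    # [False, False, False, True]
print(again)   # []
```

If the results are needed more than once, store them in a list first.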

Similarly, pandas provides its own `map()` method.

In the pandas version used here, `map()` is a method of the `Series` class, so it **can't be** applied to a `DataFrame`, only to its columns. (Since pandas 2.1 there is also an element-wise `DataFrame.map()`.)

In [14]:
import numpy as np
import pandas as pd

In [15]:
df=pd.DataFrame([[1,2,3],[2,3,np.nan]])

In [16]:
df

   0  1    2
0  1  2  3.0
1  2  3  NaN


In [20]:
#function takes a number and converts it into a float
def ff(x):
    return float(x)

In [21]:
df[0].map(ff)

0    1.0
1    2.0
Name: 0, dtype: float64

Another benefit of the `map()` method is that it accepts a dictionary. For every value in the Series, `map()` looks up the matching key in the dictionary and returns the corresponding value.

Example:

In [22]:
dct={1:'apple',2:'plum'}

df[0].map(dct)

0    apple
1     plum
Name: 0, dtype: object

If a value in the Series has no matching key in the dictionary, `map()` returns `NaN` for it.

In [23]:
df[1].map(dct)

0    plum
1     NaN
Name: 1, dtype: object
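When `NaN` is not a useful result for unmatched values, a default can be supplied. The sketch below shows two possible ways, assuming a placeholder label `'unknown'` and illustrative names (`s`, `fruit`) that are not part of the notebook.

```python
import pandas as pd

s = pd.Series([2, 3])                 # same values as column 1 above
fruit = {1: 'apple', 2: 'plum'}

# Option 1: map the dictionary, then fill the gaps
filled = s.map(fruit).fillna('unknown')

# Option 2: map a function that supplies the default itself
looked_up = s.map(lambda v: fruit.get(v, 'unknown'))

print(filled.tolist())      # ['plum', 'unknown']
print(looked_up.tolist())   # ['plum', 'unknown']
```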

# Filter

The `filter()` function applies a function to every element of an iterable and returns only the elements for which the function returned `True`.

Using the function `func` defined earlier, we can obtain the following:

In [24]:
#map the function
list(map(func, l))

[True, False, False, False, True]

In [25]:
#use filter instead
list(filter(func, l))

[10, 18]

As you can see, only even numbers were returned.
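For comparison, here is a minimal sketch showing that `filter()` gives the same result as a list comprehension with a condition. The names `is_even` and `numbers` are illustrative stand-ins for `func` and `l`.

```python
numbers = [10, 11, 13, 15, 18]

def is_even(x):
    return x % 2 == 0

with_filter = list(filter(is_even, numbers))
with_comprehension = [n for n in numbers if is_even(n)]

# Passing None as the function keeps the truthy elements
truthy = list(filter(None, [0, 1, '', 'a', None]))

print(with_filter)         # [10, 18]
print(with_comprehension)  # [10, 18]
print(truthy)              # [1, 'a']
```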

Pandas also has a `filter()` method. Note that it filters rows and columns by their labels, not by their values. I wouldn't recommend it, since we have boolean filters and `loc/iloc`, but here are some examples.

In [29]:
df.index=['mouse','cat']

In [30]:
df

       0  1    2
mouse  1  2  3.0
cat    2  3  NaN


In [31]:
#axis 0 defines index. 
df.filter(like='ouse',axis=0)

       0  1    2
mouse  1  2  3.0


In [35]:
#same with loc + str.contains
df.loc[df.index.str.contains('ouse')]

       0  1    2
mouse  1  2  3.0


In [32]:
df.filter(like='a',axis=0)

     0  1    2
cat  2  3  NaN


In [33]:
#axis 1 stands for columns, items selects by exact label
df.filter(items=[0,2],axis=1)

       0    2
mouse  1  3.0
cat    2  NaN


In [34]:
#same with loc
df.loc[:,[0,2]]

       0    2
mouse  1  3.0
cat    2  NaN


# Reduce

`reduce()` is a handy tool for collapsing an iterable into a single value, although it doesn't have many applications in the data world.

<img src='https://www.python-course.eu/images/reduce_diagram.png'>

The image above illustrates the concept. `reduce()` applies a two-argument function cumulatively to the elements of an iterable, from left to right, and returns a single value: the cumulative output.

In [36]:
#adds 2 numbers
def func(a,b):
    return a+b

In [37]:
from functools import reduce

In [38]:
#sum of all numbers
reduce(func,l)

67

In [39]:
sum(l)

67

In [40]:
#select the biggest among 2 numbers

def func(a,b):
    return a if a>b else b

In [41]:
#test
func(21,16)

21

In [42]:
#max number among l
reduce(func,l)

18

In [43]:
max(l)

18

In [56]:
#a bit pointless function, but why not

def func(a,b):
    if (b%2==0) or ((a+b)>25): #if second number is even or sum of 2 numbers is greater than 25 -> first
        return a
    else:
        return b #->else second

In [57]:
l

[10, 11, 13, 15, 18]

In [58]:
reduce(func,l)

13
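To see why the result is 13, the intermediate values can be traced with `itertools.accumulate`, which applies the same two-argument function but keeps every step. This is only a sketch; `pick` and `numbers` are illustrative names that repeat `func` and `l` from above.

```python
from functools import reduce
from itertools import accumulate

def pick(a, b):
    # same rule as `func` above
    if (b % 2 == 0) or ((a + b) > 25):
        return a
    return b

numbers = [10, 11, 13, 15, 18]

# accumulate yields the running result after each element
steps = list(accumulate(numbers, pick))

print(steps)                  # [10, 11, 13, 13, 13]
print(reduce(pick, numbers))  # 13
```

The last running value is exactly what `reduce()` returns.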

# Lambda Functions

Lambda functions are so-called anonymous functions: they are defined without a name. They are typically short. They can take multiple arguments but, unlike regular functions, their body is limited to a single expression.

Any lambda expression can also be written as a regular function (but not vice versa). The syntax is:
```python
lambda arguments: expression
```

In [59]:
# regular function 
def squared(x):
    return x**2

In [60]:
#apply function to each element
list(map(squared,l))

[100, 121, 169, 225, 324]

In [61]:
#define a lambda function with a name
squared=lambda x: x**2

In [62]:
#apply lambda function
list(map(squared,l))

[100, 121, 169, 225, 324]

In [63]:
#apply lambda function without saving it
list(map(lambda x: x**2,l))

[100, 121, 169, 225, 324]

In [64]:
#apply lambda function to a number
(lambda x: x**2)(11)

121

Here we define a lambda function and wrap it in parentheses so that it can be called directly. Then we call it, passing $11$ as the argument.

In [65]:
#which could be done easier here:
squared(11)

121

In [67]:
# define a regular division function
def div(x,y):
    if y!=0:
        return x/y
    else:
        raise ValueError ('Do not divide by 0 you stupid idiot!')

In [69]:
#test 1
div(4,2)

2.0

In [70]:
#test 2
div(4,0)

ValueError: Do not divide by 0 you stupid idiot!

In [77]:
#define a similar function using a conditional expression (returns 0 instead of raising)
def div2(x,y):
    return x/y if y!=0 else 0

In [78]:
#test 1
div2(4,2)

2.0

In [79]:
#test2
div2(4,0)

0

In [80]:
#lambda function
div3=lambda x,y: x/y if y!=0 else 0

In [81]:
div3(4,2)

2.0

In [82]:
div3(4,0)

0

## Create a function that creates functions

This is a contrived task, but in rare cases it can be handy.

In [83]:
def generate_range(lower):
    return lambda upper: range(lower,upper)

In the cell above we define a function that returns a lambda function. The outer function takes a single argument, `lower`; the second argument, `upper`, is passed to the returned lambda.

In [84]:
#func1 is a lambda function that will generate ranges starting from 1
func1=generate_range(1)

In [88]:
#range from 1 to 100
func1(100)

range(1, 100)

In [86]:
#func11 is a lambda function that will generate ranges starting from 11
func11=generate_range(11)

In [87]:
#range from 11 to 100
func11(100)

range(11, 100)
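The same idea can be written with a nested `def` instead of a lambda. The sketch below uses hypothetical names (`generate_range_def`, `make_range`) to show that the inner function remembers `lower` from the call that created it.

```python
def generate_range_def(lower):
    # the inner function remembers `lower` from the enclosing call
    def make_range(upper):
        return range(lower, upper)
    return make_range

from_one = generate_range_def(1)
from_eleven = generate_range_def(11)

print(from_one(5))            # range(1, 5)
print(list(from_eleven(14)))  # [11, 12, 13]
print(from_one.__name__)      # make_range
```

One practical difference is that the nested function carries a real name, while a lambda reports `<lambda>`, which can matter when reading tracebacks.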

## Another case for lambdas

A lambda can hold a fairly complex expression, so many transformations can be written as one-liners.

In [89]:
list_of_topics=['python-programming','mysql','web-scrapping','tableau','data-viz','statistics','machine-learning','neural-nets']

In [90]:
# list comprehension to do text transformations
[' '.join(i.split('-')).title() for i in list_of_topics]

['Python Programming',
 'Mysql',
 'Web Scrapping',
 'Tableau',
 'Data Viz',
 'Statistics',
 'Machine Learning',
 'Neural Nets']

In [91]:
#same but simpler
[i.replace('-',' ').title() for i in list_of_topics]

['Python Programming',
 'Mysql',
 'Web Scrapping',
 'Tableau',
 'Data Viz',
 'Statistics',
 'Machine Learning',
 'Neural Nets']

In [92]:
#same using map() with a lambda instead of a list comprehension
new_list_of_topics=list(map(lambda x: x.replace('-',' ').title(), list_of_topics))
new_list_of_topics

['Python Programming',
 'Mysql',
 'Web Scrapping',
 'Tableau',
 'Data Viz',
 'Statistics',
 'Machine Learning',
 'Neural Nets']

## One more example

We can sort a dictionary by value using a lambda function.

In [93]:
#dictionary with length of each topic
dct_of_topics={i:len(i) for i in new_list_of_topics}
dct_of_topics

{'Python Programming': 18,
 'Mysql': 5,
 'Web Scrapping': 13,
 'Tableau': 7,
 'Data Viz': 8,
 'Statistics': 10,
 'Machine Learning': 16,
 'Neural Nets': 11}

In [94]:
#approach we learnt previously
from operator import itemgetter
dict(sorted(dct_of_topics.items(), key=itemgetter(1)))

{'Mysql': 5,
 'Tableau': 7,
 'Data Viz': 8,
 'Statistics': 10,
 'Neural Nets': 11,
 'Web Scrapping': 13,
 'Machine Learning': 16,
 'Python Programming': 18}

In [95]:
#lambda approach
(sorted(dct_of_topics.items(),key=lambda x: x[1]))

[('Mysql', 5),
 ('Tableau', 7),
 ('Data Viz', 8),
 ('Statistics', 10),
 ('Neural Nets', 11),
 ('Web Scrapping', 13),
 ('Machine Learning', 16),
 ('Python Programming', 18)]

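Here the lambda function selects the second element of each `(key, value)` tuple (the value in this case) as the `key` for the `sorted()` function. Unlike the previous cell, the result is not wrapped in `dict()`, so we get a list of tuples.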
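As a small sketch, the two `key` functions can be compared directly, and the order can be reversed with `reverse=True`. The dictionary `lengths` is a shortened, illustrative copy of `dct_of_topics`.

```python
from operator import itemgetter

lengths = {'Mysql': 5, 'Tableau': 7, 'Data Viz': 8}

# x[1] is the value of each (key, value) pair
by_value = sorted(lengths.items(), key=lambda x: x[1])
same_result = sorted(lengths.items(), key=itemgetter(1))

# wrap in dict() to get a dictionary back, largest value first
descending = dict(sorted(lengths.items(), key=lambda x: x[1], reverse=True))

print(by_value == same_result)   # True
print(descending)                # {'Data Viz': 8, 'Tableau': 7, 'Mysql': 5}
```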

# Map, Apply, Applymap + lambdas

Pandas has a beautiful trio:
* Map - map a function/dictionary to every value of a `Series`
* Apply - apply a function to every value of a `Series` / every column (or row) of a `DataFrame`
* Applymap - apply a function to every value of a `DataFrame`

Let's see how they work.
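Before turning to the movie data, the trio can be sketched on a tiny DataFrame. The frame `toy` and all variable names are illustrative. Because `applymap()` was renamed to `DataFrame.map()` in pandas 2.1, the sketch picks whichever of the two the installed version offers.

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Series.map: one value in, one value out
doubled = toy['a'].map(lambda v: v * 2)

# DataFrame.apply: the function receives a whole column (a Series)
col_range = toy.apply(lambda col: col.max() - col.min())

# Element-wise on a DataFrame: `DataFrame.map` in pandas >= 2.1,
# `applymap` in older versions
elementwise = toy.map if hasattr(toy, 'map') else toy.applymap
as_text = elementwise(lambda v: f'<{v}>')

print(doubled.tolist())      # [2, 4, 6]
print(col_range.tolist())    # [2, 20]
print(as_text.loc[0, 'b'])   # <10>
```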

In [96]:
df=pd.read_csv('https://ironhack.school/asset-v1:IRONHACK+DAFT+201910_PAR+type@asset+block@IMDB-Movie-Data.csv')

In [97]:
df.head()

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
2,3,Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
3,4,Sing,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
4,5,Suicide Squad,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0


Let's consider some basic functions first.

In [98]:
df.Rating.max()

9.0

In [99]:
df.Rating.min()

1.9

What if I want to rescale my `Rating` column so that it goes up to 5?

In [101]:
#since the original scale goes up to 10, dividing by 2 is enough
display((df.Rating/2).head())

#or use map
df.Rating.map(lambda x: x/2).head()

0    4.05
1    3.50
2    3.65
3    3.60
4    3.10
Name: Rating, dtype: float64

0    4.05
1    3.50
2    3.65
3    3.60
4    3.10
Name: Rating, dtype: float64

Calculate the range of the `Rating` column ($max-min$).

In [102]:
#takes Series as argument
def range_numbers(x):
    return x.max()-x.min()

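Since the function above expects a whole Series as its argument, I apply it to a one-column DataFrame: `DataFrame.apply()` passes each column to the function as a Series, whereas `Series.apply()` would pass the values one by one.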

In [105]:
df[['Rating']].apply(range_numbers)

Rating    7.1
dtype: float64

The same can be obtained in 3 steps: calculate the max, calculate the min, and subtract.

In [108]:
df[['Rating']].max()-df[['Rating']].min()

Rating    7.1
dtype: float64

Let's take only numeric columns now.

In [112]:
#select_dtypes is the public way to keep only the numeric columns
df_n=df.select_dtypes(include='number').copy()

In [113]:
def half(x):
    return x/2

In [114]:
def missing_col(x):
    return x.isna().sum()

We can apply a function to every column (Series) of the DataFrame.

In [118]:
display(df_n.apply(half).head())

df_n.apply(lambda x: x/2).head()

Unnamed: 0,Rank,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,0.5,1007.0,60.5,4.05,378537.0,166.565,38.0
1,1.0,1006.0,62.0,3.5,242910.0,63.23,32.5
2,1.5,1008.0,58.5,3.65,78803.0,69.06,31.0
3,2.0,1008.0,54.0,3.6,30272.5,135.16,29.5
4,2.5,1008.0,61.5,3.1,196863.5,162.51,20.0


Unnamed: 0,Rank,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,0.5,1007.0,60.5,4.05,378537.0,166.565,38.0
1,1.0,1006.0,62.0,3.5,242910.0,63.23,32.5
2,1.5,1008.0,58.5,3.65,78803.0,69.06,31.0
3,2.0,1008.0,54.0,3.6,30272.5,135.16,29.5
4,2.5,1008.0,61.5,3.1,196863.5,162.51,20.0


Apply the function that counts the missing values in each Series.

In [119]:
df_n.apply(missing_col)

Rank                    0
Year                    0
Runtime (Minutes)       0
Rating                  0
Votes                   0
Revenue (Millions)    128
Metascore              64
dtype: int64

Built-in functions can also be applied. However, Python's built-in `sum()` does not skip missing values, so columns that contain `NaN` produce `NaN`.

In [122]:
df_n.apply(sum)

Rank                     500500.0
Year                    2012783.0
Runtime (Minutes)        113172.0
Rating                     6723.2
Votes                 169808255.0
Revenue (Millions)            NaN
Metascore                     NaN
dtype: float64
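The difference between Python's built-in `sum()` and the pandas `sum()` method can be sketched on a small frame. The frame `toy` and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'full': [1.0, 2.0, 3.0], 'gappy': [1.0, np.nan, 3.0]})

builtin_total = toy.apply(sum)   # Python's sum: NaN propagates
pandas_total = toy.sum()         # pandas' sum: skips NaN by default

print(builtin_total['gappy'])    # nan
print(pandas_total['gappy'])     # 4.0
```

So when missing values should simply be ignored, the pandas aggregation methods are the more convenient choice.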

If you want to apply a function to every row of the DataFrame, you can change the value of the `axis` parameter.

In [123]:
df_n.apply(sum, axis=1)

0      759627.23
1      488156.46
2      159949.42
3       63009.52
4      396242.22
         ...    
995          NaN
996     76319.04
997     73917.21
998          NaN
999     15573.94
Length: 1000, dtype: float64

Now let's go for more complex analysis.

## Select Action movies

In [125]:
df.loc[df.Genre.str.contains('Action')].head()

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
4,5,Suicide Squad,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0
5,6,The Great Wall,"Action,Adventure,Fantasy",European mercenaries searching for black powde...,Yimou Zhang,"Matt Damon, Tian Jing, Willem Dafoe, Andy Lau",2016,103,6.1,56036,45.13,42.0
8,9,The Lost City of Z,"Action,Adventure,Biography","A true-life drama, centering on British explor...",James Gray,"Charlie Hunnam, Robert Pattinson, Sienna Mille...",2016,141,7.1,7188,8.01,78.0
12,13,Rogue One,"Action,Adventure,Sci-Fi",The Rebel Alliance makes a risky move to steal...,Gareth Edwards,"Felicity Jones, Diego Luna, Alan Tudyk, Donnie...",2016,133,7.9,323118,532.17,65.0


## I dont want to see movies that are action fantasy movies. All I want to see is a list of movies that are just action

In [126]:
df[(df.Genre=='Action')|(df.Genre=='Thriller')].head()

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
123,124,Boyka: Undisputed IV,Action,In the fourth installment of the fighting fran...,Todor Chapkanov,"Scott Adkins, Teodora Duhovnikova, Alon Aboutb...",2016,86,7.4,10428,,
282,283,Death Proof,Thriller,Two separate sets of voluptuous women are stal...,Quentin Tarantino,"Kurt Russell, Zoë Bell, Rosario Dawson, Vaness...",2007,113,7.1,220236,,
289,290,Iris,Thriller,"Iris, young wife of a businessman, disappears ...",Jalil Lespert,"Romain Duris, Charlotte Le Bon, Jalil Lespert,...",2016,99,6.1,726,,
444,445,The Thinning,Thriller,"""The Thinning"" takes place in a post-apocalypt...",Michael J. Gallagher,"Logan Paul, Peyton List, Lia Marie Johnson,Cal...",2016,81,6.0,4531,,31.0
580,581,Kickboxer: Vengeance,Action,A kick boxer is out to avenge his brother.,John Stockwell,"Dave Bautista, Alain Moussi, Gina Carano, Jean...",2016,90,4.9,6809,131.56,37.0


## Ok, the list is pretty short. Add some action thrillers and action adventure movies

List of accepted values:
1. Action
2. Action/Thriller
3. Action/Adventure

In [134]:
# Boolean masks (the subset check still uses a small lambda)
df[(df.Genre=='Action')|
   ((df.Genre.str.split(',').map(len)==2)&
    (df.Genre.str.contains('Action'))&
    (df.Genre.str.split(',').apply(set).map(lambda x: (x-set(['Action','Thriller','Adventure']))==set())))];

In [135]:
# Lambda
df[df.Genre.str.split(',').map(set).apply(lambda x:
                                          (x=={'Action'}) or
                                          ((len(x)==2) and
                                           ('Action' in x) and
                                           ((x-set(['Action','Thriller','Adventure']))==set())))];

In [136]:
#Step by step conditions
df.Genre=='Action';                         #movie has only 1 genre, which is Action
df.Genre.str.split(',').map(len)==2;        #movie has exactly 2 genres
df.Genre.str.contains('Action');            #movie has Action among its genres
df.Genre.str.split(',').apply(set).map(lambda x: (x-set(['Action','Thriller','Adventure']))==set()); #all genres of the movie are among Action, Thriller and Adventure
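The same rule can also be written as a small named function and checked on plain strings before it is applied to the DataFrame. This is a sketch under the assumption that genres are comma-separated without spaces, as in the `Genre` column; `ALLOWED`, `is_accepted`, and `samples` are hypothetical names.

```python
ALLOWED = {'Action', 'Thriller', 'Adventure'}

def is_accepted(genre):
    genres = set(genre.split(','))
    if genres == {'Action'}:
        return True
    # exactly two genres, one of them Action, both from the allowed set
    return len(genres) == 2 and 'Action' in genres and genres <= ALLOWED

samples = ['Action', 'Action,Thriller', 'Action,Adventure',
           'Action,Adventure,Sci-Fi', 'Thriller',
           'Adventure,Thriller', 'Action,Fantasy']

print([g for g in samples if is_accepted(g)])
# ['Action', 'Action,Thriller', 'Action,Adventure']
```

With the movie data loaded, `df[df.Genre.map(is_accepted)]` would be one way to use it as a row filter.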


## Calculate the average number of words in description and name

Finally, the `applymap()` method. (In pandas 2.1 and later, `applymap()` is deprecated in favor of the equivalent `DataFrame.map()`.)

In [137]:
df_d=df[['Description','Director']]

In [138]:
df_d

Unnamed: 0,Description,Director
0,A group of intergalactic criminals are forced ...,James Gunn
1,"Following clues to the origin of mankind, a te...",Ridley Scott
2,Three girls are kidnapped by a man with a diag...,M. Night Shyamalan
3,"In a city of humanoid animals, a hustling thea...",Christophe Lourdelet
4,A secret government agency recruits some of th...,David Ayer
...,...,...
995,"A tight-knit team of rising investigators, alo...",Billy Ray
996,Three American college students studying abroa...,Eli Roth
997,Romantic sparks occur between two dance studen...,Jon M. Chu
998,A pair of friends embark on a mission to reuni...,Scot Armstrong


Workflow:

In every cell:
1. Get words
2. Count them


In [139]:
print(df_d.Description.str.split().map(len).mean())
print(df_d.Director.str.split().map(len).mean())

27.921
2.092


In [140]:
df_d.applymap(lambda x:len(x.split())).mean()

Description    27.921
Director        2.092
dtype: float64
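An alternative that avoids `applymap()` altogether is to apply the vectorized string methods column by column. The sketch below uses a two-row, made-up frame named `toy` with the same column names.

```python
import pandas as pd

toy = pd.DataFrame({
    'Description': ['A kick boxer is out to avenge his brother.', 'Two words'],
    'Director': ['John Stockwell', 'Eli Roth'],
})

# for each column: split every cell into words, then count them
per_cell = toy.apply(lambda col: col.str.split().str.len())

print(per_cell.mean().tolist())   # [5.5, 2.0]
```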