![rmotr](https://user-images.githubusercontent.com/7065401/39119486-4718e386-46ec-11e8-9fc3-5250a49ef570.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39119910-5f70eaa4-46ed-11e8-8236-b68568c39971.jpg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# `apply`, `applymap` and `map`

![separator2](https://user-images.githubusercontent.com/7065401/39119518-59fa51ce-46ec-11e8-8503-5f8136558f2b.png)

## Hands on! 

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [29]:
# Refresher: Python's built-in map
names = ['tom', 'Jane', 'Rodgeer']

In [30]:
list(map(len, names))

[3, 4, 7]

In [2]:
pd.options.display.float_format = '{:,.2f}'.format

In [3]:
players = pd.DataFrame({
    'salary': [
        33285709,
        31269231,
        34682550,
        25000000,
        17826150,
        29512900,
        28530608,
        26243760,
        18868625,
        2500000
    ],
    'season_start': [
        2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017
    ],
    'season_end': [2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018],
    'team': [
        'CLE',
        'DEN',
        'GSW',
        'GSW',
        'GSW',
        'LAC',
        'OKC',
        'OKC',
        'SAS',
        'SAS'
    ],
    'Pos': [
        'SF', 'PF', 'PG', np.nan, 'SG', 'PF', 'PG', 'SF', 'SF', 'SG'
    ],
    'Age': [32.0, 31.0, 28.0, 28.0, 26.0, np.nan, 28.0, 32.0, 25.0, 39.0]
}, index=[
    'LeBron James',
    'Paul Millsap',
    'Stephen Curry',
    'Kevin Durant',
    'Klay Thompson',
    'Blake Griffin',
    'Russell Westbrook',
    'Carmelo Anthony',
    'Kawhi Leonard',
    'Manu Ginobili'
])

In [4]:
players

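|                   | Age  | Pos | salary   | season_end | season_start | team |
|-------------------|------|-----|----------|------------|--------------|------|
| LeBron James      | 32.0 | SF  | 33285709 | 2018       | 2017         | CLE  |
| Paul Millsap      | 31.0 | PF  | 31269231 | 2018       | 2017         | DEN  |
| Stephen Curry     | 28.0 | PG  | 34682550 | 2018       | 2017         | GSW  |
| Kevin Durant      | 28.0 | NaN | 25000000 | 2018       | 2017         | GSW  |
| Klay Thompson     | 26.0 | SG  | 17826150 | 2018       | 2017         | GSW  |
| Blake Griffin     | NaN  | PF  | 29512900 | 2018       | 2017         | LAC  |
| Russell Westbrook | 28.0 | PG  | 28530608 | 2018       | 2017         | OKC  |
| Carmelo Anthony   | 32.0 | SF  | 26243760 | 2018       | 2017         | OKC  |
| Kawhi Leonard     | 25.0 | SF  | 18868625 | 2018       | 2017         | SAS  |
| Manu Ginobili     | 39.0 | SG  | 2500000  | 2018       | 2017         | SAS  |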


| Function   | Data structure | Applied to         |
|------------|----------------|--------------------|
| `map`      | `Series`       | Each value         |
| `apply`    | `Series`       | Each value         |
| `applymap` | `DataFrame`    | Each value         |
| `apply`    | `DataFrame`    | Each row or column |

![separator1](https://user-images.githubusercontent.com/7065401/39119545-6d73d9aa-46ec-11e8-98d3-40204614f000.png)

## Series

The most important Series methods are `map` and `apply`. DataFrames also have an `apply` method, which can be confusing. For now, we'll focus ONLY on `Series`.

### [`map`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html)

`map` is a **Series** method that lets you _map_ each of the Series' values to a new value, using either a dict (a one-to-one lookup) or a function:

In [5]:
players['Pos'].unique()

array(['SF', 'PF', 'PG', nan, 'SG'], dtype=object)

In [6]:
players['Pos'].map({
    'PG': 'Point Guard',
    'SG': 'Shooting Guard',
    'SF': 'Small Forward',
    'PF': 'Power Forward',
})

LeBron James          Small Forward
Paul Millsap          Power Forward
Stephen Curry           Point Guard
Kevin Durant                    NaN
Klay Thompson        Shooting Guard
Blake Griffin         Power Forward
Russell Westbrook       Point Guard
Carmelo Anthony       Small Forward
Kawhi Leonard         Small Forward
Manu Ginobili        Shooting Guard
Name: Pos, dtype: object
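A minimal sketch on a tiny, made-up Series (the `positions` and `lookup` data are illustrative, not part of the dataset above): values that are missing from the dict become `NaN`, just like Kevin Durant's position, and a `Series` can also be used as the mapping, with its index acting as the keys.

```python
import pandas as pd

# Illustrative data: 'C' is deliberately missing from the dict below
positions = pd.Series(['PG', 'C', 'SF'])

# Dict mapping: values not found among the keys become NaN
by_dict = positions.map({'PG': 'Point Guard', 'SF': 'Small Forward'})

# A Series also works as a mapping: its index plays the role of the dict keys
lookup = pd.Series({'PG': 'Point Guard', 'C': 'Center', 'SF': 'Small Forward'})
by_series = positions.map(lookup)

print(by_dict.tolist())    # ['Point Guard', nan, 'Small Forward']
print(by_series.tolist())  # ['Point Guard', 'Center', 'Small Forward']
```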

In [7]:
players['Pos'].map('Position: {}'.format)

LeBron James          Position: SF
Paul Millsap          Position: PF
Stephen Curry         Position: PG
Kevin Durant         Position: nan
Klay Thompson         Position: SG
Blake Griffin         Position: PF
Russell Westbrook     Position: PG
Carmelo Anthony       Position: SF
Kawhi Leonard         Position: SF
Manu Ginobili         Position: SG
Name: Pos, dtype: object

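It takes an optional `na_action` parameter that specifies what to do with `NaN` values. With `na_action='ignore'`, `NaN` values are left as they are instead of being passed to the function: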

In [8]:
players['Pos'].map('Position: {}'.format, na_action='ignore')

LeBron James         Position: SF
Paul Millsap         Position: PF
Stephen Curry        Position: PG
Kevin Durant                  NaN
Klay Thompson        Position: SG
Blake Griffin        Position: PF
Russell Westbrook    Position: PG
Carmelo Anthony      Position: SF
Kawhi Leonard        Position: SF
Manu Ginobili        Position: SG
Name: Pos, dtype: object
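`na_action='ignore'` matters most when the function cannot handle `NaN` at all. A small sketch with illustrative data: `str.lower` only accepts strings, so it raises on the float `NaN` unless that value is skipped.

```python
import numpy as np
import pandas as pd

# Illustrative values with one missing position
positions = pd.Series(['SF', np.nan, 'PG'])

# str.lower only accepts str objects, so calling it on the float NaN raises TypeError
failed = False
try:
    positions.map(str.lower)
except TypeError:
    failed = True
print(failed)  # True

# na_action='ignore' keeps NaN as-is, so str.lower only ever receives strings
lowered = positions.map(str.lower, na_action='ignore')
print(lowered.tolist())  # ['sf', nan, 'pg']
```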

![separator1](https://user-images.githubusercontent.com/7065401/39119545-6d73d9aa-46ec-11e8-98d3-40204614f000.png)

### [`apply`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html)

In a Series, `apply` _applies_ a custom function to each element and returns a new Series.

For example, convert each player's age from years to (approximate) days:

In [9]:
# Vectorized equivalent: players['Age'] * 365
players['Age'].apply(lambda age: age * 365)
# Prefer apply only when there isn't a vectorized operation,
# e.g. for per-value side effects such as sending an email

LeBron James        11,680.00
Paul Millsap        11,315.00
Stephen Curry       10,220.00
Kevin Durant        10,220.00
Klay Thompson        9,490.00
Blake Griffin             nan
Russell Westbrook   10,220.00
Carmelo Anthony     11,680.00
Kawhi Leonard        9,125.00
Manu Ginobili       14,235.00
Name: Age, dtype: float64
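To make the "vectorized equivalent" comment concrete, here is a small sketch using a hand-picked subset of the ages (including the missing one). It checks that `apply` and plain arithmetic give the same result, `NaN` included; the vectorized form avoids one Python function call per element, so it is usually the faster choice on large Series.

```python
import numpy as np
import pandas as pd

# A few ages from the players table, including Blake Griffin's missing value
ages = pd.Series([32.0, np.nan, 39.0],
                 index=['LeBron James', 'Blake Griffin', 'Manu Ginobili'])

# apply: calls the lambda once per element
with_apply = ages.apply(lambda age: age * 365)

# Vectorized: one arithmetic operation over the whole Series; NaN propagates the same way
vectorized = ages * 365

# equals() treats NaN in the same position as equal
print(with_apply.equals(vectorized))  # True
```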

Sometimes (especially for Series) the functionality of `map` and `apply` overlaps. When you have a custom function, favor `apply`; when you have a one-to-one mapping (like the dict example above), use `map`. For example, both of these calls format every salary identically:

In [10]:
players['salary'].apply('{:,.2f}'.format)

LeBron James         33,285,709.00
Paul Millsap         31,269,231.00
Stephen Curry        34,682,550.00
Kevin Durant         25,000,000.00
Klay Thompson        17,826,150.00
Blake Griffin        29,512,900.00
Russell Westbrook    28,530,608.00
Carmelo Anthony      26,243,760.00
Kawhi Leonard        18,868,625.00
Manu Ginobili         2,500,000.00
Name: salary, dtype: object

In [11]:
players['salary'].map('{:,.2f}'.format)

LeBron James         33,285,709.00
Paul Millsap         31,269,231.00
Stephen Curry        34,682,550.00
Kevin Durant         25,000,000.00
Klay Thompson        17,826,150.00
Blake Griffin        29,512,900.00
Russell Westbrook    28,530,608.00
Carmelo Anthony      26,243,760.00
Kawhi Leonard        18,868,625.00
Manu Ginobili         2,500,000.00
Name: salary, dtype: object
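Following that rule of thumb, a minimal sketch on a few of the salaries: branching logic goes through `apply`, while a fixed lookup table goes through `map` with a dict. The `salary_bracket` helper, its thresholds, and the labels are hypothetical, chosen only for illustration.

```python
import pandas as pd

# A few salaries from the players table
salaries = pd.Series([33285709, 17826150, 2500000],
                     index=['LeBron James', 'Klay Thompson', 'Manu Ginobili'])

# Hypothetical helper: custom branching logic is a natural fit for apply
def salary_bracket(salary):
    if salary >= 30_000_000:
        return 'high'
    if salary >= 10_000_000:
        return 'mid'
    return 'low'

brackets = salaries.apply(salary_bracket)

# A fixed one-to-one lookup is a natural fit for map with a dict
labels = brackets.map({'high': 'High earner', 'mid': 'Mid-range', 'low': 'Low earner'})
print(labels.tolist())  # ['High earner', 'Mid-range', 'Low earner']
```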

`apply` lets you pass extra arguments to the function: positional ones through the `args` tuple, and keyword arguments directly:

In [33]:
def pretty_format_salary(salary, precision):
    # Thousands separators plus `precision` decimal places
    return '{salary:,.{prec}f}'.format(salary=salary, prec=precision)

In [12]:
# Extra positional arguments go in `args` (note the one-element tuple)
players['salary'].apply(pretty_format_salary, args=(3, ))

LeBron James         33,285,709.000
Paul Millsap         31,269,231.000
Stephen Curry        34,682,550.000
Kevin Durant         25,000,000.000
Klay Thompson        17,826,150.000
Blake Griffin        29,512,900.000
Russell Westbrook    28,530,608.000
Carmelo Anthony      26,243,760.000
Kawhi Leonard        18,868,625.000
Manu Ginobili         2,500,000.000
Name: salary, dtype: object

In [13]:
# Extra keyword arguments are forwarded to the function as-is
players['salary'].apply(pretty_format_salary, precision=3)

LeBron James         33,285,709.000
Paul Millsap         31,269,231.000
Stephen Curry        34,682,550.000
Kevin Durant         25,000,000.000
Klay Thompson        17,826,150.000
Blake Griffin        29,512,900.000
Russell Westbrook    28,530,608.000
Carmelo Anthony      26,243,760.000
Kawhi Leonard        18,868,625.000
Manu Ginobili         2,500,000.000
Name: salary, dtype: object
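Both styles can be combined in a single call. A minimal sketch, where `format_salary` and its `unit` keyword are hypothetical: `args` fills the positional parameters that come after the value, and any other keyword argument is forwarded to the function unchanged.

```python
import pandas as pd

salaries = pd.Series([33285709, 2500000], index=['LeBron James', 'Manu Ginobili'])

# Hypothetical helper: 'precision' is positional, 'unit' is an optional keyword
def format_salary(salary, precision, unit=''):
    return f'{salary:,.{precision}f}{unit}'

# args=(1,) supplies precision; unit=' USD' is passed through as a keyword
formatted = salaries.apply(format_salary, args=(1,), unit=' USD')
print(formatted.tolist())  # ['33,285,709.0 USD', '2,500,000.0 USD']
```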

![separator1](https://user-images.githubusercontent.com/7065401/39119545-6d73d9aa-46ec-11e8-98d3-40204614f000.png)

### Indexes

Indexes are special: they're not as versatile as Series or DataFrames, but you can still _apply_ functions to them. `Index` doesn't have an `apply` method; it only has `map`:

In [14]:
players.index.map(len)

Int64Index([12, 12, 13, 12, 13, 13, 17, 15, 13, 13], dtype='int64')

Because `Index` has no `apply` method, if you absolutely need `apply` you have to reset the index first, so the labels become a regular column (named `index` here, since the index has no name):

In [15]:
players.reset_index()['index'].apply(len)

0    12
1    12
2    13
3    12
4    13
5    13
6    17
7    15
8    13
9    13
Name: index, dtype: int64
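Note that `reset_index()` replaces the player names with a `0..n-1` index. An alternative not used in the original notebook is `Index.to_series()`, which builds a Series whose index and values are both the labels, so the `apply` result stays keyed by player name. A small sketch on three of the names:

```python
import pandas as pd

names = pd.Index(['LeBron James', 'Paul Millsap', 'Stephen Curry'])

# to_series() uses the labels as both the index and the values,
# so the result of apply is still labeled by player name
lengths = names.to_series().apply(len)
print(lengths.to_dict())  # {'LeBron James': 12, 'Paul Millsap': 12, 'Stephen Curry': 13}
```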

Many of these common operations are already available as vectorized string methods, through pandas' `.str` accessor:

In [16]:
players.index.str.len()

Int64Index([12, 12, 13, 12, 13, 13, 17, 15, 13, 13], dtype='int64')
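The `.str` accessor offers many more string methods than `len`, and they can be chained. A short sketch on a few of the index labels, extracting last names and upper-casing the full names without any `map` or `apply`:

```python
import pandas as pd

names = pd.Index(['LeBron James', 'Paul Millsap', 'Stephen Curry'])

# split() yields a list of words per label; .str[-1] then picks the last word
last_names = names.str.split().str[-1]

# Element-wise upper-casing, also through the .str accessor
upper = names.str.upper()

print(list(last_names))  # ['James', 'Millsap', 'Curry']
print(list(upper))       # ['LEBRON JAMES', 'PAUL MILLSAP', 'STEPHEN CURRY']
```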

![separator1](https://user-images.githubusercontent.com/7065401/39119545-6d73d9aa-46ec-11e8-98d3-40204614f000.png)

## DataFrames

The most important DataFrame methods are `apply` and `applymap`. `applymap` is similar to `Series.apply`: it applies a function element-wise ("value per value").

### `DataFrame.applymap`

In [17]:
players[['Age', 'salary']].applymap(lambda x: '{:,.2f}'.format(x))

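|                   | Age   | salary        |
|-------------------|-------|---------------|
| LeBron James      | 32.00 | 33,285,709.00 |
| Paul Millsap      | 31.00 | 31,269,231.00 |
| Stephen Curry     | 28.00 | 34,682,550.00 |
| Kevin Durant      | 28.00 | 25,000,000.00 |
| Klay Thompson     | 26.00 | 17,826,150.00 |
| Blake Griffin     | nan   | 29,512,900.00 |
| Russell Westbrook | 28.00 | 28,530,608.00 |
| Carmelo Anthony   | 32.00 | 26,243,760.00 |
| Kawhi Leonard     | 25.00 | 18,868,625.00 |
| Manu Ginobili     | 39.00 | 2,500,000.00  |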


Again, you're applying your function to **each element** individually.
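One version note: pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map` and deprecated the old name. The sketch below (a two-row subset of the data) picks whichever method exists, so it should run on both older and newer pandas, and it confirms that the function receives one cell at a time.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [32.0, np.nan], 'salary': [33285709, 2500000]},
                  index=['LeBron James', 'Manu Ginobili'])

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# The function is called once per cell, never with a whole row or column
formatted = elementwise('{:,.2f}'.format)
print(formatted.loc['LeBron James', 'salary'])  # 33,285,709.00
print(formatted.loc['Manu Ginobili', 'Age'])    # nan
```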

![separator1](https://user-images.githubusercontent.com/7065401/39119545-6d73d9aa-46ec-11e8-98d3-40204614f000.png)

### `DataFrame.apply`
Probably the most interesting method of a DataFrame is `apply`, as it works on a per-row or per-column basis. The default behavior is "per column":

In [18]:
def range_per_column(a_column):
    return a_column.max() - a_column.min()

In [19]:
players[['Age', 'salary']].apply(range_per_column)

Age              14.00
salary   32,182,550.00
dtype: float64

As you can see, each column was reduced to a single value: the result is a `Series` whose index holds the original column names, `Age` and `salary`.
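For this particular reduction there is also a vectorized route, since `max()` and `min()` already work column by column. A small sketch on a four-player subset (so the salary range differs from the full-table result) checking that both approaches agree:

```python
import numpy as np
import pandas as pd

# A subset of the Age/salary columns, including a missing age
df = pd.DataFrame({'Age': [32.0, np.nan, 39.0, 25.0],
                   'salary': [33285709, 29512900, 2500000, 18868625]})

# axis=0 (the default): the lambda receives one whole column per call
ranges = df.apply(lambda col: col.max() - col.min())

# Vectorized equivalent: max() and min() reduce per column and skip NaN
vectorized = df.max() - df.min()

print(np.allclose(ranges, vectorized))  # True
```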

Finally, using `apply` per row (`axis=1`) is really useful too, because your custom function receives an entire row as a `Series` and can operate on all of that row's values:

In [20]:
players

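|                   | Age  | Pos | salary   | season_end | season_start | team |
|-------------------|------|-----|----------|------------|--------------|------|
| LeBron James      | 32.0 | SF  | 33285709 | 2018       | 2017         | CLE  |
| Paul Millsap      | 31.0 | PF  | 31269231 | 2018       | 2017         | DEN  |
| Stephen Curry     | 28.0 | PG  | 34682550 | 2018       | 2017         | GSW  |
| Kevin Durant      | 28.0 | NaN | 25000000 | 2018       | 2017         | GSW  |
| Klay Thompson     | 26.0 | SG  | 17826150 | 2018       | 2017         | GSW  |
| Blake Griffin     | NaN  | PF  | 29512900 | 2018       | 2017         | LAC  |
| Russell Westbrook | 28.0 | PG  | 28530608 | 2018       | 2017         | OKC  |
| Carmelo Anthony   | 32.0 | SF  | 26243760 | 2018       | 2017         | OKC  |
| Kawhi Leonard     | 25.0 | SF  | 18868625 | 2018       | 2017         | SAS  |
| Manu Ginobili     | 39.0 | SG  | 2500000  | 2018       | 2017         | SAS  |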


In [21]:
def salary_per_year_of_age(a_row):
    return a_row['salary'] / a_row['Age']

In [22]:
players.apply(salary_per_year_of_age, axis=1)

LeBron James        1,040,178.41
Paul Millsap        1,008,684.87
Stephen Curry       1,238,662.50
Kevin Durant          892,857.14
Klay Thompson         685,621.15
Blake Griffin                nan
Russell Westbrook   1,018,950.29
Carmelo Anthony       820,117.50
Kawhi Leonard         754,745.00
Manu Ginobili          64,102.56
dtype: float64
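Row-wise `apply` is the most flexible option, but simple arithmetic between columns can usually be written directly. A sketch on two rows (including the missing age) showing that `salary / Age` computed per row matches whole-column division:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [32.0, np.nan], 'salary': [33285709, 29512900]},
                  index=['LeBron James', 'Blake Griffin'])

# axis=1: each call receives one row as a Series labeled by column names
per_row = df.apply(lambda row: row['salary'] / row['Age'], axis=1)

# Same computation as whole-column arithmetic, without a Python call per row
vectorized = df['salary'] / df['Age']

print(np.allclose(per_row, vectorized, equal_nan=True))  # True
```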

`DataFrame.apply` also accepts extra arguments, which it forwards to your function:

In [23]:
def salary_per_age_period(a_row, period=1):
    return a_row['salary'] / (a_row['Age'] * period)

In [24]:
players.apply(salary_per_age_period, axis=1, period=1)  # per year of age

LeBron James        1,040,178.41
Paul Millsap        1,008,684.87
Stephen Curry       1,238,662.50
Kevin Durant          892,857.14
Klay Thompson         685,621.15
Blake Griffin                nan
Russell Westbrook   1,018,950.29
Carmelo Anthony       820,117.50
Kawhi Leonard         754,745.00
Manu Ginobili          64,102.56
dtype: float64

In [25]:
players.apply(salary_per_age_period, axis=1, period=12)  # per month of age

LeBron James         86,681.53
Paul Millsap         84,057.07
Stephen Curry       103,221.88
Kevin Durant         74,404.76
Klay Thompson        57,135.10
Blake Griffin              nan
Russell Westbrook    84,912.52
Carmelo Anthony      68,343.12
Kawhi Leonard        62,895.42
Manu Ginobili         5,341.88
dtype: float64
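The same extra argument can also be passed positionally through `args`, as with `Series.apply`. A minimal sketch on two of the players; `salary_per_age_period` is redefined here, with the same body as above, only so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'Age': [32.0, 39.0], 'salary': [33285709, 2500000]},
                  index=['LeBron James', 'Manu Ginobili'])

# Same function as in the cell above, repeated to keep this block self-contained
def salary_per_age_period(a_row, period=1):
    return a_row['salary'] / (a_row['Age'] * period)

# args=(12,) fills 'period' positionally, equivalent to period=12
per_month = df.apply(salary_per_age_period, axis=1, args=(12,))
print(per_month.round(2).to_dict())  # {'LeBron James': 86681.53, 'Manu Ginobili': 5341.88}
```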

![separator2](https://user-images.githubusercontent.com/7065401/39119518-59fa51ce-46ec-11e8-8503-5f8136558f2b.png)

In [34]:
apple = pd.read_csv('https://gist.githubusercontent.com/Muskey88/65478802c87a542689d929a0b9ff3d48/raw/a98d3a543391f5565068a14e42af33a41562abe3/Apple.csv')

In [35]:
mcsf = pd.read_csv('https://gist.githubusercontent.com/Muskey88/65478802c87a542689d929a0b9ff3d48/raw/a98d3a543391f5565068a14e42af33a41562abe3/Microsoft.csv')

In [36]:
apple.head()

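|   | timestamp  | open   | high   | low    | close  | volume    |
|---|------------|--------|--------|--------|--------|-----------|
| 0 | 2019-09-12 | 206.43 | 226.41 | 204.22 | 223.09 | 213475162 |
| 1 | 2019-08-30 | 213.9  | 218.03 | 192.58 | 208.74 | 681074600 |
| 2 | 2019-07-31 | 203.17 | 221.37 | 198.41 | 213.04 | 473957000 |
| 3 | 2019-06-28 | 175.6  | 201.57 | 170.27 | 197.92 | 515218700 |
| 4 | 2019-05-31 | 209.88 | 215.31 | 174.99 | 175.07 | 739456600 |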


In [37]:
mcsf.head()

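|   | timestamp  | open   | high   | low    | close  | volume    |
|---|------------|--------|--------|--------|--------|-----------|
| 0 | 2019-09-12 | 136.61 | 140.38 | 134.51 | 137.52 | 187949022 |
| 1 | 2019-08-30 | 137.0  | 140.94 | 130.78 | 137.86 | 584474300 |
| 2 | 2019-07-31 | 136.63 | 141.68 | 134.67 | 136.27 | 484547500 |
| 3 | 2019-06-28 | 123.85 | 138.4  | 119.01 | 133.96 | 508324300 |
| 4 | 2019-05-31 | 130.53 | 130.65 | 123.04 | 123.68 | 547218800 |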


In [None]:
apple.set_index('timestamp')

In [39]:
pd.concat([apple.head(), mcsf.head()], keys=['Apple', 'Microsoft'])
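To see what `keys=` does without downloading the CSVs, here is a self-contained sketch using two closing prices per company copied from the `head()` outputs above. Each key labels the rows that came from one input frame, producing a two-level (key, original index) row index.

```python
import pandas as pd

# Tiny stand-ins for apple.head() / mcsf.head(), with values from the tables above
apple_close = pd.DataFrame({'timestamp': ['2019-09-12', '2019-08-30'],
                            'close': [223.09, 208.74]})
msft_close = pd.DataFrame({'timestamp': ['2019-09-12', '2019-08-30'],
                           'close': [137.52, 137.86]})

# keys= adds an outer index level naming the source frame of each row
combined = pd.concat([apple_close, msft_close], keys=['Apple', 'Microsoft'])

print(combined.index.nlevels)                          # 2
print(combined.loc['Microsoft']['close'].tolist())     # [137.52, 137.86]
```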