In data processing, you often need to perform operations (such as statistical calculations, splitting, or substituting values) on a row or column to obtain new data. You could write a for-loop to iterate through a Pandas DataFrame or Series, but that is rarely a good idea: the loop tends to take more lines of code, be harder to read, and run slower.
Fortunately, Pandas has built-in methods that accomplish these goals. In this article, we will see how to perform operations using apply() and applymap(), and how to substitute values using map().
First of all, be aware that DataFrame and Series each have some, but not all, of these three methods, as follows:

![title](map.png)

### apply() is used to apply a function along an axis of the DataFrame or on values of Series.
### applymap() is used to apply a function to a DataFrame elementwise.
### map() is used to substitute each value in a Series with another value.

In [1]:
import pandas as pd
df = pd.DataFrame({ 'A': [1,2,3,4], 
                   'B': [10,20,30,40],
                   'C': [20,40,60,80]
                  }, 
                  index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])
df

Unnamed: 0,A,B,C
Row 1,1,10,20
Row 2,2,20,40
Row 3,3,30,60
Row 4,4,40,80


In [4]:
# The Pandas apply() is used to apply a function along an axis of the DataFrame or on values of Series.
# Let’s begin with a simple example, to sum each row and save the result to a new column “D”
# Let's call this "custom_sum" as "sum" is a built-in function
def custom_sum(row):
    return row.sum()
df['D'] = df.apply(custom_sum, axis=1)
df

Unnamed: 0,A,B,C,D
Row 1,1,10,20,31
Row 2,2,20,40,62
Row 3,3,30,60,93
Row 4,4,40,80,124


Do you really understand what just happened?
Let’s take a look at df.apply(custom_sum, axis=1).
The first parameter, custom_sum, is a function.
The second parameter, axis, specifies which axis the function is applied along: 0 applies the function to each column, and 1 applies it to each row.
Let me explain this in a more intuitive way. Setting axis=1 tells Pandas to pass each row to the function. So custom_sum is applied to every row, and apply() returns a new Series holding one result per row.
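
To make the two axis values concrete, here is a minimal sketch on a small stand-in frame. The name `demo` and its values are assumptions made for this illustration only; they are not the article's df.

```python
import pandas as pd

# Hypothetical two-column frame, used only for this illustration.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]},
                    index=["Row 1", "Row 2", "Row 3"])

def custom_sum(values):
    # `values` is a Series: one column when axis=0, one row when axis=1.
    return values.sum()

col_totals = demo.apply(custom_sum, axis=0)  # result is indexed by column label
row_totals = demo.apply(custom_sum, axis=1)  # result is indexed by row label

print(col_totals.to_dict())  # {'A': 6, 'B': 60}
print(row_totals.to_dict())  # {'Row 1': 11, 'Row 2': 22, 'Row 3': 33}
```

Note how the index of the returned Series tells you which axis was used: column labels for axis=0, row labels for axis=1.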

In [6]:
# With the understanding of the sum of each row, the sum of each column is just to use axis = 0 instead
df.loc['Row 5'] = df.apply(custom_sum, axis=0)
df

Unnamed: 0,A,B,C,D
Row 1,1,10,20,31
Row 2,2,20,40,62
Row 3,3,30,60,93
Row 4,4,40,80,124
Row 5,10,100,200,310


In [7]:
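# apply() also works on a Series, where the function receives one value at a time.
# Here it overwrites column "D" with double the values of column "C".
def multiply_by_2(val):
    return val * 2
df['D'] = df['C'].apply(multiply_by_2)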
df

Unnamed: 0,A,B,C,D
Row 1,1,10,20,40
Row 2,2,20,40,80
Row 3,3,30,60,120
Row 4,4,40,80,160
Row 5,10,100,200,400


In [8]:
# You can also use a lambda expression with the Pandas apply() function.
# The lambda equivalent for the sum of each row of a DataFrame
# (column "D" currently holds C * 2, so it is included in each row's sum):
df['D'] = df.apply(lambda x: x.sum(), axis=1)
df

Unnamed: 0,A,B,C,D
Row 1,1,10,20,71
Row 2,2,20,40,142
Row 3,3,30,60,213
Row 4,4,40,80,284
Row 5,10,100,200,710


In [9]:
# The lambda equivalent for the sum of each column of a DataFrame
# (the existing "Row 5" is included in each column's sum):
df.loc['Row 5'] = df.apply(lambda x: x.sum(), axis=0)
df

Unnamed: 0,A,B,C,D
Row 1,1,10,20,71
Row 2,2,20,40,142
Row 3,3,30,60,213
Row 4,4,40,80,284
Row 5,20,200,400,1420


In [10]:
# And the lambda equivalent for multiplying by 2 on a Series:
df['D'] = df['C'].apply(lambda x: x * 2)
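
For simple arithmetic like this, it may be worth checking whether a plain operator gives the same result without apply(). The sketch below uses a stand-in Series named `c` (an assumption for this illustration) and only compares results; it makes no timing claims.

```python
import pandas as pd

# Hypothetical stand-in for column "C".
c = pd.Series([20, 40, 60, 80], name="C")

via_apply = c.apply(lambda x: x * 2)  # calls the lambda once per value
via_operator = c * 2                  # one vectorized operation on the whole Series

print(via_apply.tolist())     # [40, 80, 120, 160]
print(via_operator.tolist())  # [40, 80, 120, 160]
```

apply() remains the more general tool when the per-value logic cannot be written as a simple expression on the whole Series.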

### With result_type parameter
result_type is a parameter of DataFrame.apply() that only takes effect when axis=1. It can be set to 'expand', 'reduce', or 'broadcast' to control the type of the result.
In the scenario above, if result_type is set to 'broadcast', the output is a DataFrame of the original shape in which every value in a row is replaced by that row's custom_sum value.

In [11]:
df.apply(custom_sum, axis=1, result_type='broadcast')
# The result is broadcast to the original shape of the frame; the original index and columns are retained.

Unnamed: 0,A,B,C,D
Row 1,71,71,71,71
Row 2,142,142,142,142
Row 3,213,213,213,213
Row 4,284,284,284,284
Row 5,1420,1420,1420,1420


In [12]:
# To understand result_type as 'expand' and 'reduce', we will first create a function that returns a list.
def cal_multi_col(row):
    return [row['A'] * 2, row['B'] * 3]

df.apply(cal_multi_col, axis=1, result_type='expand')
# The output is a new DataFrame with column names 0 and 1.

Unnamed: 0,0,1
Row 1,2,30
Row 2,4,60
Row 3,6,90
Row 4,8,120
Row 5,40,600


In [13]:
# In order to append this to the existing DataFrame, 
# the result has to be kept in a variable so the column names can be accessed by res.columns.
res = df.apply(cal_multi_col, axis=1, result_type='expand')
df[res.columns] = res

In [14]:
df

Unnamed: 0,A,B,C,D,0,1
Row 1,1,10,20,40,2,30
Row 2,2,20,40,80,4,60
Row 3,3,30,60,120,6,90
Row 4,4,40,80,160,8,120
Row 5,20,200,400,800,40,600
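
The expanded columns above are named 0 and 1 because the function returns a plain list. One possible way to get descriptive names is to return a Series instead, since its index labels become the column names. In this sketch, `demo`, `cal_multi_col_named`, and the column names `A_x2` and `B_x3` are all hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-in frame for this illustration.
demo = pd.DataFrame({"A": [1, 2], "B": [10, 20]}, index=["Row 1", "Row 2"])

def cal_multi_col_named(row):
    # Variant of cal_multi_col: the Series index supplies the column names.
    return pd.Series({"A_x2": row["A"] * 2, "B_x3": row["B"] * 3})

res = demo.apply(cal_multi_col_named, axis=1)  # a DataFrame with named columns
demo = demo.join(res)                          # append the new columns by row label

print(list(demo.columns))  # ['A', 'B', 'A_x2', 'B_x3']
```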


In [15]:
# Next, apply the function to each row with result_type='reduce'.
# result_type='reduce' is the opposite of 'expand':
# it returns a Series if possible rather than expanding list-like results.
df['New'] = df.apply(cal_multi_col, axis=1, result_type='reduce')
df

Unnamed: 0,A,B,C,D,0,1,New
Row 1,1,10,20,40,2,30,"[2, 30]"
Row 2,2,20,40,80,4,60,"[4, 60]"
Row 3,3,30,60,120,6,90,"[6, 90]"
Row 4,4,40,80,160,8,120,"[8, 120]"
Row 5,20,200,400,800,40,600,"[40, 600]"


### How to use applymap()?
applymap() is only available on DataFrame and is used for element-wise operations across the whole DataFrame. It has been optimized and in some cases works much faster than apply(), but it is worth comparing the two before running any heavy operation. Note that pandas 2.1 deprecated DataFrame.applymap() in favor of DataFrame.map(), which does the same thing under a new name.


In [17]:
# For example, to output a DataFrame with every number squared:
import numpy as np
df.applymap(np.square)

Unnamed: 0,A,B,C,D,0,1,New
Row 1,1,100,400,1600,4,900,"[4, 900]"
Row 2,4,400,1600,6400,16,3600,"[16, 3600]"
Row 3,9,900,3600,14400,36,8100,"[36, 8100]"
Row 4,16,1600,6400,25600,64,14400,"[64, 14400]"
Row 5,400,40000,160000,640000,1600,360000,"[1600, 360000]"
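
As a rough sketch of what "element-wise" means, the snippet below squares every cell of a small stand-in frame in two ways. The name `demo` is an assumption for this illustration. Because applymap() was renamed to DataFrame.map() in pandas 2.1, the code picks whichever method the installed version provides.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame for this illustration.
demo = pd.DataFrame({"A": [1, 2], "B": [10, 20]})

# pandas 2.1+ exposes the element-wise method as DataFrame.map;
# older versions only have DataFrame.applymap.
elementwise = demo.map if hasattr(demo, "map") else demo.applymap

squared_elementwise = elementwise(lambda x: x ** 2)  # lambda called once per cell
squared_vectorized = np.square(demo)                 # NumPy ufunc on the whole frame

print(squared_elementwise.values.tolist())  # [[1, 100], [4, 400]]
print(squared_vectorized.values.tolist())   # [[1, 100], [4, 400]]
```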


### How to use map()?
map() is a Series method used for substituting each value in a Series with another value. To understand how map() works, we first create a Series.

In [18]:
s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
s

0       cat
1       dog
2       NaN
3    rabbit
dtype: object

In [19]:
# Passing a dictionary: values that are not keys in the dictionary become NaN.
s.map({'cat': 'kitten', 'dog': 'puppy'})

0    kitten
1     puppy
2       NaN
3       NaN
dtype: object

In [20]:
# Passing a function: by default it is also called on NaN, which is formatted as 'nan'.
s.map('I am a {}'.format)

0       I am a cat
1       I am a dog
2       I am a nan
3    I am a rabbit
dtype: object

In [21]:
# na_action='ignore' skips NaN values and leaves them as NaN in the result.
s.map('I am a {}'.format, na_action='ignore')

0       I am a cat
1       I am a dog
2              NaN
3    I am a rabbit
dtype: object
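
In the dictionary example, 'rabbit' turned into NaN because it is not a key in the mapping. If you would rather keep unmatched values as they are, one possible approach is to fill the gaps from the original Series. The variable names `mapping` and `renamed` are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

s = pd.Series(["cat", "dog", np.nan, "rabbit"])
mapping = {"cat": "kitten", "dog": "puppy"}

# Values missing from the dictionary become NaN after map();
# fillna(s) then restores the original value at those positions.
renamed = s.map(mapping).fillna(s)

print(renamed.tolist())  # ['kitten', 'puppy', nan, 'rabbit']
```

The value at position 2 stays NaN because it was already missing in the original Series.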

## Summary
Finally, here is a summary:
### For DataFrame:
apply(): It is used when you want to apply a function along rows or columns. Use axis = 0 to apply it to each column and axis = 1 to apply it to each row.

applymap(): It is used for element-wise operations across the whole DataFrame.
### For Series:
apply(): It is used when you want to apply a function to the values of a Series.

map(): It is used to substitute each value with another value.