# Why use apply() and transform() on a DataFrame?

- Both apply() and transform() call a function on an entire DataFrame or on a specific column of a given DataFrame.


# 3 main differences


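- transform() accepts a function, a string function name, a list of functions, or a dict. The source notebook states that apply() accepts only a function, but the df.apply({...}) cell in this notebook shows apply() accepting a dict as well, so this difference does not hold strictly.
- transform() cannot produce aggregated results: its output must have the same length as its input.
- apply() can work with multiple Series at a time (for example, whole rows with axis=1), whereas transform() works with a single Series at a time.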


## Source Notebook:
- https://github.com/BindiChen/machine-learning/blob/master/data-analysis/014-pandas-apply-vs-transform/pandas-apply-vs-transform.ipynb

Note: Most of the content here is taken from the source notebook above. Please review it as needed.

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

In [3]:
df

Unnamed: 0,A,B
0,1,10
1,2,20
2,3,30


In [4]:
def plus_10(x):
    return x+10

In [5]:
df.apply(plus_10)

Unnamed: 0,A,B
0,11,20
1,12,30
2,13,40


In [9]:
df['B'].apply(plus_10)

0    20
1    30
2    40
Name: B, dtype: int64

In [6]:
df

Unnamed: 0,A,B
0,1,10
1,2,20
2,3,30


In [7]:
df.transform(plus_10)

Unnamed: 0,A,B
0,11,20
1,12,30
2,13,40


In [8]:
df['B'].transform(plus_10)

0    20
1    30
2    40
Name: B, dtype: int64
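
For an element-wise function such as plus_10, the outputs above suggest that the two methods are interchangeable. The following is a minimal check of that, not a general proof; it rebuilds the small frame from In [2] so that the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

def plus_10(x):
    return x + 10

# Whole DataFrame: apply() and transform() return equal frames
same_frame = df.apply(plus_10).equals(df.transform(plus_10))

# Single column: the two resulting Series are equal as well
same_column = df['B'].apply(plus_10).equals(df['B'].transform(plus_10))

# Both methods return new objects; the original df is left untouched
unchanged = df.equals(pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]}))

print(same_frame, same_column, unchanged)
```

Because neither call modifies df, the result has to be assigned back (for example to a new column) to be kept.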

In [11]:
df.apply(lambda x: x+10)

Unnamed: 0,A,B
0,11,20
1,12,30
2,13,40


In [12]:
df

Unnamed: 0,A,B
0,1,10
1,2,20
2,3,30


In [13]:
df.transform(lambda x: x+10)

Unnamed: 0,A,B
0,11,20
1,12,30
2,13,40


In [14]:
df['B'].transform(lambda x: x+10)

0    20
1    30
2    40
Name: B, dtype: int64

In [15]:
df['B_apply'] = df['B'].apply(plus_10)

In [16]:
df

Unnamed: 0,A,B,B_apply
0,1,10,20
1,2,20,30
2,3,30,40


In [17]:
# The lambda equivalent
df['B_apply_lambda'] = df['B'].apply(lambda x: x+10)

In [18]:
df

Unnamed: 0,A,B,B_apply,B_apply_lambda
0,1,10,20,20
1,2,20,30,30
2,3,30,40,40


In [19]:
df['B_transform'] = df['B'].transform(plus_10)

In [20]:
df

Unnamed: 0,A,B,B_apply,B_apply_lambda,B_transform
0,1,10,20,20,20
1,2,20,30,30,30
2,3,30,40,40,40


In [21]:
# The lambda equivalent
df['B_transform_lambda'] = df['B'].transform(lambda x: x+10)

In [22]:
df

Unnamed: 0,A,B,B_apply,B_apply_lambda,B_transform,B_transform_lambda
0,1,10,20,20,20,20
1,2,20,30,30,30,30
2,3,30,40,40,40,40


# 3 main differences
- transform() works with a function, a string function name, a list of functions, or a dict. apply() is mainly used with a function, although the df.apply({...}) cell below shows that a dict works too.
- transform() cannot produce aggregated results.
- apply() can work with multiple Series at a time, whereas transform() works with a single Series at a time.


In [24]:
df

Unnamed: 0,A,B,B_apply,B_apply_lambda,B_transform,B_transform_lambda
0,1,10,20,20,20,20
1,2,20,30,30,30,30
2,3,30,40,40,40,40


In [23]:
df.transform('sqrt')

Unnamed: 0,A,B,B_apply,B_apply_lambda,B_transform,B_transform_lambda
0,1.0,3.162278,4.472136,4.472136,4.472136,4.472136
1,1.414214,4.472136,5.477226,5.477226,5.477226,5.477226
2,1.732051,5.477226,6.324555,6.324555,6.324555,6.324555
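
The string 'sqrt' is not a DataFrame method, so as far as I understand pandas resolves it to the NumPy function of the same name. A small sketch on a fresh two-column frame (an assumption made to keep the example short) compares the string form with passing np.sqrt directly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

by_name = df.transform('sqrt')   # function given as a string
by_func = df.transform(np.sqrt)  # function given as a callable
direct = np.sqrt(df)             # NumPy ufunc called on the frame itself

# All three should hold the same square roots
print(np.allclose(by_name, by_func), np.allclose(by_name, direct))
```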


In [25]:
df

Unnamed: 0,A,B,B_apply,B_apply_lambda,B_transform,B_transform_lambda
0,1,10,20,20,20,20
1,2,20,30,30,30,30
2,3,30,40,40,40,40


In [26]:
df.transform([np.sqrt, np.exp])

Unnamed: 0_level_0,A,A,B,B,B_apply,B_apply,B_apply_lambda,B_apply_lambda,B_transform,B_transform,B_transform_lambda,B_transform_lambda
Unnamed: 0_level_1,sqrt,exp,sqrt,exp,sqrt,exp,sqrt,exp,sqrt,exp,sqrt,exp
0,1.0,2.718282,3.162278,22026.47,4.472136,485165200.0,4.472136,485165200.0,4.472136,485165200.0,4.472136,485165200.0
1,1.414214,7.389056,4.472136,485165200.0,5.477226,10686470000000.0,5.477226,10686470000000.0,5.477226,10686470000000.0,5.477226,10686470000000.0
2,1.732051,20.085537,5.477226,10686470000000.0,6.324555,2.353853e+17,6.324555,2.353853e+17,6.324555,2.353853e+17,6.324555,2.353853e+17
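
With a list of functions, the result has two header rows: the original column name and the function name. The sketch below, again on a fresh two-column frame, shows one way to pick values out of that two-level header.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

result = df.transform([np.sqrt, np.exp])

# Each column label is an (original column, function name) pair
print(result.columns.tolist())

# Select a single transformed column with a tuple key
a_sqrt = result[('A', 'sqrt')]

# Or select every function that was applied to one original column
b_only = result['B']
print(b_only.columns.tolist())
```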


In [27]:
df.transform({
    'A': np.sqrt,
    'B': np.exp,
})

Unnamed: 0,A,B
0,1.0,22026.47
1,1.414214,485165200.0
2,1.732051,10686470000000.0


In [40]:
df.apply({
    'A': np.sqrt,
    'B': np.exp,
})

Unnamed: 0,A,B
0,1.0,22026.47
1,1.414214,485165200.0
2,1.732051,10686470000000.0
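
The two dict cells above return the same table, and only the columns named in the dict survive. A minimal sketch to confirm both points; the extra column 'C' is a hypothetical addition that is not in the original frame.

```python
import numpy as np
import pandas as pd

# 'C' is a hypothetical extra column, used only to show that it is dropped
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [7, 8, 9]})
funcs = {'A': np.sqrt, 'B': np.exp}

via_transform = df.transform(funcs)
via_apply = df.apply(funcs)

# Only the columns listed in the dict appear in the result
print(via_transform.columns.tolist())

# Both methods should produce the same values
print(np.allclose(via_transform, via_apply))
```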


In [29]:
df

Unnamed: 0,A,B,B_apply,B_apply_lambda,B_transform,B_transform_lambda
0,1,10,20,20,20,20
1,2,20,30,30,30,30
2,3,30,40,40,40,40


In [30]:
df.apply(lambda x:x.sum())

A                      6
B                     60
B_apply               90
B_apply_lambda        90
B_transform           90
B_transform_lambda    90
dtype: int64

In [32]:
# This raises an error
#df.transform(lambda x:x.sum())

# Error message: ValueError: transforms cannot produce aggregated results
# (the exact wording depends on the pandas version)
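
The commented-out call can be run safely inside try/except. The sketch below catches the error and also shows that an aggregate may still be used inside transform() as long as the output keeps the shape of the input. The variable names and the x / x.sum() example are my own additions.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# apply() may reduce each column to a single value
totals = df.apply(lambda x: x.sum())

# transform() rejects the same function because the result is shorter than the input
try:
    df.transform(lambda x: x.sum())
    error = None
except ValueError as exc:
    error = exc  # the message wording depends on the pandas version

# Using the sum inside an expression that keeps one value per row is accepted
share = df.transform(lambda x: x / x.sum())

print(totals.tolist(), type(error).__name__, share.shape)
```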

In [33]:
def subtract_two(x):
    return x['B'] - x['A']

In [34]:
df.apply(subtract_two, axis=1)

0     9
1    18
2    27
dtype: int64

In [36]:
# This raises an error
#df.transform(subtract_two, axis=1)

# ValueError: transforms cannot produce aggregated results

In [37]:
# The lambda equivalent also works with apply()
df.apply(lambda x: x['B'] - x['A'], axis=1)

0     9
1    18
2    27
dtype: int64

In [39]:
# The lambda equivalent raises the same error with transform()
#df.transform(lambda x: x['B'] - x['A'], axis=1)

# ValueError: transforms cannot produce aggregated results
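
The row-wise cells can be condensed into one runnable sketch. It also adds plain column arithmetic for comparison, which is my own suggestion and is not part of the source notebook.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

def subtract_two(x):
    return x['B'] - x['A']

# apply(axis=1) hands each row to the function as a Series holding both columns
row_wise = df.apply(subtract_two, axis=1)

# transform(axis=1) raises ValueError for the same function
try:
    df.transform(subtract_two, axis=1)
    raised = False
except ValueError:
    raised = True

# Plain column arithmetic gives the same values without a row-by-row call
vectorized = df['B'] - df['A']

print(row_wise.tolist(), raised, vectorized.tolist())
```

For simple arithmetic between columns, the vectorized form is usually the shorter option; apply(axis=1) is more useful when the per-row logic cannot be written as column operations.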