# Pandas apply() vs transform()

This is a notebook for the Medium article [Difference between apply() and transform() in Pandas](https://medium.com/@bindiatwork/difference-between-apply-and-transform-in-pandas-242e5cf32705).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [1]:
import pandas as pd
import numpy as np

## 1 Manipulating values

In [2]:
df = pd.DataFrame({'A': [1,2,3], 'B': [10,20,30] })

In [3]:
def plus_10(x):
    return x+10

#### For the entire DataFrame

In [4]:
df.apply(plus_10)

| | A | B |
|---|---|---|
| 0 | 11 | 20 |
| 1 | 12 | 30 |
| 2 | 13 | 40 |


In [5]:
df.transform(plus_10)

| | A | B |
|---|---|---|
| 0 | 11 | 20 |
| 1 | 12 | 30 |
| 2 | 13 | 40 |


In [6]:
# lambda equivalent
df.apply(lambda x: x+10)

| | A | B |
|---|---|---|
| 0 | 11 | 20 |
| 1 | 12 | 30 |
| 2 | 13 | 40 |


In [7]:
# lambda equivalent
df.transform(lambda x: x+10)

| | A | B |
|---|---|---|
| 0 | 11 | 20 |
| 1 | 12 | 30 |
| 2 | 13 | 40 |


#### For a single column

In [8]:
df['B_ap'] = df['B'].apply(plus_10)
df

| | A | B | B_ap |
|---|---|---|---|
| 0 | 1 | 10 | 20 |
| 1 | 2 | 20 | 30 |
| 2 | 3 | 30 | 40 |


In [9]:
df['B_tr'] = df['B'].transform(plus_10)
df

| | A | B | B_ap | B_tr |
|---|---|---|---|---|
| 0 | 1 | 10 | 20 | 20 |
| 1 | 2 | 20 | 30 | 30 |
| 2 | 3 | 30 | 40 | 40 |
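
As a quick check of the cells above, the sketch below compares the two methods directly. It assumes a fresh copy of the same toy DataFrame. The vectorized `df + 10` comparison and the variable names are additions for illustration and are not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

def plus_10(x):
    return x + 10

# Element-wise functions give identical results with both methods
same_frame = df.apply(plus_10).equals(df.transform(plus_10))
same_column = df['B'].apply(plus_10).equals(df['B'].transform(plus_10))

# For simple arithmetic, a plain vectorized operator does the same job
same_as_vectorized = df.apply(plus_10).equals(df + 10)

print(same_frame, same_column, same_as_vectorized)
```

For value manipulation like this, the choice between the two methods is a matter of taste; the differences only show up in the cases covered next.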


### Difference

There are three main differences:
1. `transform()` can take a function, a string function name, a list of functions, or a dict. However, `apply()` is documented as taking only a function.
2. `transform()` cannot produce aggregated results
3. `apply()` works with multiple Series at a time. However, `transform()` is only allowed to work with a single Series at a time.

In [10]:
df = pd.DataFrame({'A': [1,2,3], 'B': [10,20,30] })

**1. `transform()` can take a function, a string function name, a list of functions, or a dict. However, `apply()` is documented as taking only a function.**

In [11]:
# A string function
df.transform('sqrt')

| | A | B |
|---|---|---|
| 0 | 1.0 | 3.162278 |
| 1 | 1.414214 | 4.472136 |
| 2 | 1.732051 | 5.477226 |


In [12]:
# A list of functions
df.transform([np.sqrt, np.exp])

| | A (sqrt) | A (exp) | B (sqrt) | B (exp) |
|---|---|---|---|---|
| 0 | 1.0 | 2.718282 | 3.162278 | 22026.47 |
| 1 | 1.414214 | 7.389056 | 4.472136 | 485165200.0 |
| 2 | 1.732051 | 20.085537 | 5.477226 | 10686470000000.0 |


In [13]:
# A dict of axis labels -> function
df.transform({
    'A': np.sqrt,
    'B': np.exp,
})

| | A | B |
|---|---|---|
| 0 | 1.0 | 22026.47 |
| 1 | 1.414214 | 485165200.0 |
| 2 | 1.732051 | 10686470000000.0 |
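
When a list of functions is passed, the result has two-level column labels, as the output of the list example shows. The following is a minimal sketch of how one of those columns might be selected afterwards. The variable names `result` and `a_sqrt` are illustrative and not from the original notebook.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

result = df.transform([np.sqrt, np.exp])

# Column labels are (original column, function name) pairs
print(result.columns.tolist())

# Select a single transformed column with a tuple key
a_sqrt = result[('A', 'sqrt')]
print(a_sqrt)
```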


**2. `transform()` cannot produce aggregated results**

In [14]:
# This works with apply()
df.apply(lambda x:x.sum())

A     6
B    60
dtype: int64

In [15]:
# but transform() raises an error
df.transform(lambda x:x.sum())

ValueError: Function did not transform
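
If an aggregated result is what you want, `agg()` is the usual counterpart to `transform()`. The sketch below shows that alternative next to the failing call. The `try`/`except` wrapper and the names `totals` and `raised` are only there so the example runs end to end; they are not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# agg() is intended for reductions such as sum
totals = df.agg('sum')
print(totals)

# transform() rejects a function whose result does not keep the input's index
try:
    df.transform(lambda x: x.sum())
    raised = False
except ValueError as err:
    raised = True
    print(f"transform() failed: {err}")
```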

**3. `apply()` works with multiple Series at a time. However, `transform()` is only allowed to work with a single Series at a time.**

In [16]:
def subtract_two(x):
    return x['B'] - x['A']

In [17]:
# Working for apply with axis=1
df.apply(subtract_two, axis=1)

0     9
1    18
2    27
dtype: int64

In [18]:
# Getting error when trying the same with transform
df.transform(subtract_two, axis=1)

ValueError: Function did not transform

In [19]:
# apply() works fine with lambda expression
df.apply(lambda x: x['B'] - x['A'], axis=1)

0     9
1    18
2    27
dtype: int64

In [20]:
# Same error when using lambda expression
df.transform(lambda x: x['B'] - x['A'], axis=1)

ValueError: Function did not transform
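
For a simple column difference like this one, neither method is strictly needed. As a sketch (the names `row_wise` and `vectorized` are illustrative), plain column arithmetic gives the same values as the row-wise `apply()` call and is typically faster on large frames:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Row-wise apply(), as in the cells above
row_wise = df.apply(lambda x: x['B'] - x['A'], axis=1)

# Vectorized column arithmetic
vectorized = df['B'] - df['A']

print(row_wise.equals(vectorized))
```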

## 2 In conjunction with groupby()

In [21]:
df = pd.DataFrame({
    'key': ['a','b','c'] * 3,
    'A': np.arange(9),
    'B': [1,2,3] * 3,
})
df

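| | key | A | B |
|---|---|---|---|
| 0 | a | 0 | 1 |
| 1 | b | 1 | 2 |
| 2 | c | 2 | 3 |
| 3 | a | 3 | 1 |
| 4 | b | 4 | 2 |
| 5 | c | 5 | 3 |
| 6 | a | 6 | 1 |
| 7 | b | 7 | 2 |
| 8 | c | 8 | 3 |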


There are two differences:
1. `transform()` returns a Series that has the same length as the input
2. `apply()` works with multiple Series at a time. However, `transform()` is only allowed to work with a single Series at a time.

**1. `transform()` returns a Series that has the same length as the input**

In [22]:
def group_sum(x):
    return x.sum()

In [23]:
gr_data_ap = df.groupby('key')['A'].apply(group_sum)
gr_data_ap

key
a     9
b    12
c    15
Name: A, dtype: int64

In [24]:
gr_data_tr = df.groupby('key')['A'].transform(group_sum)
gr_data_tr

0     9
1    12
2    15
3     9
4    12
5    15
6     9
7    12
8    15
Name: A, dtype: int64
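
Because the `transform()` result keeps the original index, it can be assigned straight back to the DataFrame as a new column. The sketch below uses that to compute each row's share of its group total. The column names `A_group_sum` and `A_share` are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': ['a', 'b', 'c'] * 3,
    'A': np.arange(9),
    'B': [1, 2, 3] * 3,
})

# Same length as df, so direct assignment lines up row by row
df['A_group_sum'] = df.groupby('key')['A'].transform('sum')

# Each row's share of its group's total
df['A_share'] = df['A'] / df['A_group_sum']
print(df)
```

Doing the same with the `apply()` result would need an extra step, such as a merge or a `map()` on `key`, because that result is indexed by group rather than by row.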

**2. `apply()` works with multiple Series at a time. However, `transform()` is only allowed to work with a single Series at a time.**

In [25]:
def subtract_two(x):
    return x['B'] - x['A']

In [26]:
df.groupby('key').apply(subtract_two)

key   
a    0    1
     3   -2
     6   -5
b    1    1
     4   -2
     7   -5
c    2    1
     5   -2
     8   -5
dtype: int64

In [27]:
# transform() raises an error
df.groupby('key').transform(subtract_two)

KeyError: 'B'
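
One possible workaround, sketched here under the assumption that the goal is a same-length result derived from both columns: combine the columns first, then call `transform()` on the resulting single Series. The group-wise centering step and the names `diff` and `centered` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': ['a', 'b', 'c'] * 3,
    'A': np.arange(9),
    'B': [1, 2, 3] * 3,
})

# Step 1: combine the two columns into a single Series
diff = df['B'] - df['A']

# Step 2: transform() now only has to handle one Series per group;
# here each value is centered on its group's mean
centered = diff.groupby(df['key']).transform(lambda s: s - s.mean())
print(centered)
```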
