**IMP:** When used after `groupby()`, `transform` works on one Series (a single column of each group) at a time, while `apply` receives the entire group as a DataFrame. Keep this in mind when choosing between the two.

For a detailed explanation, see: https://stackoverflow.com/questions/13854476/pandas-transform-doesnt-work-sorting-groupby-output/13854901#13854901
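Here is a minimal sketch of this difference. It uses a small hypothetical frame, so the names `key`, `A`, `B` and the data are made up. The `apply` function combines two columns of each group, which only works because it receives the whole group. The `transform` function returns values aligned with the original rows. pandas may internally try a `transform` function on both a single column and the whole group, so the function below is written to give the same answer either way.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({'key': ['a', 'a', 'b'],
                   'A': [1, 2, 3],
                   'B': [4, 5, 6]})

grouped = df.groupby('key')[['A', 'B']]

# apply(): the function gets the whole group, so it can combine columns.
# Here: sum of A * B inside each group -> one value per group
per_group = grouped.apply(lambda g: (g['A'] * g['B']).sum())

# transform(): the result is aligned to the original rows.
# Here: each column is centred on its own group mean
demeaned = grouped.transform(lambda s: s - s.mean())

print(per_group)
print(demeaned)
```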

In [17]:
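import pandas as pd
import numpy as np
from IPython.display import display  # built in inside Jupyter; imported here so the cell also runs elsewhere

df = pd.DataFrame({'A': [5, 6, 7],
                   'B': [8, 9, 10]})

# Local path: point this at your own copy of weather.csv
df1 = pd.read_csv(r"C:\Users\sanju\OneDrive\Desktop\Tutorials\Pandas\Dataset\weather.csv", usecols=['temperature', 'humidity'])

df2 = pd.DataFrame({
    'key': ['a', 'b', 'c'] * 4,
    'A': np.arange(12),
    'B': [1, 2, 3] * 4,
})
display(df, df1, df2.head())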

   A   B
0  5   8
1  6   9
2  7  10


   temperature  humidity
0           65        56
1           61        54
2           70        60
3           30        50
4           28        52
5           25        51


  key  A  B
0   a  0  1
1   b  1  2
2   c  2  3
3   a  3  1
4   b  4  2


**1.** `apply(func, axis=0)` applies the function `func` along the given axis of the DataFrame: with `axis=0` (the default) each column is passed to `func` as a Series, and with `axis=1` each row is. It returns a Series or a DataFrame, depending on what `func` returns.

**2.** `transform(func, axis=0)` also applies `func` along the given axis, but the result must be a DataFrame with the same length along that axis as the original DataFrame. A function that aggregates (reduces) the values raises a `ValueError`.
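A short sketch of this shape rule, using the same `df` as above: `apply` is free to reduce, while `transform` has to keep the original shape.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 7], 'B': [8, 9, 10]})

# apply() may reduce: one value per column (axis=0) or one value per row (axis=1)
col_sums = df.apply(lambda col: col.sum())          # Series indexed by 'A', 'B'
row_sums = df.apply(lambda row: row.sum(), axis=1)  # Series indexed by 0, 1, 2

# transform() must keep the shape: here each column is shifted by its own minimum
shifted = df.transform(lambda col: col - col.min())

# A function that reduces each column to one value is rejected by transform()
try:
    df.transform(lambda col: col.sum())
    raised = False
except ValueError:
    raised = True

print(col_sums.tolist(), row_sums.tolist(), raised)
print(shifted)
```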

In [10]:
# Using the apply() method to manipulate a DataFrame

def add(x):
    return x + 10

display(df.apply(add))  # Adds 10 to every value; each column is passed to add() as a Series
# Equivalent to display(df.apply(lambda n: n + 10))

    A   B
0  15  18
1  16  19
2  17  20


In [11]:
# Using the transform() method to manipulate a DataFrame
def add(x):
    return x + 10

display(df.transform(add))  # Same result as apply() here, because add() keeps the shape of each column
# Equivalent to display(df.transform(lambda n: n + 10))

    A   B
0  15  18
1  16  19
2  17  20


In [12]:
# Using apply() on a Series: the function is called on each element
display(df['B'].apply(lambda n: n + 15))

0    23
1    24
2    25
Name: B, dtype: int64

In [13]:
# With axis=1, apply() passes each row to the function as a Series, so columns can be combined
df.apply(lambda n: n['A'] * 2 * n['B'] * 5, axis=1)

0    400
1    540
2    700
dtype: int64

In [14]:
# Using transform() on a Series
display(df['A'].transform(lambda n: n + 15))

0    20
1    21
2    22
Name: A, dtype: int64

In [15]:
# transform() with a function that reduces each row to a single value produces an ERROR,
# because the result no longer has the same shape as the original DataFrame
df.transform(lambda n: n['A'] * 2 * n['B'] * 5, axis=1)

ValueError: Function did not transform
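If a row-wise result is really what you need, `apply(..., axis=1)` (as in In [13]) is the right tool. Plain column arithmetic gives the same values and is usually faster. A sketch comparing the two on the same `df`:

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 7], 'B': [8, 9, 10]})

# Row-wise with apply(): each row arrives as a Series labelled by column name
via_apply = df.apply(lambda n: n['A'] * 2 * n['B'] * 5, axis=1)

# Same calculation with whole-column (vectorised) arithmetic
vectorized = df['A'] * 2 * df['B'] * 5

print(via_apply.tolist(), vectorized.tolist())
```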

**3.** Using the `transform()` method

In [18]:
# Using a single function (wrapped in a list, so the columns get a second level named 'exp')
display(df.transform(func=['exp']))      # Remember: the result must have the same axis length as the original df
display(df1.transform('sqrt', axis=1))

# Using a list of functions
display(df1.transform(func=['exp', 'sqrt'], axis=0))  # MultiIndex columns; same number of rows as the original
display(df.transform(['exp', 'sqrt'], axis=1))        # MultiIndex rows; same number of columns as the original

             A             B
           exp           exp
0   148.413159   2980.957987
1   403.428793   8103.083928
2  1096.633158  22026.465795


   temperature  humidity
0     8.062258  7.483315
1     7.810250  7.348469
2     8.366600  7.745967
3     5.477226  7.071068
4     5.291503  7.211103
5     5.000000  7.141428


   temperature             humidity
            exp      sqrt           exp      sqrt
0  1.694889e+28  8.062258  2.091659e+24  7.483315
1  3.104298e+26  7.810250  2.830753e+23  7.348469
2  2.515439e+30  8.366600  1.142007e+26  7.745967
3  1.068647e+13  5.477226  5.184706e+21  7.071068
4  1.446257e+12  5.291503  3.831008e+22  7.211103
5  7.200490e+10  5.000000  1.409349e+22  7.141428


                  A             B
0 exp    148.413159   2980.957987
  sqrt     2.236068      2.828427
1 exp    403.428793   8103.083928
  sqrt     2.449490      3.000000
2 exp   1096.633158  22026.465795
  sqrt     2.645751      3.162278


In [19]:
# Using a dictionary to map column names to functions (only the listed columns are returned)
df1.transform({'temperature': 'sqrt'}, axis=0)

   temperature
0     8.062258
1     7.810250
2     8.366600
3     5.477226
4     5.291503
5     5.000000
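A dictionary can also map each column to a different function, including NumPy ufuncs and lambdas. A sketch on the small `df` from above (the function choices are arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 7], 'B': [8, 9, 10]})

# Each key selects a column; each value is the function applied to that column.
# Columns not listed in the dictionary are left out of the result.
out = df.transform({'A': np.sqrt, 'B': lambda s: s * 2})
print(out)
```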


In [20]:
df2

    key   A  B
0     a   0  1
1     b   1  2
2     c   2  3
3     a   3  1
4     b   4  2
5     c   5  3
6     a   6  1
7     b   7  2
8     c   8  3
9     a   9  1
10    b  10  2
11    c  11  3


In [21]:
# transform() is most often used after groupby()
df2.groupby('key').transform('sum')  # Works: each group's total is broadcast back to every row of that group, so the length matches df2

     A   B
0   18   4
1   22   8
2   26  12
3   18   4
4   22   8
5   26  12
6   18   4
7   22   8
8   26  12
9   18   4
10  22   8
11  26  12


In [22]:
df2.groupby('key')['A'].transform('sum')

0     18
1     22
2     26
3     18
4     22
5     26
6     18
7     22
8     26
9     18
10    22
11    26
Name: A, dtype: int32
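One way to see what `transform('sum')` does: it is equivalent to aggregating once per group and then mapping each group's total back onto its rows. A sketch using `df2` as defined above:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'key': ['a', 'b', 'c'] * 4,
    'A': np.arange(12),
    'B': [1, 2, 3] * 4,
})

# transform('sum'): each row receives the total of its own group
broadcast = df2.groupby('key')['A'].transform('sum')

# The same result built by hand: aggregate once per group,
# then map the group totals back onto the 'key' column row by row
totals = df2.groupby('key')['A'].sum()   # a -> 18, b -> 22, c -> 26
by_hand = df2['key'].map(totals)

print(broadcast.tolist() == by_hand.tolist())
```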

In [46]:
# transform() can also filter rows with a group-level condition: keep the cities whose total sales exceed 50
temp_df = pd.DataFrame({
  'restaurant_id': [101,102,103,104,105,106,107],
  'address': ['A','B','C','D', 'E', 'F', 'G'],
  'city': ['London','London','London','Oxford','Oxford', 'Durham', 'Durham'],
  'sales': [10,500,48,12,21,22,14]
})

temp_df[temp_df.groupby('city')['sales'].transform('sum') > 50]

   restaurant_id  address    city  sales
0            101        A  London     10
1            102        B  London    500
2            103        C  London     48
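The same rows can be selected with `GroupBy.filter`, which keeps or drops whole groups based on a condition evaluated on each group. A sketch comparing both approaches on `temp_df`:

```python
import pandas as pd

temp_df = pd.DataFrame({
    'restaurant_id': [101, 102, 103, 104, 105, 106, 107],
    'address': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'city': ['London', 'London', 'London', 'Oxford', 'Oxford', 'Durham', 'Durham'],
    'sales': [10, 500, 48, 12, 21, 22, 14],
})

# Boolean mask built with transform(), as in the cell above
via_transform = temp_df[temp_df.groupby('city')['sales'].transform('sum') > 50]

# filter() receives each group as a DataFrame and keeps it if the function returns True
via_filter = temp_df.groupby('city').filter(lambda g: g['sales'].sum() > 50)

print(via_filter)
```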


In [49]:
temp_df.groupby('city').get_group('Oxford')   # Using get_group() to select a single group from a groupby object

   restaurant_id  address    city  sales
3            104        D  Oxford     12
4            105        E  Oxford     21
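`get_group()` returns the same rows as an ordinary boolean mask on the grouping column. A quick check, rebuilding `temp_df` from above:

```python
import pandas as pd

temp_df = pd.DataFrame({
    'restaurant_id': [101, 102, 103, 104, 105, 106, 107],
    'address': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'city': ['London', 'London', 'London', 'Oxford', 'Oxford', 'Durham', 'Durham'],
    'sales': [10, 500, 48, 12, 21, 22, 14],
})

# One group pulled out of the groupby object
oxford_group = temp_df.groupby('city').get_group('Oxford')

# The same rows selected with a plain boolean mask
oxford_mask = temp_df[temp_df['city'] == 'Oxford']

print(oxford_group)
```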


In [29]:
# transform() can also fill missing values group by group

temp_df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'B': [np.nan, 4, np.nan, 5,6,np.nan, 8,2]
})
temp_df

  name    B
0    A  NaN
1    A  4.0
2    B  NaN
3    B  5.0
4    B  6.0
5    C  NaN
6    C  8.0
7    C  2.0


In [25]:
# Filling with the overall mean of column B (5.0) ignores the groups
temp_df.fillna(temp_df['B'].mean())

  name    B
0    A  5.0
1    A  4.0
2    B  5.0
3    B  5.0
4    B  6.0
5    C  5.0
6    C  8.0
7    C  2.0


In [None]:
# Fill each group's missing values with that group's own mean
temp_df['B'] = temp_df.groupby('name')['B'].transform(lambda x: x.fillna(x.mean()))
temp_df

  name    B
0    A  4.0
1    A  4.0
2    B  5.5
3    B  5.0
4    B  6.0
5    C  5.0
6    C  8.0
7    C  2.0
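An alternative sketch that avoids the lambda: compute each group's mean with `transform('mean')` and pass the resulting Series to `fillna()`, which fills every missing value from the row with the same label.

```python
import numpy as np
import pandas as pd

temp_df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'B': [np.nan, 4, np.nan, 5, 6, np.nan, 8, 2],
})

# Every row gets the mean of its own group (NaNs are skipped when averaging)
group_means = temp_df.groupby('name')['B'].transform('mean')

# fillna() with a Series fills each NaN from the value at the same index label
filled = temp_df['B'].fillna(group_means)
print(filled.tolist())
```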


**4.** Using the `apply()` method

In [None]:
df1.apply(['sum'], axis=0)  # A single function wrapped in a list returns a DataFrame; df1.apply('sum') would return a Series
# Similar to df1.apply(lambda n: [n.sum()]), except that the row label would be 0 instead of 'sum'

     temperature  humidity
sum          279       323
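The difference between passing a string and a list to `apply()` can be checked directly. This sketch rebuilds `df1` from the values displayed above instead of reading `weather.csv`:

```python
import pandas as pd

# df1 rebuilt from the temperature/humidity values shown earlier
df1 = pd.DataFrame({
    'temperature': [65, 61, 70, 30, 28, 25],
    'humidity': [56, 54, 60, 50, 52, 51],
})

as_series = df1.apply('sum')   # Series: one total per column
as_frame = df1.apply(['sum'])  # DataFrame with a single row labelled 'sum'

print(type(as_series).__name__, type(as_frame).__name__)
print(as_frame)
```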


In [None]:
# With groupby(), apply() returns one row per group (a, b, c), so the output has
# shape (number_of_groups, number_of_columns) instead of the original length
df2.groupby('key').apply('sum')

      A   B
key
a    18   4
b    22   8
c    26  12


In [None]:
df2.groupby('key')['B'].apply('sum')

key
a     4
b     8
c    12
Name: B, dtype: int64
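Putting the two side by side on `df2` makes the shape difference explicit: `apply('sum')` after `groupby()` gives one row per group, while `transform('sum')` gives one row per original row.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'key': ['a', 'b', 'c'] * 4,
    'A': np.arange(12),
    'B': [1, 2, 3] * 4,
})

grouped = df2.groupby('key')[['A', 'B']]

aggregated = grouped.apply('sum')     # one row per group: a, b, c
broadcast = grouped.transform('sum')  # one row per original row of df2

print(aggregated.shape, broadcast.shape)
```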

In [None]:
# An aggregated result such as sum or count cannot be obtained with transform(),
# because it would violate the rule that the resulting df must have the same
# axis length as the original df.

df1.transform('count', axis=0)

ValueError: Function did not transform
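Aggregates are still available, just not through `DataFrame.transform()`: use `agg()` for one value per column, or `groupby().transform('count')` when each row should carry its group's count. A sketch, with `df1` rebuilt from the displayed values and `df2` as defined at the top:

```python
import numpy as np
import pandas as pd

# df1 rebuilt from the displayed weather values; df2 as defined at the top
df1 = pd.DataFrame({
    'temperature': [65, 61, 70, 30, 28, 25],
    'humidity': [56, 54, 60, 50, 52, 51],
})
df2 = pd.DataFrame({
    'key': ['a', 'b', 'c'] * 4,
    'A': np.arange(12),
    'B': [1, 2, 3] * 4,
})

# agg() is allowed to reduce: one count per column
counts = df1.agg('count')

# After groupby(), transform('count') broadcasts each group's count of
# non-missing values back to its rows, so the result keeps df2's length
row_counts = df2.groupby('key')['A'].transform('count')

print(counts.tolist(), len(row_counts))
```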