## Pandas apply, map and applymap

#### When to use apply, applymap and map?
__apply__: Use it to apply a function along an axis of a dataframe. The function receives a Series: one column at a time, indexed by the row labels (axis=0, the default), or one row at a time, indexed by the column labels (axis=1). For example, __df.apply(np.square)__ returns a dataframe with every number squared.
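To make the axis semantics concrete, here is a minimal sketch that records the labels of each Series that apply hands to the function. The frame `df_demo` and the helper `describe_input` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Small illustrative frame (not part of the original notebook)
df_demo = pd.DataFrame(
    {"a": [10, 20, 30], "b": [5, 10, 15]},
    index=["A1", "B1", "C1"],
)

def describe_input(s):
    # Return the labels of the Series that apply passes in, joined as one string
    return ",".join(s.index)

# axis=0 (default): one call per column; each Series is indexed by the row labels
per_column = df_demo.apply(describe_input)

# axis=1: one call per row; each Series is indexed by the column labels
per_row = df_demo.apply(describe_input, axis=1)

print(per_column)
print(per_row)
```

With axis=0 the function sees the row labels A1, B1, C1; with axis=1 it sees the column labels a, b.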

__applymap__: Use it for element-wise operations: the function is called once for every cell of the dataframe and returns a scalar. In some cases it runs faster than apply, but it is worth timing both before committing to a heavy operation. Example: __df.applymap(np.square)__ returns a dataframe with every number squared.
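One caveat when running this on a recent pandas: since pandas 2.1, `DataFrame.applymap` is deprecated in favor of the equivalent `DataFrame.map`. The sketch below uses a hypothetical helper, `elementwise`, that picks whichever method the installed version provides, and checks that the element-wise result matches the column-wise `apply(np.square)`.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not part of the original notebook)
df_demo = pd.DataFrame(
    {"a": [10, 20, 30], "b": [5, 10, 15]},
    index=["A1", "B1", "C1"],
)

def elementwise(frame, func):
    # Hypothetical helper: prefer DataFrame.map (pandas >= 2.1),
    # fall back to applymap on older versions
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)

squared_elementwise = elementwise(df_demo, lambda x: x ** 2)  # one call per cell
squared_by_column = df_demo.apply(np.square)                  # one call per column

print(squared_elementwise)
```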

__map__: It works on a Series and substitutes each value using a lookup dictionary, another Series, or a function. Values with no match in the lookup are replaced with NaN. Because it operates on a single Series, it is typically a lightweight choice for per-column lookups. Example: __df['Col1'].map({'Trenton': 'New Jersey', 'NYC': 'New York', 'Los Angeles': 'California'})__
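As a small sketch of the dictionary lookup described above, the following uses a made-up Series `cities` that contains one value ("Chicago") missing from the dictionary, to show the NaN substitution. The `fillna` fallback label is an optional extra, not something the original example does.

```python
import pandas as pd

# Illustrative data; "Chicago" is deliberately absent from the lookup
cities = pd.Series(["Trenton", "NYC", "Los Angeles", "Chicago"])
city_to_state = {
    "Trenton": "New Jersey",
    "NYC": "New York",
    "Los Angeles": "California",
}

states = cities.map(city_to_state)        # unmatched values become NaN
states_filled = states.fillna("Unknown")  # optional: replace NaN with a fallback label

print(states)
print(states_filled)
```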

In [1]:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [10, 20, 30], 'b': [5, 10, 15], 'c': [10, 100, 1000]}, index=['A1', 'B1', 'C1'])
df

|    | a  | b  | c    |
|----|----|----|------|
| A1 | 10 | 5  | 10   |
| B1 | 20 | 10 | 100  |
| C1 | 30 | 15 | 1000 |


### How to use Pandas apply?

In [2]:
# Define functions
def multiply_by_2(col):
    # called once per column (axis=0); col is a Series holding one column
    return col*2

def multiply_col1_col2(row):
    # called once per row (axis=1); row is a Series indexed by the column names
    return row['a']*row['b']

#### Apply function across dataframe columns (axis=0)

In [3]:
# axis defaults to 0, so passing the function alone applies it to each column
df.apply(multiply_by_2) # All the cell values are doubled

|    | a  | b  | c    |
|----|----|----|------|
| A1 | 20 | 10 | 20   |
| B1 | 40 | 20 | 200  |
| C1 | 60 | 30 | 2000 |


#### Apply function across dataframe rows (axis=1)
Now apply the function multiply_col1_col2 across the rows of the dataframe by setting the axis parameter to 1 (axis=1).

In [5]:
df.apply(multiply_col1_col2,axis=1)  # Returns a Series, indexed like df, holding the product of columns a and b for each row

A1     50
B1    200
C1    450
dtype: int64

#### Create a new column col1Xcol2 from the above Series

In [6]:
df2 = df.copy()
df2['col1Xcol2'] = df.apply(multiply_col1_col2,axis=1)
df2

|    | a  | b  | c    | col1Xcol2 |
|----|----|----|------|-----------|
| A1 | 10 | 5  | 10   | 50        |
| B1 | 20 | 10 | 100  | 200       |
| C1 | 30 | 15 | 1000 | 450       |


### Pandas apply function with the result_type parameter
result_type accepts __expand__, __reduce__ or __broadcast__ and controls the type of object that apply returns. The default value is __None__, and the parameter only takes effect when axis=1.

In the scenario above, setting result_type to broadcast spreads the value returned for each row across that row, so the output is a dataframe with the original shape in which every cell of a row holds the a*b product.

In [7]:
# The result is broadcast to the original shape of the frame; the original index and columns are retained
df2.apply(multiply_col1_col2,axis=1,result_type='broadcast')

|    | a   | b   | c   | col1Xcol2 |
|----|-----|-----|-----|-----------|
| A1 | 50  | 50  | 50  | 50        |
| B1 | 200 | 200 | 200 | 200       |
| C1 | 450 | 450 | 450 | 450       |


In [8]:
def multi_and_list(col):
    '''
    function that returns a list value
    '''
    return [col['a']*2,col['b']*2,col['c']*2]

In [9]:
# Now apply this function across the dataframe rows (axis=1) with result_type as expand
df.apply(multi_and_list,axis=1,result_type='expand')

|    | 0  | 1  | 2    |
|----|----|----|------|
| A1 | 20 | 10 | 20   |
| B1 | 40 | 20 | 200  |
| C1 | 60 | 30 | 2000 |


With result_type set to expand, list-like results are turned into columns, so apply returns a dataframe even though the function returns a list.

Setting result_type to reduce does the opposite: apply returns a Series where possible rather than expanding list-like results.

In [11]:
df.apply(multi_and_list,axis=1,result_type='reduce') 

A1      [20, 10, 20]
B1     [40, 20, 200]
C1    [60, 30, 2000]
dtype: object
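The three result_type settings can be compared side by side. This is a minimal sketch on a copy of the frame used above; `df_demo` and `double_as_list` are hypothetical names, and the checks only inspect the type and labels of each result.

```python
import pandas as pd

# Same values as the frame used in this notebook
df_demo = pd.DataFrame(
    {"a": [10, 20, 30], "b": [5, 10, 15], "c": [10, 100, 1000]},
    index=["A1", "B1", "C1"],
)

def double_as_list(row):
    # Returns a list-like result for each row
    return [row["a"] * 2, row["b"] * 2, row["c"] * 2]

# expand: each list element becomes its own column (labelled 0, 1, 2)
expanded = df_demo.apply(double_as_list, axis=1, result_type="expand")

# reduce: the lists are kept as-is, one list per row, in a Series
reduced = df_demo.apply(double_as_list, axis=1, result_type="reduce")

# broadcast: a scalar per row is spread over the original columns
broadcast = df_demo.apply(lambda row: row["a"] * row["b"], axis=1, result_type="broadcast")

print(type(expanded).__name__, list(expanded.columns))
print(type(reduced).__name__, reduced.dtype)
print(type(broadcast).__name__, list(broadcast.columns))
```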

### How to use lambda with apply?

In [13]:
# Multiply the values in columns a and b with a lambda function. The function must run once per row, so we use axis=1
df.apply(lambda x: x['a']*x['b'],axis=1)

A1     50
B1    200
C1    450
dtype: int64
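For a simple product like this, the same result can usually be obtained without apply by multiplying the two columns directly. The sketch below, on a hypothetical copy `df_demo` of the frame above, checks that both routes agree.

```python
import pandas as pd

# Same values as the frame used in this notebook
df_demo = pd.DataFrame(
    {"a": [10, 20, 30], "b": [5, 10, 15], "c": [10, 100, 1000]},
    index=["A1", "B1", "C1"],
)

# Row-wise: the lambda is called once per row
row_wise = df_demo.apply(lambda x: x["a"] * x["b"], axis=1)

# Column-wise: a single vectorized multiplication of two Series
column_wise = df_demo["a"] * df_demo["b"]

print(column_wise)
```

The column-wise form avoids a Python-level function call per row, which tends to matter on larger frames.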

### Create a function with arguments

In [14]:
from math import radians, cos, sin, asin, sqrt

In [15]:
def haversine(row, rad):
    '''
    Calculate the great-circle distance between the origin and destination points of a row
    (coordinates in decimal degrees). rad is the radius of the earth and sets the unit of the result.
    '''
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [row['dest_long'], row['dest_lat'], row['orig_long'], row['orig_lat']])

    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = rad # Radius of the earth: 6371 for kilometers, 3956 for miles
    return c * r

In [16]:
# Create dataframe with origin and destination city latitude and longitude
df_coords = pd.DataFrame({'orig_city':['New York','Charlotte','Boston','Bridgewater'],
'orig_lat':[40.7128,35.2271,42.36,40.594],
'orig_long':[74.006,80.843,71.0589,74.604],
'dest_city':['Trenton','Texas','Sunnyvale','San Jose'],
'dest_lat':[40.2206,31.9686,37.3688,37.3382],
'dest_long':[40.2206,31.9686,37.3688,37.3382]})
df_coords

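|   | orig_city   | orig_lat | orig_long | dest_city | dest_lat | dest_long |
|---|-------------|----------|-----------|-----------|----------|-----------|
| 0 | New York    | 40.7128  | 74.006    | Trenton   | 40.2206  | 40.2206   |
| 1 | Charlotte   | 35.2271  | 80.843    | Texas     | 31.9686  | 31.9686   |
| 2 | Boston      | 42.36    | 71.0589   | Sunnyvale | 37.3688  | 37.3688   |
| 3 | Bridgewater | 40.594   | 74.604    | San Jose  | 37.3382  | 37.3382   |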


### Apply function with arguments
Now we will find the haversine distance between the origin and destination city of each row in the above dataframe by passing the haversine function defined above to apply.

The haversine function requires the argument rad, and the dataframe has no radius column. To calculate the distance in miles, we pass the radius through apply, either positionally with args=(3956,) or by keyword with rad=3956.

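In [24]:
# Pass the radius positionally through args; assign the result to df_coords, the frame the function was applied to
df_coords['orig_dest_haver_dist']=df_coords.apply(haversine,axis=1,args=(3956,))
df_coords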

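|   | orig_city   | orig_lat | orig_long | dest_city | dest_lat | dest_long | orig_dest_haver_dist |
|---|-------------|----------|-----------|-----------|----------|-----------|----------------------|
| 0 | New York    | 40.7128  | 74.006    | Trenton   | 40.2206  | 40.2206   | 1763.974392          |
| 1 | Charlotte   | 35.2271  | 80.843    | Texas     | 31.9686  | 31.9686   | 2791.594812          |
| 2 | Boston      | 42.36    | 71.0589   | Sunnyvale | 37.3688  | 37.3688   | 1806.117892          |
| 3 | Bridgewater | 40.594   | 74.604    | San Jose  | 37.3382  | 37.3382   | 1998.176818          |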


In [21]:
# Equivalent call: pass the radius as a keyword argument, which apply forwards to haversine
df_coords['orig_dest_haver_dist']=df_coords.apply(haversine,axis=1,rad=3956)
df_coords

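|   | orig_city   | orig_lat | orig_long | dest_city | dest_lat | dest_long | orig_dest_haver_dist |
|---|-------------|----------|-----------|-----------|----------|-----------|----------------------|
| 0 | New York    | 40.7128  | 74.006    | Trenton   | 40.2206  | 40.2206   | 1763.974392          |
| 1 | Charlotte   | 35.2271  | 80.843    | Texas     | 31.9686  | 31.9686   | 2791.594812          |
| 2 | Boston      | 42.36    | 71.0589   | Sunnyvale | 37.3688  | 37.3688   | 1806.117892          |
| 3 | Bridgewater | 40.594   | 74.604    | San Jose  | 37.3382  | 37.3382   | 1998.176818          |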
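The row-wise call works, but the same formula can also be written against whole columns with NumPy, which avoids one Python function call per row. The following is a sketch under that assumption, not part of the original notebook: `haversine_vectorized` and `coords` are hypothetical names, the row-wise function is restated so the block is self-contained, and the data are the first two rows of the coordinates above.

```python
from math import asin, cos, radians, sin, sqrt

import numpy as np
import pandas as pd

def haversine(row, rad):
    # Row-wise version, same formula as the function defined earlier
    lon1, lat1, lon2, lat2 = map(
        radians,
        [row["dest_long"], row["dest_lat"], row["orig_long"], row["orig_lat"]],
    )
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * rad

def haversine_vectorized(frame, rad):
    # Hypothetical column-wise variant: every step operates on whole columns
    lon1, lat1 = np.radians(frame["dest_long"]), np.radians(frame["dest_lat"])
    lon2, lat2 = np.radians(frame["orig_long"]), np.radians(frame["orig_lat"])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a)) * rad

# First two rows of the coordinates used above
coords = pd.DataFrame({
    "orig_lat": [40.7128, 35.2271],
    "orig_long": [74.006, 80.843],
    "dest_lat": [40.2206, 31.9686],
    "dest_long": [40.2206, 31.9686],
})

row_wise = coords.apply(haversine, axis=1, rad=3956)  # radius passed by keyword
column_wise = haversine_vectorized(coords, 3956)

print(np.allclose(row_wise, column_wise))
```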


### How to use Pandas applymap?
applymap performs an element-wise operation on a dataframe: the function is called on every element and returns a scalar for each. We will square each number in the dataframe df created at the start, using a lambda expression with applymap.

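In [25]:
# Square every value in the original columns a, b and c
df[['a', 'b', 'c']].applymap(lambda x: x**2)

|    | a   | b   | c       |
|----|-----|-----|---------|
| A1 | 100 | 25  | 100     |
| B1 | 400 | 100 | 10000   |
| C1 | 900 | 225 | 1000000 |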


__Remark:__ A vectorized way of doing this operation is available, df**2, which is much faster and better optimized than calling a Python function on every element.
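A quick check that the vectorized form gives the same values as the element-wise one. In this sketch the element-wise route is written as `Series.map` applied to each column, an assumption made so the code runs on both older and newer pandas versions; `df_demo` is a hypothetical copy of the frame above.

```python
import pandas as pd

# Same values as the frame used in this notebook
df_demo = pd.DataFrame(
    {"a": [10, 20, 30], "b": [5, 10, 15], "c": [10, 100, 1000]},
    index=["A1", "B1", "C1"],
)

# Element-wise: the lambda is called once for every cell
elementwise = df_demo.apply(lambda col: col.map(lambda x: x ** 2))

# Vectorized: a single operation on the whole frame
vectorized = df_demo ** 2

print(vectorized)
```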

### How to use Pandas map?
map substitutes each value of a Series using a lookup, which can be a dictionary, another Series, or a function.
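A minimal sketch of the Series and function forms of the lookup follows. The data and the names `codes` and `lookup` are made up for illustration, and one entry is deliberately missing to show how gaps are handled.

```python
import pandas as pd

# Illustrative data with one missing entry
codes = pd.Series(["nj", "ny", None, "ca"])

# Lookup via another Series: its index holds the keys, its values the replacements
lookup = pd.Series({"nj": "New Jersey", "ny": "New York", "ca": "California"})
by_series = codes.map(lookup)

# Lookup via a function; na_action="ignore" leaves missing values untouched
# instead of calling the function on them
by_function = codes.map(str.upper, na_action="ignore")

print(by_series)
print(by_function)
```

Without `na_action="ignore"`, `str.upper` would be called on the missing entry and raise an error, so the option is useful whenever the function cannot handle missing values.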

#### References
1. https://kanoki.org/2019/11/25/pandas-apply-map-and-applymap/