# The Apply Method in Pandas

The `apply` method runs a function over a pandas Series or DataFrame. On a Series (a single column), it passes each element to the function. On a DataFrame, it passes either each column (`axis=0`, the default) or each row (`axis=1`) to the function as a Series.
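Here is a minimal sketch of that difference. It uses a small hypothetical frame named `demo`, which is not part of the original walkthrough, and shows what the function receives in each case.

```python
import pandas as pd

# A small, hypothetical frame used only for illustration
demo = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.apply: the function receives each element (a scalar)
doubled = demo["a"].apply(lambda x: x * 2)

# DataFrame.apply with axis=0 (the default): the function receives each column as a Series
col_sums = demo.apply(lambda col: col.sum(), axis=0)

# DataFrame.apply with axis=1: the function receives each row as a Series
row_sums = demo.apply(lambda row: row.sum(), axis=1)

print(doubled.tolist())   # [2, 4, 6]
print(col_sums.tolist())  # [6, 60]
print(row_sums.tolist())  # [11, 22, 33]
```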

To experiment with `apply`, we first import two libraries: NumPy, which we will use to generate random data, and pandas.

In [1]:
import numpy as np
import pandas as pd

Next, we create a dataframe with 10,000 rows and 5 columns of random integers. The upper bound passed to `np.random.randint` is exclusive, so the values range from 1 to 999.

In [2]:
df = pd.DataFrame(np.random.randint(1, 1000, size=(10000, 5)), columns=list('LMNOP'))
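The frame above is regenerated with different values on every run, so your outputs will not match the ones shown here exactly. If you want reproducible numbers, one option is NumPy's `Generator` API with a fixed seed. The seed `0` and the name `frame` are arbitrary choices for this sketch, and the values it produces will still differ from those in this notebook.

```python
import numpy as np
import pandas as pd

# Assumption: seed 0 is an arbitrary choice; any fixed integer makes the draw repeatable
rng = np.random.default_rng(0)

# Like np.random.randint, Generator.integers excludes the upper bound by default,
# so every value falls between 1 and 999 inclusive
data = rng.integers(1, 1000, size=(10000, 5))
frame = pd.DataFrame(data, columns=list("LMNOP"))

print(frame.shape)  # (10000, 5)
print(frame.to_numpy().min() >= 1, frame.to_numpy().max() <= 999)  # True True
```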

We will now take a look at the first few rows of the dataframe.

In [4]:
df.head()

|   | L | M | N | O | P |
|---|---|---|---|---|---|
| 0 | 629 | 594 | 412 | 553 | 292 |
| 1 | 293 | 699 | 963 | 252 | 183 |
| 2 | 225 | 973 | 665 | 4 | 381 |
| 3 | 548 | 24 | 731 | 753 | 177 |
| 4 | 713 | 73 | 747 | 777 | 682 |


Because `apply` runs a function on every entry of a Series, we can either write a lambda function on the fly or define a regular named function and pass it to `apply`. The function below checks a single value. If the value is greater than 500, it returns `big`. Otherwise, including a value of exactly 500, it returns `small`.

In [5]:
def my_func(cell):
    if cell > 500:
        return 'big'
    else:
        return 'small'
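The same rule can also be written as a lambda and defined on the fly inside the `apply` call. The short Series `values` below is hypothetical. It includes 500 to show the boundary case.

```python
import pandas as pd

def my_func(cell):
    if cell > 500:
        return 'big'
    else:
        return 'small'

# A small, hypothetical Series; exactly 500 is labelled 'small'
values = pd.Series([100, 500, 501, 999])

# The same rule written as a lambda, defined inline in the apply call
labels_lambda = values.apply(lambda cell: 'big' if cell > 500 else 'small')
labels_func = values.apply(my_func)

print(labels_lambda.tolist())             # ['small', 'small', 'big', 'big']
print(labels_lambda.equals(labels_func))  # True
```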

We will temporarily create a column `Q` to hold the values returned by applying `my_func` to column `L`.

In [8]:
df['Q'] = df['L'].apply(my_func)

In [10]:
df.head()

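|   | L | M | N | O | P | Q |
|---|---|---|---|---|---|---|
| 0 | 629 | 594 | 412 | 553 | 292 | big |
| 1 | 293 | 699 | 963 | 252 | 183 | small |
| 2 | 225 | 973 | 665 | 4 | 381 | small |
| 3 | 548 | 24 | 731 | 753 | 177 | big |
| 4 | 713 | 73 | 747 | 777 | 682 | big |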
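For a simple threshold like this one, `apply` is not the only option. `apply` calls the Python function once per element. NumPy's `np.where`, by contrast, evaluates the condition on the whole column at once, which is typically faster on large data. The sketch below uses the first five values of column `L` from the table above to check that both approaches give the same labels.

```python
import numpy as np
import pandas as pd

# The first five values of column L from the table above
values = pd.Series([629, 293, 225, 548, 713])

# apply: the lambda runs once for each element
via_apply = values.apply(lambda cell: "big" if cell > 500 else "small")

# np.where: the comparison values > 500 is evaluated for the whole Series in one step
via_where = pd.Series(np.where(values > 500, "big", "small"), index=values.index)

print(via_apply.tolist())              # ['big', 'small', 'small', 'big', 'big']
print((via_apply == via_where).all())  # True
```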


Quick summary statistics for the numeric columns of the dataframe are shown below. Column `Q` holds strings, so `describe()` leaves it out by default.

In [13]:
df.describe()

|       | L | M | N | O | P |
|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 501.306 | 501.802 | 501.6683 | 496.8143 | 504.3349 |
| std | 286.848333 | 288.056495 | 286.701722 | 289.389585 | 287.392113 |
| min | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 25% | 253.0 | 253.0 | 257.0 | 247.0 | 256.0 |
| 50% | 501.0 | 504.0 | 500.0 | 499.0 | 504.5 |
| 75% | 750.0 | 750.0 | 748.0 | 747.0 | 752.25 |
| max | 999.0 | 999.0 | 999.0 | 999.0 | 999.0 |


You can also use `apply` on a whole dataframe to calculate aggregates along either axis.

The function below returns the mean of the Series it receives, rounded to the nearest integer.

In [14]:
def average(col):
    return round(col.mean())

Passing `axis=0` (the default, written out here for clarity) makes `apply` call the function once per column, which gives one average per column. For this to work, every column must be numeric. Column `Q` holds strings, so taking its mean raises an error. We therefore drop column `Q` first.

In [15]:
df.drop(columns='Q', inplace=True)
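Dropping `Q` works here because we no longer need it. If you wanted to keep the string column, one alternative is to select only the numeric columns before calling `apply`. The sketch below uses a small hypothetical frame, `mixed`, to show both the failure and the `select_dtypes` workaround. As a precaution, it catches both `TypeError` and `ValueError`, since the exact exception raised for a non-numeric mean may vary between pandas versions.

```python
import pandas as pd

def average(col):
    return round(col.mean())

# Hypothetical frame: one numeric column and one string column like Q
mixed = pd.DataFrame({"L": [629, 293, 225], "Q": ["big", "small", "small"]})

# Averaging the string column fails, which is why Q has to be excluded
try:
    mixed.apply(average, axis=0)
    failed = False
except (TypeError, ValueError):
    failed = True

# Alternative to dropping Q: apply the function only to the numeric columns
numeric_avgs = mixed.select_dtypes(include="number").apply(average, axis=0)

print(failed)                    # True
print(list(numeric_avgs.index))  # ['L']
```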

In [18]:
df.apply(average, axis=0)

L    501
M    502
N    502
O    497
P    504
dtype: int64
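For a plain mean, pandas also has a built-in, vectorized method. The sketch below uses a small hypothetical frame, `sample`, to check that `apply(average, axis=0)` agrees with `DataFrame.mean` followed by `round`. Because `Series.round` keeps a float dtype, the built-in result is cast to `int` for the comparison.

```python
import pandas as pd

def average(col):
    return round(col.mean())

# A small, hypothetical numeric frame
sample = pd.DataFrame({"L": [1, 2, 4], "M": [10, 20, 31]})

# apply with axis=0 hands each column to average()
via_apply = sample.apply(average, axis=0)

# Built-in equivalent: column means, rounded, then cast from float to int
via_builtin = sample.mean(axis=0).round().astype(int)

print(via_builtin.tolist())              # [2, 20]
print((via_apply == via_builtin).all())  # True
```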

We can aggregate row-wise as well by setting `axis=1`, which passes each row to the function. The cell below creates a new column, `average`, that holds the rounded mean of each row.

In [19]:
df['average'] = df.apply(average, axis=1)

In [21]:
df.head()

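|   | L | M | N | O | P | average |
|---|---|---|---|---|---|---|
| 0 | 629 | 594 | 412 | 553 | 292 | 496 |
| 1 | 293 | 699 | 963 | 252 | 183 | 478 |
| 2 | 225 | 973 | 665 | 4 | 381 | 450 |
| 3 | 548 | 24 | 731 | 753 | 177 | 447 |
| 4 | 713 | 73 | 747 | 777 | 682 | 598 |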
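To make the row-wise case concrete, the sketch below copies the first two rows of the table above into a hypothetical frame named `rows`. It reproduces the averages 496 and 478. It also shows that each row arrives as a Series labelled by column name, so a row-wise function can select individual values such as `row["L"]`.

```python
import pandas as pd

def average(values):
    return round(values.mean())

# The first two rows of the dataframe shown above, copied into a hypothetical frame
rows = pd.DataFrame({
    "L": [629, 293], "M": [594, 699], "N": [412, 963], "O": [553, 252], "P": [292, 183],
})

# axis=1 hands each row to average() as a Series indexed by column name
row_avgs = rows.apply(average, axis=1)

# Because each row is labelled by column name, a row-wise function can pick values by label
l_plus_m = rows.apply(lambda row: row["L"] + row["M"], axis=1)

print(row_avgs.tolist())  # the two row averages, 496 and 478
print(l_plus_m.tolist())  # [1223, 992]
```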


This only scratches the surface, but it gives you a solid foundation for getting started with `apply`!