In pandas, missing values are usually represented as NaN (Not a Number). If they are not handled, these gaps can cause problems during data processing and lead to incorrect analysis and misleading conclusions.

pandas provides several methods for handling missing values, including dropna(), fillna() and combine_first().
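As a minimal sketch of combine_first(), the example below fills the gaps in one DataFrame with values from a second one that shares the same row and column labels. The frames `primary` and `backup` and their values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: `primary` has gaps, `backup` holds fallback values.
primary = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [np.nan, 5.0, 6.0]})
backup = pd.DataFrame({'A': [10.0, 20.0, 30.0], 'B': [40.0, 50.0, 60.0]})

# Keep the values from `primary`; where `primary` is NaN, take the value
# at the same row and column label from `backup`.
combined = primary.combine_first(backup)
print(combined)
```

Unlike fillna() with a single constant, this lets each missing cell receive its own fallback value.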

In [9]:
import pandas as pd
import numpy as np

Remove Rows Containing Missing Values

In [12]:
data = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, 5],
    'D': [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data)
print("Before Dropping NA: \n", df)
df.dropna(inplace=True)
print("After Dropping NA: \n", df)


Before Dropping NA: 
      A    B    C  D
0  1.0  NaN  1.0  1
1  2.0  2.0  2.0  2
2  NaN  3.0  3.0  3
3  4.0  4.0  NaN  4
4  5.0  5.0  5.0  5
After Dropping NA: 
      A    B    C  D
1  2.0  2.0  2.0  2
4  5.0  5.0  5.0  5
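Dropping every row that has any gap can discard a lot of data. The sketch below, using the same sample data, shows two narrower variants: checking only selected columns with subset, and dropping columns instead of rows with axis=1. The variable names `by_subset` and `complete_cols` are illustrative. Without inplace=True, dropna() returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, 5],
    'D': [1, 2, 3, 4, 5]
})

# Drop a row only if it is missing a value in column A or B.
by_subset = df.dropna(subset=['A', 'B'])
print(by_subset)

# Drop columns (axis=1) that contain any missing value.
complete_cols = df.dropna(axis=1)
print(complete_cols)
```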


Replace Missing Values


In [13]:
data = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, 5],
    'D': [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data)

# replace missing values with 0
df.fillna(value=0, inplace=True)
print(df)

     A    B    C  D
0  1.0  0.0  1.0  1
1  2.0  2.0  2.0  2
2  0.0  3.0  3.0  3
3  4.0  4.0  0.0  4
4  5.0  5.0  5.0  5
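A single constant is not always the right replacement for every column. As a rough sketch on the same sample data, fillna() also accepts a dictionary that maps column names to fill values, and ffill() carries the last valid value forward. The fill values 0 and -1 below are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, 5],
    'D': [1, 2, 3, 4, 5]
})

# Different fill value per column; columns not listed (C) are left untouched.
filled = df.fillna({'A': 0, 'B': -1})
print(filled)

# Forward fill: each gap takes the previous valid value in its column.
# A gap in the first row (column B here) has nothing before it and stays NaN.
ffilled = df.ffill()
print(ffilled)
```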


Replace Missing Values With Mean, Median and Mode

In [15]:
data = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, 5],
    'D': [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data)

# replace missing values with mean
# (assign the result back rather than calling inplace=True on df['A'])
df['A'] = df['A'].fillna(df['A'].mean())

# replace missing values with median
df['B'] = df['B'].fillna(df['B'].median())

# replace missing values with mode (mode() returns a Series; take the first value)
df['C'] = df['C'].fillna(df['C'].mode()[0])

print(df)

     A    B    C  D
0  1.0  3.5  1.0  1
1  2.0  2.0  2.0  2
2  3.0  3.0  3.0  3
3  4.0  4.0  1.0  4
4  5.0  5.0  5.0  5
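The same result can be obtained with one fillna() call on the whole DataFrame by passing a dictionary of per-column statistics. This is a minimal sketch on the same sample data; the names `fill_values` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, 5],
    'D': [1, 2, 3, 4, 5]
})

# Compute one replacement value per column.
fill_values = {
    'A': df['A'].mean(),     # mean of the non-missing values
    'B': df['B'].median(),   # median of the non-missing values
    'C': df['C'].mode()[0],  # mode() returns a Series; take its first entry
}

# Returns a new DataFrame; the original df keeps its NaN values.
filled = df.fillna(fill_values)
print(filled)
```

Because the call operates on the DataFrame itself rather than on a selected column, it does not rely on modifying an intermediate object.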


Note: calling fillna() with inplace=True on a single selected column, as in df[col].fillna(value, inplace=True), currently makes pandas emit a warning. The behavior will change in pandas 3.0, where this form will never work because the intermediate object df[col] always behaves as a copy. To update the original DataFrame, use df[col] = df[col].fillna(value) or df.fillna({col: value}, inplace=True) instead.