# Handling Missing Values – A Practical Guide

This notebook demonstrates practical techniques to handle missing values using Pandas and NumPy.

## Import Required Libraries

In [None]:
import pandas as pd
import numpy as np

## Create Sample Dataset with Missing Values

In [None]:
df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104, 105, 106],
    'Age': [25, np.nan, 30, 28, np.nan, 40],
    'City': ['Bangalore', 'Mumbai', np.nan, 'Delhi', 'Mumbai', np.nan],
    'Salary': [50000, 60000, np.nan, 55000, 58000, np.nan],
    'Gender': ['Male', 'Female', 'Female', np.nan, 'Male', np.nan]
})

df

## Check Missing Values

In [None]:
df.isnull().sum()

In [None]:
(df.isnull().sum() / len(df)) * 100
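
The two checks above can be combined into a single summary table. The sketch below uses a small made-up frame (`sample`) and the illustrative column names `missing_count` and `missing_pct`, none of which come from the notebook; `isna()` performs the same check as `isnull()`.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data, not the notebook's df)
sample = pd.DataFrame({
    'Age': [25, np.nan, 30, 28],
    'City': ['Bangalore', None, None, 'Delhi'],
})

# One row per column: absolute count and percentage of missing entries
missing_summary = pd.DataFrame({
    'missing_count': sample.isna().sum(),
    'missing_pct': sample.isna().mean() * 100,  # mean of booleans = fraction missing
}).sort_values('missing_pct', ascending=False)

print(missing_summary)
```

Sorting by percentage puts the columns that need the most attention at the top.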

## Filling Missing Values with Constant Values

In [None]:
df['City'] = df['City'].fillna('Unknown')
df['Gender'] = df['Gender'].fillna('Unknown')

df
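
When several columns need a placeholder, `DataFrame.fillna` also accepts a dictionary that maps column names to fill values. The following is a minimal sketch on a made-up frame (`people` is an illustrative name, not part of the notebook).

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({
    'City': ['Mumbai', np.nan],
    'Gender': [np.nan, 'Male'],
    'Age': [np.nan, 40.0],
})

# Each listed column gets its own placeholder;
# columns not listed (Age here) are left untouched
filled = people.fillna({'City': 'Unknown', 'Gender': 'Unknown'})

print(filled)
```

This keeps all constant fills in one call and leaves numeric columns free for a different strategy.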

## Forward Fill (ffill)

In [None]:
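# Propagate the last observed salary forward into each gap.
# fillna(method='ffill') is deprecated since pandas 2.1; ffill() is the direct equivalent.
df['Salary'] = df['Salary'].ffill()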

df
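
Forward fill has two behaviours worth knowing: a gap at the very start has no earlier value to copy, and long gaps are filled in full unless a `limit` is set. The sketch below shows both on a made-up series (`readings` is an illustrative name).

```python
import numpy as np
import pandas as pd

# Hypothetical series with a leading gap and a two-value gap
readings = pd.Series([np.nan, 10.0, np.nan, np.nan, 13.0])

# The leading NaN stays, because there is no earlier value to carry forward
filled = readings.ffill()

# limit=1 fills at most one consecutive NaN after each valid value
capped = readings.ffill(limit=1)

print(filled.tolist())  # [nan, 10.0, 10.0, 10.0, 13.0]
print(capped.tolist())  # [nan, 10.0, 10.0, nan, 13.0]
```

Backward fill mirrors this: it leaves a trailing gap unfilled instead of a leading one.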

## Backward Fill (bfill)

In [None]:
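# Pull the next observed age backward into each gap.
# fillna(method='bfill') is deprecated since pandas 2.1; bfill() is the direct equivalent.
df['Age'] = df['Age'].bfill()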

df

## Filling with Mean, Median, Mode

In [None]:
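# If earlier cells already filled these columns, no NaN remains and these calls change nothing.
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())    # mean: sensitive to outliers
df['Age'] = df['Age'].fillna(df['Age'].median())           # median: robust to outliers
# mode() can return several values on a tie; [0] takes the first one
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])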

df
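
The choice between mean and median matters most when a column contains outliers. The following sketch uses made-up values (`salaries` and `genders` are illustrative names) to show how far apart the two statistics can be, and how `mode()` ignores missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries with one extreme value and one gap
salaries = pd.Series([50000, 52000, 54000, 500000, np.nan])

mean_value = salaries.mean()      # 164000.0, pulled up by the 500000 outlier
median_value = salaries.median()  # 53000.0, unaffected by the outlier

filled_mean = salaries.fillna(mean_value)
filled_median = salaries.fillna(median_value)

# For categorical data, mode() skips NaN and returns the most frequent value(s)
genders = pd.Series(['Male', 'Female', 'Female', np.nan])
most_common = genders.mode()[0]   # 'Female'

print(mean_value, median_value, most_common)
```

Here the mean would impute a salary that no one in the sample is close to, which is why the median is often preferred for skewed numeric columns.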

## Dropping Missing Values

In [None]:
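# Drop every row that contains at least one missing value (returns a new DataFrame; df is unchanged)
df.dropna(axis=0)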

In [None]:
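# Drop every column that contains at least one missing value (returns a new DataFrame; df is unchanged)
df.dropna(axis=1)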
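
Dropping on any missing value is often too aggressive. `dropna` also takes `subset` (only look at certain columns) and `thresh` (minimum number of non-missing values to keep a row). The sketch below uses a made-up frame (`records` is an illustrative name).

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
records = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Age': [25, np.nan, np.nan],
    'City': ['Delhi', 'Mumbai', np.nan],
})

# Drop a row only when Age is missing; gaps in other columns are ignored
required_age = records.dropna(subset=['Age'])

# Keep rows that have at least 2 non-missing values
mostly_complete = records.dropna(thresh=2)

print(required_age['CustomerID'].tolist())     # [1]
print(mostly_complete['CustomerID'].tolist())  # [1, 2]
```

Both calls return new DataFrames, so `records` itself still has all three rows.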

## Replace Missing Values

In [None]:
# Replace every NaN in Age with the constant 30 (same result as df['Age'].fillna(30))
df['Age'] = df['Age'].replace(np.nan, 30)

df

## Interpolation

In [None]:
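# Estimate each missing salary on a straight line between its neighbouring values
# (rows are treated as equally spaced; the index values are ignored)
df['Salary'] = df['Salary'].interpolate(method='linear')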

df
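
To see what linear interpolation does at the edges, the sketch below runs it on a made-up series (`salary` is an illustrative name) with a leading gap, an interior gap, and a trailing gap.

```python
import numpy as np
import pandas as pd

# Hypothetical series: leading gap, two-value interior gap, trailing gap
salary = pd.Series([np.nan, 50000.0, np.nan, np.nan, 56000.0, np.nan])

# Default behaviour: interior gaps get evenly spaced values (about 52000 and 54000),
# the trailing gap repeats the last valid value, and the leading gap stays NaN
interp = salary.interpolate(method='linear')

# limit_area='inside' restricts filling to gaps surrounded by valid values
inside_only = salary.interpolate(method='linear', limit_area='inside')

print(interp.tolist())
print(inside_only.tolist())
```

Restricting to `limit_area='inside'` avoids extending the last known value past the end of the data, which can be misleading for trending series.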

## Rule of Thumb

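These thresholds are rough starting points for each column, not fixed rules:

| Missing % (per column) | Suggested action |
|------------------------|------------------|
| < 5% | Drop the affected rows |
| 5% – 50% | Fill (impute) values |
| > 50% | Drop the column |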
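
The table can be turned into a quick per-column check. The helper below is a hypothetical sketch: `suggest_action`, its `low` and `high` parameters, and the "Nothing to do" label for complete columns are assumptions added for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

def suggest_action(frame, low=5.0, high=50.0):
    """Map each column's missing percentage to the action from the table (hypothetical helper)."""
    missing_pct = frame.isna().mean() * 100

    def pick(pct):
        if pct == 0:
            return 'Nothing to do'   # assumption: complete columns need no action
        if pct < low:
            return 'Drop rows'
        if pct <= high:
            return 'Fill values'
        return 'Drop column'

    return missing_pct.map(pick)

# Made-up frame with 40 rows and 0%, 2.5%, 25%, and 75% missing values
n = 40
demo = pd.DataFrame({
    'complete': np.ones(n),
    'sparse_gaps': [np.nan] * 1 + [1.0] * (n - 1),
    'some_gaps': [np.nan] * 10 + [1.0] * (n - 10),
    'mostly_empty': [np.nan] * 30 + [1.0] * (n - 30),
})

actions = suggest_action(demo)
print(actions)
```

The output is only a suggestion per column; the final choice still depends on how important the column is and why the values are missing.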

## Conclusion

The right way to handle missing values depends on the data type, the distribution of the column, and the business context. No single method is best in every case.