# Handling Missing Values in Pandas



**✅ Why handle missing values?**
- Many machine-learning models cannot handle NaNs (missing data) directly, so you must either remove or fill them before training.

**We cover three common strategies:**
- Dropping rows with missing values
- Filling missing numeric values (with the mean)
- Filling missing categorical values (with the mode)

In [1]:
import pandas as pd
import numpy as np

# Sample dataset with missing values
data = {
    'Age': [22, 35, np.nan, 40, 28],
    'Salary': [5000, np.nan, 7200, 6100, np.nan],
    'Department': ['HR', 'IT', 'IT', 'HR', None]
}

df = pd.DataFrame(data)
df

,Age,Salary,Department
0,22.0,5000.0,HR
1,35.0,,IT
2,,7200.0,IT
3,40.0,6100.0,HR
4,28.0,,
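
Before choosing between dropping and filling, it can help to measure how much data is actually missing. The following is a minimal sketch, assuming the same sample data as above; the names `missing_counts` and `missing_share` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same sample data as in the cell above
df = pd.DataFrame({
    'Age': [22, 35, np.nan, 40, 28],
    'Salary': [5000, np.nan, 7200, 6100, np.nan],
    'Department': ['HR', 'IT', 'IT', 'HR', None],
})

# isna() marks both np.nan and None as missing
missing_counts = df.isna().sum()   # number of missing values per column
missing_share = df.isna().mean()   # fraction of missing values per column

print(missing_counts)
print(missing_share)
```

Here `Salary` has the most gaps (2 of 5 rows), which is one reason dropping rows is costly on a dataset this small.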


## Method 1: Drop rows with missing values

In [2]:
# Drop every row that contains at least one missing value (default: how='any')
df_drop_any = df.dropna()
df_drop_any

,Age,Salary,Department
0,22.0,5000.0,HR
3,40.0,6100.0,HR


In [3]:
# Drop only rows in which every value is missing; no row here qualifies, so nothing is removed
df_drop_all = df.dropna(how='all')
df_drop_all

,Age,Salary,Department
0,22.0,5000.0,HR
1,35.0,,IT
2,,7200.0,IT
3,40.0,6100.0,HR
4,28.0,,
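
Between the two extremes of `how='any'` and `how='all'`, `dropna` also accepts the `subset` and `thresh` parameters. The following is a sketch on the same sample data; the variable names `df_drop_subset` and `df_drop_thresh` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [22, 35, np.nan, 40, 28],
    'Salary': [5000, np.nan, 7200, 6100, np.nan],
    'Department': ['HR', 'IT', 'IT', 'HR', None],
})

# Drop a row only if 'Salary' is missing; gaps in other columns are ignored
df_drop_subset = df.dropna(subset=['Salary'])

# Keep rows that have at least 2 non-missing values
df_drop_thresh = df.dropna(thresh=2)

print(df_drop_subset)
print(df_drop_thresh)
```

With this data, `subset=['Salary']` keeps rows 0, 2, and 3, while `thresh=2` removes only row 4, which has a single non-missing value.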


## Method 2: Fill missing numeric values with mean

In [4]:
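# Work on a copy so the original df stays unchanged
df_fill_mean = df.copy()

# mean() skips NaN by default, so each mean is computed from the observed values only
df_fill_mean['Age'] = df_fill_mean['Age'].fillna(df_fill_mean['Age'].mean())
df_fill_mean['Salary'] = df_fill_mean['Salary'].fillna(df_fill_mean['Salary'].mean())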

df_fill_mean

,Age,Salary,Department
0,22.0,5000.0,HR
1,35.0,6100.0,IT
2,31.25,7200.0,IT
3,40.0,6100.0,HR
4,28.0,6100.0,
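
The filled values 31.25 and 6100.0 come from averaging only the observed entries. The sketch below reproduces that arithmetic. It also computes the median, which is not used in this notebook and is shown only as a possible alternative when a column contains outliers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [22, 35, np.nan, 40, 28],
    'Salary': [5000, np.nan, 7200, 6100, np.nan],
})

# mean() ignores NaN: (22 + 35 + 40 + 28) / 4
age_mean = df['Age'].mean()
# (5000 + 7200 + 6100) / 3
salary_mean = df['Salary'].mean()

# Alternative (an assumption, not the notebook's method): the median is less
# sensitive to extreme values than the mean
salary_median = df['Salary'].median()

print(age_mean, salary_mean, salary_median)
```

In this sample the salary mean and median coincide, so either choice would fill the same value; on skewed data they can differ substantially.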


## Method 3: Fill missing categorical values with mode

Fill missing categories with the most frequent value (the mode):

In [5]:
df_fill_cat = df.copy()
# mode() returns a Series of the most frequent values; [0] takes the first one
df_fill_cat['Department'] = df_fill_cat['Department'].fillna(df_fill_cat['Department'].mode()[0])
df_fill_cat

,Age,Salary,Department
0,22.0,5000.0,HR
1,35.0,,IT
2,,7200.0,IT
3,40.0,6100.0,HR
4,28.0,,HR
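
One subtlety in this example: `HR` and `IT` each appear twice, so the column has two modes. `Series.mode()` returns all tied values in sorted order, which is why `mode()[0]` yields `HR` here. The sketch below makes the tie visible; the names `dept`, `modes`, and `counts` are illustrative.

```python
import pandas as pd

dept = pd.Series(['HR', 'IT', 'IT', 'HR', None], name='Department')

# Frequency of each category (missing values are excluded by default)
counts = dept.value_counts()

# All values tied for most frequent, returned in sorted order
modes = dept.mode()

print(counts)
print(list(modes))
```

When a tie like this matters, it may be worth choosing the fill value explicitly rather than relying on sort order.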


## Automatic handling for all columns

Fill numeric columns with the mean and categorical columns with the mode:

In [6]:
df_auto = df.copy()

# Fill numeric columns with mean
for col in df_auto.select_dtypes(include='number').columns:
    df_auto[col] = df_auto[col].fillna(df_auto[col].mean())

# Fill categorical columns with mode
for col in df_auto.select_dtypes(include='object').columns:
    df_auto[col] = df_auto[col].fillna(df_auto[col].mode()[0])

df_auto

,Age,Salary,Department
0,22.0,5000.0,HR
1,35.0,6100.0,IT
2,31.25,7200.0,IT
3,40.0,6100.0,HR
4,28.0,6100.0,HR
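
The loop above can be wrapped in a reusable function. The sketch below is one possible version, not the notebook's own code: `fill_missing` is a hypothetical helper, and the `Empty` column is added only to show an edge case. If a column is entirely missing, `mode()` returns an empty Series, so indexing it with `[0]` would raise an error; the sketch guards against that.

```python
import numpy as np
import pandas as pd


def fill_missing(frame):
    """Hypothetical helper: mean for numeric columns, mode for all others."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            modes = out[col].mode()
            # mode() is empty when every value is missing; leave such columns as-is
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    return out


sample = pd.DataFrame({
    'Age': [22, 35, np.nan, 40, 28],
    'Salary': [5000, np.nan, 7200, 6100, np.nan],
    'Department': ['HR', 'IT', 'IT', 'HR', None],
    'Empty': [None, None, None, None, None],  # illustrative all-missing column
})

filled = fill_missing(sample)
print(filled)
```

Checking the dtype per column, rather than selecting by `'object'`, is a design choice in this sketch so that other non-numeric dtypes are also handled.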


## 🎯 Summary

---

| Technique              | When to Use                                | Example                             |
| ---------------------- | ------------------------------------------ | ----------------------------------- |
| **Drop rows**          | Large dataset, small share of missing rows | `df.dropna()`                       |
| **Fill with mean**     | Numeric continuous data                    | `df[col].fillna(df[col].mean())`    |
| **Fill with mode**     | Categorical data                           | `df[col].fillna(df[col].mode()[0])` |
| **Fill automatically** | Preprocessing pipelines                    | Loop over dtypes                    |
