In [110]:
import numpy as np
import pandas as pd


# Handling Missing Data in Pandas
**In real-world datasets, missing values are common. Pandas provides tools to detect, remove, or fill in missing data. Missing values are typically represented as NaN (Not a Number); pandas also treats `None` and `NaT` (for datetimes) as missing.**




## Creating Example DataFrame with Missing Values

In [45]:

data = {
    'A': [1, 2, np.nan],
    'B': [5, np.nan, np.nan],
    'C': [1, 2, 3]
}

df = pd.DataFrame(data)


## Detecting Missing Values (df.isnull()):

`df.isnull()` returns a DataFrame of the same shape with `True` wherever a value is missing and `False` elsewhere. `df.isna()` is an equivalent alias.

In [48]:
df.isnull()


Unnamed: 0,A,B,C
0,False,False,False
1,False,True,False
2,True,True,False


In [50]:
df.notnull()  # Returns the opposite: True where data is not missing.




Unnamed: 0,A,B,C
0,True,True,True
1,True,False,True
2,False,False,True
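
The boolean masks above are usually summarised rather than read cell by cell. A minimal sketch of that idea, rebuilding the same example DataFrame so it runs on its own (the variable names `missing_per_column`, `total_missing`, and `rows_with_missing` are illustrative, not from the original notebook):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [5, np.nan, np.nan],
    'C': [1, 2, 3],
})

# Count missing values per column (True is summed as 1)
missing_per_column = df.isnull().sum()

# Total number of missing cells in the whole DataFrame
total_missing = int(df.isnull().sum().sum())

# One boolean per row: does the row contain at least one NaN?
rows_with_missing = df.isnull().any(axis=1)

print(missing_per_column)
print(total_missing)
print(rows_with_missing)
```

Checking these counts first is a cheap way to decide whether dropping or filling is the more reasonable next step.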



## Removing Missing Data


### Remove rows with any NaN values:

In [54]:
df.dropna()


Unnamed: 0,A,B,C
0,1.0,5.0,1



### Remove columns with any NaN values:

In [57]:
df.dropna(axis=1)


Unnamed: 0,C
0,1
1,2
2,3


### Only drop rows where all elements are missing:

In [69]:
df.dropna(how='all')

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2
2,,,3



Pro tip: use `thresh=n` to keep only the rows (or columns, with `axis=1`) that have at least `n` non-null values:

In [72]:
df.dropna(thresh=2)


Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2



## Filling Missing Data



### Fill all NaN values with a constant:


In [76]:
df.fillna(value='FILL')


Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,FILL,2
2,FILL,FILL,3
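
A single constant is not always appropriate for every column. `fillna()` also accepts a dictionary that maps column names to fill values; the sketch below uses arbitrary placeholder values (`0` and `-1`) purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [5, np.nan, np.nan],
    'C': [1, 2, 3],
})

# Use a different fill value for each column
filled = df.fillna({'A': 0, 'B': -1})

print(filled)

# Numeric fill values keep the columns numeric
print(filled.dtypes)
```

Filling numeric columns with numbers keeps them usable in calculations, which a string placeholder such as `'FILL'` would not.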



### Fill a column with its mean (common for numerical data):



In [81]:
df['A'].fillna(value=df['A'].mean())

0    1.0
1    2.0
2    1.5
Name: A, dtype: float64
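
The same idea extends to the whole DataFrame: passing the Series returned by `df.mean()` to `fillna()` fills each column with its own mean. A minimal sketch on the example data (the names `column_means` and `filled` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [5, np.nan, np.nan],
    'C': [1, 2, 3],
})

# Mean of each numeric column; NaN values are skipped by default
column_means = df.mean(numeric_only=True)

# Each NaN is replaced by the mean of the column it belongs to
filled = df.fillna(column_means)

print(column_means)
print(filled)
```

Note that column `B` has only one observed value, so its "mean" is just that value; with so little data, mean imputation says very little.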


### Forward fill (fill using the previous value):

In [91]:
df.ffill()  # replaces fillna(method='ffill'), which is deprecated since pandas 2.1


Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,5.0,2
2,2.0,5.0,3




### Backward fill (use the next value down):

In [94]:
df.bfill()  # replaces fillna(method='bfill'), which is deprecated since pandas 2.1


Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2
2,,,3
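
In the DataFrame above, every NaN sits at the bottom of its column, so there is no later value to copy and the backward fill leaves the data unchanged. The sketch below uses a small hypothetical Series (not from the original notebook) where both directions have something to fill:

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps at the start and in the middle
s = pd.Series([np.nan, 2.0, np.nan, 4.0])

# Backward fill: each NaN takes the next valid value below it
back = s.bfill()

# Forward fill: each NaN takes the last valid value above it;
# the leading NaN has no previous value and stays missing
forward = s.ffill()

print(back.tolist())     # [2.0, 2.0, 4.0, 4.0]
print(forward.tolist())  # [nan, 2.0, 2.0, 4.0]
```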


## Summary

### When to Drop vs Fill?

| Situation                              | Recommended Action                     |
|----------------------------------------|----------------------------------------|
| Few rows with NaN, large dataset       | Drop them (`dropna()`)                 |
| Many missing values in critical column | Fill if meaningful (`fillna()`)        |
| Time series data                       | Forward/backward fill (`ffill()`, `bfill()`) |
| Categorical data                       | Fill with mode or `'Unknown'`          |
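
The categorical row of the table can be sketched as follows. The `colors` Series is hypothetical example data, and either option may be preferable depending on whether "missing" is itself informative:

```python
import pandas as pd

# Hypothetical categorical column with two missing entries
colors = pd.Series(['red', None, 'blue', 'red', None])

# Option 1: fill with the most frequent value.
# mode() ignores missing values and returns a Series, so take the first entry.
most_common = colors.mode()[0]
filled_mode = colors.fillna(most_common)

# Option 2: keep missingness visible with an explicit placeholder label
filled_unknown = colors.fillna('Unknown')

print(filled_mode.tolist())
print(filled_unknown.tolist())
```

Filling with the mode hides the fact that a value was missing, while an explicit `'Unknown'` label preserves it as its own category.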