# Missing Data

In [1]:
import numpy as np
import pandas as pd

## **Finding Missing Data**

In [12]:
data = {
    'A': [1, 2, np.nan, 4, 5],  # np.nan ("not a number") is how pandas marks a missing value
    'B': [1, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, np.nan],
    'D': [1, np.nan, np.nan, np.nan, 5]
}
df = pd.DataFrame(data)

In [13]:
df

Unnamed: 0,A,B,C,D
0,1.0,1,1.0,1.0
1,2.0,2,2.0,
2,,3,3.0,
3,4.0,4,,
4,5.0,5,,5.0


🔍 What it does:
df.isna()
Returns a DataFrame of the same shape as df, with True where a value is missing (NaN) and False everywhere else.

.sum()
Adds up the True values in each column (True counts as 1 and False as 0 in Python).
So the result is the number of missing values in each column.
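The same counting idea also works along the other axis. The following is a minimal sketch that rebuilds the example DataFrame and counts missing values per column, per row, and in total. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same example data as in this notebook
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [1, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, np.nan],
    'D': [1, np.nan, np.nan, np.nan, 5],
})

missing_per_column = df.isna().sum()        # default axis=0: one count per column
missing_per_row = df.isna().sum(axis=1)     # axis=1: one count per row
total_missing = int(df.isna().sum().sum())  # sum of the column counts

print(missing_per_column.tolist())  # [1, 0, 2, 3] for columns A, B, C, D
print(missing_per_row.tolist())     # [0, 1, 2, 2, 1] for rows 0 to 4
print(total_missing)                # 6
```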

In [14]:
df.isna()  # True where a value is missing; this only flags cells, it does not count them

Unnamed: 0,A,B,C,D
0,False,False,False,False
1,False,False,False,True
2,True,False,False,True
3,False,False,True,True
4,False,False,True,False


In [15]:
df.isna().sum()

A    1
B    0
C    2
D    3
dtype: int64

In [16]:
df.isna().any()
# Meaning: "Does this column contain any missing (NaN) value?"
# ✅ What it does:
# df.isna() → returns True/False for each cell
# .any() → returns True for a column if it contains at least one True

A     True
B    False
C     True
D     True
dtype: bool

In [17]:

# To check whether there is a NaN anywhere in the whole DataFrame:
df.isna().values.any()
# Result: True if at least one value is missing anywhere


np.True_
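The result prints as `np.True_` because it is a NumPy boolean, not a plain Python `bool`. As a small sketch, here are two equivalent ways to run the whole-DataFrame check, with the result converted to a plain `bool`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [1, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, np.nan],
    'D': [1, np.nan, np.nan, np.nan, 5],
})

# Option 1: convert the boolean mask to a NumPy array and reduce it once
has_missing = df.isna().to_numpy().any()

# Option 2: reduce per column first, then across the per-column results
has_missing_chained = df.isna().any().any()

# bool() turns the NumPy boolean into a plain Python bool
print(bool(has_missing), bool(has_missing_chained))  # True True
```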

## **Removing Missing Data**

In [18]:
df

Unnamed: 0,A,B,C,D
0,1.0,1,1.0,1.0
1,2.0,2,2.0,
2,,3,3.0,
3,4.0,4,,
4,5.0,5,,5.0


✅ Function: df.dropna()


Remove rows with missing data (default)

📌 By default it uses axis=0, so it removes every row that contains at least one NaN.

In [19]:
df.dropna()

Unnamed: 0,A,B,C,D
0,1.0,1,1.0,1.0
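Dropping every row that has any NaN leaves only one row here, which is often too aggressive. The sketch below shows a few other `dropna()` options (`axis`, `how`, `subset`) on the same data. Which one fits depends on the dataset, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [1, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, np.nan],
    'D': [1, np.nan, np.nan, np.nan, 5],
})

# axis=1: drop columns (instead of rows) that contain any NaN
cols_without_nan = df.dropna(axis=1)
print(list(cols_without_nan.columns))  # ['B']

# how='all': drop a row only if every value in it is NaN
rows_not_all_nan = df.dropna(how='all')
print(len(rows_not_all_nan))  # 5, because no row is entirely NaN

# subset=[...]: only look at the listed columns when deciding what to drop
rows_with_a = df.dropna(subset=['A'])
print(rows_with_a.index.tolist())  # [0, 1, 3, 4]
```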


In [20]:
df.dropna(thresh=3)

Unnamed: 0,A,B,C,D
0,1.0,1,1.0,1.0
1,2.0,2,2.0,
4,5.0,5,,5.0


✅ Meaning of thresh=n

thresh=n means: ✅ keep only those rows (or columns, with axis=1) that have at least n non-null (non-NaN) values.

So df.dropna(thresh=3) drops every row with fewer than 3 non-NaN values. Here rows 2 and 3 have only 2 each, so they are removed.
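One way to check what `thresh=3` does is to count the non-missing values per row and compare that with the rows `dropna` keeps. This is a minimal sketch, and the `manual` filter is only an illustration of the rule, not how pandas implements it internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [1, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, np.nan],
    'D': [1, np.nan, np.nan, np.nan, 5],
})

# Number of non-NaN values in each row
non_null_per_row = df.notna().sum(axis=1)
print(non_null_per_row.tolist())  # [4, 3, 2, 2, 3]

kept = df.dropna(thresh=3)         # keep rows with at least 3 non-NaN values
manual = df[non_null_per_row >= 3] # the same rule written as a boolean filter

print(kept.index.tolist())   # [0, 1, 4]
print(kept.equals(manual))   # True
```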

## **Filling Missing Data**

In [22]:
df

Unnamed: 0,A,B,C,D
0,1.0,1,1.0,1.0
1,2.0,2,2.0,
2,,3,3.0,
3,4.0,4,,
4,5.0,5,,5.0


Filling the missing values with a single number, for example 0

✅ Function: df.fillna(value)

We use it to replace missing values (NaN) with a value of our choice.

In [23]:
df.fillna(0)  # 📌 Replaces every NaN value with 0.

Unnamed: 0,A,B,C,D
0,1.0,1,1.0,1.0
1,2.0,2,2.0,0.0
2,0.0,3,3.0,0.0
3,4.0,4,0.0,0.0
4,5.0,5,0.0,5.0


### Fill each column with a different value

In [26]:
df.fillna({'A': 0, 'B': 99, 'C': 12, 'D': 15})  # 📌 NaN in column A → 0, in column B → 99, and so on (B has no NaN, so 99 is never used)


Unnamed: 0,A,B,C,D
0,1.0,1,1.0,1.0
1,2.0,2,2.0,15.0
2,0.0,3,3.0,15.0
3,4.0,4,12.0,15.0
4,5.0,5,12.0,5.0
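Two details are easy to miss. `fillna()` returns a new DataFrame and leaves `df` unchanged unless the result is assigned. A dictionary only affects the columns it names. The sketch below shows both, and also fills each column with its own mean. The mean is just one possible choice, used here as an illustrative assumption rather than a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [1, 2, 3, 4, 5],
    'C': [1, 2, 3, np.nan, np.nan],
    'D': [1, np.nan, np.nan, np.nan, 5],
})

# Only column A is named, so NaN values in C and D stay as they are
filled_a = df.fillna({'A': 0})
print(filled_a['A'].tolist())            # [1.0, 2.0, 0.0, 4.0, 5.0]
print(int(filled_a['C'].isna().sum()))   # 2

# The original DataFrame is untouched: it still has 6 missing values
print(int(df.isna().sum().sum()))        # 6

# df.mean() gives one mean per column (NaN skipped); fillna matches by column name
by_mean = df.fillna(df.mean())
print(by_mean['D'].tolist())             # [1.0, 3.0, 3.0, 3.0, 5.0]
```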


✅ Basic Summary (Beginner-Friendly):

| Task               | Function   | Use                                          |
| ------------------ | ---------- | -------------------------------------------- |
| 🔍 Finding Missing | `isna()`   | Detect where NaN/missing values exist        |
| ❌ Removing Missing | `dropna()` | Remove rows/columns with missing values      |
| 🔄 Filling Missing | `fillna()` | Replace missing values with something useful |


✅ In short, the basic code for handling missing data in Pandas is:

isna() to find,

dropna() to remove, and

fillna() to fill missing values.

🔥 These three functions are enough for roughly 90% of cases, especially for beginners.