# Handling Missing Data in Pandas

| Method                           | Description                                                                 | Example Code                    |
|----------------------------------|-----------------------------------------------------------------------------|---------------------------------|
| `isnull()` / `isna()`            | Detect missing values (returns True/False mask).                            | `df.isnull()`                   |
| `notnull()` / `notna()`          | Detect non-missing values.                                                  | `df.notnull()`                  |
| `dropna()`                       | Remove rows/columns with NaN.                                               | `df.dropna()`                   |
| `dropna(how="any")`              | Drop row/col if **any** value is NaN (default).                             | `df.dropna(how="any")`          |
| `dropna(how="all")`              | Drop row/col if **all** values are NaN.                                     | `df.dropna(how="all")`          |
| `dropna(thresh=N)`               | Keep row/col only if it has **at least N non-NaN values**.                  | `df.dropna(thresh=2)`           |
| `dropna(axis=0)`                 | Drop rows with NaN (default).                                               | `df.dropna(axis=0)`             |
| `dropna(axis=1)`                 | Drop columns with NaN.                                                      | `df.dropna(axis=1)`             |
| `fillna(value)`                  | Fill NaN with a constant value.                                             | `df.fillna(value)`              |
| `ffill()`                        | Forward fill (propagate previous value). `fillna(method="ffill")` is deprecated since pandas 2.1. | `df.ffill()` |
| `bfill()`                        | Backward fill (propagate next value). `fillna(method="bfill")` is deprecated since pandas 2.1.    | `df.bfill()` |
| `replace(np.nan, "Missing")`     | Replace NaN with custom value.                                              | `df.replace(np.nan, "Missing")` |
| `interpolate()`                  | Fill NaN using interpolation.                                               | `df.interpolate()`              |
| `inplace=True`                   | Modify the object itself and return `None` instead of a new object.         | `df.dropna(inplace=True)`       |
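
The table lists `replace()`, `interpolate()`, and `inplace=True` without showing them in action. The following is a minimal sketch on made-up data; the names `readings`, `cities`, and `df_demo` are illustrative assumptions and are not part of the notebook's sample DataFrame.

```python
import numpy as np
import pandas as pd

# Illustrative data, assumed for this sketch only
readings = pd.Series([10.0, np.nan, np.nan, 40.0])
cities = pd.Series(["NY", np.nan, "LA"], dtype=object)

# interpolate() defaults to linear interpolation between known neighbours
interpolated = readings.interpolate()
print(interpolated.tolist())  # [10.0, 20.0, 30.0, 40.0]

# replace() swaps NaN for a sentinel label
labelled = cities.replace(np.nan, "Missing")
print(labelled.tolist())  # ['NY', 'Missing', 'LA']

# inplace=True mutates df_demo itself; the all-NaN row is removed
df_demo = pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, np.nan]})
df_demo.dropna(how="all", inplace=True)
print(len(df_demo))  # 1
```

Linear interpolation only makes sense for ordered numeric data; for labels such as city names, a sentinel via `replace()` or `fillna()` is usually the safer choice.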


In [None]:
import pandas as pd
import numpy as np

# Sample DataFrame with NaN
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 35, None],
    "City": ["NY", None, None, None]
})

print("Original DataFrame:\n", df)

# 1. Drop row if ANY value is NaN
print("\nDrop rows where any value is NaN:\n", df.dropna(how="any"))

# 2. Drop row if ALL values are NaN
print("\nDrop rows where all values are NaN:\n", df.dropna(how="all"))

# 3. Keep rows with at least 2 non-NaN values (here: rows 0 and 2)
print("\nKeep rows with at least 2 non-NaN values (thresh=2):\n", df.dropna(thresh=2))

# 4. Keep columns with at least 3 non-NaN values (here: only "Name")
print("\nKeep columns with at least 3 non-NaN values (thresh=3, axis=1):\n", df.dropna(axis=1, thresh=3))

# 5. Fill each column with its own constant; fillna expects one scalar per column
values = {"Name": "Surbhi", "Age": 30, "City": "A"}
print("\nFill with per-column values:\n", df.fillna(value=values))



# 📘 Handling Missing Data in Pandas: Approaches for Data Science Problems

| Method               | Explanation                              | Example                                |
|----------------------|------------------------------------------|----------------------------------------|
| `fillna(value)`      | Replace NaN with a fixed value           | `df["Age"].fillna(30)`                  |
| `fillna(mean/median)`| Replace NaN with mean/median of column   | `df["Age"].fillna(df["Age"].mean())`    |
|                      |                                          | `df["Age"].fillna(df["Age"].median())`  |
| `fillna(mode)`       | Replace NaN with most frequent value     | `df["Age"].fillna(df["Age"].mode()[0])` |
| `ffill` (forward)    | Copy previous row value to NaN           | `df["Age"].ffill()`                     |
| `bfill` (backward)   | Copy next row value to NaN               | `df["Age"].bfill()`                     |
| Custom values        | Fill NaN manually with different values  | `df.loc[df["Age"].isna(),"Age"]=[30,40]`|
| `dropna()`           | Remove rows with NaN values              | `df.dropna()`                           |
| `thresh` (dropna)    | Keep rows with ≥ threshold non-NaN       | `df.dropna(thresh=2)`                   |
| `isna()` / `notna()` | Check where NaN values exist             | `df["Age"].isna()`                      |
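
Before choosing among the strategies in the table, it can help to see how much data is actually missing. A minimal sketch, assuming the same sample DataFrame used in this notebook, that counts NaNs per column with `isna()`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 35, None],
    "City": ["NY", None, None, None],
})

# True counts as 1, so summing the mask gives the number of NaNs per column
missing_counts = df.isna().sum()   # Name 1, Age 2, City 3

# The mean of the mask is the fraction of missing values per column
missing_share = df.isna().mean()   # Name 0.25, Age 0.50, City 0.75

print(missing_counts)
print(missing_share)
```

A column that is mostly empty, such as `City` here, may be a better candidate for dropping than for filling, though where to draw that line is a judgment call.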


In [None]:
import pandas as pd
import numpy as np

# Sample DataFrame
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 35, None],
    "City": ["NY", None, None, None]
})

print("Original DataFrame:\n", df)

# Fill with constant value
print("\nFill Age with 30:\n", df["Age"].fillna(30))

# Fill with the mean (common in ML preprocessing)
print("\nFill Age with mean:\n", df["Age"].fillna(df["Age"].mean()))

# Fill with the median (more robust to outliers than the mean)
print("\nFill Age with median:\n", df["Age"].fillna(df["Age"].median()))

# Fill with the mode; mode() returns a Series, so take its first entry
print("\nFill Age with mode:\n", df["Age"].fillna(df["Age"].mode()[0]))

# Forward fill (fillna(method="ffill") is deprecated since pandas 2.1)
print("\nForward Fill (ffill):\n", df["Age"].ffill())

# Backward fill (the last row stays NaN because no later value exists)
print("\nBackward Fill (bfill):\n", df["Age"].bfill())

# Custom fill values for missing Age (the list length must match the number of NaNs)
df_custom = df.copy()
df_custom.loc[df_custom["Age"].isna(), "Age"] = [30, 40]
print("\nCustom Fill for Age:\n", df_custom)

# Drop rows with NaN
print("\nDrop rows with NaN:\n", df.dropna())

# Drop with thresh=2 (keep rows with ≥2 non-NaN values)
print("\nDrop rows with fewer than 2 non-NaN values:\n", df.dropna(thresh=2))

# Check where NaN exists
print("\nCheck missing values in Age column:\n", df["Age"].isna())


### 📘 Theory
- Missing values (`NaN`/`None`) are replaced with values drawn from a list in **cyclic order**.
- `itertools.cycle` repeats the given list indefinitely (e.g. `[30, 40]` → 30, 40, 30, 40, …).
- For each column:
  - If a value is missing → replace with `next(cycle)`.
  - If not missing → keep original value.
- This allows filling NaNs with **different values in sequence**, not just one fixed value.


In [1]:
import pandas as pd
import numpy as np
from itertools import cycle

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 35, None],
    "City": ["NY", None, None, None]
})

# Replacement values
fill_values = {
    "Name": ["Surbhi"],
    "Age": [30, 40],
    "City": ["A", "B", "C"]
}

# Fill NaNs column by column
for col, vals in fill_values.items():
    c = cycle(vals)  # infinite iterator; a fresh one is created for each column
    df[col] = df[col].apply(lambda x: next(c) if pd.isna(x) else x)

print(df)


      Name   Age City
0    Alice  25.0   NY
1      Bob  30.0    A
2  Charlie  35.0    B
3   Surbhi  40.0    C
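
The loop above overwrites `df` in place. As an alternative sketch, the same cyclic fill can be wrapped in a helper that returns a new Series and draws exactly as many replacement values as there are gaps. The name `fill_cyclic` is hypothetical and not a pandas API; it assumes `values` is a non-empty list.

```python
from itertools import cycle, islice

import numpy as np
import pandas as pd


def fill_cyclic(series, values):
    """Return a copy of `series` with NaNs filled from `values` in cyclic order.

    Hypothetical helper for illustration; assumes `values` is non-empty.
    """
    mask = series.isna()
    n_missing = int(mask.sum())
    filled = series.copy()
    if n_missing:
        # Take exactly as many replacement values as there are gaps
        filled.loc[mask] = list(islice(cycle(values), n_missing))
    return filled


df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, np.nan, 35, None],
    "City": ["NY", None, None, None],
})

ages_filled = fill_cyclic(df["Age"], [30, 40])
cities_filled = fill_cyclic(df["City"], ["A", "B", "C"])

print(ages_filled.tolist())    # [25.0, 30.0, 35.0, 40.0]
print(cities_filled.tolist())  # ['NY', 'A', 'B', 'C']
```

Because the helper works on a copy, the original `df` keeps its NaNs, which makes it easier to compare several fill strategies side by side.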
