Lesson 6: Data Cleaning
- Missing values
- Renaming, dropping
- Type conversion


🧰 Topics Covered:

    Handling Missing Values:

        Use fillna() to replace missing values (NaN or None) with a default or with a statistic such as the column mean.

    Renaming Columns:

        Use rename(columns={...}) to change column labels; inplace=True modifies the DataFrame directly instead of returning a new one.

    Applying Functions:

        Use Series.apply() to transform each value in a column, for example to assign a category based on a condition.

    Type Conversion & Dropping (not demonstrated in the cells below):

        Convert column types with astype(); remove rows or columns by label with drop(), and remove rows containing missing values with dropna().
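Since type conversion and dropping are not demonstrated in the cells below, here is a minimal sketch of astype(), drop(), and dropna() together. The DataFrame, including the "Notes" column and the string-valued ages, is hypothetical data made up for illustration.

```python
import pandas as pd

# Hypothetical data: "Age" arrives as text and one row is missing a name
raw = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": ["25", "31", "30"],
    "Notes": ["a", "b", "c"],
})

# astype(): convert the text column to 64-bit integers
raw["Age"] = raw["Age"].astype("int64")

# drop(): remove a column by its label
trimmed = raw.drop(columns=["Notes"])

# dropna(): remove every row that contains at least one missing value
complete = trimmed.dropna()

print(complete)
print(complete.dtypes)
```

Note that drop() and dropna() return new DataFrames by default, so the result has to be assigned to a variable (here `trimmed` and `complete`).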

In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [25, None, 30]
})

print(df)

    Name   Age
0  Alice  25.0
1    Bob   NaN
2   None  30.0
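Before filling anything, it can help to check where the gaps are. This sketch rebuilds the same DataFrame and uses isna() to count missing values per column; it also shows why Age prints as 25.0 rather than 25: a column of integers that contains NaN is stored as float64.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [25, None, 30]
})

# isna() treats both None and NaN as missing; sum() counts them per column
missing_counts = df.isna().sum()
print(missing_counts)

# Age is float64 because NaN is a floating-point value
print(df.dtypes)
```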


Handling Missing Values

In [3]:
df = df.fillna({"Name": "Unknown", "Age": df["Age"].mean()})
print(df)


      Name   Age
0    Alice  25.0
1      Bob  27.5
2  Unknown  30.0
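The fill value 27.5 comes from mean() skipping NaN by default, so it averages only 25 and 30. Filling is not the only option: the sketch below rebuilds the original data and contrasts it with dropna(), both for any missing value and for a single column via subset=.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [25, None, 30]
})

# mean() ignores NaN by default (skipna=True): (25 + 30) / 2
age_mean = df["Age"].mean()

# Drop rows with a missing value in any column (only Alice survives)
only_complete = df.dropna()

# Drop rows only when "Age" is missing (rows 0 and 2 survive)
has_age = df.dropna(subset=["Age"])

print(age_mean)
print(only_complete)
print(has_age)
```

Dropping keeps only observed values but loses rows; filling keeps every row but introduces values that were never measured.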


Rename Columns

In [4]:
df.rename(columns={"Name": "FullName"}, inplace=True)
print(df)


  FullName   Age
0    Alice  25.0
1      Bob  27.5
2  Unknown  30.0
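The cell above uses inplace=True. A common alternative is to leave inplace off and assign the returned DataFrame. The sketch below also renames several columns at once and passes a function (here `str.lower`) instead of a mapping; the "Years" label is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25.0, 27.5]})

# Without inplace=True, rename() returns a new DataFrame; df keeps its labels
renamed = df.rename(columns={"Name": "FullName", "Age": "Years"})

# columns= also accepts a function, which is applied to every column label
lowered = df.rename(columns=str.lower)

print(list(df.columns))
print(list(renamed.columns))
print(list(lowered.columns))
```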


Apply a Function

In [5]:
df["Age Category"] = df["Age"].apply(lambda x: "Young" if x < 30 else "Adult")
print(df)

  FullName   Age Age Category
0    Alice  25.0        Young
1      Bob  27.5        Young
2  Unknown  30.0        Adult
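Series.apply() calls the lambda once per value in Python. For a simple two-way condition, numpy.where is a vectorized alternative (a different technique from the lesson's apply() call) that produces the same labels. This sketch recreates the cleaned DataFrame by hand.

```python
import numpy as np
import pandas as pd

# Recreate the cleaned DataFrame from the cells above
df = pd.DataFrame({
    "FullName": ["Alice", "Bob", "Unknown"],
    "Age": [25.0, 27.5, 30.0],
})

# Same rule as the lambda: below 30 is "Young", everything else is "Adult"
df["Age Category"] = np.where(df["Age"] < 30, "Young", "Adult")

print(df)
```

Like the lambda, this labels a NaN age as "Adult", because `NaN < 30` evaluates to False; that is one reason to fill missing values before categorizing.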
