# Topic 03 - Problem 10: Handling Missing Values

---

## 1. About the Problem

This problem asks me to handle **missing values** in a dataset.  
Missing data is common in real-world datasets, and there are several ways to handle it:
- **Imputation**: Replacing missing values with a calculated value (mean, median, or mode).
- **Dropping**: Removing rows or columns with missing values.

I will use both methods to handle missing data in a sample dataset and see how the dataset changes.
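
Before choosing a method, it helps to know how much data is actually missing. The following is a minimal sketch, assuming the same sample dataset that the solution in the next section uses; the variable names `missing_per_column` and `rows_with_missing` are my own, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the solution code (section 2).
df = pd.DataFrame({
    "age": [25, np.nan, 28, 60, 70, np.nan, 65],
    "salary": [50000, 60000, np.nan, 68000, 75000, 57000, 63000],
    "experience": [2, 5, 3, np.nan, 15, 8, 12],
})

# Number of NaN entries in each column.
missing_per_column = df.isna().sum()

# Number of rows that contain at least one NaN.
rows_with_missing = int(df.isna().any(axis=1).sum())

print(missing_per_column)
print("Rows with at least one missing value:", rows_with_missing)
```

Counting per column and per row separately matters here: each column is missing only one or two values, but those gaps are spread over four different rows.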

---


## 2. Solution Code

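```python
import numpy as np
import pandas as pd

# Sample dataset with missing values (np.nan) in every column
data = {
    "age": [25, np.nan, 28, 60, 70, np.nan, 65],
    "salary": [50000, 60000, np.nan, 68000, 75000, 57000, 63000],
    "experience": [2, 5, 3, np.nan, 15, 8, 12],
}

df = pd.DataFrame(data)

# Imputation: fill each NaN with the mean of its own column
df_imputed = df.fillna(df.mean())

# Dropping: remove every row that contains at least one NaN
df_dropped = df.dropna()

print("Imputed data: \n", df_imputed)
print("Dropped data: \n", df_dropped)
```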

```
Imputed data: 
     age        salary  experience
0  25.0  50000.000000         2.0
1  49.6  60000.000000         5.0
2  28.0  62166.666667         3.0
3  60.0  68000.000000         7.5
4  70.0  75000.000000        15.0
5  49.6  57000.000000         8.0
6  65.0  63000.000000        12.0
Dropped data: 
     age   salary  experience
0  25.0  50000.0         2.0
4  70.0  75000.0        15.0
6  65.0  63000.0        12.0
```
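
The solution above only uses the mean. As a rough sketch of the other two imputation options named in the problem, the same `fillna` call can take the column medians or modes instead. The names `df_median` and `df_mode` are illustrative, not part of the original solution.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 28, 60, 70, np.nan, 65],
    "salary": [50000, 60000, np.nan, 68000, 75000, 57000, 63000],
    "experience": [2, 5, 3, np.nan, 15, 8, 12],
})

# Median imputation: less sensitive to extreme values than the mean.
df_median = df.fillna(df.median())

# Mode imputation: df.mode() returns a DataFrame and can hold several rows
# when values tie, so take its first row as the fill values.
df_mode = df.fillna(df.mode().iloc[0])

print("Median-imputed data: \n", df_median)
print("Mode-imputed data: \n", df_mode)
```

Mode imputation is usually a better fit for categorical columns. In this dataset every observed value is unique, so the mode is not very informative and is shown only to illustrate the call.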


---

## 3. Summary / Takeaways

By solving this problem, I learned two important ways to handle missing values:
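1. **Imputation**: Filling missing values with the **mean**, **median**, or **mode** of the column. This keeps every row and works well when only a small fraction of a column's values is missing. The median is the safer choice when a column contains outliers, since the mean is pulled toward extreme values.
2. **Dropping**: Removing rows or columns that have missing values. This is useful when a row or column is missing so much data that imputation would mostly invent values and might introduce bias, but it costs data: in this example, `dropna()` kept only 3 of the 7 rows.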
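
Dropping does not have to be all-or-nothing. The sketch below, again assuming the sample dataset from the solution, shows three `dropna` variants: restricting the check to one column with `subset`, keeping rows that have a minimum number of non-missing values with `thresh`, and dropping columns with `axis=1`. The result variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 28, 60, 70, np.nan, 65],
    "salary": [50000, 60000, np.nan, 68000, 75000, 57000, 63000],
    "experience": [2, 5, 3, np.nan, 15, 8, 12],
})

# Drop a row only if "age" is missing; gaps in other columns are tolerated.
df_age_known = df.dropna(subset=["age"])

# Keep rows that have at least 2 non-missing values.
df_mostly_complete = df.dropna(thresh=2)

# Drop every column that contains at least one missing value.
df_complete_columns = df.dropna(axis=1)

print("Rows with known age:", len(df_age_known))
print("Rows with at least 2 values:", len(df_mostly_complete))
print("Columns without any missing value:", df_complete_columns.shape[1])
```

On this dataset the column-wise drop removes all three columns, because each one has at least one gap. That is a good reminder to check what a drop would remove before applying it.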

Handling missing data correctly is crucial for building reliable machine learning models, because the chosen method changes the data the model is trained on and can therefore significantly affect its accuracy.  
Next, I want to explore **feature engineering** and how it can improve model performance.
