
# **Frequent-Value Imputation**

---

### 💡 Frequent-Value Imputation (Most Common)

Frequent-value imputation fills the missing values in a column with that column's **most common value** (also called the **mode**).

---

### 📊 Example:

| Gender |
| ------ |
| Male   |
| NaN    |
| Female |
| NaN    |
| Male   |

Among the non-missing values, **"Male"** appears twice and "Female" once, so "Male" is the mode. After imputation:

| Gender |
| ------ |
| Male   |
| Male   |
| Female |
| Male   |
| Male   |
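
The same fill can be reproduced in pandas. This is a minimal sketch that rebuilds the toy `Gender` column by hand; the variable names (`gender`, `most_common`, `filled`) are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# The toy Gender column from the example above
gender = pd.Series(["Male", np.nan, "Female", np.nan, "Male"], name="Gender")

# mode() ignores NaN by default and returns a Series (ties are possible),
# so take its first entry
most_common = gender.mode()[0]

# Replace every missing entry with the mode
filled = gender.fillna(most_common)
print(filled.tolist())
```
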

---

### ✅ Why use it?

* It works well for **categorical columns** (like Gender, Country), where a mean or median is not defined.
* It is simple, and every filled value is a category that already exists in the column.



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('train.csv',usecols=['GarageQual','FireplaceQu','SalePrice'])
df.head()

|     | FireplaceQu | GarageQual | SalePrice |
| --- | ----------- | ---------- | --------- |
| 0   | NaN         | TA         | 208500    |
| 1   | TA          | TA         | 181500    |
| 2   | TA          | TA         | 223500    |
| 3   | Gd          | TA         | 140000    |
| 4   | TA          | TA         | 250000    |


In [3]:
df.isnull().mean()*100

| Column      | Missing (%) |
| ----------- | ----------- |
| FireplaceQu | 47.260274   |
| GarageQual  | 5.547945    |
| SalePrice   | 0.0         |


In [6]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(df.drop(columns=['SalePrice']),df['SalePrice'],test_size=0.2)

In [7]:
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='most_frequent')

# Learn each column's mode from the training set only, then apply it to both sets
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)
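
`SimpleImputer` returns plain NumPy arrays, so the column names are lost after this step. One possible way to keep them is to wrap the results back into DataFrames. The sketch below uses small made-up frames (`X_train_demo`, `X_test_demo`) as stand-ins for the real splits, since `train.csv` is not reproduced here; all names ending in `_demo` or `_filled` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-ins for X_train / X_test; the values are made up
X_train_demo = pd.DataFrame(
    {"FireplaceQu": ["Gd", np.nan, "TA", "Gd"],
     "GarageQual": ["TA", "TA", np.nan, "Fa"]},
    dtype=object,
)
X_test_demo = pd.DataFrame(
    {"FireplaceQu": [np.nan, "TA"],
     "GarageQual": [np.nan, "Gd"]},
    dtype=object,
)

imputer_demo = SimpleImputer(strategy="most_frequent")

# Fit on the training frame only, then rebuild DataFrames from the arrays
X_train_filled = pd.DataFrame(
    imputer_demo.fit_transform(X_train_demo),
    columns=X_train_demo.columns,
    index=X_train_demo.index,
)
X_test_filled = pd.DataFrame(
    imputer_demo.transform(X_test_demo),  # reuses the modes learned from training
    columns=X_test_demo.columns,
    index=X_test_demo.index,
)

print(X_test_filled)
```

The test frame is filled with the modes learned from the training frame, not with its own, which keeps information from the test set out of the fitted imputer.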



In [8]:
imputer.statistics_

array(['Gd', 'TA'], dtype=object)
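
The entries of `statistics_` follow the column order of the data the imputer was fitted on, so here `'Gd'` is the fill value for `FireplaceQu` and `'TA'` the one for `GarageQual`. As a rough sanity check, the learned values can be compared with pandas' own `mode()`. The frame below is made up for illustration (it is not the real training data), and `train_demo`, `imp`, `pandas_modes` and `learned` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up training frame, for illustration only
train_demo = pd.DataFrame(
    {
        "FireplaceQu": ["Gd", np.nan, "TA", "Gd", np.nan],
        "GarageQual": ["TA", "TA", np.nan, "Fa", "TA"],
    },
    dtype=object,
)

imp = SimpleImputer(strategy="most_frequent").fit(train_demo)

# DataFrame.mode() returns one row per tied mode; the first row holds
# the top value for each column
pandas_modes = train_demo.mode().iloc[0]

# Pair each learned fill value with its column name
learned = dict(zip(train_demo.columns, imp.statistics_))
print(learned)
```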