# Feature Engineering Notes (Theory)

## 1. Feature Engineering
Feature Engineering is the process of transforming raw data into meaningful features that improve model performance.

It includes four main steps:
- **Feature Transformation**
- **Feature Construction**
- **Feature Selection**
- **Feature Extraction**

---

## 2. Feature Transformation

### (a) Missing Value Imputation
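- Many real-world datasets contain missing values, which many models cannot handle directly.
- Common strategies:
  - **Mean / Median / Mode imputation** → fill numerical gaps with the mean or median (the median is more robust to outliers) and categorical gaps with the mode; keeps every row
  - **Drop the column** → if most of its values are missing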
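A minimal sketch of both strategies in plain Python, assuming a table stored as a dict of column lists with `None` marking missing values. The helper names `impute` and `drop_sparse_columns`, the 50% threshold, and the Titanic-style rows are illustrative assumptions (toy data, not real records). In practice, scikit-learn's `SimpleImputer` (strategies `"mean"`, `"median"`, `"most_frequent"`) does the same job.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries using the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)           # works for categorical (string) values too
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

def drop_sparse_columns(table, max_missing_ratio=0.5):
    """Drop columns whose fraction of None values exceeds max_missing_ratio (hypothetical threshold)."""
    return {
        name: col
        for name, col in table.items()
        if sum(v is None for v in col) / len(col) <= max_missing_ratio
    }

# Toy Titanic-style columns (made-up values)
data = {
    "Age": [22, None, 26, 35, None],
    "Embarked": ["S", "C", None, "S", "S"],
    "Cabin": [None, "C85", None, None, None],
}
data = drop_sparse_columns(data)                    # "Cabin" is 80% missing, so it is dropped
data["Age"] = impute(data["Age"], "median")         # numerical -> median
data["Embarked"] = impute(data["Embarked"], "mode") # categorical -> mode
print(data)
```

Dropping sparse columns first means imputation is only spent on columns that carry enough observed data to estimate a sensible fill value.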

---

### (b) Handling Categorical Features
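- Most ML models require numerical input, not text.
- Convert categorical features using:
  - **Label Encoding** → assigns an integer to each category (implies an order, so it suits ordinal features or tree-based models)
  - **One-Hot Encoding** → creates one binary column per category (no implied order, but adds a column for every category)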
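A minimal sketch contrasting the two encodings, assuming categories are plain strings. The helper names and the `is_<category>` column naming are hypothetical; scikit-learn's `OrdinalEncoder` and `OneHotEncoder` are the library counterparts.

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted order for determinism)."""
    categories = sorted(set(values))
    mapping = {cat: code for code, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Create one binary column per category; each row has exactly one 1."""
    categories = sorted(set(values))
    return {f"is_{cat}": [int(v == cat) for v in values] for cat in categories}

embarked = ["S", "C", "Q", "S"]
codes, mapping = label_encode(embarked)
print(codes, mapping)            # [2, 0, 1, 2] {'C': 0, 'Q': 1, 'S': 2}
print(one_hot_encode(embarked))  # {'is_C': [0, 1, 0, 0], 'is_Q': [0, 0, 1, 0], 'is_S': [1, 0, 0, 1]}
```

Note that the label codes suggest `C < Q < S`, an ordering that has no meaning for ports of embarkation; one-hot encoding avoids that artifact at the cost of more columns.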

---

### (c) Outlier Detection
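- Outliers are extreme values that can distort training (e.g., inflate the mean or pull a regression line toward them).
- Detection methods:
  - **IQR (Interquartile Range)** → IQR = Q3 - Q1; values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as outliers
- Handling methods:
  - Remove them
  - Cap them at the IQR bounds, or transform the feature (e.g., log transform)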
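A minimal sketch of the IQR rule with the conventional multiplier of 1.5, covering both detection and capping. It assumes quartiles computed with `statistics.quantiles(..., method="inclusive")` (linear interpolation); other quartile conventions can shift the bounds slightly. The function names and fare values are hypothetical toy data.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """List the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]

def cap_outliers(values, k=1.5):
    """Clip every value into [lower, upper] instead of removing it."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]

fares = [7, 8, 8, 9, 10, 11, 12, 13, 250]   # toy data with one extreme fare
print(iqr_bounds(fares))      # (2.0, 18.0)
print(find_outliers(fares))   # [250]
print(cap_outliers(fares))    # [7, 8, 8, 9, 10, 11, 12, 13, 18.0]
```

Capping keeps the row (useful when data is scarce), while removal discards it entirely.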

---

### (d) Feature Scaling
- Some algorithms are sensitive to feature magnitude (e.g., Logistic Regression, SVM, KNN).
- Methods:
  - **Standardization** → rescale to mean = 0, std = 1: z = (x - mean) / std
  - **Normalization (Min-Max scaling)** → scale values to the range [0, 1]: x' = (x - min) / (max - min)
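A minimal sketch of both formulas, assuming the population standard deviation (`pstdev`) for standardization; scikit-learn's `StandardScaler` and `MinMaxScaler` are the usual library versions. The helper names and ages are illustrative.

```python
from statistics import mean, pstdev

def standardize(values):
    """z = (x - mean) / std, using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:                      # constant column: nothing to rescale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values):
    """x' = (x - min) / (max - min), mapping values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                        # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 50, 60]
print(standardize(ages))        # symmetric around 0, std of 1
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

When a model is evaluated on held-out data, the mean/std or min/max are normally computed on the training split only and then reused for the test split, so no test information leaks into training.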

---

## 3. Feature Construction
Creating new features from existing ones.

Examples (Titanic dataset):
- **Family Size** = SibSp + Parch + 1 (siblings/spouses + parents/children + the passenger)
- **IsAlone** = 1 if Family Size = 1, else 0
- **Title extraction** from names (Mr, Mrs, Miss, etc.)
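A minimal sketch of the three constructed features, assuming each passenger is a dict with `SibSp`, `Parch`, and `Name`, and that names follow the "Surname, Title. Given names" layout used in the Titanic data. The regex and helper names are illustrative assumptions, not a canonical recipe.

```python
import re

def add_family_features(row):
    """Return a copy of the passenger record with FamilySize and IsAlone added."""
    family_size = row["SibSp"] + row["Parch"] + 1   # siblings/spouses + parents/children + self
    return {**row, "FamilySize": family_size, "IsAlone": int(family_size == 1)}

# Assumed name layout: the title sits between the comma and the next period
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")

def extract_title(name):
    """'Braund, Mr. Owen Harris' -> 'Mr'; returns None if the layout does not match."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else None

passenger = {"Name": "Braund, Mr. Owen Harris", "SibSp": 1, "Parch": 0}
enriched = add_family_features(passenger)
enriched["Title"] = extract_title(passenger["Name"])
print(enriched)   # ..., 'FamilySize': 2, 'IsAlone': 0, 'Title': 'Mr'
```

Rare titles are often grouped into a single "Rare" category afterwards so the new feature does not have many tiny classes.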

---

## 4. Feature Selection
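Not all features are useful. Some are redundant or noisy and can hurt accuracy or slow down training.

Techniques:
- **Statistical tests** (Chi-square for categorical features, ANOVA F-test for numerical features against a categorical target)
- **Model-based selection** (feature importances from Random Forest, XGBoost)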
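A minimal sketch of filter-style selection with the one-way ANOVA F statistic, F = (SS_between / (k - 1)) / (SS_within / (n - k)), where k is the number of classes and n the number of samples. The helpers `anova_f` and `select_k_best` and the toy columns are hypothetical; scikit-learn offers `SelectKBest` with `f_classif` (or `chi2`) for the same purpose. The sketch assumes each feature has some within-class variance (otherwise the denominator is zero).

```python
from statistics import mean

def anova_f(feature, labels):
    """One-way ANOVA F statistic of a numeric feature across class labels."""
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(y, []).append(x)
    n, k = len(feature), len(groups)
    grand_mean = mean(feature)
    means = {y: mean(g) for y, g in groups.items()}
    # Between-group: how far each class mean is from the overall mean
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    # Within-group: spread of samples around their own class mean
    ss_within = sum((x - means[y]) ** 2 for y, g in groups.items() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_k_best(features, labels, k):
    """Score every feature column and keep the names of the k highest-scoring ones."""
    scores = {name: anova_f(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k], scores

labels = [0, 0, 0, 1, 1, 1]
features = {
    "fare": [10, 12, 11, 50, 55, 52],   # class means differ a lot
    "noise": [3, 7, 5, 4, 6, 5],        # same class means
}
best, scores = select_k_best(features, labels, k=1)
print(best, scores)   # ['fare'] ...; 'noise' scores 0.0
```

A high F means the class means differ strongly relative to the spread inside each class, so the feature is likely informative for the target.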

---

## 5. Feature Extraction
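Reduce dimensionality by deriving a smaller set of new features that keeps most of the important information.

- **PCA (Principal Component Analysis)** → projects features onto fewer uncorrelated dimensions (principal components) that capture the most variance
- Other techniques: **LDA (Linear Discriminant Analysis)**, **t-SNE** (mainly used for visualization)
- Benefits:
  - Can reduce overfitting
  - Helps visualization (e.g., projecting data to 2D)
  - Mitigates the curse of dimensionality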
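A minimal sketch of PCA reduced to its core idea: center the data, build the covariance matrix, find its dominant eigenvector, and project onto it. It uses power iteration to find that eigenvector, which is a swapped-in technique chosen for brevity; libraries such as `sklearn.decomposition.PCA` use SVD instead. The helper names and the 2D toy points are assumptions, and only the first component is computed (LDA and t-SNE are not shown).

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of the rows (features in columns), plus the centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def top_component(cov, iterations=200, seed=0):
    """Power iteration: repeated multiplication by cov converges to its dominant eigenvector."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]        # random positive start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:                            # degenerate data with no variance
            return v
        v = [x / norm for x in w]
    return v

def pca_1d(rows):
    """Project each row onto the first principal component (d dimensions -> 1)."""
    cov, centered = covariance_matrix(rows)
    pc1 = top_component(cov)
    projected = [sum(c * p for c, p in zip(row, pc1)) for row in centered]
    return projected, pc1

# Toy 2D points lying close to the line y = x
rows = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
projected, pc1 = pca_1d(rows)
print(pc1)        # roughly (0.71, 0.71), up to sign
print(projected)  # one coordinate per point instead of two
```

Because the two toy features are almost perfectly correlated, the single component keeps nearly all of the variance, which is exactly the situation where PCA compresses well.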
