# Feature Engineering

Feature Engineering is the process of transforming raw data into features that better represent the problem to the model, leading to improved performance.


- Once the problem statement is understood, the workflow includes:
    - **Data Ingestion**
    - **Data Preparation/ Preprocessing**
        - Missing Value Treatment
        - Class Imbalance
        - Outlier treatment
        - Data Encoding
        - Feature Engineering/ Feature Extraction

![image.png](attachment:image.png)

## Feature Extraction:

Feature extraction is the process of selecting and extracting the relevant features from the raw data.

- Creating new features
- Modifying existing features
- Selecting the right features

In ML, some features may not be relevant to the model's predictions.

# Curse of Dimensionality

The **Curse of Dimensionality** refers to problems that arise when we work with data having many features (high dimensions).  
As the number of dimensions increases:  
- Data becomes sparse (scattered).  
- Distance between points becomes less meaningful.  
- Models may overfit because they try to capture noise.  
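The second point above can be checked numerically. The following is a minimal sketch, assuming uniformly random points in a unit hypercube (the helper name `distance_contrast` and the sample sizes are illustrative choices, not part of the original notes). It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(n_dims, n_points=500):
    """Relative gap between the farthest and nearest neighbour of a random query."""
    points = rng.random((n_points, n_dims))   # uniform points in the unit hypercube
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Hypothetical dimensionalities chosen only for illustration
contrast = {d: distance_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrast.items():
    print(f"{d:>5} dims: relative contrast = {c:.2f}")
```

As the number of dimensions grows, the contrast shrinks toward zero: the nearest and farthest points end up almost equally far away, which is why distance-based reasoning degrades.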


![image.png](attachment:image.png)

# How to approach feature extraction?

## 1. Creating New Features

Feature Creation means making new variables from existing data to give models more useful information.  
This often improves accuracy and captures hidden relationships.
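As a small illustration, the sketch below derives two new columns from existing ones with pandas. The DataFrame `df_people` and its column names are hypothetical, made up for this example.

```python
import pandas as pd

# Hypothetical raw data used only for illustration
df_people = pd.DataFrame({
    "height_m": [1.60, 1.75, 1.82],
    "weight_kg": [55.0, 72.0, 95.0],
    "signup_date": pd.to_datetime(["2023-01-15", "2023-06-30", "2024-02-01"]),
})

# Ratio feature: combines two columns into one that may be more informative
df_people["bmi"] = df_people["weight_kg"] / df_people["height_m"] ** 2

# Date-part feature: exposes seasonality hidden inside a timestamp
df_people["signup_month"] = df_people["signup_date"].dt.month

print(df_people)
```

Whether a derived feature actually helps depends on the model and the data, so new features are usually validated against a held-out set.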



## 2. Modifying the Existing Data

Sometimes raw data is messy.  
We modify it to make it cleaner and more useful for machine learning.
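A minimal sketch of such clean-up, assuming a hypothetical DataFrame `df_raw` with inconsistent text and numbers stored as strings (the column names and the choice of median imputation are assumptions for illustration):

```python
import pandas as pd

# Hypothetical messy data used only for illustration
df_raw = pd.DataFrame({
    "city": [" Delhi", "delhi ", "MUMBAI", None],
    "income": ["50000", "62,000", "n/a", "48000"],
})

df_clean = df_raw.copy()

# Normalise text: trim whitespace, lower-case, and label missing values
df_clean["city"] = df_clean["city"].str.strip().str.lower().fillna("unknown")

# Convert numeric strings: drop thousands separators, turn unparseable values into NaN
income = df_clean["income"].str.replace(",", "", regex=False)
df_clean["income"] = pd.to_numeric(income, errors="coerce")

# Fill the missing income with the median of the observed values
df_clean["income"] = df_clean["income"].fillna(df_clean["income"].median())

print(df_clean)
```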

## 3. Feature Scaling

Feature Scaling is the process of bringing all numerical features onto a comparable scale.  
It is important because many ML algorithms (like KNN, SVM, and models trained with gradient descent) are sensitive to the scale of the data.


### Why is Feature Scaling Needed?

*Many algorithms are distance-based, so when some features have very large values the distance computation is dominated by them and training becomes more expensive.*

> Optimisation becomes faster

> Interpretation becomes easier

Different features may have very different ranges.  
Example:  
- Age → 18 to 60  
- Salary → 20,000 to 1,20,000  

If we don’t scale:  
- Large values (like Salary) dominate small values (like Age).  
- Distance-based models (KNN, K-means) become biased toward the large-scale feature, and gradient-based training (Linear Regression fitted by gradient descent, Neural Nets) converges more slowly.  


## Types of Feature Scaling

There are different techniques to scale features.  
Choice depends on algorithm and data distribution.  

### 1. Min-Max Scaling (Normalization)
- Scales values between **0 and 1**.  
- Formula:  
  \[
  x' = \frac{x - \min(x)}{\max(x) - \min(x)}
  \]
- Useful for algorithms that need bounded values (e.g., Neural Networks, KNN).

```python
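from sklearn.preprocessing import MinMaxScaler

# df is assumed to be a DataFrame containing only numeric columns
scaler = MinMaxScaler()
df_minmax = scaler.fit_transform(df)  # returns a NumPy array scaled to [0, 1]
print("Min-Max Scaling:\n", df_minmax)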
```

### 2. Standardization (Z-score Scaling)

Scales data to mean = 0, std = 1.

Formula:

\[
x' = \frac{x - \mu}{\sigma}
\]

Works well if data follows a normal distribution.

Used in Logistic Regression, SVM, Linear Regression.
```python
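from sklearn.preprocessing import StandardScaler

# df is assumed to be a DataFrame containing only numeric columns
scaler = StandardScaler()
df_standard = scaler.fit_transform(df)  # each column gets mean 0 and std 1
print("Standardization:\n", df_standard)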
```

### 3. Robust Scaling

Uses median and IQR (Interquartile Range).

Less sensitive to outliers.

Formula:

\[
x' = \frac{x - \mathrm{median}(x)}{\mathrm{IQR}(x)}
\]
```python
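from sklearn.preprocessing import RobustScaler

# df is assumed to be a DataFrame containing only numeric columns
scaler = RobustScaler()
df_robust = scaler.fit_transform(df)  # centres on the median, scales by the IQR
print("Robust Scaling:\n", df_robust)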
```

### 4. Normalization (Vector Normalization)

Scales each row so that its length = 1 (unit norm).

Useful in text classification, cosine similarity, NLP.

```python
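from sklearn.preprocessing import Normalizer

# df is assumed to be a DataFrame containing only numeric columns
scaler = Normalizer()  # default norm is L2
df_norm = scaler.fit_transform(df)  # rescales each row, not each column
print("Normalization:\n", df_norm)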
```



## Feature Selection

Feature Selection is the process of selecting **only the most important features** for a machine learning model.  
It helps in:
- Reducing overfitting  
- Improving accuracy  
- Decreasing training time  


- Feature Selection Methods:

    - **Univariate Selection:** Select features based on statistical tests.
    - **Tree-based feature importance:** Random Forest, XGBoost.
    - **Correlation analysis:** Remove redundant features.
    - **Regularization-based methods:** Lasso (L1) shrinks the coefficients of less important features to exactly zero; Ridge (L2) only shrinks them and does not remove features.

- Benefit: Smaller, cleaner dataset → faster and better model.
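Correlation analysis from the list above can be sketched as follows. The synthetic DataFrame `df_corr` and the 0.9 threshold are assumptions made for illustration; a suitable threshold depends on the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)

# Synthetic data: "b" is almost a copy of "a", "c" is independent
df_corr = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

corr = df_corr.corr().abs()

# Keep only the upper triangle so each pair is inspected once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later column of every pair whose correlation exceeds the threshold
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_reduced = df_corr.drop(columns=to_drop)

print("Dropped:", to_drop)
print("Remaining:", list(df_reduced.columns))
```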

---

## Filter Method in Feature Selection

**Filter Method** selects features **based on statistical measures** without involving any machine learning model.  
- Works independently of any algorithm.  
- Uses correlation, chi-square, ANOVA, or mutual information.  
- Fast and simple, especially for large datasets.
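A minimal sketch of a filter method, assuming scikit-learn's `SelectKBest` with the ANOVA F-test (`f_classif`) on the built-in Iris dataset. The dataset and `k=2` are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X_iris, y_iris = iris.data, iris.target

# Score every feature independently with an ANOVA F-test; keep the top 2
selector = SelectKBest(score_func=f_classif, k=2)
X_top2 = selector.fit_transform(X_iris, y_iris)

selected_names = [
    name for name, keep in zip(iris.feature_names, selector.get_support()) if keep
]
print("F-scores:", selector.scores_.round(1))
print("Selected:", selected_names)
```

No model is trained here: the scores come purely from a statistical test between each feature and the target.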

### Key Points

- Filter Method Advantages:
    - Fast
    - Works for high-dimensional data
    - Simple to implement

- Disadvantages:
    - Ignores interaction between features
    - Might select redundant features

- Conclusion
    - Filter Method = feature selection based on statistics.
    - Other methods like Wrapper and Embedded consider model performance, while Filter is independent of any model.
---
## Embedded Method in Feature Selection

**Embedded Methods** perform feature selection **during model training**.  
- The model itself decides which features are important.  
- Combines advantages of Filter and Wrapper methods.  
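- Common algorithms: **Lasso (L1), Decision Trees, Random Forest, XGBoost**. Ridge (L2) is often listed alongside them, but it only shrinks coefficients and never sets them exactly to zero.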
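A minimal sketch of an embedded method, assuming Lasso on synthetic regression data from `make_regression`. The value `alpha=1.0` and the dataset sizes are illustrative assumptions and would normally be tuned.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which actually drive the target
X_reg, y_reg = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

# The L1 penalty pushes coefficients of unhelpful features to exactly zero
lasso = Lasso(alpha=1.0)
lasso.fit(X_reg, y_reg)

kept = np.flatnonzero(lasso.coef_ != 0)
print("Coefficients:", lasso.coef_.round(2))
print("Features kept by Lasso:", kept)
```

Selection happens as a side effect of fitting: the features with non-zero coefficients are the ones the model retained.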

### Key Points

- Advantages:
    - Considers interaction with model
    - Automatically selects features during training
    - Less computationally expensive than wrapper methods
- Disadvantages:
    - Depends on model choice
    - Thresholds for importance are subjective
- Conclusion
    - Embedded Method = feature selection integrated with model training
    - Examples: Lasso, Decision Trees, Random Forest, XGBoost (Ridge shrinks coefficients but does not drop features)
    - Balances speed of Filter method and accuracy of Wrapper method
---
## Wrapper Method in Feature Selection

**Wrapper Methods** select features **by trying different combinations** and evaluating performance with a machine learning model.  
- Uses a predictive model to score feature subsets.  
- Can be **forward selection, backward elimination, or recursive feature elimination (RFE)**.  
- Often more accurate than Filter methods, but **slower** because a model is retrained for every candidate subset.
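A minimal sketch of a wrapper method, assuming scikit-learn's `RFE` around a logistic regression on synthetic data. The estimator and the target of 3 features are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 8 features, 3 informative, no redundant ones
X_clf, y_clf = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# RFE refits the model repeatedly, removing the weakest feature each round
estimator = LogisticRegression(max_iter=1000)
rfe = RFE(estimator=estimator, n_features_to_select=3, step=1)
rfe.fit(X_clf, y_clf)

X_selected = rfe.transform(X_clf)
print("Selected mask:", rfe.support_)
print("Ranking (1 = kept):", rfe.ranking_)
```

Each elimination round trains a fresh model, which is where the extra cost of wrapper methods comes from.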

### Key Points
- Advantages:
    - Usually gives higher accuracy because it considers interaction between features
- Disadvantages:
    - Computationally expensive for large datasets
    - Can overfit if data is small
- Conclusion
    - Wrapper Method = trial-and-error with a predictive model
    - Recursive Feature Elimination (RFE) is a commonly used technique
    - Slower than Filter and Embedded methods, but often more accurate