---

# **Feature Engineering: Scaling, Normalization and Standardization**

Feature engineering is the process of **creating**, **transforming**, or **selecting** the most relevant variables (features) from raw data to improve model performance.
Well-designed features help machine learning models recognize important patterns and relationships, directly influencing how effectively the model learns.

---

## **Why Feature Engineering Matters**

Feature engineering contributes to model building in several important ways:

* **Improves Learning:**
  Well-designed features allow models to capture complex patterns more effectively.

* **Enhances Accuracy:**
  Reduces noise and irrelevant information, leading to better predictions.

* **Reduces Overfitting:**
  Emphasizes meaningful signals, which helps models generalize to unseen data.

* **Simplifies Interpretation:**
  Creates more informative and understandable inputs.

There are several techniques used for feature engineering, including **scaling**, **normalization**, and **standardization**.
The sections below walk through three of them: absolute maximum scaling, min-max scaling, and vector normalization.

---

## **1. Absolute Maximum Scaling**

Absolute Maximum Scaling rescales each feature by dividing all values by the **maximum absolute value** of that feature.
This transformation ensures that feature values lie within the range **–1 to 1**.

### **Formula**

```text
X_scaled = Xi / max(|X|)
```

### **Key Characteristics**

* Scales values to the range **–1 to 1**
* Simple to apply
* **Highly sensitive to outliers**, since extreme values can distort the scaling
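
To make the outlier sensitivity concrete, here is a minimal sketch on made-up numbers (the array `values` is hypothetical and not taken from the housing data). A single extreme value becomes the divisor, so every typical value is squeezed close to zero.

```python
import numpy as np

# Hypothetical feature values; the last one is an extreme outlier.
values = np.array([10.0, 20.0, 30.0, 40.0, 1000.0])

# Divide every value by the maximum absolute value of the feature.
scaled = values / np.max(np.abs(values))

print(scaled)  # the four typical values end up between 0.01 and 0.04
```

Without the outlier, the same four values would be spread across 0.25 to 1.0, which is why it is worth checking for extreme values before applying this method.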

---


```python
import pandas as pd
import numpy as np

df = pd.read_csv('housing.csv')

df = df.select_dtypes(include=np.number)
df.head()
```

**Output:**

| | Avg. Area Income | Avg. Area House Age | Avg. Area Number of Rooms | Avg. Area Number of Bedrooms | Area Population | Price |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 79545.45857 | 5.682861 | 7.009188 | 4.09 | 23086.8005 | 1059034.0 |
| 1 | 79248.64245 | 6.0029 | 6.730821 | 3.09 | 40173.07217 | 1505891.0 |
| 2 | 61287.06718 | 5.86589 | 8.512727 | 5.13 | 36882.1594 | 1058988.0 |
| 3 | 63345.24005 | 7.188236 | 5.586729 | 3.26 | 34310.24283 | 1260617.0 |
| 4 | 59982.19723 | 5.040555 | 7.839388 | 4.23 | 26354.10947 | 630943.5 |


**Performing Absolute Maximum Scaling**

* Computes the maximum absolute value of each column with `np.max(np.abs(df), axis=0)`
* Divides each value by its column's maximum absolute value, which scales every feature into the range -1 to 1
* Displays the first few rows of the scaled data with `scaled_df.head()`

```python
max_abs = np.max(np.abs(df), axis=0)

scaled_df = df / max_abs

scaled_df.head()
```

**Output:**

| | Avg. Area Income | Avg. Area House Age | Avg. Area Number of Rooms | Avg. Area Number of Bedrooms | Area Population | Price |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.738572 | 0.596996 | 0.651436 | 0.629231 | 0.331603 | 0.428921 |
| 1 | 0.735816 | 0.630617 | 0.625565 | 0.475385 | 0.577019 | 0.609903 |
| 2 | 0.569044 | 0.616224 | 0.791176 | 0.789231 | 0.529751 | 0.428902 |
| 3 | 0.588154 | 0.755139 | 0.519233 | 0.501538 | 0.49281 | 0.510564 |
| 4 | 0.556929 | 0.529521 | 0.728596 | 0.650769 | 0.378533 | 0.255539 |


---

## **2. Min-Max Scaling**

Min-Max Scaling transforms feature values by **subtracting the minimum value** of the feature and **dividing by the range** (maximum − minimum).
This method maps the values to a specified range—commonly **0 to 1**—while preserving the original distribution shape.
However, it is **sensitive to outliers**, since it relies on extreme values.

---

### **Formula**

```text
X_scaled = (Xi - Xmin) / (Xmax - Xmin)
```

---

### **Key Characteristics**

* Scales features to a defined range (commonly **0 to 1**)
* Preserves the original distribution's shape
* **Sensitive to outliers**, as min and max can be influenced by extreme values
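
As a quick check of the formula, the sketch below applies it by hand to a small made-up array (the values and the target range `low`/`high` are illustrative assumptions), then stretches the result to a range other than 0 to 1.

```python
import numpy as np

# Hypothetical feature values.
x = np.array([2.0, 4.0, 6.0, 10.0])

# Min-max formula: subtract the minimum, divide by the range.
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # [0.   0.25 0.5  1.  ]

# Mapping to another range [low, high] is a linear stretch of the 0-1 result.
low, high = -1.0, 1.0
x_custom = low + x_scaled * (high - low)
print(x_custom)  # [-1.  -0.5  0.   1. ]
```

The relative spacing between values is unchanged in both cases; only the endpoints move.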

---

## **Code Example: Performing Min-Max Scaling**

```python
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Create MinMaxScaler object
scaler = MinMaxScaler()

# Fit scaler to the data and transform
scaled_data = scaler.fit_transform(df)

# Convert the scaled output back to a DataFrame
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

# Display first few rows
scaled_df.head()
```

### **Explanation**

* Creates a **MinMaxScaler** object to scale values to the selected range
* Uses `scaler.fit_transform(df)` to scale the data
* Converts the result to a DataFrame to maintain column names
* Shows the first few scaled rows using `scaled_df.head()`

---


**Output:**

| | Avg. Area Income | Avg. Area House Age | Avg. Area Number of Rooms | Avg. Area Number of Bedrooms | Area Population | Price |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.686822 | 0.441986 | 0.501502 | 0.464444 | 0.329942 | 0.42521 |
| 1 | 0.683521 | 0.488538 | 0.464501 | 0.242222 | 0.575968 | 0.607369 |
| 2 | 0.483737 | 0.468609 | 0.70135 | 0.695556 | 0.528582 | 0.425192 |
| 3 | 0.50663 | 0.660956 | 0.31243 | 0.28 | 0.491549 | 0.507384 |
| 4 | 0.469223 | 0.348556 | 0.611851 | 0.495556 | 0.376988 | 0.250702 |


---

## **3. Normalization (Vector Normalization)**

Normalization scales each data sample (**row**) such that its **vector length (Euclidean norm)** becomes **1**.
This technique focuses on the **direction** of data points rather than their magnitude, making it highly effective for algorithms where **angle** or **cosine similarity** is important, such as:

* Text classification
* Clustering
* Recommendation systems

---

### **Formula**

```text
X_scaled = Xi / ||X||
```

Where:

* **Xi** → each individual feature value
* **||X||** → Euclidean norm (length) of the vector **X**

---

### **Key Characteristics**

* Normalizes each sample to **unit length (1)**
* Focuses on direction rather than magnitude
* Ideal for similarity-based algorithms (e.g., cosine similarity)
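
The following sketch shows the row-wise calculation with plain NumPy on a made-up matrix (the values are illustrative, and it assumes no row is all zeros, since that would divide by zero). Once rows have unit length, their dot product equals their cosine similarity.

```python
import numpy as np

# Hypothetical samples: rows 0 and 1 point in the same direction.
X = np.array([[3.0, 4.0],
              [6.0, 8.0],
              [1.0, 0.0]])

# Euclidean norm of each row, kept as a column so it broadcasts over features.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

print(X_unit)                          # rows 0 and 1 both become [0.6, 0.8]
print(np.linalg.norm(X_unit, axis=1))  # every row now has length 1

# For unit-length rows, pairwise dot products are cosine similarities.
cosine = X_unit @ X_unit.T
print(cosine[0, 2])                    # 0.6
```

Rows 0 and 1 differ only in magnitude, so they collapse to the same unit vector. That is the sense in which normalization keeps direction and discards scale.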

---

## **Code Example: Performing Normalization**

```python
from sklearn.preprocessing import Normalizer
import pandas as pd

# Create Normalizer object
normalizer = Normalizer()

# Fit and transform the data
normalized_data = normalizer.fit_transform(df)

# Convert back to DataFrame
normalized_df = pd.DataFrame(normalized_data, columns=df.columns)

# Display first few rows
normalized_df.head()
```

---

### **Explanation**

* Each row (sample) is scaled to have **unit norm**
* Normalization emphasizes the **direction** of data points
* Ideal for algorithms relying on **distance**, **angles**, or **cosine similarity**
* `normalized_df.head()` shows normalized data where **each row is scaled individually**

---



**Output:**

| | Avg. Area Income | Avg. Area House Age | Avg. Area Number of Rooms | Avg. Area Number of Bedrooms | Area Population | Price |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.074883 | 5e-06 | 7e-06 | 4e-06 | 0.021734 | 0.996955 |
| 1 | 0.052534 | 4e-06 | 4e-06 | 2e-06 | 0.026631 | 0.998264 |
| 2 | 0.057742 | 6e-06 | 8e-06 | 5e-06 | 0.034749 | 0.997727 |
| 3 | 0.050168 | 6e-06 | 4e-06 | 3e-06 | 0.027173 | 0.998371 |
| 4 | 0.094559 | 8e-06 | 1.2e-05 | 7e-06 | 0.041546 | 0.994652 |


---

# **What is Feature Extraction?**

Feature extraction is the process of **transforming raw data** into a simplified, informative set of features.
This reduces data complexity and highlights the most relevant information, making it easier for machine learning models to analyze patterns and learn efficiently.

Done well, feature extraction can improve **model accuracy**, reduce **computational costs**, and focus learning on the most essential aspects of the data.

---

## **Importance of Feature Extraction**

Feature extraction is important for several reasons:

### **1. Reduced Computational Cost**

Raw data—especially images, audio, or large datasets—can be extremely complex.
Feature extraction simplifies the data, reducing memory usage and processing requirements.

### **2. Improved Model Performance**

By focusing on the most relevant information, models achieve **higher accuracy** and learn more effectively.

### **3. Better Insights**

Reducing the number of features removes noise and irrelevant information, which makes the data easier to understand.

### **4. Prevention of Overfitting**

Too many features may cause a model to memorize the training data.
Feature extraction simplifies the dataset, increasing generalization and reducing overfitting risk.

---

# **Key Techniques for Feature Extraction**

Feature extraction methods vary based on data type and problem requirements.
Below are some widely used techniques:

---

## **1. Statistical Methods**

Statistical methods summarize and describe essential patterns in the data.

**Common measures and techniques include:**

* **Mean:** Average value of a dataset
* **Median:** Middle value after sorting
* **Standard Deviation:** Measures spread or dispersion
* **Correlation & Covariance:** Show relationships between variables
* **Regression Analysis:** Models relationships between dependent and independent variables

These techniques help represent the **central tendency**, **spread**, and **relationships** within a dataset.
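
As a rough illustration, most of these summaries are one-liners in pandas. The sketch below uses a tiny made-up frame (`df_demo` and its columns are hypothetical) in which `y` is exactly twice `x`.

```python
import pandas as pd

# Hypothetical data: y is exactly 2 * x.
df_demo = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Central tendency and spread for every column (std is the sample std, ddof=1).
summary = df_demo.agg(["mean", "median", "std"])
print(summary)

# Relationship between two variables.
corr_xy = df_demo["x"].corr(df_demo["y"])  # Pearson correlation
cov_xy = df_demo["x"].cov(df_demo["y"])    # sample covariance
print(corr_xy, cov_xy)
```

Because `y` is a perfect linear function of `x`, the correlation comes out as 1, while the covariance depends on the scale of both variables.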

---

## **2. Dimensionality Reduction**

Dimensionality reduction simplifies datasets by reducing the number of features while preserving important information.

Popular methods include:

### **• Principal Component Analysis (PCA)**

Identifies components that capture the most variance in the data.

### **• Linear Discriminant Analysis (LDA)**

Maximizes class separability by finding optimal feature combinations.

### **• t-Distributed Stochastic Neighbor Embedding (t-SNE)**

Reduces high-dimensional data into 2D or 3D for visualization, especially useful for complex datasets.
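
A minimal PCA sketch with scikit-learn follows. The data is synthetic and built on purpose from two hidden factors plus a little noise, an assumption chosen so that two components should capture nearly all of the variance. Real data will rarely compress this cleanly.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 5 features driven by only 2 underlying factors.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

# Project onto the two directions of greatest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, 2)
print(pca.explained_variance_ratio_.sum())  # share of variance retained
```

PCA is scale-sensitive, so features measured in different units are usually scaled before it is applied.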

# **Choosing the Right Method**

Selecting an appropriate feature extraction method depends on:

* Type of data (images, text, signals, tabular data)
* Objective of the ML task
* Available computational resources
* Domain knowledge and expertise

### **Challenges**

* **Information Loss:** Simplifying data may remove important details
* **Computational Complexity:** Some methods are resource-heavy, especially for large datasets

---




# **What is Feature Engineering?**

Feature Engineering is the process of **selecting, creating, or modifying features**—the input variables—to help machine learning models learn patterns more effectively.
It transforms raw, messy data into **meaningful, structured inputs** that improve model accuracy, performance, and reliability.

Feature engineering may involve tasks such as handling missing values, encoding categorical variables, scaling numerical features, creating new features, or combining existing ones.

---

# **Importance of Feature Engineering**

Feature engineering can significantly influence the quality and performance of machine learning models.

### **Benefits**

* **Improve Accuracy**
  Selecting and creating the right features helps the model learn better, resulting in more accurate predictions.

* **Reduce Overfitting**
  Using fewer but more important features helps the model generalize and avoid memorizing the data.

* **Boost Interpretability**
  Well-crafted features make it easier to understand how the model makes decisions.

* **Enhance Efficiency**
  Focusing on meaningful features reduces training time and computational cost.

---

# **Processes Involved in Feature Engineering**

Below are core processes used in feature engineering:

---

## **1. Feature Creation**

Feature creation involves generating new features using:

* **Domain Knowledge:** Based on industry or business rules
* **Data-Driven Insights:** Derived by observing patterns
* **Synthetic Features:** Combining or transforming existing variables
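
For example, a ratio of two existing columns is a common synthetic feature. The sketch below uses a small made-up frame; the column names `rooms`, `bedrooms`, and `bedrooms_per_room` are hypothetical stand-ins and the values are illustrative.

```python
import pandas as pd

# Hypothetical housing-style data.
df_demo = pd.DataFrame({
    "rooms": [8.0, 6.0, 10.0],
    "bedrooms": [4.0, 3.0, 4.0],
})

# Synthetic feature: share of rooms that are bedrooms.
df_demo["bedrooms_per_room"] = df_demo["bedrooms"] / df_demo["rooms"]

print(df_demo)
```

Whether such a ratio is useful is a domain question; it is worth keeping only if it improves validation performance or interpretability.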

---

## **2. Feature Transformation**

Adjusts features to improve model performance:

* **Normalization & Scaling**
* **Encoding Categorical Data** (e.g., one-hot encoding)
* **Mathematical Transformations**

  * Log transformations for skewed distributions
  * Polynomial transformations
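
Two of these transformations can be sketched in a few lines of pandas and NumPy. The frame `df_demo` and its columns are hypothetical; `np.log1p` (the log of 1 + x) is used here as one common choice because it is defined at zero.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a skewed numeric column and a categorical column.
df_demo = pd.DataFrame({
    "income": [20_000.0, 35_000.0, 50_000.0, 1_000_000.0],
    "city": ["A", "B", "A", "C"],
})

# Log transformation compresses the long right tail of a skewed feature.
df_demo["log_income"] = np.log1p(df_demo["income"])

# One-hot encoding replaces the categorical column with one indicator per category.
encoded = pd.get_dummies(df_demo, columns=["city"])

print(encoded)
```

After encoding, each row has exactly one active `city_*` indicator, and the outlier income is much closer to the others on the log scale.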

---

## **3. Feature Extraction**

Extracts meaningful information while reducing dimensionality:

* **Dimensionality Reduction (PCA, LDA)**
* **Aggregation & Combination** (sum, average, ratios)

---

## **4. Feature Selection**

Chooses the most impactful subset of features:

* **Filter Methods** (correlation, chi-square, ANOVA)
* **Wrapper Methods** (forward selection, RFE)
* **Embedded Methods** (Lasso, tree-based methods)

---

## **5. Feature Scaling**

Ensures all features contribute equally to the model:

* **Min-Max Scaling** (0–1 range)
* **Standard Scaling** (mean = 0, variance = 1)
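
Standard scaling (standardization) subtracts each feature's mean and divides by its standard deviation. The sketch below does this by hand on a made-up matrix; it uses the population standard deviation (`ddof=0`), which is the convention scikit-learn's `StandardScaler` also follows.

```python
import numpy as np

# Hypothetical data: two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Per-feature mean and population standard deviation (ddof=0).
mean = X.mean(axis=0)
std = X.std(axis=0)

X_std = (X - mean) / std

print(X_std.mean(axis=0))  # approximately 0 for every feature
print(X_std.std(axis=0))   # 1 for every feature
```

Both columns end up identical after scaling, since they differ only by a constant factor. Unlike min-max scaling, the result is not confined to a fixed range.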

---

# **Steps in Feature Engineering**

Although steps may vary by problem, the general workflow includes:

### **1. Data Cleaning**

Identify and correct missing values, inconsistencies, and errors to maintain data quality.

### **2. Data Transformation**

Prepare data for modeling by applying scaling, normalization, encoding, and formatting.

### **3. Feature Extraction**

Create new features by deriving or combining existing ones to provide more meaningful input to the model.

### **4. Feature Selection**

Choose relevant features using techniques like correlation analysis, mutual information, or stepwise regression.

### **5. Feature Iteration**

Refine features continuously based on model performance—adding, removing, or modifying features to improve results.

---

# **Feature Selection Techniques in Machine Learning**

**Last Updated:** 20 Nov, 2025

Feature selection is the process of choosing the **most useful** input features for a machine learning model.
It helps improve performance, reduces noise, and makes results easier to interpret.

---

## **Why Feature Selection Is Important**

Feature selection plays a vital role in building efficient ML models:

* Removes irrelevant and redundant features
* Improves accuracy and reduces overfitting
* Speeds up training and prediction
* Makes models simpler and more interpretable

---

## **Need for Feature Selection**

Feature selection methods are essential for the following reasons:

* **Improved Accuracy:** Models perform better with relevant inputs.
* **Faster Training:** Fewer features reduce computational time.
* **Greater Interpretability:** Easier to understand model behavior.
* **Avoiding Curse of Dimensionality:** Reduces complexity in high-dimensional datasets.

---

# **Types of Feature Selection Methods**

Feature selection techniques are grouped into **three main categories**, each offering different advantages depending on the use case.

---

# **1. Filter Methods**

Filter methods evaluate each feature **independently of the model** by measuring its relationship with the target variable.
They are used during preprocessing to remove irrelevant or redundant features based on **statistical tests** or other criteria.

### **Common Filter Techniques**

* **Information Gain** – Measures entropy reduction
* **Chi-square Test** – Evaluates relationships in categorical data
* **Fisher’s Score** – Ranks features based on class separability
* **Pearson Correlation Coefficient** – Linear relationship between continuous variables
* **Variance Threshold** – Removes features with low variance
* **Mean Absolute Difference** – Measures variability
* **Dispersion Ratio** – Compares arithmetic mean to geometric mean
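
Two of these filters, variance threshold and Pearson correlation, can be sketched as follows. The data is synthetic and the column names (`informative`, `noise`, `constant`) are hypothetical labels chosen for the illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n = 200

# Synthetic features: one drives the target, one is noise, one never changes.
signal = rng.normal(size=n)
X = pd.DataFrame({
    "informative": signal,
    "noise": rng.normal(size=n),
    "constant": np.ones(n),
})
y = 3.0 * signal + 0.1 * rng.normal(size=n)

# Step 1: drop features with zero variance.
selector = VarianceThreshold(threshold=0.0)
selector.fit(X)
kept = X.columns[selector.get_support()]

# Step 2: rank the remaining features by absolute Pearson correlation with the target.
correlations = X[kept].corrwith(pd.Series(y)).abs().sort_values(ascending=False)

print(list(kept))
print(correlations)
```

Neither step trains a model, which is what makes filter methods fast. The trade-off is that each feature is scored on its own.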

### **Advantages**

* Fast and computationally efficient
* Easy to implement
* Works with any ML model (model-independent)

### **Limitations**

* Doesn’t consider feature interactions
* Performance depends on choosing the right statistical metric

---

# **2. Wrapper Methods**

Wrapper methods evaluate feature subsets by **training a model** and selecting combinations that improve performance.
They try various feature combinations and choose the best-performing subset.

### **Common Wrapper Techniques**

* **Forward Selection** – Begin with no features, add one at a time
* **Backward Elimination** – Begin with all features, remove one at a time
* **Recursive Feature Elimination (RFE)** – Repeatedly remove least important features
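
A short RFE sketch using scikit-learn is shown below. The dataset comes from `make_regression`, and the choice of `LinearRegression` as the estimator and of three features to keep are assumptions made for the illustration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic regression problem: 8 features, only 3 of which affect the target.
X, y = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=0.1, random_state=0
)

# Repeatedly fit the model and drop the weakest feature until 3 remain.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe.fit(X, y)

print(rfe.support_)  # boolean mask of the selected features
print(rfe.ranking_)  # 1 = selected; larger numbers were eliminated earlier

X_selected = X[:, rfe.support_]
```

Each elimination round refits the model, so the cost grows with the number of features. This is the computational expense noted for wrapper methods.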

### **Advantages**

* Model-specific optimization
* Leads to potentially better performance
* Flexible with different evaluation metrics

### **Limitations**

* Computationally expensive
* Risk of overfitting
* Not suitable for large datasets

---

# **3. Embedded Methods**

Embedded methods perform feature selection **during the model training process**, combining the strengths of filter and wrapper methods.

### **Common Embedded Techniques**

* **L1 Regularization (Lasso Regression)** – Shrinks some coefficients to exactly zero, which removes those features from the model
* **Decision Trees & Random Forests** – Select features based on impurity reduction
* **Gradient Boosting Models** – Choose features that reduce prediction error
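
The Lasso case can be sketched with scikit-learn as follows. The data is synthetic, and `alpha=1.0` is an arbitrary illustrative value; in practice the regularization strength is usually tuned, for example by cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic regression problem: 10 features, only 3 of which affect the target.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

# The L1 penalty pushes the coefficients of unhelpful features to exactly zero.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Features with a non-zero coefficient are the ones the model kept.
selected = np.flatnonzero(lasso.coef_)

print(lasso.coef_)
print(selected)
```

Selection happens as a side effect of a single training run, which is why embedded methods tend to be cheaper than wrapper methods. The features here are already on a common scale; with real data, scaling first keeps the penalty comparable across features.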

### **Advantages**

* Efficient and computationally lighter than wrapper methods
* Model learns feature importance automatically during training

### **Limitations**

* Less interpretable than filter methods
* Not all algorithms support embedded feature selection

---

# **Choosing the Right Feature Selection Method**

The best method depends on the following factors:

* **Dataset Size:**

  * Large datasets → *Filter methods*
  * Small datasets → *Wrapper methods*

* **Model Type:**

  * Tree-based models have built-in feature selection

* **Interpretability Needs:**

  * Use *filter methods* if transparency is important

* **Computational Resources:**

  * Wrapper methods are resource-intensive

---

By applying the right feature selection techniques, we can **improve model performance**, **reduce computation**, and **build more reliable machine learning systems**.

---
