**Feature engineering**

Feature engineering is the process of creating, transforming, or selecting features (input variables) to improve the performance of machine learning models. It's one of the most critical steps in the machine learning pipeline. Here are the most important feature engineering techniques, grouped by type:



**Feature Creation**
Creating new features from raw data:

Interaction Features: Multiply or combine two or more features to create interaction terms.

Example: area = length × width

Polynomial Features: Add polynomial terms (e.g., square, cube) of numeric features.

Datetime Features: Extract elements like day, month, weekday, hour from timestamps.

Example: date → year, month, weekday

Text Features:

Word count, character count, average word length

Presence of keywords, TF-IDF weights, or learned embeddings (Word2Vec, BERT)

Aggregation Features:

Use group-based statistics (mean, count, sum, etc.) within categories.

Example: Mean salary per department
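
To make the creation techniques above concrete, here is a minimal pandas sketch. The DataFrame, its column names, and all values are invented for illustration; real data will need its own parsing and cleaning first.

```python
import pandas as pd

# Toy data: every column name and value here is hypothetical.
df = pd.DataFrame({
    "length": [2.0, 3.0, 4.0],
    "width": [1.5, 2.0, 2.5],
    "department": ["eng", "eng", "ops"],
    "salary": [100.0, 120.0, 80.0],
    "hired": pd.to_datetime(["2021-01-04", "2022-06-15", "2023-12-31"]),
    "note": ["great product", "bad", "would buy again"],
})

# Interaction feature: product of two numeric columns.
df["area"] = df["length"] * df["width"]

# Polynomial feature: square of a numeric column.
df["length_sq"] = df["length"] ** 2

# Datetime features via the .dt accessor (weekday: Monday=0 ... Sunday=6).
df["hired_year"] = df["hired"].dt.year
df["hired_month"] = df["hired"].dt.month
df["hired_weekday"] = df["hired"].dt.weekday

# Simple text features: whitespace-separated word count and character count.
df["note_words"] = df["note"].str.split().str.len()
df["note_chars"] = df["note"].str.len()

# Aggregation feature: mean salary per department, broadcast back to each row.
df["dept_mean_salary"] = df.groupby("department")["salary"].transform("mean")
```

Using `transform("mean")` rather than `agg` keeps the result aligned with the original rows, so it can be assigned directly as a new column.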



**Feature Transformation**

Changing the scale or distribution of features:

Normalization / Min-Max Scaling: Rescales features to a 0-1 range.

Standardization (Z-score): Transforms data to have zero mean and unit variance.

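Log/Box-Cox/Power Transformations: Reduce skew in numeric features. Log and Box-Cox require strictly positive values; the Yeo-Johnson power transform also handles zero and negative values.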

Quantile Transformation: Maps feature values to a uniform or normal distribution.

Binning: Convert continuous variables into discrete bins (e.g., age groups).
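
The transformations above can be sketched with NumPy and pandas alone. The `ages` and `income` values, the bin edges, and the group labels below are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values; income has one extreme outlier to show skew.
ages = pd.Series([5, 17, 30, 45, 70], name="age")
income = pd.Series([20_000, 35_000, 50_000, 80_000, 1_000_000], name="income")

# Min-max scaling to the [0, 1] range.
income_minmax = (income - income.min()) / (income.max() - income.min())

# Standardization (z-score) with the population standard deviation (ddof=0),
# which is also what scikit-learn's StandardScaler uses.
income_z = (income - income.mean()) / income.std(ddof=0)

# Log transform to compress the long right tail; log1p(x) = log(1 + x),
# so zeros are handled as well.
income_log = np.log1p(income)

# Binning into right-inclusive intervals (0, 18], (18, 65], (65, 120].
age_group = pd.cut(ages, bins=[0, 18, 65, 120], labels=["child", "adult", "senior"])
```

In a real pipeline the statistics (min, max, mean, standard deviation) would normally be computed on the training split only and then reused on validation and test data.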

**Handling Missing Values**

Dealing with NaNs or nulls:

Imputation:

Mean, median, mode imputation

KNN or regression-based imputation

Missing Indicator: Add a binary flag column indicating whether a value was missing.
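
A minimal sketch of median imputation combined with a missing indicator, using a made-up `age` column. The indicator is created first, because after imputation the original gaps can no longer be recovered.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries.
df = pd.DataFrame({"age": [25.0, np.nan, 40.0, 31.0, np.nan]})

# Missing indicator: 1 where the value was absent, 0 otherwise.
df["age_missing"] = df["age"].isna().astype(int)

# Median imputation: the median of the observed values (25, 31, 40) is 31.
df["age_imputed"] = df["age"].fillna(df["age"].median())
```

For model-based strategies, scikit-learn provides `SimpleImputer` and `KNNImputer` in `sklearn.impute`; as with scaling, the fill values are usually learned from the training split only.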

**Encoding Categorical Variables**

Convert categories into numeric formats:

Label Encoding: Assigns a unique integer to each category. Because the integers imply an order, it is best suited to ordinal categories or tree-based models.

One-Hot Encoding: Creates binary columns for each category.

Target Encoding: Replace each category with the mean of the target variable for that category, computed on training data only (ideally out-of-fold) to avoid target leakage.

Frequency/Count Encoding: Encode categories with their frequency/count.
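
The four encodings can be compared side by side on a small example. The `city` and `bought` columns are invented, and the target encoding shown is the naive in-sample version, which leaks the target if used as-is for training.

```python
import pandas as pd

# Hypothetical categorical feature and binary target.
df = pd.DataFrame({
    "city": ["paris", "rome", "paris", "oslo", "rome", "paris"],
    "bought": [1, 0, 1, 0, 1, 0],
})

# Label encoding: integer codes in sorted category order (oslo=0, paris=1, rome=2).
df["city_label"] = df["city"].astype("category").cat.codes

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Count encoding: number of rows in which each category occurs.
df["city_count"] = df["city"].map(df["city"].value_counts())

# Naive target encoding: mean of the target within each category.
# Illustration only; in practice compute this on training folds.
df["city_target"] = df.groupby("city")["bought"].transform("mean")
```

One-hot encoding grows the column count with the number of categories, which is why count or target encoding is often considered for high-cardinality features.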

**Feature Selection**

Choosing the most important features:

Filter Methods: Score each feature with a statistical test (chi-square, ANOVA, correlation), independently of any model.

Wrapper Methods: Recursive Feature Elimination (RFE), Forward/Backward selection.

Embedded Methods: Selection happens during model training, e.g., Lasso (L1) coefficients shrunk to zero or feature importances from tree-based models.
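
As a rough sketch, the three families map onto scikit-learn roughly as follows. The choice of the bundled iris dataset, `k=2`, and the specific estimators are assumptions made to keep the example small, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Small bundled dataset: 150 samples, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Filter: keep the 2 features with the highest ANOVA F-statistic.
filt = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_filtered = filt.transform(X)

# Wrapper: recursive feature elimination around a logistic regression.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
rfe_mask = rfe.support_  # boolean mask of the selected features

# Embedded: impurity-based importances from a random forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_
```

Filter methods are the cheapest because no model is trained; wrapper methods refit the model repeatedly and so cost more, but they account for how features behave together.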

**Dimensionality Reduction**

Reduce the number of dimensions while preserving as much structure (e.g., variance) as possible:

Principal Component Analysis (PCA)

t-SNE / UMAP (for visualization)

Autoencoders (neural network-based)
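
A minimal PCA sketch with scikit-learn, again on the bundled iris dataset as an assumed example. Features are standardized first because PCA is sensitive to scale.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize so that no feature dominates purely because of its units.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 original features onto the first 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of total variance captured by each retained component.
explained = pca.explained_variance_ratio_
```

Inspecting `explained` is a common way to decide how many components to keep. t-SNE and UMAP, by contrast, are typically used for 2-D or 3-D visualization rather than as model inputs.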

**Domain-Specific Features**

Use knowledge of the problem domain:

Example: In finance, use technical indicators (moving averages, RSI) for stock prediction.

In health, create BMI from height and weight.
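
Both domain examples reduce to a few lines of pandas. The measurements, prices, and the 3-period window below are hypothetical.

```python
import pandas as pd

# Health: BMI = weight in kg / (height in m) squared.
patients = pd.DataFrame({"height_cm": [170.0, 180.0], "weight_kg": [70.0, 81.0]})
patients["bmi"] = patients["weight_kg"] / (patients["height_cm"] / 100) ** 2

# Finance: 3-period simple moving average of a price series.
# The first two entries are NaN because the window is not yet full.
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
sma_3 = prices.rolling(window=3).mean()
```

When a moving average feeds a forecasting model, it is usually shifted (for example with `sma_3.shift(1)`) so that each row only sees past prices.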