# Regression Models Catalog

This notebook documents 20 popular regression models, including descriptions, hyperparameter tuning ranges (suitable for GridSearchCV), strengths, and weaknesses.

- [Linear Regression](#Linear-Regression)

- [Ridge Regression](#Ridge-Regression)

- [Lasso Regression](#Lasso-Regression)

- [Elastic Net Regression](#Elastic-Net-Regression)

- [Decision Tree Regressor](#Decision-Tree-Regressor)

- [Random Forest Regressor](#Random-Forest-Regressor)

- [Gradient Boosting Regressor](#Gradient-Boosting-Regressor)

- [XGBoost Regressor](#XGBoost-Regressor)

- [LightGBM Regressor](#LightGBM-Regressor)

- [CatBoost Regressor](#CatBoost-Regressor)

- [Support Vector Regression (SVR)](#Support-Vector-Regression-(SVR))

- [K-Nearest Neighbors Regressor](#K-Nearest-Neighbors-Regressor)

- [Bayesian Ridge Regression](#Bayesian-Ridge-Regression)

- [Huber Regressor](#Huber-Regressor)

- [Extra Trees Regressor](#Extra-Trees-Regressor)

- [AdaBoost Regressor](#AdaBoost-Regressor)

- [Poisson Regressor](#Poisson-Regressor)

- [Tweedie Regressor](#Tweedie-Regressor)

- [Passive Aggressive Regressor](#Passive-Aggressive-Regressor)

- [Gaussian Process Regressor](#Gaussian-Process-Regressor)

Each model section covers:

- Model name

- Description

- Importing

- Fitting

- Hyperparameter tuning (commonly tuned parameters with practical value ranges for GridSearchCV)

- Strengths

- Weaknesses

The notebook assumes a standard supervised setup: `X_train, X_test, y_train, y_test`.
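If no dataset is at hand, the assumed setup can be reproduced with synthetic data. The sketch below is illustrative only: `make_regression`, the sample sizes, and the 80/20 split are assumptions, not part of the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration; replace with your own X and y).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

# Hold out 20% of the rows for final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```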

Folder: `01_regression_models`


```python
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
```

# Linear Regression
<a id='#Linear-Regression'></a>



**Description:** Ordinary Least Squares linear regression.

**Importing:**
```python
from sklearn.linear_model import LinearRegression
```

**Fitting:**
```python
model = LinearRegression()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'fit_intercept': [True, False],
    'positive': [True, False],
}
# copy_X and n_jobs do not change the fitted model, so they are not worth searching over
```


**Strengths:** Simple, interpretable, fast

**Weaknesses:** Assumes linearity, sensitive to outliers

# Ridge Regression
<a id='#Ridge-Regression'></a>



**Description:** L2-regularized linear regression.

**Importing:**
```python
from sklearn.linear_model import Ridge
```

**Fitting:**
```python
model = Ridge()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'alpha': [1e-4, 1e-3, 1e-2, 0.1, 1, 10, 100],
    'solver': ['auto', 'svd', 'lsqr'],
    'max_iter': [1000, 5000],
}
```


**Strengths:** Handles multicollinearity

**Weaknesses:** No feature selection
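The tuning ranges in this catalog are meant to be passed to `GridSearchCV` as a `param_grid` dictionary. A minimal sketch for Ridge follows, assuming synthetic data, 5-fold cross-validation, and negative MSE as the scoring metric (none of these choices come from the notebook itself); the same pattern applies to the other scikit-learn models.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption for illustration).
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Grid taken from the Ridge tuning ranges (max_iter omitted to keep the search small).
param_grid = {
    "alpha": [1e-4, 1e-3, 1e-2, 0.1, 1, 10, 100],
    "solver": ["auto", "svd", "lsqr"],
}

# Exhaustive search over the grid with 5-fold cross-validation on the training set.
search = GridSearchCV(Ridge(), param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
# best_estimator_ is refit on the full training set; score() returns R^2.
test_r2 = search.best_estimator_.score(X_test, y_test)
print(best_params, round(test_r2, 3))
```

The test set is only touched once, after the search, so the reported R² is not biased by the tuning itself.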


# Lasso Regression
<a id='#Lasso-Regression'></a>



**Description:** L1-regularized linear regression with feature selection.

**Importing:**
```python
from sklearn.linear_model import Lasso
```

**Fitting:**
```python
model = Lasso()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'alpha': [1e-4, 1e-3, 1e-2, 0.1, 1],
    'max_iter': [1000, 5000],
    'selection': ['cyclic', 'random'],
}
```



**Strengths:** Sparse solutions

**Weaknesses:** Unstable with correlated features
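The "sparse solutions" strength can be seen by counting coefficients that Lasso drives exactly to zero. The sketch below assumes synthetic data with 5 informative features out of 20 and `alpha=1.0`; both are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 20 features, of which only 5 actually influence y (assumption for illustration).
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=1.0, random_state=0
)

model = Lasso(alpha=1.0)
model.fit(X, y)

# The L1 penalty sets some coefficients exactly to zero, which acts as feature selection.
n_zero = int(np.sum(model.coef_ == 0))
selected = np.flatnonzero(model.coef_)
print(f"{n_zero} of {model.coef_.size} coefficients are zero; kept features: {selected}")
```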

# Elastic Net Regression
<a id='#Elastic-Net-Regression'></a>



**Description:** Combination of L1 and L2 regularization.

**Importing:**
```python
from sklearn.linear_model import ElasticNet
```

**Fitting:**
```python
model = ElasticNet()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'alpha': [1e-4, 1e-3, 1e-2, 0.1, 1],
    'l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9],
    'max_iter': [1000, 5000],
}
```



**Strengths:** Balances Lasso and Ridge

**Weaknesses:** More complex tuning

# Decision Tree Regressor
<a id='#Decision-Tree-Regressor'></a>



**Description:** Tree-based non-linear regression.

**Importing:**
```python
from sklearn.tree import DecisionTreeRegressor
```

**Fitting:**
```python
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5],
    'max_features': [None, 'sqrt', 'log2'],
}
```


**Strengths:** Captures non-linearities

**Weaknesses:** Prone to overfitting
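One way to check for overfitting is to compare training and test R² for an unconstrained tree against a constrained one. This is a sketch on synthetic, noisy data; the depth and leaf-size limits are arbitrary illustrative values.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy synthetic data (assumption for illustration).
X, y = make_regression(n_samples=400, n_features=10, noise=30.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A fully grown tree can memorize the training set.
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
deep_train_r2 = deep.score(X_train, y_train)
deep_test_r2 = deep.score(X_test, y_test)

# Limiting depth and leaf size trades training fit for a simpler model.
shallow = DecisionTreeRegressor(max_depth=4, min_samples_leaf=5, random_state=0)
shallow.fit(X_train, y_train)
shallow_train_r2 = shallow.score(X_train, y_train)
shallow_test_r2 = shallow.score(X_test, y_test)

print(f"deep:    train R2={deep_train_r2:.3f}, test R2={deep_test_r2:.3f}")
print(f"shallow: train R2={shallow_train_r2:.3f}, test R2={shallow_test_r2:.3f}")
```

A large gap between training and test R² is the usual sign that the tree needs stronger constraints or an ensemble.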

# Random Forest Regressor
<a id='#Random-Forest-Regressor'></a>


**Description:** Ensemble of trees using bagging.

**Importing:**
```python
from sklearn.ensemble import RandomForestRegressor
```

**Fitting:**
```python
model = RandomForestRegressor(n_estimators=200)
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'n_estimators': [100, 300, 500],
    'max_depth': [None, 10, 30],
    'max_features': ['sqrt', 'log2'],
    'min_samples_leaf': [1, 2, 5],
}
```



**Strengths:** Robust and accurate

**Weaknesses:** Less interpretable


# Gradient Boosting Regressor
<a id='#Gradient-Boosting-Regressor'></a>



**Description:** Sequential boosting of weak learners.

**Importing:**
```python
from sklearn.ensemble import GradientBoostingRegressor
```

**Fitting:**
```python
model = GradientBoostingRegressor()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'n_estimators': [100, 300, 500],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [2, 3, 5],
    'subsample': [0.6, 0.8, 1.0],
}
```



**Strengths:** High predictive power

**Weaknesses:** Sensitive to tuning

# XGBoost Regressor
<a id='#XGBoost-Regressor'></a>



**Description:** Optimized gradient boosting with regularization.

**Importing:**
```python
from xgboost import XGBRegressor
```

**Fitting:**
```python
model = XGBRegressor(objective='reg:squarederror')
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'n_estimators': [200, 500, 800],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [3, 5, 7],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
}
```



**Strengths:** Strong predictive performance on tabular data, built-in L1/L2 regularization

**Weaknesses:** Complex tuning

# LightGBM Regressor
<a id='#LightGBM-Regressor'></a>



**Description:** Histogram-based gradient boosting.

**Importing:**
```python
from lightgbm import LGBMRegressor
```

**Fitting:**
```python
model = LGBMRegressor()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'num_leaves': [31, 63, 127],
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [200, 500],
    'max_depth': [-1, 10, 20],
}
```



**Strengths:** Very fast

**Weaknesses:** Can overfit

# CatBoost Regressor
<a id='#CatBoost-Regressor'></a>



**Description:** Boosting with categorical handling.

**Importing:**
```python
from catboost import CatBoostRegressor
```

**Fitting:**
```python
model = CatBoostRegressor(verbose=0)
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'iterations': [300, 600, 1000],
    'learning_rate': [0.01, 0.05, 0.1],
    'depth': [4, 6, 8],
}
```


**Strengths:** Minimal preprocessing

**Weaknesses:** Slower training

# Support Vector Regression (SVR)
<a id='#Support-Vector-Regression-(SVR)'></a>



**Description:** Kernel-based regression that ignores errors smaller than `epsilon` and penalizes larger ones.

**Importing:**
```python
from sklearn.svm import SVR
```

**Fitting:**
```python
model = SVR()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'C': [0.1, 1, 10, 100],
    'epsilon': [0.01, 0.1, 0.2],
    'kernel': ['linear', 'rbf', 'poly'],
}
```



**Strengths:** Effective in high dimensions

**Weaknesses:** Poor scalability

# K-Nearest Neighbors Regressor
<a id='#K-Nearest-Neighbors-Regressor'></a>


**Description:** Instance-based regression.

**Importing:**
```python
from sklearn.neighbors import KNeighborsRegressor
```

**Fitting:**
```python
model = KNeighborsRegressor()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'n_neighbors': [3, 5, 7, 11, 15],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],
}
```


**Strengths:** Simple, intuitive

**Weaknesses:** Slow inference


# Bayesian Ridge Regression
<a id='#Bayesian-Ridge-Regression'></a>



**Description:** Bayesian linear regression.

**Importing:**
```python
from sklearn.linear_model import BayesianRidge
```

**Fitting:**
```python
model = BayesianRidge()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'alpha_1': [1e-6, 1e-5],
    'alpha_2': [1e-6, 1e-5],
    'lambda_1': [1e-6, 1e-5],
    'lambda_2': [1e-6, 1e-5],
}
```



**Strengths:** Uncertainty estimation

**Weaknesses:** Slower than OLS
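The uncertainty estimate is available through `predict(..., return_std=True)`, which returns a standard deviation alongside each predicted mean. A minimal sketch on synthetic data (the dataset is an assumption for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge

# Synthetic stand-in data (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = BayesianRidge()
model.fit(X, y)

# return_std=True adds the standard deviation of the predictive distribution.
y_mean, y_std = model.predict(X[:5], return_std=True)
for mean, std in zip(y_mean, y_std):
    print(f"prediction: {mean:8.2f}  +/- {std:.2f}")
```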

# Huber Regressor
<a id='#Huber-Regressor'></a>



**Description:** Robust regression for outliers.

**Importing:**
```python
from sklearn.linear_model import HuberRegressor
```

**Fitting:**
```python
model = HuberRegressor()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'epsilon': [1.1, 1.35, 1.5, 2.0],
    'alpha': [1e-4, 1e-3, 1e-2],
    'max_iter': [100, 300, 1000],
}
```




**Strengths:** Outlier resistant

**Weaknesses:** Slower convergence
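Outlier resistance is easiest to see next to ordinary least squares. The sketch below builds a line with true slope 3, corrupts the 10 points with the largest `x`, and compares the slopes each model recovers; the data and the size of the corruption are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)

# Clean linear relationship with slope 3 and small Gaussian noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(0, 0.5, size=100)

# Corrupt the 10 observations with the largest x by a large negative offset.
outlier_idx = np.argsort(X.ravel())[-10:]
y[outlier_idx] -= 100.0

ols = LinearRegression().fit(X, y)
huber = HuberRegressor(max_iter=1000).fit(X, y)

ols_slope = ols.coef_[0]
huber_slope = huber.coef_[0]
print(f"true slope: 3.0, OLS: {ols_slope:.2f}, Huber: {huber_slope:.2f}")
```

After fitting, `huber.outliers_` holds a boolean mask of the samples the model treated as outliers.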

# Extra Trees Regressor
<a id='#Extra-Trees-Regressor'></a>


**Description:** Extremely randomized trees.

**Importing:**
```python
from sklearn.ensemble import ExtraTreesRegressor
```

**Fitting:**
```python
model = ExtraTreesRegressor(n_estimators=300)
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'n_estimators': [200, 500],
    'max_depth': [None, 10, 30],
    'max_features': ['sqrt', 'log2'],
}
```




**Strengths:** Low variance

**Weaknesses:** Low interpretability


# AdaBoost Regressor
<a id='#AdaBoost-Regressor'></a>



**Description:** Boosting focused on hard samples.

**Importing:**
```python
from sklearn.ensemble import AdaBoostRegressor
```

**Fitting:**
```python
model = AdaBoostRegressor()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'n_estimators': [50, 100, 300],
    'learning_rate': [0.01, 0.1, 1.0],
    'loss': ['linear', 'square', 'exponential'],
}
```




**Strengths:** Improves weak learners

**Weaknesses:** Sensitive to noise

# Poisson Regressor
<a id='#Poisson-Regressor'></a>



**Description:** GLM for count data.

**Importing:**
```python
from sklearn.linear_model import PoissonRegressor
```

**Fitting:**
```python
model = PoissonRegressor()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'alpha': [0, 0.01, 0.1, 1],
    'max_iter': [100, 300, 1000],
}
```



**Strengths:** Good for counts

**Weaknesses:** Distribution assumptions
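A short sketch of the count-data use case: the target is drawn from a Poisson distribution whose log-rate is linear in the features, which is exactly the assumption `PoissonRegressor` makes. The simulated coefficients and the small `alpha` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)

# Simulated counts: log(rate) = 0.5 + 0.8 * x0 - 0.3 * x1 (assumption for illustration).
X = rng.normal(size=(500, 2))
rate = np.exp(0.5 + 0.8 * X[:, 0] - 0.3 * X[:, 1])
y = rng.poisson(rate)

# Small alpha so the L2 penalty barely shrinks the coefficients.
model = PoissonRegressor(alpha=1e-3)
model.fit(X, y)

# The log link guarantees strictly positive predicted rates.
preds = model.predict(X)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("smallest predicted rate:", preds.min())
```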

# Tweedie Regressor
<a id='#Tweedie-Regressor'></a>



**Description:** Flexible generalized linear model.

**Importing:**
```python
from sklearn.linear_model import TweedieRegressor
```

**Fitting:**
```python
model = TweedieRegressor()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'power': [0, 1, 1.5, 2],
    'alpha': [0, 0.01, 0.1, 1],
    'link': ['identity', 'log'],
}
```




**Strengths:** Distribution flexibility

**Weaknesses:** Needs domain knowledge

# Passive Aggressive Regressor
<a id='#Passive-Aggressive-Regressor'></a>


**Description:** Online learning regression model.

**Importing:**
```python
from sklearn.linear_model import PassiveAggressiveRegressor
```

**Fitting:**
```python
model = PassiveAggressiveRegressor()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'epsilon': [0.01, 0.1, 0.2],
    'max_iter': [500, 1000],
}
```




**Strengths:** Very fast

**Weaknesses:** Sensitive to noise


# Gaussian Process Regressor
<a id='#Gaussian-Process-Regressor'></a>

**Description:** Non-parametric probabilistic regression.

**Importing:**
```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern
```

**Fitting:**
```python
kernel = RBF()
model = GaussianProcessRegressor(kernel=kernel)
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    'kernel': [RBF(), Matern()],
    'alpha': [1e-10, 1e-5, 1e-2],
    'normalize_y': [True, False],
}
```



**Strengths:** Uncertainty estimation

**Weaknesses:** Poor scalability
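The uncertainty estimate grows as queries move away from the training data. The sketch below fits a noisy sine curve on the interval [0, 5] and compares the predictive standard deviation inside and far outside that range; the data and `alpha=1e-2` are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Noisy samples of sin(x) on [0, 5] (assumption for illustration).
X_train = rng.uniform(0, 5, size=(30, 1))
y_train = np.sin(X_train).ravel() + rng.normal(0, 0.1, size=30)

# alpha is added to the kernel diagonal and acts as the assumed noise variance.
model = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2, random_state=0)
model.fit(X_train, y_train)

# One query inside the training range, one far outside it.
X_query = np.array([[2.5], [10.0]])
y_mean, y_std = model.predict(X_query, return_std=True)
print(f"inside range:  mean={y_mean[0]:.2f}, std={y_std[0]:.2f}")
print(f"outside range: mean={y_mean[1]:.2f}, std={y_std[1]:.2f}")
```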



# Feature Importances

## Linear Models (Coefficients-based)

**Models:**

- LinearRegression

- Ridge

- Lasso

- ElasticNet

- BayesianRidge

- HuberRegressor

- PoissonRegressor

- TweedieRegressor

- RidgeClassifier (for completeness)

**Method:**

Use the fitted model's `coef_` attribute.

```python
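from sklearn.linear_model import (LinearRegression,
                                  Ridge,
                                  Lasso,
                                  ElasticNet,
                                  BayesianRidge,
                                  HuberRegressor,
                                  PoissonRegressor,
                                  TweedieRegressor,
                                  RidgeClassifier)


# Illustration with linear regression; the other models expose coef_ the same way
model = LinearRegression()
model.fit(X_train, y_train)

# Assumes X_train is a pandas DataFrame; otherwise supply your own list of names
feature_names = X_train.columns

# Rank features by absolute coefficient size
importance = pd.Series(
    model.coef_,
    index=feature_names
).sort_values(key=abs, ascending=False)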
```

**Notes:**

- Coefficients depend on feature scaling

- Standardize features first (for example with `StandardScaler`) so that coefficient magnitudes are comparable

- Sign indicates direction of effect
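The scaling note can be made concrete with two features on very different scales. In this sketch the feature names (`income_usd`, `age_years`) and the data-generating coefficients are hypothetical; the point is only that the ranking by raw coefficient and the ranking after standardization can disagree.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two hypothetical features on very different scales.
X_train = pd.DataFrame({
    "income_usd": rng.normal(50_000, 15_000, size=300),
    "age_years": rng.normal(40, 10, size=300),
})
# One standard deviation of income moves y by about 15, one of age by about 5.
y_train = (0.001 * X_train["income_usd"]
           + 0.5 * X_train["age_years"]
           + rng.normal(0, 0.1, size=300))

# Raw coefficients: age looks more important only because of its units.
raw = LinearRegression().fit(X_train, y_train)
raw_importance = pd.Series(raw.coef_, index=X_train.columns).sort_values(
    key=abs, ascending=False
)

# Standardized coefficients: effect per one standard deviation of each feature.
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)
scaled_importance = pd.Series(pipeline[-1].coef_, index=X_train.columns).sort_values(
    key=abs, ascending=False
)

print(raw_importance)
print(scaled_importance)
```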

## Tree-Based Models (Impurity-based importance)

**Models:**

- DecisionTreeRegressor

- RandomForestRegressor

- ExtraTreesRegressor

- GradientBoostingRegressor

- AdaBoostRegressor

**Method:**

Use the fitted model's `feature_importances_` attribute.

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (RandomForestRegressor, 
                              ExtraTreesRegressor, 
                              GradientBoostingRegressor, 
                              AdaBoostRegressor)



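# Illustration with Random Forest; fit on the training split only
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Assumes X_train is a pandas DataFrame; otherwise supply your own list of names
feature_names = X_train.columns

importance = pd.Series(
    model.feature_importances_,
    index=feature_names
).sort_values(ascending=False)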
```

```python
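# Prediction
y_predicted = model.predict(X_test)

# Evaluation: scikit-learn metrics expect (y_true, y_pred) in that order.
# r2_score is not symmetric, so swapping the arguments gives a wrong value.
print(mean_squared_error(y_test, y_predicted))
print(r2_score(y_test, y_predicted))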
```

**Notes:**

- Based on mean decrease in impurity

- Biased toward high-cardinality features

- Fast and native
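Because impurity-based importance is computed on the training data and favors high-cardinality features, permutation importance on held-out data is a common cross-check. The sketch below uses scikit-learn's `permutation_importance`; the synthetic dataset and `n_repeats=10` are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data with 2 informative features out of 6 (assumption for illustration).
X, y = make_regression(
    n_samples=400, n_features=6, n_informative=2, noise=5.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one column at a time on the test set and measure the drop in R^2.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Feature indices ordered from most to least important.
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f} "
          f"+/- {result.importances_std[idx]:.3f}")
```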

## XGBoost / LightGBM / CatBoost

### XGBoost
```python
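import xgboost as xgb

# The module itself is not callable; instantiate the regressor class
model = xgb.XGBRegressor()
model.fit(X_train, y_train)

# Assumes X_train is a pandas DataFrame; otherwise supply your own list of names
feature_names = X_train.columns

importance = pd.Series(
    model.feature_importances_,
    index=feature_names
).sort_values(ascending=False)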
```

**Optional (gain-based, preferred):**

```python
model.get_booster().get_score(importance_type="gain")
```

### LightGBM
```python
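import lightgbm as lgb

# The module itself is not callable; instantiate the regressor class
model = lgb.LGBMRegressor()
model.fit(X_train, y_train)

# Assumes X_train is a pandas DataFrame; otherwise supply your own list of names
feature_names = X_train.columns

importance = pd.Series(
    model.feature_importances_,
    index=feature_names
).sort_values(ascending=False)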
```

**Optional:**

```python
model.booster_.feature_importance(importance_type="gain")
```

### CatBoost

```python
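from catboost import CatBoostRegressor

# Regressor rather than classifier, to match the rest of this catalog
model = CatBoostRegressor(verbose=0)
model.fit(X_train, y_train)

# prettified=True returns a DataFrame of feature names and importances
importance = model.get_feature_importance(prettified=True)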
```