# Classification Models Catalog

Folder: `02_classification_models`

This notebook documents **19 popular classification models**, including descriptions, hyperparameter tuning ranges (suitable for GridSearchCV), strengths, and weaknesses.

Assumes a standard supervised setup with `X_train, X_test, y_train, y_test`.

- [Logistic Regression](#Logistic-Regression)

- [K-Nearest Neighbors (KNN)](#K-Nearest-Neighbors-(KNN))

- [Support Vector Machine (SVM / SVC)](#Support-Vector-Machine-(SVM-/-SVC))

- [Decision Tree Classifier](#Decision-Tree-Classifier)

- [Random Forest Classifier](#Random-Forest-Classifier)

- [Gradient Boosting Classifier](#Gradient-Boosting-Classifier)

- [XGBoost Classifier](#XGBoost-Classifier)

- [LightGBM Classifier](#LightGBM-Classifier)

- [CatBoost Classifier](#CatBoost-Classifier)

- [Naive Bayes (GaussianNB)](#Naive-Bayes-(GaussianNB))

- [Linear Discriminant Analysis (LDA)](#Linear-Discriminant-Analysis-(LDA))

- [Quadratic Discriminant Analysis (QDA)](#Quadratic-Discriminant-Analysis-(QDA))

- [Extra Trees Classifier](#Extra-Trees-Classifier)

- [AdaBoost Classifier](#AdaBoost-Classifier)

- [Perceptron](#Perceptron)

- [Passive Aggressive Classifier](#Passive-Aggressive-Classifier)

- [Ridge Classifier](#Ridge-Classifier)

- [SGD Classifier](#SGD-Classifier)

- [Gaussian Process Classifier](#Gaussian-Process-Classifier)

Each model section covers:

- Model name

- Description

- Importing

- Fitting

- Hyperparameter tuning: commonly tuned parameters with practical value ranges for GridSearchCV

- Strengths

- Weaknesses

```python
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
```
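
The sections below assume the four arrays already exist. A minimal sketch of one way to create them and to score a fitted model with the metrics imported above. The synthetic dataset from `make_classification` and the helper name `evaluate` are assumptions for illustration, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic binary dataset standing in for real data (assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Stratify so both splits keep the same class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

def evaluate(model, X_test, y_test):
    """Score a fitted binary classifier with the four imported metrics (hypothetical helper)."""
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = evaluate(model, X_test, y_test)
```

Any classifier from the catalog can be passed to `evaluate` once it has been fitted. For multiclass targets, `precision_score`, `recall_score`, and `f1_score` need an explicit `average` argument.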

# Logistic Regression
<a id='#Logistic-Regression'></a>


**Description:** Linear model for binary and multiclass classification using the logistic function.

**Importing:**
```python
from sklearn.linear_model import LogisticRegression
```

**Fitting:**
```python
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
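```python
# liblinear supports l1/l2 only; elasticnet needs saga plus l1_ratio (0.5 is an illustrative value)
param_grid = [
    {"C": [0.01, 0.1, 1, 10, 100], "penalty": ["l1", "l2"],
     "solver": ["liblinear", "saga"], "class_weight": [None, "balanced"]},
    {"C": [0.01, 0.1, 1, 10, 100], "penalty": ["elasticnet"], "l1_ratio": [0.5],
     "solver": ["saga"], "class_weight": [None, "balanced"]},
]
```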


**Strengths:** Interpretable, fast, strong baseline

**Weaknesses:** Linear decision boundary, sensitive to outliers
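
The tuning ranges in every section are meant to be passed to `GridSearchCV`. A minimal sketch of that workflow for logistic regression, assuming a synthetic dataset and a grid restricted to the `liblinear` solver so every combination is valid. The `scoring` and `cv` values are illustrative choices, not recommendations from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for X_train / X_test (assumption)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Only combinations that liblinear supports, so every fit is valid
param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],
    "class_weight": [None, "balanced"],
}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",  # binary F1; pick a metric that matches the problem
    cv=5,
)
search.fit(X_train, y_train)

best_params = search.best_params_
# score() uses the best estimator, refit on the full training set
test_score = search.score(X_test, y_test)
```

The same pattern applies to the other models: swap the estimator and the `param_grid`.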

# K-Nearest Neighbors (KNN)
<a id='#K-Nearest-Neighbors-(KNN)'></a>



**Description:** Instance-based classifier using majority vote of neighbors.

**Importing:**
```python
from sklearn.neighbors import KNeighborsClassifier
```

**Fitting:**
```python
model = KNeighborsClassifier()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "n_neighbors": [3, 5, 7, 11, 15],
    "weights": ["uniform", "distance"],
    "p": [1, 2],  # 1 = Manhattan distance, 2 = Euclidean distance
}
```

**Strengths:** Simple, non-parametric

**Weaknesses:** Slow inference, sensitive to scaling
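
Because KNN is distance-based, scaling is usually tuned together with the model. A sketch, assuming synthetic data, that wraps `StandardScaler` and `KNeighborsClassifier` in a `Pipeline` so the scaler is refit inside each cross-validation fold. The step names `scaler` and `knn` are arbitrary labels chosen for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for X_train / X_test (assumption)
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# "<step name>__<parameter>" routes each value to the KNN step
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 11, 15],
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
knn_test_accuracy = search.score(X_test, y_test)
```

Fitting the scaler inside the pipeline keeps statistics from the validation folds out of training, which a separate `fit_transform` on the whole training set would not.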

# Support Vector Machine (SVM / SVC)
<a id='#Support-Vector-Machine-(SVM-/-SVC)'></a>



**Description:** Maximum-margin classifier with kernel trick.

**Importing:**
```python
from sklearn.svm import SVC
```

**Fitting:**
```python
model = SVC()
model.fit(X_train, y_train)
```

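**Hyperparameter Tuning (GridSearch):**
```python
# gamma is ignored by the linear kernel
param_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "rbf", "poly"],
    "gamma": ["scale", "auto"],
}
```

**Strengths:** Effective in high dimensions

**Weaknesses:** Training time grows at least quadratically with the number of samples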

# Decision Tree Classifier
<a id='#Decision-Tree-Classifier'></a>



**Description:** Tree-based model using recursive feature splits.

**Importing:**
```python
from sklearn.tree import DecisionTreeClassifier
```

**Fitting:**
```python
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}
```

**Strengths:** Easy to interpret, non-linear

**Weaknesses:** Prone to overfitting
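
The overfitting tendency can be seen by comparing an unconstrained tree with a depth-limited one. A small sketch, assuming a synthetic dataset with label noise (`flip_y`) so that memorizing the training set does not transfer to the test set. The variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y randomly reassigns a share of labels, i.e. adds label noise (assumption)
X, y = make_classification(n_samples=600, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: grows until every training sample is classified correctly
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth-limited tree: cannot memorize individual noisy samples
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Train/test accuracy gap; a large gap signals overfitting
gap_deep = deep.score(X_train, y_train) - deep.score(X_test, y_test)
gap_shallow = shallow.score(X_train, y_train) - shallow.score(X_test, y_test)
```

Comparing `gap_deep` with `gap_shallow` is a quick check before running the full grid over `max_depth`, `min_samples_split`, and `min_samples_leaf`.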

# Random Forest Classifier
<a id='#Random-Forest-Classifier'></a>



**Description:** Ensemble of decision trees using bagging.

**Importing:**
```python
from sklearn.ensemble import RandomForestClassifier
```

**Fitting:**
```python
model = RandomForestClassifier(n_estimators=200)
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 30],
    "max_features": ["sqrt", "log2"],
}
```

**Strengths:** Robust, strong generalization

**Weaknesses:** Less interpretable than a single tree; higher memory use and slower prediction

# Gradient Boosting Classifier
<a id='#Gradient-Boosting-Classifier'></a>


**Description:** Sequential boosting of weak learners.

**Importing:**
```python
from sklearn.ensemble import GradientBoostingClassifier
```

**Fitting:**
```python
model = GradientBoostingClassifier()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 5],
}
```

**Strengths:** Often high accuracy on tabular data

**Weaknesses:** Sensitive to tuning


# XGBoost Classifier
<a id='#XGBoost-Classifier'></a>



**Description:** Optimized gradient boosting with regularization.

**Importing:**
```python
from xgboost import XGBClassifier
```

**Fitting:**
```python
model = XGBClassifier(eval_metric='logloss')
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "n_estimators": [300, 600],
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 6],
    "subsample": [0.7, 1.0],
}
```

**Strengths:** Strong performance on tabular data

**Weaknesses:** Many interacting hyperparameters to tune

# LightGBM Classifier
<a id='#LightGBM-Classifier'></a>



**Description:** Histogram-based gradient boosting.

**Importing:**
```python
from lightgbm import LGBMClassifier
```

**Fitting:**
```python
model = LGBMClassifier()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "num_leaves": [31, 63],
    "learning_rate": [0.01, 0.1],
    "n_estimators": [300, 600],
}
```

**Strengths:** Very fast, scalable

**Weaknesses:** Can overfit

# CatBoost Classifier
<a id='#CatBoost-Classifier'></a>



**Description:** Boosting with native categorical handling.

**Importing:**
```python
from catboost import CatBoostClassifier
```

**Fitting:**
```python
model = CatBoostClassifier(verbose=0)
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "iterations": [300, 600],
    "depth": [4, 6, 8],
    "learning_rate": [0.01, 0.1],
}
```

**Strengths:** Minimal preprocessing

**Weaknesses:** Slower training

# Naive Bayes (GaussianNB)
<a id='#Naive-Bayes-(GaussianNB)'></a>



**Description:** Probabilistic classifier based on Bayes theorem.

**Importing:**
```python
from sklearn.naive_bayes import GaussianNB
```

**Fitting:**
```python
model = GaussianNB()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {"var_smoothing": [1e-9, 1e-8, 1e-7]}
```

**Strengths:** Very fast, works with small data

**Weaknesses:** Strong independence assumption

# Linear Discriminant Analysis (LDA)
<a id='#Linear-Discriminant-Analysis-(LDA)'></a>



**Description:** Linear classifier maximizing class separation.

**Importing:**
```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
```

**Fitting:**
```python
model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {"solver": ["svd", "lsqr", "eigen"]}
```

**Strengths:** Strong for normally distributed data

**Weaknesses:** Assumes Gaussian classes

# Quadratic Discriminant Analysis (QDA)
<a id='#Quadratic-Discriminant-Analysis-(QDA)'></a>



**Description:** Quadratic decision boundary classifier.

**Importing:**
```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
```

**Fitting:**
```python
model = QuadraticDiscriminantAnalysis()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {"reg_param": [0.0, 0.1, 0.5]}
```

**Strengths:** Flexible boundaries

**Weaknesses:** Sensitive to small samples

# Extra Trees Classifier
<a id='#Extra-Trees-Classifier'></a>



**Description:** Extremely randomized tree ensemble.

**Importing:**
```python
from sklearn.ensemble import ExtraTreesClassifier
```

**Fitting:**
```python
model = ExtraTreesClassifier(n_estimators=300)
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [None, 20],
    "max_features": ["sqrt", "log2"],
}
```
    
**Strengths:** Low variance, fast

**Weaknesses:** Low interpretability

# AdaBoost Classifier
<a id='#AdaBoost-Classifier'></a>



**Description:** Boosting focusing on hard samples.

**Importing:**
```python
from sklearn.ensemble import AdaBoostClassifier
```

**Fitting:**
```python
model = AdaBoostClassifier()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.01, 0.1, 1],
}
```

**Strengths:** Improves weak learners

**Weaknesses:** Sensitive to noise

# Perceptron
<a id='#Perceptron'></a>



**Description:** Online linear classifier.

**Importing:**
```python
from sklearn.linear_model import Perceptron
```

**Fitting:**
```python
model = Perceptron()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
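```python
# alpha only takes effect when a penalty is set (the default is penalty=None)
param_grid = {"penalty": ["l2", "l1"], "alpha": [1e-4, 1e-3], "max_iter": [1000, 3000]}
```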

**Strengths:** Very fast

**Weaknesses:** Only linear

# Passive Aggressive Classifier
<a id='#Passive-Aggressive-Classifier'></a>



**Description:** Online margin-based classifier.

**Importing:**
```python
from sklearn.linear_model import PassiveAggressiveClassifier
```

**Fitting:**
```python
model = PassiveAggressiveClassifier()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "max_iter": [1000, 3000],
}
```

**Strengths:** Fast, scalable

**Weaknesses:** Sensitive to noise

# Ridge Classifier
<a id='#Ridge-Classifier'></a>


**Description:** Linear classifier with L2 regularization.

**Importing:**
```python
from sklearn.linear_model import RidgeClassifier
```

**Fitting:**
```python
model = RidgeClassifier()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "alpha": [0.1, 1, 10],
    "class_weight": [None, "balanced"],
}
```
    
**Strengths:** Stable, fast

**Weaknesses:** No probability outputs
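
`RidgeClassifier` exposes `decision_function` but not `predict_proba`. When probabilities are needed, one option is to wrap it in `CalibratedClassifierCV`. This is a sketch of that workaround on synthetic data, not something the original notebook prescribes. The `method` and `cv` values are illustrative.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for X_train / X_test (assumption)
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ridge = RidgeClassifier().fit(X_train, y_train)
has_proba = hasattr(ridge, "predict_proba")  # False: only signed scores are available
margins = ridge.decision_function(X_test)    # distance-like score per sample

# Sigmoid (Platt) calibration maps the scores to probabilities via cross-validation
calibrated = CalibratedClassifierCV(RidgeClassifier(), method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)     # shape (n_samples, n_classes)
```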

# SGD Classifier

**Description:** Stochastic gradient descent classifier.

**Importing:**
```python
from sklearn.linear_model import SGDClassifier
```

**Fitting:**
```python
model = SGDClassifier()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
```python
param_grid = {
    "loss": ["hinge", "log_loss"],
    "alpha": [1e-4, 1e-3],
    "max_iter": [1000, 3000],
}
```


**Strengths:** Highly scalable

**Weaknesses:** Needs careful tuning



# Gaussian Process Classifier
<a id='#Gaussian-Process-Classifier'></a>

**Description:** Probabilistic non-parametric classifier.

**Importing:**
```python
from sklearn.gaussian_process import GaussianProcessClassifier
```

**Fitting:**
```python
model = GaussianProcessClassifier()
model.fit(X_train, y_train)
```

**Hyperparameter Tuning (GridSearch):**
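```python
from sklearn.gaussian_process.kernels import RBF, Matern
# kernel takes kernel objects, not strings
param_grid = {"kernel": [1.0 * RBF(length_scale=1.0), 1.0 * Matern(length_scale=1.0, nu=1.5)],
              "max_iter_predict": [100, 300]}
```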



**Strengths:** Uncertainty estimation

**Weaknesses:** Very slow, poor scaling
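
A sketch of how the uncertainty estimate can be read from `predict_proba`, assuming a small synthetic dataset (kept small because Gaussian process training cost grows roughly cubically with the number of samples). The kernel choice and the `confidence` variable are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split

# Small synthetic dataset standing in for real data (assumption)
X, y = make_classification(n_samples=150, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=50, random_state=0)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=0)
gpc.fit(X_train, y_train)

proba = gpc.predict_proba(X_test)  # shape (n_samples, 2) for a binary target
# Probability of the predicted class; values near 0.5 flag uncertain predictions
confidence = proba.max(axis=1)
```

Low-confidence samples can then be routed to manual review or a fallback model.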



# Feature Importances

## Linear Models (Coefficients-based)

**Models:**

- LogisticRegression

- RidgeClassifier

- SGDClassifier

- Perceptron

- PassiveAggressiveClassifier

- Linear regressors expose the same attribute: Ridge, Lasso, ElasticNet, BayesianRidge, HuberRegressor, PoissonRegressor, TweedieRegressor

**Method:**

Use the fitted model's `coef_` attribute.

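```python
import pandas as pd
from sklearn.linear_model import (LogisticRegression,
                                  RidgeClassifier,
                                  SGDClassifier)

# Illustrated with LogisticRegression; the other two expose coef_ the same way
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# coef_ has shape (1, n_features) for a binary target, so take row 0
# (multiclass targets give one row per class).
# feature_names: column labels, e.g. list(X_train.columns) for a DataFrame
importance = pd.Series(
    model.coef_[0],
    index=feature_names
).sort_values(key=abs, ascending=False)
```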

**Notes:**

- Coefficients depend on feature scaling

- Standardize features (e.g. with `StandardScaler`) before fitting so coefficient magnitudes are comparable

- Sign indicates direction of effect
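
The notes above can be combined into a single pipeline. A minimal sketch, assuming synthetic data with hypothetical column names `f0`..`f4` and one feature deliberately placed on a much larger scale. Scaling inside the pipeline keeps the ranking from being driven by units.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical feature names (assumption)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]
X = pd.DataFrame(X, columns=feature_names)
X["f0"] = X["f0"] * 1000  # put one feature on a much larger scale

# make_pipeline names each step after its lowercased class name
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

# Binary target: coef_ has shape (1, n_features), so take row 0
coefs = pipe.named_steps["logisticregression"].coef_[0]
importance = pd.Series(coefs, index=feature_names).sort_values(key=abs, ascending=False)
```

Each coefficient then describes the change in log-odds per one standard deviation of the feature, which makes magnitudes comparable across features.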

## Tree-Based Models (Impurity-based importance)

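**Models:**

- DecisionTreeClassifier

- RandomForestClassifier

- ExtraTreesClassifier

- GradientBoostingClassifier

- AdaBoostClassifier

The regressor counterparts (DecisionTreeRegressor, RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor, AdaBoostRegressor) expose the same attribute.

**Method:**

Use the fitted model's `feature_importances_` attribute.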

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier,
                              ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              AdaBoostClassifier)

# Illustrated with RandomForestClassifier; the others work the same way
model = RandomForestClassifier()
model.fit(X_train, y_train)

# feature_names: column labels, e.g. list(X_train.columns) for a DataFrame
importance = pd.Series(
    model.feature_importances_,
    index=feature_names
).sort_values(ascending=False)
```

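```python
from sklearn.metrics import accuracy_score, f1_score

# Prediction
y_predicted = model.predict(X_test)

# Evaluation: scikit-learn metrics take (y_true, y_pred) in that order
accuracy_score(y_test, y_predicted)
f1_score(y_test, y_predicted, average="macro")
```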

**Notes:**

- Based on mean decrease in impurity

- Biased toward high-cardinality features

- Fast: computed during training, with no extra passes over the data
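
Because impurity-based scores are computed on the training data and favor high-cardinality features, a common cross-check is permutation importance on held-out data. That is a different technique from the one described above. A sketch, assuming synthetic data with hypothetical feature names; `n_repeats=10` is an illustrative value.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data with hypothetical feature names (assumption)
X, y = make_classification(n_samples=400, n_features=6, n_informative=3, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Impurity-based (mean decrease in impurity), normalized to sum to 1
mdi = pd.Series(model.feature_importances_, index=feature_names)

# Permutation importance: mean drop in test score when one column is shuffled
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
perm = pd.Series(result.importances_mean, index=feature_names).sort_values(ascending=False)
```

Features that rank high in `mdi` but near zero in `perm` are candidates for the bias described in the notes.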

## XGBoost / LightGBM / CatBoost

### XGBoost
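```python
import pandas as pd
from xgboost import XGBClassifier

# The xgboost module itself is not callable; instantiate the classifier
model = XGBClassifier(eval_metric="logloss")
model.fit(X_train, y_train)

# feature_names: column labels, e.g. list(X_train.columns) for a DataFrame
importance = pd.Series(
    model.feature_importances_,
    index=feature_names
).sort_values(ascending=False)
```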

**Optional (gain-based, preferred):**

```python
# Returns a dict {feature: gain}; features never used in a split are omitted
model.get_booster().get_score(importance_type="gain")
```

### LightGBM
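```python
import pandas as pd
from lightgbm import LGBMClassifier

# The lightgbm module itself is not callable; instantiate the classifier
model = LGBMClassifier()
model.fit(X_train, y_train)

# Default importance_type="split": how many times each feature is used in a split
importance = pd.Series(
    model.feature_importances_,
    index=feature_names
).sort_values(ascending=False)
```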

**Optional:**

```python
model.booster_.feature_importance(importance_type="gain")
```

### CatBoost

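```python
from catboost import CatBoostClassifier

model = CatBoostClassifier(verbose=0)  # verbose=0 silences per-iteration logging
model.fit(X_train, y_train)

# prettified=True returns a DataFrame sorted by importance
importance = model.get_feature_importance(prettified=True)
```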