# An overview of machine learning models

## Tree-based models

| Model                 | Description                                                                                                  |
| --------------------- | ------------------------------------------------------------------------------------------------------------ |
| **Decision Trees**    | Simple, interpretable models that split data based on feature thresholds. Prone to overfitting.              |
| **Random Forest**     | An ensemble of decision trees trained on random subsets of data and features (bagging). Reduces overfitting. |
| **Gradient Boosting** | Sequentially builds trees that correct errors of previous ones. Usually slower to train than a random forest, but often more accurate. |
| **XGBoost**           | Optimized implementation of gradient boosting. Fast, regularized, and widely used in competitions.           |
| **LightGBM**          | Gradient boosting library focused on speed and memory efficiency. Good for large datasets.                   |
| **CatBoost**          | Gradient boosting that handles categorical features natively. Great out-of-the-box performance.              |

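To make "sequentially builds trees that correct errors of previous ones" concrete, here is a minimal pure-Python sketch of gradient boosting for regression. It is a toy under stated assumptions: a single numeric feature, squared-error loss, depth-1 trees (stumps), and made-up data. The function names and the `learning_rate` value are illustrative and do not come from any library.

```python
# Toy gradient boosting for regression (squared error) with decision stumps.
# Assumptions: one numeric feature, at least two distinct feature values.

def fit_stump(xs, residuals):
    """Find the threshold split that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosting(xs, ys, n_rounds, learning_rate=0.3):
    base = sum(ys) / len(ys)  # round 0: predict the mean everywhere
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Each new stump is fitted to the errors of the ensemble so far.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict_boosting(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 5.2, 5.0, 5.1]

for n_rounds in (0, 1, 10, 50):
    model = fit_boosting(xs, ys, n_rounds)
    print(n_rounds, round(mse(model, xs, ys), 4))
```

On this toy data the training error shrinks as rounds are added. XGBoost, LightGBM, and CatBoost build on the same idea, adding regularization and many engineering optimizations.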

## Linear models

| Model                      | Description                                                                     |
| -------------------------- | ------------------------------------------------------------------------------- |
| **Linear Regression**      | Predicts continuous values using a weighted sum of features. Easy to interpret. |
| **Logistic Regression**    | Binary classifier that outputs a probability; the decision boundary is linear.  |
| **Ridge/Lasso Regression** | Linear regression with regularization (L2 or L1) to reduce overfitting.         |

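The difference between L2 (ridge) and L1 (lasso) regularization is easiest to see with a single feature and no intercept, where both have a closed-form solution. The sketch below assumes that simplified setting; the function names, the toy data, and the exact scaling of the penalty term are illustrative choices (libraries scale `alpha` differently, so the values are not comparable across implementations).

```python
# One-feature, no-intercept model y ≈ w * x, fitted with and without a penalty.

def _sums(xs, ys):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy, sxx


def fit_ridge_1d(xs, ys, alpha):
    """Minimize sum((y - w*x)^2) + alpha * w^2, which gives w = Sxy / (Sxx + alpha)."""
    sxy, sxx = _sums(xs, ys)
    return sxy / (sxx + alpha)


def fit_lasso_1d(xs, ys, alpha):
    """Minimize 0.5 * sum((y - w*x)^2) + alpha * |w| via soft-thresholding of Sxy."""
    sxy, sxx = _sums(xs, ys)
    shrunk = max(abs(sxy) - alpha, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for alpha in (0.0, 10.0, 100.0):
    print(alpha, fit_ridge_1d(xs, ys, alpha), fit_lasso_1d(xs, ys, alpha))
```

With `alpha = 0` both reduce to plain linear regression. As `alpha` grows, ridge shrinks the weight toward zero without ever reaching it, while lasso sets it to exactly zero once the penalty is large enough, which is why lasso is also used for feature selection.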

## Neural networks

| Model                                    | Description                                                            |
| ---------------------------------------- | ---------------------------------------------------------------------- |
| **Multilayer Perceptron (MLP)**          | Fully connected feedforward neural network. Good for tabular data.     |
| **Convolutional Neural Networks (CNNs)** | Designed for image data. Capture spatial patterns using filters.       |
| **Recurrent Neural Networks (RNNs)**     | Designed for sequential data (e.g. time series, text).                 |
| **Transformers**                         | Modern neural architecture for sequences, especially effective in NLP. |


## Others

| Type                          | Model                                          | Description                                                                                                                          |
| ----------------------------- | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| Support Vector Machines (SVM) | **SVM Classifier/Regressor**                   | Finds a hyperplane that maximizes the margin between classes. Works well with high-dimensional data.                                 |
| Instance-based methods        | **k-Nearest Neighbors (k-NN)**                 | Predicts by looking at the k closest data points in the training set. No real training step, but slow prediction.                    |
| Naive Bayes                   | **Gaussian/Bernoulli/Multinomial Naive Bayes** | Probabilistic classifier based on Bayes' theorem. Assumes features are independent given the class. Surprisingly effective for text. |

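k-NN is simple enough to write out in a few lines, which also shows why it has no real training step but slow predictions. This is a minimal sketch, assuming Euclidean distance and a majority vote; the function name `knn_predict` and the toy points are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # There is no model to fit: every prediction scans the whole training set.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 5.0), (5.5, 4.5), (6.0, 5.5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.2, 1.4)))
print(knn_predict(train_points, train_labels, (5.2, 5.1)))
```

Each call sorts all training points by distance to the query, so prediction cost grows with the size of the training set.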
## Clustering (unsupervised)

(These come back in a later chapter.)

| Model                       | Description                                                            |
| --------------------------- | ---------------------------------------------------------------------- |
| **K-Means**                 | Groups data into k clusters by minimizing within-cluster distance.     |
| **Hierarchical Clustering** | Builds a tree of clusters via bottom-up or top-down approach.          |
| **DBSCAN**                  | Density-based clustering — detects clusters of varying shape and size. |


## Time series models

(These also get their very own chapter.)

| Model                                                                 | Description                                                                                                      |
| --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| **ARIMA (AutoRegressive Integrated Moving Average)**                  | Classic statistical model. Good for univariate forecasting when data is stationary or can be differenced.        |
| **SARIMA (Seasonal ARIMA)**                                           | Extension of ARIMA to handle **seasonality** explicitly.                                                         |
| **Exponential Smoothing (ETS)**                                       | Models level, trend, and seasonality with smoothing coefficients. Good for simple, interpretable forecasts.      |
| **Prophet (by Facebook)**                                             | Decomposes time series into trend + seasonality + holidays. Easy to tune, works well on business data.           |
| **Vector AutoRegression (VAR)**                                       | Generalizes the autoregressive (AR) model to multivariate time series: models relationships between multiple time-dependent variables. |
| **LSTM (Long Short-Term Memory networks)**                            | A type of RNN that can learn long-term dependencies. Powerful for sequential or time series data.                |
| **Temporal Convolutional Networks (TCNs)**                            | Use 1D convolutions over time series. Often faster and more stable than RNNs.                                    |
| **Transformer-based models (e.g. Time Series Transformer, Informer)** | Recent advances in time series using attention mechanisms; good for long sequences and multivariate forecasting. |
| **XGBoost, LightGBM for time series**                                 | Tree-based models can also be used for time series **if you engineer lag, rolling, and time features manually**. |

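The last row deserves a small illustration: tree-based models expect one row per observation, so the series has to be turned into a table first. The sketch below builds lag and rolling-mean features in plain Python. The function name, the defaults (`n_lags=2`, `window=3`), and the toy series are assumptions for illustration; calendar features such as day of week are left out.

```python
def make_lag_features(series, n_lags=2, window=3):
    """Build (X, y) rows from a 1-D series using only values before time t."""
    X, y = [], []
    start = max(n_lags, window)  # first index with enough history
    for t in range(start, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]  # y[t-1], y[t-2], ...
        rolling_mean = sum(series[t - window:t]) / window     # mean of the previous `window` values
        X.append(lags + [rolling_mean])
        y.append(series[t])
    return X, y


series = [10, 12, 13, 15, 18, 20]
X, y = make_lag_features(series)
for features, target in zip(X, y):
    print(features, "->", target)
```

Every feature for time `t` is computed from values strictly before `t`, which avoids leaking the target into the inputs. The resulting `X` and `y` can then be passed to any tabular regressor.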