# Choosing the right Machine Learning model

## There are many groups of models out there:

- Regression Algorithms
- Instance-based Algorithms
- Regularization Algorithms   
- Decision Tree Algorithms
- Bayesian Algorithms
- Ensemble Algorithms
- Clustering Algorithms
- Deep Learning Algorithms
- Dimensionality Reduction Algorithms
- and more...

### How to choose the right one for your problem?

- Know your data
- Know your priorities
- Know the involved trade-offs
- ...

This lecture introduces the main families of ML algorithms and helps you choose the ones most suitable for your problem.

## Regression Algorithms

Explicitly model the relationship between variables. The model is iteratively refined using a measure of the error in its predictions.

- Linear Regression 
- Logistic Regression
- LASSO / Ridge Regression
- Elastic Net
- Locally Estimated Scatterplot Smoothing (LOESS)
- ...

<img src="images/algorithms/Regression-Algorithms.png" style="display: block;margin-left: auto;margin-right: auto;height: 200px"/>

## Regression Algorithms

### Logistic Regression

Predicts the probability of a binary (0/1) target variable. Multinomial / ordered logistic regression handles a target variable with 3 or more possible unordered / ordered outcomes.


<img src="images/algorithms/logistic.png" style="display: block;margin-left: auto;margin-right: auto;height: 350px"/>

*In Scikit-Learn:* [sklearn.linear_model.LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)
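A minimal sketch of the probability output described above, assuming a synthetic binary dataset from `make_classification` (the data and all parameter values are illustrative choices, not part of the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary (0/1) data; an assumption made purely for illustration.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# One row per sample, one column per class: P(y=0) and P(y=1).
proba = clf.predict_proba(X_test)
accuracy = clf.score(X_test, y_test)

print(proba[:3])
print(f"test accuracy: {accuracy:.2f}")
```

`predict` simply returns the class with the higher probability, so `predict_proba` is the more informative output when you care about confidence.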

## Regression Algorithms

**Pros**:

- Simple & Interpretable
- Small number of hyperparameters
- Overfitting can be addressed through regularization
- Fast

**Cons**:

- Assumes a specific relationship
- Sensitive to outliers
- Sensitive to feature scaling
- Complex hypothesis functions are often difficult to fit
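As a rough illustration of how regularization constrains a regression model, the sketch below compares ordinary least squares with Ridge regression on synthetic data. The dataset, the noise level and `alpha=10.0` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (illustrative assumption): 50 samples, 10 features,
# only the first two features actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha is the L2 penalty strength

# The L2 penalty shrinks the coefficient vector towards zero.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print(f"OLS coefficient norm:   {ols_norm:.3f}")
print(f"Ridge coefficient norm: {ridge_norm:.3f}")
```

Smaller coefficients make the fitted function less able to chase noise, which is why the penalty strength is usually tuned with cross-validation.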

## Instance-based Algorithms

Build up a reference database of example data and compare new data to it using a similarity measure, in order to find the best match and make a prediction.

- k-Nearest Neighbors (kNN)
- Self-Organizing Map (SOM)
- Support Vector Machines (SVM)
- Locally Weighted Learning (LWL)
- ...

<img src="images/algorithms/Instance-based-Algorithms.png" style="display: block;margin-left: auto;margin-right: auto;height: 200px"/>

## Instance-based Algorithms

### k-Nearest Neighbors (kNN)

To predict the class of a new point, kNN finds the **k** nearest neighbors of that point according to a distance metric (e.g. Euclidean distance). The majority class among these neighbors is selected as the prediction. kNN assumes that similar things exist in close proximity.


<img src="images/algorithms/KnnClassification.png" style="display: block;margin-left: auto;margin-right: auto;height: 250px"/>

*In Scikit-Learn:* [sklearn.neighbors.KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)
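The neighbor lookup and majority vote can be sketched as follows, using the Iris dataset bundled with scikit-learn. The choice of `k=5` and the train/test split are arbitrary assumptions for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5)  # k=5 is an arbitrary choice
knn.fit(X_train, y_train)

# Look up the 5 nearest training points for a single new point.
new_point = X_test[:1]
distances, indices = knn.kneighbors(new_point)
neighbor_labels = y_train[indices[0]]

# Majority vote by hand, compared with the classifier's own prediction.
manual_vote = np.bincount(neighbor_labels).argmax()
prediction = knn.predict(new_point)[0]

print("neighbor labels:", neighbor_labels)
print("manual vote:", manual_vote, "| knn.predict:", prediction)
```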

## Instance-based Algorithms

### Support Vector Machines (SVM)

Tries to separate groups of observations from each other by finding a hyperplane with maximum distance from the data points of both classes. 'Support vectors' are the data points closest to the hyperplane and are used to determine it. SVM-family algorithms can be used not only for classification, but also for regression and outlier detection.

<img src="images/algorithms/svm.png" style="display: block;margin-left: auto;margin-right: auto;height: 350px"/>

*In Scikit-Learn:* [sklearn.svm.SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)
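A small sketch of a linear SVM, assuming two synthetic blobs of points from `make_blobs` (dataset and `C=1.0` are illustrative choices). After fitting, the support vectors and the hyperplane parameters can be inspected directly:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two synthetic 2D clusters; an assumption made for illustration.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)

# Only the points closest to the decision boundary are kept as support vectors.
print("number of support vectors:", len(svm.support_))
print("support vectors shape:", svm.support_vectors_.shape)

# For a linear kernel the hyperplane is w . x + b = 0.
print("w:", svm.coef_[0], "b:", svm.intercept_[0])
```

Swapping `kernel="linear"` for `"rbf"` or `"poly"` gives a non-linear boundary, at which point `coef_` is no longer available.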

## Instance-based Algorithms

**Pros**:

- Can perform well given enough representative data
- Can be rather fast (especially kNN and linear SVM)
- Easy to add more training examples (especially kNN)
- SVMs have many kernels to choose from

**Cons**:

- Can be sensitive to missing values and outliers
- Noise in the features might flip the results
- Sensitive to feature scaling
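The sensitivity to feature scaling can be checked with a quick experiment. The sketch below assumes the Wine dataset bundled with scikit-learn, whose features live on very different scales, and compares kNN with and without standardization. The dataset, `k=5` and 5-fold cross-validation are choices made for this example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Same classifier, once on raw features and once behind a scaler.
raw = KNeighborsClassifier(n_neighbors=5)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

raw_score = cross_val_score(raw, X, y, cv=5).mean()
scaled_score = cross_val_score(scaled, X, y, cv=5).mean()

print(f"kNN on raw features:    {raw_score:.3f}")
print(f"kNN on scaled features: {scaled_score:.3f}")
```

Putting the scaler inside the pipeline means it is re-fitted on each training fold, so no information from the validation fold leaks into the preprocessing.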

## Decision Tree Algorithms

Rule-based learning methods that split the dataset through a series of decisions made at nodes, usually chosen with an information gain criterion. The branches end in leaf nodes, each representing a class or regression outcome of the tree. Without a depth restriction, the tree typically grows until its leaves are pure. A trained tree can be displayed as a readable decision diagram.

- Classification and Regression Tree (CART)
- Decision Stump
- Conditional Decision Trees
- ...

<img src="images/algorithms/Decision-Tree-Algorithms.png" style="display: block;margin-left: auto;margin-right: auto;height: 200px"/>

## Decision Tree Algorithms

**Pros**:

- Easy to implement
- Highly interpretable
- Can learn complex relationships
- Requires little data preprocessing
- Insensitive to missing values

**Cons**:

- Very prone to overfitting unless pruning is used 
- Not robust to small changes in the training data
- Individual trees usually perform worse than ensembles of trees
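The readable rules and the effect of pruning can be sketched as follows. The example assumes the breast cancer dataset bundled with scikit-learn, and `ccp_alpha=0.01` is an arbitrary pruning strength, not a recommended value.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0
)

# Unrestricted tree: keeps splitting, using information gain (entropy).
full = DecisionTreeClassifier(criterion="entropy", random_state=0)
full.fit(X_train, y_train)

# Same settings plus cost-complexity pruning (ccp_alpha is an arbitrary choice).
pruned = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.01, random_state=0)
pruned.fit(X_train, y_train)

for name, model in [("full", full), ("pruned", pruned)]:
    print(
        f"{name:>6}: leaves={model.get_n_leaves()}, "
        f"train acc={model.score(X_train, y_train):.3f}, "
        f"test acc={model.score(X_test, y_test):.3f}"
    )

# Text rendering of the pruned tree's decision rules.
rules = export_text(pruned, feature_names=list(data.feature_names))
print(rules)
```

A large gap between training and test accuracy for the full tree is the usual sign of overfitting. Restricting `max_depth` is a simpler alternative to cost-complexity pruning.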

## Ensemble Algorithms

Large models composed of multiple weaker models whose predictions are combined in some way to make the overall prediction. The weaker models are trained either independently (bagging) or sequentially (boosting). One of the most powerful and popular algorithm families.

- Bootstrapped Aggregation (Bagging)
- Random Forest
- Gradient Boosting
- AdaBoost
- XGBoost
- ...

<img src="images/algorithms/Ensemble-Algorithms.png" style="display: block;margin-left: auto;margin-right: auto;height: 200px"/>

## Ensemble Algorithms

### Bagging and Random Forests

A large number of independent, unrestricted decision trees are trained on bootstrapped subsamples of the training data (and, in the case of Random Forests, with a random selection of features). Each tree produces a prediction, and voting / averaging over all of them gives the final prediction. While each tree has low bias and high variance, combining them averages out the variance, ideally resulting in a balanced low-bias, low-variance model.

<img src="images/algorithms/bagged-trees.png" style="display: block;margin-left: auto;margin-right: auto;height: 350px"/>

*In Scikit-Learn:* [sklearn.ensemble.RandomForestClassifier / .RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)
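The averaging step can be made explicit with a short sketch. It assumes synthetic data from `make_classification` and 200 trees, both arbitrary choices, and recomputes the forest's class probabilities by hand from its individual trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; an assumption made for illustration.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Average the per-tree class probabilities by hand ...
manual_average = np.mean(
    [tree.predict_proba(X_test) for tree in forest.estimators_], axis=0
)
# ... and compare with what the forest itself reports.
forest_proba = forest.predict_proba(X_test)

print("matches forest output:", np.allclose(manual_average, forest_proba))
print(f"single tree test acc: {single_tree.score(X_test, y_test):.3f}")
print(f"forest test acc:      {forest.score(X_test, y_test):.3f}")
```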

## Ensemble Algorithms
### Adaptive boosting and Gradient boosting
Iteratively improve an initial weak model by focusing on its weaknesses at each new iteration. In Adaptive boosting this is done by reweighting the misclassified samples at each step, while in Gradient boosting the current weaknesses are identified using gradients of the loss function. The final, iteratively improved model often proves to be very powerful.

<img src="images/algorithms/boosting.png" style="display: block;margin-left: auto;margin-right: auto;height: 300px"/>

*In Scikit-Learn:* [sklearn.ensemble.AdaBoostClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) and [sklearn.ensemble.GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)
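To watch the iterative improvement happen, the sketch below fits both boosting variants on synthetic data and tracks the gradient boosting model's test accuracy after each iteration. The dataset and all hyperparameter values are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data; an assumption made for illustration.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X_train, y_train)

gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbm.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting step.
staged_acc = [
    accuracy_score(y_test, y_pred) for y_pred in gbm.staged_predict(X_test)
]

print(f"AdaBoost test acc:              {ada.score(X_test, y_test):.3f}")
print(f"Gradient boosting after 1 step: {staged_acc[0]:.3f}")
print(f"Gradient boosting after 100:    {staged_acc[-1]:.3f}")
```

Plotting `staged_acc` against the iteration number is a common way to pick `n_estimators` and to spot the point where extra iterations stop helping.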

## Ensemble Algorithms

**Pros**:

- High performance (competition-winning)
- Robust to missing data, highly correlated and irrelevant features
- Less likely to overfit than individual weak learners
- Keep many advantages of the underlying weak learners (e.g. decision trees)
- Still interpretable via feature importance scores
- Can learn complex relationships

**Cons**:

- Much less interpretable than the underlying weak learners
- More computationally expensive than individual models
- Highly sensitive to non-representative training data (non-random samples)

## Clustering Algorithms

Unsupervised algorithms that use the inherent structures in the data to best organize it into groups of maximum homogeneity.

- k-Means
- k-Medians
- Hierarchical Clustering
- DBSCAN and HDBSCAN
- ...

<img src="images/algorithms/Clustering-Algorithms.png" style="display: block;margin-left: auto;margin-right: auto;height: 200px"/>

## Clustering Algorithms

### k-Means

**k** random cluster centers are created and every data point is assigned to the nearest center. The center of each resulting cluster is then recomputed and the data points are reassigned. This is repeated until the cluster centers are stable. Quick and simple, yet often effective. However, the number of clusters **k** has to be chosen in advance; the algorithm does not determine it automatically.

<img src="images/algorithms/k-means.gif" style="display: block;margin-left: auto;margin-right: auto;height: 350px"/>

*In Scikit-Learn:* [sklearn.cluster.KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)
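A minimal sketch of k-Means on synthetic blobs. Because **k** has to be supplied, the second half tries several values and records the inertia (within-cluster sum of squared distances), which is the quantity behind the common "elbow" heuristic. The dataset and the range of **k** are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three synthetic 2D clusters; an assumption made for illustration.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster centers:\n", kmeans.cluster_centers_)
print("first 10 labels:", kmeans.labels_[:10])

# Try several values of k and record the inertia for each.
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(1, 7)
]
for k, inertia in zip(range(1, 7), inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Inertia always decreases as **k** grows, so the usual approach is to look for the value of **k** after which the improvement levels off rather than for a minimum.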

## Clustering Algorithms

### DBSCAN

Stands for Density-Based Spatial Clustering of Applications with Noise. It groups together points that are close to each other, based on a distance measure (usually Euclidean) and a minimum number of points. Flexible and powerful, but we need to specify how close points must be to each other to be considered part of a cluster. It is therefore most suitable for data whose clusters have similar density.

<img src="images/algorithms/dbscan.gif" style="display: block;margin-left: auto;margin-right: auto;height: 350px"/>

*In Scikit-Learn:* [sklearn.cluster.DBSCAN](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html)
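The two parameters mentioned above map onto `eps` (how close points must be) and `min_samples` (the minimum number of points) in scikit-learn. The sketch below assumes the synthetic two-moons dataset; the values `eps=0.2` and `min_samples=5` are illustrative guesses that would need tuning on real data.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half circles; an assumption made for illustration.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_

# DBSCAN marks noise points with the label -1.
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())

print("clusters found:", n_clusters)
print("noise points:", n_noise)
```

Unlike k-Means, the number of clusters is an output here rather than an input.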

## More Algorithm Families...


- **Deep Learning Algorithms**
    - capture complex relationships in complex data (images, audio, text etc.)
    - one of the most promising and rapidly developing fields
    - can be hard to train & require more domain knowledge

    <br>
- **Bayesian Algorithms**
    - rely on Bayesian statistics (Bayes' theorem)
    - allow incorporation of prior knowledge
    - Naive Bayes: simple yet popular classifier

    <br>

- **Anomaly detection algorithms**
    - a broad family of algorithms focused on isolating abnormal (anomalous) data
    - Isolation Forest: one of the most powerful and popular approaches
    <br>

...
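As one concrete example from these families, here is a rough sketch of anomaly detection with Isolation Forest. The data is synthetic (a Gaussian cloud plus a few far-away points), and `contamination=0.05` is an assumed share of anomalies, not a recommended default.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (illustrative assumption): 200 "normal" points around the
# origin plus 5 points placed far away from them.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=6.0, high=8.0, size=(5, 2))
X = np.vstack([inliers, outliers])

iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
iso.fit(X)

pred = iso.predict(X)          # +1 for inliers, -1 for anomalies
scores = iso.score_samples(X)  # lower score = more anomalous

print("points flagged as anomalies:", int((pred == -1).sum()))
print("predictions for the 5 far-away points:", pred[-5:])
```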

# Deep Learning Neural networks

Deep neural networks have demonstrated super-human performance on a number of tasks

<img src="images/deeplearning.png" style="display: block;margin-left: auto;margin-right: auto;height: 400px"/>

## Can even perform data generation

[This person does not exist](https://thispersondoesnotexist.com/)

[This cat does not exist](https://thiscatdoesnotexist.com/)

[This AirBnB does not exist](https://thisrentaldoesnotexist.com/)

## GANs

<img src="images/gan.png" style="display: block;margin-left: auto;margin-right: auto;height: 400px"/>

# Reinforcement Learning

Deep learning algorithms can be used to perform Reinforcement Learning

```python
from IPython.display import YouTubeVideo

YouTubeVideo('8tq1C8spV_g', width=600, height=450)
```

## Factors to keep in mind

1. **Training data size**. 

    If your data has relatively few observations and many features 
    
    --> Linear regression, Naïve Bayes, or linear SVM can do better than more flexible models, which tend to overfit in this setting
    
2. **Accuracy vs Interpretability**
    
    e.g. Decision Trees vs Neural Nets
    
3. **Speed & Training time**

    Higher accuracy usually costs more training time
    
...

<img src="images/interpretable.png" style="display: block;margin-left: auto;margin-right: auto;height: 600px"/>


<img src="images/algorithms/ml-cheet-sheet.png" style="display: block;margin-left: auto;margin-right: auto;height: 900px"/>

