# Classical ML Algorithm Legends

### Table of Contents

* [Decision Trees](#DecisionTrees)
    * [Decision tree learning](#Learning)
    * [Decision tree pruning](#Pruning)
    * [Boosted trees!](#Algorithms)
* [Indexing](#Indexing)
* [Other useful links](#Finally...)

# DecisionTrees

**prerequisites**


* Information Gain:
Information gain measures how much splitting on an attribute reduces the entropy of the class labels. In decision tree learning, the information gain ratio is the ratio of that information gain to the intrinsic information of the split. It was proposed to reduce the bias towards multi-valued attributes by taking the number and size of branches into account when choosing an attribute.
Check [this tutorial](https://www.youtube.com/watch?v=FuTRucXB9rA) for more information on this topic.

* [Decision Trees](https://en.wikipedia.org/wiki/Decision_tree)
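
The information gain and gain ratio described above can be sketched in a few lines of Python. This is only an illustration: the helper names (`entropy`, `information_gain`, `gain_ratio`) and the toy data are made up for this example and do not come from any library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_by(values, labels):
    """Group the labels by the attribute value found in the same row."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return groups


def information_gain(values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    groups = split_by(values, labels).values()
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the intrinsic information of the split."""
    intrinsic = entropy(values)  # entropy of the attribute's own value distribution
    return information_gain(values, labels) / intrinsic if intrinsic > 0 else 0.0


# Hypothetical toy data: an ID-like attribute versus a two-valued attribute.
labels = ["yes", "yes", "no", "no"]
row_id = ["a", "b", "c", "d"]  # unique value per row (many small branches)
windy = ["f", "f", "t", "t"]   # two branches that also separate the classes

print(information_gain(row_id, labels), gain_ratio(row_id, labels))
print(information_gain(windy, labels), gain_ratio(windy, labels))
```

Both attributes separate the classes perfectly, so their information gain is identical (1 bit). The gain ratio halves the score of the four-valued `row_id` attribute because its intrinsic information is 2 bits, which is the bias correction the ratio was designed for.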

![dcsntr.png](attachment:dcsntr.png)

Generally there are two types of decision trees:
1. [**Classification tree**](https://www.solver.com/classification-tree): A classification tree labels, records, and assigns variables to discrete classes.

![0*ToYXqRes95eMvIKV.png](attachment:0*ToYXqRes95eMvIKV.png)

2. [**Regression tree**](https://www.solver.com/regression-trees): A regression tree predicts a continuous target value. It is built through a process known as binary recursive partitioning, an iterative process that splits the data into partitions or branches and then continues splitting each partition into smaller groups as the method moves up each branch.

![440px-Decision_Tree.jpg](attachment:440px-Decision_Tree.jpg)
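
A minimal sketch of binary recursive partitioning on a single numeric feature follows. It assumes squared error as the split criterion and the mean as the leaf prediction, which is a common choice but only one option. The function names and the data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty sequence)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) of the best binary split, or (None, sse)."""
    best = (None, sse(ys))
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


def build(xs, ys, depth=0, max_depth=2):
    """Split recursively; a leaf is the mean target of the rows that reach it."""
    threshold, _ = best_split(xs, ys)
    if depth >= max_depth or threshold is None:
        return sum(ys) / len(ys)
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": build(*zip(*left), depth + 1, max_depth),
        "right": build(*zip(*right), depth + 1, max_depth),
    }


def predict(tree, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical data with two obvious groups.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
tree = build(xs, ys, max_depth=1)
print(tree["threshold"], predict(tree, 2), predict(tree, 11))
```

With `max_depth=1` the tree makes a single split at 6.5, between the two groups, and each leaf predicts the mean of its group.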

## Learning

Decision tree learning is a type of supervised machine learning in which the data is repeatedly split according to the value of a chosen attribute.

You can check out [this](https://en.wikipedia.org/wiki/Decision_tree_learning) Wikipedia page for more thorough details.

### Algorithms
1. [Iterative Dichotomiser 3 (ID3)](https://en.wikipedia.org/wiki/ID3_algorithm) - ([sample implementation](https://sefiks.com/2017/11/20/a-step-by-step-id3-decision-tree-example/))
2. [C4.5](https://en.wikipedia.org/wiki/C4.5_algorithm) (Successor of ID3) - ([sample implementation](https://sefiks.com/2018/05/13/a-step-by-step-c4-5-decision-tree-example/))
3. Classification And Regression Tree (CART) - ([useful article](https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/))
4. Chi-square automatic interaction detection (CHAID) - ([useful article](https://sefiks.com/2020/03/18/a-step-by-step-chaid-decision-tree-example/))
5. Multivariate adaptive regression spline (MARS) - ([useful tutorial](https://www.youtube.com/watch?v=9COLjUxSzx8))
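
To make the first algorithm in the list more concrete, here is a simplified ID3-style sketch for categorical attributes. It is not the reference implementation from the linked articles: the nested-dict tree format, the function names, and the toy rows are assumptions made for this example, and refinements such as handling unseen attribute values are left out.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain(rows, attr, target):
    """Information gain obtained by splitting `rows` on `attr`."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attrs, target):
    """Return a class label (leaf) or a dict describing an internal node."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest information gain.
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {
        "attr": best,
        "children": {
            value: id3([r for r in rows if r[best] == value], rest, target)
            for value in {r[best] for r in rows}
        },
    }


def classify(tree, row):
    """Follow branches until a leaf is reached (raises KeyError on unseen values)."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree


# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = id3(rows, ["outlook", "windy"], "play")
print(tree)
```

On this data `outlook` has an information gain of 1 bit and `windy` has none, so the tree splits once on `outlook` and stops because both branches are pure.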

## Avoid Overfitting
Decision trees are prone to overfitting, especially when a tree is particularly deep. Each additional level of splits makes the conditions more specific, so fewer training samples satisfy all the preceding conditions. Conclusions drawn from such small samples can be unsound.
In decision trees, pruning is a process applied to control or limit the depth (size) of the tree. By default, the hyperparameters of many decision tree implementations let the tree grow to its full depth. Such fully grown trees tend to overfit the training data.

### Pruning
Pruning reduces the size of decision trees by removing parts of the tree that provide little power to classify instances. Check [this Wikipedia link](https://en.wikipedia.org/wiki/Decision_tree_pruning) for a full explanation.


**Pre-pruning**: As the name suggests, pre-pruning (or early stopping) involves stopping the growth of the tree before it has completely classified the training set.

**Post-pruning**: Post-pruning a decision tree implies that we begin by generating the (complete) tree and then adjust it with the aim of improving the accuracy on unseen instances.
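
As a rough illustration of the difference, the sketch below assumes scikit-learn is installed and uses its bundled iris dataset. The specific values `max_depth=2`, `min_samples_leaf=5`, and `ccp_alpha=0.02` are arbitrary choices for demonstration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fully grown tree: the default settings place no limit on depth.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruning: stop growing early through depth and leaf-size limits.
pre = DecisionTreeClassifier(
    max_depth=2, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

# Post-pruning: grow the tree, then cut it back with cost complexity pruning.
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(
        name,
        "depth:", model.get_depth(),
        "leaves:", model.get_n_leaves(),
        "test accuracy:", round(model.score(X_test, y_test), 3),
    )
```

Comparing depth, leaf count, and held-out accuracy side by side is a simple way to see whether a smaller tree generalizes as well as the fully grown one on a given dataset.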

Also see: 

* [pre-pruning and post-pruning](https://www.displayr.com/machine-learning-pruning-decision-trees/)

* [Bottom-up pruning](https://en.wikipedia.org/wiki/Decision_tree_pruning)

* [Top-down pruning](https://en.wikipedia.org/wiki/Decision_tree_pruning)
<br>

**Algorithms**

* [Reduced error pruning](https://www.cs.auckland.ac.nz/~pat/706_98/ln/node90.html): In this algorithm, by starting at the leaves, each node is replaced with its most popular class. If the prediction accuracy is not affected then the change is kept. While somewhat naive, reduced error pruning has the advantage of simplicity and speed.
* [Cost complexity pruning](http://mlwiki.org/index.php/Cost-Complexity_Pruning): This generates a series of trees. At each step, a tree is made from the previous one by removing a subtree and replacing it with a leaf node whose value is chosen as in the tree-building algorithm.
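
The reduced error pruning idea can be sketched as follows. The tree format is hypothetical: internal nodes are dicts that store the split attribute, the majority class of the training rows that reached them, and their children, while leaves are plain class labels. A separate validation set decides whether each replacement is kept.

```python
def classify(tree, row):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree


def reduced_error_prune(tree, val_rows, target):
    """Return a pruned copy of `tree`, judged on the validation rows reaching it."""
    if not isinstance(tree, dict):
        return tree
    # Work bottom-up: prune every child using only the rows routed to it.
    children = {}
    for value, child in tree["children"].items():
        reaching = [r for r in val_rows if r[tree["attr"]] == value]
        children[value] = reduced_error_prune(child, reaching, target)
    pruned = {**tree, "children": children}
    # Replace the node with its most popular class if accuracy does not drop.
    leaf = tree["majority"]
    errors_as_leaf = sum(r[target] != leaf for r in val_rows)
    errors_as_subtree = sum(classify(pruned, r) != r[target] for r in val_rows)
    return leaf if errors_as_leaf <= errors_as_subtree else pruned


# Hypothetical tree whose "windy" split under "rainy" fits noise.
tree = {
    "attr": "outlook",
    "majority": "yes",
    "children": {
        "sunny": "yes",
        "rainy": {
            "attr": "windy",
            "majority": "no",
            "children": {"yes": "no", "no": "yes"},
        },
    },
}
val_rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
pruned = reduced_error_prune(tree, val_rows, "play")
print(pruned)
```

Here the `windy` subtree misclassifies one validation row while a single `"no"` leaf misclassifies none, so the subtree is collapsed. The root is kept because replacing it with `"yes"` would introduce two errors.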




Some techniques, often called **ensemble methods**, construct more than one decision tree:
<ol>
    <li>Boosted trees</li>
    <li>Rotation forest</li>
    <li>Bootstrap aggregated</li>
</ol>

**ensemble methods**: In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

* **Boosted trees**: An ensemble is built incrementally by training each new tree to emphasize the training instances that earlier trees mis-modeled. These can be used for both regression and classification problems.

* **Rotation forest**: Every decision tree is trained by first applying principal component analysis (PCA) to a random subset of the input features.

* **Bootstrap aggregated**: Bagged decision trees, an early ensemble method, build multiple decision trees by repeatedly resampling the training data with replacement and letting the trees vote for a consensus prediction.

[Bagging](https://en.wikipedia.org/wiki/Bootstrap_aggregating), also known as bootstrap aggregation, is an ensemble learning method that is commonly used to reduce variance within a noisy dataset.

[Here](https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote18.html) are the lecture notes on bagging from the ML course at Cornell, and [this](https://www.youtube.com/watch?v=2Mg8QD0F1dQ) is a short video explaining bootstrap aggregating (bagging).
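
A minimal sketch of bagging with one-split trees (decision stumps) on a single numeric feature is shown below. The function names, the seed, and the toy data are assumptions made for this example. A practical implementation would use full decision trees and many features.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(sample):
    """Fit a one-split tree on (x, label) pairs and return a predict function."""
    fallback = majority([y for _, y in sample])
    best_errors, best_rule = len(sample) + 1, None
    xs = sorted({x for x, _ in sample})
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = majority([y for x, y in sample if x <= threshold])
        right = majority([y for x, y in sample if x > threshold])
        errors = sum((left if x <= threshold else right) != y for x, y in sample)
        if errors < best_errors:
            best_errors, best_rule = errors, (threshold, left, right)
    if best_rule is None:  # the sample holds a single distinct x value
        return lambda x: fallback
    threshold, left, right = best_rule
    return lambda x: left if x <= threshold else right


def bagging(data, n_trees, seed=0):
    """Train each stump on a bootstrap sample and predict by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        sample = rng.choices(data, k=len(data))  # resampling with replacement
        stumps.append(train_stump(sample))
    return lambda x: majority([stump(x) for stump in stumps])


# Hypothetical data: class 0 for small x, class 1 for large x.
data = [(x, 0) for x in range(1, 6)] + [(x, 1) for x in range(11, 16)]
predict = bagging(data, n_trees=25)
print(predict(2), predict(14))
```

Each stump sees a slightly different resample, so individual thresholds vary, and the vote averages out that variation. This is the variance reduction that bagging is used for.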

The term ['Boosting'](https://en.wikipedia.org/wiki/Boosting_(machine_learning)) refers to a family of algorithms that convert weak learners into strong learners.

[This video](https://www.youtube.com/watch?v=MIPkK5ZAsms) is *A Short Introduction to Boosting*. You can read [this Medium article](https://medium.com/greyatom/a-quick-guide-to-boosting-in-ml-acf7c1585cb5) for a better perspective on the topic.

There are many boosting algorithms. The original ones are as follows:
<ol>
    <li>a recursive majority gate formulation</li>
    <li>boost by majority</li>
</ol>

Check [this paper](https://web.archive.org/web/20121010030839/http://www.cs.princeton.edu/~schapire/papers/strengthofweak.pdf) for more information on this topic.

But these algorithms were not adaptive and could not take full advantage of the weak learners. Later, AdaBoost was developed, which was an adaptive boosting algorithm.

[AdaBoost](https://en.wikipedia.org/wiki/AdaBoost), short for Adaptive Boosting, is a machine learning meta-algorithm. It can be used in conjunction with many other types of learning algorithms to improve performance. [This video](https://www.youtube.com/watch?v=LsK-xG1cLYA) explains the algorithm clearly.
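
The adaptive reweighting can be sketched with decision stumps as weak learners on a one-dimensional toy problem. This is a simplified version of discrete AdaBoost for labels in {+1, -1}. The function names and data are illustrative assumptions, not part of any library.

```python
from math import exp, log


def stump_predict(stump, x):
    """A stump is (threshold, polarity): polarity on the left, -polarity on the right."""
    threshold, polarity = stump
    return polarity if x <= threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, stump) with the lowest weighted training error."""
    points = sorted(set(xs))
    best = (float("inf"), None)
    for lo, hi in zip(points, points[1:]):
        for polarity in (1, -1):
            stump = ((lo + hi) / 2, polarity)
            error = sum(
                w for x, y, w in zip(xs, ys, weights) if stump_predict(stump, x) != y
            )
            if error < best[0]:
                best = (error, stump)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, stump) pairs trained on reweighted data."""
    n = len(xs)
    weights = [1 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, stump = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * log((1 - error) / error)     # vote strength of this stump
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified points, lower the rest, then normalise.
        weights = [
            w * exp(-alpha * y * stump_predict(stump, x))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can classify perfectly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    errors = sum(predict(model, x) != y for x, y in zip(xs, ys))
    print(rounds, "round(s):", errors, "training errors")
```

On this data a single stump leaves three training errors, while the weighted vote of three stumps classifies every training point correctly. Each round concentrates weight on the points the previous stumps got wrong.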

# Indexing

[This link](https://cilvr.cs.nyu.edu/diglib/lsml/lecture12_indexing.pdf) explains indexing in machine learning. You can also use [this](https://chartio.com/learn/databases/how-does-indexing-work/) tutorial to see how it works and what it does. Also check [this paper](http://learningsys.org/nips17/assets/papers/paper_22.pdf) out for more information.

## Finally...
Check these links out for more information on topics covered in this notebook.

* [How To Implement The Decision Tree Algorithm From Scratch In Python](https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/)

* [Let’s Solve Overfitting! Quick Guide to Cost Complexity Pruning of Decision Trees](https://www.analyticsvidhya.com/blog/2020/10/cost-complexity-pruning-decision-trees/)

* [Minimax Algorithm with Alpha-beta pruning](https://www.hackerearth.com/blog/developers/minimax-algorithm-alpha-beta-pruning/)

* [Post pruning decision trees with cost complexity pruning](https://scikit-learn.org/stable/auto_examples/tree/plot_cost_complexity_pruning.html)

* [Information Gain and Mutual Information for Machine Learning](https://machinelearningmastery.com/information-gain-and-mutual-information/)

* [How to Develop a Bagging Ensemble with Python](https://machinelearningmastery.com/bagging-ensemble-with-python/)

* [Boosting in Machine Learning and the Implementation of XGBoost in Python](https://towardsdatascience.com/boosting-in-machine-learning-and-the-implementation-of-xgboost-in-python-fb5365e9f2a0)

* [Implementing the AdaBoost Algorithm From Scratch](https://www.kdnuggets.com/2020/12/implementing-adaboost-algorithm-from-scratch.html)

* [Last, but not least!](https://www.youtube.com/watch?v=dQw4w9WgXcQ)