# Ensembling with Trees

We give an overview of the main tree-based algorithms in the literature and how they are combined using ensembling techniques such as bagging (bootstrapping + aggregating) and boosting (fitting models sequentially so that each one corrects the errors of the previous ones, e.g. by following the gradient of a loss).

---
# Ensembling Techniques



## Bagging

## Boosting

## Stacking

---
# Decision Trees

Decision trees partition the feature space into axis-parallel rectangles and label each rectangle with a class (classification) or assign it a continuous value (regression).

### Basic Algorithm

1. Check for the base cases, e.g. all samples in the node share the same label, or no remaining split improves the metric.
2. For each feature $f_i$, compute the **metric** obtained by splitting on a criterion $c$ based on $f_i$, e.g. $f_i > 4.3$ (numerical feature) or $f_i == \text{Dog}$ (categorical feature).
3. Let $c_{best}$ be the criterion with the best metric value.
4. Create a decision node that splits on $c_{best}$.
5. Recurse on the subsets obtained by splitting on $c_{best}$, and add the resulting nodes as children of the decision node.
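
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes numerical features, binary threshold splits, Gini impurity as the metric, and a depth limit as an extra base case. The names `gini`, `best_split`, `build_tree`, and `predict` are hypothetical helpers introduced here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (the metric assumed here)."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Steps 2-3: try every (feature, threshold) pair, keep the lowest weighted impurity."""
    best = None  # (score, feature_index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    # Step 1: base cases (assumed here): pure node or depth limit reached.
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no threshold separates the samples
        return majority
    _, f, t = split
    # Step 4: create a decision node; step 5: recurse on both subsets.
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf; leaves are plain labels, inner nodes are dicts."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Toy data: one numerical feature, two classes.
rows = [[1.0], [2.0], [3.0], [4.0]]
labels = ["a", "a", "b", "b"]
tree = build_tree(rows, labels)
```

On this toy data the tree consists of a single decision node, and `predict(tree, [1.5])` follows the left branch. Swapping `gini` for another metric changes which algorithm variant the sketch resembles.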

There are multiple variations on this basic decision tree algorithm. Most work the same way: choose the best criterion, split, and recurse until a stopping condition is met. They differ mainly in the **metric** they use to decide how to split a node, so we categorize them by that metric.

## Metrics for selecting "best" criteria for split

Gini Impurity: $G = \sum^{C}_{i=1} p(i) * (1 - p(i))$
- Used by CART's (Classification And Regression Trees) Classification Trees
- Applies to categorical targets (e.g. 'Success', 'Failure'); the features themselves may be categorical or numerical
- Performs binary splits
- The lower the value, the higher the homogeneity (0 means the node is pure)
- Steps:
    1. Calculate Gini Impurity for child nodes after splitting on a feature $x_i \Big\{\begin{array}{lr} \text{Category 1} \\ \text{Category 2} \end{array}$
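
As a rough illustration of the Gini formula and of the child-node calculation, the sketch below computes the impurity of a node and the size-weighted impurity of a binary split on one category. The function names, the `weather` feature, and the toy data are assumptions made for this example only.

```python
from collections import Counter

def gini_impurity(labels):
    """G = sum_i p(i) * (1 - p(i)) over the classes present in `labels`."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def gini_of_split(feature_values, labels, category):
    """Weighted Gini impurity of the binary split `feature == category` vs. the rest."""
    left = [y for x, y in zip(feature_values, labels) if x == category]
    right = [y for x, y in zip(feature_values, labels) if x != category]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Hypothetical toy data: one categorical feature and a binary target.
weather = ["Sunny", "Sunny", "Rainy", "Rainy", "Rainy"]
outcome = ["Success", "Success", "Failure", "Failure", "Success"]

parent = gini_impurity(outcome)                    # 3 Success, 2 Failure
after = gini_of_split(weather, outcome, "Sunny")   # children: [S, S] and [F, F, S]
```

Here the parent impurity is 0.48 and the weighted impurity after the split is 4/15 (about 0.267), so the split makes the children more homogeneous. The candidate split with the lowest weighted impurity is the one that would be selected.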

Variance Reduction:
- Used by CART's (Classification And Regression Trees) Regression Trees
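
Variance reduction is commonly defined as the variance of the parent's targets minus the size-weighted variances of the two children. The sketch below assumes that definition, population variance, and a threshold split on a numerical feature; `variance_reduction` and the toy data are hypothetical.

```python
from statistics import pvariance

def variance_reduction(xs, ys, threshold):
    """Var(parent) - weighted Var(left) - weighted Var(right) for the split x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0  # assumed convention: a split that separates nothing reduces nothing
    n = len(ys)
    return (
        pvariance(ys)
        - (len(left) / n) * pvariance(left)
        - (len(right) / n) * pvariance(right)
    )

# Hypothetical toy data: the targets jump once the feature exceeds 4.3.
xs = [1.0, 3.0, 5.0, 7.0]
ys = [1.0, 2.0, 9.0, 10.0]
reduction = variance_reduction(xs, ys, 4.3)
```

With these numbers the parent variance is 16.25 and each child has variance 0.25, giving a reduction of 16.0. A regression tree would pick the threshold with the largest reduction.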

Information Gain:
- Used by ID3, which handles only categorical features

Gain Ratio:
- Used by C4.5 (successor of ID3) and C5.0 (successor of C4.5); both are classification algorithms that can handle categorical as well as continuous features
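
To make the two metrics concrete, here is a small sketch using the usual textbook definitions: information gain is the parent's entropy minus the size-weighted entropy of the children, and gain ratio divides that gain by the entropy of the feature's own value distribution (the split information). The function names and the toy features `pet` and `row_id` are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after a multiway split."""
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    n = len(labels)
    remainder = sum((len(g) / n) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(feature_values, labels):
    """Information gain normalized by the split information of the feature."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # the feature takes a single value, so it cannot split anything
    return information_gain(feature_values, labels) / split_info

# Hypothetical toy data: both features separate the labels perfectly,
# but `row_id` does so only because every sample has its own value.
labels = ["yes", "yes", "no", "no"]
pet = ["Dog", "Dog", "Cat", "Cat"]
row_id = ["a", "b", "c", "d"]
```

Both features reach an information gain of 1 bit, but the gain ratio is 1.0 for `pet` and 0.5 for `row_id`. This penalty on many-valued features is the usual motivation given for preferring gain ratio over plain information gain.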

## Problems
Overfitting
- Solutions:
    1. Pre-pruning
        - Fixed / Max Depth
        - Fixed / Max number of leaves
    2. Post-pruning
        - Chi Squared Test for association / independence
            - Removing nodes that are statistically insignificant
    3. Model Selection
        - Complexity Penalization
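
One common form of complexity penalization, given here as an illustration rather than the only option, is the cost-complexity criterion associated with CART. The symbols below are introduced for this example: $R(T)$ is the training error of tree $T$, $|T|$ is its number of leaves, and $\alpha \geq 0$ controls the strength of the penalty.

$$R_\alpha(T) = R(T) + \alpha \, |T|$$

With $\alpha = 0$ the fully grown tree minimizes the criterion. As $\alpha$ increases, smaller trees are preferred, which ties this approach back to the fixed depth and leaf limits listed under pre-pruning.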

---
# Random Forest

---
# AdaBoost

---
# Gradient Boosted Trees

---
# XGBoost

---
## Resources:

- [Tips for stacking and blending](https://www.kaggle.com/zaochenye/tips-for-stacking-and-blending)
- [Stacking Classifier](https://www.youtube.com/watch?v=sBrQnqwMpvA)
- [Victor Lavrenko on Decision Trees](https://www.youtube.com/watch?v=eKD5gxPPeY0&list=PLBv09BD7ez_4temBw7vLA19p3tdQH6FYO)
- [Statquest on Decision Trees](https://www.youtube.com/watch?v=7VeUPuFGJHk)
- [Basic Decision Tree algorithm Wiki](https://en.wikipedia.org/wiki/C4.5_algorithm#pseudocode)
- [Decision Tree Splitting Metrics Wiki](https://en.wikipedia.org/wiki/Decision_tree_learning#Metrics)
- [Rishabh Jain on Decision Trees](https://medium.com/@rishabhjain_22692/decision-trees-it-begins-here-93ff54ef134)
- [CMU ML Decision Trees Notes](http://alex.smola.org/teaching/cmu2013-10-701/slides/23_Trees.pdf)
- [Building a Binary Decision Tree using Gini Index by Jason Brownlee](https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/)