<div style="display: flex; align-items: center;">
  
  <div style="text-align: left;">
   <h2 style="font-size: 1.8em; margin-bottom: 0;"><b>Branching out decisions in a tree</b></h2>
   <br>
   <h3 style=" font-size: 1.2em;margin-bottom: 0;">Decision Trees and Ensemble Learning</h3>
   <h3 style="font-size: 1.2em; margin-bottom: 0; color: blue;"><i>Dr. Satadisha Saha Bhowmick</i></h3>
  </div>

  <div style="margin-right: 10px;"> 
    <img src="media/images/dsi-logo-600.png" align="right" alt="UC-DSI" scale="0.7;">
  </div>

</div>

<!-- ### Learning Loop -->

<div style="display: flex; align-items: center;gap: 5px;">
  <div style="flex: 1;">
    <h3>About Me</h3>
  <h4>Satadisha Saha Bhowmick, Ph.D</h4>

  <div class="fragment"  style="font-size: 14px;">
    <h4>Affiliation</h4>
    <ul>
      <li>Postdoctoral Teaching Fellow <br> Data Science Institute, University of Chicago</li>
    </ul>
  </div>

  <div class="fragment"  style="font-size: 14px;">
    <h4>Course Sequences I Teach</h4>
    <ul>
      <li>Introduction to Data Science</li>
      <li>Mathematical Methods for Data Science</li>
      <li>Ethics, Fairness, Responsibility, and Privacy in Data Science</li>
      <li>Object Oriented Programming with Java</li>
    </ul>
  </div>

  <div class="fragment"  style="font-size: 14px;">
    <h4>Research Interest</h4>
    <ul>
      <li>Information Extraction</li>
      <li>Short Text Mining</li>
    </ul>
  </div>
  
  </div>
  <div style="flex: 1;">
    <img src="media/images/satadisha-photo.png" alt="Self" scale="0.3">
  </div>
</div>


In [1]:
from graphviz import Digraph
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from matplotlib.colors import ListedColormap
import ipywidgets as widgets
from IPython.display import display, clear_output
from PIL import Image
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401
from ipywidgets import interact
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.preprocessing import LabelEncoder
df = pd.read_csv("data/data_for_tree_Oct22.csv")

### Today's Learning Outcomes
Course Module from DATA 119 Introduction to Data Science II

<div style="display: flex; gap: 2px;">

  <div style="flex: 1;">

  <ul>
    <li class="fragment"> General understanding of Tree Models</li>
    <li class="fragment"> Data driven decision making with trees</li>
    <li class="fragment">Impurity functions to build decision boundaries for tree models.</li>
  </ul>

  
  </div>

  <div style="flex: 1;">
  <ul>
    <li class="fragment">Using an ensemble of tree-based learners</li> 
    <li class="fragment"> Bagging</li>
    <li class="fragment"> Random Forest</li>
  </ul>
  </div>

</div>

### Setting The Scene

<div style="display:flex; gap:20px;">
  <div style="flex:1;">
  <img src="media/images/bullet1.png" alt="tab1" scale="0.35;" style="width: 20%;">
  <p>Most data that is interesting<br> enough for prediction has<br> some inherent structure.</p>
  </div>
  <div style="flex:1;">
  <img src="media/images/bullet2.png" alt="tab1" scale="0.35;" style="width: 20%;">
  <p>Tree-based models exploit structure in the data to split it into multiple homogeneous subgroups.</p>
  <p>They approximate a (typically) discrete-valued target function by repeatedly segmenting the predictor space into more homogeneous regions.</p>
  <p>They represent a disjunction of conjunctions of constraints on the attribute values of the data.</p>
  </div>
  <div style="flex:1;">
  <img src="media/images/bullet3.png" alt="tab1" scale="0.35;" style="width: 20%;">
  <p><b>Advantages</b></p>
  <p>Training data need not be stored once the tree is constructed</p>
  <p>Very fast during test time as test inputs only need to traverse down the tree to a leaf.</p>
  <p>Decision trees require no distance metric because the splits are based on feature thresholds and not distances.</p>

  </div>
</div>

### Decision Tree: Example

- Consider a toy task built on a dataset that contains several attributes of trees growing in a plot of land. 
- Given only the $\color{darkorange}{\textbf{Diameter}}$ and $\color{darkorange}{\textbf{Height}}$ of a tree trunk, we must determine if it's an Apple, Cherry, or Oak tree. 
- To do this, we'll use a $\color{darkorange}{\textbf{Decision Tree}}$.

<i>Let's start by investigating the data!</i>

In [2]:
data = df[["Diameter", "Height"]]
print("Number of rows:",len(data))

#Number of instances per class
class_counts = (
    df["Family"]
    .value_counts()
    .sort_index()
    .rename_axis("Tree type")
    .reset_index(name="Count")
)

class_counts.loc["Total"] = ["Total", class_counts["Count"].sum()]
class_counts


Number of rows: 150


| | Tree type | Count |
|---|---|---|
| 0 | apple | 50 |
| 1 | cherry | 50 |
| 2 | oak | 50 |
| Total | Total | 150 |


<img src="media/images/tree-data.png" alt="Tree Data" scale="0.55;" style="width: 90%;">

### Decision Tree: Example

Learned trees can also be thought of as <span style="color:blue;"><i>sets of if-then rules</i></span> progressively dividing the feature space!

<img src="media/gif/decision_tree_growth.gif" alt="Decision Tree Example" scale="0.55;" style="width: 90%;">

### Decision Tree: Example

We can use the `DecisionTreeClassifier` class from `scikit-learn`, with a few parameters set, to fit a full decision tree on this data.

In [None]:
# Features and label
X = df[["Diameter", "Height"]].values
y_raw = df["Family"].astype(str).values

# Encode class labels
le = LabelEncoder()
y = le.fit_transform(y_raw)
class_names = list(le.classes_)

# Fit a full (no max_depth cap) classification tree
clf = DecisionTreeClassifier(criterion="entropy", max_depth=None, random_state=0)
clf.fit(X, y)
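
One way to read a fitted tree as a set of if-then rules is scikit-learn's `export_text`. The sketch below is self-contained, so it uses a handful of made-up diameter/height measurements (the `X_toy` and `y_toy` values are hypothetical, not the lecture dataset).

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy measurements (NOT the lecture data): [Diameter, Height]
X_toy = [[0.5, 3.0], [0.6, 3.5], [1.0, 6.0], [1.1, 6.5], [2.5, 15.0], [2.8, 16.0]]
y_toy = ["apple", "apple", "cherry", "cherry", "oak", "oak"]

toy_clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
toy_clf.fit(X_toy, y_toy)

# Each root-to-leaf path is printed as a nested if-then rule
rules = export_text(toy_clf, feature_names=["Diameter", "Height"])
print(rules)
```

Each indented level in the printout corresponds to one threshold test on the way from the root to a leaf.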


### Decision Trees: Terminology
<span style="color:darkorange;"><b>Root Node</b></span>: The top of the tree, where the whole sample is still together.
<img src="media/images/full-DTree.png" alt="Decision Tree" scale="2.5" style="height: 200%;">

### Decision Trees: Terminology
<span style="color:darkorange;"><b>Node</b></span>: Decision nodes that create splits.
<img src="media/images/full-DTree.png" alt="Decision Tree" scale="2.5" style="height: 200%;">

### Decision Trees: Terminology
<span style="color:darkorange;"><b>Child</b></span>: Nodes below the current node.
<img src="media/images/full-DTree.png" alt="Decision Tree" scale="2.5" style="height: 200%;">

### Decision Trees: Terminology
<span style="color:darkorange;"><b>Parent</b></span>: Node above the current node.
<img src="media/images/full-DTree.png" alt="Decision Tree" scale="2.5" style="height: 200%;">

### Decision Trees: Terminology
<span style="color:darkorange;"><b>Leaf/Terminal Node</b></span>: Nodes that have no children; predictions are made from the data that reach these nodes.
<img src="media/images/full-DTree.png" alt="Decision Tree" scale="2.5" style="height: 200%;">

### Decision Trees: Terminology
<span style="color:darkorange;"><b>Pruning</b></span>: Removing unnecessary branches.
<img src="media/images/full-DTree.png" alt="Decision Tree" scale="2.5" style="height: 200%;">

### Decision Trees: Algorithm

<div style="display: flex; align-items: center;gap: 20px;">
  <div style="flex: 1;">
  <h4><span style="color:darkorange;"><b>Top-Down Greedy Recursive Partitioning </b></span></h4>
  <p>At each node, choose the locally best split: the one that creates the most homogeneous groups.</p>
  <ul>
    <li>Continuous and Discrete Features</li>
    <li>Can split at the same attribute multiple times at different values.</li>
    <li>Recursively keep splitting on each child node until a homogeneous label group is attained.</li>
  </ul>
  <h4><span style="color:darkorange;"><b>Algorithms:</b></span></h4>
  <ul>
    <li>ID3 (Iterative Dichotomiser 3): <span style="color:blue;"><u>Quinlan, 1986</u></span></li>
    <li>C4.5 (successor of ID3)</li>
    <li>CART (Classification And Regression Tree): <span style="color:blue;"><u>Breiman, 1984</u></span></li>
    <li>And more…</li>
  </ul>

  </div>
  <div style="flex: 1;">
    <img src="media/images/table-ss.png" alt="Decision Tree: Comparison" scale="0.9">
  </div>
</div>
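
To make the recursion concrete, here is a minimal pure-Python sketch of top-down greedy partitioning. It assumes numeric features, midpoint thresholds, and Gini impurity as the split criterion; the function names and the five-row toy data are hypothetical, and real implementations such as CART add stopping rules and pruning on top of this.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Greedy search over every feature and every midpoint threshold."""
    best = None  # (weighted impurity, feature index, threshold)
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best  # None when no feature has two distinct values

def build(rows, labels):
    """Recursively partition until a node is pure or cannot be split."""
    split = None if gini(labels) == 0 else best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build(*zip(*left)),
        "right": build(*zip(*right)),
    }

def predict(node, row):
    """Walk from the root down to a leaf."""
    while isinstance(node, dict):
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node

# Hypothetical toy data: (Diameter, Height) -> tree family
rows = [(1.0, 5.0), (1.2, 6.0), (3.0, 5.5), (3.2, 12.0), (3.5, 13.0)]
labels = ["apple", "apple", "cherry", "oak", "oak"]
tree = build(rows, labels)
print(tree)
```

Because nothing stops the recursion early, this sketch memorizes its training rows, which is exactly the overfitting behaviour of a fully grown tree.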

### Building a Tree with an Impurity Function

The split that yields the lowest (weighted) impurity in the resulting nodes is considered the optimal split.

<img src="media/images/impurity-1.png" alt="Decision Tree: Impurity" scale="0.9">

### Building a Tree with Gini Impurity

Gini impurity ranges from 0 for a perfectly pure node to 0.5 for a maximally impure node in binary classification (in general, the maximum is $1 - 1/C$ for $C$ classes).

<img src="media/images/gini-impurity-1.png" alt="Decision Tree: Impurity" scale="0.9">
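
As a quick check of that range, Gini impurity can be computed as $1 - \sum_i p_i^2$ over the class proportions in a node. The helper name `gini_impurity` and the label lists below are illustrative assumptions, not taken from the slide.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions in a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure = gini_impurity(["yes"] * 4)              # every sample in one class
mixed_binary = gini_impurity(["yes", "no"] * 2)  # 50/50 binary node
mixed_three = gini_impurity(["a", "b", "c"])     # evenly mixed 3-class node

print(pure, mixed_binary, round(mixed_three, 3))
```

The three-class case shows why 0.5 is the ceiling only for binary labels.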

### Building a Tree with Gini Impurity

Calculating the Gini impurity on a numeric column like Age (which contains numbers and not just Yes/No values) is a little more involved.

<img src="media/images/gini-impurity-2.png" alt="Decision Tree: Impurity" scale="0.9">
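
A common recipe for a numeric column is to sort the values, take the midpoint between each pair of adjacent values as a candidate threshold, and keep the threshold with the lowest weighted Gini impurity. The sketch below follows that recipe; the ages and labels are hypothetical values chosen for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Hypothetical (age, label) data, for illustration only
ages = [7, 12, 18, 35, 38, 50, 83]
loves_movie = ["no", "no", "yes", "yes", "yes", "no", "no"]

pairs = sorted(zip(ages, loves_movie))
results = []
for (a, _), (b, _) in zip(pairs, pairs[1:]):
    threshold = (a + b) / 2  # midpoint between adjacent sorted values
    left = [y for x, y in pairs if x < threshold]
    right = [y for x, y in pairs if x >= threshold]
    # Weight each side's impurity by its share of the samples
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
    results.append((weighted, threshold))

best_gini, best_threshold = min(results)
print(best_threshold, round(best_gini, 3))
```

The winning threshold then competes against the Yes/No columns as this feature's candidate split.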

### Building a Tree with Gini Impurity

All 3 possible first splits!

<img src="media/images/gini-impurity-3.png" alt="Decision Tree: Impurity" scale="0.9">

### Building a Tree with Gini Impurity

Natural feature selection $\rightarrow$ the feature "loves popcorn" doesn't appear in the full tree's decision-making process!

<img src="media/images/full-tree-gini.png" alt="Decision Tree: Impurity" scale="0.9">

### Entropy as Impurity
<ul>
    <li><span style="color:darkorange;"><b>Entropy</b></span> is a measure of <span style="color:darkorange;">disorder</span>. The entropy of a random variable is the average level of "surprise", or "uncertainty", inherent to the variable's possible outcomes.</li>
        <ul>
        <li> $E(S) = -\sum_{i=1}^{C} p_i \log_2 p_i$ </li>
        <li> $p_i$ is the probability of class $i$. </li>
        </ul>
    </li>
    <li> 
    Varies between <span style="color:darkorange;"><b>0</b></span> and <span style="color:darkorange;"><b>$\log_2 C$</b></span>.<br>
    Worst case is a <span style="color:darkorange;">uniform distribution of labels</span>, with all classes equally likely. For binary classification, $E(S) \in [0,1]$.
    </li>
</ul>
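
The bounds can be checked numerically with a small sketch of the entropy formula. The helper name `entropy` and the example distributions are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

single_class = entropy([1.0])          # perfectly pure
fair_binary = entropy([0.5, 0.5])      # uniform over 2 classes
uniform_four = entropy([0.25] * 4)     # uniform over 4 classes
skewed_binary = entropy([0.8, 0.2])    # mostly one class

print(single_class, fair_binary, uniform_four, round(skewed_binary, 3))
```

The uniform cases hit the upper bound $\log_2 C$, while the skewed binary case falls strictly between 0 and 1.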

### Entropy as Impurity

<div style="display: flex; align-items: center;gap: 20px;">
  <div style="flex: 1;">
  <p>In <span style="color:darkorange;">Information Theory</span>: average number of bits needed to encode the classification of an example in $S$.</p>
  <ul>
    <li> If all examples are in the same class, the receiver knows it's always that class: <span style="color:darkorange;"><b>0 bits</b></span> (no information has to be sent)!</li>
    <li> If both classes are equally likely, i.e. probability = 0.5, then we need <span style="color:darkorange;"><b>1 bit</b></span>.</li>
    <li>If one class has probability 0.8, on average it can be conveyed using shorter messages.</li>
  </ul>
  <p>I can provision my bits optimally: the $1^{st}$ bit says red or not, a $2^{nd}$ bit is needed only to say blue or not, and a $3^{rd}$ bit is needed only if it is neither red nor blue, where its value tells us if it is green or yellow.</p>

  </div>
  <div style="flex: 1; display: flex; flex-direction: column; align-items: center;">
    <img src="media/images/entropy-calculation.png" alt="Entropy" scale="0.015">
  </div>
</div>
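
The bit-provisioning idea can be sketched as a prefix code. The probabilities below (1/2, 1/4, 1/8, 1/8) and the specific codewords are assumptions chosen so the arithmetic is clean; the figure's actual values may differ.

```python
import math

# Assumed class probabilities (hypothetical, for illustration)
probs = {"red": 0.5, "blue": 0.25, "green": 0.125, "yellow": 0.125}

# Prefix code: 1st bit = red or not, 2nd bit = blue or not, 3rd bit = green vs yellow
codes = {"red": "0", "blue": "10", "green": "110", "yellow": "111"}

# Average message length under this code
expected_bits = sum(probs[c] * len(codes[c]) for c in probs)

# Shannon entropy of the same distribution
entropy_bits = sum(-p * math.log2(p) for p in probs.values())

print(expected_bits, entropy_bits)
```

With these assumed probabilities the average code length matches the entropy, which is the sense in which entropy counts the bits needed per example.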


### Building a Tree with Entropy

- Entropy tells us how much information we need to define the uncertainty of a variable.
    - <span style="color:darkorange;">Low Entropy means more homogeneity! $\rightarrow$ Can be used to quantify Impurity</span>
- Our goal when building the Decision Tree is to reduce uncertainty or disorder with every split.
- Mathematically, this is quantified as <span style="color:darkorange;"><b>Information Gain</b></span>, which lies in $[0, E(Y)]$ (so in $[0,1]$ for binary labels).
    - Expected reduction in entropy caused by partitioning the examples according to this attribute. 
    - It measures <i>how well</i> a given attribute separates the training examples according to their target classification.
    - $IG(Y,X) = E(Y) - E(Y \mid X)$


### Building a Tree with Entropy

Here, we see part of a decision tree predicting if a person will be able to repay a loan.<br>
Calculate the Entropy at the parent node and check how much uncertainty can be reduced upon splitting on the feature Balance.

<ul>
    <li>${\scriptsize E(Parent) = -\frac{16}{30}\log_2\frac{16}{30} - \frac{14}{30}\log_2\frac{14}{30} \approx 0.99}$</li>
    <li>${\scriptsize E(Balance<50K) = -\frac{12}{13}\log_2\frac{12}{13} - \frac{1}{13}\log_2\frac{1}{13} \approx 0.39}$</li>
    <li>${\scriptsize E(Balance>50K) = -\frac{4}{17}\log_2\frac{4}{17} - \frac{13}{17}\log_2\frac{13}{17} \approx 0.79}$</li>
</ul>

<div style="display: flex; align-items: center;gap: -20px;">
  <div style="flex: 1;">

  <p>Weighted average of Entropy for splitting on Balance
  $${\scriptsize E(Balance) = \frac{13}{30}\cdot 0.39 + \frac{17}{30}\cdot 0.79 \approx 0.62}$$
  </p>

  <p>Information Gain through split
  $${\scriptsize IG(Parent, Balance) = E(Parent) - E(Balance) \approx 0.37}$$
  </p>

  </div>
  <div style="flex: 1; display: flex; flex-direction: column; align-items: center;">
    <img src="media/images/entropy-node-split.png" alt="Entropy" scale="0.65">
  </div>
</div>
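
The same calculation can be reproduced from the class counts in the example (16/14 at the parent, 12/1 and 4/13 in the children). The helper name `entropy_from_counts` is an assumption. Carrying full precision gives a parent entropy of about 0.997 and a gain of about 0.38; the 0.37 above comes from subtracting the two-decimal intermediate values.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits of a node, given its per-class sample counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

parent = [16, 14]  # class counts at the parent node
low = [12, 1]      # child with Balance < 50K
high = [4, 13]     # child with Balance > 50K

e_parent = entropy_from_counts(parent)
e_low = entropy_from_counts(low)
e_high = entropy_from_counts(high)

# Weight each child's entropy by its share of the parent's samples
n = sum(parent)
e_split = sum(low) / n * e_low + sum(high) / n * e_high
info_gain = e_parent - e_split

print(round(e_parent, 4), round(e_low, 4), round(e_high, 4))
print(round(e_split, 4), round(info_gain, 4))
```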


### Automatic Feature Selection

<div style="display: flex; align-items: center;gap: -20px;">
  <div style="flex: 1;">

  <p>Even at an impure node, we may choose not to split on an attribute if the split does not reduce the impurity below the node's current value.</p>

  <p>Choosing not to split despite having attributes remaining for further splits is a type of <span style="color:darkorange;">automatic feature selection</span>.</p>

  <p>Alternatively, instead of accepting any reduction in impurity, one can set a threshold that the reduction must exceed before a split is made.</p>

  <p>One way to limit <span style="color:darkorange;">overfitting</span>!</p>

  </div>
  <div style="flex: 1; display: flex; flex-direction: column; align-items: center;">
    <img src="media/images/dt-feature-selection.png" alt="Entropy" width="85%">
  </div>
</div>
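
In scikit-learn, one way to express such a threshold is the `min_impurity_decrease` parameter of `DecisionTreeClassifier`. The sketch below uses the bundled Iris data as a stand-in (an assumption; it is not the lecture dataset), and the value 0.05 is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the Iris dataset that ships with scikit-learn
X_demo, y_demo = load_iris(return_X_y=True)

# No threshold: split whenever a valid split exists
full_tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Require each split to reduce the weighted impurity by at least 0.05
thresholded_tree = DecisionTreeClassifier(
    min_impurity_decrease=0.05, random_state=0
).fit(X_demo, y_demo)

# Internal nodes store a feature index >= 0; leaves store a negative marker
used_features = sorted(
    set(int(f) for f in thresholded_tree.tree_.feature if f >= 0)
)

print("leaves (full):", full_tree.get_n_leaves())
print("leaves (thresholded):", thresholded_tree.get_n_leaves())
print("features used by thresholded tree:", used_features)
```

Features that never clear the threshold simply do not appear in the smaller tree, which is the feature-selection effect described above.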

### Single trees overfit

<span style="color:darkorange;"><b>Decision Trees are prone to overfitting.</b></span><br> 
Without any additional oversight, the CART and ID3 algorithms grow each branch of the tree just deeply enough to perfectly classify the training examples. 

Overfit trees:
- Are too sensitive to randomness in the training data
- Compromise generalizability on unseen test data

### Minimize overfitting

Must balance the depth and complexity of the tree to <b>generalize</b> to unseen data.

<div style="display: flex; align-items: center;gap: -20px;">
  <div style="flex: 1;">

  <p>Two Main Options:</p>

  <ul>
    <li>Early stopping</li>
    <ul>
    <li>Restrict tree depth</li>
    <li>Restrict node size</li>
    </ul>
    <li>Pruning</li>
  </ul>
  <p>Training a full decision tree with $67\%$ of initial data and testing on the rest yields an F1-score of $\color{darkorange}{87\%}$</p>
  </div>
  <div style="flex: 1; display: flex; flex-direction: column; align-items: center;">
    <img src="media/images/full-decision-surface.png" alt="DTSurface" scale="0.65">
  </div>
</div>

### Early stopping: <span style="color:darkorange;">Limit tree depth</span>

Stop splitting after a certain depth by setting the `max_depth` parameter.
```python
clf = DecisionTreeClassifier(max_depth=4)
clf.fit(X, y)
```
<div style="display: flex; align-items: center;gap: -20px;">
  <div style="flex: 1;">
  <p>Training a decision tree of maximum depth 4 with $67\%$ of initial data and testing on the rest yields an F1-score of $\color{darkorange}{93\%}$</p>
  </div>
  <div style="flex: 1; display: flex; flex-direction: column; align-items: center;">
    <img src="media/images/decision-boundary-depth4.png" alt="DTSurface4" style="width:100%;">
  </div>
</div>

### Early stopping: <span style="color:darkorange;">Minimum node size</span>

Do not split an internal node that contains fewer than a minimum number of samples, by setting the `min_samples_split` parameter.
```python
clf = DecisionTreeClassifier(min_samples_split=4)
clf.fit(X, y)
```

<div style="display: flex; align-items: center;gap: -20px;">
  <div style="flex: 1;">
  <p>Training a decision tree requiring a minimum of 4 samples in its internal node, with $67\%$ of initial data and testing on the rest, yields an F1-score of $\color{darkorange}{89\%}$</p>
  </div>
  <div style="flex: 1; display: flex; flex-direction: column; align-items: center;">
    <img src="media/images/decision-boundary-minsplit4.png" alt="DTSurface4minsample" style="width:100%;">
  </div>
</div>
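
The comparison between a full tree and the two early-stopping variants can be sketched end to end as follows. Since the lecture CSV is not available here, the sketch uses synthetic data from `make_classification` (an assumption), so its scores will not match the F1 values quoted on these slides.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 2 features, 3 classes
X_syn, y_syn = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Train on 67% of the data, test on the remaining 33%
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.33, random_state=0
)

candidates = {
    "full": DecisionTreeClassifier(random_state=0),
    "max_depth=4": DecisionTreeClassifier(max_depth=4, random_state=0),
    "min_samples_split=4": DecisionTreeClassifier(min_samples_split=4, random_state=0),
}

scores = {}
for name, tree in candidates.items():
    tree.fit(X_tr, y_tr)
    train_f1 = f1_score(y_tr, tree.predict(X_tr), average="weighted")
    test_f1 = f1_score(y_te, tree.predict(X_te), average="weighted")
    scores[name] = (train_f1, test_f1)
    print(f"{name:>20}: train F1={train_f1:.3f}, test F1={test_f1:.3f}")
```

A gap between training and test F1 for the full tree is the usual symptom of overfitting; the constrained trees trade some training fit for a smaller gap.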

### More on Decision Trees

<img src="media/images/dt-pros-cons.png" alt="Entropy" scale="0.65">

### The problem with single trees
- Single pruned trees are poor predictors
- Single deep trees are noisy $\rightarrow$ <b>very sensitive to small perturbations in data</b>.

Only $5\%$ Gaussian noise in the data changes the tree.
<img src="media/images/dt-w-gaussian-noise.png" alt="DTwGaussianNoise" style="width:100%;">

<b>Fix</b>: <span style="color:darkorange;">an ensemble of trees</span>!

### Ensemble Learning
An ML paradigm that combines multiple individual models to create a stronger and more robust model.

<span style="color:darkorange;">Bias Variance Decomposition</span> of Learning Error:
${\scriptsize E[(h_D(x)-y)^2] = \underbrace{E[(h_D(x)-\bar{h}(x))^2]}_{Variance}+ \underbrace{E[(\bar{h}(x)-\bar{y}(x))^2]}_{Bias^2} + \underbrace{E[(\bar{y}(x)-y(x))^2]}_{Noise}}$



### Ensemble Learning: Bagging
- Reduces variance, so has a strong beneficial effect on high variance classifiers.
- Train multiple instances of the same model on different subsets of the training data
- Each subset is created by sampling with replacement - <span style="color:darkorange;"><b>bootstrapping</b></span>!!
- The final prediction is an average (for regression) or a majority vote (for classification) of the individual models.


### Ensemble Learning: Boosting

- Boosting focuses on sequentially improving the model's weaknesses.
- It gives more weight to misclassified instances, allowing the model to learn from its mistakes.
- Focuses on improving accuracy by systematically reducing <span style="color:darkorange;">bias</span>.

### Bagging (Bootstrap Aggregating)

Downside of Decision Trees: they overfit easily and are often inaccurate on new samples.<br>
Goal: Reduce Variance term $E[(h_D(x)-\bar{h}(x))^2]$ <br>
To achieve this, we use Bagging. 

<div style="display: flex; align-items: center;gap: -20px;">
  <div style="flex: 1;"> 
  <p>Inspired by the <span style="color:darkorange;"><b>Weak law of large numbers</b></span>. Apply this to classifiers.</p>
  <p>Assume we have $m$ training sets $D_1,D_2,\dots,D_m$ drawn from the population. Train a classifier on each one and average the results:
  $\hat{h} = \frac{1}{m}\sum_{i=1}^mh_{D_i} \rightarrow \bar{h}$ as $m \rightarrow \infty$.<br>
  If $\hat{h} \rightarrow \bar{h}$, the variance component of the error must also vanish. The problem is that we don't have $m$ data sets; we only have one, $D$.</p>

  <span style="color:darkorange;"><b>Solution</b></span>: Use bootstrapped samples!
  </div>
  <div style="flex: 1; display: flex; flex-direction: column; align-items: center;">
    <img src="media/images/bagging-fig.png" alt="Random Forest" style="width:90%;">
  </div>
</div>
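
The averaging argument can be illustrated with a small simulation. Here each "model" is just a noisy draw around a true value, and the draws are independent; both are simplifying assumptions. Trees trained on bootstrapped samples of one dataset are correlated, so the reduction in practice is smaller than in this idealized sketch.

```python
import random
import statistics

random.seed(0)

def noisy_predictor(true_value=1.0, noise_sd=1.0):
    """Stand-in for one high-variance model's prediction at a fixed x."""
    return random.gauss(true_value, noise_sd)

def averaged_prediction(m):
    """Average the predictions of m independent noisy models."""
    return sum(noisy_predictor() for _ in range(m)) / m

# Repeat the experiment many times to estimate the variance of each scheme
single = [averaged_prediction(1) for _ in range(2000)]
averaged = [averaged_prediction(50) for _ in range(2000)]

var_single = statistics.variance(single)
var_averaged = statistics.variance(averaged)

print(round(var_single, 3), round(var_averaged, 3))
```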
    
A popular bagged model with Decision Trees is called Random Forest.


### Bagging: Random Forest

<div style="display: flex; align-items: center;gap: -20px;">
  <div style="flex: 1;"> 
  <p>1. <span style="color:darkorange;"><b>Bootstrapping</b></span>: Sample different subsets of data (sampling with replacement).</p>
  <p>2. Build multiple 'overfit' decision trees and combine their predictions.</p>
  <p>3. Random subsets of features are considered for splitting at each node to <span style="color:darkorange;"><b>avoid tree correlation</b></span>. <span style="color:red;"><b>Crucial!</b></span></p>
  <p>4. The final prediction is an average (regression) or majority vote (classification) of individual tree predictions.</p>
  </div>
  <div style="flex: 1; display: flex; flex-direction: column; align-items: center;">
    <img src="media/images/rf.png" alt="Random Forest" style="width:100%;">
    <div style="margin-top: 20px; color:darkorange; text-align: center;">
      Randomized feature selection with
      Variance in Prediction
    </div>
  </div>
</div>
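
The three ingredients listed above (bootstrapped rows, a random feature subset per split, and a vote) can be sketched in plain Python. The helper names are hypothetical, and using the square root of the feature count as the subset size is an assumption based on a common default for classification.

```python
import math
import random
from collections import Counter

rng = random.Random(0)

def bootstrap_indices(n):
    """Draw n row indices with replacement (one bootstrapped dataset)."""
    return [rng.randrange(n) for _ in range(n)]

def random_feature_subset(n_features):
    """Pick about sqrt(n_features) distinct features to consider at a split."""
    k = max(1, int(math.sqrt(n_features)))
    return rng.sample(range(n_features), k)

def majority_vote(predictions):
    """Combine per-tree class predictions into one forest prediction."""
    return Counter(predictions).most_common(1)[0][0]

idx = bootstrap_indices(10)
features = random_feature_subset(64)
vote = majority_vote(["oak", "apple", "oak", "cherry", "oak"])

print(idx)
print(features)
print(vote)
```

Restricting each split to a random subset keeps one dominant feature from appearing at the top of every tree, which is what decorrelates the trees.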

### Random Forest: Example (Tree Data)
Training a random forest of 200 fully grown trees on $67\%$ of the initial data and testing on the rest yields an F1-score of $\color{darkorange}{91\%}$

In [None]:
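from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Hold out 33% of the data for testing (the split seed is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Fit Random Forest
rf = RandomForestClassifier(
    n_estimators=200,     # number of trees
    max_depth=None,       # grow full trees
    min_samples_split=2,  # split any node with at least 2 samples
    random_state=0,
    n_jobs=-1             # use all available CPU cores
)

rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
f1 = f1_score(y_test, y_pred, average='weighted')
print(f"F1 Score (weighted): {f1:.4f}\n")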

### Evaluation of a Random Forest
- <span style="color:darkorange;"><b>Out-of-Bag Data</b></span>: Entries that did not make it into a bootstrapped dataset.
    - For large enough $N$, on average about 63.21% of the original records end up in any bootstrap sample.
    - Roughly 36.79% of the observations are not used in the construction of a particular tree.
- An out-of-bag sample can be used to test all the trees in the Random Forest trained without it.
- <span style="color:darkorange;"><b>Free cross-validation</b></span>: Random Forest accuracy can be gauged by the proportion of out-of-bag samples that were correctly classified by the forest.
- The proportion of out-of-bag samples that were incorrectly classified is the <span style="color:darkorange;"><b>Out-of-Bag Error</b></span>. 
    - This is an approximately unbiased estimate of the test error.
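
The 63.21% / 36.79% figures come from the chance $(1 - 1/N)^N \rightarrow e^{-1}$ that a given record is never drawn in $N$ draws with replacement. A short simulation (with an arbitrarily chosen $N$ and seed) shows the same numbers:

```python
import math
import random

random.seed(0)
N = 10_000

# One bootstrap sample: N draws with replacement from N records
sample = [random.randrange(N) for _ in range(N)]

in_bag_fraction = len(set(sample)) / N   # distinct records that were drawn
oob_fraction = 1 - in_bag_fraction       # records never drawn

# Probability that a fixed record is missed by all N draws
theoretical_oob = (1 - 1 / N) ** N

print(round(in_bag_fraction, 4), round(oob_fraction, 4))
print(round(theoretical_oob, 4), round(math.exp(-1), 4))
```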

### Random Forest for Classifying Digits

We want to use Random Forest to classify hand-written digits. Our dataset consists of 1797 grayscale images of handwritten digits, from 0 to 9. Each image is an $8 \times 8$ pixel grid that has been flattened into a vector of 64 numerical features.

We set aside $25\%$ of the data for evaluation and train the model on the rest.
<img src="media/images/digits-data.png" alt="Digits Data" style="width:50%;">

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

digits = load_digits()
digits.keys()


dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'images', 'DESCR'])

In [8]:
Xtrain, Xtest, ytrain, ytest = train_test_split(digits.data, digits.target,
                                                random_state=0)
model = RandomForestClassifier(n_estimators=1000)
model.fit(Xtrain, ytrain)
ypred = model.predict(Xtest)
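# Note: scikit-learn's signature is classification_report(y_true, y_pred).
# With the arguments in this order, the precision and recall columns are
# swapped and the support column counts predicted labels.
print(classification_report(ypred, ytest))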

              precision    recall  f1-score   support

           0       1.00      0.97      0.99        38
           1       0.98      0.95      0.97        44
           2       0.95      1.00      0.98        42
           3       0.98      0.98      0.98        45
           4       0.97      1.00      0.99        37
           5       0.98      0.96      0.97        49
           6       1.00      1.00      1.00        52
           7       1.00      0.96      0.98        50
           8       0.94      0.98      0.96        46
           9       0.98      0.98      0.98        47

    accuracy                           0.98       450
   macro avg       0.98      0.98      0.98       450
weighted avg       0.98      0.98      0.98       450



### Out-of-Bag Error

For small samples (small $N$), the OOB estimate is less reliable than a held-out validation set, but as $N$ grows it becomes a cheaper alternative to cross-validation because it needs no extra model fits.<br>
Both the OOB error and the test error stabilize as the number of trees grows.

That said, the lower test error here is a bit lucky! $\downarrow$

<img src="media/images/rf-oob-vs-test.png" alt="Digits Data- OOB v Test" style="width:70%;">
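
In scikit-learn, the OOB estimate is available by passing `oob_score=True` to `RandomForestClassifier` and reading `oob_score_` after fitting. The sketch below uses the same digits data; the number of trees and the seeds are assumptions, so the exact errors will differ from the plot above.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits_demo = load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(
    digits_demo.data, digits_demo.target, random_state=0
)

# oob_score=True scores each training sample using only the trees
# whose bootstrap sample did not contain it
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X_tr, y_tr)

oob_error = 1 - forest.oob_score_            # estimated from training data only
test_error = 1 - forest.score(X_te, y_te)    # measured on the held-out 25%

print(f"OOB error:  {oob_error:.4f}")
print(f"Test error: {test_error:.4f}")
```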

### Rounding up our thoughts

<div style="display: flex; gap: 2px;">
  <div style="flex: 1;"> 
  <p><span style="color:darkorange;"><b>What did we learn?</b></span></p>
  <ul>
  <li>Decision Trees</li>
  <ul>
    <li>Confident and Fast</li>
    <li>Good at learning structures within feature space</li>
    <li>But alone it overfits</li>
  </ul>
  <li>Bagging</li>
  <ul>
    <li>Wisdom of the crowd!</li>
  </ul>
  </ul>
  </div>
  <div style="flex: 1;">
  <p><span style="color:darkorange;"><b>What more on Decision Trees?</b></span></p>
  <ul>
    <li>A closer look at Regression Trees</li> 
    <li> Minimize Overfitting in Single Trees with Reduced Error Pruning</li>
    <li> Boosting</li>
    <ul>
    <li> AdaBoost</li>
    <li> Gradient Boosting with Regression Trees</li>
    <li> XGBoost for Classification and Regression</li>
    </ul>
  </ul>
  </div>
</div>

### That's all, folks!

<div style="text-align: center;">
    <img src="media/images/end-slide.jpg" alt="The End" scale="0.01;" style="width: 40%;">
</div>