## Chapter 14: Tree-based learners

# 14.5 Gradient Boosting

- The most popular way to build deep regression and classification trees by addition is to sum together shallower ones, growing the model sequentially one shallow tree at a time.

- This is an instance of the general boosting method introduced in Chapter 11. When the units being added are trees, the procedure is often referred to as *gradient boosting*.

- At the $m$th round of boosting we begin with a model consisting of a fully tuned linear combination of $m-1$ units of a universal approximator:

$$\text{model}_{m-1}(\mathbf{x},\Theta_{m-1}) = f_{s_1}^{\star}(\mathbf{x}) + f_{s_2}^{\star}(\mathbf{x}) + \cdots + f_{s_{m-1}}^{\star}(\mathbf{x})$$

- Each function in this sum is a tree-based unit (e.g., a stump) whose split point(s) and leaf values have been chosen optimally. 

- The $m$th round of boosting involves a search over a range of suitable candidate units (for stumps, one candidate per choice of input dimension and split point) and a corresponding optimization of each candidate's leaf values.

- To construct each candidate model we add a prospective unit $f_{s_{m}}(\mathbf{x})$ to $\text{model}_{m-1}(\mathbf{x},\Theta_{m-1})$, forming

$$\text{model}_{m}(\mathbf{x},\Theta_{m}) = \text{model}_{m-1}(\mathbf{x},\Theta_{m-1}) + f_{s_{m}}(\mathbf{x})$$

- We then optimize the leaf values of $f_{s_{m}}(\mathbf{x})$ using an appropriate cost function (e.g., the Least Squares cost for regression or the Softmax cost for classification), keeping every unit of $\text{model}_{m-1}$ fixed, and retain the candidate that achieves the lowest cost.
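The round described above can be sketched in NumPy for the simplest setting: one-dimensional regression with stumps and the Least Squares cost. This is a minimal sketch under stated assumptions, not the `mlrefined_libraries` implementation used later in this section. The helpers `fit_stump`, `stump_predict` and `boost` are hypothetical, candidate splits are assumed to be midpoints between consecutive distinct inputs, the model is assumed to start from the zero function, and leaf values are set in closed form (the mean of the residual in each leaf) rather than by an iterative optimizer. For a quadratic cost like Least Squares, that closed form is also exactly where a single Newton step lands.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares stump for 1-D inputs x and targets r (hypothetical helper).

    Every midpoint between consecutive distinct inputs is a candidate split;
    for each one the optimal leaf values are the means of r on either side.
    Assumes x contains at least two distinct values.
    """
    xs = np.unique(x)  # sorted distinct inputs
    best = None
    for split in 0.5 * (xs[:-1] + xs[1:]):
        mask = x <= split
        left, right = r[mask].mean(), r[~mask].mean()
        cost = np.sum((r - np.where(mask, left, right)) ** 2)
        if best is None or cost < best[0]:
            best = (cost, split, left, right)
    _, split, left, right = best
    return split, left, right

def stump_predict(x, stump):
    """Evaluate a stump: left leaf value at or below the split, right leaf value above."""
    split, left, right = stump
    return np.where(x <= split, left, right)

def boost(x, y, num_rounds):
    """Grow model_m = model_{m-1} + f_{s_m}, one stump per round."""
    stumps, costs = [], []
    model = np.zeros(len(y))  # assumed model_0: the zero function
    for _ in range(num_rounds):
        # search all candidate splits and tune leaf values with model_{m-1} held fixed
        stump = fit_stump(x, y - model)
        stumps.append(stump)
        model = model + stump_predict(x, stump)
        costs.append(np.mean((y - model) ** 2))  # Least Squares cost after this round
    return stumps, costs

# usage on synthetic noisy sine data
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
stumps, costs = boost(x, y, num_rounds=20)
```

The recorded cost can only decrease or stay flat from round to round, since a stump whose leaf values are both zero would leave the model unchanged and the search always does at least that well.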

Boosting, as described in Section 11.3, is an extremely popular cross-validation technique for building effective tree-based models. Indeed, trees (and most commonly *stumps*) are the universal approximator most often used with boosting, and this pairing is commonly referred to as *gradient boosting*. The low capacity and simple behavior of stumps in particular enable a high-resolution search for the right capacity model (as discussed in Section 11.3.2), because each round increases the capacity of the model by only a small amount.
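One way to picture this capacity search: record the validation error after every boosting round and keep the number of rounds at which it is smallest. The sketch below assumes one-dimensional regression, the Least Squares cost, and a fixed train/validation split supplied by the caller; `fit_stump` and `select_num_rounds` are hypothetical helpers for illustration, not part of the `mlrefined_libraries` code used in this section.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares stump on 1-D data (assumes at least two distinct x values)."""
    xs = np.unique(x)
    best = None
    for split in 0.5 * (xs[:-1] + xs[1:]):
        mask = x <= split
        left, right = r[mask].mean(), r[~mask].mean()
        cost = np.sum((r - np.where(mask, left, right)) ** 2)
        if best is None or cost < best[0]:
            best = (cost, split, left, right)
    return best[1:]  # (split, left, right)

def select_num_rounds(x_train, y_train, x_valid, y_valid, max_rounds):
    """Boost up to max_rounds stumps; return the round count with lowest validation error."""
    train_model = np.zeros(len(y_train))
    valid_model = np.zeros(len(y_valid))
    valid_errors = []
    for _ in range(max_rounds):
        # each stump is fit to the training residual only
        split, left, right = fit_stump(x_train, y_train - train_model)
        train_model += np.where(x_train <= split, left, right)
        valid_model += np.where(x_valid <= split, left, right)
        valid_errors.append(np.mean((y_valid - valid_model) ** 2))
    best_m = int(np.argmin(valid_errors)) + 1  # number of stumps in the chosen model
    return best_m, valid_errors

# usage: alternate points of a noisy sine between training and validation
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 80)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(80)
best_m, valid_errors = select_num_rounds(x[::2], y[::2], x[1::2], y[1::2], max_rounds=50)
```

Because consecutive candidate models differ by a single stump, the validation curve is sampled at fine increments of capacity, which is the "high-resolution" property described above.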

```python
# imports from custom library
import sys
sys.path.append('../../')
import autograd.numpy as np
from mlrefined_libraries import nonlinear_superlearn_library as nonlib
datapath = '../../mlrefined_datasets/nonlinear_superlearn_datasets/'

# compensate for %matplotlib notebook's tendency to blow up images when plotted inline
from matplotlib import rcParams
rcParams['figure.autolayout'] = True
%matplotlib notebook

# plotting
import matplotlib.pyplot as plt

# deep copies of data and boosting runs
import copy

# automatically reload edited library modules
%load_ext autoreload
%autoreload 2
```

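When the Least Squares cost is used, boosting with regression stumps is often interpreted as successive rounds of 'fitting to the residual' of a regression dataset: each new stump is tuned to whatever part of the output the current model has not yet captured. The animation below illustrates this on a noisy sinusoidal dataset over 40 rounds of boosting.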
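The residual interpretation follows directly from the update $\text{model}_{m}(\mathbf{x},\Theta_{m}) = \text{model}_{m-1}(\mathbf{x},\Theta_{m-1}) + f_{s_{m}}(\mathbf{x})$. Writing the Least Squares cost over $P$ input/output pairs $(\mathbf{x}_p, y_p)$ and defining the residual $r_p = y_p - \text{model}_{m-1}(\mathbf{x}_p,\Theta_{m-1})$, the cost minimized at the $m$th round is

$$\frac{1}{P}\sum_{p=1}^{P}\left(\text{model}_{m-1}(\mathbf{x}_p,\Theta_{m-1}) + f_{s_m}(\mathbf{x}_p) - y_p\right)^2 = \frac{1}{P}\sum_{p=1}^{P}\left(f_{s_m}(\mathbf{x}_p) - r_p\right)^2$$

so the new unit is fit to the residuals rather than to the original outputs. For a stump with leaf values $v_1$ and $v_2$, and with $\Omega_1$ and $\Omega_2$ denoting the indices of the points falling on either side of its split (notation introduced here for illustration), setting the derivative of this cost with respect to each leaf value to zero gives

$$v_1^{\star} = \frac{1}{\left|\Omega_1\right|}\sum_{p\in\Omega_1} r_p, \qquad v_2^{\star} = \frac{1}{\left|\Omega_2\right|}\sum_{p\in\Omega_2} r_p$$

that is, each optimal leaf value is the average residual of the points that land in that leaf.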

```python
# load in dataset
csvname = datapath + 'noisy_sin_sample.csv'
data = np.loadtxt(csvname, delimiter=',')
x = copy.deepcopy(data[:-1, :])   # input rows
y = copy.deepcopy(data[-1:, :])   # output row

# boosting procedure: fit one stump per round to the current residual
num_units = 40
runs2 = []
for j in range(num_units):
    # set up a fresh stump booster on the current (residual) outputs
    mylib2 = nonlib.boost_lib3.stump_booster.Setup(x, y)

    # choose normalizer
    mylib2.choose_normalizer(name='standard')

    # train_portion = 1: keep all points for training (no validation split)
    mylib2.make_train_valid_split(train_portion=1)

    # choose cost
    mylib2.choose_cost(name='least_squares')

    # choose optimizer
    mylib2.choose_optimizer('newtons_method', max_its=1)

    # run a single boosting round
    mylib2.boost(1, verbose=False)
    mylib2.model = mylib2.models[-1]

    # store a copy of this round for the animation
    runs2.append(copy.deepcopy(mylib2))

    # subtract this round's fit from the outputs to form the new residual
    y_pred = mylib2.models[-1](mylib2.normalizer(x))
    y -= y_pred

# animate the boosting rounds
frames = num_units
demo2 = nonlib.boosting_regression_animators_v3.Visualizer(csvname)
demo2.animate_boosting(runs2, frames)
```