## Gradient Boosting for Regression
#### Lewis Sears

Using tree-based methods for regression, we will now write an algorithm that predicts continuous data using a method called gradient boosting. Before writing the algorithm, let's first observe how much more powerful a gradient-boosted regressor can be, using scikit-learn's *GradientBoostingRegressor*.

In [124]:
import pandas as pd
#used car data for testing
df = pd.read_csv('car_data.csv')
df_cars = df[['Year', 'Selling_Price', 'Kms_Driven','Owner']]
target = df['Present_Price']

In [125]:
import sklearn.ensemble as ml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
X_train, X_test, y_train, y_test = train_test_split(df_cars, target, test_size=0.33, random_state=42)

In [234]:
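#'squared_error' is the current name for the least-squares loss (older scikit-learn versions called it 'ls')
gbr = ml.GradientBoostingRegressor(loss = 'squared_error', learning_rate = 0.75, n_estimators = 100, max_depth = 3)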
lr = LinearRegression()

In [235]:
gbr.fit(X_train, y_train)
lr.fit(X_train, y_train)

LinearRegression()

In [236]:
boosted_scores = gbr.score(X_train, y_train), gbr.score(X_test, y_test)
reg_scores = lr.score(X_train, y_train), lr.score(X_test, y_test)

In [237]:
boosted_scores

(0.9999661753150955, 0.8595008046607651)

In [238]:
reg_scores

(0.8504954757170343, 0.7991031805544824)

So, even though it overfit the training data (an R² of nearly 1.0 on the training set against 0.86 on the test set), the boosted tree regression outperformed standard linear regression on the held-out data, 0.86 to 0.80. It is important to note that boosting primarily reduces the bias of our model, so it's common for boosted models to overfit the training data. In contrast, ensemble methods that use bagging tend to reduce variance, at the expense of some bias, by training models in parallel. This is a critical distinction between the two methods.
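One way to see this overfitting behaviour is to track the error after each boosting stage. The sketch below is only an illustration: it uses synthetic data from `make_regression` as a stand-in (an assumption, not the car data above) and scikit-learn's `staged_predict`, which yields the ensemble's prediction after each added tree.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the car dataset used above).
X, y = make_regression(n_samples=300, n_features=4, noise=15.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Same hyperparameters as the model above; the default loss is squared error.
gbr = GradientBoostingRegressor(
    learning_rate=0.75, n_estimators=100, max_depth=3, random_state=42
)
gbr.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees.
train_mse = [mean_squared_error(y_train, p) for p in gbr.staged_predict(X_train)]
test_mse = [mean_squared_error(y_test, p) for p in gbr.staged_predict(X_test)]

# Stage (1-indexed) at which the held-out error is smallest.
best_stage = int(np.argmin(test_mse)) + 1
print(f"final train MSE: {train_mse[-1]:.3f}")
print(f"final test MSE:  {test_mse[-1]:.3f}")
print(f"test MSE is lowest after {best_stage} of {len(test_mse)} stages")
```

With squared-error loss the training error can only go down as trees are added, while the test error need not, so a gap between the two curves is the overfitting described above.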

### Our Algorithm

In [None]:
import numpy as np
import pandas as pd
class GradientBoostedRegression(object):
    '''Our boosted regression algorithm using tree based regression models.'''
    
    def __init__(self, learning_rate, tree_depth):
        '''
        Some initial hyperparameters:
        learning_rate: A number between 0 and 1 that scales the added output of a new tree
        tree_depth: The number of trees we will stack together 
        '''
        self.learning_rate = learning_rate
        self.depth = tree_depth                          

    #define the loss function: half the sum of squared errors
    @staticmethod
    def Loss_Function(target, predicted):
        return 0.5*np.sum((target-predicted)**2)
    #The derivative with respect to the prediction is really straightforward
    @staticmethod
    def derivative_loss_function(target, predicted):
        return -(target - predicted)
        
    def fit(self, train_data, train_target):
        '''Since this is a regression algorithm, the train_target should be continuous.'''
        
        data = np.array(train_data)
        target = np.array(train_target)
        
        #initialize residuals:
        residuals = target - np.mean(target)
        
        return self
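The `fit` method above stops after initializing the residuals. As a rough sketch of how the boosting loop might continue (not necessarily how the author completes it), the example below repeatedly fits a small tree to the current residuals and adds its scaled output to the prediction. To stay self-contained it uses depth-1 trees (stumps) written in plain NumPy; the helpers `fit_stump`, `predict_stump`, `boost`, and `boosted_predict` and the synthetic data are all hypothetical names and assumptions made for illustration.

```python
import numpy as np

def fit_stump(X, r):
    """Fit a depth-1 regression tree to residuals r by exhaustive split search."""
    # (sse, feature, threshold, left value, right value); the fallback
    # threshold of -inf sends every row to the right leaf (the residual mean).
    best = (np.inf, 0, -np.inf, r.mean(), r.mean())
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = np.sum((r[left] - lv) ** 2) + np.sum((r[~left] - rv) ** 2)
            if sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def predict_stump(stump, X):
    """Evaluate a fitted stump on the rows of X."""
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def boost(X, y, learning_rate=0.1, n_trees=50):
    """Gradient boosting for squared-error loss using stumps."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    f0 = y.mean()                      # initial constant prediction
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_trees):
        # For 0.5 * sum((y - pred)**2) the negative gradient is the residual.
        residuals = y - pred
        stump = fit_stump(X, residuals)
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps

def boosted_predict(model, X):
    """Sum the initial prediction and the scaled output of every stump."""
    f0, learning_rate, stumps = model
    X = np.asarray(X, dtype=float)
    pred = np.full(X.shape[0], f0)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, X)
    return pred

# Synthetic demonstration data (an assumption, not the car dataset).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = boost(X, y, learning_rate=0.1, n_trees=100)
baseline_mse = np.mean((y - y.mean()) ** 2)
boosted_mse = np.mean((y - boosted_predict(model, X)) ** 2)
print(f"baseline MSE (predict the mean): {baseline_mse:.4f}")
print(f"boosted MSE on training data:    {boosted_mse:.4f}")
```

Starting from the mean matches the residual initialization in `fit`, and the learning rate plays the role described in the class docstring: it scales how much each new tree contributes. A deeper tree, for example scikit-learn's `DecisionTreeRegressor`, could be swapped in for the stump without changing the loop.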