Gradient boosting **goes through cycles to iteratively add models into an ensemble**

1. First, it initialises the ensemble with a single model
2. Then, the ensemble is used to generate predictions. Each prediction combines (sums) the predictions of all models in the ensemble
3. The error of these predictions is calculated with a loss function
4. The loss function is used to fit another model, with parameters chosen to reduce the loss
5. The new model is added to the ensemble and the cycle repeats

The "gradient" in gradient boosting refers to the gradient descent on the loss function that is used to determine the parameters of each new model so as to minimise the loss
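
The cycle above can be sketched in plain Python. This is only a minimal illustration, not how XGBoost is implemented: it assumes a squared-error loss (whose negative gradient is simply the residual), uses one-split "stumps" as the models, and works on a made-up one-feature dataset. The names `fit_stump` and `boost` are hypothetical helpers introduced for this example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # Step 1: the initial model predicts the mean
    stumps = []

    def predict(x):
        # Step 2: the ensemble prediction sums every model's (shrunken) output
        return base + learning_rate * sum(stump(x) for stump in stumps)

    losses = []  # mean squared error measured at the start of each round
    for _ in range(n_rounds):
        preds = [predict(x) for x in xs]
        # Step 3: residuals are the negative gradient of 0.5 * (y - pred)^2
        residuals = [y - p for y, p in zip(ys, preds)]
        losses.append(sum(r * r for r in residuals) / len(ys))
        # Steps 4 and 5: fit a new model to the residuals and add it
        stumps.append(fit_stump(xs, residuals))
    return predict, losses


# Made-up data for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x * x for x in xs]
predict, losses = boost(xs, ys)
print(f"MSE at round 1: {losses[0]:.2f}, at round {len(losses)}: {losses[-1]:.2f}")
```

Each new stump is fitted to what the current ensemble still gets wrong, so the training loss shrinks round by round.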

In [16]:
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

data = pd.read_csv("/Users/felix/GitHub/DataSci/ml/KaggleML/melb_data.csv")

In [15]:
# Selecting X and y as a subset of data
X = data[['Rooms', 'Distance', 'Landsize', 'BuildingArea', 'YearBuilt']]
y = data.Price

# Separating into training and testing
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size = 0.2)

# Instantiation and fitting
model = XGBRegressor()
model.fit(X_train, y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
       importance_type='gain', interaction_constraints='',
       learning_rate=0.300000012, max_delta_step=0, max_depth=6,
       min_child_weight=1, missing=nan, monotone_constraints='()',
       n_estimators=100, n_jobs=0, num_parallel_tree=1,
       objective='reg:squarederror', random_state=0, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
       validate_parameters=1, verbosity=None)

In [17]:
# Making predictions
predictions = model.predict(X_valid)

# Evaluating error (scikit-learn expects the true values first, then the predictions)
print(f"Mean absolute error is \t {mean_absolute_error(y_valid, predictions)}")

Mean absolute error is 	 247544.4160472662
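
As a rough illustration of what this metric measures, mean absolute error is the average of the absolute differences between actual and predicted values. The sketch below uses made-up prices rather than values from the dataset, and `mean_abs_error` is a hypothetical helper, not the scikit-learn function.

```python
def mean_abs_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up house prices for illustration only
actual = [500_000, 750_000, 1_000_000]
predicted = [450_000, 800_000, 900_000]

print(mean_abs_error(actual, predicted))
```

The result is in the same units as the target, so for this dataset it reads as an average error in price.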


## Important Parameters in XGBRegressor

**_n_estimators_**: Number of boosting cycles, which equals the number of models added to the ensemble. Typically between 100 and 1000, though the best value depends heavily on the learning rate

**_early_stopping_rounds_**: Stops training once the validation score has not improved for this many consecutive rounds. It requires a validation set, passed through eval_set. A sensible approach is to set n_estimators high and let early stopping find the best point to stop iterating

**_learning_rate_**: Multiplies each model's predictions by a small number before they are added to the ensemble, so that each added tree contributes less. This allows a high n_estimators without overfitting. In general, a small learning rate with a large number of estimators yields a more accurate model, but it takes longer to train

**_n_jobs_**: Number of parallel threads used to train the model. It is common to set it equal to the number of cores on the machine. It offers little benefit on small datasets, and it only affects training time, not model accuracy
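
The early stopping rule can be sketched as a simple "patience" check over the validation loss recorded after each round. This is a simplified illustration with made-up loss values, not XGBoost's internal code, and `best_round_with_early_stopping` is a hypothetical helper.

```python
def best_round_with_early_stopping(val_losses, patience):
    """Return (best_round, stopped_at) for a sequence of validation losses.

    Training stops at the first round where the loss has failed to improve
    for `patience` consecutive rounds.
    """
    best_round, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = i, loss
        elif i - best_round >= patience:
            return best_round, i  # stopped early at round i
    return best_round, len(val_losses) - 1  # ran through every round


# Made-up validation losses for illustration only
val_losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.05, 2.0]
best, stopped = best_round_with_early_stopping(val_losses, patience=3)
print(f"Best round: {best}, stopped at round: {stopped}")
```

A small patience value can stop too soon: in this made-up sequence a later round would have improved on the best loss, which is why the value is usually chosen large enough to ride out short plateaus.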

In [18]:
# Here is how the parameters are set

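my_model = XGBRegressor(n_estimators=1000, learning_rate=0.05, n_jobs=4)
# Note: newer XGBoost releases expect early_stopping_rounds to be passed to the
# XGBRegressor constructor rather than to fit()
my_model.fit(X_train, y_train,
             early_stopping_rounds=5,
             eval_set=[(X_valid, y_valid)],
             verbose=False)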

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
       importance_type='gain', interaction_constraints='',
       learning_rate=0.05, max_delta_step=0, max_depth=6,
       min_child_weight=1, missing=nan, monotone_constraints='()',
       n_estimators=1000, n_jobs=4, num_parallel_tree=1,
       objective='reg:squarederror', random_state=0, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
       validate_parameters=1, verbosity=None)