In [None]:
"""
But what makes XGBoost so popular?

Speed and performance: The core is written in C++, which makes it comparatively faster than many other ensemble methods.

Core algorithm is parallelizable: Because the core XGBoost algorithm is parallelizable, it can harness the power of multi-core computers. 
It can also be parallelized onto GPUs and across networks of computers, making it feasible to train on very large datasets.

Often outperforms other algorithms: It has shown strong performance on a variety of machine learning benchmark datasets.

Wide variety of tuning parameters: XGBoost has parameters for cross-validation, regularization, user-defined objective functions, 
missing values and tree construction, and it also offers a scikit-learn compatible API.


XGBoost (Extreme Gradient Boosting) belongs to a family of boosting algorithms and uses the gradient boosting (GBM) framework at its core. 
It is an optimized distributed gradient boosting library. But wait, what is boosting? Well, keep on reading.



Using XGBoost in Python
First, as with any other dataset, you import the Boston Housing dataset and store it in a variable called boston. 
To import it from scikit-learn, run this snippet. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the snippet needs an older scikit-learn version.
"""

from sklearn.datasets import load_boston
boston = load_boston()
print(boston.keys())
print(boston.data.shape)
print(boston.feature_names)
print(boston.DESCR)

In [None]:
import pandas as pd

# Build the DataFrame with the feature names as column labels
data = pd.DataFrame(boston.data, columns=boston.feature_names)

In [None]:
data['PRICE'] = boston.target

"""
If you plan to use XGBoost on a dataset that has categorical features, you may want to apply some encoding (like one-hot encoding) to those features before training the
model. Also, if the dataset has missing values such as NA, you do not necessarily have to treat them separately, because XGBoost can handle missing values 
internally. The XGBoost documentation has more on this if you wish to know more.
"""

import xgboost as xgb
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Features are all columns except the last; the target (PRICE) is the last column
X, y = data.iloc[:, :-1], data.iloc[:, -1]


In [None]:
data_dmatrix = xgb.DMatrix(data=X,label=y)
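
In [None]:
"""
The note above on categorical features and missing values can be sketched on a small made-up DataFrame. The column names and values 
below are hypothetical and are not part of the Boston data. The sketch one-hot encodes the categorical column with pandas and leaves 
the missing numeric value as NaN, which xgb.DMatrix treats as missing by default.
"""

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one categorical column and one numeric column with a gap.
toy = pd.DataFrame({
    "neighborhood": ["north", "south", "north", "east"],
    "rooms": [6.0, np.nan, 5.0, 7.0],
})

# One-hot encode the categorical column; numeric columns pass through unchanged.
encoded = pd.get_dummies(toy, columns=["neighborhood"], dtype=float)

print(encoded.columns.tolist())
# The missing value is left as NaN rather than being imputed.
print(int(encoded["rooms"].isna().sum()))
```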

In [None]:
"""
XGBoost's hyperparameters
At this point, before building the model, you should be aware of the tuning parameters that XGBoost provides. There is a plethora of tuning parameters for tree-based 
learners in XGBoost, all described in the XGBoost parameter documentation. The most common ones that you should know are:

-learning_rate: step size shrinkage used to prevent overfitting. Range is [0,1].
-max_depth: determines how deeply each tree is allowed to grow during any boosting round.
-subsample: fraction of samples used per tree. A low value can lead to underfitting.
-colsample_bytree: fraction of features used per tree. A high value can lead to overfitting.
-n_estimators: number of trees you want to build.
-objective: determines the loss function to be used, like reg:squarederror for regression problems (older releases called it reg:linear), reg:logistic for logistic regression, 
and binary:logistic for binary classification with probability output.


XGBoost also supports regularization parameters to penalize models as they become more complex and reduce them to simple (parsimonious) models.

-gamma: controls whether a given node will split based on the expected reduction in loss after the split. A higher value leads to fewer splits. Supported only for tree-based 
learners.
-alpha: L1 regularization on leaf weights. A large value leads to more regularization.
-lambda: L2 regularization on leaf weights and is smoother than L1 regularization.

It's also worth mentioning that although you are using trees as your base learners, you can also use XGBoost's less popular linear base learner and one other tree 
learner known as dart. All you have to do is set the booster parameter to gbtree (default), gblinear or dart.

Now, you will create the train and test sets used to evaluate the model, using the train_test_split function from sklearn's model_selection module with test_size equal 
to 20% of the data. To keep the results reproducible, a random_state is also assigned.
"""

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
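
In [None]:
"""
To make the learning_rate and n_estimators descriptions above more concrete, here is a toy sketch of shrinkage. It is not how XGBoost 
builds trees: each "tree" is replaced by a constant equal to the mean residual, and the targets are made up. It only shows that every 
boosting round adds a learning_rate-sized fraction of the weak learner's output, so the remaining error shrinks gradually.
"""

```python
import numpy as np

# Made-up targets; every "tree" here is just a constant equal to the mean residual.
y_toy = np.array([10.0, 20.0, 30.0])
learning_rate = 0.1   # step size shrinkage
n_estimators = 10     # number of boosting rounds

prediction = np.zeros_like(y_toy)
for _ in range(n_estimators):
    residual = y_toy - prediction
    weak_learner_output = residual.mean()               # stand-in for a fitted tree
    prediction += learning_rate * weak_learner_output   # add only a shrunken step

# Each round removes 10% of the mean residual, leaving 20 * 0.9**10 after 10 rounds.
remaining = (y_toy - prediction).mean()
print(remaining)
```

A smaller learning_rate leaves more residual after the same number of rounds, which is why it is usually tuned together with n_estimators.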

In [None]:
# reg:squarederror is the current name of the objective formerly called reg:linear
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.1,
                          max_depth=5, alpha=10, n_estimators=10)

In [None]:
xg_reg.fit(X_train,y_train)

preds = xg_reg.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE: %f" % (rmse))
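
In [None]:
"""
The RMSE computed above is the square root of the mean of the squared prediction errors. The values below are made up purely to show 
the arithmetic step by step; they are not model outputs.
"""

```python
import numpy as np

# Made-up values, only to show the arithmetic.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

squared_errors = (y_true - y_pred) ** 2   # element-wise squared errors: 1, 0, 4
mse = squared_errors.mean()               # mean squared error: 5 / 3
rmse_by_hand = np.sqrt(mse)               # root mean squared error
print(rmse_by_hand)
```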

In [None]:
"""
k-fold Cross Validation using XGBoost
In order to build more robust models, it is common to do k-fold cross validation, where all the entries in the original training dataset are used for both training 
and validation, and each entry is used for validation just once. XGBoost supports k-fold cross validation via the cv() function. All you have to do is specify the nfold parameter, 
which is the number of cross validation folds you want to build. It also supports many other parameters (see the XGBoost documentation for xgb.cv), like:

-num_boost_round: denotes the number of trees you build (analogous to n_estimators)
-metrics: tells the evaluation metrics to be watched during CV
-as_pandas: to return the results in a pandas DataFrame.
-early_stopping_rounds: finishes training of the model early if the hold-out metric ("rmse" in our case) does not improve for a given number of rounds.
-seed: for reproducibility of results.

"""

# reg:squarederror is the current name of the objective formerly called reg:linear
params = {'objective': 'reg:squarederror', 'colsample_bytree': 0.3, 'learning_rate': 0.1,
          'max_depth': 5, 'alpha': 10}

cv_results = xgb.cv(dtrain=data_dmatrix, params=params, nfold=3,
                    num_boost_round=50,early_stopping_rounds=10,metrics="rmse", as_pandas=True, seed=123)

# Print explicitly: a bare expression in the middle of a cell is not displayed
print(cv_results.head())
print(cv_results["test-rmse-mean"].tail(1))
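
In [None]:
"""
The k-fold idea described above, where each entry is used for validation just once, can be checked with scikit-learn's KFold on a few 
row indices. This is only a sketch of the splitting scheme: the row count, shuffling and random_state are choices made for the example, 
and xgb.cv builds its own folds internally.
"""

```python
import numpy as np
from sklearn.model_selection import KFold

n_rows = 9
validation_count = np.zeros(n_rows, dtype=int)

# 3 folds, mirroring nfold=3; shuffle and random_state are choices made for this sketch.
kfold = KFold(n_splits=3, shuffle=True, random_state=123)
for train_idx, valid_idx in kfold.split(np.arange(n_rows)):
    # Count how often each row lands in a validation fold.
    validation_count[valid_idx] += 1
    # Training and validation rows never overlap within a fold.
    assert set(train_idx).isdisjoint(valid_idx)

print(validation_count)
```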

In [None]:
"""
Visualize Boosting Trees and Feature Importance
"""

xg_reg = xgb.train(params=params, dtrain=data_dmatrix, num_boost_round=10)

In [None]:
import matplotlib.pyplot as plt

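# Set the figure size before plotting; changing rcParams afterwards does not resize a figure that already exists.
# plot_tree needs the graphviz package to be installed.
plt.rcParams['figure.figsize'] = [50, 10]
xgb.plot_tree(xg_reg, num_trees=0)
plt.show()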

In [None]:
# Set the figure size before plotting so that it applies to this figure
plt.rcParams['figure.figsize'] = [5, 5]
xgb.plot_importance(xg_reg)
plt.show()

In [None]:
"""
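As you can see, the feature RM has been given the highest importance score among all the features. By default, plot_importance counts how many times each feature is used to split the data across all trees. Thus XGBoost also gives you a way to do feature selection. Isn't this brilliant?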
"""