# [A Gentle Introduction to XGBoost for Applied Machine Learning](https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/)

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.

> XGBoost stands for e__X__treme __G__radient __B__oosting.

## XGBoost Features
The library is laser-focused on computational speed and model performance; even so, it offers a number of advanced features.

#### Model Features
Three main forms of gradient boosting are supported:
- __Gradient Boosting__ algorithm, also called the gradient boosting machine, including the learning rate.
- __Stochastic Gradient Boosting__ with sub-sampling at the row, column and column per split levels.
- __Regularized Gradient Boosting__ with both L1 and L2 regularization.
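The three forms above can be combined in a small sketch. This is a minimal pure-Python illustration, not XGBoost's implementation. It makes several assumptions: a single numeric feature, squared-error loss, and one-split trees (stumps). The hypothetical helpers are `fit_stump` and `boost`, and their parameters are loosely named after XGBoost's `learning_rate`, `subsample` and `reg_lambda`. `learning_rate` shrinks each tree's contribution, `subsample` draws a random fraction of rows each round, and `reg_lambda` applies an L2 penalty to the leaf values. Column subsampling and L1 regularization are left out.

```python
import random


def fit_stump(xs, residuals, reg_lambda):
    """Fit a one-split regression tree (a stump) to the residuals.

    Each leaf value is sum(residuals) / (count + reg_lambda), the
    L2-regularized optimum for squared error (every hessian equals 1).
    """
    def leaf(rs):
        return sum(rs) / (len(rs) + reg_lambda)

    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        wl, wr = leaf(left), leaf(right)
        sse = sum((r - wl) ** 2 for r in left) + sum((r - wr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, wl, wr)
    if best is None:  # only one distinct x: fall back to a single leaf
        w = leaf(residuals)
        return lambda x: w
    _, t, wl, wr = best
    return lambda x: wl if x <= t else wr


def boost(xs, ys, n_rounds=50, learning_rate=0.1, subsample=0.8,
          reg_lambda=1.0, seed=0):
    rng = random.Random(seed)
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        # Stochastic boosting: each round sees only a random subset of rows.
        k = min(len(xs), max(2, int(subsample * len(xs))))
        rows = rng.sample(range(len(xs)), k)
        stump = fit_stump([xs[i] for i in rows], [residuals[i] for i in rows], reg_lambda)
        stumps.append(stump)
        # Shrinkage: add only a fraction of each new tree's prediction.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)
```

With `subsample=1.0` and `reg_lambda=0.0`, the sketch reduces to plain gradient boosting with shrinkage.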

#### System Features
The library provides a system for use in a range of computing environments, not least:
- __Parallelization__ of tree construction using all of your CPU cores during training.
- __Distributed Computing__ for training very large models using a cluster of machines.
- __Out-of-Core Computing__ for very large datasets that don’t fit into memory.
- __Cache Optimization__ of data structures and algorithms to make the best use of hardware.

#### Algorithm Features

The implementation of the algorithm was engineered for efficiency of compute time and memory resources. A design goal was to make the best use of available resources to train the model. Some key algorithm implementation features include:
- __Sparse Aware__ implementation with automatic handling of missing data values.
- __Block Structure__ to support the parallelization of tree construction.
- __Continued Training__ so that you can further boost an already fitted model on new data.
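The Sparse Aware item can be illustrated by split finding on one feature that has missing values. This is a minimal sketch under several assumptions, and none of it is XGBoost's API:

- Missing entries are `None`.
- Each row carries a gradient and a hessian from the current model.
- Splits are scored with the regularized gain ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)].

For every candidate threshold, the hypothetical `best_split_with_default` tries routing all missing rows left and then right, and keeps whichever direction gives the larger gain. XGBoost organizes the same idea as scans over the non-missing entries only.

```python
def split_gain(gl, hl, gr, hr, reg_lambda=1.0):
    """Loss reduction of splitting (gl, hl) + (gr, hr) into two leaves."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr))


def best_split_with_default(values, grads, hessians, reg_lambda=1.0):
    """Return (gain, threshold, missing_goes_left) for one feature.

    Rows whose value is None are missing; rows with value <= threshold go left.
    """
    present = sorted((v, g, h) for v, g, h in zip(values, grads, hessians) if v is not None)
    g_miss = sum(g for v, g in zip(values, grads) if v is None)
    h_miss = sum(h for v, h in zip(values, hessians) if v is None)
    g_total = sum(g for _, g, _ in present)
    h_total = sum(h for _, _, h in present)

    best = (0.0, None, True)
    gl = hl = 0.0
    for i in range(len(present) - 1):
        v, g, h = present[i]
        gl, hl = gl + g, hl + h
        if v == present[i + 1][0]:
            continue  # cannot separate equal values
        gr, hr = g_total - gl, h_total - hl
        # Try routing every missing row left, then every missing row right.
        gain_left = split_gain(gl + g_miss, hl + h_miss, gr, hr, reg_lambda)
        gain_right = split_gain(gl, hl, gr + g_miss, hr + h_miss, reg_lambda)
        if gain_left > best[0]:
            best = (gain_left, v, True)
        if gain_right > best[0]:
            best = (gain_right, v, False)
    return best
```

If the rows with missing values behave like the high-valued rows, the learned default sends them right. If they behave like the low-valued rows, it sends them left.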

## Why Use XGBoost?
The two reasons to use XGBoost:
1. Execution Speed
2. Model Performance

***
# [XGBoost Algorithm: Long May She Reign! The new queen of Machine Learning algorithms](https://towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d)

XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a [gradient boosting](https://en.wikipedia.org/wiki/Gradient_boosting) framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform all other algorithms or frameworks. However, when it comes to small-to-medium structured/tabular data, decision-tree-based algorithms are considered best-in-class right now.

![](images/1.PNG)

### How to build an intuition for XGBoost?

A simple analogy to better understand the evolution of tree-based algorithms.

Imagine that you are a hiring manager interviewing several candidates with excellent qualifications. Each step of the evolution of tree-based algorithms can be viewed as a version of the interview process.

1. __Decision Tree__: Every hiring manager has a set of criteria such as education level, number of years of experience, interview performance. A decision tree is analogous to a hiring manager interviewing candidates based on his or her own criteria.
2. __Bagging__: Now imagine that instead of a single interviewer, there is an interview panel where each interviewer has a vote. Bagging, or bootstrap aggregating, combines the inputs from all interviewers into the final decision through a democratic voting process.
3. __Random Forest__: It is a bagging-based algorithm with a key difference wherein only a subset of features is selected at random. In other words, every interviewer will only test the interviewee on certain randomly selected qualifications (e.g. a technical interview for testing programming skills and a behavioral interview for evaluating non-technical skills).
4. __Boosting__: This is an alternative approach where each interviewer alters the evaluation criteria based on feedback from the previous interviewer. This ‘boosts’ the efficiency of the interview process by deploying a more dynamic evaluation process.
5. __Gradient Boosting:__ A special case of boosting where errors are minimized by the gradient descent algorithm, e.g. the way strategy consulting firms use case interviews to weed out less qualified candidates.
6. __XGBoost:__ Think of XGBoost as gradient boosting on ‘steroids’ (well, it is called ‘Extreme Gradient Boosting’ for a reason!). It is a perfect combination of software and hardware optimization techniques to yield superior results using fewer computing resources in the shortest amount of time.
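The difference between bagging (items 2 and 3) and boosting (items 4 to 6) can be sketched in Python on a toy one-feature regression problem. Both approaches below use the same hypothetical `fit_stump` weak learner. Bagging fits each stump independently on a bootstrap sample and averages them, like a panel voting in parallel. Boosting fits each stump to the errors the previous ones left behind, like interviewers adjusting to earlier feedback. The names, data and settings are illustrative assumptions. Random feature selection (item 3) is omitted because the toy data has only one feature.

```python
import random


def fit_stump(xs, ys):
    """Weak learner: the single threshold split that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # a bootstrap sample may contain a single distinct x
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagging(xs, ys, n_models=25, seed=0):
    """Interview panel: independent stumps on bootstrap samples, then averaged."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


def boosting(xs, ys, n_rounds=50, learning_rate=0.5):
    """Sequential interviewers: each stump is fit to the errors left so far."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    models = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        m = fit_stump(xs, residuals)
        models.append(m)
        preds = [p + learning_rate * m(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(m(x) for m in models)
```

Consider a "bump" target, such as `y = 10` for `10 <= x < 20` and `0` elsewhere. No single stump can fit this shape, so an average of stumps still leaves a large error. A boosted sum of stumps, by contrast, can represent both steps.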

### Why does XGBoost perform so well?
XGBoost and Gradient Boosting Machines (GBMs) are both ensemble tree methods that apply the principle of boosting weak learners (CARTs generally) using the gradient descent architecture. However, XGBoost improves upon the base GBM framework through systems optimization and algorithmic enhancements.
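The "gradient" part of this description can be made concrete. Each new tree is fit to per-row derivatives of the loss with respect to the current prediction. XGBoost uses both the first derivative (the gradient) and the second derivative (the hessian). The sketch below computes these statistics for squared error and for logistic loss on a raw score (margin). It then takes the unregularized second-order (Newton) step -G/H for a single leaf. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_error_grad_hess(y, pred):
    """Loss 0.5 * (pred - y)**2: gradient pred - y, hessian 1."""
    return pred - y, 1.0


def logistic_grad_hess(y, margin):
    """Log loss on a raw score with y in {0, 1}: gradient p - y, hessian p * (1 - p)."""
    p = sigmoid(margin)
    return p - y, p * (1.0 - p)


def newton_leaf_step(ys, preds, grad_hess):
    """Best constant update for one leaf from a second-order expansion: -G / H."""
    G = H = 0.0
    for y, f in zip(ys, preds):
        g, h = grad_hess(y, f)
        G += g
        H += h
    return -G / H
```

For squared error, the step equals the mean residual. For logistic loss, it is a Newton approximation to the log-odds update.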

![](images/2.PNG)

#### System Optimization:

1. __Parallelization:__ XGBoost approaches the process of sequential tree building using a parallelized implementation. The trees themselves are still built one after another; the parallelism happens inside the construction of each tree. This is possible due to the interchangeable nature of the loops used for building base learners: an outer loop that enumerates the leaf nodes of a tree, and an inner loop that calculates the features. This nesting limits parallelization because the outer loop cannot start until the inner loop, the more computationally demanding of the two, has completed. Therefore, to improve run time, the order of the loops is interchanged, using initialization through a global scan of all instances and sorting using parallel threads. This switch improves algorithmic performance by offsetting any parallelization overheads in computation.

2. __Tree Pruning:__ Within the GBM framework, the stopping criterion for tree splitting is greedy: it depends on the negative loss criterion at the point of the split. XGBoost instead grows each tree up to the depth specified by the ‘max_depth’ parameter and then prunes it backward, removing splits whose gain is too small to justify them. This ‘depth-first’ approach improves computational performance significantly.

3. __Hardware Optimization:__ This algorithm has been designed to make efficient use of hardware resources. It achieves this through cache awareness: internal buffers are allocated in each thread to store gradient statistics. Further enhancements such as ‘out-of-core’ computing make use of available disk space when handling big data frames that do not fit into memory.
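The Parallelization item above can be sketched as follows. Each feature column is sorted once up front, which plays the role of the global scan. The best split for every feature is then searched independently, so the features can be processed concurrently. This is a simplified, hypothetical illustration:

- Splits are scored with the regularized gain built from G²/(H + λ) terms, using per-row gradients and hessians.
- Python threads are used only to show the structure. The GIL prevents a real speedup for this pure-Python loop, whereas XGBoost does this work in native code.

```python
from concurrent.futures import ThreadPoolExecutor


def presort_columns(rows):
    """Global scan: for each feature, row indices ordered by that feature's value."""
    return [sorted(range(len(rows)), key=lambda i, j=j: rows[i][j])
            for j in range(len(rows[0]))]


def best_split_for_feature(j, order, rows, grads, hessians, reg_lambda):
    """Scan one pre-sorted column and return (gain, feature, threshold)."""
    G, H = sum(grads), sum(hessians)
    parent = G * G / (H + reg_lambda)
    gl = hl = 0.0
    best = (0.0, j, None)
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        gl, hl = gl + grads[i], hl + hessians[i]
        if rows[i][j] == rows[nxt][j]:
            continue  # no threshold separates equal values
        gr, hr = G - gl, H - hl
        gain = 0.5 * (gl * gl / (hl + reg_lambda) + gr * gr / (hr + reg_lambda) - parent)
        if gain > best[0]:
            best = (gain, j, rows[i][j])
    return best


def parallel_best_split(rows, grads, hessians, reg_lambda=1.0, workers=4):
    # Sorted once here; a full tree builder would reuse these orders at every node.
    orders = presort_columns(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda j: best_split_for_feature(j, orders[j], rows, grads, hessians, reg_lambda),
            range(len(orders))))
    return max(results, key=lambda r: r[0])
```

The per-feature searches share no mutable state, which is what makes it safe to run them concurrently.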

#### Algorithmic Enhancements:

1. __Regularization:__ It penalizes more complex models through both LASSO (L1) and Ridge (L2) regularization to prevent overfitting.
2. __Sparsity Awareness:__ XGBoost naturally admits sparse features for inputs by automatically ‘learning’ the best direction for missing values based on the training loss, and it handles different types of sparsity patterns in the data more efficiently.
3. __Weighted Quantile Sketch:__ XGBoost employs the distributed weighted Quantile Sketch algorithm to effectively find the optimal split points among weighted datasets.
4. __Cross-validation:__ The algorithm comes with a built-in cross-validation method that runs at each iteration, removing the need to program this search explicitly and to specify the exact number of boosting iterations required in a single run.
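The Regularization item above, together with the backward pruning described under Tree Pruning, can be sketched with three quantities:

- **Leaf weight:** the optimal value of a leaf under an L2 penalty λ and an L1 penalty α. Here α is applied as soft-thresholding of the gradient sum G.
- **Split gain:** the loss reduction from splitting a leaf in two.
- **Per-leaf cost γ:** the amount a split's gain must exceed to survive pruning.

In the library, these penalties correspond to the parameters `lambda`, `alpha` and `gamma`. The function names, the dictionary-based tree and the exact constants below are assumptions for illustration and may differ from XGBoost's internals.

```python
def soft_threshold(G, alpha):
    """L1 penalty: shrink the gradient sum toward zero by alpha, clipping at zero."""
    if G > alpha:
        return G - alpha
    if G < -alpha:
        return G + alpha
    return 0.0


def leaf_weight(G, H, reg_lambda=1.0, alpha=0.0):
    """Optimal leaf value under L2 (reg_lambda) and L1 (alpha) penalties."""
    return -soft_threshold(G, alpha) / (H + reg_lambda)


def leaf_score(G, H, reg_lambda=1.0, alpha=0.0):
    return soft_threshold(G, alpha) ** 2 / (H + reg_lambda)


def split_gain(GL, HL, GR, HR, reg_lambda=1.0, alpha=0.0):
    """Loss reduction from splitting a leaf into left and right children."""
    return 0.5 * (leaf_score(GL, HL, reg_lambda, alpha)
                  + leaf_score(GR, HR, reg_lambda, alpha)
                  - leaf_score(GL + GR, HL + HR, reg_lambda, alpha))


def prune(node, gamma):
    """Bottom-up pruning of a tree grown to full depth.

    Internal nodes are dicts with 'gain', 'weight', 'left', 'right';
    leaves are dicts with only 'weight'. A split collapses into a leaf
    when both children are leaves and its gain does not exceed gamma,
    so a weak split is kept if a strong split survives beneath it.
    """
    if "left" not in node:
        return dict(node)
    left, right = prune(node["left"], gamma), prune(node["right"], gamma)
    if "left" not in left and "left" not in right and node["gain"] <= gamma:
        return {"weight": node["weight"]}
    return {**node, "left": left, "right": right}
```

Pruning only after the tree has reached ‘max_depth’ matters because a split with a small gain can enable a valuable split below it. A greedy stopping rule would never see that lower split.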