<span style="background-color: #FFFF00">ensemble methods combine the predictions of several models</span> (e.g., several trees, in the case of random forests).

#### Gradient Boosting
<span style="background-color: #FFFF00">Gradient boosting is a method that goes through cycles to iteratively add models into an ensemble</span>.

It begins by <span style="background-color: #FFFF00">initializing the ensemble with a single model, whose predictions can be pretty naive</span>. (Even if its predictions are wildly inaccurate, subsequent additions to the ensemble will address those errors.)

Then, we start the cycle:

- First, we use the <span style="background-color: #FFFF00">current ensemble to generate predictions for each observation in the dataset</span>. To make a prediction, we <span style="background-color: #FFFF00">add the predictions from all models in the ensemble</span>.
- These <span style="background-color: #FFFF00">predictions are used to calculate a loss function (like mean squared error, for instance)</span>.
- Then, we <span style="background-color: #FFFF00">use the loss function to fit a new model that will be added to the ensemble</span>. Specifically, we <span style="background-color: #FFFF00">determine model parameters so that adding this new model to the ensemble will reduce the loss</span>. (Side note: The "gradient" in "gradient boosting" refers to the fact that we'll use gradient descent on the loss function to determine the parameters in this new model.)
- Finally, we add the new model to the ensemble, and ...
... repeat!
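
The cycle above can be sketched in plain Python. This is only a toy illustration, not how XGBoost is implemented: it assumes a single numeric feature, squared-error loss, and decision stumps as the component models. The helper names (`fit_stump`, `predict_stump`, `boost`) and the sample data are made up for this sketch.

```python
# Toy gradient boosting for squared error on one numeric feature.
# Each new model is a decision stump fit to the current residuals,
# which are the negative gradient of the squared-error loss.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) with the lowest squared error."""
    best = None
    # Skip the largest x so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def boost(xs, ys, n_rounds):
    base = sum(ys) / len(ys)           # naive initial model: predict the mean
    preds = [base] * len(ys)
    stumps = []
    losses = [mse(ys, preds)]
    for _ in range(n_rounds):
        # 1. Use the current ensemble's predictions to measure the errors.
        residuals = [y - p for y, p in zip(ys, preds)]
        # 2. Fit a new model that reduces the loss.
        stump = fit_stump(xs, residuals)
        # 3. Add the new model to the ensemble and repeat.
        stumps.append(stump)
        preds = [p + predict_stump(stump, x) for p, x in zip(preds, xs)]
        losses.append(mse(ys, preds))
    return base, stumps, losses

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 1.2, 3.8, 4.1, 4.0, 7.9, 8.2]
base, stumps, losses = boost(xs, ys, n_rounds=5)
print(losses)  # training loss after 0, 1, ..., 5 added models
```

The training loss never increases from one round to the next here, because each stump is chosen to fit whatever error the current ensemble still makes.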

In [12]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the data
X = pd.read_csv('train.csv', index_col='Id')
X_test_full = pd.read_csv('test.csv', index_col='Id')

# Remove rows with missing target, separate target from predictors
X.dropna(axis=0, subset=['SalePrice'], inplace=True)
y = X.SalePrice              
X.drop(['SalePrice'], axis=1, inplace=True)

# Break off validation set from training data
X_train_full, X_valid_full, y_train, y_valid = train_test_split(X, y, train_size=0.8, test_size=0.2,
                                                                random_state=0)

# "Cardinality" means the number of unique values in a column
# Select categorical columns with relatively low cardinality (convenient but arbitrary)
low_cardinality_cols = [cname for cname in X_train_full.columns if X_train_full[cname].nunique() < 10 and 
                        X_train_full[cname].dtype == "object"]

# Select numeric columns
numeric_cols = [cname for cname in X_train_full.columns if X_train_full[cname].dtype in ['int64', 'float64']]

# Keep selected columns only
my_cols = low_cardinality_cols + numeric_cols
X_train = X_train_full[my_cols].copy()
X_valid = X_valid_full[my_cols].copy()
X_test = X_test_full[my_cols].copy()

# One-hot encode the data (to shorten the code, we use pandas)
X_train = pd.get_dummies(X_train)
X_valid = pd.get_dummies(X_valid)
X_test = pd.get_dummies(X_test)
X_train, X_valid = X_train.align(X_valid, join='left', axis=1)
X_train, X_test = X_train.align(X_test, join='left', axis=1)

#### n_estimators
n_estimators specifies how many <span style="background-color: #FFFF00">times to go through the modeling cycle described above</span>. It is <span style="background-color: #FFFF00">equal to the number of models</span> that we include in the ensemble.

Too <span style="background-color: #FFFF00">low a value causes underfitting</span>, which leads to <span style="background-color: #FFFF00">inaccurate predictions on both training data and test data</span>.

Too <span style="background-color: #FFFF00">high a value causes overfitting</span>, which causes <span style="background-color: #FFFF00">accurate predictions on training data, but inaccurate predictions on test data</span> (which is what we care about).

Typical values range from 100 to 1000, though the best value depends heavily on the learning_rate parameter discussed below.

#### early_stopping_rounds
early_stopping_rounds offers a way to automatically <span style="background-color: #FFFF00">find the ideal value for n_estimators</span>. Early stopping causes the model to stop iterating <span style="background-color: #FFFF00">when the validation score stops improving</span>, even if we aren't at the hard stop for n_estimators. It's smart to set a high value for n_estimators and then use early_stopping_rounds to find the optimal time to stop iterating.

Since random chance sometimes causes a single round where <span style="background-color: #FFFF00">validation scores don't improve</span>, you need to <span style="background-color: #FFFF00">specify a number for how many rounds of straight deterioration to allow before stopping</span>. Setting early_stopping_rounds=5 is a reasonable choice. In this case, we stop after 5 straight rounds of deteriorating validation scores.

When using early_stopping_rounds, you also <span style="background-color: #FFFF00">need to set aside some data for calculating the validation scores - this is done by setting the eval_set parameter</span>.
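
The stopping rule itself can be illustrated without any library. The sketch below assumes that "not improving" means failing to beat the best validation error seen so far; the function name `best_round` and the list of errors are invented for the example, and XGBoost's internal bookkeeping may differ in detail.

```python
def best_round(validation_errors, early_stopping_rounds):
    """Return (index of the best round, index of the round where we stop)."""
    best_index = 0
    best_error = float("inf")
    for i, err in enumerate(validation_errors):
        if err < best_error:
            # New best score: remember it and reset the patience window.
            best_error = err
            best_index = i
        elif i - best_index >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` rounds in a row.
            return best_index, i
    # Ran out of rounds before the patience window was exhausted.
    return best_index, len(validation_errors) - 1

# Hypothetical validation errors, one per boosting round.
errors = [10.0, 8.0, 7.0, 7.2, 6.9, 7.0, 7.1, 7.3, 7.4, 7.5, 6.0]
best, stopped = best_round(errors, early_stopping_rounds=5)
print(best, stopped)
```

In this made-up sequence the single bad round at index 3 does not trigger a stop, but the five rounds after index 4 do, so the later rounds are never run.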

#### learning_rate
Instead of getting predictions by simply adding up the predictions from each component model, we can<span style="background-color: #FFFF00"> multiply the predictions from each model by a small number (known as the learning rate) before adding them in</span>.

This <span style="background-color: #FFFF00">means each tree we add to the ensemble helps us less. So, we can set a higher value for n_estimators without overfitting</span>. If we use early stopping, the appropriate number of trees will be determined automatically.

In general, a <span style="background-color: #FFFF00">small learning rate and large number of estimators will yield more accurate XGBoost models,</span> though it will also take the model longer to train since it does more iterations through the cycle. By default, <span style="background-color: #FFFF00">XGBoost sets learning_rate=0.1</span> in older releases of its scikit-learn wrapper; recent releases fall back to the core library default of 0.3, so it is safest to set the value explicitly.
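
To see why a smaller learning rate calls for more estimators, consider an idealized case with one target value, where every new model predicts the remaining error exactly. The function `boosted_prediction` is a hypothetical helper written for this sketch; real component models are imperfect, so actual numbers will differ.

```python
def boosted_prediction(target, initial, learning_rate, n_estimators):
    """Ensemble prediction when each new model predicts the residual exactly."""
    prediction = initial
    for _ in range(n_estimators):
        residual = target - prediction            # what the new model is fit to
        prediction += learning_rate * residual    # shrink its contribution
    return prediction

# Compare how quickly the ensemble closes the gap for different learning rates.
for lr in (1.0, 0.3, 0.1):
    row = [round(boosted_prediction(100.0, 0.0, lr, n), 2) for n in (1, 10, 50)]
    print(lr, row)
```

With learning_rate=0.1 a single round closes only 10% of the gap, so many more rounds are needed before the prediction settles. That is the trade-off described above: each model helps less, and training takes longer.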

#### n_jobs
On larger datasets where runtime is a consideration, you can use <span style="background-color: #FFFF00">parallelism to build your models faster</span>. It's common to set the parameter <span style="background-color: #FFFF00">n_jobs equal to the number of cores on your machine</span>. On smaller datasets, this won't help.

It's <span style="background-color: #FFFF00">useful in large datasets where you would otherwise spend a long time waiting during the fit command</span>.
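
One way to look up the core count is the standard library's `os.cpu_count()`. In the sketch below, the `model_params` dictionary and its other values are placeholders chosen for illustration, meant to be passed to the model constructor, for example `XGBRegressor(**model_params)`.

```python
import os

# os.cpu_count() can return None when the count cannot be determined,
# so fall back to a single core in that case.
n_cores = os.cpu_count() or 1

# Placeholder keyword arguments for the model constructor.
model_params = {
    "n_estimators": 1000,
    "learning_rate": 0.05,
    "n_jobs": n_cores,
}
print(model_params)
```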

In [30]:
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

# Define the model (learning_rate must be spelled exactly, or XGBoost ignores it)
my_model_3 = XGBRegressor(n_estimators=1000, learning_rate=0.1, random_state=0)

# Fit the model
my_model_3.fit(X_train, y_train, verbose=True)

# Get predictions
predictions_3 = my_model_3.predict(X_valid)

# Calculate MAE (scikit-learn expects y_true first, then y_pred)
mae_3 = mean_absolute_error(y_valid, predictions_3)

# Print MAE
print("Mean Absolute Error:", mae_3)

Output recorded from an earlier run in which the parameter was misspelled as learining_rate, so XGBoost ignored it and used its default learning rate:

Parameters: { "learining_rate" } are not used.

Mean Absolute Error: 18298.11336151541
