## Hyperparameter Tuning in XGBoost

Load the dataset with pandas

In [1]:
import pandas as pd
file = "./facebook+comment+volume+dataset/Training/Features_Variant_1.csv"
df = pd.read_csv(file, header=None)
df.sample(n=5)

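| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5399 | 356059 | 0 | 27989 | 8 | 0.0 | 422.0 | 56.713568 | 42.0 | 57.458776 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 |
| 27700 | 5365996 | 40729 | 102442 | 9 | 0.0 | 740.0 | 60.61215 | 33.0 | 90.050007 | 0.0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 16740 | 309999 | 28 | 43647 | 9 | 0.0 | 532.0 | 73.075949 | 45.0 | 84.541769 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 29069 | 50469 | 21748 | 2456 | 32 | 0.0 | 47.0 | 5.820312 | 2.5 | 8.418203 | 0.0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4583 | 104037 | 0 | 30 | 14 | 0.0 | 113.0 | 6.748691 | 4.0 | 11.213291 | 0.0 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |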


Check the size of our dataset. The last of its 54 columns is the target, so there are 53 input features.

In [2]:
print("Dataset has {} entries and {} columns".format(*df.shape))

Dataset has 40949 entries and 54 columns


In [3]:
X, y = df.loc[:,:52].values, df.loc[:,53].values
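df.loc[:, :52] works because .loc slices by label and includes the end label: with integer column labels 0 to 53 it selects the 53 feature columns 0 to 52, and df.loc[:, 53] picks the last column as the target. A tiny toy frame (not the Facebook data) makes the difference from position-based .iloc visible:

```python
import numpy as np
import pandas as pd

# Toy frame with integer column labels 0..5, like the header=None CSV above;
# the last column plays the role of the target.
toy = pd.DataFrame(np.arange(12).reshape(2, 6))

# .loc slices by label and INCLUDES the end label, so :4 selects columns 0-4.
X_toy = toy.loc[:, :4].values
y_toy = toy.loc[:, 5].values

# .iloc slices by position and EXCLUDES the end, so :4 selects columns 0-3.
X_pos = toy.iloc[:, :4].values

print(X_toy.shape, y_toy.shape, X_pos.shape)  # (2, 5) (2,) (2, 4)
```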

We keep 90% of the dataset for training and hold out 10% (test_size=.1) for testing.

In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=.1, random_state=42)

### Loading data into DMatrices

As mentioned before, in order to use the native API for XGBoost, we first need to load our data into DMatrix objects, the data structure XGBoost uses for training and evaluation.

In [5]:
import xgboost as xgb  # conda install xgboost -y
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

### Building a baseline model

In [6]:
from sklearn.metrics import mean_absolute_error

In [7]:
import numpy as np
# "Learn" the mean from the training data
mean_train = np.mean(y_train)
# Get predictions on the test set
baseline_predictions = np.ones(y_test.shape) * mean_train
# Compute MAE
mae_baseline = mean_absolute_error(y_test, baseline_predictions)
print("Baseline MAE is {:.2f}".format(mae_baseline))

Baseline MAE is 11.31
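Predicting the mean is a natural constant baseline, but for MAE the constant that minimizes the error on a given set of values is the median, and on skewed count data such as comment volumes the two can differ a lot. A quick illustration on hypothetical numbers, evaluated on the same values they are computed from (mae is a small helper written here, equivalent to scikit-learn's mean_absolute_error):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_true - y_pred|."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hypothetical, right-skewed comment counts (one post gets many comments).
y = np.array([0, 0, 1, 2, 3, 5, 80])

mean_pred = np.full(y.shape, y.mean())        # 13.0 for every sample
median_pred = np.full(y.shape, np.median(y))  # 2.0 for every sample

print(round(mae(y, mean_pred), 2))    # 19.14
print(round(mae(y, median_pred), 2))  # 12.43
```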


### Training and Tuning an XGBoost model

#### The params dictionary

Let’s define it with default values for the moment.

In [8]:
params = {
    # Parameters that we are going to tune.
    'max_depth':6,
    'min_child_weight': 1,
    'eta':.3,
    'subsample': 1,
    'colsample_bytree': 1,
    # Other parameters
    'objective':'reg:linear',  # deprecated alias of 'reg:squarederror' since XGBoost 1.0
}

**Parameters** num_boost_round **and** early_stopping_rounds

The first parameter we will look at is not part of the params dictionary; it is passed as a standalone argument to the training method. This parameter is called num_boost_round and corresponds to the number of boosting rounds, i.e. the number of trees to build. Its optimal value depends heavily on the other parameters, so it should be re-tuned each time you update a parameter. You could do this by tuning it together with all the other parameters in a grid search, but that requires a lot of computational effort.

Fortunately, XGBoost offers a cheaper way to find the best number of rounds while training: early stopping. We define a test dataset and a metric that is used to assess performance at each round. If performance hasn't improved for N rounds (N is set by the argument early_stopping_rounds), we stop training and keep the best number of boosting rounds. Let's see how to use it.

First, we need to add the evaluation metric we are interested in to our params dictionary.

In [9]:
params['eval_metric'] = "mae"

We still need to pass num_boost_round, which now corresponds to the maximum number of boosting rounds we allow. We set it to a large value, expecting training to stop before reaching it once performance on our test dataset has not improved for early_stopping_rounds rounds.

In [10]:
num_boost_round = 999

In order to automatically find the best number of boosting rounds, we need to pass extra parameters on top of the params dictionary, the training DMatrix and num_boost_round:

- evals: a list of pairs (test_dmatrix, name_of_test). Here we will use our dtest DMatrix.
- early_stopping_rounds: the number of rounds without improvement after which we stop; here we set it to 10.

In [11]:
model = xgb.train(
    params,
    dtrain,
    num_boost_round=num_boost_round,
    evals=[(dtest, "Test")],
    early_stopping_rounds=10
)

[0]	Test-mae:8.58568
[1]	Test-mae:6.96160
[2]	Test-mae:5.86780
[3]	Test-mae:5.29799
[4]	Test-mae:4.91546
[5]	Test-mae:4.67942
[6]	Test-mae:4.51087
[7]	Test-mae:4.43763
[8]	Test-mae:4.37738
[9]	Test-mae:4.32590
[10]	Test-mae:4.28929
[11]	Test-mae:4.24682
[12]	Test-mae:4.20685
[13]	Test-mae:4.18984
[14]	Test-mae:4.16802
[15]	Test-mae:4.14928
[16]	Test-mae:4.13792
[17]	Test-mae:4.12976
[18]	Test-mae:4.12585
[19]	Test-mae:4.12675
[20]	Test-mae:4.12793
[21]	Test-mae:4.11582
[22]	Test-mae:4.11287
[23]	Test-mae:4.12059
[24]	Test-mae:4.11311
[25]	Test-mae:4.10039
[26]	Test-mae:4.10567
[27]	Test-mae:4.11535
[28]	Test-mae:4.11051
[29]	Test-mae:4.11479
[30]	Test-mae:4.11239
[31]	Test-mae:4.13754
[32]	Test-mae:4.13697
[33]	Test-mae:4.12812
[34]	Test-mae:4.14731
[35]	Test-mae:4.14775

In [12]:
print("Best MAE: {:.2f} with {} rounds".format(
                 model.best_score,
                 model.best_iteration+1))

Best MAE: 4.10 with 26 rounds


As you can see, we stopped before reaching the maximum number of boosting rounds: after the 26th tree, the next 10 rounds did not improve the MAE on the test dataset.
Let's keep this MAE in mind for later: it is the test MAE of our model with default parameters and an optimal number of boosting rounds. We are already well below the baseline MAE of 11.31.
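The stopping rule can be reproduced by hand. Below is a simplified re-implementation (not XGBoost's own code) applied to the Test-mae values [0] to [35] copied from the training log above; it finds the best round at index 25 and stops at round 35, matching the reported 26 rounds:

```python
def early_stopping(scores, rounds):
    """Return (best_iteration, last_iteration_run) for a lower-is-better metric.

    Training stops once `rounds` consecutive iterations fail to improve
    on the best score seen so far.
    """
    best_iter, best_score = 0, scores[0]
    for i, score in enumerate(scores):
        if score < best_score:
            best_iter, best_score = i, score
        elif i - best_iter >= rounds:
            return best_iter, i
    return best_iter, len(scores) - 1

# Test-mae values [0]..[35] from the training log above.
test_mae = [8.58568, 6.96160, 5.86780, 5.29799, 4.91546, 4.67942, 4.51087,
            4.43763, 4.37738, 4.32590, 4.28929, 4.24682, 4.20685, 4.18984,
            4.16802, 4.14928, 4.13792, 4.12976, 4.12585, 4.12675, 4.12793,
            4.11582, 4.11287, 4.12059, 4.11311, 4.10039, 4.10567, 4.11535,
            4.11051, 4.11479, 4.11239, 4.13754, 4.13697, 4.12812, 4.14731,
            4.14775]

best, last = early_stopping(test_mae, rounds=10)
print(best, last)  # 25 35, i.e. best_iteration + 1 = 26 rounds
```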

### Using XGBoost’s CV

In order to tune the other hyperparameters, we will use the cv function from XGBoost. It runs cross-validation on our training dataset and returns, for each boosting round, the mean and standard deviation of the MAE across folds.

We need to pass it:

- params: our dictionary of parameters.
- our dtrain matrix.
- num_boost_round: number of boosting rounds. Here we will use a large number again and count on early_stopping_rounds to find the optimal number of rounds before reaching the maximum.
- seed: random seed. It's important to set a seed here, to ensure we are using the same folds for each step so we can properly compare the scores with different parameters.
- nfold: the number of folds to use for cross-validation
- metrics: the metrics to use to evaluate our model, here we use MAE.

As you can see, we don't need to pass a test dataset here. That's because the cross-validation function splits the training dataset into nfold folds and holds out each fold in turn for evaluation, while training on the remaining ones.
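The fold bookkeeping behind nfold and seed can be sketched with NumPy: shuffle the row indices once with a fixed seed, split them into folds, and hold out each fold exactly once. kfold_indices is a hypothetical helper, and XGBoost's own fold assignment differs in detail:

```python
import numpy as np

def kfold_indices(n_samples, nfold, seed):
    """Shuffle row indices and split them into `nfold` roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), nfold)

folds = kfold_indices(n_samples=23, nfold=5, seed=42)
for k, held_out in enumerate(folds):
    # Fold k is held out once; the model is trained on all the other folds.
    train_part = np.concatenate([f for j, f in enumerate(folds) if j != k])
    print(k, len(train_part), len(held_out))
```

Reusing the same seed yields the same folds, which is what makes scores from different parameter settings comparable.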

In [13]:
cv_results = xgb.cv(
    params,
    dtrain,
    num_boost_round=num_boost_round,
    seed=42,
    nfold=5,
    metrics={'mae'},
    early_stopping_rounds=10
)
cv_results

| | train-mae-mean | train-mae-std | test-mae-mean | test-mae-std |
|---|---|---|---|---|
| 0 | 8.288764 | 0.11709 | 8.372614 | 0.236231 |
| 1 | 6.563341 | 0.106052 | 6.79364 | 0.256606 |
| 2 | 5.415179 | 0.075973 | 5.822265 | 0.250534 |
| 3 | 4.643518 | 0.057167 | 5.193959 | 0.247316 |
| 4 | 4.131201 | 0.051492 | 4.823537 | 0.227656 |
| 5 | 3.776277 | 0.042514 | 4.578252 | 0.217193 |
| 6 | 3.534382 | 0.038187 | 4.442351 | 0.218526 |
| 7 | 3.356895 | 0.032615 | 4.347466 | 0.217143 |
| 8 | 3.218539 | 0.031009 | 4.298237 | 0.21851 |
| 9 | 3.117893 | 0.033844 | 4.260752 | 0.216624 |


In [14]:
num_boost_round

999

cv returns a table in which row i corresponds to a model with i + 1 boosting trees. Here again, we stopped well before 999 rounds (fortunately!).

The 4 columns hold the mean and standard deviation of the MAE on the held-out folds (labelled test) and on the training folds (labelled train). For this tutorial we will only try to improve the mean test MAE. We can get the best MAE score from cv with:

In [15]:
cv_results['test-mae-mean'].min()

4.1010219056723916
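Since row i describes a model with i + 1 trees, the number of rounds behind the best score is the 0-based position of the minimum plus one (and because xgb.cv truncates its history at the best iteration when early_stopping_rounds is set, this normally equals len(cv_results)). Illustrated on the ten rows printed above, which are only the beginning of the full table:

```python
import pandas as pd

# The first ten rows of the cv_results table shown above (test columns only).
cv_head = pd.DataFrame({
    "test-mae-mean": [8.372614, 6.79364, 5.822265, 5.193959, 4.823537,
                      4.578252, 4.442351, 4.347466, 4.298237, 4.260752],
    "test-mae-std": [0.236231, 0.256606, 0.250534, 0.247316, 0.227656,
                     0.217193, 0.218526, 0.217143, 0.21851, 0.216624],
})

best_idx = int(cv_head["test-mae-mean"].idxmin())  # 0-based row label
best_rounds = best_idx + 1                          # number of trees
best_mae = cv_head["test-mae-mean"].min()
print(best_rounds, best_mae)  # 10 4.260752
```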

Now that we know how to use cv, we are ready to start tuning! We will first tune our parameters to minimize the MAE on cross-validation, and then check the performance of our model on the test dataset.

**Parameters** max_depth **and** min_child_weight

These parameters add constraints on the architecture of the trees.

- max_depth is the maximum depth of a tree, i.e. the largest number of splits on any path from the root to a leaf. Deeper trees can model more complex relationships, but as we go deeper, splits become less relevant and are sometimes only due to noise, causing the model to overfit.

- min_child_weight is the minimum sum of instance weights required in a child for a split to be kept. In XGBoost these weights are the hessians (second derivatives) of the loss; for a squared-error regression like ours each sample contributes 1, so it is simply the minimum number of samples in a child. A smaller min_child_weight allows the algorithm to create children that correspond to fewer samples, thus allowing for more complex trees, but again, more likely to overfit.

Thus, these parameters can be used to control the complexity of the trees. It is important to tune them together in order to find a good trade-off between model bias and variance.
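The hessian view of min_child_weight can be checked with a small sketch. child_hessian_sum is a hypothetical helper that sums per-sample second derivatives for two losses: squared error (hessian 1 per sample) and log loss (hessian p(1 - p) for a predicted probability p):

```python
import numpy as np

def child_hessian_sum(preds, objective):
    """Sum of per-sample hessians for the samples falling in a candidate child."""
    if objective == "squarederror":
        # loss = 0.5 * (y - pred)**2  ->  hessian = 1 for every sample
        return float(len(preds))
    if objective == "logistic":
        # preds are probabilities p; hessian of log loss w.r.t. the margin = p * (1 - p)
        p = np.asarray(preds)
        return float(np.sum(p * (1 - p)))
    raise ValueError(objective)

min_child_weight = 5
# Five samples under squared error: hessian sum 5.0, so the child is allowed.
print(child_hessian_sum(np.zeros(5), "squarederror") >= min_child_weight)  # True
# Five confident samples under log loss: 5 * 0.9 * 0.1 = 0.45, so it is not.
print(child_hessian_sum(np.full(5, 0.9), "logistic") >= min_child_weight)  # False
```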

Let's make a list containing all the max_depth/min_child_weight combinations we want to try.

In [16]:
# You can try wider intervals with a larger step between
# each value and then narrow it down. Here, after several
# iterations, I found that the optimal values were in the
# following ranges.
gridsearch_params = [
    (max_depth, min_child_weight)
    for max_depth in range(9,12)
    for min_child_weight in range(5,8)
] # 9 elements
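The nested comprehension is a Cartesian product, so itertools.product from the standard library builds the same list in the same order, which stays readable if a third parameter is added to the grid:

```python
from itertools import product

# Same grid as the list comprehension above, built with itertools.product.
grid = list(product(range(9, 12), range(5, 8)))

print(len(grid))  # 9
print(grid[:3])   # [(9, 5), (9, 6), (9, 7)]
```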

Let’s run cross validation on each of those pairs. It can take some time…

In [17]:
# Define initial best params and MAE
min_mae = float("Inf")
best_params = None
for max_depth, min_child_weight in gridsearch_params:
    print("CV with max_depth={}, min_child_weight={}".format(
                             max_depth,
                             min_child_weight))
    # Update our parameters
    params['max_depth'] = max_depth
    params['min_child_weight'] = min_child_weight
    # Run CV
    cv_results = xgb.cv(
        params,
        dtrain,
        num_boost_round=num_boost_round,
        seed=42,
        nfold=5,
        metrics={'mae'},
        early_stopping_rounds=10
    )
    # Update best MAE
    mean_mae = cv_results['test-mae-mean'].min()
    boost_rounds = cv_results['test-mae-mean'].argmin()  # 0-based index of the best round (that model has boost_rounds + 1 trees)
    print("\tMAE {} for {} rounds".format(mean_mae, boost_rounds))
    if mean_mae < min_mae:
        min_mae = mean_mae
        best_params = (max_depth,min_child_weight)
print("Best params: {}, {}, MAE: {}".format(best_params[0], best_params[1], min_mae))

CV with max_depth=9, min_child_weight=5
	MAE 4.188793890644826 for 19 rounds
CV with max_depth=9, min_child_weight=6
	MAE 4.191833542466357 for 11 rounds
CV with max_depth=9, min_child_weight=7
	MAE 4.212599884308445 for 12 rounds
CV with max_depth=10, min_child_weight=5
	MAE 4.210564874878322 for 16 rounds
CV with max_depth=10, min_child_weight=6
	MAE 4.230052083374737 for 16 rounds
CV with max_depth=10, min_child_weight=7
	MAE 4.222578605588366 for 12 rounds
CV with max_depth=11, min_child_weight=5
	MAE 4.189796710935704 for 14 rounds
CV with max_depth=11, min_child_weight=6
	MAE 4.212010991899288 for 13 rounds
CV with max_depth=11, min_child_weight=7
	MAE 4.210864445213987 for 14 rounds
Best params: 9, 5, MAE: 4.188793890644826


We get the best score with a max_depth of 9 and a min_child_weight of 5, so let's update our params.

In [18]:
params['max_depth'] = 9
params['min_child_weight'] = 5

**Parameters** subsample **and** colsample_bytree

These parameters control the sampling of the dataset that is done at each boosting round.

Instead of using the whole training set every time, we can build each tree on slightly different data, which makes the model less likely to overfit to particular samples or features.

- subsample corresponds to the fraction of observations (the rows) to subsample at each step. By default it is set to 1 meaning that we use all rows.
- colsample_bytree corresponds to the fraction of features (the columns) to use. By default it is set to 1 meaning that we will use all features.
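Conceptually, each tree then sees a random subset of rows and a random subset of the columns. The NumPy sketch below only illustrates the idea with this dataset's 53 features and a hypothetical 1000 rows; XGBoost's internal sampling procedure differs in detail:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 1000, 53
subsample, colsample_bytree = 0.8, 0.7

for tree in range(3):
    # Conceptual sketch: a fresh random subset of rows for each tree...
    rows = rng.choice(n_rows, size=int(subsample * n_rows), replace=False)
    # ...and a fresh random subset of feature columns.
    cols = rng.choice(n_cols, size=max(1, int(colsample_bytree * n_cols)), replace=False)
    print(tree, len(rows), len(cols))  # 800 rows and 37 columns per tree
```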

Let’s see if we can get better results by tuning those parameters together.

In [19]:
gridsearch_params = [
    (subsample, colsample)
    for subsample in [i/10. for i in range(7,11)]
    for colsample in [i/10. for i in range(7,11)]
] # 16 elements

In [20]:
print(gridsearch_params)

[(0.7, 0.7), (0.7, 0.8), (0.7, 0.9), (0.7, 1.0), (0.8, 0.7), (0.8, 0.8), (0.8, 0.9), (0.8, 1.0), (0.9, 0.7), (0.9, 0.8), (0.9, 0.9), (0.9, 1.0), (1.0, 0.7), (1.0, 0.8), (1.0, 0.9), (1.0, 1.0)]

This can take some time…

In [21]:
min_mae = float("Inf")
best_params = None
# We start with the largest values and go down to the smallest
for subsample, colsample in reversed(gridsearch_params):
    print("CV with subsample={}, colsample={}".format(
                             subsample,
                             colsample))
    # We update our parameters
    params['subsample'] = subsample
    params['colsample_bytree'] = colsample
    # Run CV
    cv_results = xgb.cv(
        params,
        dtrain,
        num_boost_round=num_boost_round,
        seed=42,
        nfold=5,
        metrics={'mae'},
        early_stopping_rounds=10
    )
    # Update best score
    mean_mae = cv_results['test-mae-mean'].min()
    boost_rounds = cv_results['test-mae-mean'].argmin()  # 0-based index of the best round (that model has boost_rounds + 1 trees)
    print("\tMAE {} for {} rounds".format(mean_mae, boost_rounds))
    if mean_mae < min_mae:
        min_mae = mean_mae
        best_params = (subsample,colsample)
print("Best params: {}, {}, MAE: {}".format(best_params[0], best_params[1], min_mae))

CV with subsample=1.0, colsample=1.0
	MAE 4.230052083374737 for 16 rounds
CV with subsample=1.0, colsample=0.9
	MAE 4.327802396522925 for 12 rounds
CV with subsample=1.0, colsample=0.8
	MAE 4.46603263111348 for 17 rounds
CV with subsample=1.0, colsample=0.7
	MAE 4.67037331882947 for 12 rounds
CV with subsample=0.9, colsample=1.0
	MAE 4.252509337708058 for 14 rounds
CV with subsample=0.9, colsample=0.9
	MAE 4.286404976346348 for 13 rounds
CV with subsample=0.9, colsample=0.8
	MAE 4.475591388381757 for 12 rounds
CV with subsample=0.9, colsample=0.7
	MAE 4.612627758676085 for 12 rounds
CV with subsample=0.8, colsample=1.0
	MAE 4.210677791347896 for 10 rounds
CV with subsample=0.8, colsample=0.9
	MAE 4.334051620788318 for 11 rounds
CV with subsample=0.8, colsample=0.8
	MAE 4.454363277746388 for 12 rounds
CV with subsample=0.8, colsample=0.7
	MAE 4.619639843454797 for 12 rounds
CV with subsample=0.7, colsample=1.0
	MAE 4.231604512230874 for 11 rounds
CV with subsample=0.7, colsample=0.9
	MAE 4.319851888425975 for 11 rounds
CV with subsample=0.7, colsample=0.8
	MAE 4.489076813622117 for 12 rounds
CV with subsample=0.7, colsample=0.7
	MAE 4.689236656114426 for 12 rounds
Best params: 0.8, 1.0, MAE: 4.210677791347896


Again, we update our params dictionary.

In [22]:
params['subsample'] = .8
params['colsample_bytree'] = 1.

**Parameter** ETA

The ETA parameter controls the learning rate. It shrinks the contribution (the leaf weights) of each new tree before it is added to the model; in other words, it defines the amount of "correction" we make at each step (remember how each boosting round corrects the errors of the previous ones?).

In practice, a lower eta makes our model more robust to overfitting, so, usually, the lower the learning rate, the better. But with a lower eta we need more boosting rounds, which take more time to train, sometimes for only marginal improvements. Let's try a few values here and time them with the IPython %time magic:
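Why a smaller eta needs more rounds can be seen in a deliberately idealized toy model: assume each round fits the current residual exactly and only eta times that fit is added, so the residual shrinks by a factor (1 - eta) per round. This is an assumption for illustration, not how real trees fit residuals:

```python
import math

def rounds_to_fit(eta, tol=0.01):
    """Rounds until the remaining residual drops to `tol` of its starting value.

    Toy model: each round fits the residual exactly, then adds only eta
    times that fit, so the residual is multiplied by (1 - eta).
    """
    residual, rounds = 1.0, 0
    while residual > tol:
        residual *= (1 - eta)
        rounds += 1
    return rounds

for eta in [0.3, 0.2, 0.1, 0.05, 0.01]:
    # Closed form of the same count: ceil(log(tol) / log(1 - eta)).
    closed_form = math.ceil(math.log(0.01) / math.log(1 - eta))
    print(eta, rounds_to_fit(eta), closed_form)
```

Going from eta=0.3 to eta=0.01 raises the count from 13 to 459 rounds in this toy model, which is the same trend as in the CV timings.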

In [23]:
# This can take some time…
min_mae = float("Inf")
best_params = None
for eta in [.3, .2, .1, .05, .01, .005]:
    print("CV with eta={}".format(eta))
    # We update our parameters
    params['eta'] = eta
    # Run and time CV
    %time cv_results = xgb.cv(params, dtrain, num_boost_round=num_boost_round, seed=42, nfold=5, metrics=['mae'], early_stopping_rounds=10)
    # Update best score
    mean_mae = cv_results['test-mae-mean'].min()
    boost_rounds = cv_results['test-mae-mean'].argmin()  # 0-based index of the best round
    print("\tMAE {} for {} rounds\n".format(mean_mae, boost_rounds))
    if mean_mae < min_mae:
        min_mae = mean_mae
        best_params = eta
print("Best params: {}, MAE: {}".format(best_params, min_mae))

CV with eta=0.3
CPU times: user 3.3 s, sys: 2.3 s, total: 5.6 s
Wall time: 423 ms
	MAE 4.210677791347896 for 10 rounds

CV with eta=0.2
CPU times: user 4.75 s, sys: 3.14 s, total: 7.89 s
Wall time: 590 ms
	MAE 4.104137217788904 for 21 rounds

CV with eta=0.1
CPU times: user 7.9 s, sys: 5.06 s, total: 13 s
Wall time: 1 s
	MAE 3.998718575226202 for 44 rounds

CV with eta=0.05
CPU times: user 15.3 s, sys: 9.85 s, total: 25.1 s
Wall time: 1.88 s
	MAE 3.9605776315872774 for 100 rounds

CV with eta=0.01
CPU times: user 1min 10s, sys: 46.3 s, total: 1min 56s
Wall time: 8.78 s
	MAE 3.905728601651572 for 509 rounds

CV with eta=0.005
CPU times: user 2min 15s, sys: 1min 28s, total: 3min 43s
Wall time: 16.9 s
	MAE 3.907620621708253 for 993 rounds

Best params: 0.01, MAE: 3.905728601651572


In [24]:
params['eta'] = .01

### Results

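Here is what our final dictionary of parameters looks like. Note that it shows max_depth=10 and min_child_weight=6 rather than the 9 and 5 selected above: the CV scores obtained afterwards (for example 4.230052083374737 for 16 rounds with subsample=1.0, colsample=1.0) are identical to the max_depth=10, min_child_weight=6 run, so the later cells appear to have been executed with these values.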

In [25]:
params

{'max_depth': 10,
 'min_child_weight': 6,
 'eta': 0.01,
 'subsample': 0.8,
 'colsample_bytree': 1.0,
 'objective': 'reg:linear',
 'eval_metric': 'mae'}

Let’s train a model with it and see how well it does on our test set!

In [26]:
model = xgb.train(
    params,
    dtrain,
    num_boost_round=num_boost_round,
    evals=[(dtest, "Test")],
    early_stopping_rounds=10
)

[0]	Test-mae:11.20560
[1]	Test-mae:11.10354
[2]	Test-mae:11.00289
[3]	Test-mae:10.90358
[4]	Test-mae:10.80749
[5]	Test-mae:10.70781
[6]	Test-mae:10.60834
[7]	Test-mae:10.51156
[8]	Test-mae:10.41670
[9]	Test-mae:10.32266
[10]	Test-mae:10.23176
[11]	Test-mae:10.14105
[12]	Test-mae:10.05041
[13]	Test-mae:9.96023
[14]	Test-mae:9.87409
[15]	Test-mae:9.78696
[16]	Test-mae:9.70213
[17]	Test-mae:9.62004
[18]	Test-mae:9.53587
[19]	Test-mae:9.45630
[20]	Test-mae:9.37353
[21]	Test-mae:9.29651
[22]	Test-mae:9.21901
[23]	Test-mae:9.14052
[24]	Test-mae:9.06745
[25]	Test-mae:8.99583
[26]	Test-mae:8.92462
[27]	Test-mae:8.85509
[28]	Test-mae:8.78713
[29]	Test-mae:8.72017
[30]	Test-mae:8.65116
[31]	Test-mae:8.58471
[32]	Test-mae:8.51927
[33]	Test-mae:8.45378
[34]	Test-mae:8.38722
[35]	Test-mae:8.32398
[36]	Test-mae:8.25836
[37]	Test-mae:8.19993
[38]	Test-mae:8.14226
[39]	Test-mae:8.08359
[40]	Test-mae:8.02478
[41]	Test-mae:7.96481
[42]	Test-mae:7.90802
[43]	Test-mae:7.85148
[44]	Test-mae:7.79544
[45]	Test-mae:7.73972
[46]	Test-mae:7.68544
[47]	Test-mae:7.63283
[48]	Test-mae:7.58040
[49]	Test-mae:7.52836
[50]	Test-mae:7.47415
[51]	Test-mae:7.42512
[52]	Test-mae:7.37515
[53]	Test-mae:7.32517
[54]	Test-mae:7.27579
[55]	Test-mae:7.22705
[56]	Test-mae:7.17807
[57]	Test-mae:7.13342
[58]	Test-mae:7.08749
[59]	Test-mae:7.04221
[60]	Test-mae:6.99550
[61]	Test-mae:6.94923
[62]	Test-mae:6.90776
[63]	Test-mae:6.86616
[64]	Test-mae:6.82330
[65]	Test-mae:6.78270
[66]	Test-mae:6.73928
[67]	Test-mae:6.69883
[68]	Test-mae:6.65647
[69]	Test-mae:6.61815
[70]	Test-

In [27]:
print("Best MAE: {:.2f} in {} rounds".format(model.best_score, model.best_iteration+1))

Best MAE: 4.00 in 481 rounds


As expected, it took more rounds to get there, but we improved our test MAE from 4.10 to 4.00. Is that good? It depends on what you compare it to. Considering that we got this improvement almost for free, without adding data or engineering features, simply by spending a bit of time tuning our model, it's not bad. But notice that it did not transform a poor model (we are still off by 4 comments on average, whilst our average number of comments is 7…) into an excellent one. This is quite common in machine learning: whilst it is important to "roughly" tune your model to get good results from it, tuning will only get you so far, and past a certain point additional time spent tuning only provides marginal improvements. When that's the case, it's usually worth looking more closely at the data to find better ways of extracting information, and/or trying other algorithms instead of fine-tuning your current model.

#### Saving your model

Although we found the best number of rounds, xgb.train returns the model from the last iteration, not the best one, so our model has been trained with more rounds than optimal (10 extra rounds, because of early stopping). Before using it for predictions, we retrain it with the optimal number of rounds. Since we now know the exact best num_boost_round, we no longer need early_stopping_rounds.

In [28]:
num_boost_round = model.best_iteration + 1
best_model = xgb.train(
    params,
    dtrain,
    num_boost_round=num_boost_round,
    evals=[(dtest, "Test")]
)

[0]	Test-mae:11.20560
[1]	Test-mae:11.10354
[2]	Test-mae:11.00289
[3]	Test-mae:10.90358
[4]	Test-mae:10.80749
[5]	Test-mae:10.70781
[6]	Test-mae:10.60834
[7]	Test-mae:10.51156
[8]	Test-mae:10.41670
[9]	Test-mae:10.32266
[10]	Test-mae:10.23176
[11]	Test-mae:10.14105
[12]	Test-mae:10.05041
[13]	Test-mae:9.96023
[14]	Test-mae:9.87409
[15]	Test-mae:9.78696
[16]	Test-mae:9.70213
[17]	Test-mae:9.62004
[18]	Test-mae:9.53587
[19]	Test-mae:9.45630
[20]	Test-mae:9.37353
[21]	Test-mae:9.29651
[22]	Test-mae:9.21901
[23]	Test-mae:9.14052
[24]	Test-mae:9.06745
[25]	Test-mae:8.99583
[26]	Test-mae:8.92462
[27]	Test-mae:8.85509
[28]	Test-mae:8.78713
[29]	Test-mae:8.72017
[30]	Test-mae:8.65116
[31]	Test-mae:8.58471
[32]	Test-mae:8.51927
[33]	Test-mae:8.45378
[34]	Test-mae:8.38722
[35]	Test-mae:8.32398
[36]	Test-mae:8.25836
[37]	Test-mae:8.19993
[38]	Test-mae:8.14226
[39]	Test-mae:8.08359
[40]	Test-mae:8.02478
[41]	Test-mae:7.96481
[42]	Test-mae:7.90802
[43]	Test-mae:7.85148
[44]	Test-mae:7.79544
[45]	Test-mae:7.73972
[46]	Test-mae:7.68544
[47]	Test-mae:7.63283
[48]	Test-mae:7.58040
[49]	Test-mae:7.52836
[50]	Test-mae:7.47415
[51]	Test-mae:7.42512
[52]	Test-mae:7.37515
[53]	Test-mae:7.32517
[54]	Test-mae:7.27579
[55]	Test-mae:7.22705
[56]	Test-mae:7.17807
[57]	Test-mae:7.13342
[58]	Test-mae:7.08749
[59]	Test-mae:7.04221
[60]	Test-mae:6.99550
[61]	Test-mae:6.94923
[62]	Test-mae:6.90776
[63]	Test-mae:6.86616
[64]	Test-mae:6.82330
[65]	Test-mae:6.78270
[66]	Test-

All good, now let's use our model to make predictions. We will use the test dataset and compute the MAE with the scikit-learn function. We should obtain the same score as in the last round of training; let's check!

In [29]:
mean_absolute_error(y_test, best_model.predict(dtest))

4.001262541873988

Great! If you want to re-use your model on new data in the future, it can be a good idea to save it to a file, here is how you can do it with XGBoost:

In [30]:
best_model.save_model("my_model.json")  # the .json extension selects XGBoost's JSON model format

You can then load the model later with:

In [31]:
loaded_model = xgb.Booster()
loaded_model.load_model("my_model.json")

# And use it for predictions.
loaded_model.predict(dtest)

array([4.487469  , 0.37458023, 2.1316977 , ..., 4.379968  , 0.11489217,
       4.28442   ], dtype=float32)

#### References

- https://archive.ics.uci.edu/dataset/363/facebook+comment+volume+dataset
- https://blog.cambridgespark.com/hyperparameter-tuning-in-xgboost-4ff9100a3b2f