# Part 4: Parameter Tuning
---

## It's GridCV Time

At this point, I was going back and forth between XGBoost and HistGradientBoosting. I was leaning towards XGBoost, so I tuned that one first, with its parameters split into the following grids:


In [None]:
tree_params = {'max_depth':[2, 3, 4, 5, 6], 'gamma':[0, 0.01, 0.1]}
grow_params = {'eta':[0.1, 0.2, 0.3, 0.4], 'grow_policy':['depthwise', 'lossguide'], 
               'eval_metric':['merror', 'mlogloss']} #n_estimators ++
sample_params = {'subsample':[0.5, 0.6, 0.75, 0.9, 1], 'colsample_bytree':[0.8, 0.9, 1]}
reg_params = {'lambda':[1, 1.2, 1.4], 'alpha':[0, 0.2, 0.4, 0.6]}
weight_params = {'scale_pos_weight':[0, 0.25, 0.5, 0.75], 'min_child_weight':[1, 2, 3, 4, 5]}

In [None]:
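from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier
# Hyperparameters go in as keyword arguments, not as a single positional dict.
# num_class is left out: XGBClassifier infers the number of classes from y.
xgb_model = XGBClassifier(objective='multi:softmax', tree_method='hist',
                          n_estimators=400, random_state=42)
# Swap tree_params for each of the other grids in turn.
clf = GridSearchCV(xgb_model, tree_params, scoring='balanced_accuracy',
                   error_score=0, n_jobs=20)
clf.fit(X, y)
print(clf.cv_results_)
print(clf.best_score_)
print(clf.best_params_)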

I ran the above cell more or less identically for each of the above parameter dicts:  
[tree_params](https://github.com/edithalice/stellar_classification/blob/master/notebook_runs/4_1_Parameter_Tuning_tree.ipynb),
[grow_params](https://github.com/edithalice/stellar_classification/blob/master/notebook_runs/4_1_Parameter_Tuning_grow.ipynb),
[sample_params](https://github.com/edithalice/stellar_classification/blob/master/notebook_runs/4_1_Parameter_Tuning_sample.ipynb),
[reg_params](https://github.com/edithalice/stellar_classification/blob/master/notebook_runs/4_1_Parameter_Tuning_reg.ipynb),
[weight_params](https://github.com/edithalice/stellar_classification/blob/master/notebook_runs/4_1_Parameter_Tuning_weight.ipynb)
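
Running one small grid per parameter group keeps the search cheap, at the cost of never testing combinations across groups. As a rough illustration, the sketch below counts the candidates in each of the five grids above and compares that with a single joint grid. The fold count of 5 is an assumption (it is `GridSearchCV`'s default `cv` in recent scikit-learn versions; the cells above don't set it), and `n_candidates` is a hypothetical helper name.

```python
from math import prod

# The five XGBoost grids, copied from the cell above.
grids = {
    "tree_params": {"max_depth": [2, 3, 4, 5, 6], "gamma": [0, 0.01, 0.1]},
    "grow_params": {"eta": [0.1, 0.2, 0.3, 0.4],
                    "grow_policy": ["depthwise", "lossguide"],
                    "eval_metric": ["merror", "mlogloss"]},
    "sample_params": {"subsample": [0.5, 0.6, 0.75, 0.9, 1],
                      "colsample_bytree": [0.8, 0.9, 1]},
    "reg_params": {"lambda": [1, 1.2, 1.4], "alpha": [0, 0.2, 0.4, 0.6]},
    "weight_params": {"scale_pos_weight": [0, 0.25, 0.5, 0.75],
                      "min_child_weight": [1, 2, 3, 4, 5]},
}

CV_FOLDS = 5  # assumption: GridSearchCV's default number of folds


def n_candidates(grid):
    """Number of parameter combinations a grid search tries for one grid."""
    return prod(len(values) for values in grid.values())


per_grid = {name: n_candidates(grid) for name, grid in grids.items()}
separate_total = sum(per_grid.values())  # five independent searches
joint_total = prod(per_grid.values())    # one search over everything at once

for name, n in per_grid.items():
    print(f"{name}: {n} candidates, {n * CV_FOLDS} fits")
print(f"separate searches: {separate_total} candidates")
print(f"single joint grid: {joint_total} candidates")
```

Under these assumptions the separate searches cover 78 candidates in total, while one joint grid would cover 864,000.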

Results:  
- min_child_weight=2
- subsample=1
- max_depth=6
- gamma=0.1
- colsample_bytree=0.8
- lambda=1.4
- alpha=0  

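and it looks like these params have no effect on the model:
- eval_metric
- grow_policy
- scale_pos_weight

That makes sense for at least two of them: `eval_metric` only changes how progress is measured during training (it matters for early stopping, which isn't used here), and `scale_pos_weight` is intended for binary classification.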
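
All of the grids above were ranked with `scoring='balanced_accuracy'`, which is the mean of the per-class recall rather than the plain fraction of correct predictions. A minimal pure-Python sketch of the metric follows. The function is a hand-rolled stand-in for illustration, not scikit-learn's implementation, and the labels are toy values made up for the example.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, over the classes that appear in y_true."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels, made up for illustration only: class 1 is rare and always missed.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy:    {plain}")
print(f"balanced accuracy: {balanced}")
```

On the toy labels, plain accuracy is 0.75 but balanced accuracy is only 0.5, because the rare class counts as much as the common one. That is the reason to prefer this metric when the classes are unevenly sized.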

I was going to do a second round based on the above results to tweak these a little more, but then! I tweaked a couple of parameters for HistGradientBoosting and its performance jumped.

- Trained Hist Gradient Boost in 270.549 seconds.
- Hist Gradient Boost predicted test data in 6.681 seconds.
- Trained XGBoost in 2317.085 seconds.
- XGBoost predicted test data in 17.215 seconds.

Scores:
- Hist Gradient Boost: 0.8459
- XGBoost: 0.845

<img src='pics/hist_v_xgb.svg'>

Only slightly better than XGBoost, but much faster, both to train and to predict! At this point, I decided to go with the Hist Gradient Boosting model for sure, so I started doing some parameter tuning for that model. (Which was so much quicker since there are a lot fewer parameters!)
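
To put numbers on "slightly better" and "much faster", the sketch below just divides the timings and subtracts the scores reported above. The dictionary keys and variable names are placeholders chosen for this example.

```python
# Timings (seconds) and scores copied from the run above.
train_s = {"hist": 270.549, "xgb": 2317.085}
predict_s = {"hist": 6.681, "xgb": 17.215}
score = {"hist": 0.8459, "xgb": 0.845}

train_speedup = train_s["xgb"] / train_s["hist"]
predict_speedup = predict_s["xgb"] / predict_s["hist"]
score_gap = score["hist"] - score["xgb"]

print(f"training speedup:   {train_speedup:.1f}x")
print(f"prediction speedup: {predict_speedup:.1f}x")
print(f"score gap:          {score_gap:.4f}")
```

That works out to roughly 8.6x faster training and 2.6x faster prediction, for a score difference of about 0.0009.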

In [None]:
#round 1
tree_params = {'learning_rate':[0.1, 0.2, 0.3], 'max_depth':[4, 6, 8]}
grad_params = {'max_iter':[200, 350, 500], 'l2_regularization':[0.6, 1, 1.4]}
#round 2
tree_params = {'learning_rate':[0.05, 0.1, 0.15], 'max_depth':[7, 8, 9]}
grad_params = {'max_iter':[350, 500], 'l2_regularization':[1.3, 1.4, 1.5]}

In [None]:
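# On scikit-learn < 1.0, first run:
#   from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
# Run once with tree_params and once with grad_params, for each round.
model = HistGradientBoostingClassifier(random_state=25)
clf = GridSearchCV(model, tree_params, scoring='balanced_accuracy', error_score=0)
clf.fit(X_train, y_train)
print(clf.cv_results_)
print(clf.best_score_)
print(clf.best_params_)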

Results of round 1:
- learning_rate=0.1
- max_depth=8
- max_iter=350
- l2_regularization=1.4  

aaaand results of round 2 were... exactly the same
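
Round 2 mostly re-searched a narrow band around each round 1 winner, so getting the same answer twice suggests those values sit at a local optimum for this grid resolution. One way such a grid could be generated is sketched below. `neighbours` is a hypothetical helper, and the step sizes are assumptions picked to reproduce the round 2 grids above. `max_iter` is left out because its round 2 grid (`[350, 500]`) was not centred on the winner.

```python
def neighbours(best, step, digits=10):
    """Return [best - step, best, best + step], rounded to hide float noise."""
    return [round(best - step, digits), best, round(best + step, digits)]


# Round 1 winners, copied from the results above.
round1_best = {"learning_rate": 0.1, "max_depth": 8, "l2_regularization": 1.4}

# Assumed step sizes, chosen to match the round 2 grids in the cell above.
steps = {"learning_rate": 0.05, "max_depth": 1, "l2_regularization": 0.1}

round2_grid = {name: neighbours(best, steps[name])
               for name, best in round1_best.items()}
print(round2_grid)
```

The rounding matters for the float parameters: without it, `0.1 + 0.05` would show up in the grid as `0.15000000000000002`.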

## Final Model

In [None]:
model = HistGradientBoostingClassifier(random_state=25, learning_rate=0.1, max_depth=8,
                                      max_iter=350, l2_regularization=1.4)