### XGBoost Hyperparameter Tuning

```
Write Python code that uses scikit-learn's GridSearchCV() to tune XGBoost hyperparameters.
```

In [1]:
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import make_regression

In [2]:
# 1. Generate synthetic data
X, y = make_regression(n_samples=1000, n_features=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Create XGBoost regressor
xgb_model = xgb.XGBRegressor(random_state=42)

# 3. Set up the hyperparameter grid for tuning
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.3],
    'min_child_weight': [1, 3, 5],
    'subsample': [0.5, 0.7, 1.0],
    'colsample_bytree': [0.5, 0.7, 1.0]
}

# 4. Create GridSearchCV object
grid_search = GridSearchCV(
    estimator=xgb_model,
    param_grid=param_grid,
    cv=5,
    scoring='neg_mean_squared_error',
    n_jobs=-1,
    verbose=2,
)
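
Before fitting, it helps to know how much work the grid implies. The grid above has six parameters with three values each, so an exhaustive search evaluates 3^6 = 729 candidates, each under 5-fold cross-validation. The sketch below counts this with scikit-learn's `ParameterGrid`; the names `n_candidates`, `cv_folds`, and `total_fits` are illustrative and not part of the original notebook.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the notebook cell above
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.3],
    'min_child_weight': [1, 3, 5],
    'subsample': [0.5, 0.7, 1.0],
    'colsample_bytree': [0.5, 0.7, 1.0],
}

# ParameterGrid enumerates every combination; len() gives the candidate count
n_candidates = len(ParameterGrid(param_grid))
cv_folds = 5  # matches cv=5 in the GridSearchCV call
total_fits = n_candidates * cv_folds

print("Candidates:", n_candidates)
print("Cross-validation fits:", total_fits)
```

With the default `refit=True`, GridSearchCV performs one more fit of the best candidate on the whole training set after the search finishes.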

In [3]:
# 5. Fit the GridSearchCV object to the training data
grid_search.fit(X_train, y_train)

In [4]:
# 6. Print the best hyperparameters
print("Best hyperparameters found: ", grid_search.best_params_)

Best hyperparameters found:  {'colsample_bytree': 0.5, 'learning_rate': 0.1, 'max_depth': 3, 'min_child_weight': 5, 'n_estimators': 200, 'subsample': 0.5}


In [5]:
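# 7. Evaluate the best model on the test set
# GridSearchCV uses refit=True by default, so best_estimator_ has already
# been refit on the full training set; no extra fit() call is needed.
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)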

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean squared error for the best model: ", round(mse,3))
print("R2 score for the best model: ", round(r2,3))


Mean squared error for the best model:  563.315
R2 score for the best model:  0.964
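
MSE is expressed in squared target units, which makes it hard to read directly. Taking the square root gives the RMSE, which is on the same scale as `y`. A small worked example using the MSE reported above (the variable names are illustrative):

```python
import math

mse = 563.315  # test-set MSE reported above

# RMSE is the square root of MSE and is in the same units as the target
rmse = math.sqrt(mse)
print("Root mean squared error for the best model: ", round(rmse, 3))
```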
