XGBRegressor is the scikit-learn-compatible regression estimator in the XGBoost library, an implementation of gradient-boosted decision trees known for its speed and predictive performance. XGBoost stands for Extreme Gradient Boosting and is a popular machine learning algorithm, particularly for structured (tabular) data.

Key Features of XGBRegressor:

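Boosting: It builds an ensemble of weak learners (typically decision trees) sequentially. Each new tree is trained on the gradients (and, in XGBoost, the second derivatives) of the loss with respect to the current predictions, so it corrects the errors the ensemble still makes. For squared error this reduces to fitting the residuals.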
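The sequential residual-fitting idea can be illustrated with a toy from-scratch loop. This is a minimal sketch under simplifying assumptions, not how XGBoost builds its trees: it uses single-split "stumps" on one feature, plain squared-error residuals, and hypothetical helper names (`fit_stump`, `predict_stump`, `boost`).

```python
import numpy as np

def fit_stump(x, r):
    """Find the threshold split on 1-D feature x that best fits residuals r
    in the least-squares sense; return (threshold, left_value, right_value)."""
    best = None
    for t in np.unique(x)[:-1]:  # skip the max so the right side is never empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return t, left_value, right_value

def predict_stump(stump, x):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    base = y.mean()                      # start from a constant prediction
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred             # negative gradient of 0.5 * squared error
        stump = fit_stump(x, residuals)  # new learner targets the current errors
        pred += learning_rate * predict_stump(stump, x)  # shrunken update
        stumps.append(stump)
    return base, stumps, pred

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + 0.1 * rng.normal(size=200)

base, stumps, pred = boost(x, y)
mse_start = np.mean((y - base) ** 2)
mse_end = np.mean((y - pred) ** 2)
print(mse_start, mse_end)
```

Because each stump is a least-squares fit to the current residuals and `learning_rate` is in (0, 1], every round can only lower the training error, which is why `mse_end` ends up below `mse_start`.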

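Regularization: Its training objective adds L1 (Lasso-style, reg_alpha) and L2 (Ridge-style, reg_lambda) penalties on the leaf weights, plus a per-leaf penalty (gamma), to prevent overfitting.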
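To see how these penalties shrink a tree's output, here is a small sketch of the optimal value of a single leaf. XGBoost's documentation derives w* = -G / (H + lambda), where G and H are the sums of gradients and second derivatives of the rows in the leaf. The L1 term is modeled here as soft-thresholding G by alpha before dividing; treat that part, and the helper names `soft_threshold` and `leaf_weight`, as illustrative assumptions rather than the library's API.

```python
def soft_threshold(g_sum, alpha):
    """L1 shrinkage: move g_sum toward zero by alpha, clipping at zero."""
    if g_sum > alpha:
        return g_sum - alpha
    if g_sum < -alpha:
        return g_sum + alpha
    return 0.0

def leaf_weight(gradients, hessians, reg_lambda=1.0, reg_alpha=0.0):
    """Regularized optimal leaf value: -soft_threshold(G, alpha) / (H + lambda)."""
    G = sum(gradients)
    H = sum(hessians)
    return -soft_threshold(G, reg_alpha) / (H + reg_lambda)

# For 0.5 * (y - pred)^2 the gradient is (pred - y) and the second derivative is 1.
residuals = [2.0, 3.0, 4.0]          # y - pred for the rows that land in this leaf
grads = [-r for r in residuals]
hess = [1.0] * len(residuals)

print(leaf_weight(grads, hess, reg_lambda=0.0))                  # 3.0, the mean residual
print(leaf_weight(grads, hess, reg_lambda=1.0))                  # 2.25, shrunk by L2
print(leaf_weight(grads, hess, reg_lambda=1.0, reg_alpha=3.0))   # 1.5, shrunk further by L1
print(leaf_weight(grads, hess, reg_lambda=1.0, reg_alpha=10.0))  # zero: L1 cancels the leaf
```

Larger `reg_lambda` pulls every leaf toward zero smoothly, while a large enough `reg_alpha` can zero a leaf out entirely.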

Handling Missing Values: XGBoost handles missing values (np.nan by default) natively. At each split it learns a default direction in which to send missing entries, so imputing them before fitting is optional rather than required.

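Parallelization: XGBoost parallelizes split finding across CPU threads (controlled by n_jobs) and also supports distributed and GPU training, which makes it faster than many traditional gradient boosting implementations. The trees themselves are still built one after another.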

```python
import xgboost as xgb

xgb_reg = xgb.XGBRegressor(
    n_estimators=100,        # Number of trees (boosting rounds)
    learning_rate=0.1,       # Step-size shrinkage applied to each tree's output
    max_depth=3,             # Maximum depth of each tree
    subsample=0.8,           # Fraction of training rows sampled for each tree
    colsample_bytree=0.8,    # Fraction of columns sampled for each tree
    objective='reg:squarederror'  # Loss function: squared error (the default)
)
```

Key Hyperparameters to Tune:

n_estimators: Number of trees (boosting rounds).

learning_rate: Step-size shrinkage applied to each new tree's contribution, used to prevent overfitting. A smaller value usually needs more trees (a larger n_estimators) to reach the same fit.

max_depth: Maximum depth of each tree, controlling model complexity.

subsample: Fraction of training rows randomly sampled for each boosting round; values below 1.0 help prevent overfitting.

colsample_bytree: Fraction of features to be randomly sampled for each tree.

objective: Defines the loss function (for regression, typically reg:squarederror, the default, or reg:squaredlogerror).

Advanced Techniques:

Cross-validation: Use xgb.cv (part of XGBoost's native API, which takes a DMatrix) with early_stopping_rounds to find the best number of boosting rounds.

Grid Search/Randomized Search: Combine XGBRegressor with GridSearchCV or RandomizedSearchCV from scikit-learn's sklearn.model_selection module to find optimal hyperparameters.

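Feature Importance: A fitted model exposes importances through the feature_importances_ attribute, and the underlying booster's get_score method reports them by type (weight, gain, cover, total_gain, total_cover).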

Tuning Tips:

If your model is overfitting, decrease max_depth, reduce learning_rate, set subsample and colsample_bytree below 1.0, or strengthen regularization (reg_lambda, reg_alpha, gamma).

If your model is underfitting, increase n_estimators, learning_rate, or max_depth.