

1. **What is Boosting in Machine Learning?**
   Boosting is an ensemble technique that trains weak learners sequentially, each one focusing on the errors of the learners before it, and combines them into a single strong model.

2. **How does Boosting differ from Bagging?**
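   Boosting trains models sequentially, each correcting its predecessors' errors, and mainly reduces bias. Bagging trains models independently (so they can run in parallel) on bootstrap samples and averages them, which mainly reduces variance.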

3. **What is the key idea behind AdaBoost?**
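   Reweight the training samples so that each new weak learner concentrates on the samples its predecessors misclassified, and give each learner a vote weight that grows as its weighted error shrinks.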

4. **Explain the working of AdaBoost with an example.**
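   It trains a sequence of weak learners, typically decision stumps. In the classic binary formulation, each round computes the learner's weighted error ε, gives it a vote weight α = ½·ln((1 − ε)/ε), multiplies the weights of misclassified samples by e^α and of correctly classified ones by e^(−α), and renormalizes; the final prediction is the α-weighted vote. For example, a stump with ε = 0.3 gets α ≈ 0.42, so the weights of its mistakes grow by about 1.53× and those of its correct samples shrink to about 0.65× before renormalization.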

5. **What is Gradient Boosting, and how is it different from AdaBoost?**
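   Gradient Boosting fits each new learner to the negative gradient of a differentiable loss (the pseudo-residuals) evaluated at the current ensemble's predictions, so any such loss can be plugged in. AdaBoost instead reweights samples, which is equivalent to stagewise minimization of the exponential loss.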

6. **What is the loss function in Gradient Boosting?**
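   Common choices are squared error for regression and log loss (binomial deviance) for classification; scikit-learn's GradientBoostingRegressor also supports absolute error, Huber, and quantile losses.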

7. **How does XGBoost improve over traditional Gradient Boosting?**
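   It adds regularization (a penalty on the number of leaves plus L1/L2 penalties on leaf weights), uses second-order (gradient and Hessian) information to choose splits and leaf values, parallelizes split finding within each tree, and learns a default branch direction for missing values instead of requiring imputation.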

8. **What is the difference between XGBoost and CatBoost?**
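   CatBoost encodes categorical features natively (ordered target statistics) and uses ordered boosting to limit target leakage, so it usually needs less preprocessing for categorical data. XGBoost traditionally expects categorical features to be encoded numerically (e.g., one-hot) and has a larger, longer-established ecosystem. Which one trains faster depends on the dataset and settings.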

9. **What are some real-world applications of Boosting techniques?**
   Fraud detection, ranking in search engines, loan risk prediction, and ad click prediction.

10. **How does regularization help in XGBoost?**
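    It adds penalties to the training objective for the number of leaves (gamma) and for large leaf weights (reg\_lambda for L2, reg\_alpha for L1). A split is kept only if its loss reduction exceeds gamma, and leaf values are shrunk toward zero, which keeps trees simpler and reduces overfitting.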

11. **What are some hyperparameters to tune in Gradient Boosting models?**
    Learning rate and number of estimators (a smaller learning rate usually needs more trees), max\_depth, subsample, and the loss function.

12. **What is the concept of Feature Importance in Boosting?**
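    It measures how much each feature contributes to the model's predictions. For tree ensembles it is usually the total impurity (or loss) reduction from all splits on that feature; split counts and permutation importance are common alternatives.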

13. **Why is CatBoost efficient for categorical data?**
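    It encodes categorical features internally with ordered target statistics, computed for each row only from rows that precede it in a random permutation, so a row's own label does not leak into its encoding. No manual one-hot or target encoding is needed; the categorical columns are passed via cat\_features.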

---
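The regularization described in questions 7 and 10 can be written out explicitly. The equations below follow the regularized objective from the XGBoost paper (Chen and Guestrin, 2016). Here T is the number of leaves, w_j the leaf weights, g_i and h_i the first and second derivatives of the loss at the previous round's prediction, and G_j, H_j their sums over the samples in leaf j. Only the L2 penalty λ and the per-leaf penalty γ are shown; the L1 term (alpha) is omitted for brevity.

$$
\mathcal{L}^{(t)} = \sum_{i=1}^{n} \ell\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\big) + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
$$

$$
\mathcal{L}^{(t)} \approx \sum_{j=1}^{T}\Big[G_j w_j + \frac{1}{2}\,(H_j + \lambda)\, w_j^2\Big] + \gamma T + \text{const}
\quad\Longrightarrow\quad
w_j^{*} = -\frac{G_j}{H_j + \lambda}
$$

$$
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
$$

A larger λ shrinks every leaf weight toward zero, and a split is worth making only when its gain exceeds γ; these correspond to XGBoost's reg\_lambda and gamma parameters.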

```python
# Requires scikit-learn, xgboost and catboost (e.g. pip install xgboost catboost)
from sklearn.datasets import load_breast_cancer, fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import (
    accuracy_score, mean_absolute_error, mean_squared_error, r2_score, f1_score,
    confusion_matrix, ConfusionMatrixDisplay, roc_curve, auc,
)
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor, GradientBoostingClassifier, GradientBoostingRegressor
from xgboost import XGBClassifier, XGBRegressor, plot_importance
from catboost import CatBoostClassifier
import matplotlib.pyplot as plt
import numpy as np

# Load the breast cancer dataset (binary classification) and hold out 30% for testing
bc = load_breast_cancer()
X, y = bc.data, bc.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

```python
# 14. AdaBoost Classifier
ada_clf = AdaBoostClassifier()
ada_clf.fit(X_train, y_train)
print("Q14 AdaBoost Accuracy:", ada_clf.score(X_test, y_test))
```

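Question 4 asks for a worked example of AdaBoost. The sketch below is a minimal from-scratch version of classic binary (discrete) AdaBoost with decision stumps, using NumPy only and labels in {-1, +1} rather than the 0/1 labels scikit-learn uses. The helper names (`fit_stump`, `adaboost_fit`, `adaboost_predict`) and the 9-point toy dataset are hypothetical, chosen for illustration; this is not how scikit-learn's `AdaBoostClassifier` is implemented internally.

```python
import numpy as np

def fit_stump(x, y, w):
    """Best 1-D decision stump under sample weights w.

    The stump predicts `polarity` where x <= threshold and `-polarity` elsewhere.
    Returns (threshold, polarity, weighted_error).
    """
    best = (None, 1, np.inf)
    for thr in np.unique(x):
        for polarity in (1, -1):
            pred = np.where(x <= thr, polarity, -polarity)
            err = w[pred != y].sum()
            if err < best[2]:
                best = (thr, polarity, err)
    return best

def stump_predict(x, thr, polarity):
    return np.where(x <= thr, polarity, -polarity)

def adaboost_fit(x, y, n_rounds):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of (alpha, thr, polarity)."""
    w = np.full(len(x), 1.0 / len(x))            # uniform sample weights to start
    ensemble = []
    for _ in range(n_rounds):
        thr, polarity, err = fit_stump(x, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)      # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1 - err) / err)     # vote weight: larger when the error is smaller
        pred = stump_predict(x, thr, polarity)
        w = w * np.exp(-alpha * y * pred)         # mistakes (y * pred = -1) are up-weighted
        w = w / w.sum()                           # renormalize so the weights sum to 1
        ensemble.append((alpha, thr, polarity))
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return np.sign(score)                         # alpha-weighted vote

# Toy data: +1 at both ends and -1 in the middle, so no single stump can fit it
x_toy = np.arange(1, 10, dtype=float)
y_toy = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1])
toy_model = adaboost_fit(x_toy, y_toy, n_rounds=3)
for alpha, thr, pol in toy_model:
    print(f"stump: x <= {thr:g} -> {pol:+d}, alpha = {alpha:.3f}")
print("1-round accuracy:", np.mean(adaboost_predict(adaboost_fit(x_toy, y_toy, 1), x_toy) == y_toy))
print("3-round accuracy:", np.mean(adaboost_predict(toy_model, x_toy) == y_toy))
```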
```python
# 15. AdaBoost Regressor with MAE
housing = fetch_california_housing()
Xh, yh = housing.data, housing.target
Xh_train, Xh_test, yh_train, yh_test = train_test_split(Xh, yh, test_size=0.3, random_state=42)
ada_reg = AdaBoostRegressor()
ada_reg.fit(Xh_train, yh_train)
print("Q15 AdaBoost MAE:", mean_absolute_error(yh_test, ada_reg.predict(Xh_test)))
```

```python
# 16. Gradient Boosting Classifier & feature importance
gb_clf = GradientBoostingClassifier()
gb_clf.fit(X_train, y_train)
print("Q16 Accuracy (Gradient Boosting):", gb_clf.score(X_test, y_test))
print("Q16 Feature Importance:", gb_clf.feature_importances_)
```

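Questions 5 and 6 describe Gradient Boosting as following the gradient of a loss. Written out, with F_{m-1} the ensemble after m − 1 rounds, h_m the new tree, and ν the learning rate, each tree is fit to the pseudo-residuals r_i. For the two losses named in question 6 they take the forms below (for classification, F is the log-odds and p = σ(F)).

$$
r_i^{(m)} = -\left.\frac{\partial L(y_i, F)}{\partial F}\right|_{F = F_{m-1}(\mathbf{x}_i)},
\qquad
F_m(\mathbf{x}) = F_{m-1}(\mathbf{x}) + \nu\, h_m(\mathbf{x})
$$

$$
L = \tfrac{1}{2}(y - F)^2 \;\Rightarrow\; r = y - F,
\qquad
L = -\big[y \log p + (1 - y)\log(1 - p)\big],\ p = \sigma(F) \;\Rightarrow\; r = y - p
$$

For squared error the pseudo-residuals are the ordinary residuals; for log loss they are the gap between the 0/1 label and the predicted probability.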
```python
# 17. Gradient Boosting Regressor & R2
gb_reg = GradientBoostingRegressor()
gb_reg.fit(Xh_train, yh_train)
print("Q17 R2 Score:", r2_score(yh_test, gb_reg.predict(Xh_test)))
```

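To make the squared-error case concrete, here is a minimal from-scratch sketch using NumPy only. Each round fits a one-split regression stump to the current residuals and adds it, scaled by a learning rate, to the running prediction. The helper names and the noisy sine data are hypothetical; scikit-learn's `GradientBoostingRegressor` uses deeper trees (`max_depth=3` by default) and many more options.

```python
import numpy as np

def fit_regression_stump(x, r):
    """Single split on 1-D x that best fits the residuals r in squared error.

    Returns (threshold, left_value, right_value); each side predicts its mean residual.
    """
    best, best_sse = None, np.inf
    for thr in np.unique(x)[:-1]:                  # the largest value would leave the right side empty
        left, right = r[x <= thr], r[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best, best_sse = (thr, left.mean(), right.mean()), sse
    return best

def gradient_boost_fit(x, y, n_rounds=200, learning_rate=0.1):
    f0 = y.mean()                                  # constant that minimizes squared error
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred                       # negative gradient of 0.5 * (y - F)^2
        thr, left_val, right_val = fit_regression_stump(x, residuals)
        pred = pred + learning_rate * np.where(x <= thr, left_val, right_val)
        stumps.append((thr, left_val, right_val))
    return f0, learning_rate, stumps

def gradient_boost_predict(model, x):
    f0, learning_rate, stumps = model
    pred = np.full(len(x), f0)
    for thr, left_val, right_val in stumps:
        pred = pred + learning_rate * np.where(x <= thr, left_val, right_val)
    return pred

# Hypothetical noisy sine data
rng = np.random.default_rng(0)
x_sin = np.sort(rng.uniform(0, 6, 200))
y_sin = np.sin(x_sin) + rng.normal(0, 0.1, size=x_sin.shape)

gb_model = gradient_boost_fit(x_sin, y_sin, n_rounds=200, learning_rate=0.1)
mse_constant = np.mean((y_sin - y_sin.mean()) ** 2)
mse_boosted = np.mean((y_sin - gradient_boost_predict(gb_model, x_sin)) ** 2)
print(f"training MSE: constant model {mse_constant:.4f}, boosted {mse_boosted:.4f}")
```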
```python
# 18. XGBoost vs Gradient Boosting
xgb_clf = XGBClassifier(eval_metric='logloss')
xgb_clf.fit(X_train, y_train)
print("Q18 XGBoost Accuracy:", xgb_clf.score(X_test, y_test))
print("Q18 Gradient Boosting Accuracy:", gb_clf.score(X_test, y_test))
```

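Question 7 credits XGBoost with handling missing values. The XGBoost paper describes a sparsity-aware split search: for each candidate threshold, the rows with a missing value are tried on the left and then on the right, and the direction with the higher gain becomes that split's learned default. The NumPy sketch below illustrates only that idea for squared-error loss (gradient g = prediction − y, Hessian h = 1) with the second-order gain formula, λ = 1 and γ = 0. The function names, data and parameter values are hypothetical, and this is not XGBoost's actual code.

```python
import numpy as np

def leaf_score(g, h, lam):
    # G^2 / (H + lambda): a leaf's contribution to the second-order objective
    return g.sum() ** 2 / (h.sum() + lam)

def split_gain(g, h, left, right, lam=1.0, gamma=0.0):
    """Gain = 0.5 * [score(L) + score(R) - score(L + R)] - gamma."""
    parent = left | right
    return 0.5 * (leaf_score(g[left], h[left], lam) + leaf_score(g[right], h[right], lam)
                  - leaf_score(g[parent], h[parent], lam)) - gamma

def best_split_with_missing(x, g, h, lam=1.0, gamma=0.0):
    """Try each threshold on the observed values, sending NaN rows left, then right.

    Returns (gain, threshold, default_direction_for_missing)."""
    missing = np.isnan(x)
    best = (-np.inf, None, None)
    for thr in np.unique(x[~missing])[:-1]:
        obs_left = ~missing & (x <= thr)
        obs_right = ~missing & (x > thr)
        for direction in ("left", "right"):
            left = (obs_left | missing) if direction == "left" else obs_left
            right = (obs_right | missing) if direction == "right" else obs_right
            gain = split_gain(g, h, left, right, lam, gamma)
            if gain > best[0]:
                best = (gain, thr, direction)
    return best

# Hypothetical data: the two rows with a missing feature behave like the high-x rows
x_miss = np.array([1, 2, 3, 4, 5, 6, np.nan, np.nan])
y_miss = np.array([0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
current_pred = np.full_like(y_miss, 0.5)   # ensemble output before this tree
g_miss = current_pred - y_miss             # gradient of 0.5 * (y - F)^2 w.r.t. F
h_miss = np.ones_like(y_miss)              # its second derivative is 1

gain, thr, direction = best_split_with_missing(x_miss, g_miss, h_miss)
print(f"split at x <= {thr:g}; missing values go {direction}; gain = {gain:.3f}")
```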
```python
# 19. CatBoost & F1
cat_clf = CatBoostClassifier(verbose=0)
cat_clf.fit(X_train, y_train)
print("Q19 CatBoost F1:", f1_score(y_test, cat_clf.predict(X_test)))
```

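Question 13 says CatBoost handles categorical data internally. Its main tool, described in the CatBoost paper (Prokhorenkova et al., 2018), is ordered target statistics. Rows are put in a random order, and each row's category is replaced by a smoothed mean of the target over only the rows that come before it with the same category, so a row's own label never leaks into its encoding. The sketch below is a simplified single-permutation version in NumPy. The function name, the default prior (the global target mean) and the smoothing weight `a` are illustrative assumptions; real CatBoost uses several permutations, feature combinations and other details not shown.

```python
import numpy as np

def ordered_target_statistics(categories, y, a=1.0, prior=None, seed=0):
    """Encode each row using only the targets of earlier rows (in a random order) with the same category."""
    categories = np.asarray(categories)
    y = np.asarray(y, dtype=float)
    prior = y.mean() if prior is None else prior
    order = np.random.default_rng(seed).permutation(len(y))
    sums, counts = {}, {}
    encoded = np.empty(len(y))
    for idx in order:
        c = categories[idx]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[idx] = (s + a * prior) / (n + a)   # smoothed mean over rows seen so far
        sums[c] = s + y[idx]                        # update only after encoding: no self-leakage
        counts[c] = n + 1
    return encoded

# Hypothetical data: a city column and a click label
cities = ["A", "B", "A", "C", "B", "A", "C", "A"]
clicked = [1, 0, 1, 0, 1, 1, 0, 0]
print(ordered_target_statistics(cities, clicked, prior=0.5))
```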
```python
# 20. XGBoost Regressor & MSE
xgb_reg = XGBRegressor()
xgb_reg.fit(Xh_train, yh_train)
print("Q20 XGBoost MSE:", mean_squared_error(yh_test, xgb_reg.predict(Xh_test)))
```

```python
# 21. AdaBoost feature importance
print("Q21 AdaBoost Feature Importance:", ada_clf.feature_importances_)
```

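The `feature_importances_` printed in questions 16 and 21 are impurity-based and computed from training-time splits, and scikit-learn's documentation notes they can be misleading for high-cardinality features. A common complement for question 12 is permutation importance on held-out data: shuffle one feature at a time and measure how much the test score drops. A minimal sketch with scikit-learn's `permutation_importance` follows; it refits a `GradientBoostingClassifier` on the bundled breast cancer data so the block runs on its own, and the settings `n_repeats=5` and `random_state=0` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, test_size=0.3, random_state=42)
perm_model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Impurity-based importance: derived from the splits made during training
impurity = perm_model.feature_importances_
# Permutation importance: mean drop in test accuracy when one column is shuffled
perm = permutation_importance(perm_model, X_te, y_te, n_repeats=5, random_state=0)

for i in np.argsort(perm.importances_mean)[::-1][:5]:
    print(f"{data.feature_names[i]:<25} perm={perm.importances_mean[i]:.4f}  impurity={impurity[i]:.4f}")
```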
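```python
# 22. Gradient Boosting loss curves (training vs. test MSE after each boosting stage)
train_mse = [mean_squared_error(yh_train, p) for p in gb_reg.staged_predict(Xh_train)]
test_mse = [mean_squared_error(yh_test, p) for p in gb_reg.staged_predict(Xh_test)]
plt.plot(train_mse, label="Train")
plt.plot(test_mse, label="Test")
plt.title("Q22: Gradient Boosting Loss Curve")
plt.xlabel("Boosting iteration")
plt.ylabel("MSE")
plt.legend()
plt.show()
```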

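```python
# 23. XGBoost Feature Importance
# plot_importance ranks features by split count (importance_type='weight') by default;
# features appear as f0, f1, ... because the model was trained on a NumPy array
plot_importance(xgb_clf)
plt.title("Q23: XGBoost Feature Importance")
plt.show()
```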

```python
# 24. CatBoost Confusion Matrix
y_pred_cat = cat_clf.predict(X_test)
ConfusionMatrixDisplay.from_predictions(y_test, y_pred_cat)
plt.title("Q24: CatBoost Confusion Matrix")
plt.show()
```

```python
# 25. AdaBoost: compare estimators
for n in [10, 50, 100]:
    clf = AdaBoostClassifier(n_estimators=n)
    clf.fit(X_train, y_train)
    print(f"Q25 Accuracy with {n} estimators:", clf.score(X_test, y_test))
```

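Question 2 contrasts Boosting with Bagging. One way to see the difference is to give both methods the same weak learner, a depth-1 decision stump. Bagging averages stumps trained independently on bootstrap samples, which mostly reduces variance, while AdaBoost trains them sequentially on reweighted data, which can also reduce the bias of such a simple model. A sketch with scikit-learn on the bundled breast cancer data follows; the estimator counts, seeds and 5-fold cross-validation are arbitrary choices, and the exact accuracies will vary with them.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

Xb, yb = load_breast_cancer(return_X_y=True)
stump = DecisionTreeClassifier(max_depth=1)

# The base estimator is passed positionally, which works whether the parameter
# is named `estimator` (newer scikit-learn) or `base_estimator` (older releases)
models = {
    "single stump": stump,
    "bagging (50 stumps)": BaggingClassifier(stump, n_estimators=50, random_state=0),
    "AdaBoost (50 stumps)": AdaBoostClassifier(stump, n_estimators=50, random_state=0),
}
scores = {name: cross_val_score(m, Xb, yb, cv=5).mean() for name, m in models.items()}
for name, s in scores.items():
    print(f"{name:<22} mean CV accuracy = {s:.3f}")
```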
```python
# 26. Gradient Boosting ROC Curve
probs = gb_clf.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, probs)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.2f}')
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.title("Q26: Gradient Boosting ROC")
plt.legend()
plt.show()
```

```python
# 27. XGBoost Regressor GridSearchCV (default scoring for a regressor is R^2)
params = {'n_estimators': [50, 100], 'learning_rate': [0.05, 0.1]}
grid = GridSearchCV(XGBRegressor(), param_grid=params, cv=3)
grid.fit(Xh_train, yh_train)
print("Q27 XGBoost Best Params:", grid.best_params_)
print("Q27 Best CV R2:", grid.best_score_)
```

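```python
# 28. CatBoost imbalance performance
# Simulate imbalance: keep 90% of class 0 (malignant) but only 10% of class 1 (benign),
# which makes class 1 the minority class
idx_0 = np.where(y == 0)[0]
idx_1 = np.where(y == 1)[0]
imb_idx = np.concatenate([idx_0[:int(0.9 * len(idx_0))], idx_1[:int(0.1 * len(idx_1))]])
X_imb, y_imb = X[imb_idx], y[imb_idx]
# stratify keeps the same class ratio in the train and test splits
X_train_imb, X_test_imb, y_train_imb, y_test_imb = train_test_split(
    X_imb, y_imb, test_size=0.3, random_state=42, stratify=y_imb
)
# Weight the minority class 1 ten times more than class 0
cat_imb = CatBoostClassifier(verbose=0, class_weights=[1, 10])
cat_imb.fit(X_train_imb, y_train_imb)
# f1_score scores the positive label 1, i.e. the minority class here
print("Q28 CatBoost Imbalanced F1:", f1_score(y_test_imb, cat_imb.predict(X_test_imb)))
```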

```python
# 29. AdaBoost log loss
from sklearn.metrics import log_loss
print("Q29 AdaBoost Log Loss:", log_loss(y_test, ada_clf.predict_proba(X_test)))
```

```python
# 30. Gradient Boosting Log Loss
print("Q30 Gradient Boosting Log Loss:", log_loss(y_test, gb_clf.predict_proba(X_test)))
```


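Running this cell requires the catboost package; without it the import fails with `ModuleNotFoundError: No module named 'catboost'` (install it with `pip install catboost`).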