## XGBoost (Extreme Gradient Boosting):

XGBoost is an optimized implementation of gradient boosting, designed for training speed and predictive performance. It has become one of the most popular machine learning algorithms for structured/tabular data because of its scalability, efficiency, and accuracy.

### How XGBoost Works:

#### Start with Predictions:

XGBoost begins by making the same initial prediction for all data points (e.g., the mean of the target for regression, or a constant starting probability such as 0.5 for classification).

#### Calculate Residuals (Errors):

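For each data point, it calculates the difference (residual) between the actual value and the predicted value (the "error"). This residual indicates how much the model needs to improve. More generally, XGBoost works with the first and second derivatives (gradient and Hessian) of the loss function; for squared-error loss, the residual is simply the negative gradient.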
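
As a minimal sketch of these first two steps, assume a regression target and squared-error loss. The toy values are made up for illustration, and starting from the mean is one common choice rather than necessarily what a given XGBoost version does by default.

```python
# Toy regression targets (illustrative values, not from any dataset).
y = [3.0, 5.0, 8.0, 12.0]

# Step 1: start from one constant prediction, here the mean of the targets.
initial_prediction = sum(y) / len(y)
predictions = [initial_prediction] * len(y)

# Step 2: residual = actual - predicted. For squared-error loss this is
# the negative gradient of the loss with respect to the prediction.
residuals = [actual - pred for actual, pred in zip(y, predictions)]

print(initial_prediction)  # 7.0
print(residuals)           # [-4.0, -2.0, 1.0, 5.0]
```

Points with a negative residual are currently over-predicted, and points with a positive residual are under-predicted.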

#### Train a Weak Learner:

A weak learner (e.g., a small decision tree) is trained to predict the residuals, focusing on fixing the mistakes of the previous predictions.

#### Update Predictions:

The predictions from the weak learner are combined with the previous predictions to improve accuracy. This process is guided by a learning rate, which controls how much influence each weak learner has.
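
The update rule can be sketched as follows. The tree outputs are hypothetical numbers standing in for a fitted tree, and the learning rate of 0.5 is chosen only to keep the arithmetic exact; smaller values such as 0.1 or 0.3 are more typical in practice.

```python
learning_rate = 0.5  # illustrative value; XGBoost calls this `eta` or `learning_rate`

previous_predictions = [7.0, 7.0, 7.0, 7.0]

# Hypothetical outputs of one fitted tree: its estimate of each residual.
tree_outputs = [-3.0, -3.0, 3.0, 3.0]

# Each point moves only a fraction of the way toward the tree's correction.
updated_predictions = [
    prev + learning_rate * out
    for prev, out in zip(previous_predictions, tree_outputs)
]

print(updated_predictions)  # [5.5, 5.5, 8.5, 8.5]
```

A smaller learning rate makes each tree's correction more cautious, which usually means more trees are needed to reach the same fit.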

#### Repeat:

This process of calculating residuals, training weak learners, and updating predictions is repeated for a specified number of iterations, or until performance on a validation set stops improving (early stopping).

#### Combine Weak Learners:

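At the end of training, all the weak learners (trees) are combined into a single strong model. The final prediction is the initial prediction plus the sum of every tree's output, each scaled by the learning rate (for classification, this sum is a raw score that is then converted to probabilities). Unlike AdaBoost, XGBoost does not weight trees by their individual performance.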
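
The whole loop can be sketched in plain Python. This is a simplified illustration of gradient boosting with squared-error loss and one-split trees (stumps) on a single feature, not XGBoost's actual algorithm: it omits second-order information, regularization, and pruning. All function names and the toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree: choose the threshold on x that
    minimises the squared error of predicting each side's mean residual."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=20, learning_rate=0.3):
    base = sum(y) / len(y)                      # initial prediction
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):                   # repeat
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)         # train a weak learner
        stumps.append(stump)
        predictions = [pi + learning_rate * stump_predict(stump, xi)
                       for xi, pi in zip(x, predictions)]  # update
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, xi):
    # Final prediction: base value plus the shrunken sum of all stump outputs.
    total = sum(stump_predict(s, xi) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * total


def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


# Toy data (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

model = boost(x, y)
baseline_mse = mse(y, [model["base"]] * len(y))
boosted_mse = mse(y, [predict(model, xi) for xi in x])
print(baseline_mse, boosted_mse)
```

On this toy data the training error of the boosted model is lower than that of the constant baseline, because every round removes part of the remaining residual.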

### Key Features of XGBoost:

#### Regularization:

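XGBoost applies L1 (lasso) and L2 (ridge) penalties to the leaf weights of its trees (the `reg_alpha` and `reg_lambda` parameters). This reduces overfitting and makes it more robust than traditional gradient boosting without regularization.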
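
To see how the two penalties act, here is a small sketch of the regularized leaf-weight formula, where `G` and `H` are the sums of gradients and Hessians of the points in a leaf. The function name is hypothetical; the parameter names mirror XGBoost's `reg_lambda` and `reg_alpha`.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Weight of one leaf given summed gradients G and Hessians H.

    L1 (reg_alpha) soft-thresholds G toward zero; L2 (reg_lambda) enlarges
    the denominator. Both shrink the leaf's output.
    """
    if grad_sum > reg_alpha:
        shrunk = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        shrunk = grad_sum + reg_alpha
    else:
        shrunk = 0.0
    return -shrunk / (hess_sum + reg_lambda)


# A leaf holding four points with squared-error loss, each with residual 2:
# every gradient is -2 and every Hessian is 1, so G = -8 and H = 4.
print(leaf_weight(-8.0, 4.0, reg_lambda=0.0))                 # 2.0
print(leaf_weight(-8.0, 4.0, reg_lambda=4.0))                 # 1.0
print(leaf_weight(-8.0, 4.0, reg_lambda=4.0, reg_alpha=2.0))  # 0.75
```

Without regularization the leaf outputs the mean residual (2.0); adding the penalties pulls the output toward zero, so no single leaf can make a large correction.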

#### Optimized for Speed:

It uses advanced techniques like parallel processing, tree pruning, and efficient memory usage to speed up training.

#### Handling Missing Values:

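XGBoost handles missing values natively: at each split it learns a default direction (left or right) for rows whose feature value is missing, choosing whichever direction gives the better split during training.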
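
The idea can be sketched for a single candidate split: send the missing rows left, then right, and keep the direction with the larger gain. This is a simplified illustration with hypothetical function names; it uses the gain formula without the constant factor and the `gamma` penalty, and represents missing values as `None`.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0):
    """Gain of a split from summed gradients and Hessians on each side."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    return (score(g_left, h_left) + score(g_right, h_right)
            - score(g_left + g_right, h_left + h_right))


def best_default_direction(values, grads, hess, threshold, reg_lambda=1.0):
    """Try both directions for rows with a missing value (None) and
    return the direction with the larger gain, together with that gain."""
    gl = hl = gr = hr = gm = hm = 0.0
    for v, g, h in zip(values, grads, hess):
        if v is None:          # missing: direction not decided yet
            gm += g
            hm += h
        elif v < threshold:    # present and below the threshold: left
            gl += g
            hl += h
        else:                  # present and at or above the threshold: right
            gr += g
            hr += h
    gain_left = split_gain(gl + gm, hl + hm, gr, hr, reg_lambda)
    gain_right = split_gain(gl, hl, gr + gm, hr + hm, reg_lambda)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


# Toy data: the two rows with missing values have gradients that look
# like those of the high-valued rows.
values = [1.0, 2.0, None, 8.0, 9.0, None]
grads = [-2.0, -2.0, 2.0, 2.0, 2.0, 2.0]
hess = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

direction, gain = best_default_direction(values, grads, hess, threshold=5.0)
print(direction)  # right
```

Here the missing rows behave like the rows on the right side of the split, so sending them right gives the larger gain.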

#### Sparsity Awareness:

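It works efficiently with sparse datasets: when searching for splits it only visits the entries that are actually present, so the cost grows with the number of non-missing values rather than with the full size of the matrix.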

#### Customizable Objective Functions:

You can define custom loss functions, making it versatile for different kinds of problems (regression, classification, ranking, etc.).
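
A custom objective boils down to a function that returns the gradient and Hessian of the loss for each prediction. In XGBoost's native API such a callable is passed as `obj` to `xgboost.train`, where it receives the predictions and the `DMatrix`. The sketch below uses plain lists and a simplified, hypothetical signature so that it runs without the library, and it re-implements the standard log loss purely as an illustration.

```python
import math


def logistic_objective(raw_scores, labels):
    """Gradient and Hessian of the log loss with respect to the raw
    (pre-sigmoid) score, one value per data point."""
    grads, hess = [], []
    for score, label in zip(raw_scores, labels):
        p = 1.0 / (1.0 + math.exp(-score))  # predicted probability
        grads.append(p - label)             # first derivative
        hess.append(p * (1.0 - p))          # second derivative
    return grads, hess


grads, hess = logistic_objective([0.0, 2.0, -2.0], [1, 1, 0])
print(grads[0], hess[0])  # -0.5 0.25
```

Confident, correct predictions (the second and third points) get small gradients, so later trees concentrate on the points the model is still unsure about.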

### When to Use:

#### Use AdaBoost When:

- Simplicity is Key: You need a simpler boosting algorithm that's easier to interpret.
- Noise is Minimal: Works well with datasets that have less noise, as it is sensitive to outliers.
- Feature Engineering is Needed: Benefits from clean, well-prepared features, so it suits cases where you can invest time in manual feature engineering.
- Smaller Datasets: Performs effectively on smaller datasets without much computational overhead.

#### Use XGBoost When:

- Performance is Crucial: Known for its high predictive accuracy and speed.
- Large Datasets: Handles large datasets efficiently with parallel processing.
- Imbalanced Datasets: Supports class weighting (e.g., `scale_pos_weight` for binary problems) and custom loss functions.
- Regularization is Needed: Provides built-in regularization to reduce overfitting, making it robust.
- Complex Models: Excels in complex data scenarios with many features and interactions.

#### Why?
- AdaBoost increases the weight of misclassified samples at each round so that later weak learners focus on them. This keeps it simple, but makes it sensitive to noise and outliers.
- XGBoost uses advanced optimization techniques, regularization, and efficient computation, making it powerful for modern machine learning tasks.

In [6]:
from xgboost import XGBClassifier
print("XGBoost installed successfully!")

XGBoost installed successfully!


In [18]:
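import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('heartdisease.csv')

# Features and target; `num` is the multi-class heart-disease label.
x = df.drop('num', axis = 1)
y = df['num']

# Treat coded columns as categorical so get_dummies one-hot encodes them.
cat_cols = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']
x[cat_cols] = x[cat_cols].astype('object')
x['ca'] = x['ca'].astype('int64')
x = pd.get_dummies(x, drop_first = True)

# Stratified 80/20 split keeps the class proportions the same in both sets.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 42, stratify = y)

# Fit the scaler on the training data only, then apply it to the test data.
# Tree-based models do not need scaling; it is kept as part of this workflow.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Baseline: AdaBoost. Newer scikit-learn releases deprecate the `algorithm`
# argument, so drop it if it raises a warning or an error.
ada = AdaBoostClassifier(algorithm = 'SAMME')
ada.fit(x_train, y_train)
ada_pred = ada.predict(x_test)
accuracy_score(y_test, ada_pred)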



0.639344262295082

In [19]:
# XGBoost with default hyperparameters.
xgb = XGBClassifier()
xgb.fit(x_train, y_train)

xgb_pred = xgb.predict(x_test)
accuracy_score(y_test, xgb_pred)

0.5245901639344263

In [116]:
# Fewer boosting rounds than the default (100) give a higher test accuracy here.
xgb = XGBClassifier(n_estimators = 19)
xgb.fit(x_train, y_train)
xgb_pred = xgb.predict(x_test)
print(f'{accuracy_score(y_test, xgb_pred) * 100:.3f}%')

62.295%


In [102]:
from sklearn.model_selection import GridSearchCV

In [107]:
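# 3 x 3 x 5 = 45 combinations, each evaluated with 5-fold cross-validation.
# The grid's n_estimators values replace the 19 set on `xgb` above.
params = {'n_estimators' : [250, 300, 350],
          'learning_rate' : [0.01, 0.1, 1],
          'max_depth' : [1, 2, 3, 4, 5]}

grid = GridSearchCV(xgb, param_grid = params, cv = 5, n_jobs = -1)
grid.fit(x_train, y_train)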

In [108]:
grid.best_params_, grid.best_score_

({'learning_rate': 0.01, 'max_depth': 2, 'n_estimators': 250},
 0.5869047619047618)

In [111]:
grid.predict(x_test)

array([0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0, 0, 0, 1, 3, 3, 0, 2, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 3, 0, 0, 0, 0, 3, 0, 0, 3, 2, 1, 0,
       0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 1, 3, 3, 0, 0, 1], dtype=int64)

In [114]:
print(f'{accuracy_score(y_test, grid.predict(x_test)) * 100:.3f}%')

65.574%


In [117]:
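# Same grid plus the L2 penalty; `reg_lambda` is the scikit-learn API name
# for XGBoost's `lambda` parameter.
params = {'n_estimators' : [250, 300, 350],
          'learning_rate' : [0.01, 0.1, 1],
          'max_depth' : [1, 2, 3, 4, 5],
          'reg_lambda' : [1, 2, 3]}

grid = GridSearchCV(xgb, param_grid = params, cv = 5, n_jobs = -1)
grid.fit(x_train, y_train)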

In [120]:
# Test-set accuracy of the best estimator found by the grid search.
print(f'{accuracy_score(y_test, grid.predict(x_test)) * 100:.3f}%')

65.574%
