# Q1. What is Gradient Boosting Regression?

>Gradient Boosting Regression is a popular machine learning algorithm for regression problems that involves building a series of decision trees in a sequential manner. It works by combining several weak learners (decision trees) to form a strong learner (ensemble model) that can make accurate predictions. 

>The algorithm starts from a simple initial model, such as a constant prediction (often the mean of the target) or a single decision tree fit to the training data. The errors (residuals) of this initial model are then used to train a second decision tree. This process is repeated, with each subsequent tree trained on the residuals of the whole ensemble built so far (not only those of the previous tree), until the specified number of trees is built or a stopping criterion is met.
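A minimal sketch of this sequential residual fitting, assuming squared-error loss, a synthetic 1-D dataset, and scikit-learn's `DecisionTreeRegressor` as the weak learner. The data, `learning_rate` and `n_rounds` values are illustrative assumptions, not taken from the text. It records the training MSE after each round so you can watch it shrink:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data (an assumption, not from the text)
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.3
n_rounds = 20

# Start from a constant prediction: the mean of the target
prediction = np.full_like(y, y.mean())
train_mse = [np.mean((y - prediction) ** 2)]

for _ in range(n_rounds):
    residuals = y - prediction                  # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2)   # a shallow "weak" tree
    tree.fit(X, residuals)                      # the next tree learns those errors
    prediction += learning_rate * tree.predict(X)
    train_mse.append(np.mean((y - prediction) ** 2))

# Training MSE every 5 rounds
print([round(m, 4) for m in train_mse[::5]])
```

Because each tree's leaf values are the mean residual in that leaf, adding a shrunken copy of the tree can never increase the training squared error, which is why the recorded values decrease round by round.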

>In each iteration, the new tree chooses its splits so that it best fits the current residuals. More generally, these targets are the negative gradient of the loss function with respect to the current predictions; for squared-error loss they are exactly the ordinary residuals y - F(x). The predictions are then updated by adding the new tree's output, scaled by the learning rate (a fraction between 0 and 1). This amounts to gradient descent in function space, and it is repeated until the iteration budget is used up or the loss stops improving.
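To check that, for squared-error loss, the negative gradient equals the ordinary residual, here is a small finite-difference sketch. It assumes the per-example loss L(y, F) = ½(y - F)² (the ½ is a common convention that makes the gradient exactly F - y); the arrays are made-up values.

```python
import numpy as np

def squared_loss(y, F):
    # Per-example squared-error loss with the conventional 1/2 factor
    return 0.5 * (y - F) ** 2

y = np.array([3.0, -1.0, 2.5])   # targets (illustrative values)
F = np.array([2.0, 0.5, 2.5])    # current model predictions

eps = 1e-6
# Central finite-difference estimate of dL/dF for each example
grad = (squared_loss(y, F + eps) - squared_loss(y, F - eps)) / (2 * eps)
negative_gradient = -grad
residuals = y - F

print(negative_gradient)  # numerically matches the residuals
print(residuals)
```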

>By combining the predictions of multiple decision trees, Gradient Boosting Regression is able to capture complex nonlinear relationships between the input features and the target variable, which often results in high predictive accuracy.
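As a rough illustration of the nonlinearity claim, the sketch below compares scikit-learn's `GradientBoostingRegressor` with a plain `LinearRegression` on a synthetic sine-shaped target. The dataset and hyperparameters are assumptions chosen for the demo; on real data the gap will vary.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic nonlinear data (assumption): y = sin(x) + noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 2 * np.pi, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A straight line cannot follow the sine shape; boosted trees can approximate it piecewise
linear = LinearRegression().fit(X_train, y_train)
boosted = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                    max_depth=3, random_state=0).fit(X_train, y_train)

lin_r2 = r2_score(y_test, linear.predict(X_test))
gb_r2 = r2_score(y_test, boosted.predict(X_test))
print(f"Linear R^2: {lin_r2:.3f}  Gradient boosting R^2: {gb_r2:.3f}")
```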

# Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor


class GradientBoostingRegressor:
    """Simple gradient boosting for squared-error loss, built from scratch.

    The model starts from a prediction of 0; each tree is fit to the current
    residuals, and its output is added with weight `learning_rate`.
    """

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.estimators = []

    def fit(self, X, y):
        self.estimators = []  # reset so that calling fit() twice does not stack trees
        # Residuals of the current model (which predicts 0 before any tree is added)
        residuals = np.asarray(y, dtype=float).copy()

        for _ in range(self.n_estimators):
            # Fit a shallow tree to what the ensemble still gets wrong
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            self.estimators.append(tree)
            # Adding the shrunken tree output to the model = subtracting it from the residuals
            residuals -= self.learning_rate * tree.predict(X)
        return self

    def predict(self, X):
        # The prediction is the learning-rate-weighted sum of all tree outputs
        predictions = np.zeros(len(X))
        for tree in self.estimators:
            predictions += self.learning_rate * tree.predict(X)
        return predictions


# Load the tips dataset and one-hot encode the categorical columns
tips = sns.load_dataset('tips')
tips_encoded = pd.get_dummies(tips, columns=['sex', 'smoker', 'day', 'time'])

X = tips_encoded.drop('tip', axis=1)
y = tips_encoded['tip']

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the from-scratch gradient boosting model
gb = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gb.predict(X_test)

# Evaluate model performance using mean squared error and R-squared
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)
```


```
Mean Squared Error: 0.8337716869669473
R-squared: 0.3329673595157666
```
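The class above starts every prediction at zero, so its first trees spend their effort learning the overall level of the target. A variant that starts from the mean of the target, as described in the answers to Q6 and Q7, is sketched below. The class name `MeanInitGradientBoosting` and the synthetic data are hypothetical, used only for illustration; this variant was not used to produce the Tips results above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class MeanInitGradientBoosting:
    """Hypothetical squared-error gradient booster that starts from the target mean."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.init_ = y.mean()              # F0: the constant that minimizes squared error
        self.estimators_ = []
        prediction = np.full(len(y), self.init_)
        for _ in range(self.n_estimators):
            residuals = y - prediction     # negative gradient of 0.5 * (y - F)^2
            tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=0)
            tree.fit(X, residuals)
            prediction += self.learning_rate * tree.predict(X)
            self.estimators_.append(tree)
        return self

    def predict(self, X):
        # Initial constant plus the learning-rate-weighted sum of tree outputs
        prediction = np.full(len(X), self.init_)
        for tree in self.estimators_:
            prediction += self.learning_rate * tree.predict(X)
        return prediction


# Usage on illustrative synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 5.0 + 2.0 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.3, size=300)

model = MeanInitGradientBoosting(n_estimators=100, learning_rate=0.1).fit(X, y)
train_mse = np.mean((y - model.predict(X)) ** 2)
baseline_mse = np.mean((y - y.mean()) ** 2)
print(f"train MSE: {train_mse:.3f} vs mean-only baseline: {baseline_mse:.3f}")
```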


# Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.

```python
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import LabelEncoder

# Load the Tips dataset
tips = sns.load_dataset('tips')
X = tips.drop('tip', axis=1)
y = tips['tip']

# Encode each categorical feature as integers
# (scikit-learn's OrdinalEncoder is the encoder intended for features; LabelEncoder targets labels)
le = LabelEncoder()
for col in ['sex', 'smoker', 'day', 'time']:
    X[col] = le.fit_transform(X[col])

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the parameter grid to search
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.05, 0.1, 0.15],
    'max_depth': [3, 4, 5]
}

# Create scikit-learn's Gradient Boosting regressor (replaces the from-scratch class above)
gb = GradientBoostingRegressor()

# Exhaustive search over the grid with 5-fold cross-validation
grid_search = GridSearchCV(estimator=gb, param_grid=param_grid, cv=5, n_jobs=-1)

# Fit the GridSearchCV object to the training data
grid_search.fit(X_train, y_train)

# best_score_ is the mean cross-validated R-squared (the default score for regressors)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Score:", grid_search.best_score_)

# Make predictions on the test set using the best model (refit on all training data)
y_pred = grid_search.best_estimator_.predict(X_test)

# Evaluate model performance using mean squared error and R-squared
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)
```


```
Best Hyperparameters: {'learning_rate': 0.05, 'max_depth': 3, 'n_estimators': 100}
Best Score: 0.31534878154263496
Mean Squared Error: 0.8654454588043873
R-squared: 0.3076277611663921
```
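The question also mentions random search. In the grid-search output above, every best value is the lowest one tried for its parameter, which suggests the search range could usefully be extended downward (for example, fewer, shallower, or more slowly learning trees). Below is a minimal sketch with scikit-learn's `RandomizedSearchCV` and `scipy.stats` distributions. It uses synthetic `make_regression` data so it runs without downloading the Tips dataset; the ranges and `n_iter` are illustrative assumptions.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the Tips data (assumption)
X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Distributions to sample from, including values below the grid used above
param_distributions = {
    'n_estimators': randint(25, 301),       # integers 25..300
    'learning_rate': uniform(0.01, 0.19),   # floats in [0.01, 0.20]
    'max_depth': randint(1, 6),             # integers 1..5
}

search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=15,        # number of random configurations to try
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

print("Best Hyperparameters:", search.best_params_)
print("Best CV R-squared:", search.best_score_)
print("Test R-squared:", search.best_estimator_.score(X_test, y_test))
```

Random search tries a fixed number of sampled configurations, so widening a range costs no extra fits, whereas each extra grid value multiplies the size of a full grid.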


# Q4. What is a weak learner in Gradient Boosting?

> A weak learner in Gradient Boosting is a model that performs only slightly better than a trivial baseline (random guessing for classification, or predicting the mean for regression) and is not an accurate predictor on its own. In Gradient Boosting, the weak learner is typically a shallow decision tree, limited to a small depth or a small number of leaves. Many such weak learners are added one after another, each correcting the errors of the ones before it, and their combined output forms a strong learner capable of accurate predictions.
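To make "weak" concrete, the sketch below compares a single depth-1 tree (a decision stump) with an ensemble of 200 boosted stumps from scikit-learn's `GradientBoostingRegressor(max_depth=1)`. The synthetic additive target and all settings are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with an additive nonlinear target (assumption)
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One weak learner: a single split on a single feature
stump = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)
# Many weak learners of the same kind, combined by boosting
boosted = GradientBoostingRegressor(max_depth=1, n_estimators=200,
                                    learning_rate=0.1, random_state=0).fit(X_train, y_train)

stump_r2 = r2_score(y_test, stump.predict(X_test))
boosted_r2 = r2_score(y_test, boosted.predict(X_test))
print(f"single stump R^2: {stump_r2:.3f}, 200 boosted stumps R^2: {boosted_r2:.3f}")
```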

# Q5. What is the intuition behind the Gradient Boosting algorithm?

> The intuition behind the Gradient Boosting algorithm is to improve a model step by step by repeatedly fitting a new weak learner to the negative gradient of the loss function, evaluated at the current model's predictions. This negative gradient (the "pseudo-residual") points in the direction in which each prediction should move to reduce the loss; for squared-error loss it is simply the residual. Adding the new learner, scaled by the learning rate, therefore takes a small gradient-descent step in the space of functions. The process is repeated until a stopping criterion is met, such as a fixed number of iterations or the loss no longer improving. The resulting ensemble of weak learners forms a strong model that can make accurate predictions on new data.
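The pseudo-residuals depend on the loss. This small sketch compares squared error ½(y - F)² with absolute error |y - F|: only the former gives the plain residual, while for absolute error the negative gradient is just the sign of the residual (taking 0 where y = F, a convention assumed here). The values are illustrative.

```python
import numpy as np

def pseudo_residuals_squared(y, F):
    # -d/dF [0.5 * (y - F)^2] = y - F
    return y - F

def pseudo_residuals_absolute(y, F):
    # -d/dF |y - F| = sign(y - F); the subgradient 0 is chosen where y == F
    return np.sign(y - F)

y = np.array([3.0, -1.0, 2.5, 10.0])   # targets (illustrative values)
F = np.array([2.0, 0.5, 2.5, 1.0])     # current predictions

print(pseudo_residuals_squared(y, F))
print(pseudo_residuals_absolute(y, F))
```

The large miss on the last example dominates the squared-error pseudo-residuals but counts the same as any other miss under absolute error, which is one reason robust losses are used when the target contains outliers.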

# Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?

> Gradient Boosting builds an ensemble of weak learners in a stage-wise manner. At each stage, it trains a new weak learner to correct the errors made by the previous weak learners in the ensemble. The idea is to minimize a loss function by adding a weak learner at each stage that reduces the residual error from the previous stage.

> The algorithm starts by initializing a model with a constant value, such as the mean or median of the target variable. Then, at each iteration, a weak learner, typically a shallow decision tree, is trained on the negative gradient of the loss function with respect to the current model's predictions; for squared-error loss, this negative gradient is exactly the residual error of the current model. The learning rate (shrinkage) scales the contribution of each weak learner to the final model. A smaller learning rate makes the algorithm learn more slowly and require more trees, but it typically generalizes better and is less prone to overfitting.
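The effect of the learning rate can be observed with scikit-learn's `staged_predict`, which yields the ensemble's predictions after each added tree. This sketch, on assumed synthetic data, tracks training MSE for two learning rates; the smaller rate needs many more trees to reach a similar training error.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)

train_mse = {}
for lr in (1.0, 0.1):
    model = GradientBoostingRegressor(n_estimators=200, learning_rate=lr,
                                      max_depth=2, random_state=0).fit(X, y)
    # Training MSE after each boosting stage (one entry per tree)
    train_mse[lr] = [np.mean((y - pred) ** 2) for pred in model.staged_predict(X)]

for lr, curve in train_mse.items():
    print(f"learning_rate={lr}: MSE after 10 trees={curve[9]:.4f}, after 200 trees={curve[-1]:.4f}")
```

Training error alone only shows the speed side of the trade-off; running `staged_predict` on a held-out set is the usual way to see whether the larger learning rate starts to overfit.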

> The predictions of the new weak learner, scaled by the learning rate, are then added to the predictions of the current model, and the process is repeated until a stopping criterion is met, such as reaching a maximum number of iterations, the improvement in the loss falling below a threshold, or performance on a validation set no longer improving. The final model is an ensemble of weak learners, where each weak learner corrects the errors made by the previous weak learners in the ensemble.
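For the validation-based stopping criterion, scikit-learn's `GradientBoostingRegressor` offers built-in early stopping through `n_iter_no_change`, `validation_fraction` and `tol`; after fitting, `n_estimators_` reports how many trees were actually built. A sketch on assumed synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (assumption)
X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=1000,        # upper bound on the number of trees
    learning_rate=0.1,
    validation_fraction=0.2,  # hold out 20% of the training data internally
    n_iter_no_change=10,      # stop if the validation score fails to improve for 10 rounds
    tol=1e-4,                 # minimum improvement that counts as progress
    random_state=0,
)
model.fit(X, y)
print("Trees actually built:", model.n_estimators_)
```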

# Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm?

>The mathematical intuition of the Gradient Boosting algorithm can be broken down into the following steps:

1. Initialize the model with a constant value, such as the mean of the target variable (the constant that minimizes squared-error loss).

2. Use the current model to predict the target variable for every training example.

3. Calculate the residuals between the actual and predicted values of the target variable (more generally, the negative gradient of the loss with respect to the predictions).

4. Train a weak learner on the residuals so that it learns to predict them.

5. Update the model by adding the weak learner's predictions, multiplied by the learning rate, to the current model's predictions.

6. Repeat steps 2-5 until the desired number of weak learners is reached or until the residuals cannot be reduced meaningfully any further.

7. The final prediction of the model is the initial constant plus the learning-rate-scaled sum of the predictions of all the weak learners.
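The steps above can be written compactly. Here $F_m$ is the model after $m$ rounds, $h_m$ the $m$-th weak learner, $\nu$ the learning rate, $L$ the loss, $n$ the number of training examples and $M$ the number of rounds; these symbols are introduced for this summary, which also omits the per-step (or per-leaf) step-size optimization used in some formulations:

$$
\begin{aligned}
F_0(x) &= \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma) \\
r_{im} &= -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \qquad i = 1, \dots, n \\
h_m &= \text{weak learner fit to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \qquad m = 1, \dots, M \\
F_M(x) &= F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{aligned}
$$

For squared-error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, this gives $F_0 = \bar{y}$ (the mean, as in step 1) and $r_{im} = y_i - F_{m-1}(x_i)$ (the ordinary residual, as in step 3).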