copyright (c) @sachin shamrma 2023

# <center> Assignment Based on Gradient Boosting


---------------




## Answer 1


### Gradient Boosting Regression
Gradient Boosting Regression is a machine learning algorithm used for regression problems that involves sequentially adding weak regression models to an ensemble, with each new model attempting to correct the errors made by the previous model.
- . It is a form of boosting algorithm that works by iteratively fitting simple regression models to the residual errors of the previous model.

####  In Gradient Boosting Regression, the objective is to minimize the mean squared error (MSE) between the predicted values and the actual values. 
####  The algorithm uses a loss function (such as the mean squared error) to measure the error of each weak model, and then uses gradient descent to find the optimal parameters for each model.

-----

Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

In [10]:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score
from sklearn.tree import DecisionTreeRegressor

# Generate sample data
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

# Split data into training set and test set
train_idx = np.random.choice(len(X), size=8, replace=False)
test_idx = np.array(list(set(range(len(X))) - set(train_idx)))
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]


class GradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.estimators = []
        
    def fit(self, X, y):
        y_pred = np.full(y.shape, np.mean(y))
        for i in range(self.n_estimators):
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            residuals = y - y_pred
            tree.fit(X, residuals)
            update = tree.predict(X)
            y_pred += self.learning_rate * update
            self.estimators.append(tree)
        
    def predict(self, X):
        y_pred = np.zeros(len(X))
        for tree in self.estimators:
            y_pred += self.learning_rate * tree.predict(X)
        return y_pred


In [11]:


# Train model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train.reshape(-1, 1), y_train)

# Make predictions on test set
y_pred = model.predict(X_test.reshape(-1, 1))

# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean squared error:", mse)
print("R-squared score:", r2)



Mean squared error: 94.253907161691
R-squared score: -93.253907161691


-----

Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters


## Answer 3

In [35]:

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor



# Generate sample data
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

# Split data into training set and test set
train_idx = np.random.choice(len(X), size=8, replace=False)
test_idx = np.array(list(set(range(len(X))) - set(train_idx)))
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]




from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score


param_grid = {
        'n_estimators': [20, 49,100],
        'learning_rate': [0.01, 0.1, 1],
        'max_depth': [2, 3, 4]
    }


# Grsdient boosting
model = GradientBoostingRegressor()


grid_search = GridSearchCV(model, param_grid=param_grid, cv=3, n_jobs=-1, scoring='neg_mean_squared_error')

# grid_search = GridSearchCV(model, param_grid=param_grid, cv=3, n_jobs=-1, scoring='neg_mean_squared_error')
grid_search.fit(X_train.reshape(-1, 1), y_train)


In [36]:
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test.reshape(-1, 1))
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Best hyperparameters:", grid_search.best_params_)
print("Mean squared error:", mse)
print("R-squared score:", r2)

Best hyperparameters: {'learning_rate': 1, 'max_depth': 2, 'n_estimators': 20}
Mean squared error: 4.0
R-squared score: 0.0


### Insights

### At learinng rate alpha = 1 and max_depth = 2, n_estimator = 20
- MSE = 4
- Root Squareed Error = 0

Q4. What is a weak learner in Gradient Boosting?

### Answer 4
- In Gradient Boosting, a weak learner is a model that is only slightly better than random guessing.
-A weak learner is a simple model that has low predictive power on its own, but can still contribute to the overall accuracy of the final model when combined with other weak learners.

- In the context of Gradient Boosting for regression, a common weak learner is a decision tree with a single split, also known as a decision stump. This type of model can only make predictions based on a single feature and a single threshold, which is not very powerful.

-----

Q5. What is the intuition behind the Gradient Boosting algorithm?

## Answer 5
The intuition behind Gradient Boosting is to iteratively add weak learners to a model in a way that each new learner tries to correct the mistakes made by the previous learners, gradually improving the overall accuracy of the model.

- The name "Gradient Boosting" comes from the fact that the algorithm uses the gradient, or the direction of steepest descent, to update the parameters of the model at each iteration. The gradient represents the change in the loss function with respect to the parameters of the model, and it is used to determine the direction in which the parameters should be updated to reduce the loss.

### Conclusion
In summary, the intuition behind Gradient Boosting is to build an ensemble of weak learners that work together to gradually reduce the error of the model by correcting the mistakes made by the previous learners, using the gradient to update the parameters and improve the accuracy of the model.

------

Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

## Answer 6
The Gradient Boosting algorithm builds an ensemble of weak learners by adding them one at a time in a sequential manner, where each new learner tries to correct the mistakes made by the previous learners.

####  The algorithm follows these steps:
- First, a base model is trained on the training data. This model can be any simple model, such as a decision tree with a single split, also known as a decision stump.

- Next, the residuals, or errors, of the base model are calculated for each training sample. The residuals are the differences between the true values and the predicted values of the base model.

- A new weak learner, such as another decision stump, is trained on the residuals of the previous model. This weak learner is trained to predict the residual errors of the previous model, rather than the original target values.

- The new model is added to the ensemble, and the predictions of the entire ensemble are updated by adding the predictions of the new model multiplied by a small learning rate, which controls the contribution of the new model to the final prediction.

- Steps 3 and 4 are repeated for a specified number of iterations, each time training a new weak learner on the residuals of the previous model, adding it to the ensemble, and updating the predictions of the ensemble.

The final prediction of the Gradient Boosting ensemble is the sum of the predictions of all the weak learners, each multiplied by a weight that represents its contribution to the final prediction. The weights are determined based on the performance of each weak learner on the training data, with more weight given to the models that perform better.


###### Note:-
By iteratively adding weak learners to the ensemble, the Gradient Boosting algorithm is able to gradually reduce the errors in the predictions and improve the accuracy of the final model.






-------

Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting
algorithm?

### Answer 7
The mathematical intuition behind Gradient Boosting involves the following steps:

#### Define the loss function:
 The first step is to define a loss function that measures the difference between the true values and the predicted values of the model. The most common loss function used in regression problems is the mean squared error (MSE), which measures the average of the squared differences between the true and predicted values.

#### Train a base model: 
 A base model, such as a decision tree with a single split, is trained on the training data to predict the target values. This model serves as the starting point for the Gradient Boosting algorithm.

#### Calculate the residuals: 
The residuals, or errors, of the base model are calculated for each training sample by subtracting the predicted values of the base model from the true values.

#### Train a new model:
 A new model, such as another decision tree, is trained on the residuals of the previous model. This new model is trained to predict the residuals, rather than the original target values.

#### Update the predictions: 
The predictions of the entire ensemble are updated by adding the predictions of the new model multiplied by a small learning rate, which controls the contribution of the new model to the final prediction.

 - Repeat: Steps 3 to 5 are repeated for a specified number of iterations, with each new model trained on the residuals of the previous model and added to the ensemble.

### Final prediction: 
The final prediction of the Gradient Boosting ensemble is the sum of the predictions of all the models in the ensemble, each multiplied by a weight that represents its contribution to the final prediction. The weights are determined based on the performance of each model on the training data, with more weight given to the models that perform better.