# Assignment

### Ans1)

Gradient Boosting Regression is a machine learning technique for regression tasks, in which the goal is to predict a continuous numeric target variable from input features. It is an ensemble learning method: it builds many individual regression models, typically shallow decision trees, one after another, fits each new model to the errors of the ensemble built so far, and adds their predictions together to form a more accurate and robust predictive model.
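In symbols, the fitted model is an additive expansion. The notation is assumed here for illustration: $F_0$ is the initial constant prediction, $h_m$ the $m$-th tree, $\nu$ the learning rate, $M$ the number of trees (`n_estimators`), and $n$ the number of training samples.

```latex
F_M(x) = F_0 + \nu \sum_{m=1}^{M} h_m(x),
\qquad
F_0 = \frac{1}{n} \sum_{i=1}^{n} y_i \quad \text{(for squared-error loss)}
```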

### Ans2)

In [3]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

import warnings
warnings.filterwarnings('ignore')

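class GradientBoostingRegressor:
    """Gradient boosting for regression with squared-error loss, using decision trees."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []
        self.residuals = []
        self.initial_prediction = 0.0

    def fit(self, X, y):
        # The initial model is a constant: the mean of the training targets
        self.initial_prediction = np.mean(y)
        self.trees = []
        self.residuals = y - self.initial_prediction

        for i in range(self.n_estimators):
            # Fit a shallow tree to the current residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, self.residuals)
            residuals_pred = tree.predict(X)
            # Shrink the tree's correction by the learning rate and update the residuals
            self.residuals = self.residuals - self.learning_rate * residuals_pred
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Sum the per-sample corrections of all trees (axis=0 sums over the trees)
        residuals_pred = np.sum([tree.predict(X) for tree in self.trees], axis=0)
        # Start from the training-target mean stored in fit()
        return self.initial_prediction + self.learning_rate * residuals_pred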
    
    
X = np.random.rand(100,5)
y = np.sum(X, axis=1) + np.random.normal(scale=0.1, size= 100)

X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gbr.fit(X_train,y_train)

y_pred = gbr.predict(X_test)

mse = np.mean((y_pred-y_test)**2)
r2 = 1 - mse/ np.var(y_test)

print("Mean squared error : {:.3f}".format(mse))
print("R-squared: {:.3f}".format(r2))

Mean squared error : 0.140
R-squared: 0.679


### Ans3)

In [12]:
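import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load the Boston Housing dataset.
# load_boston was removed in scikit-learn 1.2; its maintainers discourage this dataset
# because of an ethical problem and suggest fetch_california_housing as an alternative.
# Here the data is read from the original source, as scikit-learn's deprecation message describes.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]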

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a GradientBoostingRegressor
gbr = GradientBoostingRegressor(random_state=42)

# Define the hyperparameter grid for grid search
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 4, 5]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=gbr, param_grid=param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)

# Perform the grid search on the training data
grid_search.fit(X_train, y_train)

# Get the best hyperparameters from the grid search
best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)

# Use the best model to make predictions on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# Calculate Mean Squared Error (MSE) and R-squared (R^2)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared (R^2): {r2}")




Best Hyperparameters: {'learning_rate': 0.2, 'max_depth': 3, 'n_estimators': 50}
Mean Squared Error (MSE): 6.338156869469146
R-squared (R^2): 0.9135711182987436


### Ans4)

In the context of Gradient Boosting, a "weak learner" is a base model that performs only slightly better than random guessing (in regression, only slightly better than always predicting the mean). Weak learners are typically shallow decision trees; a tree with a single split is called a "decision stump." Used individually, these trees are simple and have limited predictive power.

Weak learners are a fundamental component of boosting algorithms such as AdaBoost and Gradient Boosting Machines (GBM). The idea behind boosting is to combine the predictions of many weak learners into a strong ensemble model that makes highly accurate predictions.

The main characteristics of weak learners in Gradient Boosting are:

1. **Low Complexity**: Weak learners are intentionally kept simple. Decision trees used as weak learners are constrained to a small number of leaves or a shallow depth, anywhere from a single split (a stump) to a few levels; scikit-learn's `GradientBoostingRegressor`, for example, defaults to `max_depth=3`.

2. **Slight Advantage over Random Guessing**: Weak learners perform better than random guessing, but only slightly; none of them is powerful enough to solve the problem on its own.

3. **Sequential Training**: In Gradient Boosting, weak learners are trained sequentially, and each new weak learner is trained to correct the errors or residuals of the ensemble formed by the previously trained learners. This iterative process continues until a predefined stopping criterion is met or until the ensemble achieves satisfactory performance.
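The gap between one weak learner and many of them can be seen on a small synthetic problem. This is a sketch under stated assumptions: the sine-shaped data, sample size, and hyperparameters are arbitrary choices for illustration, and scikit-learn's `GradientBoostingRegressor` with `max_depth=1` stands in for "boosted stumps."

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A single decision stump: one split, two constant predictions
stump = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)
stump_r2 = r2_score(y_test, stump.predict(X_test))

# 200 stumps combined sequentially by gradient boosting
boosted = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=1, random_state=0
)
boosted.fit(X_train, y_train)
boosted_r2 = r2_score(y_test, boosted.predict(X_test))

print(f"single stump R^2:   {stump_r2:.3f}")
print(f"boosted stumps R^2: {boosted_r2:.3f}")
```

A single stump can only split the input range into two constant pieces, so it captures the broad shape of the sine curve but little else; the boosted ensemble of the same stumps approximates the curve much more closely.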


### Ans5)

The intuition behind the Gradient Boosting algorithm can be summarized as follows:

1. **Combining Weak Predictors**: Gradient Boosting aims to combine the predictions of multiple "weak learners" (typically simple decision trees) to create a powerful ensemble model. Each weak learner is focused on correcting the errors made by the ensemble of previously trained learners.

2. **Sequential Correction**: The training process is sequential. It starts from an initial prediction (often a constant), and each subsequent learner is trained to correct the errors (residuals) of the ensemble formed by the previous learners. This sequential correction is a key feature of Gradient Boosting.

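3. **Gradient Descent Optimization**: The name "Gradient Boosting" comes from performing gradient descent on a loss function in the space of functions. In each iteration, the algorithm computes the gradient (derivative) of the loss with respect to the ensemble's current predictions and fits a weak learner to the *negative* gradient, called the pseudo-residuals. Adding this learner moves the predictions approximately in the direction of steepest descent, which reduces the errors made by the current ensemble. For squared-error loss, the negative gradient is simply the ordinary residual y − F(x).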

4. **Weighted Combination**: The predictions of the weak learners are added together to form the final ensemble prediction, each scaled by the learning rate (shrinkage). In Friedman's original formulation, the step size for each learner (for trees, each leaf's value) is also chosen by a line search that minimizes the loss. This differs from AdaBoost, where each learner receives a weight based on its accuracy.

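5. **Regularization**: To prevent overfitting, Gradient Boosting typically constrains tree depth (shallow trees), applies shrinkage through a small learning rate, and limits the number of iterations, for example with early stopping on a validation set. Stochastic variants also fit each tree on a random subsample of the training data. These techniques control the complexity of the ensemble.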

6. **Ensemble's Complexity**: As the boosting process continues, the ensemble becomes more and more complex, capturing both simple and intricate patterns in the data. The sequential nature of training ensures that each new weak learner focuses on the areas where the previous learners have made errors.

7. **High Predictive Accuracy**: Because each iteration reduces the training loss, the model's fit improves incrementally, and the final ensemble can produce highly accurate predictions; on structured (tabular) data it often outperforms other machine learning algorithms. The number of iterations still has to be tuned on held-out data, because continuing too long leads to overfitting.

8. **Versatility**: Gradient Boosting is a versatile algorithm that can be used for both regression and classification tasks. It can handle a wide range of data types and has been successful in various domains, from finance to natural language processing.
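For squared-error loss, the "gradient" in point 3 can be worked out explicitly, which shows why fitting residuals (as in Ans2) is a gradient-descent step. The notation is assumed here: $F_{m-1}$ is the current ensemble, $r_{im}$ the pseudo-residual of sample $i$ at step $m$, $h_m$ the tree fitted to those pseudo-residuals, and $\nu$ the learning rate.

```latex
L\big(y_i, F(x_i)\big) = \tfrac{1}{2}\big(y_i - F(x_i)\big)^2
\;\Longrightarrow\;
r_{im} = -\left.\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right|_{F = F_{m-1}}
       = y_i - F_{m-1}(x_i),
\qquad
F_m(x) = F_{m-1}(x) + \nu\, h_m(x)
```

With the factor $\tfrac{1}{2}$, the negative gradient is exactly the residual, so "fit a tree to the residuals" and "fit a tree to the negative gradient" are the same step for this loss.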


### Ans6)

The Gradient Boosting algorithm builds an ensemble of weak learners in an iterative and sequential manner. Here's a step-by-step explanation of how it constructs this ensemble:

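1. **Initialization**: Gradient Boosting starts by initializing the ensemble with a simple model, usually the constant that minimizes the loss (for squared error, the mean of the target values). This initial prediction serves as the starting point. Some implementations also accept a different initial estimator, such as the `init` parameter of scikit-learn's `GradientBoostingRegressor`.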

2. **Calculate Residuals**: In each iteration, the algorithm calculates the residuals, the differences between the true target values and the current ensemble's predictions. The residuals indicate where the current model is making mistakes. For loss functions other than squared error, the algorithm uses pseudo-residuals, the negative gradient of the loss with respect to the predictions.

3. **Fit a Weak Learner to Residuals**: The algorithm then fits a weak learner (usually a decision tree with limited depth) to the residuals. This weak learner's purpose is to capture and correct the errors made by the current ensemble. It finds patterns or relationships in the data that are not yet captured by the ensemble.

4. **Update Ensemble Predictions**: The predictions from the newly trained weak learner are combined with the predictions of the current ensemble. This combination involves adding the predictions from the weak learner to the predictions from the previous ensemble, with a scaling factor (learning rate) applied. The scaling factor controls the contribution of each weak learner to the ensemble.

5. **Repeat**: Steps 2 to 4 are repeated for a specified number of iterations (controlled by the hyperparameter `n_estimators`) or until some predefined stopping criterion is met. The algorithm continues to focus on correcting the errors and improving the ensemble's predictions.

6. **Final Ensemble**: The final model is the initial prediction plus the sum of the predictions of all the weak learners trained during the iterations, each scaled by the learning rate.
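The steps above can be sketched in a few lines of NumPy and scikit-learn. This is a minimal sketch, assuming squared-error loss, synthetic data, and arbitrary hyperparameters; `ensemble_predict` is a hypothetical helper name. It records the training MSE after every iteration so the effect of each added tree is visible.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(42)
X = rng.random((200, 3))
y = X.sum(axis=1) + rng.normal(scale=0.1, size=200)

n_estimators, learning_rate, max_depth = 50, 0.1, 2

# Step 1: initialize the ensemble with a constant (the target mean)
f0 = y.mean()
prediction = np.full_like(y, f0)
trees = []
mse_history = [np.mean((y - prediction) ** 2)]

# Step 5: repeat steps 2-4 for n_estimators iterations
for m in range(n_estimators):
    # Step 2: residuals of the current ensemble
    residuals = y - prediction
    # Step 3: fit a weak learner (a shallow tree) to the residuals
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
    # Step 4: add the tree's predictions, scaled by the learning rate
    prediction = prediction + learning_rate * tree.predict(X)
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))


# Step 6: the final model is f0 plus the scaled sum of all trees
def ensemble_predict(X_new):
    return f0 + learning_rate * np.sum([t.predict(X_new) for t in trees], axis=0)


print(f"training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

With squared-error loss, each leaf predicts the mean residual of its samples, and for a learning rate between 0 and 2 an update of this form cannot increase the training MSE, so the recorded history decreases monotonically. A decreasing training error says nothing about test error, which is why the number of iterations is usually chosen on held-out data.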


### Ans7)