In [None]:
#Q1. What is Gradient Boosting Regression?
'''
Gradient Boosting Regression is a machine learning technique used for regression tasks, where the goal is to predict a continuous numerical output
based on input features. It's an ensemble learning method that combines the predictions of multiple weak learners (usually decision trees) to create a 
stronger predictive model.
'''

In [4]:
'''
Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.
'''
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Generate some example data
df = pd.read_csv('winequality-red.csv')
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Define the number of iterations and the learning rate
n_estimators = 100
learning_rate = 0.1

# Initialize the prediction with the mean of the target values
y_pred = np.full(y_train.shape, np.mean(y_train))

# Iterate to build the gradient boosting model
for i in range(n_estimators):
    # Calculate the residuals
    residuals = y_train - y_pred
    
    # Fit a decision tree regressor to the residuals
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X_train, residuals)
    
    # Make predictions with the current tree
    tree_pred = tree.predict(X_train)
    
    # Update the prediction with the scaled predictions from the current tree
    y_pred += learning_rate * tree_pred

# Make predictions on the test set
y_pred_test = np.full(y_test.shape, np.mean(y_train))
for i in range(n_estimators):
    y_pred_test += learning_rate * tree.predict(X_test)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred_test)
r2 = r2_score(y_test, y_pred_test)

print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")


Mean Squared Error: 0.7208
R-squared: -0.2591


In [5]:
'''
Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters
'''

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score

# Load data
df = pd.read_csv('winequality-red.csv')
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Define parameter grid for grid search
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 4, 5]
}

# Initialize the gradient boosting regressor
regressor = GradientBoostingRegressor()

# Initialize GridSearchCV
grid_search = GridSearchCV(regressor, param_grid, cv=5, scoring='neg_mean_squared_error')

# Fit the grid search to the data
grid_search.fit(X_train, y_train)

# Get the best parameters and best estimator
best_params = grid_search.best_params_
best_regressor = grid_search.best_estimator_

# Make predictions on the test set using the best estimator
y_pred = best_regressor.predict(X_test)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Best Parameters: {best_params}")
print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")


Best Parameters: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 150}
Mean Squared Error: 0.3455
R-squared: 0.3964


In [None]:
#Q4.What is a weak learner in Gradient Boosting?
'''
In Gradient Boosting, the weak learners are typically decision trees with limited depth. Each new weak learner focuses on the residuals (errors) of
the combined model's predictions, helping to reduce the errors that the previous learners couldn't handle effectively. By combining these weak learners
with an appropriate weighting, the overall model becomes much more accurate and capable of capturing complex relationships in the data.
'''

In [None]:
#Q5.What is the intuition behind the Gradient Boosting algorithm?
'''
Start with a Simple Model: The algorithm begins with a simple model, often a decision tree with just a few levels (shallow depth). This simple model is
called the "weak learner."

Sequential Learning: Gradient Boosting works sequentially. In each iteration, it tries to improve upon the errors made by the model built in the 
previous iterations. The idea is to build a strong model by incrementally adding weak learners that are tailored to the data's errors.

Focus on Residuals: At the beginning, the errors (residuals) made by the current model are significant. A new weak learner is then trained to predict
these residuals, which captures the patterns in the data that the current model is struggling with.

Correcting Errors: The predictions from the new weak learner are not added directly to the previous model's predictions. Instead, they are scaled by a 
small value (learning rate) and added to the previous predictions. This correction step helps in reducing the errors made by the previous model.

Iterative Improvement: The algorithm continues to iterate, with each iteration adding a new weak learner to correct the model's errors. Over time, 
the collective predictions of these weak learners converge to a strong predictive model.

Ensemble of Weak Learners: The final model is an ensemble of these weak learners, each focused on a specific aspect of the data. Together, they work
collaboratively to capture the complex relationships and patterns within the data.
'''

In [None]:
#Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?
'''
Initialization: The process starts by initializing the ensemble with a simple model, often a weak learner like a shallow decision tree. This initial 
model provides the initial predictions.

Calculate Residuals: The residuals are the differences between the actual target values and the predictions of the current ensemble. In the beginning,
the residuals are substantial because the initial model's predictions are far from perfect.

Fit a Weak Learner to Residuals: A new weak learner (another shallow decision tree) is trained to predict the residuals. This weak learner is 
constructed in a way that minimizes the residuals' error. It captures the patterns and relationships that the current ensemble hasn't learned well.

Update Ensemble's Predictions: The predictions of the newly trained weak learner are scaled by a small factor called the learning rate and added to 
the predictions of the current ensemble. This step improves the ensemble's predictions by a fraction of the weak learner's predictions.

Iterate: Steps 2 to 4 are repeated for a predefined number of iterations or until the performance of the ensemble converges to a desired level.

Final Ensemble: The final ensemble model is the sum of the predictions made by all the weak learners, each scaled by the learning rate. This ensemble
combines the predictions of all weak learners to create a single strong predictive model.
'''

In [None]:
#Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?
'''
Loss Function: Begin with a loss function that quantifies the error between the model's predictions and the actual target values. The choice of loss 
function depends on the problem, e.g., mean squared error for regression, log loss for classification.

Initialize with a Constant: Start by initializing the ensemble with a constant value, often the mean of the target values. This initial prediction
represents the baseline.

Calculate Negative Gradient: Compute the negative gradient of the loss function with respect to the current ensemble's predictions. This gradient
represents the direction and magnitude of change required to minimize the loss.

Fit a Weak Learner to Negative Gradient: Train a new weak learner (usually a decision tree with shallow depth) to predict the negative gradient 
calculated in the previous step. This weak learner is designed to capture the patterns in the residuals (errors) that the current ensemble is 
struggling with.

Update Ensemble's Predictions: Adjust the ensemble's predictions by adding the scaled predictions from the new weak learner. The scaling factor is 
the learning rate, which controls the step size of the updates.

Repeat for Multiple Iterations: Iteratively repeat steps 3 to 5 for a specified number of iterations. In each iteration, compute the negative gradient,
train a new weak learner, and update the ensemble's predictions.

Final Ensemble Prediction: The final prediction of the Gradient Boosting model is the sum of the predictions from all the weak learners, scaled by 
the learning rate.
'''