**Introduction**

Gradient boosting stands out for its prediction speed and accuracy, particularly on large and complex datasets.


Errors play a major role in any machine learning algorithm. There are two main types of error: bias error and variance error.


**Bias is the difference between our actual and predicted values**. It comes from the simplifying assumptions our model makes about the data in order to predict new data.

When the bias is high, the assumptions made by our model are too basic and the model can’t capture the important features of our data. The model hasn’t captured the patterns in the training data and hence cannot perform well on the testing data either. Such a model cannot perform on new data and cannot be sent into production.

This situation, where the model cannot find patterns in the training set and hence fails on both seen and unseen data, is called underfitting.

***The gradient boosting algorithm helps us minimize the bias error of the model.***
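To make the bias-reduction claim concrete, here is a minimal from-scratch sketch of gradient boosting for squared loss. It is not scikit-learn's implementation: the helper names (`fit_stump`, `boost`), the one-split stumps, and the noiseless sine data are all assumptions chosen for illustration. The model starts from the mean of the targets (a very high-bias predictor), and each round fits a stump to the current residuals, so the training error can only go down.

```python
import math

def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (1-D inputs only)."""
    points = sorted(set(xs))
    best = None
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)          # start from the mean, a very high-bias model
    preds = [base] * len(ys)
    stumps = []
    history = [mse(ys, preds)]        # training error before any boosting round
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        stumps.append(stump)
        history.append(mse(ys, preds))
    return base, stumps, history

# Toy data (an assumption for illustration): a noiseless sine curve.
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]

base, stumps, history = boost(xs, ys)
print("training MSE before boosting:", round(history[0], 4))
print("training MSE after 50 rounds:", round(history[-1], 4))
```

The `history` list records the training error after every round. With leaf values set to the mean residual and a learning rate between 0 and 1, no round can increase the squared training error, which is the sense in which boosting chips away at bias.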


**What is Variance?**


Variance is, in a sense, the opposite of bias. During training, our model ‘sees’ the data a certain number of times to find patterns in it. If it does not work on the data for long enough, it will not find the patterns, and bias occurs. On the other hand, if the model is allowed to view the data too many times, it learns that particular data too well. It captures most patterns in the data, but it also learns from the noise, the unnecessary detail present in it.

We can define variance as the model’s sensitivity to fluctuations in the data. A high-variance model learns from noise, which causes it to treat trivial features as important.



**Overfitting**: As a result, our model performs really well on the training data and reaches high accuracy there, but fails on new, unseen data. New data may not have exactly the same features, so the model won’t predict it well. This is called overfitting.

**Underfitting**: The model cannot find patterns in the training set and hence fails on both seen and unseen data.
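The two failure modes can be seen by comparing training and test error. The sketch below uses a tiny k-nearest-neighbours regressor as a stand-in model, not gradient boosting, because its flexibility is controlled by a single number. The data (a noisy sine curve), the seed, and the function names are assumptions for illustration. With `k=1` the model memorises the training set (zero training error, the overfitting signature); with `k` equal to the training-set size it predicts the same mean everywhere (errors on both sets, the underfitting signature).

```python
import math
import random

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def knn_mse(eval_x, eval_y, train_x, train_y, k):
    """Mean squared error of the k-NN regressor on an evaluation set."""
    errors = [(y - knn_predict(train_x, train_y, x, k)) ** 2
              for x, y in zip(eval_x, eval_y)]
    return sum(errors) / len(errors)

def make_data(n):
    """Noisy samples of sin(x) on [0, 6] (synthetic, for illustration only)."""
    xs = [random.uniform(0, 6) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0, 0.3) for x in xs]
    return xs, ys

random.seed(0)
train_x, train_y = make_data(60)
test_x, test_y = make_data(60)

results = {}
for k in (1, 7, 60):   # very flexible, moderate, very rigid
    train_err = knn_mse(train_x, train_y, train_x, train_y, k)
    test_err = knn_mse(test_x, test_y, train_x, train_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

A large gap between training and test error points to overfitting, while similar but high errors on both point to underfitting. A moderate `k` will usually sit between the two extremes, though the exact numbers depend on the seed.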


**Comparison Table of Adaboost and Gradient Boosting**


| Features | Gradient boosting | AdaBoost |
|---|---|---|
| Model | Identifies difficult observations through the large residuals computed in prior iterations. | Shifts focus by up-weighting the observations that were previously misclassified. |
| Trees | The weak-learner trees are constructed using a greedy algorithm based on split points and purity scores. The trees are grown deeper, with eight to thirty-two terminal nodes. The weak learners should stay weak in terms of nodes, layers, leaf nodes, and splits. | The trees are decision stumps. |
| Classifier | The classifiers are weighted precisely, and their prediction capacity is constrained by the learning rate to increase accuracy. | Every classifier carries a different weight in the final prediction, depending on its performance. |
| Prediction | Builds each tree from the residuals of the previous classifiers, capturing the variance left in the data. All weak learners carry equal weight, usually fixed by a learning rate of small magnitude. | Assigns a weight to each classifier according to its observed performance on the data. The final prediction is a majority vote of the weak learners, each weighted by its accuracy. |
| Shortcomings | The gradients themselves identify the shortcomings. | Highly weighted data points identify the shortcomings. |
| Loss value | Gradient boosting cuts down the error components to provide clear explanations, and its concepts are easier to adapt and understand. | The exponential loss gives the largest weights to the samples that are fitted worst. |
| Applications | Trains the learners by reducing the loss function of each weak learner, fitting it to the residuals of the model. | Focuses on training on the previously misclassified observations and alters the distribution of the dataset to increase the weight of samples that are hard to classify. |
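The up-weighting that AdaBoost applies to misclassified observations can be sketched as a single boosting round. This is a simplified illustration of the classic discrete AdaBoost update with labels in {-1, +1}, not scikit-learn's `AdaBoostClassifier`. The toy data and the names `stump_predict` and `adaboost_round` are assumptions made for the example.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` left of the threshold, -polarity right of it."""
    return polarity if x <= threshold else -polarity

def adaboost_round(xs, ys, weights):
    """One AdaBoost round: pick the stump with the lowest weighted error, then re-weight."""
    points = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    err, t, polarity = best
    err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0) for a perfect stump
    alpha = 0.5 * math.log((1 - err) / err)    # the stump's say in the final vote
    # Misclassified points (y * prediction == -1) are multiplied by exp(+alpha),
    # correctly classified points by exp(-alpha).
    new_weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for x, y, w in zip(xs, ys, weights)]
    total = sum(new_weights)
    return (t, polarity, alpha), [w / total for w in new_weights]

# Toy data (an assumption for illustration): x = 5 breaks the otherwise clean split.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, -1, 1, -1]
weights = [1 / len(xs)] * len(xs)

stump, new_weights = adaboost_round(xs, ys, weights)
print("chosen stump (threshold, polarity, alpha):", stump)
print("updated weights:", [round(w, 3) for w in new_weights])
```

After the update, the single misclassified point holds half of the total weight, so the next stump is pushed to get that point right. Gradient boosting reaches a similar effect by a different route: it leaves the sample weights alone and fits the next tree to the residuals.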

***Conclusion***

So, in adaptive boosting, the observations that were misclassified in earlier rounds are up-weighted and then used to train the model, making it more effective. In gradient boosting, the difficult observations are identified by the large residuals left by the previous iteration, and fitting them improves the performance of the existing model.




import pandas
bike = pandas.read_csv("day.csv")
 
# Separate the dependent and independent variables into two dataframes.
from sklearn.model_selection import train_test_split 
X = bike.drop(['cnt'],axis=1) 
Y = bike['cnt']
# Splitting the dataset into 80% training data and 20% testing data.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)
 
import numpy as np
def MAPE(Y_actual,Y_Predicted):
    mape = np.mean(np.abs((Y_actual - Y_Predicted)/Y_actual))*100
    return mape
 
from sklearn.ensemble import GradientBoostingRegressor
GR = GradientBoostingRegressor(n_estimators = 200, max_depth = 1, random_state = 1) 
gmodel = GR.fit(X_train, Y_train) 
g_predict = gmodel.predict(X_test)
GB_MAPE = MAPE(Y_test,g_predict)
Accuracy = 100 - GB_MAPE
print("MAPE: ",GB_MAPE)
print('Accuracy of Gradient Boosting: {:0.2f}%.'.format(Accuracy))
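As a quick sanity check of the MAPE metric used above, here is a small worked example in plain Python. The function `mape` is a dependency-free stand-in for the NumPy helper, and the numbers are made up for illustration. Note that MAPE is undefined when an actual value is zero, which this sketch assumes does not happen.

```python
def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios) * 100

# Hypothetical values: two predictions are off by 10%, one is exact.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

error = mape(actual, predicted)
print("MAPE: {:0.2f}%".format(error))
print("100 - MAPE: {:0.2f}%".format(100 - error))
```

The relative errors here are 0.1, 0.1, and 0, so the MAPE is their mean times 100, roughly 6.67%. The "accuracy" figure reported above is simply 100 minus this value, which is a convenient summary rather than a classification accuracy.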