In the following example, Age is the target variable, whereas LikesExercising, GotoGym, and DrivesCar are the independent variables. Because the target variable is continuous, GradientBoostingRegressor is used.

In [2]:
# Importing required modules
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
# Create the DataFrame of independent variables and the Series holding the target
X=pd.DataFrame({'LikesExercising':[False,False,False,True,False,True,True,True,True],
                'GotoGym':[True,True,True,True,True,False,True,False,False],
                'DrivesCar':[True,False,False,True,True,False,True,False,True]})
Y=pd.Series(name='Age',data=[14,15,16,26,36,50,69,72,74])
# Encode False and True as the numbers 0 and 1.
# astype(int) gives the same 0/1 values as LabelEncoder, which is meant for target labels rather than features.
X=X.astype(int)

In [3]:
#We will now see the effect of different numbers of estimators on MSE.

In [4]:
# 1) Train GradientBoostingRegressor with 2 estimators and predict the age for the same inputs.
GB=GradientBoostingRegressor(n_estimators=2)
GB.fit(X,Y)
Y_predict=GB.predict(X) # ages predicted by the model with 2 estimators
Y_predict
# Output:
# Y_predict=[38.23 , 36.425, 36.425, 42.505, 38.23 , 45.07 , 42.505, 45.07 ,47.54]
# Compute the MSE of the predictions made with 2 estimators.
MSE_2=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for two estimators :',MSE_2)



MSE for two estimators : 432.48205555555546
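As a quick cross-check of the hand-written MSE formula, the sketch below recomputes it from the predictions printed above and compares it with `mean_squared_error` from `sklearn.metrics`. The variable names `y_true`, `y_pred`, `mse_manual`, and `mse_sklearn` are illustrative and do not appear in the original cells.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Ages from the example and the predictions printed for n_estimators=2
y_true = np.array([14, 15, 16, 26, 36, 50, 69, 72, 74])
y_pred = np.array([38.23, 36.425, 36.425, 42.505, 38.23,
                   45.07, 42.505, 45.07, 47.54])

# Manual formula: mean of the squared residuals
mse_manual = ((y_true - y_pred) ** 2).sum() / len(y_true)

# Same quantity computed by scikit-learn
mse_sklearn = mean_squared_error(y_true, y_pred)

print('manual :', mse_manual)
print('sklearn:', mse_sklearn)
```

Both values should agree up to floating-point rounding, so either form can be used in the cells that follow.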


In [5]:
# 2) Train GradientBoostingRegressor with 3 estimators and predict the age for the same inputs.
GB=GradientBoostingRegressor(n_estimators=3)
GB.fit(X,Y)
Y_predict=GB.predict(X) # ages predicted by the model with 3 estimators
Y_predict
# Output:
# Y_predict=[36.907, 34.3325, 34.3325, 43.0045, 36.907 , 46.663 , 43.0045, 46.663 , 50.186]
# Compute the MSE of the predictions made with 3 estimators.
MSE_3=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for three estimators :',MSE_3)


MSE for three estimators : 380.05602055555556


In [6]:
# 3) Train GradientBoostingRegressor with 50 estimators and predict the age for the same inputs.
GB=GradientBoostingRegressor(n_estimators=50)
GB.fit(X,Y)
Y_predict=GB.predict(X) # ages predicted by the model with 50 estimators
Y_predict
# Output:
# Y_predict=[25.08417833, 15.63313919, 15.63313919, 47.46821839, 25.08417833, 60.89864242, 47.46821839, 60.89864242, 73.83164334]
# Compute the MSE of the predictions made with 50 estimators.
MSE_50=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for fifty estimators :',MSE_50)

MSE for fifty estimators : 156.5667260994211


Observation:
As we can see, the MSE on the training data falls as we increase the number of estimators. Eventually the MSE saturates: increasing the number of estimators further brings no significant decrease in MSE.
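One way to watch this saturation without refitting a model for every value is `staged_predict`, which yields the predictions after each boosting stage of a single fitted model. The following is a minimal sketch under a few assumptions: `random_state=0` is added only for repeatability, and the names `model` and `train_mse` are illustrative.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Same data as in the example, with booleans encoded as 0/1
X = pd.DataFrame({'LikesExercising': [False, False, False, True, False, True, True, True, True],
                  'GotoGym': [True, True, True, True, True, False, True, False, False],
                  'DrivesCar': [True, False, False, True, True, False, True, False, True]}).astype(int)
Y = pd.Series(name='Age', data=[14, 15, 16, 26, 36, 50, 69, 72, 74])

# Fit once with 50 estimators (random_state is an assumption for repeatability)
model = GradientBoostingRegressor(n_estimators=50, random_state=0)
model.fit(X, Y)

# staged_predict yields the predictions after 1, 2, ..., 50 trees,
# so train_mse[k-1] is the training MSE of the first k estimators
train_mse = [mean_squared_error(Y, y_stage) for y_stage in model.staged_predict(X)]

for n in (1, 2, 3, 10, 19, 50):
    print(n, 'estimators -> training MSE', round(train_mse[n - 1], 2))
```

The printed values shrink quickly at first and then flatten. Keep in mind that these are training errors, measured on the same rows the model was fitted on.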

In [7]:
#Finding the best estimator with GridSearchCV:

In [8]:
from sklearn.model_selection import GridSearchCV
model=GradientBoostingRegressor()
params={'n_estimators':range(1,200)}
grid=GridSearchCV(estimator=model,cv=2,param_grid=params,scoring='neg_mean_squared_error')
grid.fit(X,Y)
print("The best estimator returned by GridSearch CV is:",grid.best_estimator_)
# Output:
# The best estimator returned by GridSearch CV is: GradientBoostingRegressor(n_estimators=19)
# best_estimator_ has already been refit on all of X and Y (refit=True is the default), so no extra fit is needed
GB=grid.best_estimator_
Y_predict=GB.predict(X)
Y_predict
# Output:
# Y_predict=[27.20639114, 18.98970027, 18.98970027, 46.66697477, 27.20639114, 58.34332496, 46.66697477, 58.34332496, 69.58721772]
# Compute the MSE of the predictions made with the n_estimators value chosen by GridSearchCV.
MSE_best=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for best estimators :',MSE_best)

The best estimator returned by GridSearch CV is: GradientBoostingRegressor(n_estimators=19)
MSE for best estimators : 164.2298548605391


Observation:
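You may think that the MSE for n_estimators=50 (156.57) is better than the MSE for n_estimators=19 (164.23), yet GridSearchCV returns 19, not 50. The two numbers measure different things. The MSE values computed above are training errors, measured on the same data the model was fitted on, and they never increase as estimators are added. GridSearchCV instead ranks every candidate by its mean cross-validated score (here neg_mean_squared_error averaged over cv=2 folds), that is, by the error on data held out from fitting. n_estimators=19 gave the best held-out score, so it was returned by GridSearchCV.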
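To see the scores that GridSearchCV actually compares, its `cv_results_` attribute can be loaded into a DataFrame. The sketch below is illustrative only: it searches a shorter range (1 to 60) to keep the run fast, fixes `random_state=0` for repeatability, and introduces the names `results` and `cv_mse`, none of which come from the original cells. The selected value may therefore differ from the one reported above.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Same data as in the example, with booleans encoded as 0/1
X = pd.DataFrame({'LikesExercising': [False, False, False, True, False, True, True, True, True],
                  'GotoGym': [True, True, True, True, True, False, True, False, False],
                  'DrivesCar': [True, False, False, True, True, False, True, False, True]}).astype(int)
Y = pd.Series(name='Age', data=[14, 15, 16, 26, 36, 50, 69, 72, 74])

# Shorter search range and fixed random_state are assumptions made for this sketch
grid = GridSearchCV(estimator=GradientBoostingRegressor(random_state=0),
                    param_grid={'n_estimators': range(1, 61)},
                    cv=2,
                    scoring='neg_mean_squared_error')
grid.fit(X, Y)

# One row per candidate; mean_test_score is the negated MSE averaged over the held-out folds
results = pd.DataFrame(grid.cv_results_)
results['cv_mse'] = -results['mean_test_score']

# Candidates ordered the way GridSearchCV ranks them (rank 1 is the winner)
print(results.sort_values('rank_test_score')[['param_n_estimators', 'cv_mse', 'rank_test_score']].head())
print('Selected:', grid.best_params_, 'with cross-validated MSE', -grid.best_score_)
```

The winner is the row with the lowest `cv_mse`, regardless of how small the training MSE becomes for larger values of n_estimators.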